Re: [PATCH] binary heap implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] binary heap implementation
Date: 2012-11-20 20:47:25
Message-ID: CA+TgmobvR7XW9fjj2RNY7sKK-VAG5nahfai_zV51rHVLDNvaBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 20, 2012 at 3:19 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I don't have a fundamental problem with it, but I don't think it's worth
> the cost. If you have variable sized (from the point of the compiler)
> values you either have more complex offset computations to the
> individual elements or one more indirection due to a pointer array.
>
> Do you feel strongly about it? If so, go ahead, otherwise I would prefer
> the current way (with parameters changed to be Datum's of course).

Here's a revised patch that takes the approach of just changing void *
to Datum; it includes a bunch of other cleanups that I felt were
worthwhile. I tested this using the following setup:

CREATE TABLE sample (a int, b numeric);
DO $$
BEGIN
FOR i IN 1 .. 1000 LOOP
EXECUTE 'CREATE TABLE sample' || i || ' (CHECK (b = ' || i
|| ')) INHERITS (sample)';
EXECUTE 'INSERT INTO sample' || i
|| ' SELECT g, ' || i || ' FROM
generate_series(1,100000) g;';
EXECUTE 'ALTER TABLE sample' || i
|| ' ADD PRIMARY KEY (a)';
END LOOP;
END
$$;
VACUUM ANALYZE;

And this test query:

select sum(1) from (select * from sample order by a limit 250) x;

On my system, on repeated execution, this takes about 113 ms with
unpatched master and about 114 ms with the patch. I'm inclined to
think that's not worth getting excited about, as it could be simply
code shifting around across cache line boundaries and not indicative
of an actual regression due to the difference in logic. Even if there
is a regression, it's pretty darn small: most people will not have
1000 partitions, and those that do will often find query execution
time dominated by other factors. Against that, there is positive
value in having this code somewhat more abstracted.

With respect to the question of the API, no, I don't feel
overwhelmingly strongly that we have to whack the API with a bat
that's larger than the one I've used here. On the flip side, it
wouldn't surprise me if, as the code acquires more users, we find that
we actually do want to make some changes, either the one I already
proposed or something else. Fortunately, it's not a lot of code, so
if we end up refactoring it some more later, it shouldn't be a big
deal. So I'm happy enough to commit this the way I have it and sort
out the rest as we go. If there are no objections I'll go ahead and
do that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
binaryheap-rmh.patch application/octet-stream 16.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2012-11-20 20:47:56 review: plpgsql return a row-expression
Previous Message Alvaro Herrera 2012-11-20 20:36:14 Re: foreign key locks