Re: [PATCH] binary heap implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] binary heap implementation
Date: 2012-11-20 20:06:58
Message-ID: CA+TgmoaajxCvcuJBQBx0+Kvz4bq=cuKwxggLi-v+3+NPKgwzGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
>> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
>> > [ new patch ]
>>
>> I went over this again today with a view to getting it committed, but
>> discovered some compiler warnings that look like this:
>>
>> warning: cast to pointer from integer of different size
>>
>> The problem seems to be that the binary heap implementation wants the
>> key and value to be a void *, but the way it's being used in
>> nodeMergeAppend.c the value is actually an int. I don't think we want
>> to commit a binary heap implementation which has an impedance mismatch
>> of this type with its only client. The obvious solution seems to be
>> to make the key and value each a Datum, and then clients can use
>> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
>> type they happen to have. I'm trying that approach now and will post
>> an updated patch based on that approach if it seems to work OK and
>> nobody suggests something better between now and then.
>
> Sounds like a good plan, one other alternative would have been declaring
> the parameters a size_t (and thus explicitly always big enough to store
> a pointer, but also valid to store inline data) instead of using a
> void*. But given there is DatumGetPointer/PointerGetDatum et al, using
> the postgres-y datatype seems fine to me.
>
> In the LCR case those really are pointers, so preserving that case is
> important to me ;), but your proposal does that, so ...

On further reflection, I'm wondering if it wouldn't be a smarter idea
to just get rid of the binaryheap_node concept altogether. Instead,
when you allocate a binary heap, you specify an "element size" as well
as a capacity. Then, the binary heap can truly treat the elements as
an opaque array of bytes rather than a struct of any particular type.
Of course, the disadvantage of this approach is that it's likely to be
slightly slower, because we'll need to do a multiplication by a
variable rather than simple bit shift. But the extra flexibility
might be worth it, because for example for merge append this is kind
of inefficient from a storage perspective. We only really need N+1
pointers, but we've got storage for 2N. With the above change plus a
user-specified third argument for the comparator function (passed as
yet another argument to binaryheap_allocate) we could get around that
while preserving the option for LCR or any other client code to do as
it sees fit.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2012-11-20 20:19:38 Re: [PATCH] binary heap implementation
Previous Message Alexander Korotkov 2012-11-20 20:01:51 Re: WIP: index support for regexp search