Re: [PATCH] binary heap implementation

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] binary heap implementation
Date: 2012-11-20 20:19:38
Message-ID: 20121120201938.GB26019@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2012-11-20 15:06:58 -0500, Robert Haas wrote:
> On Tue, Nov 20, 2012 at 2:29 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2012-11-20 14:03:12 -0500, Robert Haas wrote:
> >> On Thu, Nov 15, 2012 at 8:56 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> >> > [ new patch ]
> >>
> >> I went over this again today with a view to getting it committed, but
> >> discovered some compiler warnings that look like this:
> >>
> >> warning: cast to pointer from integer of different size
> >>
> >> The problem seems to be that the binary heap implementation wants the
> >> key and value to be a void *, but the way it's being used in
> >> nodeMergeAppend.c the value is actually an int. I don't think we want
> >> to commit a binary heap implementation which has an impedance mismatch
> >> of this type with its only client. The obvious solution seems to be
> >> to make the key and value each a Datum, and then clients can use
> >> WhateverGetDatum and DatumGetWhatever to pass whatever built-in data
> >> type they happen to have. I'm trying that approach now and will post
> >> an updated patch based on that approach if it seems to work OK and
> >> nobody suggests something better between now and then.
> >
> > Sounds like a good plan, one other alternative would have been declaring
> > the parameters a size_t (and thus explicitly always big enough to store
> > a pointer, but also valid to store inline data) instead of using a
> > void*. But given there is DatumGetPointer/PointerGetDatum et al, using
> > the postgres-y datatype seems fine to me.
> >
> > In the LCR case those really are pointers, so preserving that case is
> > important to me ;), but your proposal does that, so ...

> On further reflection, I'm wondering if it wouldn't be a smarter idea
> to just get rid of the binaryheap_node concept altogether. Instead,
> when you allocate a binary heap, you specify an "element size" as well
> as a capacity. Then, the binary heap can truly treat the elements as
> an opaque array of bytes rather than a struct of any particular type.
> Of course, the disadvantage of this approach is that it's likely to be
> slightly slower, because we'll need to do a multiplication by a
> variable rather than simple bit shift. But the extra flexibility
> might be worth it, because for example for merge append this is kind
> of inefficient from a storage perspective. We only really need N+1
> pointers, but we've got storage for 2N. With the above change plus a
> user-specified third argument for the comparator function (passed as
> yet another argument to binaryheap_allocate) we could get around that
> while preserving the option for LCR or any other client code to do as
> it sees fit.

> Thoughts?

I don't have a fundamental problem with it, but I don't think it's worth
the cost. If you have variable sized (from the point of the compiler)
values you either have more complex offset computations to the
individual elements or one more indirection due to a pointer array.

Do you feel strongly about it? If so, go ahead, otherwise I would prefer
the current way (with parameters changed to be Datum's of course).

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2012-11-20 20:36:14 Re: foreign key locks
Previous Message Robert Haas 2012-11-20 20:06:58 Re: [PATCH] binary heap implementation