Re: [PATCH] binary heap implementation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] binary heap implementation
Date: 2012-11-20 23:01:10
Message-ID: 17109.1353452470@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Here's a revised patch that takes the approach of just changing void *
> to Datum; it includes a bunch of other cleanups that I felt were
> worthwhile.

The comment in binaryheap.h says

* For a max-heap, the comparator must return -1 iff a < b, 0 iff a == b,
* and +1 iff a > b. For a min-heap, the conditions are reversed.

Is it actually required that the output be exactly these three values,
or is the specification really <0, 0, >0 ? If the latter, it should
say that, rather than overdefine the implementation of comparison.

Also, wouldn't it be reasonable to const-ify the comparator function, ie

typedef int (*binaryheap_comparator) (const binaryheap_node *a, const binaryheap_node *b);

* nodes the first of a list of "space" nodes

s/list/array/, no? Also, it's standard to mark this sort of hack with
/* VARIABLE LENGTH ARRAY */, or even better use FLEXIBLE_ARRAY_MEMBER
(which would require fixing the space calculation in
binaryheap_allocate, not that that would be a bad thing anyway).

left_offset and friends are defined as taking size_t and returning int,
which is broken on machines where size_t is wider than int, or even on
machines where they're the same width since int is signed. Personally
I'd replace pretty much every single usage of size_t in this module with
int, as I am unconvinced either that we need 8-byte indexes or that the
logic is correct if it tries to form index -1 (as would happen for
example with binaryheap_build applied to an empty heap).

The Assert() in binaryheap_add_unordered fills me with no confidence.
If we're not going to support expansion of the array (which might be a
better idea anyway) then this needs to be an actual run-time check, not
something that won't be there in production.

What I think *would* be worth asserting for is whether the heap is
correctly ordered or not --- that is, I think you should add a flag that
is cleared by binaryheap_add_unordered and set by binaryheap_build, and
then the functions that presume correct ordering should assert it's set.
An Assert in binaryheap_replace_first that the heap is not empty seems
appropriate as well.

This in binaryheap_replace_first is simply bizarre:

if (key)
heap->nodes[0].key = key;
if (val)
heap->nodes[0].value = val;

Why are these assignments conditional? If they're sane at all, they
should not be testing the Datums directly in any case, but the result of
DatumGetSomething.

static int32
! heap_compare_slots(binaryheap_node *a, binaryheap_node *b)
{
+ MergeAppendState *node = (MergeAppendState *) a->value;

This should use DatumGetPointer(a->value), and the function result
should be int not int32 to comport with the typedef for it.

While I'm thinking about it, why are the fields of a binaryheap_node
called "key" and "value"? That implies a semantics not actually used
here. Maybe "value1" and "value2" instead?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2012-11-20 23:39:31 Re: StrategyGetBuffer questions
Previous Message Jeff Janes 2012-11-20 22:50:46 Re: StrategyGetBuffer questions