Re: GiST kNN search queue (Re: KNN-GiST with recheck)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GiST kNN search queue (Re: KNN-GiST with recheck)
Date: 2014-12-20 20:50:09
Message-ID: CAM3SWZQOBog8FiWA5WOYyATzm5y0ac8bj077F0vGqg__1HOMYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 17, 2014 at 6:07 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> How about adding a src/backend/lib/README for that, per attached?

I took a quick look at this. Some observations:

* I like the idea of adding a .../lib README. However, the README
fails to note that ilist.c implements an *integrated* list, unlike the
much more prevalent cell-based List structure. It should note that,
since that's the whole point of ilist.c.

* pairingheap_remove() is technically dead code. It still makes sense
that you'd have it in this patch, but I think there's an argument for
not including it at all on the theory that if you need to use it you
should use a different data structure. After all, the actual
(non-amortized) complexity of that operation is O(n) [1], and if
remove operations are infrequent as we might expect, that might be the
more important consideration. As long as you are including
pairingheap_remove(), though, why is the local variable "prev_ptr" a
pointer to a pointer to a pairingheap_node, rather than just a pointer
to a pairingheap_node?

* Similarly, the function-like macro pairingheap_reset() doesn't seem
to pull its weight. Why does it exist alongside pairingheap_free()?
I'm not seeing a need to re-use a heap like that.

* "Assert(!pairingheap_is_empty(heap))" appears in a few places.
You're basically asserting that a pointer isn't null, often
immediately before dereferencing the pointer. This seems to be of
questionable value.

* I think that the indentation of code could use some tweaking.

* More comments, please. In particular, comment the struct fields in
pairingheap_node. There are various blocks of code that could use at
least an additional terse comment, too.

* You talked about tuplesort.c integration. In order for that to
happen, I think the comparator logic should know less about min-heaps.
This should formally be a max-heap, with the ability to provide
customizations only encapsulated in the comparator (like inverting the
comparison logic to get a min-heap, or like custom NULLs first/last
behavior). So IMV this comment should be more generic/anticipatory:

+ /*
+ * For a max-heap, the comparator must return <0 iff a < b, 0 iff a == b,
+ * and >0 iff a > b. For a min-heap, the conditions are reversed.
+ */
+ typedef int (*pairingheap_comparator) (const pairingheap_node *a,
const pairingheap_node *b, void *arg);

I think the functions should be called pairing_max_heap* for this
reason, too. Although that isn't consistent with binaryheap.c, so I
guess this whole argument is a non-starter.

* We should just move rbtree.c to .../lib. We're not using CVS anymore
-- the history will be magically preserved.

Anyway, to get to the heart of the matter: in general, I think the
argument for the patch is sound. It's not a stellar improvement, but
it's worthwhile. That's all I have for now...

[1] https://www.cise.ufl.edu/~sahni/dsaac/enrich/c13/pairing.htm
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-12-20 21:14:25 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Jim Nasby 2014-12-20 20:13:37 Re: hash_create API changes (was Re: speedup tidbitmap patch: hash BlockNumber)