Re: knngist - 0.8

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2011-02-18 06:07:54
Message-ID: 9909.1298009274@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> I've applied all of this, and written documentation for all of it,
> Thank you a lot
>> except for the contrib/btree_gist additions which still need to be
>> redone for the revised API (and then documented!). My patience ran out

> Done, btree_gist is reworked for a new API.

I did a quick look at this patch. The major problem with it is of
course that it needs to be fixed for the recent extension-related
changes. I transposed the .sql.in changes into additions to
btree_gist--1.0.sql (attached), but haven't really sanity-checked
them beyond checking that the regression tests pass. The same mods
would need to be made in btree_gist--unpackaged--1.0.sql.

However, I feel that this is not ready to apply even given those fixes.
Problems yet to solve:

1. oid_dist() returns oid ... really? Oid is unsigned. I'd be inclined
to argue though that distance between Oids is a meaningless concept, so
you should remove this not just mess with the result type. Anybody who
actually wants to form a distance between Oids should have to cast them
to an arithmetic type first. Let the user figure out how wraparound
cases should be handled.

2. Beyond that, none of the distance routines have given any thought to
avoiding overflow. For instance, dist_int2 had better return something
wider than int2, and so on up. It looks to me like the internal gist
distance functions also suffer overflow risks, in that they tend to form
the difference first (in the source datatype) and only afterwards cast
to float8.

3. I was surprised that there wasn't a distance implementation for
numeric. I suppose that this might be difficult to do without risking
overflow in conversion to float8, though.

4. I didn't much care for changing the result type of gbt_num_consistent
from bool to float8; that's just messy, and I don't see any compensating
advantage. I suggest you leave gbt_num_consistent and its callers
alone, and add a separate gbt_num_distance routine that only handles the
KNNDistance case.

There might be more issues, I haven't read the patch in detail.
But anyway, I'm going to set it to Waiting on Author. I think it
needs at least a day or so's work, and I can't put in that kind of
time on it now.

regards, tom lane

Attachment Content-Type Size
btree_gist.sql.patch text/x-patch 12.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gan Jiadong 2011-02-18 06:42:02 Re: About the performance of startup after dropping many tables
Previous Message Itagaki Takahiro 2011-02-18 05:35:05 Re: CommitFest 2011-01 as of 2011-02-04