Re: knngist - 0.8

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: knngist - 0.8
Date: 2010-07-22 15:30:44
Message-ID: 4C486424.2010808@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz

Changes:
* pg_amop has boolean column amoporder which points to clause's usage of
operator
* Syntax of CREATE OPERATOR CLASS
CREATE OPERATOR CLASS ...
[ORDER] OPERATOR ....
ORDER OPERATOR is marked with pg_amop.amoporder = true
* Bool-returning operator could not be used as ORDER OPERATOR, but type of
returning value still should have a default Btree operator class.
* Added flag SK_ORDER to SkanKey flag to indicate order operation, so access
methods (only GiST right now) should check this flag (in previous versions of
patch GiST checked returned value of operator's function)

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-09-07 00:42:53
Message-ID: AANLkTim1+Vfgy-tuae=xQYjp6tpKmvZvqAFob1AsSF9H@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/22 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
> http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
> http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
> http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
> http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz
>
> Changes:
> * pg_amop has boolean column amoporder which points to clause's usage of
>  operator
> * Syntax of CREATE OPERATOR CLASS
>     CREATE OPERATOR CLASS ...
>         [ORDER] OPERATOR ....
>  ORDER OPERATOR is marked with pg_amop.amoporder = true
> * Bool-returning operator could not be used as ORDER OPERATOR, but type of
>  returning value still should have a default Btree operator class.
> * Added flag SK_ORDER to SkanKey flag to indicate order operation, so access
>  methods (only GiST right now) should check this flag (in previous versions
> of
>  patch GiST checked returned value of operator's function)

AFAICS, these patches include no documentation. That's pretty much a
fatal flaw for a feature of this magnitude. At an absolute minimum,
you need to update the system catalog documentation and the
documentation on CREATE / ALTER OPERATOR CLASS. There might be some
other places that need to be touched, also.

+ if (opform->oprresult == BOOLOID)
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("index ordering
operators must not return boolean")));

My first thought was that this code was there to prevent people from
doing the wrong thing by accident. But I have a niggling feeling that
you're actually relying on this for the correctness of the system. I
hope I'm wrong, because I don't think that would be a very good idea.

The GIST code code use more comments; and perhaps the names of some of
the functions and structures could be chosen to be more descriptive.
I think that what used to be called GISTSearchStack has apparently
been replaced with DataPointer; it's not obvious to me that it's good
to change the name, but if it is I don't think DataPointer is a good
choice. gistindex_keytest has been replaced (sort of) by
processIndexTuple, which again seems more generic than what it
replaced.

Minor nit: the word "shoould" is mis-spelled.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-09-07 14:54:14
Message-ID: 4C865216.1090305@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz

New version, synced with CVS HEAD
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz

> AFAICS, these patches include no documentation. That's pretty much a
> fatal flaw for a feature of this magnitude. At an absolute minimum,
> you need to update the system catalog documentation and the
> documentation on CREATE / ALTER OPERATOR CLASS. There might be some
> other places that need to be touched, also.
Oleg promised to do that

> + if (opform->oprresult == BOOLOID)
> + ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("index ordering
> operators must not return boolean")));
>
> My first thought was that this code was there to prevent people from
> doing the wrong thing by accident. But I have a niggling feeling that
> you're actually relying on this for the correctness of the system. I
> hope I'm wrong, because I don't think that would be a very good idea.

This play is around do we really want to have support of boolean-distance in
GiST? I think no, because it's a strange idea to measure distance in true/false
measurement units. I can't imagine such real-life distance definition and never
heard about that.

Next, pg_amop_opr_fam_index requires uniqueness of operation in operation family
and a lot of places in planner believes in that. Suppose, changing that requires
a lot of work which has the single aim to support boolean distance in ORDER BY
clause.

> The GIST code code use more comments; and perhaps the names of some of
> the functions and structures could be chosen to be more descriptive.
> I think that what used to be called GISTSearchStack has apparently
> been replaced with DataPointer; it's not obvious to me that it's good
> to change the name, but if it is I don't think DataPointer is a good
GISTSearchStack is replaced by RBTree (GISTScanOpaqueData->stack), tree's nodes
contain a StackElem struct which represents list of pointers at the same
distance. Each pointer could be a pointer to the inner index's page or to the
heap's tuple and this struct is a DataPointer.

Note, list of DataPointer in StackElem struct is organized by non-obvious way:
we keep pointer to the head of list and pointer to the middle of list. New
pointer-to-heap is inserted in the beginning of list, pointers-to-index-page -
in the middle. That's done because we would like to:
1) pop pointers-to-heap as fast as possible, before any pointers-to-index-page
2) pop pointers-to-index-page to deep page (which is closer to leaf pages)
first. That's good for KNN performance and emulates classical first-depth
search in ordinary search.

> choice. gistindex_keytest has been replaced (sort of) by
> processIndexTuple, which again seems more generic than what it
> replaced.
Renamed, comments are improved

> Minor nit: the word "shoould" is mis-spelled.
fixed

BTW, now consistentFn is able to "manage" tree traversal - even for for ordinary
search, GiST will choose child page with minimal distance.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-09-10 01:08:57
Message-ID: AANLkTi=Y2D=v5BaOhcYcXv68XC5EyTMLsihH+_mdEYg2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/7 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> AFAICS, these patches include no documentation.  That's pretty much a
>> fatal flaw for a feature of this magnitude.  At an absolute minimum,
>> you need to update the system catalog documentation and the
>> documentation on CREATE / ALTER OPERATOR CLASS.  There might be some
>> other places that need to be touched, also.
>
> Oleg promised to do that

Fair enough, but where is it? It's kind of difficult even to review
this without some documentation, and you wrote this patch in 2009.
And as Tom pointed out the last time we reviewed this, lack of
documentation is sufficient grounds for rejection in and of itself.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg00820.php

>> +               if (opform->oprresult == BOOLOID)
>> +                       ereport(ERROR,
>> +
>> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>> +                                        errmsg("index ordering
>> operators must not return boolean")));
>>
>> My first thought was that this code was there to prevent people from
>> doing the wrong thing by accident.  But I have a niggling feeling that
>> you're actually relying on this for the correctness of the system.  I
>> hope I'm wrong, because I don't think that would be a very good idea.
>
> This play is around do we really want to have support of boolean-distance in
> GiST? I think no, because it's a strange idea to measure distance in
> true/false measurement units. I can't imagine such real-life distance
> definition and never heard about that.
>
> Next, pg_amop_opr_fam_index requires uniqueness of operation in operation
> family and a lot of places in planner believes in that. Suppose, changing
> that requires a lot of work which has the single aim to support boolean
> distance in ORDER BY clause.

Tom and I both expressed the opinion 7 months ago that we don't think
this design is acceptable.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01063.php

I'm not sure why you expect it to be acceptable now if it wasn't
acceptable then. I'm sort of surprised that you haven't taken the
intervening 7 months to rework it along the lines discussed. We had a
very long and detailed discussion about how to make it work, and
committed a huge, invasive patch that Tom's going to be grumpy about
for years to provide the infrastructure for 5-key syscaches --
specifically so you'd be able change pg_amop_fam_strat_index to use a
5-part key. I would actually think that would be a fairly mechanical
transformation.

It seems to me that there is not very much point in reviewing this
further until you incorporate the feedback that was given last time,
and specifically the two major points discussed above.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-09-13 13:22:24
Message-ID: 4C8E2590.6040802@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

http://www.sigaev.ru/misc/builtin_knngist_core-0.9.gz
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz
http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz

>>> + if (opform->oprresult == BOOLOID)
>>> + ereport(ERROR,
>>> +
>>> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
>>> + errmsg("index ordering
>>> operators must not return boolean")));

New version allows to use boolean distance, i.e. the same operator could be used
both in WHERE and ORDER BY clauses. Now operator could present in operator
family twice, but with different StrategyNumber: as ordinary search operator (in
WHERE clause) and as an order operator (ORDER BY).
So, list of changes:
1) "pg_amop_opr_fam_index" UNIQUE, btree (amopopr, amopfamily) =>
"pg_amop_opr_fam_index" UNIQUE, btree (amopopr, amopfamily, amoporder)
2) op_in_opfamily, get_op_opfamily_strategy, get_op_opfamily_properties accept
one more argument, bool orderop, to point which kind of operator we need to
find
3) bool OpExpr.orderop. This field is needed to correctly set SK_ORDER flags
for index scan (ExecIndexBuildScanKeys function). SK_ORDER flag points type
of tree traversal algorithm to GiST core.

Introducing two fields, per Tom's suggestion, doesn't resolve following issues:
- we still need to distinguish WHERE and ORDER BY operator in
ExecIndexBuildScanKey, as it done in new version of patch
- When we execute AMOPOPID cache request we still need to check pg_amop.amorder
or introduce two new indexes (amopopr, amopfamily, amopcanwhere) and
(amopopr, amopfamily, amopcanorder) and the same changes in lsyscache's
functions
- If one operation could be used for both usage type then it will be good to
distinguish them in consistent method. New version requires to use
different strategy number for each usage type.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-09-15 03:22:58
Message-ID: AANLkTimfqa-RUn9w-u=1C=t_8kP3frVJ7WdmhBXqnRTV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> [updated patch]

I realize I'm repeating myself, but... this patch needs
documentation. It's not optional.

It seems to me that you need to do something about AMOPSTRATEGY.
Hence the five-key syscaches patches I wrote that is sitting in queue.
And also LookupOpClassInfo(). Am I confused?

Is there a reason we're using a boolean 'amoporder' distinguish
ordering operators from regular old operators? Tom and I were talking
about some kind of an integer field, in case we need more categories
later. You know, like 'amopcategory' or something like that. It
could be just one byte, perhaps, but one bit seems unduly narrow. You
could define constants OPCAT_QUAL and OPCAT_ORDER, or something like
that.

This error message seems like it ought to be complaining about the
access method, not the index:

+ if (!pg_am->amcanorderbyop)
+ ereport(ERROR,
+
(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+ errmsg("index doesn't support
ordering operations")));

I sort of understand the reasons behind the following restriction, but
is this truly the best we can do?

+ /*
+ * Currently, only descending and nulls last ordering
is supported
+ */
+ if (!(pathkey->pk_strategy == BTLessStrategyNumber &&
pathkey->pk_nulls_first == false))
+ return;

What happens if we have an index strategy that can efficiently return
the points most distant from the bright center of the universe?

By the way:

gistproc.c: In function ‘gist_point_consistent’:
gistproc.c:1023: error: ‘distance’ may be used uninitialized in this function

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: knngist patch preliminary review (2010-09 commitfest)
Date: 2010-09-22 07:29:55
Message-ID: 8762xy8ivg.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I haven't quite finished reviewing this, but here is the position so
far. I'm going to continue with some performance and other tests, but
I'm posting this for discussion in the mean time.

1) General patch issues

- applies cleanly and passes regression

- one small warning with ecpg parser build, which I assume is just due
to the patch having touched the main parser in a way the ecpg build
doesn't expect (presumably this will be cleaned up at some later stage)

- *no* documentation (see below)

2) Design questions

Reading through the previous discussion on -hackers, I didn't get the
impression that there was agreement on the question of how to
represent the ordering operators in the catalog. This patch takes the
approach of adding a boolean column pg_amop.amoporder and doing
changes that require touching unrelated code in a fair number of places.

In addition there are the points Robert Haas expressed in his earlier
response.

This approach doesn't feel right to me, but I don't know enough about
it (especially any possible other features that might want to do
something similar) to express a strong opinion on it. So that point is
open for discussion.

3) Documentation

There are problems with the GiST docs that go much deeper than this
patch alone; the manual sections on writing GiST support functions are
wholly inadequate, and many features that have been around for a long
time, such as secondary split in GiST picksplit functions, are essentially
undocumented other than as (uncommented!) example code in contrib modules.

This patch would, as it stands, make that issue worse.

My opinion is that the gist-implementation section of the docs needs
to be substantially expanded so that it explains, for each support
function, exactly what the function is expected to do.

If it would help, I'm prepared to put some time towards actually
writing something up for this.

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: knngist patch preliminary review (2010-09 commitfest)
Date: 2010-09-22 14:30:45
Message-ID: 24889.1285165845@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> My opinion is that the gist-implementation section of the docs needs
> to be substantially expanded so that it explains, for each support
> function, exactly what the function is expected to do.

Yes, the GIST (and GIN) parts of the docs are really pretty bad.

> If it would help, I'm prepared to put some time towards actually
> writing something up for this.

That would be outstanding.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers(at)postgresql(dot)org, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: knngist patch preliminary review (2010-09 commitfest)
Date: 2010-09-22 15:12:31
Message-ID: AANLkTi=tbQCCPTDJfrLLOH9cDqnz0U6Yuwgz78nOhntN@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 22, 2010 at 10:30 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If it would help, I'm prepared to put some time towards actually
>> writing something up for this.
>
> That would be outstanding.

+1. Or really, plus several.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-05 21:05:43
Message-ID: Pine.LNX.4.64.1010060018200.466@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

Are you sure postgres doesn't have limitation at all ? There are a lot ot them.
Of course, there are limitation and limitation and we should avoid limitations,
which will require incompatible changes in future in user's code, or which
prevent future improvements (getting rid limitation). We suggested
implementation of k-nn search, which has limitations, but in my opinion,
they are rather small and doesn't prevent in future to
write a descent patch, based on your five-key syscaches patches,
which will touch a lot of places in the pg source.

Who need boolean distance ? Ok, you insisted, we now support it.
Teodor wrote arguments (http://archives.postgresql.org/message-id/4C8E2590.6040802@sigaev.ru)
why two fields solution doesn't really helped.

You want "the points most distant from the bright center of the universe?",
sorry, for now. I think, this is a limitation we can live with for now.
It's k-nearest neighbourhood search, and not k-furthest outliers search.

You want documentation for review, I don't believe a reviewer can't review
without users documentation like CREATE/ALTER OPERATOR CLASS, etc.
Andrew Gierth was true,
that GiST documentation needed to be rewritten and he suggested to do that if
I understand him. I'd love to help, but I don't have any message from him.
If he changed his mind, I'll describe these changes.

We're not full-time pg-employers and it's not easy for us to follow
new cleaner-way. I think there is a risk, that there are will be lesser big
features introduced by non-pg employers in the future and eventually,
pg will be open-source database developed by pg-employers with some
amount cosmetic changes and fixes.

I suggest to find a sensible consensus on implementation, taking into
account Pareto Rule, and leave place for future improvement.

Oleg

On Tue, 14 Sep 2010, Robert Haas wrote:

> 2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> [updated patch]
>
> I realize I'm repeating myself, but... this patch needs
> documentation. It's not optional.
>
> It seems to me that you need to do something about AMOPSTRATEGY.
> Hence the five-key syscaches patches I wrote that is sitting in queue.
> And also LookupOpClassInfo(). Am I confused?
>
> Is there a reason we're using a boolean 'amoporder' distinguish
> ordering operators from regular old operators? Tom and I were talking
> about some kind of an integer field, in case we need more categories
> later. You know, like 'amopcategory' or something like that. It
> could be just one byte, perhaps, but one bit seems unduly narrow. You
> could define constants OPCAT_QUAL and OPCAT_ORDER, or something like
> that.
>
> This error message seems like it ought to be complaining about the
> access method, not the index:
>
> + if (!pg_am->amcanorderbyop)
> + ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("index doesn't support
> ordering operations")));
>
> I sort of understand the reasons behind the following restriction, but
> is this truly the best we can do?
>
> + /*
> + * Currently, only descending and nulls last ordering
> is supported
> + */
> + if (!(pathkey->pk_strategy == BTLessStrategyNumber &&
> pathkey->pk_nulls_first == false))
> + return;
>
> What happens if we have an index strategy that can efficiently return
> the points most distant from the bright center of the universe?
>
> By the way:
>
> gistproc.c: In function ?gist_point_consistent?:
> gistproc.c:1023: error: ?distance? may be used uninitialized in this function
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-05 22:38:28
Message-ID: AANLkTikT5fDNXAgZqDi2mun0ozzAqfHXzQq4UdztEzpL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 5, 2010 at 5:05 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> Are you sure postgres doesn't have limitation at all ? There are a lot ot
> them. Of course, there are limitation and limitation and we should avoid
> limitations, which will require incompatible changes in future in user's
> code, or which prevent future improvements (getting rid limitation). We
> suggested implementation of k-nn search, which has limitations, but in my
> opinion, they are rather small and doesn't prevent in future to
> write a descent patch, based on your five-key syscaches patches,
> which will touch a lot of places in the pg source.
>
> Who need boolean distance ? Ok, you insisted,  we now support it. Teodor
> wrote arguments
> (http://archives.postgresql.org/message-id/4C8E2590.6040802@sigaev.ru)
> why two fields solution doesn't really helped.
>
> You want "the points most distant from the bright center of the universe?",
> sorry, for now. I think, this is a limitation we can live with for now. It's
> k-nearest neighbourhood search, and not k-furthest outliers search.
>
> You want documentation for review, I don't believe a reviewer can't review
> without users documentation like CREATE/ALTER OPERATOR CLASS, etc. Andrew
> Gierth was true,
> that GiST documentation needed to be rewritten and he suggested to do that
> if I understand him. I'd love to help, but I don't have any message from
> him.
> If he changed his mind, I'll describe these changes.
>
> We're not full-time pg-employers and it's not easy for us to follow new
> cleaner-way. I think there is a risk, that  there are will be lesser big
> features introduced by non-pg employers in the future and eventually, pg
> will be open-source database developed by pg-employers with some amount
> cosmetic changes and fixes.
>
> I suggest to find a sensible consensus on implementation, taking into
> account Pareto Rule, and leave place for future improvement.

I am sorry my review pissed you off, as it seems to have done.
Looking back on it, I realize now that it was phrased more harshly
than it needed to be. That having been said, it seems to me that we
really are at an impasse and I am not sure what to propose as a way
forward. Tom and I laid out a technical design back in January and I
still think it's a good one, even though I know you think it's silly.
We may just have to agree to disagree on this point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 15:39:40
Message-ID: 4CB875BC.7090406@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> forward. Tom and I laid out a technical design back in January and I
> still think it's a good one, even though I know you think it's silly.
> We may just have to agree to disagree on this point.
As I remember, there were several suggested designs:
1) 5-th boolean column (amopfamily, amoplefttype, amoprighttype, amopstrategy,
amoporder) to point kind of operator (search or order)
+ saves one record for operator in pg_amop
- operator could not be used in both roles
- increase number of arguments for syscache machinery
2) 5-th combined column, which contains some kind of flag for each role
+ saves one record for operator in pg_amop
+ operator could be used in both roles
- strategy number for operator is the same for both roles, it's unacceptable
because GiST's consistentFn will not have information about role. GiST
itself could distinguish them by invented SK_ORDER flag. So, this
requires to introduce one more argument for consistentFn, while it
already has 5 arguments.
- increase number of arguments for syscache machinery
3) 3-rd boolean column (amopopr, amopfamily, amoporder)
- could be two records per operator
+ operator could be used in both roles
+ strategy number could be different for different roles

All three options require to add flag of role
op_in_opfamily/get_op_opfamily_strategy/get_op_opfamily_properties to check
applicability of operation in current code path. First two options could do not
change of interface of op_in_opfamily/get_op_opfamily_strategy but it will be
needed to check actual role of operator later.

Basing on this comparison, I think, that 2) is worse and 3) is better.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 16:02:33
Message-ID: 21784.1287158553@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> As I remember, there were several suggested designs:
> 1) 5-th boolean column (amopfamily, amoplefttype, amoprighttype, amopstrategy,
> amoporder) to point kind of operator (search or order)
> + saves one record for operator in pg_amop
> - operator could not be used in both roles
> - increase number of arguments for syscache machinery
> 2) 5-th combined column, which contains some kind of flag for each role
> + saves one record for operator in pg_amop
> + operator could be used in both roles
> - strategy number for operator is the same for both roles, it's unacceptable
> because GiST's consistentFn will not have information about role. GiST
> itself could distinguish them by invented SK_ORDER flag. So, this
> requires to introduce one more argument for consistentFn, while it
> already has 5 arguments.
> - increase number of arguments for syscache machinery
> 3) 3-rd boolean column (amopopr, amopfamily, amoporder)
> - could be two records per operator
> + operator could be used in both roles
> + strategy number could be different for different roles

How can #3 work at all? It's ignoring a couple of critical index
columns. In particular, I believe the sticking point here is this
unique index:

"pg_amop_fam_strat_index" UNIQUE, btree (amopfamily, amoplefttype, amoprighttype, amopstrategy)

#3 doesn't explain what to do about this index. If we extend it to
five columns, as we'd logically have to do to preserve uniqueness,
then the associated syscache has to have five key columns as well,
and now we're at solution #1.

I'm not terribly thrilled with using a boolean here in any case.
Now that we have two "roles" an operator might play in an opclass,
who's to say there might not be more roles someday? We should use
a column type that will support more than two roles without basic
rejiggering.

BTW, have we discussed the idea of embedding the role in the strategy
number? For example, require regular operators to have strategy
numbers 1-9999, while ordering operators have numbers 10000-19999,
leaving room for a couple more roles before we have to rethink the
assignment or widen amopstrategy to an int. That's a bit ugly but
might be better than adding a separate role column.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 21:35:17
Message-ID: AANLkTi=Xw6PDg-rp_O+yksQQoSjTjRLHrLzoDm=pegRy@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 12:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> BTW, have we discussed the idea of embedding the role in the strategy
> number?  For example, require regular operators to have strategy
> numbers 1-9999, while ordering operators have numbers 10000-19999,
> leaving room for a couple more roles before we have to rethink the
> assignment or widen amopstrategy to an int.  That's a bit ugly but
> might be better than adding a separate role column.

Yeah, we talked about it.

http://archives.postgresql.org/pgsql-hackers/2010-02/msg01075.php
http://archives.postgresql.org/pgsql-hackers/2010-02/msg01079.php

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 22:24:19
Message-ID: AANLkTimEp+_mSYrhAJwgTYk43in+LX-egusPoSOZfH6c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 5:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 15, 2010 at 12:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, have we discussed the idea of embedding the role in the strategy
>> number?  For example, require regular operators to have strategy
>> numbers 1-9999, while ordering operators have numbers 10000-19999,
>> leaving room for a couple more roles before we have to rethink the
>> assignment or widen amopstrategy to an int.  That's a bit ugly but
>> might be better than adding a separate role column.
>
> Yeah, we talked about it.
>
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01075.php
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg01079.php

Having said that, I'm not wild on having 5-key syscaches, even though
the patch is ready to go (modulo whatever rebasing is needed, which
probably isn't much). So if you can figure out a way to avoid it,
let's do that.

I still feel vaguely uneasy about the fact that the proposed patch
can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
more last night when I read Peter's patch to add collation support.
We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
four new categories of operator strategies rather than one, but
there's no way that's going to work for collations. Is there some
other way to approach this problem? Can we leave pg_amop as it is,
and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
collation, whatever...) to the index via some sort of side channel?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 23:10:06
Message-ID: 21255.1287184206@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I still feel vaguely uneasy about the fact that the proposed patch
> can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
> more last night when I read Peter's patch to add collation support.

Good point.

> We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
> four new categories of operator strategies rather than one, but
> there's no way that's going to work for collations. Is there some
> other way to approach this problem? Can we leave pg_amop as it is,
> and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
> collation, whatever...) to the index via some sort of side channel?

Well, we cannot avoid changing pg_amop, or at least changing its
interpretation, because the current scheme simply can't represent
indexable operators that are used for anything except simple boolean
WHERE tests. I agree though that we do *not* want pg_amop involved
in the details of exactly what sort ordering is referenced by a sortable
operator. Somehow that needs to be passed in a side channel.

Maybe we should think in terms of a side channel for Peter's patch
as well. I share your feeling that trying to propagate collation
the same way we now propagate typmod is a recipe for serious pain.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-15 23:49:01
Message-ID: AANLkTinc4-eO0b24-68ENG2P9tuGOsNiw3PhbEKOjYtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 7:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I still feel vaguely uneasy about the fact that the proposed patch
>> can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
>> more last night when I read Peter's patch to add collation support.
>
> Good point.
>
>> We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
>> four new categories of operator strategies rather than one, but
>> there's no way that's going to work for collations.  Is there some
>> other way to approach this problem?  Can we leave pg_amop as it is,
>> and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
>> collation, whatever...) to the index via some sort of side channel?
>
> Well, we cannot avoid changing pg_amop, or at least changing its
> interpretation, because the current scheme simply can't represent
> indexable operators that are used for anything except simple boolean
> WHERE tests.

What exactly do you mean by that?

It has always seemed to me that the operator class mechanism is a
complicated abstraction mechanism that actually adds only a very small
amount of abstraction. Instead of referring to operators by name or
OID we refer to them by a number that maps onto a name or OID. That
allows the user to change the name or OID without breaking anything,
but that's about it. Perhaps we should think of pg_amop not so much
as a way to tell the AM what to do, but just a way to tell it what
operator is logically involved without relying on the name or OID.

> I agree though that we do *not* want pg_amop involved
> in the details of exactly what sort ordering is referenced by a sortable
> operator.  Somehow that needs to be passed in a side channel.

Yeah.

> Maybe we should think in terms of a side channel for Peter's patch
> as well.  I share your feeling that trying to propagate collation
> the same way we now propagate typmod is a recipe for serious pain.

I'm not sure what you're thinking of here. I think we can have the
idea of a FullySpecifiedType = <typid, typmod, collationoid>, but
that's not so much a side channel as an abstraction layer. It
absolutely won't work to stuff the collations in a global variable or
something like that, if that's what you're imagining.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 00:10:42
Message-ID: 24252.1287187842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Oct 15, 2010 at 7:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, we cannot avoid changing pg_amop, or at least changing its
>> interpretation, because the current scheme simply can't represent
>> indexable operators that are used for anything except simple boolean
>> WHERE tests.

> What exactly do you mean by that?

> It has always seemed to me that the operator class mechanism is a
> complicated abstraction mechanism that actually adds only a very small
> amount of abstraction. Instead of referring to operators by name or
> OID we refer to them by a number that maps onto a name or OID.

Well, the amount of abstraction might be minimal, but the point is to be
able to understand which operators are related to an index and exactly
what the relationship is. There might be a better way to represent
"this operator acts as <= for btree int4 indexes" than "this operator
has strategy number 2 for btree int4 indexes", but it doesn't seem to me
that there's a lot of win available there. C code certainly finds it
more convenient to work with numbers than names, so I'm not excited
about replacing the strategy numbers with strategy names, if that's what
you're thinking.

> Perhaps we should think of pg_amop not so much
> as a way to tell the AM what to do, but just a way to tell it what
> operator is logically involved without relying on the name or OID.

I already think of it that way ...

>> Maybe we should think in terms of a side channel for Peter's patch
>> as well. I share your feeling that trying to propagate collation
>> the same way we now propagate typmod is a recipe for serious pain.

> I'm not sure what you're thinking of here.

I'm not either. I'm dissatisfied with the direction he's heading
because of the amount of code it's going to break, but I don't have a
better idea. It may well be impossible to have this feature without
breaking everything in sight. (And, if we are going to break everything
in sight, now would be a good time to think about changing typmod to
something more flexible than one int32.)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 00:45:26
Message-ID: AANLkTikzc=NpiPYNtpnXy3hOZ8208hUNhQdppyRJ7uXf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Perhaps we should think of pg_amop not so much
>> as a way to tell the AM what to do, but just a way to tell it what
>> operator is logically involved without relying on the name or OID.
>
> I already think of it that way ...

OK.

>>> Maybe we should think in terms of a side channel for Peter's patch
>>> as well.  I share your feeling that trying to propagate collation
>>> the same way we now propagate typmod is a recipe for serious pain.
>
>> I'm not sure what you're thinking of here.
>
> I'm not either.  I'm dissatisfied with the direction he's heading
> because of the amount of code it's going to break, but I don't have a
> better idea.  It may well be impossible to have this feature without
> breaking everything in sight.  (And, if we are going to break everything
> in sight, now would be a good time to think about changing typmod to
> something more flexible than one int32.)

I assume I don't need to tell you my vote on that.

But I'm also not sure how far this gets us with KNNGIST, where the
issue is not the typmods but the auxilliary information about the
context of the sort and/or whether this is a sort or qual.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 01:39:13
Message-ID: AANLkTi=p_n-R-Z1JSZdOV80E5NNyyNTN2HFtYDf=a_xL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 8:45 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Perhaps we should think of pg_amop not so much
>>> as a way to tell the AM what to do, but just a way to tell it what
>>> operator is logically involved without relying on the name or OID.
>>
>> I already think of it that way ...
>
> OK.

Thinking about it that way, perhaps we could add an integer column
amop_whats_it_good_for that gets used as a bit field. That wouldn't
require changing the index structure, although it might break some
other things.

The core KNNGIST patch is actually quite small, actually. Excluding a
lot of not-very-interesting changes to pg_amop.h, it's just over 300
lines of new code, about half of which is in indxpath.c. If we could
get this issue of how to structure the catalogs resolved, it seems to
me that we might be able to see our way to committing that part of
this work fairly quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 11:39:21
Message-ID: 20101016113921.GA12287@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 08:45:26PM -0400, Robert Haas wrote:
> But I'm also not sure how far this gets us with KNNGIST, where the
> issue is not the typmods but the auxilliary information about the
> context of the sort and/or whether this is a sort or qual.

ISTM there are two issues here. With Btree you have the issue where the
index is of a particular collation and at planning time you're given a
collation and either it's compatable or it's not.

With Gist you have the situation where index isn't of a specific
collation but it can be used in different ways to get different
collection as output. (I'm here thinking of that Gist can in some
circumstance be asked to return tuples in a certain order.)

Actually, NULLS FIRST/LAST, ASC/DESC are just variations on collations
and my original patch many years back for each locale made *four*
collation IDs, one for each of the combinations of the above. Not
terribly scalable, but I never did get the planning code to work. :)

What you're going for here I think is a sort of cross-collation
operators, or something, that tells the planner how to get the
particular collation out of the given index. So you have something
like:

Input: en_US, NULLS FIRST, ASC
Output: en_US, NULLS_LAST, DESC
-> Reverse index scan

Input: en_US, NULLS FIRST, ASC
Output: en_US, NULLS LAST, ASC
-> Index scan, add x IS NOT NULL test
-> Append output of x IS NULL.

Input: Gist
Output: KnnGist
-> Index scan with scan type 10000

Input: Gist
Output: Gist
-> Index scan with scan type 0

Obviously this doesn't solve the problem of being able to represent the
required collation in the first place, and if this is at all compatable
with what the SQL standard calls a collation. I just don't think you
can make hard and fast rules about how to make this work and that
perhaps we should be looking for a way to push that to the index
implementation code, with the default rule being: same collection yes,
different no.

Just some thoughts,

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 16:23:36
Message-ID: 40A281D3-4D80-4CFE-BC7D-F4428A6E46AD@cleverelephant.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> (And, if we are going to break everything
> in sight, now would be a good time to think about changing typmod to
> something more flexible than one int32.)

As someone who is jamming geometry type, spatial reference number and dimensionality into said 32bit typmod, let me say emphatically ... Amen!

P


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 17:17:48
Message-ID: 1287249468.5599.11.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On lör, 2010-10-16 at 09:23 -0700, Paul Ramsey wrote:
> >> (And, if we are going to break everything
> > in sight, now would be a good time to think about changing typmod to
> > something more flexible than one int32.)
>
> As someone who is jamming geometry type, spatial reference number and
> dimensionality into said 32bit typmod, let me say emphatically ...
> Amen!

So what kind of data structure would you like for a typmod?


From: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-16 22:13:30
Message-ID: AANLkTimcJ9SdKiXwd-J69DyGvDNOSMyxukJ-Ui8j6Zrm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 16, 2010 at 10:17 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On lör, 2010-10-16 at 09:23 -0700, Paul Ramsey wrote:
>> >>   (And, if we are going to break everything
>> > in sight, now would be a good time to think about changing typmod to
>> > something more flexible than one int32.)
>>
>> As someone who is jamming geometry type, spatial reference number and
>> dimensionality into said 32bit typmod, let me say emphatically ...
>> Amen!
>
> So what kind of data structure would you like for a typmod?

I'm a primitive enough beast that just having 64-bits would make me
happy. As a general matter though, a bytea?

P


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-17 01:25:34
Message-ID: AANLkTin5d9RVwaNdGuo45J8gWS9w+Q-G1CjsyjRLVmTW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 16, 2010 at 6:13 PM, Paul Ramsey <pramsey(at)cleverelephant(dot)ca> wrote:
> On Sat, Oct 16, 2010 at 10:17 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On lör, 2010-10-16 at 09:23 -0700, Paul Ramsey wrote:
>>> >>   (And, if we are going to break everything
>>> > in sight, now would be a good time to think about changing typmod to
>>> > something more flexible than one int32.)
>>>
>>> As someone who is jamming geometry type, spatial reference number and
>>> dimensionality into said 32bit typmod, let me say emphatically ...
>>> Amen!
>>
>> So what kind of data structure would you like for a typmod?
>
> I'm a primitive enough beast that just having 64-bits would make me
> happy. As a general matter though, a bytea?

Yeah. It strikes me that there are three main kinds of things people
might want to represent:

1. An integer. e.g. for a numeric, precision or scale; for a varchar,
length; for an array, number of dimensions.
2. An OID. e.g. for varchar or text, a collation OID.
3. Recursive structure. So you might have an array (which is
one-dimensional) containing strings (which are limited to 80
characters and collated in Klingon). You want to hold onto all of
those details somehow.

There might be use cases for even crazier things - like packing all
the field names and types for a record object in there... but maybe
that's too crazy to be workable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-17 01:54:57
Message-ID: AANLkTim9nACLt1yGbnB=Xyh6fA2DHvyY7pZ5dJd1hRNp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 9:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 15, 2010 at 8:45 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> Perhaps we should think of pg_amop not so much
>>>> as a way to tell the AM what to do, but just a way to tell it what
>>>> operator is logically involved without relying on the name or OID.
>>>
>>> I already think of it that way ...
>>
>> OK.
>
> Thinking about it that way, perhaps we could add an integer column
> amop_whats_it_good_for that gets used as a bit field.  That wouldn't
> require changing the index structure, although it might break some
> other things.

I gave this a shot (though I called it amoppurpose rather than
amop_whats_it_good_for) and I think it's a reasonable way to proceed.
Proof-of-concept patch attached. This just adds the column (using the
existing padding space), defines AMOP_SEARCH and AMOP_ORDER, and makes
just about everything ignore anything not marked AMOP_SEARCH,
attached. This would obviously need some more hacking to pay
attention to AMOP_ORDER where relevant, etc. and to create some actual
syntax around it. Currently CREATE OPERATOR CLASS / ALTER OPERATOR
FAMILY have this bit:

OPERATOR strategy_number ( op_type [ , op_type ] )

knngist-0.9 implements this:

[ORDER] OPERATOR strategy_number ( op_type [, op_type ] )

...but with the design proposed above that's not quite what we'd want,
because amoppurpose is a bit field, so you could have one or both of
the two possible purposes. Perhaps:

OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
ORDER } [, ...] ]

With the default being FOR SEARCH.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-17 01:56:51
Message-ID: AANLkTik2rY30d3xwtpmFGdCVoF_YzdkY3hjeGPCoS7eB@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 16, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Oct 15, 2010 at 9:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Oct 15, 2010 at 8:45 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Fri, Oct 15, 2010 at 8:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>>> Perhaps we should think of pg_amop not so much
>>>>> as a way to tell the AM what to do, but just a way to tell it what
>>>>> operator is logically involved without relying on the name or OID.
>>>>
>>>> I already think of it that way ...
>>>
>>> OK.
>>
>> Thinking about it that way, perhaps we could add an integer column
>> amop_whats_it_good_for that gets used as a bit field.  That wouldn't
>> require changing the index structure, although it might break some
>> other things.
>
> I gave this a shot (though I called it amoppurpose rather than
> amop_whats_it_good_for) and I think it's a reasonable way to proceed.
> Proof-of-concept patch attached.  This just adds the column (using the
> existing padding space), defines AMOP_SEARCH and AMOP_ORDER, and makes
> just about everything ignore anything not marked AMOP_SEARCH,
> attached.  This would obviously need some more hacking to pay
> attention to AMOP_ORDER where relevant, etc. and to create some actual
> syntax around it.  Currently CREATE OPERATOR CLASS / ALTER OPERATOR
> FAMILY have this bit:
>
> OPERATOR strategy_number ( op_type [ , op_type ] )
>
> knngist-0.9 implements this:
>
> [ORDER] OPERATOR strategy_number ( op_type [, op_type ] )
>
> ...but with the design proposed above that's not quite what we'd want,
> because amoppurpose is a bit field, so you could have one or both of
> the two possible purposes.  Perhaps:
>
> OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
> ORDER } [, ...] ]
>
> With the default being FOR SEARCH.
>
> Comments?

And here's the attachment, sorry.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
amoppurpose-v1.patch text/x-patch 73.9 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-17 02:26:55
Message-ID: AANLkTi=Pfbreb=t_0NRwNmRd2WqtJ5stVeYb9KOM3GJ_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 15, 2010 at 7:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I still feel vaguely uneasy about the fact that the proposed patch
>> can't handle ASC/DESC or NULLS FIRST/LAST, and that unease grew a bit
>> more last night when I read Peter's patch to add collation support.
>
> Good point.
>
>> We could possibly cram ASC/DESC and NULLS FIRST/LAST in by defining
>> four new categories of operator strategies rather than one, but
>> there's no way that's going to work for collations.  Is there some
>> other way to approach this problem?  Can we leave pg_amop as it is,
>> and pass the context (sort vs. qual, ASC/DESC, NULLS FIRST/LAST,
>> collation, whatever...) to the index via some sort of side channel?
>
> Well, we cannot avoid changing pg_amop, or at least changing its
> interpretation, because the current scheme simply can't represent
> indexable operators that are used for anything except simple boolean
> WHERE tests.  I agree though that we do *not* want pg_amop involved
> in the details of exactly what sort ordering is referenced by a sortable
> operator.  Somehow that needs to be passed in a side channel.

I spent some time tonight looking at how this is handled in
builtin_knngist_core-0.9. It looks like the requested sort, if there
is one, just gets tossed into the indexquals list and eventually
transformed into a scankey with a new flag SK_ORDER set. I think what
we should do instead is add an additional IndexPath field which can
point to a node indicating the desired ordering properties. This node
might look similar to a scan key but having it be a separate node type
will allow us to add in whatever, ahem, miscellaneous crap we want to
pass: currently ASC/DESC and NULLS FIRST/LAST, and eventually
collation and so forth. That can then get propagated into the
IndexScan and passed to index_beginscan, where it can be passed to the
AM's ambeginscan function (i.e. we'll add an additional argument to
the signatures of those functions).

There's a second problem, too. If we assume that we want to treat
this problem with some degree of generality - that is, we really do
care about things like ASC/DESC and NULLS FIRST/LAST and eventually
collation - then the proposed amcanorderbyop flag isn't really
sufficient. The AM will need to be able to indicate whether a given
combination of parameters is one that it can handle. I'm thinking
perhaps in lieu of a boolean, we can add another indexam method which,
if not InvalidOid, gets called when we're wondering about whether a
given clause is something that the index can order by. Although
knngist focuses on a the ORDER BY col OP constant case, which
certainly seems like the most useful one, there's no theoretical
reason why an AM couldn't allow ordering by some more complex clause.
I don't know whether anyone will ever feel like writing code to do
something like that, but if we set the API up this way then it should
be at least theoretically possible.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 10:41:06
Message-ID: 4CBC2442.8040701@siriusit.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Paul Ramsey wrote:

>> So what kind of data structure would you like for a typmod?
>
> I'm a primitive enough beast that just having 64-bits would make me
> happy. As a general matter though, a bytea?
>
> P

For my vote, I'd prefer either the Oid of a custom type or an array of
Oid, Datum pairs - i.e. something we can extend in the future if required.

ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


From: David Fetter <david(at)fetter(dot)org>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 13:29:55
Message-ID: 20101018132955.GJ17565@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 11:41:06AM +0100, Mark Cave-Ayland wrote:
> Paul Ramsey wrote:
>
> >>So what kind of data structure would you like for a typmod?
> >
> >I'm a primitive enough beast that just having 64-bits would make me
> >happy. As a general matter though, a bytea?
> >
> >P
>
> For my vote, I'd prefer either the Oid of a custom type or an array
> of Oid, Datum pairs - i.e. something we can extend in the future if
> required.

This sounds a lot like a foreign key to another table. Are you not
proposing doing that because of performance considerations?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: David Fetter <david(at)fetter(dot)org>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 15:13:04
Message-ID: 4CBC6400.50009@siriusit.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter wrote:

>> For my vote, I'd prefer either the Oid of a custom type or an array
>> of Oid, Datum pairs - i.e. something we can extend in the future if
>> required.
>
> This sounds a lot like a foreign key to another table. Are you not
> proposing doing that because of performance considerations?
>
> Cheers,
> David.

Well, in PostGIS a typmod contains 3 pieces of information:

1) the SRID
2) the dimension
3) the geometry type

The SRID is technically already a foreign key into another table, with
dimension and SRID as other information. At the moment, we bit-pack the
dimension and geometry type into the SRID and use that as the typmod but
this only leaves 21 bits IIRC for the SRID. The additional complication
is that SRIDs at the higher end of the range are allowed for anyone to
use, and so people may have their own customised spheroids defined in
this region of the table.

If we had a foreign key into another table, we'd need to ensure that no
one could tamper with it as otherwise all chaos would break lose, e.g.
breaking the geometry type constraint on a column. Heck, we even have
people deleting the geometry_columns table sometimes because they are
not aware of what it does. By storing this information in the PG catalog
then this can't happen, plus the information is available easily in
Form_pg_attribute without having to implement our own cache, with its
own related problems such as how/when to invalidate etc.

There is also a chance that we'd want to include additional information
in the future related to geometry validity, for example, which would
mean further reducing the range allowed within the spatial_ref_sys table
in its existing form.

ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 19:33:34
Message-ID: 1287430415.26448.5.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-10-18 at 11:41 +0100, Mark Cave-Ayland wrote:
> Paul Ramsey wrote:
>
> >> So what kind of data structure would you like for a typmod?
> >
> > I'm a primitive enough beast that just having 64-bits would make me
> > happy. As a general matter though, a bytea?
> >
> > P
>
> For my vote, I'd prefer either the Oid of a custom type or an array of
> Oid, Datum pairs - i.e. something we can extend in the future if required.

I think if we really wanted to design this generally, we'd give a type
function arguments. So, numeric would get (int default = 0, int default
= 0). That can easily get very complicated, of course.

In any case, for the shorter term, it's clear that refactoring the
passing around of type + typmod would help this endeavor, so I'm going
to give it a try.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 19:36:51
Message-ID: AANLkTi=tTVu_XX=G1NoxZUa9icOnx5DQ6UREZ0oBoq1o@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 3:33 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2010-10-18 at 11:41 +0100, Mark Cave-Ayland wrote:
>> Paul Ramsey wrote:
>>
>> >> So what kind of data structure would you like for a typmod?
>> >
>> > I'm a primitive enough beast that just having 64-bits would make me
>> > happy. As a general matter though, a bytea?
>> >
>> > P
>>
>> For my vote, I'd prefer either the Oid of a custom type or an array of
>> Oid, Datum pairs - i.e. something we can extend in the future if required.
>
> I think if we really wanted to design this generally, we'd give a type
> function arguments.  So, numeric would get (int default = 0, int default
> = 0).  That can easily get very complicated, of course.
>
> In any case, for the shorter term, it's clear that refactoring the
> passing around of type + typmod would help this endeavor, so I'm going
> to give it a try.

By "this endeavor" do you mean KNNGIST, or per-column collation?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 19:40:10
Message-ID: 1287430810.26448.7.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On mån, 2010-10-18 at 15:36 -0400, Robert Haas wrote:
> On Mon, Oct 18, 2010 at 3:33 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > On mån, 2010-10-18 at 11:41 +0100, Mark Cave-Ayland wrote:
> >> Paul Ramsey wrote:
> >>
> >> >> So what kind of data structure would you like for a typmod?
> >> >
> >> > I'm a primitive enough beast that just having 64-bits would make me
> >> > happy. As a general matter though, a bytea?
> >> >
> >> > P
> >>
> >> For my vote, I'd prefer either the Oid of a custom type or an array of
> >> Oid, Datum pairs - i.e. something we can extend in the future if required.
> >
> > I think if we really wanted to design this generally, we'd give a type
> > function arguments. So, numeric would get (int default = 0, int default
> > = 0). That can easily get very complicated, of course.
> >
> > In any case, for the shorter term, it's clear that refactoring the
> > passing around of type + typmod would help this endeavor, so I'm going
> > to give it a try.
>
> By "this endeavor" do you mean KNNGIST, or per-column collation?

No, I was referring to the talk about a different/better/more flexible
typmod. Per-column collation could also benefit, as you have observed.
Frankly, I have no idea what knngist has to do with any of this, as I
haven't followed that topic at all.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 19:41:06
Message-ID: AANLkTi=1g5-zQUDXEEqrrW94fSkqnDV2tARup3HVdVZu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 3:40 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On mån, 2010-10-18 at 15:36 -0400, Robert Haas wrote:
>> On Mon, Oct 18, 2010 at 3:33 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> > On mån, 2010-10-18 at 11:41 +0100, Mark Cave-Ayland wrote:
>> >> Paul Ramsey wrote:
>> >>
>> >> >> So what kind of data structure would you like for a typmod?
>> >> >
>> >> > I'm a primitive enough beast that just having 64-bits would make me
>> >> > happy. As a general matter though, a bytea?
>> >> >
>> >> > P
>> >>
>> >> For my vote, I'd prefer either the Oid of a custom type or an array of
>> >> Oid, Datum pairs - i.e. something we can extend in the future if required.
>> >
>> > I think if we really wanted to design this generally, we'd give a type
>> > function arguments.  So, numeric would get (int default = 0, int default
>> > = 0).  That can easily get very complicated, of course.
>> >
>> > In any case, for the shorter term, it's clear that refactoring the
>> > passing around of type + typmod would help this endeavor, so I'm going
>> > to give it a try.
>>
>> By "this endeavor" do you mean KNNGIST, or per-column collation?
>
> No, I was referring to the talk about a different/better/more flexible
> typmod.  Per-column collation could also benefit, as you have observed.
> Frankly, I have no idea what knngist has to do with any of this, as I
> haven't followed that topic at all.

Me neither, except for the fact that the PostGIS guys are interested
in both things. I think this thread has been hijacked OT.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-18 20:02:14
Message-ID: 24841.1287432134@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Oct 18, 2010 at 3:40 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> Frankly, I have no idea what knngist has to do with any of this, as I
>> haven't followed that topic at all.

> Me neither, except for the fact that the PostGIS guys are interested
> in both things. I think this thread has been hijacked OT.

We got there by discussing how ordering information could be passed
around for knngist operators. Robert said something about side
channels, and I mentioned that maybe collation information should
pass through a side channel as well instead of trying to make it work
like typmods, and then it drifted to rethinking typmods as long as we
were thinking about having to touch everyplace that deals in typmods.
I agree the thread is far OT --- most of the recent messages belong
in discussion of the collation patch more than they do with knngist.
But there is some overlap if you think about whether these problems
could be solved together.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: Paul Ramsey <pramsey(at)cleverelephant(dot)ca>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-19 14:17:06
Message-ID: 20101019141706.GB6871@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 04:13:04PM +0100, Mark Cave-Ayland wrote:
> David Fetter wrote:
>
> >>For my vote, I'd prefer either the Oid of a custom type or an array
> >>of Oid, Datum pairs - i.e. something we can extend in the future if
> >>required.
> >
> >This sounds a lot like a foreign key to another table. Are you not
> >proposing doing that because of performance considerations?
> >
> >Cheers,
> >David.
>
> Well, in PostGIS a typmod contains 3 pieces of information:
>
> 1) the SRID
> 2) the dimension
> 3) the geometry type
>
> The SRID is technically already a foreign key into another table,
> with dimension and SRID as other information. At the moment, we
> bit-pack the dimension and geometry type into the SRID and use that
> as the typmod but this only leaves 21 bits IIRC for the SRID. The
> additional complication is that SRIDs at the higher end of the range
> are allowed for anyone to use, and so people may have their own
> customised spheroids defined in this region of the table.

Sounds like you need space for all of these separately, and once you
have that, you'll probably start thinking about other possible pieces
of information you could store, especially as you start to store new
data types and have new operators on them.

It sounds to me as thought the typemod, or something like it, needs to
be able to store, in general, completely arbitrary information,
although not likely much over a page or two of memory. Cf. Robert
Haas's recent comment about Klingon. While we could make this a
completely opaque structure from the SQL level, I'm not sure it's a
good idea.

> If we had a foreign key into another table, we'd need to ensure that
> no one could tamper with it as otherwise all chaos would break lose,
> e.g. breaking the geometry type constraint on a column.

Our system catalog tables have the nice property that no one can
accidentally modify them by hand, and "all chaos," as you put it, can
reasonably be expected to occur should they choose to do so.

> Heck, we even have people deleting the geometry_columns table
> sometimes because they are not aware of what it does. By storing
> this information in the PG catalog then this can't happen, plus the
> information is available easily in Form_pg_attribute without having
> to implement our own cache, with its own related problems such as
> how/when to invalidate etc.

:)

> There is also a chance that we'd want to include additional
> information in the future related to geometry validity, for example,
> which would mean further reducing the range allowed within the
> spatial_ref_sys table in its existing form.

Right.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-19 15:11:15
Message-ID: 4CBDB513.9020108@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> 3) 3-rd boolean column (amopopr, amopfamily, amoporder)
>> - could be two records per operator
>> + operator could be used in both roles
>> + strategy number could be different for different roles

> How can #3 work at all? It's ignoring a couple of critical index
> columns. In particular, I believe the sticking point here is this
> unique index:
>
> "pg_amop_fam_strat_index" UNIQUE, btree (amopfamily, amoplefttype, amoprighttype, amopstrategy)
>
I believe, that columns (amoplefttype, amoprighttype) are fixed for operation
and could not be changed. So, two operations in one opfamily should be differ in
strategy number for different roles. It also gives a direct way to transfer
knowledge abot role to consistent method of GiST.

> I'm not terribly thrilled with using a boolean here in any case.
> Now that we have two "roles" an operator might play in an opclass,
> who's to say there might not be more roles someday? We should use
> a column type that will support more than two roles without basic
> rejiggering.

It's easy to change to char type, for now only 's'search and 'o'order characters
will be allowed.

> BTW, have we discussed the idea of embedding the role in the strategy
> number? For example, require regular operators to have strategy

In this schema, one operator could not be used in more than one role. Right now
it's not strict limitation, because the we still don't have an example of
boolean distance.

Anyhow, it needs to distinguish roles while IndexPath is built. So,
op_in_opfamily/get_op_opfamily_strategy/get_op_opfamily_properties and friends
will accept extra argument pointing to interested role.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-19 15:16:21
Message-ID: 4CBDB645.4040609@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>> Thinking about it that way, perhaps we could add an integer column
>> amop_whats_it_good_for that gets used as a bit field. That wouldn't
>> require changing the index structure, although it might break some
>> other things.

> OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
> ORDER } [, ...] ]

It's very desirable thing to be able to distinguish roles in consistent method
of GiST: computation of distance could be very expensive and. The single way to
provide it in current GiST interface is a strategy number. Of course, we could
add 6-th argument to consistent to point role, but I don't think that's good
decision.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-19 15:43:31
Message-ID: 4CBDBCA3.6040602@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> combination of parameters is one that it can handle. I'm thinking
> perhaps in lieu of a boolean, we can add another indexam method which,
> if not InvalidOid, gets called when we're wondering about whether a
> given clause is something that the index can order by. Although
> knngist focuses on a the ORDER BY col OP constant case, which

Hmm, interesting idea. To be more general, amcanorder (without byop suffix)
could be eliminated too.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-19 19:20:38
Message-ID: AANLkTi=NwrDFjYGu9Ws3EkU3TE=94jcx2mWh_ZX2o1hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/19 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>>> Thinking about it that way, perhaps we could add an integer column
>>> amop_whats_it_good_for that gets used as a bit field.  That wouldn't
>>> require changing the index structure, although it might break some
>>> other things.
>
>> OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
>> ORDER } [, ...] ]
>
> It's very desirable thing to be able to distinguish roles in consistent
> method of GiST: computation of distance could be very expensive and. The
> single way to provide it in current GiST interface is a strategy number. Of
> course, we could add 6-th argument to consistent to point role, but I don't
> think that's good decision.

To me, adding an additional argument (or maybe providing a whole
separate support function, separate from consistent) seems quite
natural, because now you have a way to pass all the other little bits
that might matter... ASC/DESC, NULLS FIRST/LAST, collation OID, etc.
You can define the additional argument as providing all of the extra
info about how the operator is being used, and, if it's being used for
ordering, the details of the requested order. What is your thinking
on the matter?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-22 10:58:24
Message-ID: 4CC16E50.90104@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> You can define the additional argument as providing all of the extra
> info about how the operator is being used, and, if it's being used for
> ordering, the details of the requested order. What is your thinking
> on the matter?

Maby be useful, but it seems to me to be a bit overengineering for now. GiST
will not support knn-search with different than ASC NULLS LAST order in foresee
future, because it stores NULLs in unmaintainable manner for different ordering:
NULLs could be everywhere in tree. So, implementation of ASC NULLS FIRST
requires to read whole tree to find all NULLs value first. I can imagine, how to
reorganize tree to store NULLs together for single-column index, but not for
multi-column index. Orders DESC NULLS LAST/FIRST requires to introduce two more
"infinite" values: one to notion of distance more than +INF and another,
non-negative, but less than zero distance.

Again, it could be useful for modification of GiST known as ordered GiST. But
ordered GiST uses completely different tree traversal algorithm for search and
insert, and it hasn't any benefits comparing with B-Tree and GiST. It's looks
like an autogiro which successfully combines disadvantages of airplanes and
helicopters :)
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-24 12:16:38
Message-ID: AANLkTinmxGwycSBkTTLeeRy=HgN8bgZS5WVBTScRvE8T@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/22 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> You can define the additional argument as providing all of the extra
>> info about how the operator is being used, and, if it's being used for
>> ordering, the details of the requested order.  What is your thinking
>> on the matter?
>
> Maby be useful, but it seems to me to be a bit overengineering for now. GiST
> will not support knn-search with different than ASC NULLS LAST order in
> foresee future, because it stores NULLs in unmaintainable manner for
> different ordering: NULLs could be everywhere in tree. So, implementation of
> ASC NULLS FIRST requires to read whole tree to find all NULLs value first. I
> can imagine, how to reorganize tree to store NULLs together for
> single-column index, but not for multi-column index. Orders DESC NULLS
> LAST/FIRST requires to introduce two more "infinite" values: one to notion
> of distance more than +INF and another, non-negative, but less than zero
> distance.

I don't think it's overengineering, just because making core changes
to the planner is such a pain in the neck that we don't want to have
to come back and do it again any too soon, or at least we'd like them
to be as minor as possible. It's better to have one API break and be
done with it than maybe have to come back and do a second round of
code changes. And from what I can see it's not going to be that ugly.
I'm trying to drum up some time to hack on it...

> Again, it could be useful for modification of GiST known as ordered GiST.
> But ordered GiST uses completely different tree traversal algorithm for
> search and insert, and it hasn't any benefits comparing with B-Tree and
> GiST. It's looks like an autogiro which successfully combines disadvantages
> of airplanes and helicopters :)

Heh. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-10-25 03:49:49
Message-ID: AANLkTin49UUZJ+Ne_BuLTvJ7O7cMhQ-x-6oaUEa0byxs@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 16, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Thinking about it that way, perhaps we could add an integer column
>> amop_whats_it_good_for that gets used as a bit field.  That wouldn't
>> require changing the index structure, although it might break some
>> other things.
>
> I gave this a shot (though I called it amoppurpose rather than
> amop_whats_it_good_for) and I think it's a reasonable way to proceed.
> Proof-of-concept patch attached.  This just adds the column (using the
> existing padding space), defines AMOP_SEARCH and AMOP_ORDER, and makes
> just about everything ignore anything not marked AMOP_SEARCH,
> attached.  This would obviously need some more hacking to pay
> attention to AMOP_ORDER where relevant, etc. and to create some actual
> syntax around it.  Currently CREATE OPERATOR CLASS / ALTER OPERATOR
> FAMILY have this bit:
>
> OPERATOR strategy_number ( op_type [ , op_type ] )
>
> knngist-0.9 implements this:
>
> [ORDER] OPERATOR strategy_number ( op_type [, op_type ] )
>
> ...but with the design proposed above that's not quite what we'd want,
> because amoppurpose is a bit field, so you could have one or both of
> the two possible purposes.  Perhaps:
>
> OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
> ORDER } [, ...] ]
>
> With the default being FOR SEARCH.

Slightly-more-fleshed out proof of concept patch attached, with actual
syntax, documentation, and pg_dump support added. This might be
thought of as a subset of the builtin_knngist_core patch, without the
parts that make it actually do something useful (which is mostly
match_pathkey_to_index - which I'm still rather hoping to abstract in
some way via the access method interface, though I'm currently unsure
what the best way to do that is).

I notice that builtin_knngist_core checks whether the return type of
an ordering operator has a built-in btree opclass. I'm not sure
whether we should bother checking that, because even if it's true I
don't think there's anything preventing it from becoming false later.
I think it's probably sufficient to just check this condition at plan
time and silently skip trying to build knn-type index paths if it's
not met.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
amoppurpose-v2.patch text/x-patch 93.0 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-12 17:24:13
Message-ID: 201011121724.oACHODB04788@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Robert, it is great you are taking this on. This is really a well-known
area of the code for you, but not so much for Teodor and Oleg, so I am
sure they appreciate your assistance.

---------------------------------------------------------------------------

Robert Haas wrote:
> On Sat, Oct 16, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> Thinking about it that way, perhaps we could add an integer column
> >> amop_whats_it_good_for that gets used as a bit field. ?That wouldn't
> >> require changing the index structure, although it might break some
> >> other things.
> >
> > I gave this a shot (though I called it amoppurpose rather than
> > amop_whats_it_good_for) and I think it's a reasonable way to proceed.
> > Proof-of-concept patch attached. ?This just adds the column (using the
> > existing padding space), defines AMOP_SEARCH and AMOP_ORDER, and makes
> > just about everything ignore anything not marked AMOP_SEARCH,
> > attached. ?This would obviously need some more hacking to pay
> > attention to AMOP_ORDER where relevant, etc. and to create some actual
> > syntax around it. ?Currently CREATE OPERATOR CLASS / ALTER OPERATOR
> > FAMILY have this bit:
> >
> > OPERATOR strategy_number ( op_type [ , op_type ] )
> >
> > knngist-0.9 implements this:
> >
> > [ORDER] OPERATOR strategy_number ( op_type [, op_type ] )
> >
> > ...but with the design proposed above that's not quite what we'd want,
> > because amoppurpose is a bit field, so you could have one or both of
> > the two possible purposes. ?Perhaps:
> >
> > OPERATOR strategy_number ( op_type [ , op_type ] ) [ FOR { SEARCH |
> > ORDER } [, ...] ]
> >
> > With the default being FOR SEARCH.
>
> Slightly-more-fleshed out proof of concept patch attached, with actual
> syntax, documentation, and pg_dump support added. This might be
> thought of as a subset of the builtin_knngist_core patch, without the
> parts that make it actually do something useful (which is mostly
> match_pathkey_to_index - which I'm still rather hoping to abstract in
> some way via the access method interface, though I'm currently unsure
> what the best way to do that is).
>
> I notice that builtin_knngist_core checks whether the return type of
> an ordering operator has a built-in btree opclass. I'm not sure
> whether we should bother checking that, because even if it's true I
> don't think there's anything preventing it from becoming false later.
> I think it's probably sufficient to just check this condition at plan
> time and silently skip trying to build knn-type index paths if it's
> not met.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

[ Attachment, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-12 18:24:23
Message-ID: 4CDD8657.6020903@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Slightly-more-fleshed out proof of concept patch attached, with actual
> syntax, documentation, and pg_dump support added. This might be
> thought of as a subset of the builtin_knngist_core patch, without the
> parts that make it actually do something useful (which is mostly
> match_pathkey_to_index - which I'm still rather hoping to abstract in
> some way via the access method interface, though I'm currently unsure
> what the best way to do that is).

I don't see in your patch how to propagate knowledge of kind of operation
(AMOP_SEARCH or AMOP_ORDER) to GiST and consistent method. For both of them they
aren't distinguishable. That's not acceptably for both, because GiST needs to
choose right traversal algorithm, consistentFn needs role to decide return
infinity or false (-1) value.

My variants informs GiST by SK_ORDER flags and consistentFn looks at strategy
number (strategy numbers are different for different purposes).

> I notice that builtin_knngist_core checks whether the return type of
> an ordering operator has a built-in btree opclass. I'm not sure
Actually, GiST doesn't use that knowledge, check is done only to be sure that
operation returns orderable data type.

> whether we should bother checking that, because even if it's true I
> don't think there's anything preventing it from becoming false later.
> I think it's probably sufficient to just check this condition at plan
> time and silently skip trying to build knn-type index paths if it's
> not met.

It's already do it: you can not ORDER BY over non-orderable data type. That
check just make diagnostic earlier.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-20 23:54:10
Message-ID: AANLkTinu0hr-=uMOBgxiBqR-JPfp+kBKNC9hdVf-93Ge@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/11/12 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> Slightly-more-fleshed out proof of concept patch attached, with actual
>> syntax, documentation, and pg_dump support added.  This might be
>> thought of as a subset of the builtin_knngist_core patch, without the
>> parts that make it actually do something useful (which is mostly
>> match_pathkey_to_index - which I'm still rather hoping to abstract in
>> some way via the access method interface, though I'm currently unsure
>> what the best way to do that is).
>
> I don't see in your patch how to propagate knowledge of kind of operation
> (AMOP_SEARCH or AMOP_ORDER) to GiST and consistent method. For both of them
> they aren't distinguishable. That's not acceptably for both, because GiST
> needs to choose right traversal algorithm, consistentFn needs role to decide
> return infinity or false (-1) value.

My version of the patch is suffering from a bad case of not being
done. It sort of started out as a proof-of-concept to show what the
syntax might look like and seems to be gradually growing into
something more real. I'm not sure what we'll end up with.

> My variants informs GiST by SK_ORDER flags and consistentFn looks at
> strategy number (strategy numbers are different for different purposes).

Yeah. At ten thousand feet, I think the open design question here is
to what extent it's OK to rely on the fact that the ORDER BY clauses
we wish to optimize happen to look a lot like the WHERE clauses we
already know how to optimize: namely, they're both binary opclauses of
the form <indexed-column> <op> <constant>. Your patch manages to
reuse a LOT of existing machinery by shoving ordering expressions
through the same code paths that quals take. Code reuse is generally
a good thing, but here's we're forming RestrictInfo and ScanKey
objects out of things that are neither restrictions nor keys, which
might lead to maintainability problems down the road. I'd like to get
some input from Tom on how he feels about that, and any alternatives
he sees.

It seems to me that our concept of ScanDirection is really woefully
under-expressive. For example, given:

CREATE TABLE foo (
id integer NOT NULL,
name character varying NOT NULL,
PRIMARY KEY (id)
);

We use the index for the first of these but not the second:

select * from foo order by id nulls last;
select * from foo order by id nulls first;

In an ideal world, we'd like to handle the second one by finding the
first non-NULL entry in the index, scanning away from the NULLs, and
then, when we run off the end, looping back around to spit out the
NULL entries. Or, similarly, if we have an index on (id, name), we
can handle the first two of the following with the index, but not the
second two:

select * from foo order by id, name;
select * from foo order by id desc, name desc;
select * from foo order by id, name desc;
select * from foo order by id, name nulls last;
(can't handle this last one even if name is marked NOT NULL!)

It seems like it might even be possible to handle these (albeit maybe
not real efficiently) via a sufficiently complicated scanning order,
but even if we had the code to do the scan, we have no way of telling
the index what scan direction we want: forward, backward, and no
movement don't really cut it. The idea that forward and backward mean
anything at all is really pretty btree-centric.

The trick, of course, is to come up with something better. I have a
vague notion of using something like an array or list of objects of
the form <index column, asc/desc, nulls first/last, value (null except
for knngist)>. That could easily be extended with collation or
whatever as needed. The trick is - you need to create this object and
then ask the index - hey, is this something you can support? And the
index needs to then respond by saying whether it can do that (and
maybe doing some sort of translation into what instructions should be
provided at execution time).

Trouble is, that sounds like a lot of work on code I'm not altogether
comfortable with, and I'd really like to get KNNGIST in shape for 9.1
if possible. I'm not real sure where to draw the line between, on the
one hand, not wanting to commit something that is excessively crufty
and, on the other hand, wanting to finish in finite time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-21 22:24:00
Message-ID: 27958.1290378240@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> 2010/11/12 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> My variants informs GiST by SK_ORDER flags and consistentFn looks at
>> strategy number (strategy numbers are different for different purposes).

> Yeah. At ten thousand feet, I think the open design question here is
> to what extent it's OK to rely on the fact that the ORDER BY clauses
> we wish to optimize happen to look a lot like the WHERE clauses we
> already know how to optimize: namely, they're both binary opclauses of
> the form <indexed-column> <op> <constant>. Your patch manages to
> reuse a LOT of existing machinery by shoving ordering expressions
> through the same code paths that quals take. Code reuse is generally
> a good thing, but here's we're forming RestrictInfo and ScanKey
> objects out of things that are neither restrictions nor keys, which
> might lead to maintainability problems down the road. I'd like to get
> some input from Tom on how he feels about that, and any alternatives
> he sees.

I haven't spent any time on this patch yet (hope to start looking at it
next week). As for your specific question above, I don't have a big
problem with reusing ScanKey this way, but I do agree that using
RestrictInfo for this would be a crock. ISTM what we ought to have is
just the ability to match PathKeys with expressions of the form
"indexedcol op constant" to an index.

I'm undecided about the big-picture question of how much extra
generality ought to be put into the system along with this patch.
The argument not to is that with no candidate uses of additional
generality on the horizon, it's a waste of time to design something
more general, because we'll probably get it wrong anyway. I'm not
a fan of designing APIs without use-cases in mind. On the other hand,
there's an argument *for* doing something more general, which is
basically Polya's paradox: the more general problem may be easier to
solve. To support that argument, we'd need a design that is clearly
cleaner than bolting KNNGIST on according to the current patch.
AIUI we don't have that at the moment, but I still think it's worth
spending a bit more time looking for one.

> It seems to me that our concept of ScanDirection is really woefully
> under-expressive. For example, given:

> CREATE TABLE foo (
> id integer NOT NULL,
> name character varying NOT NULL,
> PRIMARY KEY (id)
> );

> We use the index for the first of these but not the second:

> select * from foo order by id nulls last;
> select * from foo order by id nulls first;

> In an ideal world, we'd like to handle the second one by finding the
> first non-NULL entry in the index, scanning away from the NULLs, and
> then, when we run off the end, looping back around to spit out the
> NULL entries.

This example leaves me totally cold, not least because it assumes a
specific storage implementation for nulls in an index. It is also,
I think, misunderstanding what ScanDirection is for. That's only
intended to allow executor plans to be run forward and then backed up
in the same fashion that fetching backwards from a cursor would do;
which is not a btree-specific concept, indeed not even index-specific.
If there is sufficient interest in doing what you suggest, what we'd
want to do is pass the PathKey representation to the index and let the
index AM figure out what it has to do to produce that sort order. But
that is way way down my priority list.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-21 23:21:36
Message-ID: AANLkTikVJSmh6n3-1L=7EB2LOkdG5AGqKPL7omLc=5Th@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 21, 2010 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I haven't spent any time on this patch yet (hope to start looking at it
> next week).  As for your specific question above, I don't have a big
> problem with reusing ScanKey this way, but I do agree that using
> RestrictInfo for this would be a crock.  ISTM what we ought to have is
> just the ability to match PathKeys with expressions of the form
> "indexedcol op constant" to an index.

That doesn't seem very hard on its face. The trick is what to do with
that information once you've got it. As far as I can tell, you need
to drill some kind of hole that lets you pass "additional details
about the desired sort order" to the index AM. What I'd sort of like
to be able to do is throw the PathKeys at the index AM and say "you
want these?". Short of that, we're probably going to have to resign
ourselves to the core code basically knowing exactly what the
capabilities of KNNGIST are, making the index API pretty porous - not
that it already isn't. There's really nothing special about the
subset of the problem space KNNGIST happens to attack except that it
makes the GIS guys drool; the next problem someone wants to attack in
this area is as likely as not to look completely different.

> This example leaves me totally cold, not least because it assumes a
> specific storage implementation for nulls in an index.  It is also,
> I think, misunderstanding what ScanDirection is for.  That's only
> intended to allow executor plans to be run forward and then backed up
> in the same fashion that fetching backwards from a cursor would do;
> which is not a btree-specific concept, indeed not even index-specific.

Ah, OK.

> If there is sufficient interest in doing what you suggest, what we'd
> want to do is pass the PathKey representation to the index and let the
> index AM figure out what it has to do to produce that sort order.  But
> that is way way down my priority list.

Yeah, this is basically what I'm wondering whether we can reasonably
do for KNNGIST; with hopes of later reuse. But it may be unworkable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-21 23:38:05
Message-ID: 29341.1290382685@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> That doesn't seem very hard on its face. The trick is what to do with
> that information once you've got it. As far as I can tell, you need
> to drill some kind of hole that lets you pass "additional details
> about the desired sort order" to the index AM.

We clearly need to add additional information to IndexScan plan nodes
to tell the index AM which sort order is required. Up to now, an
indexscan has only had one possible resultant sort order (two if you
count backwards scan, but as I said I don't think generalizing that
particular feature is the way to approach this). I would imagine
that the best way to handle that is to add a PathKey list or something
equivalent to it, and add that to the arguments passed to ambeginscan.

The other issue is how the planner can figure out what the possible
orderings are when it's considering an index. You seem to be
contemplating adding a new index AM function that the planner would call
at the right point; but I'm not sure that that adds much of anything,
because the index AM can't have hard-wired behavior either. We really
have to have enough information in the system catalog entries about an
opclass to allow the possible orderings to be determined. Given that,
I think it makes more sense for the core planner to know what to do than
to put possibly duplicative code into multiple AMs.

I guess a third alternative would be to create per-opclass hook
functions for the planner to call, but I'm not thrilled with that
idea; it would still be largely duplicative code, and in a lot more
places. I think it would also bind our hands with respect to making
internal planner changes in future, because the data structures
representing pathkeys would be pretty well locked down by such a choice.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 00:33:56
Message-ID: 21593.1290472436@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've done some *very* preliminary review of this patch now. I think
that the way to move forward is to first commit the work surrounding
changes to pg_amop, including suitable changes to CREATE/ALTER OPERATOR
CLASS/FAMILY so that there's a way to put the new info into the table.
The system won't initially *do* anything useful with ordering-operator
entries, but this is necessary infrastructure before we can start to
do anything interesting.

After reviewing both Teodor's and Robert's patches for the issue,
I believe that Teodor has a better basic approach: if an operator
is useful for both searching and ordering, the way to handle that
is to make two pg_amop entries for it, with different strategy
numbers. This means in particular that we can continue to require
strategy numbers to be unique within an opclass, so right offhand
I see no need to widen pg_amop_fam_strat_index to five columns.
(Which means we don't need that patch to allow five-key syscaches.)
What we need instead is to add the "purpose" column to the existing
unique index on (amopopr, amopfamily). In English that means that
instead of allowing an operator to have only one entry per opfamily,
it can now have one entry per opfamily per purpose.

I do however concur with Robert that it's a bit silly to go to all this
work and not leave room in the design for additional operator "purposes"
later. So rather than using a boolean amoporder column, I think we want
an enumerated-type column "amoppurpose". Per our usual system catalog
conventions for "poor man's enums", I suggest declaring the column as
"char" with the two allowed values
AMOP_SEARCH 's'
AMOP_ORDER 'o'
Aside from leaving room for possible extension, I think that using
AMOP_SEARCH/AMOP_ORDER rather than true/false in the code will be more
readable and less error-prone.

As far as the syntax of CREATE/ALTER OPERATOR CLASS/FAMILY is concerned,
I like Robert's proposal (OPERATOR ... FOR ORDER or FOR SEARCH) better
than Teodor's, though we don't need the multiple-purposes-per-entry
aspect of it. This is mainly because it looks more easily extensible
if we ever do get around to having more purposes.

There's a lot more to be done after this, but if there are no objections
to this much, I'll go merge the relevant parts of the two patches and
commit.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 01:32:09
Message-ID: 22720.1290475929@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> As far as the syntax of CREATE/ALTER OPERATOR CLASS/FAMILY is concerned,
> I like Robert's proposal (OPERATOR ... FOR ORDER or FOR SEARCH) better
> than Teodor's, though we don't need the multiple-purposes-per-entry
> aspect of it. This is mainly because it looks more easily extensible
> if we ever do get around to having more purposes.

Although having said that ... I was poking around a bit more and noticed
how entirely bogus the current patch's matching of pathkeys is. It pays
no attention at all to which pk_opfamily is specified, which means it
will fail given a query like
SELECT ... ORDER BY indexable_col <-> constant USING <<<
where <<< represents some sort order that has nothing to do with the
order that GIST is expecting. The query will be considered to match the
index and then it will proceed to produce results in the wrong order.

We could perhaps kluge around that by insisting that the pathkey opclass
be the default btree opclass for the relevant datatype; but that's
fairly expensive to check for in match_pathkey_to_index, and it's
getting even further away from having a general-purpose solution.

I'm inclined to think that instead of just specifying "FOR ORDER",
the opclass definition ought to specify "FOR ORDER BY something",
where "something" could perhaps be a pre-existing btree opclass name.
Or maybe it could be a suitable sort operator, though then we'd have
to do a bit of reverse-engineering in match_pathkey_to_index, so that's
still no fun from a lookup-speed point of view. In any case what we'd
be specifying is a sort order for the datatype that the opclass member
operator yields, not the indexed datatype (so in practice it'd usually
be float8_ops or float8 <, no doubt).

The reason I bring this up now is that it affects the decision as to
what the unique key for pg_amop ought to be. Instead of having an
enum "purpose" column, maybe we should consider that the unique key
is (operator oid, opfamily oid, order-by-oid), where order-by-oid
is zero for a search operator and the OID of the btree opclass or sort
operator for an ordering operator. This would be of value if we
imagined that a single opclass could support ordering by more than one
distance ordering operator; which seems a bit far-fetched but perhaps
not impossible. On the other side of the coin it'd mean we aren't
leaving room for other sorts of operator "purposes".

On balance I'm inclined to leave the unique key as per previous proposal
(with a "purpose" column) and add the which-sort-order-is-that
information as payload columns that aren't part of the key.

Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 03:18:25
Message-ID: AANLkTinFWMPrE+NYaR_eZyQi8azeQDSWVFRmekymSQNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 8:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The reason I bring this up now is that it affects the decision as to
> what the unique key for pg_amop ought to be.  Instead of having an
> enum "purpose" column, maybe we should consider that the unique key
> is (operator oid, opfamily oid, order-by-oid), where order-by-oid
> is zero for a search operator and the OID of the btree opclass or sort
> operator for an ordering operator.  This would be of value if we
> imagined that a single opclass could support ordering by more than one
> distance ordering operator; which seems a bit far-fetched but perhaps
> not impossible.  On the other side of the coin it'd mean we aren't
> leaving room for other sorts of operator "purposes".

Since the need for additional purposes is mostly hypothetical, this
wouldn't bother me any.

> On balance I'm inclined to leave the unique key as per previous proposal
> (with a "purpose" column) and add the which-sort-order-is-that
> information as payload columns that aren't part of the key.

This is probably OK too, although I confess I'm a lot less happy about
it now that you've pointed out the need for those payload columns.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 04:05:00
Message-ID: 25580.1290485100@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Nov 22, 2010 at 8:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On balance I'm inclined to leave the unique key as per previous proposal
>> (with a "purpose" column) and add the which-sort-order-is-that
>> information as payload columns that aren't part of the key.

> This is probably OK too, although I confess I'm a lot less happy about
> it now that you've pointed out the need for those payload columns.

The reason I said "columns" is that I can foresee eventually wanting to
specify a pathkey in its entirety --- opfamily, asc/desc, nulls_first,
and whatever we come up with for collation. We don't currently need to
store more than the opfamily, since the others can never need to have
non-default values given the current implementation of KNNGIST. But the
idea that they might all be there eventually makes me feel that we don't
want to try to incorporate this data in pg_amop's unique key. I'm
satisfied to say that only one sort order can be associated with a
particular operator in a particular opclass, which is what would be
implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:04:18
Message-ID: AANLkTim64XLz7MgkV8xKhTKmwd8RmYacGAv94ancgRQ0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Nov 22, 2010 at 8:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> On balance I'm inclined to leave the unique key as per previous proposal
>>> (with a "purpose" column) and add the which-sort-order-is-that
>>> information as payload columns that aren't part of the key.
>
>> This is probably OK too, although I confess I'm a lot less happy about
>> it now that you've pointed out the need for those payload columns.
>
> The reason I said "columns" is that I can foresee eventually wanting to
> specify a pathkey in its entirety --- opfamily, asc/desc, nulls_first,
> and whatever we come up with for collation.  We don't currently need to
> store more than the opfamily, since the others can never need to have
> non-default values given the current implementation of KNNGIST.  But the
> idea that they might all be there eventually makes me feel that we don't
> want to try to incorporate this data in pg_amop's unique key.  I'm
> satisfied to say that only one sort order can be associated with a
> particular operator in a particular opclass, which is what would be
> implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

Does that imply that KNNGIST would only be able to support one
ordering per AMOP_ORDER-operator, or does it imply that each such
ordering would require a separate strategy number? The second might
be OK, but the first sounds bad.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:12:29
Message-ID: 9709.1290528749@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm satisfied to say that only one sort order can be associated with a
>> particular operator in a particular opclass, which is what would be
>> implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.

> Does that imply that KNNGIST would only be able to support one
> ordering per AMOP_ORDER-operator, or does it imply that each such
> ordering would require a separate strategy number? The second might
> be OK, but the first sounds bad.

It would be the first, because simply assigning another strategy number
only satisfies one of the unique constraints on pg_amop. To allow
arbitrary flexibility here, we would have to include all components of
the ordering specification in the unique constraint that's presently
just (amopopr, amopfamily) and is proposed to become
(amopopr, amopfamily, amoppurpose). I think that's an undue amount of
complexity to support something that's most likely physically impossible
from the index's standpoint anyway.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:25:34
Message-ID: AANLkTi=WhcD3t8iQUd+T3uQnEHbNe5HuyuQo2k=mxU59@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Nov 22, 2010 at 11:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm satisfied to say that only one sort order can be associated with a
>>> particular operator in a particular opclass, which is what would be
>>> implied by using AMOP_SEARCH/AMOP_ORDER as the unique key component.
>
>> Does that imply that KNNGIST would only be able to support one
>> ordering per AMOP_ORDER-operator, or does it imply that each such
>> ordering would require a separate strategy number?  The second might
>> be OK, but the first sounds bad.
>
> It would be the first, because simply assigning another strategy number
> only satisfies one of the unique constraints on pg_amop.  To allow
> arbitrary flexibility here, we would have to include all components of
> the ordering specification in the unique constraint that's presently
> just (amopopr, amopfamily) and is proposed to become
> (amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
> complexity to support something that's most likely physically impossible
> from the index's standpoint anyway.

Or, you'd need to pass these details separately to the AM, which seems
like a more more flexible way of doing it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:37:42
Message-ID: 10316.1290530262@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It would be the first, because simply assigning another strategy number
>> only satisfies one of the unique constraints on pg_amop. To allow
>> arbitrary flexibility here, we would have to include all components of
>> the ordering specification in the unique constraint that's presently
>> just (amopopr, amopfamily) and is proposed to become
>> (amopopr, amopfamily, amoppurpose). I think that's an undue amount of
>> complexity to support something that's most likely physically impossible
>> from the index's standpoint anyway.

> Or, you'd need to pass these details separately to the AM, which seems
> like a more more flexible way of doing it.

A more flexible way of doing what? The first requirement here is that
the catalog entries provide sufficient information to determine the
semantics. You can't just say "this opclass supports ordering", you
have to specify what that ordering is. Punting to the index AM helps
not at all, unless your proposal is to hard-wire this in GIST rather
than in the core planner.

We will probably *also* want to pass these details explicitly to the
index AM, but that doesn't solve the problem that some catalog somewhere
has to say what it is that the opclass can do.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:53:37
Message-ID: 10705.1290531217@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> We will probably *also* want to pass these details explicitly to the
> index AM, but that doesn't solve the problem that some catalog somewhere
> has to say what it is that the opclass can do.

... although having said that, the obvious question is why that catalog
has to be pg_amop. Maybe we should leave pg_amop alone (so that it
represents only search operators) and invent a new catalog pg_amorderop.
I envision it having the same columns as pg_amop, plus an ordering
opclass OID (and maybe we might as well stick in asc/desc and nulls_first).
The reason to do this instead of just adding those columns to pg_amop
is that then we can have a different set of unique indexes. I'm
thinking about (amopfamily, amoplefttype, amoprighttype, amopstrategy),
which would be the same as in pg_amop, plus
(amopopr, amopfamily, amopstrategy). This would allow a single operator
to be registered under multiple strategy numbers, which presumably would
carry different payload sort-order-specification columns.

This still seems like overkill to me, because I don't actually believe
that it's practical for an index to support multiple sort orders.
But if anyone would like to make an argument why that's not pie in the
sky, this might be the way to represent it.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-11-23 16:58:51
Message-ID: AANLkTinPQwTJ=YFan8w_Wu7yn8Pzu+fv29WNf-kr6KoX@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 23, 2010 at 11:37 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Nov 23, 2010 at 11:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> It would be the first, because simply assigning another strategy number
>>> only satisfies one of the unique constraints on pg_amop.  To allow
>>> arbitrary flexibility here, we would have to include all components of
>>> the ordering specification in the unique constraint that's presently
>>> just (amopopr, amopfamily) and is proposed to become
>>> (amopopr, amopfamily, amoppurpose).  I think that's an undue amount of
>>> complexity to support something that's most likely physically impossible
>>> from the index's standpoint anyway.
>
>> Or, you'd need to pass these details separately to the AM, which seems
>> like a more more flexible way of doing it.
>
> A more flexible way of doing what?  The first requirement here is that
> the catalog entries provide sufficient information to determine the
> semantics.  You can't just say "this opclass supports ordering", you
> have to specify what that ordering is.  Punting to the index AM helps
> not at all, unless your proposal is to hard-wire this in GIST rather
> than in the core planner.

Eh, let's just do it the way you want to do it. It's probably going
to have to be redone the next time somebody wants to make an
enhancement in this area, but I guess it's going to be easy to do that
then than to figure where to go with it now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-04 05:28:13
Message-ID: 13091.1291440493@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> 2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> [updated patch]

> I realize I'm repeating myself, but... this patch needs
> documentation. It's not optional.

I've applied all of this, and written documentation for all of it,
except for the contrib/btree_gist additions which still need to be
redone for the revised API (and then documented!). My patience ran out
somewhere around there, so I'm marking that part as returned with
feedback.

What we have at this point (pending contrib/btree_gist fixes) is
nearest-neighbor searching capability for point columns. And
trigram-based nearest-neighbor for text strings, if you install
contrib/pg_trgm. That doesn't seem like a lot of return for the
amount of work that went into it. Are there plans to add KNN support
for any other standard types?

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-12-04 18:07:50
Message-ID: Pine.LNX.4.64.1012042104330.12632@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks to all for patience !

We'll sync contrib/btree_gist with current API. Also, we're thinking
about distance for ltree, boxes, circles. I'm not sure about polygons, though.
So, we'll have rather complete set of knn-fied data types.

Oleg
On Sat, 4 Dec 2010, Tom Lane wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> 2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>>> [updated patch]
>
>> I realize I'm repeating myself, but... this patch needs
>> documentation. It's not optional.
>
> I've applied all of this, and written documentation for all of it,
> except for the contrib/btree_gist additions which still need to be
> redone for the revised API (and then documented!). My patience ran out
> somewhere around there, so I'm marking that part as returned with
> feedback.
>
> What we have at this point (pending contrib/btree_gist fixes) is
> nearest-neighbor searching capability for point columns. And
> trigram-based nearest-neighbor for text strings, if you install
> contrib/pg_trgm. That doesn't seem like a lot of return for the
> amount of work that went into it. Are there plans to add KNN support
> for any other standard types?
>
> regards, tom lane
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-12-04 18:25:21
Message-ID: AANLkTin_qT_ShY-=dJwxhFHFoZjm_MwsChGKw4MD5acA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 4, 2010 at 6:07 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> We'll sync contrib/btree_gist with current API. Also, we're thinking
> about distance for ltree, boxes, circles. I'm not sure about polygons,
> though.
> So, we'll have rather complete set of knn-fied data types.

I kind of assumed the natural client for KNN-gist was the tsearch full
text search indexes handling sorting by relevance. For example if I
search for "Postgres DBA" I should find documents where those words
appear adjacent first and documents where the two words appear far
apart in the document sorted further down. Is that not on the list of
operators supported or planned to be supported?

--
greg


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-12-04 21:20:02
Message-ID: m2wrnpnsl9.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> I kind of assumed the natural client for KNN-gist was the tsearch full
> text search indexes handling sorting by relevance. For example if I
> search for "Postgres DBA" I should find documents where those words
> appear adjacent first and documents where the two words appear far
> apart in the document sorted further down. Is that not on the list of
> operators supported or planned to be supported?

From the presentation I've seen, the typical use case is more searching
"PostgreSQL DBA" at 100 km around a known location. Or more typical yet,
Pizza restaurants around a known place :)

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-12-04 21:58:18
Message-ID: 12543.1291499898@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> writes:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> I kind of assumed the natural client for KNN-gist was the tsearch full
>> text search indexes handling sorting by relevance. For example if I
>> search for "Postgres DBA" I should find documents where those words
>> appear adjacent first and documents where the two words appear far
>> apart in the document sorted further down. Is that not on the list of
>> operators supported or planned to be supported?

> From the presentation I've seen, the typical use case is more searching
> "PostgreSQL DBA" at 100 km around a known location. Or more typical yet,
> Pizza restaurants around a known place:)

Right offhand I don't see how KNNGIST could usefully be applied to the
problem Greg is thinking about. A KNNGIST search is only going to be
fast if the target items can be found in a reasonably small part of the
index. Nearest-neighbor in a geometrically organized index qualifies,
but I don't see how Greg's problem matches the structure of a tsearch
index.

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2010-12-06 14:46:37
Message-ID: Pine.LNX.4.64.1012061745570.12632@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 4 Dec 2010, Greg Stark wrote:

> On Sat, Dec 4, 2010 at 6:07 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>> We'll sync contrib/btree_gist with current API. Also, we're thinking
>> about distance for ltree, boxes, circles. I'm not sure about polygons,
>> though.
>> So, we'll have rather complete set of knn-fied data types.
>
> I kind of assumed the natural client for KNN-gist was the tsearch full
> text search indexes handling sorting by relevance. For example if I
> search for "Postgres DBA" I should find documents where those words
> appear adjacent first and documents where the two words appear far
> apart in the document sorted further down. Is that not on the list of
> operators supported or planned to be supported?

We'll start thinking about this once we know how to store coordinate
information in index :)

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-23 01:04:29
Message-ID: 201012230104.oBN14TB07696@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > 2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> >> [updated patch]
>
> > I realize I'm repeating myself, but... this patch needs
> > documentation. It's not optional.
>
> I've applied all of this, and written documentation for all of it,
> except for the contrib/btree_gist additions which still need to be
> redone for the revised API (and then documented!). My patience ran out
> somewhere around there, so I'm marking that part as returned with
> feedback.
>
> What we have at this point (pending contrib/btree_gist fixes) is
> nearest-neighbor searching capability for point columns. And
> trigram-based nearest-neighbor for text strings, if you install
> contrib/pg_trgm. That doesn't seem like a lot of return for the
> amount of work that went into it. Are there plans to add KNN support
> for any other standard types?

I was thinking /contrib/fuzzystrmatch could use it to find the words
that mostly closely match a string.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-23 03:05:33
Message-ID: AANLkTikFwwb_Dp1t9s1r5Hp-nr5HDYgfkeb3wpwp6ULG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 22, 2010 at 8:04 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Tom Lane wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> > 2010/9/13 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
>> >> [updated patch]
>>
>> > I realize I'm repeating myself, but...  this patch needs
>> > documentation.  It's not optional.
>>
>> I've applied all of this, and written documentation for all of it,
>> except for the contrib/btree_gist additions which still need to be
>> redone for the revised API (and then documented!).  My patience ran out
>> somewhere around there, so I'm marking that part as returned with
>> feedback.
>>
>> What we have at this point (pending contrib/btree_gist fixes) is
>> nearest-neighbor searching capability for point columns.  And
>> trigram-based nearest-neighbor for text strings, if you install
>> contrib/pg_trgm.  That doesn't seem like a lot of return for the
>> amount of work that went into it.  Are there plans to add KNN support
>> for any other standard types?
>
> I was thinking /contrib/fuzzystrmatch could use it to find the words
> that mostly closely match a string.

Seems unlikely to be feasible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-26 18:19:07
Message-ID: AANLkTimFmQmbzJ5CTXvE_PwT_zmCuHPoet3gaQq6Pvo8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/12/4 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> What we have at this point (pending contrib/btree_gist fixes) is
> nearest-neighbor searching capability for point columns.  And
> trigram-based nearest-neighbor for text strings, if you install
> contrib/pg_trgm.  That doesn't seem like a lot of return for the
> amount of work that went into it.  Are there plans to add KNN support
> for any other standard types?

Catching up tonight, I wonder I could propose to add ordering
operators in btree, not in gist, for basic types. So far, we couldn't
optimize simple examples like:

regression=# create index ti on t using btree(i);
CREATE INDEX
regression=# explain select * from t order by i - 10 limit 10;
QUERY PLAN
--------------------------------------------------------------------
Limit (cost=405.10..405.12 rows=10 width=24)
-> Sort (cost=405.10..430.10 rows=10000 width=24)
Sort Key: ((i - 10))
-> Seq Scan on t (cost=0.00..189.00 rows=10000 width=24)
(4 rows)

While this looks too stupid at a glance, adding 2 strategies into
btree, "addition" and "subtraction", will help RANGE concept on data
types; we were stacked around how to implement RANGE in both of window
functions' frame and PARTITION because of lack of "addition" and
"subtraction" idea. If we have 2 more strategies in btree, they can be
solved IMHO. I know addition and subtraction is never relevant to
btree indexing but avoiding unnecessary seq scan and supporting RANGE
concept may buy somehow.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-26 19:41:23
Message-ID: 19342.1293392483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> Catching up tonight, I wonder I could propose to add ordering
> operators in btree, not in gist, for basic types.

I thought about that for a bit while working on the knngist patch, but
couldn't really see any useful application. In particular, I don't see
a way to shoehorn + and - in there as ordering operators. They don't
match the structure. The RANGE problem wants to add operators that are
somehow related to a btree's operators, but they're not related in the
way that knngist uses.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-26 20:41:46
Message-ID: AANLkTik1j1n4Xb0n0q5nTnnvRmkLCS2sUdLNrC3quSu7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 26, 2010 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>> Catching up tonight, I wonder I could propose to add ordering
>> operators in btree, not in gist, for basic types.
>
> I thought about that for a bit while working on the knngist patch, but
> couldn't really see any useful application.  In particular, I don't see
> a way to shoehorn + and - in there as ordering operators.  They don't
> match the structure.  The RANGE problem wants to add operators that are
> somehow related to a btree's operators, but they're not related in the
> way that knngist uses.

It's superficially syntactically similar (value op constant) but as
you say it's really a different problem. The KNNGIST case could occur
in combination with this case, too: ORDER BY (point-column +
zero-vector-constant) distance-from point-constant. What's really
going on here is that there's a class of operations which can be
applied to an ORDER-BY column without actually changing the ordering -
adding or subtracting a constant, multiplying by a positive value,
concatenating with the empty string, etc. In theory, we could try to
recognize such cases and avoid a sort, but it seems like a lot of work
in proportion to the likely benefit.

A far more valuable case to optimize would be the one where you have
ORDER BY a, b and can obtain input pre-sorted by a but not by a, b.
It'd be extremely useful to be able to take the data sorted by just a
and sort each group on b.

As far as window functions go, we clearly need some kind of type
interface feature, but I am unclear whether we should sandwhich it
into the btree opclass machinery or whether it might be better to
create a whole separate concept just for this purpose. Range types
might also like to have some of the same information (addition and
subtraction operators for a type, and perhaps also the identity and
unit if those exist).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-27 00:52:59
Message-ID: AANLkTinKOB7836fmTqXpSCdX+QKMJfipe-ydFGc1RU82@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/12/27 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sun, Dec 26, 2010 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>>> Catching up tonight, I wonder I could propose to add ordering
>>> operators in btree, not in gist, for basic types.
>>
>> I thought about that for a bit while working on the knngist patch, but
>> couldn't really see any useful application.  In particular, I don't see
>> a way to shoehorn + and - in there as ordering operators.  They don't
>> match the structure.  The RANGE problem wants to add operators that are
>> somehow related to a btree's operators, but they're not related in the
>> way that knngist uses.
>
> As far as window functions go, we clearly need some kind of type
> interface feature, but I am unclear whether we should sandwhich it
> into the btree opclass machinery or whether it might be better to
> create a whole separate concept just for this purpose.  Range types
> might also like to have some of the same information (addition and
> subtraction operators for a type, and perhaps also the identity and
> unit if those exist).

I believe we should use btree opclass machinery to represent
add/subtract interfaces because in the RANGE case you eventually need
add constant to value and compare it with some boundary by using
btree. Or, btree operator set would be shared in the type interface
machinery. We will want to avoid duplication between them.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-27 01:13:40
Message-ID: 23613.1293412420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> 2010/12/27 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> As far as window functions go, we clearly need some kind of type
>> interface feature, but I am unclear whether we should sandwhich it
>> into the btree opclass machinery or whether it might be better to
>> create a whole separate concept just for this purpose.

> I believe we should use btree opclass machinery to represent
> add/subtract interfaces because in the RANGE case you eventually need
> add constant to value and compare it with some boundary by using
> btree. Or, btree operator set would be shared in the type interface
> machinery. We will want to avoid duplication between them.

The thing is, these operators have no arguable application in the
context of actual btree index searches. We could certainly stick them
into pg_amop anyway, using a third amoppurpose category to distinguish
them from searchable or orderable operators. But I share Robert's
discomfort with this --- it's unarguably an abuse of the original
intention of operator classes.

OTOH, operator classes are what we've got to represent the abstract
semantics of operators, and it's not real clear to me what we'd gain by
inventing an independent catalog structure to do more or less the same
sort of thing. One good reason *not* to do that is it'll represent even
more stuff that has to be cached during backend startup.

[ thinks for a bit... ] One reason for having a different structure
would be if we needed to represent abstract semantics for some operators
that couldn't be associated with a btree opclass. This is clearly not
an issue for what RANGE needs, since anything you can order by will
surely have a btree opclass for that, and in fact we probably need to
tie those operators to specific orderings if a datatype has more than
one sort ordering. But maybe there could be some other situation where
we'd need to describe operator behavior for a datatype independently of
any sort ordering. Can anyone come up with a plausible example?

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-28 08:22:52
Message-ID: 20101228082252.GA3085@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 26, 2010 at 08:13:40PM -0500, Tom Lane wrote:
> [ thinks for a bit... ] One reason for having a different structure
> would be if we needed to represent abstract semantics for some operators
> that couldn't be associated with a btree opclass. This is clearly not
> an issue for what RANGE needs, since anything you can order by will
> surely have a btree opclass for that, and in fact we probably need to
> tie those operators to specific orderings if a datatype has more than
> one sort ordering. But maybe there could be some other situation where
> we'd need to describe operator behavior for a datatype independently of
> any sort ordering. Can anyone come up with a plausible example?

One thing that comes to mind is the operators used for hash indexes,
namely the hash() function. It is closely related to the collation but
isn't actually used for sorting. For every btree class you can make a
hash class with the same equality operator and an appropriate hash
function.

I've had the idea of defining a parent object and deriving the btree
and hash operator classes from that, but it gets messy once you get
into cross-type operators (i.e. operator families).

With respect to the collation of strings I have thought it useful to be
able to define a sortkey() function, which would map the input space to
a 8 byte integer and satisfies the rule:

sortkey(a) < sortkey(b) implies a < b

The idea being that you can use this as an efficient first step to
speed up sorting strings, since adding it to the sort list implicitly
before the actual column doesn't change the result. In the case of
strings the sortkey() could be generated with strxfrm().

A similar idea could be used with other expensive comparison
operations, but I can't think of any at the moment. Actually, perhaps
you could use it with ints/floats/etc as well, since you could skip the
function call overhead. You'd be trading (n log n int compares + n
sortkeys) with (n log n comparisions).

Just some thoughts,

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-28 16:12:40
Message-ID: 3247.1293552760@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Sun, Dec 26, 2010 at 08:13:40PM -0500, Tom Lane wrote:
>> [ thinks for a bit... ] One reason for having a different structure
>> would be if we needed to represent abstract semantics for some operators
>> that couldn't be associated with a btree opclass.

> One thing that comes to mind is the operators used for hash indexes,
> namely the hash() function.

The hash opclasses handle that fine. I cannot conceive of any reason
for shoehorning hash functions into btree opclasses.

> With respect to the collation of strings I have thought it useful to be
> able to define a sortkey() function, which would map the input space to
> a 8 byte integer and satisfies the rule:
> sortkey(a) < sortkey(b) implies a < b

I'm pretty dubious about the workability of that one, but again, there
isn't any obvious reason why we'd need a new catalog structure to
support it. If we did want it, it could be an optional support function
in btree opclasses.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2010-12-28 16:55:19
Message-ID: 4D1A1677.80300@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I've applied all of this, and written documentation for all of it,

Thank you a lot
> except for the contrib/btree_gist additions which still need to be
> redone for the revised API (and then documented!). My patience ran out

Done, btree_gist is reworked for a new API.

I'm very sorry, but I'm rather busy now and will be accessible only after
January, 10.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
builtin_knngist_contrib_btree_gist-0.9.gz application/x-tar 9.8 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2011-02-18 06:07:54
Message-ID: 9909.1298009274@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> I've applied all of this, and written documentation for all of it,
> Thank you a lot
>> except for the contrib/btree_gist additions which still need to be
>> redone for the revised API (and then documented!). My patience ran out

> Done, btree_gist is reworked for a new API.

I did a quick look at this patch. The major problem with it is of
course that it needs to be fixed for the recent extension-related
changes. I transposed the .sql.in changes into additions to
btree_gist--1.0.sql (attached), but haven't really sanity-checked
them beyond checking that the regression tests pass. The same mods
would need to be made in btree_gist--unpackaged--1.0.sql.

However, I feel that this is not ready to apply even given those fixes.
Problems yet to solve:

1. oid_dist() returns oid ... really? Oid is unsigned. I'd be inclined
to argue though that distance between Oids is a meaningless concept, so
you should remove this not just mess with the result type. Anybody who
actually wants to form a distance between Oids should have to cast them
to an arithmetic type first. Let the user figure out how wraparound
cases should be handled.

2. Beyond that, none of the distance routines have given any thought to
avoiding overflow. For instance, dist_int2 had better return something
wider than int2, and so on up. It looks to me like the internal gist
distance functions also suffer overflow risks, in that they tend to form
the difference first (in the source datatype) and only afterwards cast
to float8.

3. I was surprised that there wasn't a distance implementation for
numeric. I suppose that this might be difficult to do without risking
overflow in conversion to float8, though.

4. I didn't much care for changing the result type of gbt_num_consistent
from bool to float8; that's just messy, and I don't see any compensating
advantage. I suggest you leave gbt_num_consistent and its callers
alone, and add a separate gbt_num_distance routine that only handles the
KNNDistance case.

There might be more issues, I haven't read the patch in detail.
But anyway, I'm going to set it to Waiting on Author. I think it
needs at least a day or so's work, and I can't put in that kind of
time on it now.

regards, tom lane

Attachment Content-Type Size
btree_gist.sql.patch text/x-patch 12.9 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2011-02-28 17:01:31
Message-ID: AANLkTikTXE6MZx8OH3i75TUvZGz6yHU0hwsftj9YtC--@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 18, 2011 at 1:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> There might be more issues, I haven't read the patch in detail.
> But anyway, I'm going to set it to Waiting on Author.  I think it
> needs at least a day or so's work, and I can't put in that kind of
> time on it now.

Since no one has stepped up to fix these issues, I have marked this
patch Returned with Feedback.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2011-02-28 18:53:26
Message-ID: 4D6BEF26.2040802@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Since no one has stepped up to fix these issues, I have marked this
> patch Returned with Feedback.

This is just contrib/btree_GIST, yes?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2011-02-28 18:59:02
Message-ID: AANLkTikN8COR-3WuiVohse125wXiRya6BWCkV6snQwM+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 28, 2011 at 1:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> Since no one has stepped up to fix these issues, I have marked this
>> patch Returned with Feedback.
>
> This is just contrib/btree_GIST, yes?

Yes, core KNN was committed by Tom during the November CommitFest.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2011-02-28 19:25:15
Message-ID: 28087.1298921115@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Feb 28, 2011 at 1:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Since no one has stepped up to fix these issues, I have marked this
>>> patch Returned with Feedback.

>> This is just contrib/btree_GIST, yes?

> Yes, core KNN was committed by Tom during the November CommitFest.

Right. However, it's disappointing that this isn't in, because the
number of use cases for KNN-gist in core isn't very large. We really
need support for KNN in btree_gist to make it useful.

Given that it is a contrib module, I personally wouldn't object to it
getting patched later, like during alpha or beta. But somebody's got
to do the work, and I've got a dozen higher-priority problems right now.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2011-02-28 19:27:22
Message-ID: AANLkTin9iiQ2CPYo3PkSuGN8B7WL_kAPAJ0GUWnEOME_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 28, 2011 at 2:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Feb 28, 2011 at 1:53 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>>> Since no one has stepped up to fix these issues, I have marked this
>>>> patch Returned with Feedback.
>
>>> This is just contrib/btree_GIST, yes?
>
>> Yes, core KNN was committed by Tom during the November CommitFest.
>
> Right.  However, it's disappointing that this isn't in, because the
> number of use cases for KNN-gist in core isn't very large.  We really
> need support for KNN in btree_gist to make it useful.
>
> Given that it is a contrib module, I personally wouldn't object to it
> getting patched later, like during alpha or beta.  But somebody's got
> to do the work, and I've got a dozen higher-priority problems right now.

Well, we can argue about whether it's too late for 9.1 if and when a
patch shows up. Right now we don't have that problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2011-03-01 17:58:36
Message-ID: 4D6D33CC.9060802@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I did a quick look at this patch. The major problem with it is of
> course that it needs to be fixed for the recent extension-related
> changes. I transposed the .sql.in changes into additions to
> btree_gist--1.0.sql (attached), but haven't really sanity-checked
> them beyond checking that the regression tests pass. The same mods
> would need to be made in btree_gist--unpackaged--1.0.sql.

Fixed

> 1. oid_dist() returns oid ... really? Oid is unsigned. I'd be inclined
> to argue though that distance between Oids is a meaningless concept, so
Hmm, oid is often used as unsigned int.

> you should remove this not just mess with the result type. Anybody who
> actually wants to form a distance between Oids should have to cast them
> to an arithmetic type first. Let the user figure out how wraparound
> cases should be handled.

Distance between unsigned 32-bit integers could not be more than 2^32.

>
> 2. Beyond that, none of the distance routines have given any thought to
> avoiding overflow. For instance, dist_int2 had better return something
> wider than int2, and so on up. It looks to me like the internal gist
Just like other operations:
# select 32000::smallint + 32000::smallint;
ERROR: smallint out of range

> distance functions also suffer overflow risks, in that they tend to form
> the difference first (in the source datatype) and only afterwards cast
> to float8.
fixed

> 3. I was surprised that there wasn't a distance implementation for
> numeric. I suppose that this might be difficult to do without risking
> overflow in conversion to float8, though.
Exactly

> 4. I didn't much care for changing the result type of gbt_num_consistent
> from bool to float8; that's just messy, and I don't see any compensating
> advantage. I suggest you leave gbt_num_consistent and its callers
> alone, and add a separate gbt_num_distance routine that only handles the
> KNNDistance case.
Done

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
builtin_knngist_contrib_btree_gist-0.12.gz application/x-tar 10.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: knngist - 0.8
Date: 2011-03-01 18:35:20
Message-ID: 23414.1299004520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Feb 28, 2011 at 2:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Given that it is a contrib module, I personally wouldn't object to it
>> getting patched later, like during alpha or beta. But somebody's got
>> to do the work, and I've got a dozen higher-priority problems right now.

> Well, we can argue about whether it's too late for 9.1 if and when a
> patch shows up. Right now we don't have that problem.

We do now ...
http://archives.postgresql.org/pgsql-hackers/2011-03/msg00038.php

Since we appear to be still holding the commitfest open for Sync Rep,
I guess this ought to get reviewed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: knngist - 0.8
Date: 2011-03-02 19:47:21
Message-ID: 12550.1299095241@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> [ builtin_knngist_contrib_btree_gist-0.12 patch ]

Applied with some corrections --- mostly, that the upgrade script was
all wet. I added some documentation too.

regards, tom lane


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: knngist - 0.8
Date: 2011-03-03 10:55:20
Message-ID: Pine.LNX.4.64.1103031354080.14517@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks, Tom !

Oleg
On Wed, 2 Mar 2011, Tom Lane wrote:

> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> [ builtin_knngist_contrib_btree_gist-0.12 patch ]
>
> Applied with some corrections --- mostly, that the upgrade script was
> all wet. I added some documentation too.
>
> regards, tom lane
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83