Re: knngist - 0.8

Lists: pgsql-hackers
From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: teodor(at)sigaev(dot)ru
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-07-29 19:38:56
Message-ID: AANLkTinV_HK3CLC6_98coYkorPC75vsdYO7vheTmv5Yd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I think that queries like this:
select * from test where val <-> 500 < 1 order by val <-> 500;
can also be optimized using knngist. In case of btree_gist this query can be
easily rewritten:
select * from test where val > 499 and val < 501 order by val <-> 500;
But, in pg_trgm it makes it possible to combine different similarity levels
in one query. For example:
select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' < 0.4;
Is there any chance to handle this syntax also?

----
With best regards,
Alexander Korotkov.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-02 16:14:19
Message-ID: AANLkTinrnxCJDOYiUkBd-yU0J-42_5TquQXCAr+KB4Hp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/29 Alexander Korotkov <aekorotkov(at)gmail(dot)com>:
> But, in pg_trgm it makes it possible to combine different similarity levels
> in one query. For example:
> select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' < 0.4;
> Is there any chance to handle this syntax also?

Maybe I'm missing something, but I don't think that ORDER BY clause
makes much sense. OR is going to reduce a true or false value - and
it's usually not that interesting to order by a column that can only
take one of two values.

Am I confused?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-08 20:28:34
Message-ID: AANLkTi=h8thE74d0K_A-g=0TSLbZuNSHicH7hGZWGnD0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In gist consitent method support only filtering strategies. For such
strategies consistent method returns true if subtree can contain matching
node and false otherwise. Knngist introduce also order by strategies. For
filtering strategies knngist consistent method returns 0 if subtree can
contain matching node and -1 otherwise. For order by strategies knngist
consistent method returns minimal possible distance in subtree. I think we
can use consistent method with order by strategies not only for ordering but
also for filtering. If query contain assertion that distance is less than
some value, than we can call consistent method with order by strategy and
compare result with query value in order to determine whether scan subtree.
Such approach can give benefit when we need to filter by similarity. For
example, in pg_trgm "%" is used for similarity filtering, but similarity
threshold is global for session. That's why we can't create complex queries
which contain similarity filtering with different threshold.

----
With best regards,
Alexander Korotkov.

On Mon, Aug 2, 2010 at 8:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> 2010/7/29 Alexander Korotkov <aekorotkov(at)gmail(dot)com>:
> > But, in pg_trgm it makes it possible to combine different similarity
> levels
> > in one query. For example:
> > select * from test_trgm order by t <-> 'asdf' < 0.5 or t <-> 'qwer' <
> 0.4;
> > Is there any chance to handle this syntax also?
>
> Maybe I'm missing something, but I don't think that ORDER BY clause
> makes much sense. OR is going to reduce a true or false value - and
> it's usually not that interesting to order by a column that can only
> take one of two values.
>
> Am I confused?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-09 18:32:09
Message-ID: AANLkTin9QBrxbcjG4Dmbsv0u-xHEfqmQDSCo+PNRh1A0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Aug 8, 2010 at 4:28 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com> wrote:
> In gist consitent method support only filtering strategies. For such
> strategies consistent method returns true if subtree can contain matching
> node and false otherwise. Knngist introduce also order by strategies. For
> filtering strategies knngist consistent method returns 0 if  subtree can
> contain matching node and -1 otherwise. For order by strategies knngist
> consistent method returns minimal possible distance in subtree. I think we
> can use consistent method with order by strategies not only for ordering but
> also for filtering. If query contain assertion that distance is less than
> some value, than we can call consistent method with order by strategy and
> compare result with query value in order to determine whether scan subtree.

I am not an expert on this code, but after thinking this over, I
believe you are correct and that this is a good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-09 21:35:47
Message-ID: 4C6074B3.7020307@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov escreveu:
> Such approach
> can give benefit when we need to filter by similarity. For example, in
> pg_trgm "%" is used for similarity filtering, but similarity threshold
> is global for session. That's why we can't create complex queries which
> contain similarity filtering with different threshold.
>
What do you mean by complex queries? You can always use the SET command. Sadly
it doesn't work when you have different thresholds within distinct subqueries.
(In pg_similarity I use this approach to set the function's thresholds). What
I am investigating is a way to build an index with some user-defined
parameters. (We already have some infra-structure in reloptions for that but
it needs some work to support my idea). I have some half-baked patch that I'm
planning to submit to some of the CFs. Unfortunately, I don't have time for it
ATM.

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-09 22:00:25
Message-ID: AANLkTim_aHaY-To-ApdN1VHm3bef6zHc_K0eYgQROxNe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 10, 2010 at 1:35 AM, Euler Taveira de Oliveira <
euler(at)timbira(dot)com> wrote:

> What do you mean by complex queries? You can always use the SET command.
> Sadly
> it doesn't work when you have different thresholds within distinct
> subqueries.
> (In pg_similarity I use this approach to set the function's thresholds).

I mean exactly different thresholds in distinct subqueries.

What
> I am investigating is a way to build an index with some user-defined
> parameters. (We already have some infra-structure in reloptions for that
> but
> it needs some work to support my idea). I have some half-baked patch that
> I'm
> planning to submit to some of the CFs. Unfortunately, I don't have time for
> it
> ATM.
>
User-defined parameters for GiST would be a great feature. I'm performing
some experiments with GiST and I'm really feeling the need of it.

----
With best regards,
Alexander Korotkov.


From: David Fetter <david(at)fetter(dot)org>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, teodor(at)sigaev(dot)ru, pgsql-hackers(at)postgresql(dot)org
Subject: Re: knngist - 0.8
Date: 2010-08-09 23:46:19
Message-ID: 20100809234619.GA11024@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 09, 2010 at 06:35:47PM -0300, Euler Taveira de Oliveira wrote:
> Alexander Korotkov escreveu:
> > Such approach can give benefit when we need to filter by
> > similarity. For example, in pg_trgm "%" is used for similarity
> > filtering, but similarity threshold is global for session. That's
> > why we can't create complex queries which contain similarity
> > filtering with different threshold.
> >
> What do you mean by complex queries? You can always use the SET
> command. Sadly it doesn't work when you have different thresholds
> within distinct subqueries. (In pg_similarity I use this approach
> to set the function's thresholds). What I am investigating is a way
> to build an index with some user-defined parameters. (We already
> have some infra-structure in reloptions for that but it needs some
> work to support my idea). I have some half-baked patch that I'm
> planning to submit to some of the CFs. Unfortunately, I don't have
> time for it ATM.

Do you have enough of it to send out as WIP?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate