Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

Lists: pgsql-sql
From: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Date: 2003-05-15 21:44:29
Message-ID: Pine.LNX.4.33L2.0305151524590.6984-100000@frontlogic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Hi, when doing queries of the type:

SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
terrible performance on tables of any resonable size. I see the
same behaviour when doing queries of the form "SELECT id FROM
foo WHERE id = 5 OR id = 6 OR ..."

When doing an "EXPLAIN" on the query, I get output like the
following:

Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx,
foo_idx on foo (cost=0.00..18.16 rows=6 width=4)

If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
database will actually throw an error saying "Recursive Depth
Exceeded" or something similar and not complete the query.

It looks to me like the query parser is recursively calling
an index scan for each row in the 'IN' clause rather than just
doing one index scan that it seems it should be.

My question is, does anyone have any alternate ideas for how I
can do a query like this and have it perform well? The tables I
am working with are big enough that a sequential scan is not
helpful. Is this a bug I am encountering or an error in my
query? Is this a known issue?

I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
and Linux platforms.

Thanks for any input.


From: Richard Huxton <dev(at)archonet(dot)com>
To: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Date: 2003-05-16 17:29:09
Message-ID: 200305161829.09631.dev@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Thursday 15 May 2003 10:44 pm, Mike Winter wrote:
> Hi, when doing queries of the type:
>
> SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
> terrible performance on tables of any resonable size. I see the
> same behaviour when doing queries of the form "SELECT id FROM
> foo WHERE id = 5 OR id = 6 OR ..."
[snip]
> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.

Hmm - not sure how you could. When it says index-scan it's actually traversing
a btree (probably), not scanning a list of indexes. The IN is basically
treated like a series of a OR b OR c, hence the similar behaviour.

> My question is, does anyone have any alternate ideas for how I
> can do a query like this and have it perform well? The tables I
> am working with are big enough that a sequential scan is not
> helpful. Is this a bug I am encountering or an error in my
> query? Is this a known issue?

Known issue - the usual advice is to rewrite in the form of EXISTS, but I
can't think how to do that if you have a long list of literal values. You
could create a temp table to hold your matching values and join against it,
but I realise that's not a terribly elegant solution. Unless of course, it's
a search-engine type of situation where it makes a certain amount of sense.

> I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> and Linux platforms.

Supposed to be some improvements in the forthcoming 7.4 but I don't know if
that will help your particular case.
--
Richard Huxton


From: Randall Lucas <rlucas(at)tercent(dot)net>
To: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Date: 2003-05-16 17:34:37
Message-ID: AA65DD22-87C4-11D7-8D9A-000A957653D6@tercent.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql


Hi Mike,

This is a well-known issue and to my knowledge has been addressed in
the 7.4 branch.

The recommended solution is to rephrase your query using EXISTS and
eliminating the IN (hint: may require adding a join to the query);
search pgsql-sql or pgsql-performance for details on others (this
question is posted approximately weekly.

Best,

Randall

On Thursday, May 15, 2003, at 05:44 PM, Mike Winter wrote:

> Hi, when doing queries of the type:
>
> SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
> terrible performance on tables of any resonable size. I see the
> same behaviour when doing queries of the form "SELECT id FROM
> foo WHERE id = 5 OR id = 6 OR ..."
>
> When doing an "EXPLAIN" on the query, I get output like the
> following:
>
> Index Scan using foo_idx, foo_idx, foo_idx, foo_idx, foo_idx,
> foo_idx on foo (cost=0.00..18.16 rows=6 width=4)
>
> If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
> database will actually throw an error saying "Recursive Depth
> Exceeded" or something similar and not complete the query.
>
> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.
>
> My question is, does anyone have any alternate ideas for how I
> can do a query like this and have it perform well? The tables I
> am working with are big enough that a sequential scan is not
> helpful. Is this a bug I am encountering or an error in my
> query? Is this a known issue?
>
> I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> and Linux platforms.
>
> Thanks for any input.
>
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Date: 2003-05-16 19:25:22
Message-ID: 29030.1053113122@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

Mike Winter <mike(dot)winter(at)frontlogic(dot)com> writes:
> If the "IN (1, 2, 3, 6, ..., n)" clause is big enough, the
> database will actually throw an error saying "Recursive Depth
> Exceeded" or something similar and not complete the query.

Would it perhaps be saying "out of free buffers: time to abort!" ?
If so, you're probably running into this bug, which was introduced
(by me :-() in 7.3:
http://archives.postgresql.org/pgsql-hackers/2003-03/msg00939.php
There is a fix in place for 7.3.3.

> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.

It is performing one index search per target value, yes, but not
recursively. That's what it's supposed to do.

regards, tom lane


From: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR
Date: 2003-05-16 19:28:31
Message-ID: Pine.LNX.4.33L2.0305161307330.19426-100000@frontlogic.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-sql

On Fri, 16 May 2003, Richard Huxton wrote:

> > My question is, does anyone have any alternate ideas for how I
> > can do a query like this and have it perform well? The tables I
> > am working with are big enough that a sequential scan is not
> > helpful. Is this a bug I am encountering or an error in my
> > query? Is this a known issue?
>
> Known issue - the usual advice is to rewrite in the form of EXISTS, but I
> can't think how to do that if you have a long list of literal values. You
> could create a temp table to hold your matching values and join against it,
> but I realise that's not a terribly elegant solution. Unless of course, it's
> a search-engine type of situation where it makes a certain amount of sense.

Thanks to everyone for the replies.

The temporary table route is something we did. It's an acceptable
stopgap, but it's still not outstanding in terms of performance
and has lead to other issues. I will examine the 'EXISTS' clause
option depending on the progress of 7.4.

> > I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> > and Linux platforms.
>
> Supposed to be some improvements in the forthcoming 7.4 but I don't know if
> that will help your particular case.

That's good news.

--
_______________________________________________________________________
Front Logic Inc. Tel: 306.653.2725 x14
226 Pacific Ave or 1.800.521.4510 x14
Saskatoon, SK Fax: 306.653.0972
S7K 1N9 Canada Cell: 306.717.3746
http://www.frontlogic.com mike(dot)winter(at)frontlogic(dot)com

Find out what TYPENGO(tm) N300 Search Technology can do for your
company: http://www.frontlogic.com/interactive/hosts/typengo/index.html