Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."

From: Richard Huxton <dev(at)archonet(dot)com>
To: Mike Winter <mike(dot)winter(at)frontlogic(dot)com>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Date: 2003-05-16 17:29:09
Message-ID: 200305161829.09631.dev@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Thursday 15 May 2003 10:44 pm, Mike Winter wrote:
> Hi, when doing queries of the type:
>
> SELECT id FROM foo WHERE id IN (1, 4, 3, 2, 10, 11, 14) .., I get
> terrible performance on tables of any resonable size. I see the
> same behaviour when doing queries of the form "SELECT id FROM
> foo WHERE id = 5 OR id = 6 OR ..."
[snip]
> It looks to me like the query parser is recursively calling
> an index scan for each row in the 'IN' clause rather than just
> doing one index scan that it seems it should be.

Hmm - not sure how you could. When it says index-scan it's actually traversing
a btree (probably), not scanning a list of indexes. The IN is basically
treated like a series of a OR b OR c, hence the similar behaviour.

> My question is, does anyone have any alternate ideas for how I
> can do a query like this and have it perform well? The tables I
> am working with are big enough that a sequential scan is not
> helpful. Is this a bug I am encountering or an error in my
> query? Is this a known issue?

Known issue - the usual advice is to rewrite in the form of EXISTS, but I
can't think how to do that if you have a long list of literal values. You
could create a temp table to hold your matching values and join against it,
but I realise that's not a terribly elegant solution. Unless of course, it's
a search-engine type of situation where it makes a certain amount of sense.

> I have seen this beahaviour on 7.2.1 and 7.3.2 on both Solaris
> and Linux platforms.

Supposed to be some improvements in the forthcoming 7.4 but I don't know if
that will help your particular case.
--
Richard Huxton

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Randall Lucas 2003-05-16 17:34:37 Re: Efficiency Issue with "SELECT IN (..)" and "SELECT .. OR .."
Previous Message Richard Huxton 2003-05-16 17:22:37 Re: Chain/Thread Problem