Re: performance question (something to do w/

From: Kenneth Marshall <ktm(at)it(dot)is(dot)rice(dot)edu>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance question (something to do w/
Date: 2006-05-08 18:01:36
Message-ID: 20060508180136.GK1862@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Mon, May 08, 2006 at 12:50:13PM -0500, Jim C. Nasby wrote:
> On Mon, May 08, 2006 at 10:42:21AM -0700, Mark Lewis wrote:
> > Doing a SELECT with a large list of variables inside an IN runs slowly
> > on every database we've tested. We've tested mostly in Oracle and
> > PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> > to process the query at all after it gets too many bind parameters).
> >
> > In our case, we have a (potentially quite large) set of external values
> > that we want to look up in the database. We originally thought that
> > doing a single select with a large IN clause was the way to go, but then
> > we did some performance analysis on the optimal batch size (number of
> > items to include per IN clause), and discovered that for most databases,
> > the optimal batch size was 1. For PostgreSQL I think it was 2.
> >
> > The moral of the story is that you're probably better off running a
> > bunch of small selects than in trying to optimize things with one
> > gargantuan select.
>
> Ever experiment with loading the parameters into a temp table and
> joining to that?
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>
> ---------------------------(end of broadcast)---------------------------

The DB use by the DSPAM software is very similar to your use case. The
fastest queries are made using the PostgreSQL generate_series functionality
to unwind the "IN *" to multiple single selects. Here is the lookup function
that they use:

create function lookup_tokens(integer,bigint[])
returns setof dspam_token_data
language plpgsql stable
as '
declare
v_rec record;
begin
for v_rec in select * from dspam_token_data
where uid=$1
and token in (select $2[i]
from generate_series(array_lower($2,1),
array_upper($2,1)) s(i))
loop
return next v_rec;
end loop;
return;
end;';

You should be able to try something similar for your workload.

Ken Marshall

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Rod Taylor 2006-05-08 18:03:17 Re: Number of dimensions of an array parameter
Previous Message Tom Lane 2006-05-08 17:59:56 Re: performance question (something to do w/

Browse pgsql-performance by date

  From Date Subject
Next Message Andrus 2006-05-08 18:10:07 Re: Query runs 38 seconds for small database!
Previous Message Tom Lane 2006-05-08 17:59:56 Re: performance question (something to do w/