Re: performance question (something to do w/ parameterized

From: Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net>
To: Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: performance question (something to do w/ parameterized
Date: 2006-05-08 19:51:26
Message-ID: 445FA13E.2090304@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Mark Lewis wrote:
> Doing a SELECT with a large list of variables inside an IN runs slowly
> on every database we've tested. We've tested mostly in Oracle and
> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> to process the query at all after it gets too many bind parameters).
>
> In our case, we have a (potentially quite large) set of external values
> that we want to look up in the database. We originally thought that
> doing a single select with a large IN clause was the way to go, but then
> we did some performance analysis on the optimal batch size (number of
> items to include per IN clause), and discovered that for most databases,
> the optimal batch size was 1. For PostgreSQL I think it was 2.

So that is for parameterized queries (the batch size?).

In my case, I was concerned about latency between app and database
server, so I try to minimize the number of queries I send to the
database server. (My app servers can be anywhere, they /should/ be
close to the database server, but there are no guarantees and I can't
control it).

The last time I tested for optimal batch size using non-parameterized
queries with same-host database and app, I got a batch size of
approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.

Guess I'll have to try a test where I turn the parameterized statements
into regular statements.

I'm pretty sure it would be a bad idea for me to send one IN list
element at a time in all cases. Even if the query query prep was fast,
the network latency could kill my app.

>
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select.

The algorithm currently tries to ensure that IN-lists of not more than
700 elements are sent to the database server, and breaks them into
multiple queries. If it has to break it into at least 3 queries, it
uses parameterized statements for the first 2+ and then a
non-parameterized statement for the last one (which may have a different
number of IN list elements than the prior batches).

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Hallgren 2006-05-08 19:52:22 Re: Number of dimensions of an array parameter
Previous Message Jeffrey Tenny 2006-05-08 19:43:39 Re: performance question (something to do w/ parameterized

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2006-05-08 20:02:59 Re: Query runs 38 seconds for small database!
Previous Message Tom Lane 2006-05-08 19:48:38 Re: Query runs 38 seconds for small database!