From: | Jeffrey Tenny <jeffrey(dot)tenny(at)comcast(dot)net> |
---|---|
To: | Mark Lewis <mark(dot)lewis(at)mir3(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: performance question (something to do w/ parameterized |
Date: | 2006-05-08 19:51:26 |
Message-ID: | 445FA13E.2090304@comcast.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-performance |
Mark Lewis wrote:
> Doing a SELECT with a large list of variables inside an IN runs slowly
> on every database we've tested. We've tested mostly in Oracle and
> PostgreSQL, and both get very slow very quickly (actually Oracle refuses
> to process the query at all after it gets too many bind parameters).
>
> In our case, we have a (potentially quite large) set of external values
> that we want to look up in the database. We originally thought that
> doing a single select with a large IN clause was the way to go, but then
> we did some performance analysis on the optimal batch size (number of
> items to include per IN clause), and discovered that for most databases,
> the optimal batch size was 1. For PostgreSQL I think it was 2.
So that is for parameterized queries (the batch size?).
In my case, I was concerned about latency between app and database
server, so I try to minimize the number of queries I send to the
database server. (My app servers can be anywhere, they /should/ be
close to the database server, but there are no guarantees and I can't
control it).
The last time I tested for optimal batch size using non-parameterized
queries with same-host database and app, I got a batch size of
approximately 700 IN list elements (again, not variables in that test).
That was on postgres 7.X.Y.
Guess I'll have to try a test where I turn the parameterized statements
into regular statements.
I'm pretty sure it would be a bad idea for me to send one IN list
element at a time in all cases. Even if the query query prep was fast,
the network latency could kill my app.
>
> The moral of the story is that you're probably better off running a
> bunch of small selects than in trying to optimize things with one
> gargantuan select.
The algorithm currently tries to ensure that IN-lists of not more than
700 elements are sent to the database server, and breaks them into
multiple queries. If it has to break it into at least 3 queries, it
uses parameterized statements for the first 2+ and then a
non-parameterized statement for the last one (which may have a different
number of IN list elements than the prior batches).
From | Date | Subject | |
---|---|---|---|
Next Message | Thomas Hallgren | 2006-05-08 19:52:22 | Re: Number of dimensions of an array parameter |
Previous Message | Jeffrey Tenny | 2006-05-08 19:43:39 | Re: performance question (something to do w/ parameterized |
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2006-05-08 20:02:59 | Re: Query runs 38 seconds for small database! |
Previous Message | Tom Lane | 2006-05-08 19:48:38 | Re: Query runs 38 seconds for small database! |