Wildly erratic query performance

Lists: pgsql-general
From: Eric Schwarzenbach <subscriber(at)blackbrook(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Wildly erratic query performance
Date: 2008-10-31 19:34:48
Message-ID: 490B5DD8.3060905@blackbrook.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I've got a particular query that is giving me ridiculously erratic query
performance. I have the SQL in a pgadmin query window, and from one
execution to another, with no changes, the time it takes varies from
half a second to, well, at least 10 minutes or so at which point I give
up an cancel the query. A typical time is 2-3 seconds, but it's all over
the map. I've seen numbers like 112 seconds for one which returns
without exceeding my patience. In every half a dozen or so execution
there will be one time which is an order of magnitude bigger than the
others. A typical series of executions might be something like 2
seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.

Note that the database is running on my local machine, the same machine
I'm running the queries from, and nothing else is using this postgresql
installation. The data in the database is also not changing--there are
no inserts or updates happening between queries. I ran a vaccuum (full,
analyze) just before I trying these queries. I do monitor my CPU usage
and there is definitely not some other process on my machine sucking up
all the cpu cycles now and then to explain this.

This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
(without an explicit JOIN, just using the WHERE criteria) with a few
further predicates. One thing which distinguishes it from other similar
queries I've been doing where I haven't seen this odd erraticness is
there are 2 predicates ORred together (and then ANDed with all the other
conditions which are all ANDed) which effectively divides 2 subsets of
joined tables which are not joined to each other, but both joined to
another set of tables. (I don't know if that was a comprehensible way of
explaining this...but I don't know if it's relevant enough to be worth
explaining in more detail).

I've tried running explain, however the wild erraticness seems to go
away when I use explain, taking in the ballpark of 1.5 seconds every
time. This is faster than my average query time using a plain execute,
even if I don't discount all the unusually long times.

Is there any reasonable explanation for this phenomena?

I do realize I could help the query planner with explicit JOINs, however
I have not yet embarked on this optimization, and might not bother if
the query performance is acceptable without doing so. I don't expect the
execution plan to be optimal, however I do expect it to be deterministic.

Thanks,
Eric


From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Eric Schwarzenbach" <subscriber(at)blackbrook(dot)org>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 20:09:23
Message-ID: D425483C2C5C9F49B5B7A41F89441547010011F6@postal.corporate.connx.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Eric Schwarzenbach
> Sent: Friday, October 31, 2008 12:35 PM
> To: pgsql-general(at)postgresql(dot)org
> Subject: [GENERAL] Wildly erratic query performance
>
> I've got a particular query that is giving me ridiculously erratic
> query
> performance. I have the SQL in a pgadmin query window, and from one
> execution to another, with no changes, the time it takes varies from
> half a second to, well, at least 10 minutes or so at which point I
give
> up an cancel the query. A typical time is 2-3 seconds, but it's all
> over
> the map. I've seen numbers like 112 seconds for one which returns
> without exceeding my patience. In every half a dozen or so execution
> there will be one time which is an order of magnitude bigger than the
> others. A typical series of executions might be something like 2
> seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.
>
> Note that the database is running on my local machine, the same
machine
> I'm running the queries from, and nothing else is using this
postgresql
> installation. The data in the database is also not changing--there are
> no inserts or updates happening between queries. I ran a vaccuum
(full,
> analyze) just before I trying these queries. I do monitor my CPU usage
> and there is definitely not some other process on my machine sucking
up
> all the cpu cycles now and then to explain this.
>
> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
> (without an explicit JOIN, just using the WHERE criteria) with a few
> further predicates. One thing which distinguishes it from other
similar
> queries I've been doing where I haven't seen this odd erraticness is
> there are 2 predicates ORred together (and then ANDed with all the
> other
> conditions which are all ANDed) which effectively divides 2 subsets of
> joined tables which are not joined to each other, but both joined to
> another set of tables. (I don't know if that was a comprehensible way
> of
> explaining this...but I don't know if it's relevant enough to be worth
> explaining in more detail).
>
> I've tried running explain, however the wild erraticness seems to go
> away when I use explain, taking in the ballpark of 1.5 seconds every
> time. This is faster than my average query time using a plain execute,
> even if I don't discount all the unusually long times.
>
> Is there any reasonable explanation for this phenomena?
>
> I do realize I could help the query planner with explicit JOINs,
> however
> I have not yet embarked on this optimization, and might not bother if
> the query performance is acceptable without doing so. I don't expect
> the
> execution plan to be optimal, however I do expect it to be
> deterministic.

Something is missing from your descriptions.
An explain analyze on the query and a list of the schema for the
relevant tables would be helpful.
Are the queries identical? Just changing the where clause a bit can
cause big differences in query speed.

Consider:
SELECT age, status, phone FROM work_force WHERE state IN ('NY', 'CA',
'TX');
May run more slowly than:
SELECT age, status, phone FROM work_force WHERE state IN ('ID', 'MT',
'NV');
Because the first three states have large populations and the last three
states have smaller populations.

Does the database machine run solely as a database server or are there
other things going on? E.g. If you are doing a compile and link of
10,000 source files during one query and the machine is otherwise idle
during a different one, we will expect different results.

There will be (of course) a logical explanation for the query time
differences.

I suggest the following:
1. Do an explain analyze on a query that is slow
2. Do an explain analyze on a query that is fast (unless the queries
are literally identical in every way)
I guess that (along with the schema) will be enough to get an idea what
is happening.


From: Eric Schwarzenbach <subscriber(at)blackbrook(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 20:36:02
Message-ID: 490B6C32.3020208@blackbrook.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Dann,

Thanks for your response. I thought I'd covered most of what your are
asking in my first message, but these results are weird enough that I
can understand you might not give me the benefit of the doubt and
without very explicit confirmation. To answer your questions:

YES the query each time is IDENTICAL. I am not changing a single
character, I am simply clicking the execute button in pgadmin for each run.

This is my personal laptop, so of course there are other processes
running, like say, a web browser, an email client, etc. And of course,
as on any such machine there may be minor processes that execute in the
background without my awareness. But I am not actively doing anything
else (like running a large compile, yeesh!) while the query is
executing. This is why I noted that I was monitoring the CPU usage (and
processes), so that I can be confident that something major is not
suddenly running in the background without my initiating it directly.
And to make the difference between 2 seconds and a minute, let alone 10
minutes, would take a pretty major and hard not to notice process.

As I explained already (no pun intended) running the query using EXPLAIN
makes the wild variation go away. So I cannot get explain results for a
fast and for a slow execution.

I did not include schema information and such because I am not clear I
am allowed to make them public, and because I'm not looking for a highly
specific answer, merely are there ANY conditions where the SAME EXACT
QUERY should perform so radically differently. If the query planner, for
example, used a random number generator to choose the order in which it
performed my joins, such that the join order would be different each
time, this would explain it--that possibility would seem bizarre to me,
but it would certainly answer my question.

Eric

Dann Corbit wrote:
>> -----Original Message-----
>> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
>> owner(at)postgresql(dot)org] On Behalf Of Eric Schwarzenbach
>> Sent: Friday, October 31, 2008 12:35 PM
>> To: pgsql-general(at)postgresql(dot)org
>> Subject: [GENERAL] Wildly erratic query performance
>>
>> I've got a particular query that is giving me ridiculously erratic
>> query
>> performance. I have the SQL in a pgadmin query window, and from one
>> execution to another, with no changes, the time it takes varies from
>> half a second to, well, at least 10 minutes or so at which point I
>>
> give
>
>> up an cancel the query. A typical time is 2-3 seconds, but it's all
>> over
>> the map. I've seen numbers like 112 seconds for one which returns
>> without exceeding my patience. In every half a dozen or so execution
>> there will be one time which is an order of magnitude bigger than the
>> others. A typical series of executions might be something like 2
>> seconds, 5 seconds, 8 seconds, 0.5 seconds, 80 seconds, 5 seconds.
>>
>> Note that the database is running on my local machine, the same
>>
> machine
>
>> I'm running the queries from, and nothing else is using this
>>
> postgresql
>
>> installation. The data in the database is also not changing--there are
>> no inserts or updates happening between queries. I ran a vaccuum
>>
> (full,
>
>> analyze) just before I trying these queries. I do monitor my CPU usage
>> and there is definitely not some other process on my machine sucking
>>
> up
>
>> all the cpu cycles now and then to explain this.
>>
>> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
>> (without an explicit JOIN, just using the WHERE criteria) with a few
>> further predicates. One thing which distinguishes it from other
>>
> similar
>
>> queries I've been doing where I haven't seen this odd erraticness is
>> there are 2 predicates ORred together (and then ANDed with all the
>> other
>> conditions which are all ANDed) which effectively divides 2 subsets of
>> joined tables which are not joined to each other, but both joined to
>> another set of tables. (I don't know if that was a comprehensible way
>> of
>> explaining this...but I don't know if it's relevant enough to be worth
>> explaining in more detail).
>>
>> I've tried running explain, however the wild erraticness seems to go
>> away when I use explain, taking in the ballpark of 1.5 seconds every
>> time. This is faster than my average query time using a plain execute,
>> even if I don't discount all the unusually long times.
>>
>> Is there any reasonable explanation for this phenomena?
>>
>> I do realize I could help the query planner with explicit JOINs,
>> however
>> I have not yet embarked on this optimization, and might not bother if
>> the query performance is acceptable without doing so. I don't expect
>> the
>> execution plan to be optimal, however I do expect it to be
>> deterministic.
>>
>
> Something is missing from your descriptions.
> An explain analyze on the query and a list of the schema for the
> relevant tables would be helpful.
> Are the queries identical? Just changing the where clause a bit can
> cause big differences in query speed.
>
> Consider:
> SELECT age, status, phone FROM work_force WHERE state IN ('NY', 'CA',
> 'TX');
> May run more slowly than:
> SELECT age, status, phone FROM work_force WHERE state IN ('ID', 'MT',
> 'NV');
> Because the first three states have large populations and the last three
> states have smaller populations.
>
> Does the database machine run solely as a database server or are there
> other things going on? E.g. If you are doing a compile and link of
> 10,000 source files during one query and the machine is otherwise idle
> during a different one, we will expect different results.
>
> There will be (of course) a logical explanation for the query time
> differences.
>
> I suggest the following:
> 1. Do an explain analyze on a query that is slow
> 2. Do an explain analyze on a query that is fast (unless the queries
> are literally identical in every way)
> I guess that (along with the schema) will be enough to get an idea what
> is happening.
>
>


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Eric Schwarzenbach" <subscriber(at)blackbrook(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 21:09:24
Message-ID: dcc563d10810311409x125cd922hce13afee6956d3da@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Oct 31, 2008 at 1:34 PM, Eric Schwarzenbach
<subscriber(at)blackbrook(dot)org> wrote:
> I've got a particular query that is giving me ridiculously erratic query
> performance. I have the SQL in a pgadmin query window, and from one
> execution to another, with no changes, the time it takes varies from

SNIP

> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
> (without an explicit JOIN, just using the WHERE criteria) with a few

OK, whether you use join syntax or where clause syntax, postgresql can
attempt to use the GEQO method to determine a close fit for the query
plan. You can change these settings:

#geqo = on
#geqo_threshold = 12
#geqo_effort = 5 # range 1-10

To control the GEQO. Just crank the threshold to 20 or something so
it doesn't kick in for now and see how long your queries take. The
planning time will go up, because pgsql will do exhaustive logic to
determine the best plan, but it should consistently pick a good plan.

and look at these too:

#from_collapse_limit = 8
#join_collapse_limit = 8


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
Cc: "Eric Schwarzenbach" <subscriber(at)blackbrook(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 21:27:03
Message-ID: 27848.1225488423@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com> writes:
> On Fri, Oct 31, 2008 at 1:34 PM, Eric Schwarzenbach
> <subscriber(at)blackbrook(dot)org> wrote:
>> This is postgreslq 8.3, on Windows XP. The query joins about 17 tables
>> (without an explicit JOIN, just using the WHERE criteria) with a few

> OK, whether you use join syntax or where clause syntax, postgresql can
> attempt to use the GEQO method to determine a close fit for the query
> plan. You can change these settings:

> #geqo = on
> #geqo_threshold = 12
> #geqo_effort = 5 # range 1-10

> To control the GEQO. Just crank the threshold to 20 or something so
> it doesn't kick in for now and see how long your queries take. The
> planning time will go up, because pgsql will do exhaustive logic to
> determine the best plan, but it should consistently pick a good plan.

The exhaustive search's time can be exponential in the number of tables
to be joined, so the above advice might or might not be workable. If
you find that planning takes too long when you disable geqo or bump up
the threshold, an alternative possibility is to kick up the geqo_effort
parameter to make it more likely that the randomized search will find a
decent plan.

> and look at these too:
> #from_collapse_limit = 8
> #join_collapse_limit = 8

If the query is given in the form of a "flat" FROM-list of 17 tables,
neither of those knobs will affect anything.

regards, tom lane


From: Dennis Brakhane <brakhane(at)googlemail(dot)com>
To: Eric Schwarzenbach <subscriber(at)blackbrook(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Wildly erratic query performance
Date: 2008-10-31 22:16:35
Message-ID: 20081031221635.GA4413@thinky
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Oct 31, 2008 at 04:36:02PM -0400, Eric Schwarzenbach wrote:
> As I explained already (no pun intended) running the query using EXPLAIN
> makes the wild variation go away. So I cannot get explain results for a
> fast and for a slow execution.

EXPLAIN only determines and outputs the query plan. It does not actually
run the query.

You probably want to use EXPLAIN ANALYZE