Re: Avoiding bad prepared-statement plans.

Lists: pgsql-hackers
From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Avoiding bad prepared-statement plans.
Date: 2010-02-09 12:08:54
Message-ID: 4B715056.8060103@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been discussing this with Josh, Heikki, and Peter E. over the past
few weeks.

As Peter observed years ago, prepared statements can perform badly
because their plans are overly generic. Also, statistics change and
sometimes plans should change with them. It would be nice if we could
avoid users having to worry about these things.

I have some ideas that I'm willing to work on, if people agree that
they're useful. These are simple changes; the goal is to avoid
pathological performance at minimal cost, not to make prepared
statements faster than parameterized ones for everyone. The ideas
interact in various ways.

= Projected-cost threshold =

If a prepared statement takes parameters, and the generic plan has a
high projected cost, re-plan each EXECUTE individually with all its
parameter values bound. It may or may not help, but unless the planner
is vastly over-pessimistic, re-planning isn't going to dominate
execution time for these cases anyway.

= Actual-cost threshold =

Also stop using the generic plan if the statement takes a long time to
run in practice. Statistics may have gone bad. It could also be a
one-off due to a load peak or something, but that's handled by:

= Plan refresh =

Periodically re-plan prepared statements on EXECUTE. This is also a
chance for queries that were being re-planned every time to go back to a
generic plan.

Does that sound useful? Maybe it's too much engineering for little gain
compared to re-planning every EXECUTE. OTOH it doesn't look that much
harder than going that way either. And maybe there's some extra
planning effort that might be worthwhile for a reusable plan but not for
an individual query.

Jeroen


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 12:59:15
Message-ID: 4B715C23.30101@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeroen Vermeulen wrote:
> I've been discussing this with Josh, Heikki, and Peter E. over the
> past few weeks.
Is this searchable in the archives? I'm interested in ideas discussed.
> If a prepared statement takes parameters, and the generic plan has a
> high projected cost, re-plan each EXECUTE individually with all its
> parameter values bound. It may or may not help, but unless the
> planner is vastly over-pessimistic, re-planning isn't going to
> dominate execution time for these cases anyway.
This sounds like a really nice to have feature. Maybe it'd also be
possible to skip replanning between executes if the current bound values
are 'indexwise-equivalent' to the values used at previous planning, i.e.
nothing in the statistics indicates that execution cost would be (much)
different. Are there more ways to cut down on planning time? Obviously
some plannedstatement/plannerinfo structures could be kept, but maybe
it'd also be possible to plan only that part of the join tree where the
params are used in a scan/join qual.

regards,
Yeb Havinga


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 13:46:55
Message-ID: 4B71674F.6030505@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Yeb Havinga wrote:

>> I've been discussing this with Josh, Heikki, and Peter E. over the
>> past few weeks.
> Is this searchable in the archives? I'm interested in ideas discussed.

No, sorry. These were face-to-face discussions at linux.conf.au and FOSDEM.

>> If a prepared statement takes parameters, and the generic plan has a
>> high projected cost, re-plan each EXECUTE individually with all its
>> parameter values bound. It may or may not help, but unless the
>> planner is vastly over-pessimistic, re-planning isn't going to
>> dominate execution time for these cases anyway.

> This sounds like a really nice to have feature. Maybe it'd also be
> possible to skip replanning between executes if the current bound values
> are 'indexwise-equivalent' to the values used at previous planning, i.e.
> nothing in the statistics indicates that execution cost would be (much)
> different. Are there more ways to cut down on planning time? Obviously
> some plannedstatement/plannerinfo structures could be kept, but maybe
> it'd also be possible to plan only that part of the join tree where the
> params are used in a scan/join qual.

I think we should be careful not to over-think this. Planning isn't
*that* costly, so apply Amdahl's Law liberally. I'm proposing some easy
things we could do without adding much overhead or maintenance burden;
I've been assuming that getting intimate with the planner would risk
those advantages.

Jeroen


From: Richard Huxton <dev(at)archonet(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 13:50:32
Message-ID: 4B716828.2030303@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/02/10 12:08, Jeroen Vermeulen wrote:
>
> = Projected-cost threshold =

[snip - this is the simple bit. Sounds very sensible. ]

> = Actual-cost threshold =
>
> Also stop using the generic plan if the statement takes a long time to
> run in practice.

Do you mean:
1. Rollback the current query and start again
2. Mark the plan as a bad one and plan again next execute

If you can figure out how to do #1 then you could probably do it for all
queries, but I'm guessing it's far from simple to implement.

> = Plan refresh =
>
> Periodically re-plan prepared statements on EXECUTE. This is also a
> chance for queries that were being re-planned every time to go back to a
> generic plan.

Presumably some score based on update stats and vacuum activity etc.

The good side of all these ideas is good indeed. The bad side is plan
instability. Someone somewhere will have a generic plan that turns out
better than the specific plan (due to bad stats or config settings or
just planner limitations). The question is (I guess): How many more
winners will there be than losers?

--
Richard Huxton
Archonet Ltd


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 14:10:22
Message-ID: 201002091510.24203.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 09 February 2010 13:08:54 Jeroen Vermeulen wrote:
> I've been discussing this with Josh, Heikki, and Peter E. over the past
> few weeks.
>
> As Peter observed years ago, prepared statements can perform badly
> because their plans are overly generic. Also, statistics change and
> sometimes plans should change with them. It would be nice if we could
> avoid users having to worry about these things.
>
> I have some ideas that I'm willing to work on, if people agree that
> they're useful. These are simple changes; the goal is to avoid
> pathological performance at minimal cost, not to make prepared
> statements faster than parameterized ones for everyone. The ideas
> interact in various ways.
>
>
> = Projected-cost threshold =
>
> If a prepared statement takes parameters, and the generic plan has a
> high projected cost, re-plan each EXECUTE individually with all its
> parameter values bound. It may or may not help, but unless the planner
> is vastly over-pessimistic, re-planning isn't going to dominate
> execution time for these cases anyway.
>
> = Actual-cost threshold =
>
> Also stop using the generic plan if the statement takes a long time to
> run in practice. Statistics may have gone bad. It could also be a
> one-off due to a load peak or something, but that's handled by:
That is not that easy. It means that you have to use savepoints enclosing each
and every execution of a prepared statement because the query could have
sideeffects. Which wouldnt be terribly efficient...

Andres


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 14:25:19
Message-ID: 4B71704F.5070708@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Richard Huxton wrote:

>> = Actual-cost threshold =
>>
>> Also stop using the generic plan if the statement takes a long time to
>> run in practice.
>
> Do you mean:
> 1. Rollback the current query and start again
> 2. Mark the plan as a bad one and plan again next execute
>
> If you can figure out how to do #1 then you could probably do it for all
> queries, but I'm guessing it's far from simple to implement.

I'm talking about #2. As a matter of fact #1 did come up in one of
those discussions, but how do you know you're not killing the query
juuust before it'd done, and then maybe executing a different plan
that's no better?

>> = Plan refresh =
>>
>> Periodically re-plan prepared statements on EXECUTE. This is also a
>> chance for queries that were being re-planned every time to go back to a
>> generic plan.
>
> Presumably some score based on update stats and vacuum activity etc.

I was thinking of something very simple: re-do whatever we'd do if the
statement were only being prepared at that point.

> The good side of all these ideas is good indeed. The bad side is plan
> instability. Someone somewhere will have a generic plan that turns out
> better than the specific plan (due to bad stats or config settings or
> just planner limitations). The question is (I guess): How many more
> winners will there be than losers?

That's a good and surprising point, and therefore I'd like to draw
attention away to a different point. :-)

Yes, there will be losers in the sense that people may have optimized
their use of prepared statements to whatever the current planner does.
Maybe somebody out there even deliberately uses them to trick the
planner into a different plan. But that is always going to happen;
we're aiming for better plans, not for giving more detailed control over
them. If you really can't take a change, don't upgrade.

The competing point is: people out there may currently be forgoing
prepared statements entirely because of erratic performance. To those
people, if we can help them, it's like having a new feature.

Jeroen


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 14:28:01
Message-ID: 4B7170F1.5020106@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:

>> = Actual-cost threshold =
>>
>> Also stop using the generic plan if the statement takes a long time to
>> run in practice. Statistics may have gone bad. It could also be a
>> one-off due to a load peak or something, but that's handled by:

> That is not that easy. It means that you have to use savepoints enclosing each
> and every execution of a prepared statement because the query could have
> sideeffects. Which wouldnt be terribly efficient...

This is not within an execution of the statement, but across executions.
So the next execution learns from the previous result. So I'm not
talking about aborting the ongoing execution. Sorry for being unclear.

Jeroen


From: Richard Huxton <dev(at)archonet(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 14:45:57
Message-ID: 4B717525.2010907@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/02/10 14:25, Jeroen Vermeulen wrote:
> Richard Huxton wrote:
>
>>> = Actual-cost threshold =
>>>
>>> Also stop using the generic plan if the statement takes a long time to
>>> run in practice.
>>
>> Do you mean:
>> 1. Rollback the current query and start again
>> 2. Mark the plan as a bad one and plan again next execute
>>
>> If you can figure out how to do #1 then you could probably do it for
>> all queries, but I'm guessing it's far from simple to implement.
>
> I'm talking about #2. As a matter of fact #1 did come up in one of those
> discussions, but how do you know you're not killing the query juuust
> before it'd done, and then maybe executing a different plan that's no
> better?

Ah, you'd need to be smarter when planning and also remember the
expected rows from each node. That way if your (index driven) inner node
was expecting 3 rows you could mark it to force a cancellation if it
returns (say) 30 or more. You'd allow more slack in later processing and
less slack earlier on where a bad estimate can explode the final number
of rows.

Or, there is always the case where we reverse-search an index to find
the last 10 messages in a group say, but the particular group in
question hasn't had a comment for months, so you trawl half the table.
People regularly get bitten by that, and there's not much to be done
about it. If we could abort when it looks like we're in worst-case
rather than best-case scenarios then it would be one less thing for
users to worry about.

>>> = Plan refresh =
>>>
>>> Periodically re-plan prepared statements on EXECUTE. This is also a
>>> chance for queries that were being re-planned every time to go back to a
>>> generic plan.
>>
>> Presumably some score based on update stats and vacuum activity etc.
>
> I was thinking of something very simple: re-do whatever we'd do if the
> statement were only being prepared at that point.

Yes, I thought so, the scoring was for *when* to decide to cancel the
old plan. I suppose total query-time would be another way to decide this
plan needs reworking.

>> The good side of all these ideas is good indeed. The bad side is plan
>> instability. Someone somewhere will have a generic plan that turns out
>> better than the specific plan (due to bad stats or config settings or
>> just planner limitations). The question is (I guess): How many more
>> winners will there be than losers?
>
> That's a good and surprising point, and therefore I'd like to draw
> attention away to a different point. :-)
>
> Yes, there will be losers in the sense that people may have optimized
> their use of prepared statements to whatever the current planner does.
> Maybe somebody out there even deliberately uses them to trick the
> planner into a different plan. But that is always going to happen; we're
> aiming for better plans, not for giving more detailed control over them.
> If you really can't take a change, don't upgrade.
>
> The competing point is: people out there may currently be forgoing
> prepared statements entirely because of erratic performance. To those
> people, if we can help them, it's like having a new feature.

Oh, I'm persuaded, but that doesn't really get you anywhere :-)

--
Richard Huxton
Archonet Ltd


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 14:53:17
Message-ID: 4B7176DD.5060305@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/09/2010 08:46 AM, Jeroen Vermeulen wrote:
>> This sounds like a really nice to have feature. Maybe it'd also be
>> possible to skip replanning between executes if the current bound
>> values are 'indexwise-equivalent' to the values used at previous
>> planning, i.e. nothing in the statistics indicates that execution
>> cost would be (much) different. Are there more ways to cut down on
>> planning time? Obviously some plannedstatement/plannerinfo structures
>> could be kept, but maybe it'd also be possible to plan only that part
>> of the join tree where the params are used in a scan/join qual.
>
> I think we should be careful not to over-think this. Planning isn't
> *that* costly, so apply Amdahl's Law liberally. I'm proposing some
> easy things we could do without adding much overhead or maintenance
> burden; I've been assuming that getting intimate with the planner
> would risk those advantages.

In a current commercial app we have that uses JDBC and prepared plans
for just about everything, it regularly ends up with execution times of
30+ milliseconds when a complete plan + execute would take less than 1
millisecond.

PostgreSQL planning is pretty fast. In terms of not over thinking things
- I think I would even prefer an option that said "always re-plan
prepared statements" as a starting point. If it happened to become
smarter over time, such that it would have invalidation criteria that
would trigger a re-plan, that would be awesome, but in terms of what
would help me *today* - being able to convert prepared plans into just a
means to use place holders would help me today on certain real
applications in production use right now.

Cheers,
mark

--
Mark Mielke<mark(at)mielke(dot)cc>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 16:43:52
Message-ID: 603c8f071002090843j5bf43df1wa29cbb268c225909@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> = Projected-cost threshold =
>
> If a prepared statement takes parameters, and the generic plan has a high
> projected cost, re-plan each EXECUTE individually with all its parameter
> values bound.  It may or may not help, but unless the planner is vastly
> over-pessimistic, re-planning isn't going to dominate execution time for
> these cases anyway.

How high is high?

> = Actual-cost threshold =
>
> Also stop using the generic plan if the statement takes a long time to run
> in practice.  Statistics may have gone bad.  It could also be a one-off due
> to a load peak or something, but that's handled by:
>
> = Plan refresh =
>
> Periodically re-plan prepared statements on EXECUTE.  This is also a chance
> for queries that were being re-planned every time to go back to a generic
> plan.

The most common problem here seems to be that (some?) MCVs need
different treatment than non-MCVs, so I don't think periodically
replanning is going to help very much. What might help is something
like plan twice, once assuming you have the most common MCV and once
assuming you have a non-MCV. If the two plans are same, you're
probably safe. Or if you can somehow determine that one of the plans
will still be pretty fast in the other case, you can just use that
plan across the board. Otherwise, you have to somehow vary the plan
based on the actual parameter value.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 16:59:41
Message-ID: 17062.1265734781@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
>> Periodically re-plan prepared statements on EXECUTE. This is also a chance
>> for queries that were being re-planned every time to go back to a generic
>> plan.

> The most common problem here seems to be that (some?) MCVs need
> different treatment than non-MCVs, so I don't think periodically
> replanning is going to help very much.

It won't help at all. The only reason for replanning is if something
about the schema or the statistics change, and we already have got
automatic cached-plan invalidation in both those cases. If you replan
simply because some time has elapsed, you'll just get exactly the
same plan.

The only case that I think still has any merit is where you get a
significantly better plan with known parameter values than without.
The projected-cost threshold might be a reasonable approach for
attacking that, ie, if estimated cost of generic plan exceeds X
then take the time to build a custom plan instead. I'm not sure that
really will fix the problem, but it would be a very simple change to
make to see how much it helps people.

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-09 22:21:43
Message-ID: m2zl3inj7s.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeroen Vermeulen <jtv(at)xs4all(dot)nl> writes:
> I think we should be careful not to over-think this. Planning isn't *that*
> costly, so apply Amdahl's Law liberally. I'm proposing some easy things we
> could do without adding much overhead or maintenance burden; I've been
> assuming that getting intimate with the planner would risk those advantages.

On a project where performance is a must (5ms per query is about all you
can ask) I have queries for which planning is 40+ ms and execute 2 to 5
ms (dataset fits in RAM, by design).

I'm then abusing pgbouncer so that the PREPARE is shared by a lot of
clients, all the ones landing into the session (transaction pooling).

See preprepare to get a better idea, even if we're yet to run it (it's
being used in production elsewhere, though, I've been told).

http://preprepare.projects.postgresql.org/

Regards,
--
dim


From: Kris Jurka <books(at)ejurka(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-10 00:47:18
Message-ID: alpine.BSO.2.00.1002091933001.23965@leary.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 9 Feb 2010, Mark Mielke wrote:

> In a current commercial app we have that uses JDBC and prepared plans for
> just about everything, it regularly ends up with execution times of 30+
> milliseconds when a complete plan + execute would take less than 1
> millisecond.
>
> PostgreSQL planning is pretty fast. In terms of not over thinking things - I
> think I would even prefer an option that said "always re-plan prepared
> statements" as a starting point. If it happened to become smarter over time,
> such that it would have invalidation criteria that would trigger a re-plan,
> that would be awesome, but in terms of what would help me *today* - being
> able to convert prepared plans into just a means to use place holders would
> help me today on certain real applications in production use right now.
>

The JDBC driver has two methods of disabling permanently planned prepared
statements:

1) Use the version two frontend/backend protocol via adding
protocolVersion=2 to your URL. This interpolates all parameters into
the query on the client side.

2) Execute PreparedStatements using the unnamed statement rather than a
named statement via adding prepareThreshold=0 to your URL. A named
statement is expected to be re-used for later execution and ignores the
passed parameters for planning purposes. An unnamed statement may be
re-used, but it doesn't expect to be. The unnamed statement uses the
passed parameters for planning purposes, but still cannot make certain
optimatizations based on the parameter values because it may be
re-executed again later with different parameters. For example a LIKE
query with a parameter value of 'ABC%' cannot be transformed into range
query because the next execution may use a different parameter value for
which the transform is not valid. By default the driver switches to using
a named statement after the same PreparedStatement object is executed five
times.

http://jdbc.postgresql.org/documentation/84/connect.html#connection-parameters
http://jdbc.postgresql.org/documentation/84/server-prepare.html

Kris Jurka


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kris Jurka <books(at)ejurka(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 03:12:41
Message-ID: 201002110312.o1B3CfS12297@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kris Jurka wrote:
>
> The JDBC driver has two methods of disabling permanently planned prepared
> statements:
>
> 1) Use the version two frontend/backend protocol via adding
> protocolVersion=2 to your URL. This interpolates all parameters into
> the query on the client side.
>
> 2) Execute PreparedStatements using the unnamed statement rather than a
> named statement via adding prepareThreshold=0 to your URL. A named
> statement is expected to be re-used for later execution and ignores the
> passed parameters for planning purposes. An unnamed statement may be
> re-used, but it doesn't expect to be. The unnamed statement uses the
> passed parameters for planning purposes, but still cannot make certain
> optimatizations based on the parameter values because it may be
> re-executed again later with different parameters. For example a LIKE
> query with a parameter value of 'ABC%' cannot be transformed into range
> query because the next execution may use a different parameter value for
> which the transform is not valid. By default the driver switches to using
> a named statement after the same PreparedStatement object is executed five
> times.
>
> http://jdbc.postgresql.org/documentation/84/connect.html#connection-parameters
> http://jdbc.postgresql.org/documentation/84/server-prepare.html

Can someone explain to me why we only do "delayed binding" for unnamed
prepared queries? Why do we not allow this option for named protocol
prepared queries and SQL prepared queries?

Here is what our documentation has in the protocols section:

The unnamed prepared statement is likewise planned during Parse processing
if the Parse message defines no parameters. But if there are parameters,
query planning occurs during Bind processing instead. This allows the
planner to make use of the actual values of the parameters provided in
the Bind message when planning the query.

and here is someone who is having problems with the generic plans we
create:

http://www.odecee.com.au/blogs/?p=134

Can we not document this better?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 03:15:01
Message-ID: 201002110315.o1B3F1R12683@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> >> Periodically re-plan prepared statements on EXECUTE. This is also a chance
> >> for queries that were being re-planned every time to go back to a generic
> >> plan.
>
> > The most common problem here seems to be that (some?) MCVs need
> > different treatment than non-MCVs, so I don't think periodically
> > replanning is going to help very much.
>
> It won't help at all. The only reason for replanning is if something
> about the schema or the statistics change, and we already have got
> automatic cached-plan invalidation in both those cases. If you replan
> simply because some time has elapsed, you'll just get exactly the
> same plan.
>
> The only case that I think still has any merit is where you get a
> significantly better plan with known parameter values than without.
> The projected-cost threshold might be a reasonable approach for
> attacking that, ie, if estimated cost of generic plan exceeds X
> then take the time to build a custom plan instead. I'm not sure that
> really will fix the problem, but it would be a very simple change to
> make to see how much it helps people.

Ideally we would do late binding (bind on the first supplied parameters,
like we do for unnamed protocol prepared queries now), and then replan
if the statistics for later parameters significantly differ from the
ones used for the the initial planning.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Kris Jurka <books(at)ejurka(dot)com>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Yeb Havinga <yebhavinga(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 04:07:57
Message-ID: 28584.1265861277@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Can someone explain to me why we only do "delayed binding" for unnamed
> prepared queries?

It was a way of shoehorning in some driver control over the behavior
without the protocol bump that would be involved in adding an actual
option to Parse messages.

regards, tom lane


From: Дмитрий Фефелов <fozzy(at)ac-sw(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 06:26:15
Message-ID: 201002111226.15967.fozzy@ac-sw.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> The only case that I think still has any merit is where you get a
> significantly better plan with known parameter values than without.
> The projected-cost threshold might be a reasonable approach for
> attacking that, ie, if estimated cost of generic plan exceeds X
> then take the time to build a custom plan instead. I'm not sure that
> really will fix the problem, but it would be a very simple change to
> make to see how much it helps people.
>
> regards, tom lane
>

It will definitely help with partitioned tables. It's very common case when
raw data taken from hardware stored in single table first, and later we start
to make partitions for each month/week/day. Feature can improve performance
transparently to client apps.

regards,
Dmitry

> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Bart Samwel <bart(at)samwel(dot)tk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 12:09:33
Message-ID: ded01eb21002110409m5b729dffn168061dae0cad213@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Robert,

On Tue, Feb 9, 2010 at 17:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> > = Projected-cost threshold =
> >
> > If a prepared statement takes parameters, and the generic plan has a high
> > projected cost, re-plan each EXECUTE individually with all its parameter
> > values bound. It may or may not help, but unless the planner is vastly
> > over-pessimistic, re-planning isn't going to dominate execution time for
> > these cases anyway.
>
> How high is high?
>

Perhaps this could be based on a (configurable?) ratio of observed planning
time and projected execution time. I mean, if planning it the first time
took 30 ms and projected execution time is 1 ms, then by all means NEVER
re-plan. But if planning the first time took 1 ms and resulted in a
projected execution time of 50 ms, then it's relatively cheap to re-plan
every time (cost increase per execution is 1/50 = 2%), and the potential
gains are much greater (taking a chunk out of 50 ms adds up quickly).

Cheers,
Bart


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Bart Samwel <bart(at)samwel(dot)tk>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 12:25:28
Message-ID: 162867791002110425r2a0b41cu5e4ee979a7f1f716@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/2/11 Bart Samwel <bart(at)samwel(dot)tk>:
> Hi Robert,
>
> On Tue, Feb 9, 2010 at 17:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Tue, Feb 9, 2010 at 7:08 AM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
>> > = Projected-cost threshold =
>> >
>> > If a prepared statement takes parameters, and the generic plan has a
>> > high
>> > projected cost, re-plan each EXECUTE individually with all its parameter
>> > values bound.  It may or may not help, but unless the planner is vastly
>> > over-pessimistic, re-planning isn't going to dominate execution time for
>> > these cases anyway.
>>
>> How high is high?
>
> Perhaps this could be based on a (configurable?) ratio of observed planning
> time and projected execution time. I mean, if planning it the first time
> took 30 ms and projected execution time is 1 ms, then by all means NEVER
> re-plan. But if planning the first time took 1 ms and resulted in a
> projected execution time of 50 ms, then it's relatively cheap to re-plan
> every time (cost increase per execution is 1/50 = 2%), and the potential
> gains are much greater (taking a chunk out of 50 ms adds up quickly).

It could be a good idea. I don't belive to sophisticate methods. There
can be a very simply solution. The could be a "limit" for price. More
expensive queries can be replaned every time when the price will be
over limit.

Regards

Pavel Stehule
>
> Cheers,
> Bart
>


From: Bart Samwel <bart(at)samwel(dot)tk>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 12:39:49
Message-ID: ded01eb21002110439w593e540cod112c52c15cb6e4d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 13:25, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>wrote:

> 2010/2/11 Bart Samwel <bart(at)samwel(dot)tk>:
> > Perhaps this could be based on a (configurable?) ratio of observed
> planning
> > time and projected execution time. I mean, if planning it the first time
> > took 30 ms and projected execution time is 1 ms, then by all means NEVER
> > re-plan. But if planning the first time took 1 ms and resulted in a
> > projected execution time of 50 ms, then it's relatively cheap to re-plan
> > every time (cost increase per execution is 1/50 = 2%), and the potential
> > gains are much greater (taking a chunk out of 50 ms adds up quickly).
>
>
> It could be a good idea. I don't belive to sophisticate methods. There
> can be a very simply solution. The could be a "limit" for price. More
> expensive queries can be replaned every time when the price will be
> over limit.
>

I guess the required complexity depends on how variable planning costs are.
If planning is typically <= 2 ms, then a hard limit on estimated price is
useful and can be set as low as (the equivalent of) 15 ms. However, if
planning costs can be 50 ms, then the lowest reasonable "fixed" limit is
quite a bit larger than that -- and that does not solve the problem reported
earlier in this thread, where a query takes 30 ms using a generic plan and 1
ms using a specialized plan.

Anyhow, I have no clue how much time the planner takes. Can anybody provide
any statistics in that regard?

Cheers,
Bart


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bart Samwel <bart(at)samwel(dot)tk>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 12:41:12
Message-ID: 603c8f071002110441j1669930fs7f653edd0d3b1aa8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 7:39 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
> On Thu, Feb 11, 2010 at 13:25, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>>
>> 2010/2/11 Bart Samwel <bart(at)samwel(dot)tk>:
>> > Perhaps this could be based on a (configurable?) ratio of observed
>> > planning
>> > time and projected execution time. I mean, if planning it the first time
>> > took 30 ms and projected execution time is 1 ms, then by all means NEVER
>> > re-plan. But if planning the first time took 1 ms and resulted in a
>> > projected execution time of 50 ms, then it's relatively cheap to re-plan
>> > every time (cost increase per execution is 1/50 = 2%), and the potential
>> > gains are much greater (taking a chunk out of 50 ms adds up quickly).
>>
>>
>> It could be a good idea. I don't belive to sophisticate methods. There
>> can be a very simply solution. The could be a "limit" for price.  More
>> expensive queries can be replaned every time when the price will be
>> over limit.
>
> I guess the required complexity depends on how variable planning costs are.
> If planning is typically <= 2 ms, then a hard limit on estimated price is
> useful and can be set as low as (the equivalent of) 15 ms. However, if
> planning costs can be 50 ms, then the lowest reasonable "fixed" limit is
> quite a bit larger than that -- and that does not solve the problem reported
> earlier in this thread, where a query takes 30 ms using a generic plan and 1
> ms using a specialized plan.
>
> Anyhow, I have no clue how much time the planner takes. Can anybody provide
> any statistics in that regard?

It depends a great deal on the query, which is one of the things that
makes implementing this rather challenging.

...Robert


From: Bart Samwel <bart(at)samwel(dot)tk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 12:48:14
Message-ID: ded01eb21002110448h3a452e8fwefe575bd869ec04@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 13:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Feb 11, 2010 at 7:39 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
> > Anyhow, I have no clue how much time the planner takes. Can anybody
> provide
> > any statistics in that regard?
>
> It depends a great deal on the query, which is one of the things that
> makes implementing this rather challenging.
>

But I guess you can probably expect it to be on the same order for the same
query in generic form and with filled-in parameters? Because that's the
underlying assumption of the "ratio" criterion -- that re-planning with
filled-in parameters takes about as much time as the initial planning run
took.

Cheers,
Bart


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bart Samwel <bart(at)samwel(dot)tk>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 13:04:46
Message-ID: 603c8f071002110504yb07519do9dd258c5a0ee6143@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 11, 2010 at 7:48 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
> On Thu, Feb 11, 2010 at 13:41, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Thu, Feb 11, 2010 at 7:39 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
>> > Anyhow, I have no clue how much time the planner takes. Can anybody
>> > provide
>> > any statistics in that regard?
>>
>> It depends a great deal on the query, which is one of the things that
>> makes implementing this rather challenging.
>
> But I guess you can probably expect it to be on the same order for the same
> query in generic form and with filled-in parameters?

I think so.... but I wouldn't bet the farm on it without testing.

> Because that's the
> underlying assumption of the "ratio" criterion -- that re-planning with
> filled-in parameters takes about as much time as the initial planning run
> took.

We only want to replan when replanning is relatively cheap compared to
execution, so the other assumption is that the planning-to-execution
ratio is more or less constant. Whether that's sufficiently true to
make the proposed system useful and reliable is not clear to me.

...Robert


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Bart Samwel <bart(at)samwel(dot)tk>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 13:41:02
Message-ID: 4B7408EE.5020905@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bart Samwel wrote:
> Perhaps this could be based on a (configurable?) ratio of observed
> planning time and projected execution time. I mean, if planning it the
> first time took 30 ms and projected execution time is 1 ms, then by
> all means NEVER re-plan.
IMHO looking at ms is bad for this 'possible replan' decision. The only
comparable numbers invariant to system load are the planners costs (not
in ms but unitless) and maybe actual number of processed tuples, but
never actual ms.

Regards,
Yeb Havinga


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 16:17:40
Message-ID: 10153.1265905060@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Feb 11, 2010 at 7:48 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
>> Because that's the
>> underlying assumption of the "ratio" criterion -- that re-planning with
>> filled-in parameters takes about as much time as the initial planning run
>> took.

> We only want to replan when replanning is relatively cheap compared to
> execution,

Well, no, consider the situation where planning takes 50 ms, the generic
plan costs 100ms to execute, but a parameter-specific plan would take 1ms
to execute. Planning is very expensive compared to execution but it's
still a win to do it.

The problem that we face is that we don't have any very good way to tell
whether a fresh planning attempt is likely to yield a plan significantly
better than the generic plan. I can think of some heuristics --- for
example if the query contains LIKE with a parameterized pattern or a
partitioned table --- but that doesn't seem like a particularly nice
road to travel.

A possible scheme is to try it and keep track of whether we ever
actually do get a better plan. If, after N attempts, none of the custom
plans were ever more than X% cheaper than the generic one, then give up
and stop attempting to produce custom plans. Tuning the variables might
be challenging though.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-11 16:39:11
Message-ID: 162867791002110839n326bdb0cl3ebc3cb5cdf9d11a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/2/11 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Feb 11, 2010 at 7:48 AM, Bart Samwel <bart(at)samwel(dot)tk> wrote:
>>> Because that's the
>>> underlying assumption of the "ratio" criterion -- that re-planning with
>>> filled-in parameters takes about as much time as the initial planning run
>>> took.
>
>> We only want to replan when replanning is relatively cheap compared to
>> execution,
>
> Well, no, consider the situation where planning takes 50 ms, the generic
> plan costs 100ms to execute, but a parameter-specific plan would take 1ms
> to execute.  Planning is very expensive compared to execution but it's
> still a win to do it.
>
> The problem that we face is that we don't have any very good way to tell
> whether a fresh planning attempt is likely to yield a plan significantly
> better than the generic plan.  I can think of some heuristics --- for
> example if the query contains LIKE with a parameterized pattern or a
> partitioned table --- but that doesn't seem like a particularly nice
> road to travel.
>
> A possible scheme is to try it and keep track of whether we ever
> actually do get a better plan.  If, after N attempts, none of the custom
> plans were ever more than X% cheaper than the generic one, then give up
> and stop attempting to produce custom plans.  Tuning the variables might
> be challenging though.

I afraid so every heuristic is bad. Problem is identification of bad
generic plan. And nobody ensure, so non generic plan will be better
than generic. Still I thing we need some way for lazy prepared
statements - plan is generated everytime with known parameters.

Other idea: some special debug/test mod, where pg store generic plan
for every prepared statement, and still generate specific plan. When
the prices are different, then pg produces a warning. This can be
slower, but can identify problematic queries. It could be implemented
as contrib module - some like autoexplain.

regards
Pavel

>
>                        regards, tom lane
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-15 19:11:34
Message-ID: 201002151911.o1FJBYh22763@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule wrote:
> > The problem that we face is that we don't have any very good way to tell
> > whether a fresh planning attempt is likely to yield a plan significantly
> > better than the generic plan. ?I can think of some heuristics --- for
> > example if the query contains LIKE with a parameterized pattern or a
> > partitioned table --- but that doesn't seem like a particularly nice
> > road to travel.
> >
> > A possible scheme is to try it and keep track of whether we ever
> > actually do get a better plan. ?If, after N attempts, none of the custom
> > plans were ever more than X% cheaper than the generic one, then give up
> > and stop attempting to produce custom plans. ?Tuning the variables might
> > be challenging though.
>
> I afraid so every heuristic is bad. Problem is identification of bad
> generic plan. And nobody ensure, so non generic plan will be better
> than generic. Still I thing we need some way for lazy prepared
> statements - plan is generated everytime with known parameters.

Yea, this opens a whole host of questions for me:

1. Why do we only do bind-level planning for anonymous wire-level queries?

2. I realize we did anonymous-only because that was the only way we had
in the protocol to _signal_ bind-time planning, but didn't we think of
this when we were implementing the wire-level protocol?

3. Do we have no place to add this cleanly without a protocol version
bump?

4. Why don't we just always do planning at first bind time? When is
that worse than using generic values?

5. Why have we not added an option for SQL-level prepare to do this?

6. When do our generic columns costs significantly worse than having
specific constants? I assume unique columns are fine with generic
constants.

7. Why is there no option to do parameterized-queries which replan every
time?

This just seems like an area that has been neglected, or maybe I am
missing something and our current setup is acceptable. We have done a
lot of work to generate acceptable optimizer statistics, but we are not
using them for a significant part of our user base, particularly JDBC.

We do have a TODO item, but it has gotten little attention:

Allow finer control over the caching of prepared query plans

Currently anonymous (un-named) queries prepared via the libpq API
are planned at bind time using the supplied parameters --- allow SQL
PREPARE to do the same. Also, allow control over replanning prepared
queries either manually or automatically when statistics for execute
parameters differ dramatically from those used during planning.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-15 19:51:26
Message-ID: 4B79A5BE.6060906@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> Well, no, consider the situation where planning takes 50 ms, the generic
> plan costs 100ms to execute, but a parameter-specific plan would take 1ms
> to execute. Planning is very expensive compared to execution but it's
> still a win to do it.

I think that's a fun and worthwhile problem. But my limited personal
aim right now is a stopgap for pathological cases. I'd like to pick the
low-hanging fruit; actually squeezing the fat out of prepared statements
is a job I wouldn't get around to completing. Sorry for mixing metaphors.

Here's what I like about the really slow plans. (Now why does that sound
so strange?) We don't know if re-planning will help, but we do know
that (1) it won't hurt much relative to execution time, so we really
don't _care_; and (2) there is lots of potential for improvement, so
catching just one execution that can be much faster might pay for all
the extra time spent re-planning.

Where do we draw the line between costly and pathological? I still like
Bart's idea of a fixed ratio to planning time, because it reflects a
self-tuning sense of proportion. Sure, planning time can vary a lot but
we're talking about an order-of-magnitude difference, not an exact 19:21
optimum. We can be sloppy and still expect to win.

AFAIC a statement could go to "re-planning mode" if the shortest
execution time for the generic plan takes at least 10x longer than the
longest planning time. That gives us a decent shot at finding
statements where re-planning is a safe bet. A parameter that we or the
user would have to tweak would just be a fragile approximation of that.

> A possible scheme is to try it and keep track of whether we ever
> actually do get a better plan. If, after N attempts, none of the custom
> plans were ever more than X% cheaper than the generic one, then give up
> and stop attempting to produce custom plans. Tuning the variables might
> be challenging though.

A simple stopgap implementation may also be a useful experimentation
platform for refinements. It shouldn't be too complex to rip out when
something better comes along.

Jeroen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-15 20:03:21
Message-ID: 603c8f071002151203l184249bfx714fba1e83f47af5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 15, 2010 at 2:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Pavel Stehule wrote:
>> > The problem that we face is that we don't have any very good way to tell
>> > whether a fresh planning attempt is likely to yield a plan significantly
>> > better than the generic plan. ?I can think of some heuristics --- for
>> > example if the query contains LIKE with a parameterized pattern or a
>> > partitioned table --- but that doesn't seem like a particularly nice
>> > road to travel.
>> >
>> > A possible scheme is to try it and keep track of whether we ever
>> > actually do get a better plan. ?If, after N attempts, none of the custom
>> > plans were ever more than X% cheaper than the generic one, then give up
>> > and stop attempting to produce custom plans. ?Tuning the variables might
>> > be challenging though.
>>
>> I afraid so every heuristic is bad. Problem is identification of bad
>> generic plan. And nobody ensure, so non generic plan will be better
>> than generic. Still I thing we need some way for lazy prepared
>> statements - plan is generated everytime with known parameters.
>
> Yea, this opens a whole host of questions for me:
>
> 1. Why do we only do bind-level planning for anonymous wire-level queries?
>
> 2. I realize we did anonymous-only because that was the only way we had
> in the protocol to _signal_ bind-time planning, but didn't we think of
> this when we were implementing the wire-level protocol?
>
> 3. Do we have no place to add this cleanly without a protocol version
> bump?
>
> 4. Why don't we just always do planning at first bind time?  When is
> that worse than using generic values?
>
> 5. Why have we not added an option for SQL-level prepare to do this?
>
> 6. When do our generic columns costs significantly worse than having
> specific constants?  I assume unique columns are fine with generic
> constants.
>
> 7. Why is there no option to do parameterized-queries which replan every
> time?
>
> This just seems like an area that has been neglected, or maybe I am
> missing something and our current setup is acceptable.

No, our current setup is not acceptable, and your questions are all
right on target. I have been hoping that someone would take an
interest in this problem for years. An option to replan on every
execution would be a very, very fine thing. IMHO, there should also
be a way to signal to PL/pgsql that you want this behavior for a
particular query, short of wrapping it using EXECUTE, which is clunky
and also forces a re-parse on every execution.

...Robert


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-15 20:05:29
Message-ID: 201002152005.o1FK5Ts29002@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> > 7. Why is there no option to do parameterized-queries which replan every
> > time?
> >
> > This just seems like an area that has been neglected, or maybe I am
> > missing something and our current setup is acceptable.
>
> No, our current setup is not acceptable, and your questions are all
> right on target. I have been hoping that someone would take an
> interest in this problem for years. An option to replan on every
> execution would be a very, very fine thing. IMHO, there should also
> be a way to signal to PL/pgsql that you want this behavior for a
> particular query, short of wrapping it using EXECUTE, which is clunky
> and also forces a re-parse on every execution.

I was hoping I was wrong. :-(

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-16 14:22:00
Message-ID: 407d949e1002160622l65719aabpf68165681ee8b6be@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 15, 2010 at 7:51 PM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> AFAIC a statement could go to "re-planning mode" if the shortest execution
> time for the generic plan takes at least 10x longer than the longest
> planning time.  That gives us a decent shot at finding statements where
> re-planning is a safe bet.  A parameter that we or the user would have to
> tweak would just be a fragile approximation of that.

So in principle I agree with this idea. I think a conservative value
for the constant would be more like 100x though. If I told you we had
an easy way to speed all your queries up by 10% by caching queries but
were just choosing not to then I think you would be unhappy. Whereas
if I told you we were spending 1% of the run-time planning queries I
think most people would not be concerned.

There's a second problem though. We don't actually know how long any
given query is going to take to plan or execute. We could just
remember how long it took to plan and execute last time or how long it
took to plan last time and the average execution time since we cached
that plan. Perhaps we should track the stddev of the execution plan,
or the max execution time of the plan? Ie there are still unanswered
questions about the precise heuristic to use but I bet we can come up
with something reasonable.

--
greg


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-16 14:28:57
Message-ID: 407d949e1002160628v14966092iea1bf26e34039344@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 15, 2010 at 7:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> 1. Why do we only do bind-level planning for anonymous wire-level queries?
>
> 2. I realize we did anonymous-only because that was the only way we had
> in the protocol to _signal_ bind-time planning, but didn't we think of
> this when we were implementing the wire-level protocol?

Is there any other difference between anonymous and non-anonymous
queries? If this is the only major difference do we need to separate
them? Is there any particular reason a driver would need two prepared
queries if they're both just going to be planned at execution time?

Incidentally, can you have two active anonymous portals at the same time?

> 4. Why don't we just always do planning at first bind time?  When is
> that worse than using generic values?
>
> 6. When do our generic columns costs significantly worse than having
> specific constants?  I assume unique columns are fine with generic
> constants.

Well using parameters will always have a better chance of producing a
better plan but that's not the only factor people consider important.
For a lot of users *predictability* is more important than absolute
performance. If my web server could run 10% faster that might be nice
but if it's capable of keeping up at its current speed it's not
terribly important. But if it means it crashes once a day because some
particular combination of parameters causes a bad plan to be used for
a specific user that's a bad trade-off.

--
greg


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-16 14:31:44
Message-ID: 162867791002160631h7814785bo64d3e203fddb3ac1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Well using parameters will always have a better chance of producing a
> better plan but that's not the only factor people consider important.
> For a lot of users *predictability* is more important than absolute
> performance. If my web server could run 10% faster that might be nice
> but if it's capable of keeping up at its current speed it's not
> terribly important. But if it means it crashes once a day because some
> particular combination of parameters causes a bad plan to be used for
> a specific user that's a bad trade-off.
>

+1

Pavel

> --
> greg
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-16 20:17:32
Message-ID: 201002162017.o1GKHWa08835@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Mon, Feb 15, 2010 at 7:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > 1. Why do we only do bind-level planning for anonymous wire-level queries?
> >
> > 2. I realize we did anonymous-only because that was the only way we had
> > in the protocol to _signal_ bind-time planning, but didn't we think of
> > this when we were implementing the wire-level protocol?
>
> Is there any other difference between anonymous and non-anonymous
> queries? If this is the only major difference do we need to separate
> them? Is there any particular reason a driver would need two prepared
> queries if they're both just going to be planned at execution time?

Well, anonymous prepared queries are replanned for _every_ bind, so I
don't see a huge value in allowing multiple unnamed queries, except you
have to re-send the old query to prepare if you need to reuse it.

In fact, this behavior was not totally clear so I updated the
documentation a little with the attached patch.

> Incidentally, can you have two active anonymous portals at the same time?

No, the first one is deleted when the second is created, i.e., our docs
have:

An unnamed prepared statement lasts only until the next Parse statement
specifying the unnamed statement as destination is issued. (Note that a
simple Query message also destroys the unnamed statement.)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/rtmp/diff text/x-diff 1.8 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-17 03:12:50
Message-ID: 407d949e1002161912o58e60b0fu390eb0ebcc6b2b7b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 16, 2010 at 8:17 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Incidentally, can you have two active anonymous portals at the same time?
>
> No, the first one is deleted when the second is created, i.e., our docs
> have:
>
>        An unnamed prepared statement lasts only until the next Parse statement
>        specifying the unnamed statement as destination is issued.  (Note that a
>        simple Query message also destroys the unnamed statement.)

I had to reread it myself but I think you've confused portals with
prepared statements. You can have an unnamed prepared statement --
which is planned at Bind time -- but execute it with a named portal
allowing you to keep it active while you execute a second query.

If you have code like:

$sth = $dbh->execute('select * from tab');
while ($sth->fetch) {
$dbh->execute('insert into tab2');
}

The first execute needs to prepare and execute the first statement. It
doesn't need to keep a named prepared statement handle around because
it'll never be able to re-execute it anyways. But the resulting portal
does need to be a named portal because otherwise the driver will be up
a creek when it comes to the second execute.

The case that's fairly awkward at the moment -- but still not
impossible to handle -- is when the driver sees a prepare and bind but
no execute for a while. Coding like:

$sth1 = $dbh->prepare('select * from tab where id = ?');
$sth2 = $dbh->prepare('select * from tab2 where id = ?');
$sth1->bind(1);
$sth2->bind(2);
$sth1->execute();
$sth2->execute();

In that case the driver is kind of stuck. It can't use the unnamed
prepared statement when the prepare() calls are done. If it wants the
plan-at-bind semantics then It would have to create a "fake" prepared
statement which it doesn't actually send the prepare message for until
the bind arrives.

--
greg


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-17 22:52:14
Message-ID: 4B7C731E.6080904@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:

> So in principle I agree with this idea. I think a conservative value
> for the constant would be more like 100x though. If I told you we had
> an easy way to speed all your queries up by 10% by caching queries but
> were just choosing not to then I think you would be unhappy. Whereas
> if I told you we were spending 1% of the run-time planning queries I
> think most people would not be concerned.

Makes sense. The main thing is that there be an order-of-magnitude
difference to hide the potential extra planning cost in. If that
includes a switched SSL connection, 10% of execution is probably
reasonable because it's a much smaller portion of overall response
time--but on a local connection it's a liability.

> There's a second problem though. We don't actually know how long any
> given query is going to take to plan or execute. We could just
> remember how long it took to plan and execute last time or how long it
> took to plan last time and the average execution time since we cached
> that plan. Perhaps we should track the stddev of the execution plan,
> or the max execution time of the plan? Ie there are still unanswered
> questions about the precise heuristic to use but I bet we can come up
> with something reasonable.

I may have cut this out of my original email for brevity... my
impression is that the planner's estimate is likely to err on the side
of scalability, not best-case response time; and that this is more
likely to happen than an optimistic plan going bad at runtime.

If that is true, then the cost estimate is at least a useful predictor
of statements that deserve re-planning. If it's not true (or for cases
where it's not true), actual execution time would be a useful back-up at
the cost of an occasional slow execution.

Yeb points out a devil in the details though: the cost estimate is
unitless. We'd have to have some orders-of-magnitude notion of how the
estimates fit into the picture of real performance.

Jeroen


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Greg Stark" <gsstark(at)mit(dot)edu>, "Jeroen Vermeulen" <jtv(at)xs4all(dot)nl>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bart Samwel" <bart(at)samwel(dot)tk>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-18 14:58:25
Message-ID: op.u8bvvnwveorkce@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 16 Feb 2010 15:22:00 +0100, Greg Stark <gsstark(at)mit(dot)edu> wrote:

> There's a second problem though. We don't actually know how long any
> given query is going to take to plan or execute. We could just
> remember how long it took to plan and execute last time or how long it
> took to plan last time and the average execution time since we cached
> that plan. Perhaps we should track the stddev of the execution plan,
> or the max execution time of the plan? Ie there are still unanswered
> questions about the precise heuristic to use but I bet we can come up
> with something reasonable.

This could be an occasion to implement plan caching...

Web 2.0 = AJAX means less need for heavy webpage reloads with (usually)
lots of queries, and more small simple queries like selects returning 1 or
a few rows every time the user clicks on something.

See benchmark here : (PG 8.4.2, MYSQL 5.1.37)
http://purity.bobfuck.net/posts/postgres/2010-02-Prep/

If prepared statements are used, MySQL is not faster for "small, simple
selects"...
However, when not using prepared statements, most of the postmaster CPU
time is spent parsing & planning.

Problem with prepared statements is they're a chore to use in web apps,
especially PHP, since after grabbing a connection from the pool, you don't
know if it has prepared plans in it or not.

The postmaster could keep a hash of already prepared plans, using the
$-parameterized query as a hash key, and when it receives parse+bind
message, look up in this cache and fetch plans for the query, avoiding
planning entirely.

This could be done by the connection pooler too, but it doesn't have the
information to decide wether it's wise to cache a plan or not.

Of course all the subtility is to determine if the plan is reusable with
other parameters...

- after planning and executing the query, only cache it if the plan time
is a significant part of the query time (as said previously).
- only simple queries should be automatically cached like this
- perhaps some measure of "plan volatility" ? For the examples I give in
the link above, it's quite easy at least in 2 of the cases : searching
UNIQUE columns can't return more than 1 row, so volatility is zero. It
only depends on the table size.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Pierre C" <lists(at)peufeu(dot)com>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Jeroen Vermeulen" <jtv(at)xs4all(dot)nl>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bart Samwel" <bart(at)samwel(dot)tk>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-18 15:09:42
Message-ID: 871vgir361.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pierre C" <lists(at)peufeu(dot)com> writes:
> Problem with prepared statements is they're a chore to use in web apps,
> especially PHP, since after grabbing a connection from the pool, you don't
> know if it has prepared plans in it or not.

Have you met preprepare yet?

http://preprepare.projects.postgresql.org/README.html
http://packages.debian.org/source/sid/preprepare

Regards,
--
dim


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Jeroen Vermeulen" <jtv(at)xs4all(dot)nl>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bart Samwel" <bart(at)samwel(dot)tk>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-18 16:47:01
Message-ID: op.u8b0wnhteorkce@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 18 Feb 2010 16:09:42 +0100, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:

> "Pierre C" <lists(at)peufeu(dot)com> writes:
>> Problem with prepared statements is they're a chore to use in web apps,
>> especially PHP, since after grabbing a connection from the pool, you
>> don't
>> know if it has prepared plans in it or not.
>
> Have you met preprepare yet?
>
> http://preprepare.projects.postgresql.org/README.html
> http://packages.debian.org/source/sid/preprepare
>
> Regards,

Hey, this thing is nice.
How hard would it be to put a hook in pg so that, instead of raising an
error and cancelling the txn when EXECUTing a statement that is not
prepared, it would call a user function (of the user's choice) which
would, if possible, prepare said statement, or if not, raise the error ?


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Pierre C" <lists(at)peufeu(dot)com>
Cc: "Greg Stark" <gsstark(at)mit(dot)edu>, "Jeroen Vermeulen" <jtv(at)xs4all(dot)nl>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Bart Samwel" <bart(at)samwel(dot)tk>, "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-18 16:54:31
Message-ID: 873a0ypjqw.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pierre C" <lists(at)peufeu(dot)com> writes:
> On Thu, 18 Feb 2010 16:09:42 +0100, Dimitri Fontaine
> <dfontaine(at)hi-media(dot)com> wrote:
>> http://preprepare.projects.postgresql.org/README.html
>> http://packages.debian.org/source/sid/preprepare
>
> Hey, this thing is nice.

Thanks :)

> How hard would it be to put a hook in pg so that, instead of raising an
> error and cancelling the txn when EXECUTing a statement that is not
> prepared, it would call a user function (of the user's choice) which would,
> if possible, prepare said statement, or if not, raise the error ?

What about catching the error in the application and INSERT'ing into the
current preprepare.relation table? The aim would be to do that in dev or
in pre-prod environments, then copy the table content in production.

Notice you can also SET preprepare.relation TO 'any.table_or_view', so
that you can have more than one table for different applications or
parts.

I wonder how far you'd get using a VIEW instead and calling a user
function there, but arranging preprepare to support a function instead
is indeed possible. I'll try and see about it this evening, maybe
tomorrow afternoon.

Regards,
--
dim


From: "Pierre C" <lists(at)peufeu(dot)com>
To: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-18 20:19:21
Message-ID: op.u8caiphxeorkce@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> What about catching the error in the application and INSERT'ing into the
> current preprepare.relation table? The aim would be to do that in dev or
> in pre-prod environments, then copy the table content in production.

Yep, but it's a bit awkward and time-consuming, and not quite suited to
ORM-generated requests since you got to generate all the plan names, when
the SQL query itself would be the most convenient "unique identifier"...

A cool hack would be something like that :

pg_execute( "SELECT ...", arguments... )

By inserting a hook which calls a user-specified function on non-existing
plan instead of raising an error, this could work.
However, this wouldn't work as-is since the plan name must be <=
NAMEDATALEN, but you get the idea ;)


From: David Christensen <david(at)endpoint(dot)com>
To: Pierre C <lists(at)peufeu(dot)com>
Cc: "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-19 02:31:05
Message-ID: B2AD690B-D9B4-4D42-94AD-9D11CAE5DD03@endpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Feb 18, 2010, at 2:19 PM, Pierre C wrote:

>
>> What about catching the error in the application and INSERT'ing
>> into the
>> current preprepare.relation table? The aim would be to do that in
>> dev or
>> in pre-prod environments, then copy the table content in production.
>
> Yep, but it's a bit awkward and time-consuming, and not quite suited
> to ORM-generated requests since you got to generate all the plan
> names, when the SQL query itself would be the most convenient
> "unique identifier"...
>
> A cool hack would be something like that :
>
> pg_execute( "SELECT ...", arguments... )
>
> By inserting a hook which calls a user-specified function on non-
> existing plan instead of raising an error, this could work.
> However, this wouldn't work as-is since the plan name must be <=
> NAMEDATALEN, but you get the idea ;)

How about the SHA1 hash of the query? Hey, it works for git... :-)

Regards,

David
--
David Christensen
End Point Corporation
david(at)endpoint(dot)com


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: David Christensen <david(at)endpoint(dot)com>
Cc: Pierre C <lists(at)peufeu(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-19 13:57:51
Message-ID: 20100219135751.GD25733@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 18, 2010 at 08:31:05PM -0600, David Christensen wrote:
>
> On Feb 18, 2010, at 2:19 PM, Pierre C wrote:
>
>>
>>> What about catching the error in the application and INSERT'ing into the
>>> current preprepare.relation table? The aim would be to do that in dev or
>>> in pre-prod environments, then copy the table content in production.
>>
>> Yep, but it's a bit awkward and time-consuming, and not quite suited to
>> ORM-generated requests since you got to generate all the plan names, when
>> the SQL query itself would be the most convenient "unique identifier"...
>>
>> A cool hack would be something like that :
>>
>> pg_execute( "SELECT ...", arguments... )
>>
>> By inserting a hook which calls a user-specified function on non-existing
>> plan instead of raising an error, this could work.
>> However, this wouldn't work as-is since the plan name must be <=
>> NAMEDATALEN, but you get the idea ;)
>
> How about the SHA1 hash of the query? Hey, it works for git... :-)
>
> Regards,
>
> David
> --
> David Christensen
> End Point Corporation
> david(at)endpoint(dot)com
>
>
Hi David,

Not to beat out own drum, but we already include a hashing function
that can be used for this purpose and is much faster than SHA-1. We
would want to use all 64-bits for this use instead of just the 32-bits
we currently use for the internal DB hashing. Here is an article
comparing the Jenkins' Hash (the one we use) and SHA-1:

http://home.comcast.net/~bretm/hash/

Regards,
Ken


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Pierre C" <lists(at)peufeu(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-19 17:56:12
Message-ID: m28wap5cub.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Pierre C" <lists(at)peufeu(dot)com> writes:
> Yep, but it's a bit awkward and time-consuming, and not quite suited to
> ORM-generated requests since you got to generate all the plan names, when
> the SQL query itself would be the most convenient "unique
> identifier"...

The SHA1 proposal seems better to me.

Now you still have to fill the table with statements and names, and I
stand on the opinion that having a development-only layer in the ORM to
do just that is better.

That's supposing you don't generate so many different normalized queries
that you can't generate them all in dev or from the unit tests…

> A cool hack would be something like that :
>
> pg_execute( "SELECT ...", arguments... )
>
> By inserting a hook which calls a user-specified function on non-existing
> plan instead of raising an error, this could work.

This I'm not helping with. It's definitely not the same skills and time
requirements as offering a simple function based alternative to the
table lookup in preprepare, in the "load them all" phase.

> However, this wouldn't work as-is since the plan name must be <=
> NAMEDATALEN, but you get the idea ;)

SHA1 or even MD5 would do, the later having the advantage of being
already supported by PostgreSQL core.

Regards,
--
dim


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-21 12:37:50
Message-ID: 603c8f071002210437w43a58131r85bbe7eff90bc266@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 5:52 PM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> I may have cut this out of my original email for brevity... my impression is
> that the planner's estimate is likely to err on the side of scalability, not
> best-case response time; and that this is more likely to happen than an
> optimistic plan going bad at runtime.

Interestingly, most of the mistakes that I have seen are in the
opposite direction.

> Yeb points out a devil in the details though: the cost estimate is unitless.
>  We'd have to have some orders-of-magnitude notion of how the estimates fit
> into the picture of real performance.

I'm not sure to what extent you can assume that the cost is
proportional to the execution time. I seem to remember someone
(Peter?) arguing that they're not related by any fixed ratio, partly
because things like page costs vs. cpu costs didn't match physical
reality, and that in fact some attempts to gather better empirically
better values for things like random_page_cost and seq_page_cost
actually ended up making the plans worse rather than better. It would
be nice to see some research in this area...

...Robert


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 02:48:05
Message-ID: 4B873665.20208@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Wed, Feb 17, 2010 at 5:52 PM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
>> I may have cut this out of my original email for brevity... my impression is
>> that the planner's estimate is likely to err on the side of scalability, not
>> best-case response time; and that this is more likely to happen than an
>> optimistic plan going bad at runtime.
>
> Interestingly, most of the mistakes that I have seen are in the
> opposite direction.

I guess there's not much we can do about those, except decide after
running that it's worth optimizing for specific values.

>> Yeb points out a devil in the details though: the cost estimate is unitless.
>> We'd have to have some orders-of-magnitude notion of how the estimates fit
>> into the picture of real performance.
>
> I'm not sure to what extent you can assume that the cost is
> proportional to the execution time. I seem to remember someone
> (Peter?) arguing that they're not related by any fixed ratio, partly
> because things like page costs vs. cpu costs didn't match physical
> reality, and that in fact some attempts to gather better empirically
> better values for things like random_page_cost and seq_page_cost
> actually ended up making the plans worse rather than better. It would
> be nice to see some research in this area...

Getting representative workloads and machine configurations may make
that hard. :/

But all we really want is a check for really obscene costs, as an extra
stopgap so we don't have to wait for the thing to execute before we
decide it's too costly. Surely there must be some line we can draw.

Jeroen


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 03:19:21
Message-ID: 603c8f071002251919m6f43747bi336680806c40adc4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 25, 2010 at 9:48 PM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
> Robert Haas wrote:
>>
>> On Wed, Feb 17, 2010 at 5:52 PM, Jeroen Vermeulen <jtv(at)xs4all(dot)nl> wrote:
>>>
>>> I may have cut this out of my original email for brevity... my impression
>>> is
>>> that the planner's estimate is likely to err on the side of scalability,
>>> not
>>> best-case response time; and that this is more likely to happen than an
>>> optimistic plan going bad at runtime.
>>
>> Interestingly, most of the mistakes that I have seen are in the
>> opposite direction.
>
> I guess there's not much we can do about those, except decide after running
> that it's worth optimizing for specific values.
>
>
>>> Yeb points out a devil in the details though: the cost estimate is
>>> unitless.
>>>  We'd have to have some orders-of-magnitude notion of how the estimates
>>> fit
>>> into the picture of real performance.
>>
>> I'm not sure to what extent you can assume that the cost is
>> proportional to the execution time.  I seem to remember someone
>> (Peter?) arguing that they're not related by any fixed ratio, partly
>> because things like page costs vs. cpu costs didn't match physical
>> reality, and that in fact some attempts to gather better empirically
>> better values for things like random_page_cost and seq_page_cost
>> actually ended up making the plans worse rather than better.  It would
>> be nice to see some research in this area...
>
> Getting representative workloads and machine configurations may make that
> hard.  :/
>
> But all we really want is a check for really obscene costs, as an extra
> stopgap so we don't have to wait for the thing to execute before we decide
> it's too costly.  Surely there must be some line we can draw.

I actually think there isn't any clean line. Obscene is in the eye of
the beholder. Frankly, I think this discussion is getting off into
the weeds. It would be nice, perhaps, to have a feature that will
detect when the generic plan is the suxxor and attempt to find a
better one, but that's really, really hard for a whole bunch of
reasons. Bruce's suggestion that we should provide some user control
over whether we plan at bind time or execute time seems likely to be
(1) considerably simpler to implement, (2) considerably easier to get
consensus on, and (3) capable of giving 90% of the benefit for an only
higher inconvenience factor.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 03:40:35
Message-ID: 28723.1267155635@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I actually think there isn't any clean line. Obscene is in the eye of
> the beholder. Frankly, I think this discussion is getting off into
> the weeds. It would be nice, perhaps, to have a feature that will
> detect when the generic plan is the suxxor and attempt to find a
> better one, but that's really, really hard for a whole bunch of
> reasons. Bruce's suggestion that we should provide some user control
> over whether we plan at bind time or execute time seems likely to be
> (1) considerably simpler to implement, (2) considerably easier to get
> consensus on, and (3) capable of giving 90% of the benefit for an only
> higher inconvenience factor.

It's not going to be easier to implement. Yeah, it would be easy to
provide a global switch via a GUC setting, but that's not going to be
helpful, because this is the sort of thing that really needs to be
managed per-query. Almost any nontrivial application is going to have
some queries that really need the custom plan and many that don't.
If people just turn the GUC on we might as well throw away the plan
caching mechanism altogether. But putting support for a per-query level
of control into the protocol (and then every client library) as well as
every PL is going to be painful to implement, and even more painful to
use.

I still like the idea of automatically replanning with the known
parameter values, and noting whether the result plan was estimated to be
noticeably cheaper than the generic plan, and giving up on generating
custom plans if we didn't observe any such win over N tries.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 04:01:14
Message-ID: 603c8f071002252001n6b2467bal2e6bae26a9a2b79a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 25, 2010 at 10:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I actually think there isn't any clean line.  Obscene is in the eye of
>> the beholder.  Frankly, I think this discussion is getting off into
>> the weeds.  It would be nice, perhaps, to have a feature that will
>> detect when the generic plan is the suxxor and attempt to find a
>> better one, but that's really, really hard for a whole bunch of
>> reasons.  Bruce's suggestion that we should provide some user control
>> over whether we plan at bind time or execute time seems likely to be
>> (1) considerably simpler to implement, (2) considerably easier to get
>> consensus on, and (3) capable of giving 90% of the benefit for an only
>> higher inconvenience factor.
>
> It's not going to be easier to implement.  Yeah, it would be easy to
> provide a global switch via a GUC setting, but that's not going to be
> helpful, because this is the sort of thing that really needs to be
> managed per-query.  Almost any nontrivial application is going to have
> some queries that really need the custom plan and many that don't.
> If people just turn the GUC on we might as well throw away the plan
> caching mechanism altogether.

I agree. A GUC is a really bad idea.

> But putting support for a per-query level
> of control into the protocol (and then every client library) as well as
> every PL is going to be painful to implement, and even more painful to
> use.

I suppose I should have learned by now not to argue with you over
technical points, but I don't see why this should be painful. I mean,
it'll be a lot of work and it'll in the end touch a lot of different
parts of the code, but work != pain, and I don't see any reason why
the problem can't be attacked incrementally. I'm also deeply
unconvinced that any other solution will be as satisfactory.

> I still like the idea of automatically replanning with the known
> parameter values, and noting whether the result plan was estimated to be
> noticeably cheaper than the generic plan, and giving up on generating
> custom plans if we didn't observe any such win over N tries.

Isn't part of the problem here precisely that the cost estimates for
the generic plan might not be too accurate? The only instances of
this problem I've run across are the ones where MCVs need a different
treatment, and the problem isn't necessarily that the new estimate is
cheaper so much as that the old estimate isn't going to turn out as
predicted. Also, there's no guarantee that the distribution of values
tried will be random - there's the case where non-MCVs are tried for
the first N times and then a non-MCV is tried on try N+1.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 04:25:46
Message-ID: 7882.1267158346@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Feb 25, 2010 at 10:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I still like the idea of automatically replanning with the known
>> parameter values, and noting whether the result plan was estimated to be
>> noticeably cheaper than the generic plan, and giving up on generating
>> custom plans if we didn't observe any such win over N tries.

> Isn't part of the problem here precisely that the cost estimates for
> the generic plan might not be too accurate?

No, the estimates for the generic plan are typically fine *in
themselves*; they only look bad when you compare them to what you can do
with knowledge of specific parameter values. An example is that the
default selectivity estimate for a range query (WHERE x > something AND
x < somethingelse) is 0.005. In a large number of real cases, the
actual selectivity is way smaller, and you can determine that if you
know the actual comparison constants. But it's tough to argue for
decreasing the default guess --- it's already small enough that you
could get screwed badly in the other direction if you queried a wide
range.

There may be some cases where the generic plan is wrongly estimated to
be cheaper than a custom plan that's actually better, but I haven't seen
many. If that were happening a lot then people would be reporting that
the advice to force a replan via EXECUTE or whatever doesn't help.
I don't think that there is any body of evidence at all that would
justify undertaking extremely expensive development of an extremely
painful-to-use feature to deal with that type of case.

> Also, there's no guarantee that the distribution of values
> tried will be random - there's the case where non-MCVs are tried for
> the first N times and then a non-MCV is tried on try N+1.

Sure, there are always going to be cases where you lose. Pushing the
responsibility onto the user doesn't really fix that though. It's not
apparent to me that users are in that much better position than we are
to determine when a custom plan is helpful.

BTW, if it wasn't clear, I would be in favor of allowing the cutoff N to
be adjustable, as well as the cost ratio that's considered to constitute
a win. So there would be some wiggle room to deal with that type of
situation.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 04:28:14
Message-ID: 34d269d41002252028n196327a0kc2ad47add6bab422@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 25, 2010 at 20:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It's not going to be easier to implement.  Yeah, it would be easy to
> provide a global switch via a GUC setting, but that's not going to be
> helpful, because this is the sort of thing that really needs to be
> managed per-query.  Almost any nontrivial application is going to have
> some queries that really need the custom plan and many that don't.
> If people just turn the GUC on we might as well throw away the plan
> caching mechanism altogether.  But putting support for a per-query level
> of control into the protocol (and then every client library) as well as
> every PL is going to be painful to implement, and even more painful to
> use.

Not to mention you can already do this more or less client side with a
nice driver. For example with DBD::Pg i can say:

$sth = $dbh->prepare('select * from foo where x = ?', {'pg_server_prepare'=>1});

To get a prepared plan (it is also the default).

If for a particular query I know that I will get a better plan without
prepare, I can just change that 1 to a 0. Or I can set it globally
via $dbh->{'pg_server_prepare'} = 0;

In other words im not quite sure what this would buy us.


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 04:46:20
Message-ID: 34d269d41002252046n2b117436u845a25f892c02629@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 25, 2010 at 21:28, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> Not to mention you can already do this more or less client side with a
> nice driver.
> [ uninformed noise ... ]

I did seem to miss the part where everyone thinks this is a crock...
But I don't remember seeing numbers on parse time or how much
bandwidth this would potentially save. People seem to think it would
be a big savings for just those 2 reasons? Or did I miss some other
benefit?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 05:11:56
Message-ID: 16678.1267161116@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> I did seem to miss the part where everyone thinks this is a crock...
> But I don't remember seeing numbers on parse time or how much
> bandwidth this would potentially save. People seem to think it would
> be a big savings for just those 2 reasons? Or did I miss some other
> benefit?

Uh, no, this isn't about saving either parse time or bandwidth.
The discussion is about when to expend more planning time in hopes
of getting better plans.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 06:29:41
Message-ID: 34d269d41002252229o2381fa19wbd898db756709d3a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 25, 2010 at 22:11, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> Uh, no, this isn't about saving either parse time or bandwidth.
> The discussion is about when to expend more planning time in hopes
> of getting better plans.

This is what im tripping over:

> > Bruce's suggestion that we should provide some user control over whether we plan at bind time or execute time

Let me see if I can sum up what I was trying to say:

Prepared plans + bind plan (what we have now):
Use: when you have a query that takes a long time to plan
Problems: if you use parameters you might no get a good plan
Solution: if you have stable parameters dont pass them as such, inline them
Better: If we could figure out and make we could make better plans
on the fly and use them

[ aka if you have a good driver you can easily control if its a
prepared statement or not without changing how you quote or inline
your sql ]

Prepared plans + exec plan (new guc/ protocol thing):
Use: not quite sure
Problems: slow because it would replan every time
Solutions: use a prepared plan with the appropriate things not
parametrized...?

[ aka we already have this, its called dont use a prepared statement ]

Whats the benefit of prepare + exec plan vs just inlining? What do
you save? Some parse time? Some bandwidth? Yeah if its a smart knob
that does things like "Oh, I see I have an mvc for that param, ill
give you a better plan". OK, But the knob no longer means plan at
execute time. It more in the realm of what Tom is suggesting IMHO.
Anyway I feel like im probably just violently agreeing.


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 08:13:31
Message-ID: 4B8782AB.4050504@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

My preference is to deal with the specific value vs generic value issue.

For this issue, it can affect performance even if PREPARE/EXECUTE is
execute exactly once.

In the last case I saw, a certain query was executing once every second,
and with a specific value it would take < 1 ms, and with a generic value
it would take > 50 ms. That's 5% system load for one CPU core to do
nothing. After analysis, it was clearly a "common value" vs "not common
value" problem. For this particular table, it stored an integer, but
only used two values across something like 100k rows. The query was for
a third value that did not exist. The difference was a sequential scan
vs an index lookup.

I do not know whether the application was doing PREPARE/EXECUTE each
time, or whether it was doing PREPARE once in advance and then EXECUTE
each time after that, but I don't think it matters, either, as I think
both cases deserve attention, and the problem is the same in both cases.
Even one generic plan run costs 50+ the cost of both planning and execution.

Re-planning a generic plan with another generic plan may generate zero
benefit, with a measurable cost. More on this after...

All the points about ms seem invalid to me. There are many reason why ms
could increase, and many of them have nothing to do with plan
efficiency. Again, re-planning due to a high ms, or a high ratio of ms,
does not indicate that re-planning will improve the success of the plan.
The planning process does not measure ms or predict ms.

My idea of an optimal system is as follows:

1) Prepare gathers and caches data about the tables involved in the
query, including column statistics that are likely to be required during
the planning process, but prepare does not running the planning process.

2) Execute runs the planning process re-using data cached by prepare,
and then executes the plan.

3) Advanced: Execute may cache the selected plan for re-use only if it
can identify a set of criteria that would allow the selected plan to be
tested and invalidated if the parameter nature has changed such that a
re-planning would likely choose another plan. Execute may cache multiple
plans against a prepared statement, provided that each cached plan
identify invalidation criteria.

4) Even more Advanced: Prepare may identify that elements of the plan
that will always be the same, no matter what parameter is specified, and
cache these results for substitution into the planning phase when
execute is run. (Effectively lifting the planning from execute to
prepare, but only where it makes obvious [= cheap to detect] sense)

This treats the whole statement planning and execution as a pipeline,
lengthening the pipeline, and adjusting some of the pipeline elements
from prepare to execute. It has the benefit of having fast
prepare/execute whether execute is invoked only once or many times. The
effect is that all statements are specifically planned, but specific
plans are re-used wherever possible.

To support the case of changing data, I think the analyze process should
be able to force invalidation of cached plans, and force the cached
column statistics for prepared statements to be invalidated and
re-queried on demand, or push new statistics directly into the prepared
statements. It makes no sense (to me) to re-plan for the same parameters
until an analyze is done, so this tells me that analyze is the event
that should cause the re-plan to occur.

I think anything less than the above will increasing the performance of
some queries while describing the performance of other queries. It might
be possible to guess which queries are more valuable to people than
others, and hard code solutions for these specific queries, but hard
coding solutions will probably always be a "lowest hanging fruit" solution.

After writing this, I'm pretty sure that implementation of the above
into PostgreSQL would be difficult, and it could be a valid concern that
the investment is not worth the benefit at this time. It's a tough problem.

My $0.01 CDN. :-)

Cheers,
mark


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 10:20:17
Message-ID: 4B87A061.40206@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke wrote:

> Re-planning a generic plan with another generic plan may generate zero
> benefit, with a measurable cost. More on this after...

Nobody's talking about doing that any more. I proposed it initially
because I didn't know about changes that made it unnecessary.

> All the points about ms seem invalid to me. There are many reason why ms
> could increase, and many of them have nothing to do with plan
> efficiency. Again, re-planning due to a high ms, or a high ratio of ms,
> does not indicate that re-planning will improve the success of the plan.
> The planning process does not measure ms or predict ms.

That's true, but missing some very basic points about the idea: one, if
we can tell that a query is going to be expensive, then the cost of
re-planning it is marginal. Two, if we can tell that a query is going
to be expensive, then we stand a lot to gain if re-planning turns out to
be useful. It follows that we can afford to re-plan on the off-chance,
without anything more than a vague orders-of-magnitude idea of what
"expensive" means.

What Tom said validates a big assumption I've been making: that we do in
fact have a decent shot at telling in advance that a query is going to
be expensive. Which means we have a decent shot at stopping your 100ms
query from taking seconds just because you prepared it and are missing
out on that tiny partial index. That would be worth the extra planning
time at a 1% hit rate, and there's not much downside if we don't reach that.

> My idea of an optimal system is as follows:
>
> 1) Prepare gathers and caches data about the tables involved in the
> query, including column statistics that are likely to be required during
> the planning process, but prepare does not running the planning process.

It sounds to me like you're in the process of inventing another planning
process. Developer time aside, how much CPU time can you afford to
throw at this?

I don't see any reason to argue over what would be optimal when so much
information is still missing. It just makes the problem look harder
than it is. To me, our best shot at getting something useful is to stay
simple and defensive. After that, if there is still a need, we'll have
code to help us gather more data and figure out how to make it better.
Nothing wrong with the lowest-hanging fruit.

Jeroen


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 14:49:35
Message-ID: 4B87DF7F.4000100@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 05:20 AM, Jeroen Vermeulen wrote:
> Mark Mielke wrote:
>
>> All the points about ms seem invalid to me. There are many reason why
>> ms could increase, and many of them have nothing to do with plan
>> efficiency. Again, re-planning due to a high ms, or a high ratio of
>> ms, does not indicate that re-planning will improve the success of
>> the plan. The planning process does not measure ms or predict ms.
>
> That's true, but missing some very basic points about the idea: one,
> if we can tell that a query is going to be expensive, then the cost of
> re-planning it is marginal. Two, if we can tell that a query is going
> to be expensive, then we stand a lot to gain if re-planning turns out
> to be useful. It follows that we can afford to re-plan on the
> off-chance, without anything more than a vague orders-of-magnitude
> idea of what "expensive" means.
>
> What Tom said validates a big assumption I've been making: that we do
> in fact have a decent shot at telling in advance that a query is going
> to be expensive. Which means we have a decent shot at stopping your
> 100ms query from taking seconds just because you prepared it and are
> missing out on that tiny partial index. That would be worth the extra
> planning time at a 1% hit rate, and there's not much downside if we
> don't reach that.

You trimmed most of my concerns. :-) Problems:

1) If I do a PREPARE/EXECUTE, the above lengthens the process from
1 generic planning plus 1 generic plan execute to 1 generic planning, 1
specific planning, and 1 specific plan execution. This is still overall
longer than a regular statement and it still may be longer than the
original generic plan on its own. The hope is that the analysis is
somehow detecting the scenario where a generic plan makes no sense, but
the criteria is not about whether the generic plan actually does make
sense - the criteria is "can the customer afford to wait longer for us
to second guess ourselves?" It's a guess. As a guess, it means sometimes
it will be right, and sometimes it will be wrong.

2) Only the "order of magnitude" (by estimate) plans will benefit.
If you set the number to 100X, then most plans won't benefit. If you set
it to less than 100X, you increase the chance of guessing wrong in other
cases. In any case, there is still no guarantee that a specific plan
will be faster, so even in the 100X case, the overall results could be
slower - it's just that you've decided the customer can afford to wait
longer.

>> My idea of an optimal system is as follows:
>>
>> 1) Prepare gathers and caches data about the tables involved in the
>> query, including column statistics that are likely to be required
>> during the planning process, but prepare does not running the
>> planning process.
>
> It sounds to me like you're in the process of inventing another
> planning process. Developer time aside, how much CPU time can you
> afford to throw at this?

I already said I don't think PostgreSQL could easily evolve here.
However, I wanted to point out that the problem may be architectural.

As for developer time and CPU time, that's not really relevant. If
PREPARE/EXECUTE could be reliably sped up, than the savings is probably
measure in millions of dollars or more, as it is widely used by many
applications throughout the day on hundreds of thousands of computers.

Oh, you mean is it worth scratching my itch? :-) Not really. I was
thinking about it yesterday and decided that such a major change might
just as easily result in a new database engine, and I didn't want to go
there.

Still, if some clever person agrees with me that it is an architecture
problem, and that PostgreSQL could benefit from a clean "from scratch"
caching mechanism for statements (note that what I described could
probably be extended to support automatic prepare of every statement,
and matching of query to prepared statement based on text, similar to
MySQL query caching), and can come up with a way to do this using the
existing architecture - that would be great. Or, they can tell me "too
hard" as you are. That's fine too... :-)

> I don't see any reason to argue over what would be optimal when so
> much information is still missing. It just makes the problem look
> harder than it is. To me, our best shot at getting something useful
> is to stay simple and defensive. After that, if there is still a
> need, we'll have code to help us gather more data and figure out how
> to make it better. Nothing wrong with the lowest-hanging fruit.

What information is missing?

PREPARE sucks in many known situations. It is a documented fact. :-)

Will "guessing" at when the user can afford to wait longer improve the
situation? Maybe or often, but not always.

Cheers,
mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 15:07:00
Message-ID: 27054.1267196820@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> Let me see if I can sum up what I was trying to say:
> [ this can be solved by using or avoiding prepared statements ]

Not really. The place where that argument really fails is inside
server-side functions: you don't get to use query submission protocol
there. But even for client-submitted queries, it's often the case
that there is only one convenient way to do it given a particular
software stack on the client side. If there is more than one way
they're usually radically different notationally, and some of them
might be vulnerable to SQL injection. The comparable thing in plpgsql
is EXECUTE versus direct execution of a query: they're very different
notationally, so unpleasant to convert between, and it's easy to
create a security hole when interpolating parameters in EXECUTE.

I think this is basically a planner problem and should be fixed in the
planner, not by expecting users to make significant changes in
application logic in order to create an indirect effect.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 15:27:30
Message-ID: 34d269d41002260727h4f85286awd369ba01f895f169@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 08:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>> Let me see if I can sum up what I was trying to say:
>> [ this can be solved by using or avoiding prepared statements ]
>
> Not really.  The place where that argument really fails is inside
> server-side functions: you don't get to use query submission protocol
> there.

| You look around, its dark... You notice someone coming towards you.
He tosses you a torch. You kindly thank him.

Ahh. ISTM re-planning those every time is entirely the wrong knob.

> If there is more than one way
> they're usually radically different notationally, and some of them
> might be vulnerable to SQL injection.

Good point. However this is why I was talking about client side
drivers, they can provide an api to do this correctly and
conveniently. The fact that some have not might hint that its not
really worth it most of the time?

> I think this is basically a planner problem and should be fixed in the
> planner, not by expecting users to make significant changes in
> application logic in order to create an indirect effect.

Well, of course you do. We all know how you feel about planner hints
:) [ Tongue-in-cheek ]

Anyway, I completely agree-- its a much more elegant and workable solution.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 16:27:11
Message-ID: 28767.1267201631@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> Will "guessing" at when the user can afford to wait longer improve the
> situation? Maybe or often, but not always.

There is no way to eliminate "guessing". The entire point here is that
we don't know whether generating a custom plan will provide a win over
not doing so, until after we've done it (and already taken the planning
time hit). We could possibly put in heuristic tests based on the
query's use of certain features; but that's still guessing, and would
take nonzero time in itself.

I concur with Jeroen's feeling that going for a simple approach first
is the way to attack this. We could design and build something vastly
more complex, then find out that it doesn't actually work much better.

Also, I think there is a lot of confusion here over two different
issues: generic plan versus parameter-specific plan, and bad planner
estimates leading to a wrong plan choice. While the latter is certainly
an issue sometimes, there is no reason to believe that it affects
prepared statements worse than non-prepared ones. So I think that
designing a fix for prepared statements on the assumption that you can't
trust the planner's estimates is solving the wrong problem.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 16:50:37
Message-ID: 603c8f071002260850r19c2c9e4gcea15621b653ae92@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 1:29 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> Prepared plans + exec plan (new guc/ protocol thing):
>  Use: not quite sure
>  Problems: slow because it would replan every time
>  Solutions: use a prepared plan with the appropriate things not
> parametrized...?
>
> [ aka we already have this, its called dont use a prepared statement ]

The point is sometimes you'd like to replan every time, but not
reparse every time. There's no way to do that ATM.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 16:51:53
Message-ID: 603c8f071002260851k6ef85d23w1edbdd8e0eae1a3a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 10:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I think this is basically a planner problem and should be fixed in the
> planner, not by expecting users to make significant changes in
> application logic in order to create an indirect effect.

I would agree if I thought that were possible, but I'm skeptical about
your proposed solution.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 16:54:10
Message-ID: 603c8f071002260854v69716cfdw5f28cc76a167d0b6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Also, I think there is a lot of confusion here over two different
> issues: generic plan versus parameter-specific plan, and bad planner
> estimates leading to a wrong plan choice.  While the latter is certainly
> an issue sometimes, there is no reason to believe that it affects
> prepared statements worse than non-prepared ones.  So I think that
> designing a fix for prepared statements on the assumption that you can't
> trust the planner's estimates is solving the wrong problem.

The two issues seem intimately connected to me. Of course, estimates
can be inaccurate for many reasons, but ONE of those reasons is that
the planner is optimizing for an unknown parameter value which has
very different distribution from the actually supplied value.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 17:01:59
Message-ID: 29600.1267203719@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 26, 2010 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Also, I think there is a lot of confusion here over two different
>> issues: generic plan versus parameter-specific plan, and bad planner
>> estimates leading to a wrong plan choice. While the latter is certainly
>> an issue sometimes, there is no reason to believe that it affects
>> prepared statements worse than non-prepared ones. So I think that
>> designing a fix for prepared statements on the assumption that you can't
>> trust the planner's estimates is solving the wrong problem.

> The two issues seem intimately connected to me. Of course, estimates
> can be inaccurate for many reasons, but ONE of those reasons is that
> the planner is optimizing for an unknown parameter value which has
> very different distribution from the actually supplied value.

Right, but if the parameter is unknown then its distribution is also
unknown. In any case that's just nitpicking, because the solution is
to create a custom plan for the specific value supplied. Or are you
suggesting that we should create a way for users to say "here is the
expected distribution of this parameter", and then try to fold that into
the planner estimates? I think the uptake on that would be about nil;
not to mention that it fails to fix several of the most pressing
problems, such as LIKE indexing and partition selection, where all the
distribution information in the world doesn't help. You need the exact
value.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 18:11:44
Message-ID: 1005.1267207904@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 26, 2010 at 10:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think this is basically a planner problem and should be fixed in the
>> planner, not by expecting users to make significant changes in
>> application logic in order to create an indirect effect.

> I would agree if I thought that were possible, but I'm skeptical about
> your proposed solution.

Fair enough --- maybe it will work well enough, or maybe it won't.
But the same can be said of every other proposal that's been made.
I'm in favor of trying the simpler approaches first.

regards, tom lane


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 18:53:38
Message-ID: 4B8818B2.8030703@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 11:27 AM, Tom Lane wrote:
> Also, I think there is a lot of confusion here over two different
> issues: generic plan versus parameter-specific plan, and bad planner
> estimates leading to a wrong plan choice. While the latter is certainly
> an issue sometimes, there is no reason to believe that it affects
> prepared statements worse than non-prepared ones. So I think that
> designing a fix for prepared statements on the assumption that you can't
> trust the planner's estimates is solving the wrong problem.
>

Just to point out that I agree, and as per my original post, I think the
only time prepared statements should be re-planned for the statistics
case, is after 'analyze' has run. That sounds like a quicker solution,
and a much smaller gain. After 'analyze' of an object, invalidate all
cached plans for prepared statements that rely on that object and
require a re-plan. I doubt this will help me or many others very often.
It's something that should be done some day, but I don't recall ever
concluding that a performance problem I was experiencing was related to
using prepared statements too long. Also, the client is able to figure
this out. The client can choose to free prepared statements after 1
minute or 1000 calls. It's not really a problem.

It also has nothing to do with trust of the planner's estimates. Given
the same criteria, the planner should come up with the same best plan
most or all of the time. Trial and error planning, with the exception of
hugely complicated plans that cannot be produced in a finite time frame,
does not appeal to me at all. I do trust the planner's estimates.

The issue of specific parameter is the one I think most of us would
benefit from, and I think the most effective benefit is to not create
generic plans. I would prefer a prepare with specific plan and re-plan
when the specific plan does not apply, over generic plan, every time.
This has nothing to do with "time to prepare" or a ratio of "time to
prepare" vs "time to execute", or plans that are expected to take some
time to execute. The fact that I can run a PREPARE/EXECUTE, and SELECT,
and with only one invocation see a difference of over 100X shows that
generic plans is just not the right approach. It works according to
spec, but it is not practical under the current model.

Generic plans is the problem. My post was to bring attention to this, as
I see most comments focusing on an assumption that generic plans provide
value, and specific plans should only be used when generic plans are
expected to take a while to execute. It's walking around the problem
that the idea of a generic plan is just wrong. The only time a generic
plan is right, is when the specific plan would result in the same.

Cheers,
mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 18:59:22
Message-ID: 2060.1267210762@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> Just to point out that I agree, and as per my original post, I think the
> only time prepared statements should be re-planned for the statistics
> case, is after 'analyze' has run. That sounds like a quicker solution,
> and a much smaller gain. After 'analyze' of an object, invalidate all
> cached plans for prepared statements that rely on that object and
> require a re-plan.

Please note that that has been happening since 8.3, which is probably
why you haven't detected a problem.

> ... It's walking around the problem
> that the idea of a generic plan is just wrong. The only time a generic
> plan is right, is when the specific plan would result in the same.

I think that's a significant overstatement. There are a large number
of cases where a custom plan isn't worth it, even if it doesn't generate
exactly the same plan.

regards, tom lane


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 19:22:37
Message-ID: 4B881F7D.2020506@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 01:59 PM, Tom Lane wrote:
>> ... It's walking around the problem
>> that the idea of a generic plan is just wrong. The only time a generic
>> plan is right, is when the specific plan would result in the same.
>>
> I think that's a significant overstatement. There are a large number
> of cases where a custom plan isn't worth it, even if it doesn't generate
> exactly the same plan.
>

There must be some way to lift the cost of planning out of the plan
enumeration and selection phase, such that only plan enumeration and
selection is run at execute time. In most cases, plan enumeration and
selection, provided that all data required to make these decisions is
all cached in data structures ready to go, should be very fast? Right?
Wrong? If right, my original post suggested that prepare should do the
parts of planning which are fixed, and not change based on the input
parameters, while execute should do the dynamic parts that would change
based on the input parameters.

By "not worth it", do you mean development effort or run time?

For development effort, it would definitely be worth it in the grand
scheme of things, but perhaps not worth it to specific individuals.

For run time, I've having trouble seeing the situation where it would
not be worth it. In the case that the resulting plan is the same (custom
vs generic) there should be no cost. In the case that the plan is
different, I think the difference proves that it is worth it. The case
where it wouldn't be worth it would be if a prepared statement was
called many times with many different parameters, and each set of
parameters required a re-plan - but my experience in this regard tells
me that the current model is to choose a sub-optimal plan, and the
entire query will run much slower than the planning time, on every
execute. We wouldn't be having this discussion if generic plans were
considered adequate. So, I feel that it is worth it in this case as well.

It's the development effort that is the problem. I can't do it, and I
can't make you do it. If you say "too hard", there isn't anything I can
do about it. :-)

Cheers,
mark


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 19:23:22
Message-ID: 4B881FAA.9090705@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 01:59 PM, Tom Lane wrote:
> Mark Mielke<mark(at)mark(dot)mielke(dot)cc> writes:
>
>> Just to point out that I agree, and as per my original post, I think the
>> only time prepared statements should be re-planned for the statistics
>> case, is after 'analyze' has run. That sounds like a quicker solution,
>> and a much smaller gain. After 'analyze' of an object, invalidate all
>> cached plans for prepared statements that rely on that object and
>> require a re-plan.
>>
> Please note that that has been happening since 8.3, which is probably
> why you haven't detected a problem.
>

Excellent, and sorry for missing the release note on this.

Thanks,
mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 19:57:56
Message-ID: 3138.1267214276@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> There must be some way to lift the cost of planning out of the plan
> enumeration and selection phase, such that only plan enumeration and
> selection is run at execute time. In most cases, plan enumeration and
> selection, provided that all data required to make these decisions is
> all cached in data structures ready to go, should be very fast? Right?

Huh? What exactly do you think the cost of planning is, if not
enumeration and selection? There isn't very much that's cacheable,
at least not in any sanely-sized cache.

> By "not worth it", do you mean development effort or run time?

Run time. The development cost of what you are proposing is negligible:
just rip out the plan cache altogether. I don't believe it would be a
performance win though.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:02:40
Message-ID: 34d269d41002261202n492cc8bcoefa6ec77e507fb58@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 09:50, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 26, 2010 at 1:29 AM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> Prepared plans + exec plan (new guc/ protocol thing):
>>  Use: not quite sure
>>  Problems: slow because it would replan every time
>>  Solutions: use a prepared plan with the appropriate things not
>> parametrized...?
>>
>> [ aka we already have this, its called dont use a prepared statement ]
>
> The point is sometimes you'd like to replan every time, but not
> reparse every time.  There's no way to do that ATM.

So what you save on parse time? Maybe that's worth it. I've never
run the numbers nor have I seen them in this thread. I probably
missed em... My _hunch_ is planning will on average take
significantly longer than parse time (read in the noise of plan time).
But that's unfounded :) I can certainly imagine cases where you have
HUGE queries where the parse time too slow-- wont the plan most of the
time be an order of magnitude slower? Anyway Ill stop until I get a
chance to do _some_ kind of benchmarking, I'm really quite clueless
here.


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:11:16
Message-ID: 4B882AE4.9050400@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Right, but if the parameter is unknown then its distribution is also
> unknown. In any case that's just nitpicking, because the solution is
> to create a custom plan for the specific value supplied. Or are you
> suggesting that we should create a way for users to say "here is the
> expected distribution of this parameter", and then try to fold that into
> the planner estimates?
Or instead of letting users give the distribution, gather it
automatically in some plan statistics catalog? I suspect in most
applications queries stay the same for months and maybe years, so after
some number of iterations it is possible to have decent call statistics
/ parameter distributions. Maybe the the parameter value distribution
could even be annotated with actual cached plans.

regards,
Yeb Havinga


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:26:05
Message-ID: 4B882E5D.9080209@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 02:57 PM, Tom Lane wrote:
> Mark Mielke<mark(at)mark(dot)mielke(dot)cc> writes:
>
>> There must be some way to lift the cost of planning out of the plan
>> enumeration and selection phase, such that only plan enumeration and
>> selection is run at execute time. In most cases, plan enumeration and
>> selection, provided that all data required to make these decisions is
>> all cached in data structures ready to go, should be very fast? Right?
>>
> Huh? What exactly do you think the cost of planning is, if not
> enumeration and selection? There isn't very much that's cacheable,
> at least not in any sanely-sized cache.
>

I think most operations, including this one, can be broken into a fixed
portion and a dynamic portion. The PREPARE should concern itself only
with the fixed portion, and should leave the dynamic portion to EXECUTE.
At present, the "planning process" is one big blob.

Here are parts that can be done "fixed":

1) Statement parsing and error checking.
2) Identification of tables and columns involved in the query.
3) Query the column statistics for involved columns, to be used in plan
cost estimation now and later.
4) Determine plan constraints under which elements of the plan must be
executed a certain way (something like constant folding for a compiler),
or for which parameter substitution would not impact the outcome.
5) Identify the elements of the plan that still require plan enumeration
and plan selection, to be used in a later part of the pipeline.

At a minimum, I am suggesting that 1), 2), and 3) should take a chunk
out of the planning process. I think 4) and 5) are more complex but
still valuable in terms of extracting the fixed portion out of the
planning process.

I think an assumption is being made that the planning process is an
atomic unit that cannot be turned into a pipeline or assembly line. I
think this assumption was what originally tied PREPARE = PLAN, and
EXECUTE = RUN. I think this assumption is leading to the conclusion that
EXECUTE should re-plan. I also expect that this assumption is tightly
woven into the current implementation and changing it would require some
amount of re-architecture. :-)

>> By "not worth it", do you mean development effort or run time?
>>
> Run time. The development cost of what you are proposing is negligible:
> just rip out the plan cache altogether. I don't believe it would be a
> performance win though.
>

That's not my proposal, though. I'm suspecting you didn't read it. :-)

I'm fine with you saying "too hard and not worth my development effort"
after you read it. I agree it would be a lot of work.

But if the conclusion is that the current architecture is the best that
can be had, and the decision is only about when to do a custom re-plan
or when to use the generic plan, I am putting my opinion out there that
the generic plan has always been a compromise, and it will always be a
compromise, and that this discussion exists primarily because the
compromise is not adequate in many real world scenarios.

And that all said, I think I am challenging the status quo and ticking
people off. So while my intent is to challenge the status quo, it is not
to tick people off. So, please let me know if you would like me to
continue, or if you have already written this off. :-)

Cheers,
mark


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:29:11
Message-ID: 4B882F17.8050909@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 03:11 PM, Yeb Havinga wrote:
> Tom Lane wrote:
>> Right, but if the parameter is unknown then its distribution is also
>> unknown. In any case that's just nitpicking, because the solution is
>> to create a custom plan for the specific value supplied. Or are you
>> suggesting that we should create a way for users to say "here is the
>> expected distribution of this parameter", and then try to fold that into
>> the planner estimates?
> Or instead of letting users give the distribution, gather it
> automatically in some plan statistics catalog? I suspect in most
> applications queries stay the same for months and maybe years, so
> after some number of iterations it is possible to have decent call
> statistics / parameter distributions. Maybe the the parameter value
> distribution could even be annotated with actual cached plans.

The problem with the last - actual cached plans - is that it implies the
other aspect I have been suggesting: In order to have a custom cached
plan, the primary model must be to use custom plans. If PREPARE/EXECUTE
uses generic plans normally, than the only cached plans available will
be generic plans.

Cheers,
mark


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:46:27
Message-ID: 4B883323.5090701@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke wrote:
> On 02/26/2010 03:11 PM, Yeb Havinga wrote:
>> Or instead of letting users give the distribution, gather it
>> automatically in some plan statistics catalog? I suspect in most
>> applications queries stay the same for months and maybe years, so
>> after some number of iterations it is possible to have decent call
>> statistics / parameter distributions. Maybe the the parameter value
>> distribution could even be annotated with actual cached plans.
>
> The problem with the last - actual cached plans - is that it implies
> the other aspect I have been suggesting: In order to have a custom
> cached plan, the primary model must be to use custom plans. If
> PREPARE/EXECUTE uses generic plans normally, than the only cached
> plans available will be generic plans.
I should have been clearer, with 'actual cached plans' I meant 'cached
plans planned with actual parameters' or 'cached custom plans'. It makes
no sense to annotate points or intervals in a gathered value
distribution with generic plans.

regards,
Yeb Havinga


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 20:47:55
Message-ID: 4127.1267217275@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Mielke <mark(at)mark(dot)mielke(dot)cc> writes:
> Here are parts that can be done "fixed":

> 1) Statement parsing and error checking.
> 2) Identification of tables and columns involved in the query.

The above two are done in the parser, not the planner.

> 3) Query the column statistics for involved columns, to be used in plan
> cost estimation now and later.

What makes you think that these can be cached any more effectively than
they are now?

> 4) Determine plan constraints under which elements of the plan must be
> executed a certain way (something like constant folding for a compiler),
> or for which parameter substitution would not impact the outcome.

Some marginal savings achievable there, perhaps. But you can't just
push const-folding upstream, because one of the purposes of it is to
const-fold expressions involving parameter values, if the planner is
allowed to assume the parameter values are known. So while we could
do a pass of const-folding upstream (in the rewriter say), we would
still need a pass at plan time. Not very clear that this nets out
as a win.

> 5) Identify the elements of the plan that still require plan enumeration
> and plan selection, to be used in a later part of the pipeline.

[ shrug... ] In practice that will typically be "all of them". The only
time it wouldn't be is when a parameter is only used in a join
condition, but I think the majority of practical cases have parameters
that are involved in relation scan conditions. Even if you could cache
the path results for some of the relations involved in a query, the
required size of the cache would be staggering.

> And that all said, I think I am challenging the status quo and ticking
> people off. So while my intent is to challenge the status quo, it is not
> to tick people off. So, please let me know if you would like me to
> continue, or if you have already written this off. :-)

It would be more productive if you spent awhile studying the code as it
actually exists now, and then started to propose rearchitecting. What
I'm reading here seems to be largely uninformed speculation.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-26 22:40:06
Message-ID: 603c8f071002261440k15ac3cddjc0a617ceb33fd3e8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 12:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Feb 26, 2010 at 11:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Also, I think there is a lot of confusion here over two different
>>> issues: generic plan versus parameter-specific plan, and bad planner
>>> estimates leading to a wrong plan choice.  While the latter is certainly
>>> an issue sometimes, there is no reason to believe that it affects
>>> prepared statements worse than non-prepared ones.  So I think that
>>> designing a fix for prepared statements on the assumption that you can't
>>> trust the planner's estimates is solving the wrong problem.
>
>> The two issues seem intimately connected to me.  Of course, estimates
>> can be inaccurate for many reasons, but ONE of those reasons is that
>> the planner is optimizing for an unknown parameter value which has
>> very different distribution from the actually supplied value.
>
> Right, but if the parameter is unknown then its distribution is also
> unknown.

I must be losing my mind. The distribution of the parameter will be
known at execution time because a specific value will be supplied. Of
course, it's not known at prepare time.

> In any case that's just nitpicking, because the solution is
> to create a custom plan for the specific value supplied.  Or are you
> suggesting that we should create a way for users to say "here is the
> expected distribution of this parameter", and then try to fold that into
> the planner estimates? I think the uptake on that would be about nil;
> not to mention that it fails to fix several of the most pressing
> problems, such as LIKE indexing and partition selection, where all the
> distribution information in the world doesn't help.  You need the exact
> value.

No, I'm not suggesting that.

Basically, what I really want here is some kind of keyword or other
syntax that I can stick into a PL/pgsql query that requests a replan
on every execution. It's easy to identify the cases where this is
needed: they are precisely the cases where my function runs for a
small eternity. I realize that you can use EXECUTE for this, but as
you pointed out upthread, that's not without its share of pain.
Exposing the same functionality via other interfaces (e.g. PQprepare,
other PLs) would be nice too, but for me personally, PL/pgsql is the
one that keeps biting me in the rear end.

One particularly ugly misadventure in this area had me write a
function which updated a bunch of data in user tables either for one
particular widget, if a widget id was passed in as an argument, or for
all widgets, if NULL was passed in. I did this by writing ...some big
hairy query... WHERE ... various conditions ... AND (id = $1 OR $1 IS
NULL). The point was that sometimes the caller knew that only a
particular widget needed to have its stuff recomputed, and sometimes
it didn't know for sure what might need updating so it just wanted to
update everything. Of course it turned out that this didn't really
work: the same plan was used in both cases, so the version with an id
specified took just as long to run as the generic version.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-27 00:03:06
Message-ID: 8170.1267228986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Basically, what I really want here is some kind of keyword or other
> syntax that I can stick into a PL/pgsql query that requests a replan
> on every execution.

Wouldn't it be better if it just did the right thing automatically?

The sort of heuristic I'm envisioning would essentially do "replan every
time" for some number of executions, and give up only if it noticed that
it wasn't getting anything better than the generic plan. So you'd have
a fixed maximum overhead per session when the custom plan was useless,
and the Right Thing when it wasn't.

regards, tom lane


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-27 00:50:11
Message-ID: 4B886C43.3070508@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/26/2010 07:03 PM, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>
>> Basically, what I really want here is some kind of keyword or other
>> syntax that I can stick into a PL/pgsql query that requests a replan
>> on every execution.
>>
> Wouldn't it be better if it just did the right thing automatically?
>

Yes please. :-) Often, we are just users of the application, and we do
not have the freedom to change it.

> The sort of heuristic I'm envisioning would essentially do "replan every
> time" for some number of executions, and give up only if it noticed that
> it wasn't getting anything better than the generic plan. So you'd have
> a fixed maximum overhead per session when the custom plan was useless,
> and the Right Thing when it wasn't.

My other comments aside - I think generic plan + specific plan where
specific plan continues to beat generic plan, will meet the cases that
really annoyed me, and would make a lot of us very happy... Thanks.

Cheers,
mark


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 01:01:43
Message-ID: 603c8f071002271701w461a0102w6584fd81c03c17d6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 7:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Basically, what I really want here is some kind of keyword or other
>> syntax that I can stick into a PL/pgsql query that requests a replan
>> on every execution.
>
> Wouldn't it be better if it just did the right thing automatically?
>
> The sort of heuristic I'm envisioning would essentially do "replan every
> time" for some number of executions, and give up only if it noticed that
> it wasn't getting anything better than the generic plan.  So you'd have
> a fixed maximum overhead per session when the custom plan was useless,
> and the Right Thing when it wasn't.

Which is likely useless for my use case.

...Robert


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 04:20:38
Message-ID: 4B89EF16.3080606@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26/02/2010 11:40 AM, Tom Lane wrote:

> But putting support for a per-query level
> of control into the protocol (and then every client library) as well as
> every PL is going to be painful to implement, and even more painful to
> use.

You mean something like 'EXECUTE REPLAN' and protocol/PL-level equivalents?

That's what people on -GENERAL often seem to need. A way, for a
particular query, to say "replan this every time, because the stats are
very signifcant here".

> I still like the idea of automatically replanning with the known
> parameter values, and noting whether the result plan was estimated to be
> noticeably cheaper than the generic plan, and giving up on generating
> custom plans if we didn't observe any such win over N tries.

That risks making prepared statements less efficient when re-used
between 2 and N times. People would end up asking for a 'no replan'
knob, which I'm not sure is any improvement over a 'force replan' knob.

OTOH, a GUC to turn that off would be (IMO) fairly harmless - if you
know you don't have any problematic prepared queries, turn it off to
save some cycles.

Another thought: I wonder if this should be viewed from one step back.
Many of these issues come from people who don't actually want prepared
statements for performance, they're just using them to get convienient
and secure parameter placement and server-side caching of the query text.

Essentially, you have:

1) People preparing statements to save on parse+plan time; and
2) People preparing statements to get convenenient param placement.

I suspect that most of (1) also want (2), but many of (2) don't care
much about (1) and are just preparing statements for sql-injection
safety (param placement), because they've been told to by someone,
because their library does it for them, etc.

So: Would it be easier to handle control of replan vs no-replan at
PREPARE time? Or would that have very much the same protocol/pl change
issues?

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 04:22:00
Message-ID: 25921.1267330920@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 26, 2010 at 7:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Wouldn't it be better if it just did the right thing automatically?
>>
>> The sort of heuristic I'm envisioning would essentially do "replan every
>> time" for some number of executions, and give up only if it noticed that
>> it wasn't getting anything better than the generic plan. So you'd have
>> a fixed maximum overhead per session when the custom plan was useless,
>> and the Right Thing when it wasn't.

> Which is likely useless for my use case.

[ shrug... ] You'd better explain exactly why, if you want me to take
that objection seriously.

regards, tom lane


From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 07:52:55
Message-ID: 4B8A20D7.1020902@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/27/2010 11:20 PM, Craig Ringer wrote:
> Essentially, you have:
>
> 1) People preparing statements to save on parse+plan time; and
> 2) People preparing statements to get convenenient param placement.
>
> I suspect that most of (1) also want (2), but many of (2) don't care
> much about (1) and are just preparing statements for sql-injection
> safety (param placement), because they've been told to by someone,
> because their library does it for them, etc.
>
> So: Would it be easier to handle control of replan vs no-replan at
> PREPARE time? Or would that have very much the same protocol/pl change
> issues?

I think if SQL hints were sufficient, that clients would only need to
remove the prepared statement and re-create it whenever required.

It should do the right thing automatically.

I'm convinced that means generic plans are always wrong, and that some
combination of performing fixed operations in PREPARE and variable
operations in EXECUTE, combined with a plan caching against the prepared
statement with criteria to determine whether or not the parameters match
the assumptions made when creating one of the cached plans. Tom says
extracting the fixed part of the planning out to PREPARE would be
difficult or less valuable than I think. And the multi-plan caching with
criteria seems to have been brought up and not commented on much by
several people. So, it doesn't look like I will get this unless I learn
how to implement it myself - which is probably not feasible at this
time. :-)

Not getting this, I think I'd be happy if PREPARE/EXECUTE can *easily*
detect the worst cases (i.e. not slower in the general case), and
generic plan plus custom plan plus custom execution is still
significantly faster than generic plan plus generic execution.

Adding SQL to indicate whether it should be re-planned or not is
completely unappealing. If I could change the code, today, I'd just turn
off or choose not to use PREPARE/EXECUTE. Today, PREPARE/EXECUTE seems
like it should always be considered slower unless one can prove it is
actually faster in a specific case, which is the exact opposite of what
people expect.

Cheers,
mark


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 08:18:26
Message-ID: m2hbp1ydrh.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> 1) People preparing statements to save on parse+plan time; and
> 2) People preparing statements to get convenenient param placement.
>
> I suspect that most of (1) also want (2), but many of (2) don't care much
> about (1) and are just preparing statements for sql-injection safety (param
> placement), because they've been told to by someone, because their library
> does it for them, etc.
>
> So: Would it be easier to handle control of replan vs no-replan at PREPARE
> time? Or would that have very much the same protocol/pl change issues?

http://www.postgresql.org/docs/8.4/static/libpq-exec.html#LIBPQ-EXEC-MAIN

PQexecParams

Submits a command to the server and waits for the result, with the
ability to pass parameters separately from the SQL command text.

So I think what you're talking about is already in there.
--
dim


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-02-28 14:12:32
Message-ID: 407d949e1002280612p746e4757pf7f2937dd7fc7af6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 26, 2010 at 4:01 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> It's not going to be easier to implement.  Yeah, it would be easy to
>> provide a global switch via a GUC setting, but that's not going to be
>> helpful, because this is the sort of thing that really needs to be
>> managed per-query.  Almost any nontrivial application is going to have
>> some queries that really need the custom plan and many that don't.
>> If people just turn the GUC on we might as well throw away the plan
>> caching mechanism altogether.
>
> I agree.  A GUC is a really bad idea.

I'm not sure I see why. GUCs can be turned on and off per-query. Even
if you have protocol-level support a GUC could be more convenient in
some cases. Specifically it would be useful for testing explain plans
on queries that you expect to be planned without parameters. Currently
you have to rewrite the query using PREPARE QUERY which is far from
intuitive for users and even once you know how to do it prevents you
from just copying and pasting queries.

>> But putting support for a per-query level
>> of control into the protocol (and then every client library) as well as
>> every PL is going to be painful to implement, and even more painful to
>> use.
>
> I suppose I should have learned by now not to argue with you over
> technical points, but I don't see why this should be painful.  I mean,
> it'll be a lot of work and it'll in the end touch a lot of different
> parts of the code, but work != pain, and I don't see any reason why
> the problem can't be attacked incrementally.  I'm also deeply
> unconvinced that any other solution will be as satisfactory.

I'm not sure we should be so conservative about adding features to the
protocol. Sure it'll take time to filter through to all the drivers
and the better ones will support it before the less well maintained
ones. But that's just a question of time and nobody will be too
surprised by that.

I think we should have a general purpose options field for all our
messages. We can define an enum of options keys and pass an integer
value for each option. In some cases they'll actually be boolean and
other cases they'll be tunables but it seems simpler to make it so the
server can parse all the options out of the message even if it doesn't
understand them all. And that means proxies can pass them on without
understanding what they all mean.

We discussed a while back marking all "latency-critical" queries so
the backend knows that any buffers touched by that query should be
marked as more important to stay in cache. If we got more complex
priority handling we would be able to pass an option for the time
budget for a query or its nice level.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Alex Hunsaker <badalex(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-03-01 02:47:38
Message-ID: 603c8f071002281847v8dafbd5lcb6d554cffe65ae8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Feb 27, 2010 at 11:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Feb 26, 2010 at 7:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Wouldn't it be better if it just did the right thing automatically?
>>>
>>> The sort of heuristic I'm envisioning would essentially do "replan every
>>> time" for some number of executions, and give up only if it noticed that
>>> it wasn't getting anything better than the generic plan.  So you'd have
>>> a fixed maximum overhead per session when the custom plan was useless,
>>> and the Right Thing when it wasn't.
>
>> Which is likely useless for my use case.
>
> [ shrug... ]  You'd better explain exactly why, if you want me to take
> that objection seriously.

Hmm... on further thought, maybe it *would* work in that case. I'm
still not convinced this is going to be generally satisfactory. It
seems like it depends a great deal on how many times the function
figures to be called per session and in what percentage of those cases
a non-generic plan figures to be better. The appeal of a
user-controllable knob is that I am pretty sure from experience that I
can set it correctly, but hey...

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-03-01 02:51:14
Message-ID: 603c8f071002281851u222a66e1pbcc6903a101c39d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 28, 2010 at 2:52 AM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> On 02/27/2010 11:20 PM, Craig Ringer wrote:
>>
>> Essentially, you have:
>>
>> 1) People preparing statements to save on parse+plan time; and
>> 2) People preparing statements to get convenenient param placement.
>>
>> I suspect that most of (1) also want (2), but many of (2) don't care much
>> about (1) and are just preparing statements for sql-injection safety (param
>> placement), because they've been told to by someone, because their library
>> does it for them, etc.
>>
>> So: Would it be easier to handle control of replan vs no-replan at PREPARE
>> time? Or would that have very much the same protocol/pl change issues?
>
> I think if SQL hints were sufficient, that clients would only need to remove
> the prepared statement and re-create it whenever required.
>
> It should do the right thing automatically.
>
> I'm convinced that means generic plans are always wrong, and that some
> combination of performing fixed operations in PREPARE and variable
> operations in EXECUTE, combined with a plan caching against the prepared
> statement with criteria to determine whether or not the parameters match the
> assumptions made when creating one of the cached plans. Tom says extracting
> the fixed part of the planning out to PREPARE would be difficult or less
> valuable than I think. And the multi-plan caching with criteria seems to
> have been brought up and not commented on much by several people. So, it
> doesn't look like I will get this unless I learn how to implement it myself
> - which is probably not feasible at this time. :-)
>
> Not getting this, I think I'd be happy if PREPARE/EXECUTE can *easily*
> detect the worst cases (i.e. not slower in the general case), and generic
> plan plus custom plan plus custom execution is still significantly faster
> than generic plan plus generic execution.
>
> Adding SQL to indicate whether it should be re-planned or not is completely
> unappealing. If I could change the code, today, I'd just turn off or choose
> not to use PREPARE/EXECUTE. Today, PREPARE/EXECUTE seems like it should
> always be considered slower unless one can prove it is actually faster in a
> specific case, which is the exact opposite of what people expect.

I don't really understand most of what you're saying here, but there's
definitely some truth to your last sentence. This has easily got to
be one of the top ten questions on -performance.

...Robert


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Polyplanner (was Re: Avoiding bad prepared-statement plans.)
Date: 2010-03-01 11:18:01
Message-ID: 4B8BA269.1060100@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

How about a totally different approach?

What if all queries and plans of all queries, simple and prepared, were
pre-planned and cached always, persistent?
For prepared statements with >= 1 parameters, histogram and mcv
information could be used to search the plan space for interesting
plans. Maybe with some heuristics to cut down on search space (i.e. when
operator is '=' and there is a unique index, skip that clause /
parameter from the search space).
Since processors keep getting more and more cores, and most database
activity is IO bound, why not keep one core busy with query analysis?

good:
- with the several hooks available it could be implemented as optional
contrib
- if offers plan stability
- nice info for management user interface
- might be a solution for prepared queries
- for queries with large joins, plans might be considered with
exhaustive search, so also here there could be an improvement.
- it might even be possible to 'test' plans during low-usage hours

bad:
- unknown how big space for cached plans should be
- if big cached plan space doesn't fit in memory, actual planning
probably better than fetching from disk, ~= 5 to 10ms.

regards,
Yeb Havinga


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-03-02 23:53:56
Message-ID: 201003022353.o22NruN14103@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
> > 1) People preparing statements to save on parse+plan time; and
> > 2) People preparing statements to get convenenient param placement.
> >
> > I suspect that most of (1) also want (2), but many of (2) don't care much
> > about (1) and are just preparing statements for sql-injection safety (param
> > placement), because they've been told to by someone, because their library
> > does it for them, etc.
> >
> > So: Would it be easier to handle control of replan vs no-replan at PREPARE
> > time? Or would that have very much the same protocol/pl change issues?
>
> http://www.postgresql.org/docs/8.4/static/libpq-exec.html#LIBPQ-EXEC-MAIN
>
> PQexecParams
>
> Submits a command to the server and waits for the result, with the
> ability to pass parameters separately from the SQL command text.
>
> So I think what you're talking about is already in there.

There are three levels, SQL, libpq, and the wire protocol. The wire
protocol offers this via unnamed statements. libpq offers it via
PQexecParams(). SQL does not, as far as I can tell.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-03-02 23:54:28
Message-ID: 201003022354.o22NsSM14130@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> > Adding SQL to indicate whether it should be re-planned or not is completely
> > unappealing. If I could change the code, today, I'd just turn off or choose
> > not to use PREPARE/EXECUTE. Today, PREPARE/EXECUTE seems like it should
> > always be considered slower unless one can prove it is actually faster in a
> > specific case, which is the exact opposite of what people expect.
>
> I don't really understand most of what you're saying here, but there's
> definitely some truth to your last sentence. This has easily got to
> be one of the top ten questions on -performance.

It seems it is the problem everyone knows about but no one fixes. :-(

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeroen Vermeulen <jtv(at)xs4all(dot)nl>, Greg Stark <gsstark(at)mit(dot)edu>, Bart Samwel <bart(at)samwel(dot)tk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Avoiding bad prepared-statement plans.
Date: 2010-03-03 05:13:46
Message-ID: 603c8f071003022113y6f78f7aete88e28c9f2d6a0c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 2, 2010 at 6:54 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> > Adding SQL to indicate whether it should be re-planned or not is completely
>> > unappealing. If I could change the code, today, I'd just turn off or choose
>> > not to use PREPARE/EXECUTE. Today, PREPARE/EXECUTE seems like it should
>> > always be considered slower unless one can prove it is actually faster in a
>> > specific case, which is the exact opposite of what people expect.
>>
>> I don't really understand most of what you're saying here, but there's
>> definitely some truth to your last sentence.  This has easily got to
>> be one of the top ten questions on -performance.
>
> It seems it is the problem everyone knows about but no one fixes.  :-(

I'd work on it, but Tom doesn't like my proposed fix. *shrug*

...Robert