Re: generalizing the planner knobs

Lists: pgsql-hackers
From: Neil Conway <neilc(at)samurai(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: generalizing the planner knobs
Date: 2005-12-01 10:30:59
Message-ID: 1133433059.16522.21.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

There are currently some rather crude knobs for persuading the planner
to favour certain kinds of query plans: the enable_XXX GUC variables.
Several people have asked for a more flexible way to give hints to the
planner. I'm not interested in implementing fully-general planner hints
at the moment, but ISTM that a simple improvement to what we have now
would allow for a wider range of planner hints with only minor changes:
we could replace the enable_XXX variables with a set of variables that
would add an arbitrary constant to the estimated cost of each type of
query node. (Alternatively, an arbitrary multiplier could be specified;
I'm not sure which would be better.)

This would also be useful when diagnosing bad query plans: for example,
setting enable_seqscan=false often causes the planner to disregard the
use of *any* sequential scan, anywhere in the plan. The ability to
slightly bump up the cost of particular operations would allow more
alternative plans to be examined.

On the other hand, the whole mechanism is still a hack. It also means
that applications using this will be more dependent on the actual
costing values produced by the planner, which is not good. However, if
you're in the sort of desperate straights where this sort of hackery is
required, perhaps that's acceptable.

Comments?

-Neil


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 15:55:40
Message-ID: 36e682920512010755l44de5eb8t6e8151258c457ad0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hey Neil,

In the last couple weeks I too have been thinking about planner hints.
Assuming I have read your post correctly, the issue I see with this idea is
that, in most cases, there won't be much of a difference between adding an
arbitrary cost value to each type of node and disabling it completely.
Also, by fiddling with an arbitrary cost the user may introduce a lot of
variation into the planner which may actually result in worse query plans.

While Tom's done a great job with the planner, there are certain cases where
a user knows exactly what type of join or index they want to use for a
query. In that case I'd favor run-time hints from the user similar to
Oracle. I've read about seven papers on query optimization and planning in
the last few weeks and have a lot of ideas... I'm just not sure when I may
get time to work on them :(

-Jonah

On 12/1/05, Neil Conway <neilc(at)samurai(dot)com> wrote:
>
> There are currently some rather crude knobs for persuading the planner
> to favour certain kinds of query plans: the enable_XXX GUC variables.
> Several people have asked for a more flexible way to give hints to the
> planner. I'm not interested in implementing fully-general planner hints
> at the moment, but ISTM that a simple improvement to what we have now
> would allow for a wider range of planner hints with only minor changes:
> we could replace the enable_XXX variables with a set of variables that
> would add an arbitrary constant to the estimated cost of each type of
> query node. (Alternatively, an arbitrary multiplier could be specified;
> I'm not sure which would be better.)
>
> This would also be useful when diagnosing bad query plans: for example,
> setting enable_seqscan=false often causes the planner to disregard the
> use of *any* sequential scan, anywhere in the plan. The ability to
> slightly bump up the cost of particular operations would allow more
> alternative plans to be examined.
>
> On the other hand, the whole mechanism is still a hack. It also means
> that applications using this will be more dependent on the actual
> costing values produced by the planner, which is not good. However, if
> you're in the sort of desperate straights where this sort of hackery is
> required, perhaps that's acceptable.
>
> Comments?
>
> -Neil
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 16:49:42
Message-ID: 11805.1133455782@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> ... ISTM that a simple improvement to what we have now
> would allow for a wider range of planner hints with only minor changes:
> we could replace the enable_XXX variables with a set of variables that
> would add an arbitrary constant to the estimated cost of each type of
> query node. (Alternatively, an arbitrary multiplier could be specified;
> I'm not sure which would be better.)

I think the multiplier would be better, because it'd avoid the problem
you mention later that useful values would be dependent on the planner's
cost units. Also, one could sanely allow a multiplier less than one,
so as to favor instead of penalize a particular plan type.

regards, tom lane


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 17:32:12
Message-ID: dmnc17$30b9$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Neil Conway" <neilc(at)samurai(dot)com> wrote
>
> This would also be useful when diagnosing bad query plans: for example,
> setting enable_seqscan=false often causes the planner to disregard the
> use of *any* sequential scan, anywhere in the plan. The ability to
> slightly bump up the cost of particular operations would allow more
> alternative plans to be examined.
>

This method also has the problem of "enable_seqscan=false" in some
situations. I would vote we implement the final general solution like query
plan hints directly.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 18:20:24
Message-ID: 15364.1133461224@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:
> In the last couple weeks I too have been thinking about planner hints.
> Assuming I have read your post correctly, the issue I see with this idea is
> that, in most cases, there won't be much of a difference between adding an
> arbitrary cost value to each type of node and disabling it completely.
> Also, by fiddling with an arbitrary cost the user may introduce a lot of
> variation into the planner which may actually result in worse query plans.

Which is pretty much exactly the problem with "planner hints", too.
I've resisted that suggestion in the past and will continue to do so,
because hints are accidents waiting to happen. Even if the hint is right
today for your current Postgres version and current data distribution,
it's likely not to be right further down the road --- but once the hint
is embedded in your application, how often are you going to revisit it?
As an example, a hint forcing the planner to use an indexscan with a
particular index might have been a great idea in PG 8.0 and a lousy idea
in 8.1, because it would prevent substitution of a possibly-far-better
bitmap indexscan.

The enable_foo switches are debug aids, not something you are expected
to fool with for production purposes, and the same would be true of
Neil's suggested multipliers. While I don't feel any strong need for
variable multipliers, they'd be a small enough incremental amount of
work that the suggestion doesn't require a lot of supporting argument.
Adding a planner hint facility would be several orders of magnitude
more work, and it would be taking the system in a design direction that
I think is fundamentally misguided.

regards, tom lane


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 18:45:48
Message-ID: 36e682920512011045o2d229711j7dd1f81f22227006@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

Don't get me wrong, I agree with you completely. I would rather put effort
into enhancing the planner than in developing work-arounds. In 99% of all
cases the planner works correctly, but I know people who actually have to
disable planning options (mergejoin) in production applications because they
get bad plans. The "bad" plans are not really bad in terms of what the
planner knows about the query, just in areas where the planner doesn't look
at other things.

I also agree that a significant amount of work would be required to add
run-time hints which would be better spent enhancing the system as a whole.
My only suggestion was that it would be better than Part 1 of Neil's
statement. Somehow I missed the end mention of multipliers which I agree
requires less effort.

On 12/1/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:
> > In the last couple weeks I too have been thinking about planner hints.
> > Assuming I have read your post correctly, the issue I see with this idea
> is
> > that, in most cases, there won't be much of a difference between adding
> an
> > arbitrary cost value to each type of node and disabling it completely.
> > Also, by fiddling with an arbitrary cost the user may introduce a lot of
> > variation into the planner which may actually result in worse query
> plans.
>
> Which is pretty much exactly the problem with "planner hints", too.
> I've resisted that suggestion in the past and will continue to do so,
> because hints are accidents waiting to happen. Even if the hint is right
> today for your current Postgres version and current data distribution,
> it's likely not to be right further down the road --- but once the hint
> is embedded in your application, how often are you going to revisit it?
> As an example, a hint forcing the planner to use an indexscan with a
> particular index might have been a great idea in PG 8.0 and a lousy idea
> in 8.1, because it would prevent substitution of a possibly-far-better
> bitmap indexscan.
>
> The enable_foo switches are debug aids, not something you are expected
> to fool with for production purposes, and the same would be true of
> Neil's suggested multipliers. While I don't feel any strong need for
> variable multipliers, they'd be a small enough incremental amount of
> work that the suggestion doesn't require a lot of supporting argument.
> Adding a planner hint facility would be several orders of magnitude
> more work, and it would be taking the system in a design direction that
> I think is fundamentally misguided.
>
> regards, tom lane
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 20:54:42
Message-ID: 8764q8h71p.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:

> Tom,
>
> Don't get me wrong, I agree with you completely. I would rather put effort
> into enhancing the planner than in developing work-arounds. In 99% of all
> cases the planner works correctly, but I know people who actually have to
> disable planning options (mergejoin) in production applications because they
> get bad plans. The "bad" plans are not really bad in terms of what the
> planner knows about the query, just in areas where the planner doesn't look
> at other things.

I would like to draw a distinction between two sorts of hints. Currently
you're talking about one sort of hint, namely hints that tell the planner to
alter its cost model and choose a different plan than the inputs it has would
dictate. Using these require the user to have a fairly in depth understanding
of the planner and what options it has available.

On the other hand the type I would prefer to see are hints that feed directly
into filling in information the planner lacks. This only requires that the
user understand his own data and still lets the planner pick the best plan
based on the provided information.

So for example I would love to see a hint that allowed you to specify the
selectivity of a where clause. And one that let you specify the "density" of a
grouping clause.

Most of the time the planner makes a mistake it's because of a bad
miscalculation in estimating these givens. If it had the correct values for
the inputs then it would make the right decision about the plan.

Making the planner very good at making the right decisions given accurate
inputs is an attainable goal. Computers are pretty deterministic and it's
possible to come up with very accurate cost models. Despite some known
problems with Postgres's current models they're remarkably good already. And
there's no particular reason to think they can't be made nearly perfect.

Making the planner very good at producing accurate estimates is a much harder
goal. No matter how accurate it gets there will always be more complex
expressions that are harder to predict and there will always be cases the
planner can't estimate well. The user however knows his own data and may well
know the answer.

In the extreme consider user-defined operators, which will always be dependent
on the user to provide estimator functions. If it's a rarely used operator the
user may find it easier to simply tell the planner the selectivity of each
expression rather than come up with a general solution.

(I also think things like joins and group by clauses will never be able to be
guaranteed accurate results in general. Not without a lot more costs up front
including giving up on calculating statistics based on only a sample.)

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 21:01:34
Message-ID: 17078.1133470894@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On the other hand the type I would prefer to see are hints that feed directly
> into filling in information the planner lacks. This only requires that the
> user understand his own data and still lets the planner pick the best plan
> based on the provided information.

This would avoid some issues, but it still is vulnerable to the problem
that the hint you put in your code today will fail to track changes in
your data tomorrow.

regards, tom lane


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 10:07:06
Message-ID: 1133518026.4779.132.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > On the other hand the type I would prefer to see are hints that feed directly
> > into filling in information the planner lacks. This only requires that the
> > user understand his own data and still lets the planner pick the best plan
> > based on the provided information.
>
> This would avoid some issues, but it still is vulnerable to the problem
> that the hint you put in your code today will fail to track changes in
> your data tomorrow.

Tom, I have to disagree here. At least in our application, we must
provide for an acceptable worst case scenario, and sometimes a slightly
wrong estimate can lead to a plan which is very fast 99% of the time but
completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
I've had this situation with some "limit" plans where the planner had
chosen a wrong index. The problem there was that the planner had
estimated that the query will have 20 rows as a result, but it had less,
and resulted in the complete scan of the index... as opposed to a much
smaller scan that would have resulted by scanning the other index, as
that one would have provided an end condition orders of magnitudes
sooner. Now the statistics will always be only an estimation, and +/- a
few can really make a big difference in some situations. In this
particular situation the index choice of the planner would have been
faster for all cases where there were really 20 rows returned, but I
forced it to always choose the other plan (by adding the proper order
by) because I can't risk a bad result in any of the cases.
In this particular case I was able to force the planner choose a
specific plan, but that might not be always possible, so I guess it
really would make sense to be able to tell the planner how selective
some conditions are. And yes, sometimes I would like to "freeze" a
specific "safe" plan for a specific query, even if it is not optimal.

So for me the "hint" mechanism is good for telling the server that I'm
not interested at all in the BEST plan but which risks getting very bad
on occasions, but in a good enough plan which is safe.

And as for the selectivity changes over time, the hints will change
along. In most of the situations when selectivity change, the SQL has to
change too, sometimes even the complete workflow. I find that if changed
hints will help in some occasions then having them would mean less
maintenance than the code rewriting that would be otherwise involved...
and I'm completely sure the server can't compensate for the change of
the dynamics of the data all the time. And it definitely can't keep up
with highly dynamic data, where the statistics change constantly in big
tables...

Our application for example has kind of batch processing, where we
insert smaller or larger batches of data in a HUGE table (~200 millions
of rows), and then that data is immediately used for different
operations and then reports, and furthermore it is heavily updated. I
can't think of any reasonable statistics target and ANALYZE strategy
which could satisfy both small batches and large batches without running
ANALYZE permanently with high statistics targets on the key fields...
and even that would not be specific enough when "limit 20" is involved.
For queries involving this table I really would like to freeze plans, as
any misplanning has bad consequences.

Cheers,
Csaba.


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 10:53:25
Message-ID: 20051202105325.GB3057@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
> So for me the "hint" mechanism is good for telling the server that I'm
> not interested at all in the BEST plan but which risks getting very bad
> on occasions, but in a good enough plan which is safe.

I'm wondering if long term another approach might be to have another
parameter in the planner, cost_error or selectivity_error which is an
indication of how accurate we think it is.

So for example you have an index scan might cost x but with a possible
error of 15% and the seqscan might cost y but with an error of 1%.

The "error" for nested loop would be the product of the two inputs,
whereas a merge join whould be much less sensetive to error. A sort or
hash join would react badly to large variations of input.

So in cases where there is a choice between two indexscans with one
slightly more expensive and more accurate but can result in a mergejoin
would be a better choice than a possibly highly selective index but
without accurate info that needs to be fed into a nested loop. Even
though the latter might look better, the former is the "safer" option.

I think this would solve the problem where people see sudden flip-flops
between good and bad plans. The downside is that it's yet another
parameter for the planner to get wrong.

Unfortunatly, this is the kind of thing people write thesises on and I
don't think many people have the grounding in statistics to make it all
work.

Have a ncie day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 23:01:36
Message-ID: 1133564496.2906.696.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-02 at 11:53 +0100, Martijn van Oosterhout wrote:
> On Fri, Dec 02, 2005 at 11:07:06AM +0100, Csaba Nagy wrote:
> > So for me the "hint" mechanism is good for telling the server that I'm
> > not interested at all in the BEST plan but which risks getting very bad
> > on occasions, but in a good enough plan which is safe.
>
> I'm wondering if long term another approach might be to have another
> parameter in the planner, cost_error or selectivity_error which is an
> indication of how accurate we think it is.
>
> So for example you have an index scan might cost x but with a possible
> error of 15% and the seqscan might cost y but with an error of 1%.
>
> The "error" for nested loop would be the product of the two inputs,
> whereas a merge join whould be much less sensetive to error. A sort or
> hash join would react badly to large variations of input.
>
> So in cases where there is a choice between two indexscans with one
> slightly more expensive and more accurate but can result in a mergejoin
> would be a better choice than a possibly highly selective index but
> without accurate info that needs to be fed into a nested loop. Even
> though the latter might look better, the former is the "safer" option.
>
> I think this would solve the problem where people see sudden flip-flops
> between good and bad plans. The downside is that it's yet another
> parameter for the planner to get wrong.

Measuring parameters more accurately is a lengthy experimental job, not
a theoretical one. I think we are just waiting for someone to do this.

> Unfortunatly, this is the kind of thing people write thesises on and I
> don't think many people have the grounding in statistics to make it all
> work.

I'd considered that before; its just a lot of work.

The theory of error propagation is straightforward: you just take the
root mean square of the errors on the parameters.

Trouble is, many of the planning parameters are just guesses, so you
have no idea of the error estimates either. Hence you can't really
calculate the error propagation accurately enough to make a sensible
stab at risk control. But it would be useful sometimes, which is about
the best it gets with the planner.

Right now the worst part of the planner is:
- the estimation of number of distinct values, which is an inherent
statistical limitation
- need for multi-column interaction statistics

The two are somewhat related.

Best Regards, Simon Riggs


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Jonah H(dot)Harris <jonah(dot)harris(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-03 12:34:45
Message-ID: E365094A-266F-4C66-888D-22F3A6D01276@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

wouldn't it be more flexible to define a multiplicator or some sort
of bool flag on a per object level?
oracle hints are a total overkill and i agree with tom that usually
people will abuse this feature.
if we had a per object flag the actual planner hint can be decoupled
from the actual query (i don't think putting a hint inside a query is
the most clever thing).
changing a flag would be as simple as running UPDATE on some system
table.
this should not be too intrusive as well.

best regards,

hans

On Dec 1, 2005, at 7:45 PM, Jonah H. Harris wrote:

> Tom,
>
> Don't get me wrong, I agree with you completely. I would rather
> put effort into enhancing the planner than in developing work-
> arounds. In 99% of all cases the planner works correctly, but I
> know people who actually have to disable planning options
> (mergejoin) in production applications because they get bad plans.
> The "bad" plans are not really bad in terms of what the planner
> knows about the query, just in areas where the planner doesn't look
> at other things.
>
> I also agree that a significant amount of work would be required to
> add run-time hints which would be better spent enhancing the system
> as a whole. My only suggestion was that it would be better than
> Part 1 of Neil's statement. Somehow I missed the end mention of
> multipliers which I agree requires less effort.
>
>
>
> On 12/1/05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Jonah H. Harris" <jonah(dot)harris(at)gmail(dot)com> writes:
> > In the last couple weeks I too have been thinking about planner
> hints.
> > Assuming I have read your post correctly, the issue I see with
> this idea is
> > that, in most cases, there won't be much of a difference between
> adding an
> > arbitrary cost value to each type of node and disabling it
> completely.
> > Also, by fiddling with an arbitrary cost the user may introduce a
> lot of
> > variation into the planner which may actually result in worse
> query plans.
>
> Which is pretty much exactly the problem with "planner hints", too.
> I've resisted that suggestion in the past and will continue to do so,
> because hints are accidents waiting to happen. Even if the hint is
> right
> today for your current Postgres version and current data distribution,
> it's likely not to be right further down the road --- but once the
> hint
> is embedded in your application, how often are you going to revisit
> it?
> As an example, a hint forcing the planner to use an indexscan with a
> particular index might have been a great idea in PG 8.0 and a lousy
> idea
> in 8.1, because it would prevent substitution of a possibly-far-better
> bitmap indexscan.
>
> The enable_foo switches are debug aids, not something you are expected
> to fool with for production purposes, and the same would be true of
> Neil's suggested multipliers. While I don't feel any strong need for
> variable multipliers, they'd be a small enough incremental amount of
> work that the suggestion doesn't require a lot of supporting argument.
> Adding a planner hint facility would be several orders of magnitude
> more work, and it would be taking the system in a design direction
> that
> I think is fundamentally misguided.
>
> regards, tom lane
>


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-04 13:57:19
Message-ID: 1133704639.2906.840.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-02 at 11:07 +0100, Csaba Nagy wrote:
> On Thu, 2005-12-01 at 22:01, Tom Lane wrote:
> > Greg Stark <gsstark(at)mit(dot)edu> writes:
> > > On the other hand the type I would prefer to see are hints that feed directly
> > > into filling in information the planner lacks. This only requires that the
> > > user understand his own data and still lets the planner pick the best plan
> > > based on the provided information.
> >
> > This would avoid some issues, but it still is vulnerable to the problem
> > that the hint you put in your code today will fail to track changes in
> > your data tomorrow.
>
> Tom, I have to disagree here. At least in our application, we must
> provide for an acceptable worst case scenario, and sometimes a slightly
> wrong estimate can lead to a plan which is very fast 99% of the time but
> completely wrong in 1% of the cases. Sometimes the percentage is 50/50.
> I've had this situation with some "limit" plans where the planner had
> chosen a wrong index. The problem there was that the planner had
> estimated that the query will have 20 rows as a result, but it had less,
> and resulted in the complete scan of the index... as opposed to a much
> smaller scan that would have resulted by scanning the other index, as
> that one would have provided an end condition orders of magnitudes
> sooner. Now the statistics will always be only an estimation, and +/- a
> few can really make a big difference in some situations. In this
> particular situation the index choice of the planner would have been
> faster for all cases where there were really 20 rows returned, but I
> forced it to always choose the other plan (by adding the proper order
> by) because I can't risk a bad result in any of the cases.
> In this particular case I was able to force the planner choose a
> specific plan, but that might not be always possible, so I guess it
> really would make sense to be able to tell the planner how selective
> some conditions are. And yes, sometimes I would like to "freeze" a
> specific "safe" plan for a specific query, even if it is not optimal.

Csaba raises a good point here. Many people say they want "hints" when
what they actually require the plan to be both stable and predictable.

Tom is right to point out that data can change over time. However,
experience with packaged application tuning is that you actually do want
to have things work in a stable way, even if that is somewhat
sub-optimal because when you have 1000s of statements it is important
that it doesn't change after you tune it - otherwise you never finish.
So I would like to give that requirement a name "Plan Stability"; the
actual solution to that could be many things.

Another aspect to this is predictability. At the moment, we optimise
according to the first parameter a prepared statement is bound with.
Many data distributions contain a small number of values that represent
a large fraction of the total. This can mean that it is pretty random
whether we will get a SeqScan and be stuck with it, or get an IndexScan
and be stuck with it. Either plan being wrong 50% of the time. In these
cases, hinting is definitely a very bad thing, since whichever you hint,
you'll be wrong. This situation gives us two more requirements:
- predictability - because we want to know the worst case
- flexibility - because we want to be able to take advantage of the best
case, but without causing an unconstrained worst case

So IMHO, the requirements list for prepared statement planning is that
optimization must be:
- Flexible
- Predictable
- Stable

The actual solutions to all of those things could be many and varied.

ISTM we could do some of that with another GUC, lets call it
prepare_once = on. The system default is to have a prepared statement
bound to a plan on its first parameter bind. If we set this to "off",
then the statement will replan each time we bind. This would give us
both flexibility and predictability. (As ever, someone suggest a better
name?).

The requirements for very large statement tuning are fairly different
from that. e.g. large reports or Data Warehousing queries. In those
cases, Tom's warning about the data changing can be critical and we
don't have the same need to prepare queries. Neil's original point about
needing to avoid SeqScans on some tables but not others hasn't really
been addressed. That got turned into hinting on particular nodes/node
types, but there are still issues: How do you identify one node in a
complex plan?

Perhaps the way forward would be to make enable_* accept a list of
tables, or a * (like listen_addresses). (And make "on" a synonym for "*"
and off a synonym for an empty list). That way, you'd be able to control
on a per-table basis what plan types are possible. (Still wouldn't work
that well for complex plans where same table referenced multiple times,
but its a step in the right direction.) I'm not sure I like that
particular idea much, but I'm attempting to address Neil's original
point, with which I agree.

Best Regards, Simon Riggs


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-04 15:49:52
Message-ID: 20051204154952.GA12868@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:

> ISTM we could do some of that with another GUC, lets call it
> prepare_once = on. The system default is to have a prepared statement
> bound to a plan on its first parameter bind. If we set this to "off",
> then the statement will replan each time we bind. This would give us
> both flexibility and predictability. (As ever, someone suggest a better
> name?).

Why would all statements behave the same? I think an important
percentage of cases would require a fixed plan (thus planning at first
sight is a good idea), while a limited number of cases would require
planning every time the sentence is called. Your idea of qualifying it
by table name does not make too much sense to me, because you can have
both types of queries for each table, and further any query where this
is necessary will involve more than one table anyway, so which one do
you choose to make the decision?

So we would provide a protocol/libpq option to allow first-params-
planning (the default and current behavior), and another to allow
planning-every-time. The latter would tell the server to save only the
parsetree of the query and replan each time it is invoked.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 00:55:44
Message-ID: 873bl8e50v.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Simon Riggs <simon(at)2ndquadrant(dot)com> writes:

> Csaba raises a good point here. Many people say they want "hints" when
> what they actually require the plan to be both stable and predictable.

Plan stability is also an important feature, especially for OLTP systems which
have hard real-time requirements. OLTP systems typically don't care about
getting the "best" plan for a query, only a plan that is "good enough".

"Good enough" means it can keep up with the rate of incoming requests; it
doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
one incoming query even one in a thousand takes 1000% of the time available
then the entire system risks falling down.

But plan stability is something that should be integrated directly in the
server. Not something achieved by having the user hint every query to defeat
the optimizer.

What I'm working on for my purposes here is a perl script that takes all the
queries in the application (either gathered from the log or stored statically)
and runs ANALYZE on all of them. Then within a transaction it runs ANALYZE on
the database and re-ANALYZES every query again. If any plans change then it
mails them to the DBA and rolls back the transaction with the database
analysis. The DBA gets a chance to approve the new plans before they go into
effect.

That's more or less what I expect an integrated plan stability feature to do.
It's like a shared query plan cache except that instead of being a cache it's
a database of plans that are specifically approved by the DBA. Queries that
don't have an approved plan could be configured to either produce a warning or
an error until the plan is approved.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 03:17:27
Message-ID: 12469.1133752647@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Plan stability is also an important feature, especially for OLTP
> systems which have hard real-time requirements. OLTP systems typically
> don't care about getting the "best" plan for a query, only a plan that
> is "good enough".

> "Good enough" means it can keep up with the rate of incoming requests; it
> doesn't matter whether it keeps up with 10% headroom or 20% headroom. But if
> one incoming query even one in a thousand takes 1000% of the time available
> then the entire system risks falling down.

Is it worth pointing out that using the same plan all the time is *no*
recipe for guaranteeing response time? There is no such thing as a plan
that is good for every case --- outlying data values can make a
usually-good plan blow out your performance guarantee anyway. Disabling
the planner is just a recipe for ensuring that that will happen, IMHO.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 06:53:55
Message-ID: 87lkz0c9vg.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Is it worth pointing out that using the same plan all the time is *no*
> recipe for guaranteeing response time? There is no such thing as a plan
> that is good for every case --- outlying data values can make a
> usually-good plan blow out your performance guarantee anyway. Disabling
> the planner is just a recipe for ensuring that that will happen, IMHO.

But outlying data is something the user has control over. The user when
approving plans needs to be aware not just that the plan is experimentally
good, but that it will perform reliably within the constraints based on his
knowledge of the application and the data.

My point is that I don't need a plan that is "good" for every case. I need a
plan I can trust to perform as expected. If my boss asks me what impact
doubling the number of users will have I need to be able to answer "it'll be
at worst twice as slow" (knowing that my queries and the plans I've seen are
all O(users)).

If twice as slow is still tolerable then that's fine, even if a faster plan
was possible. What I don't want to say is "well we'll have to try it and see"
which is all I can say if there's a risk the plans will change.

Actually I would expect the facility to only be useful if there was still a
way to update the plans. The DBA would periodically re-analyze the queries in
the system and check any changed plans to ensure they were still reasonable.
Effectively the same as my script except at the query level rather than at the
database statistics level.

The scenario where this is useful is not in a development environment where
things are changing dynamically. But rather in a mature application where the
data distribution is well established. Outlying data almost certainly
represents an application bug and should be signalled, not allowed to
spontaneously bring down the rest of the system.

As anecdotal evidence, in the last job where I worked, once we had 10 million
users and over a hundred web requests per second it would have been pretty
hard to believe any reasonable query could involve a sequential scan.
Certainly no query that the web application should be performing itself
without human intervention. If for whatever reason there was some outlying
data point where that would have been the "right" plan it would have
immediately brought down the web site.

--
greg


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 08:28:39
Message-ID: 1133771319.2906.858.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2005-12-04 at 12:49 -0300, Alvaro Herrera wrote:
> Simon Riggs wrote:
>
> > ISTM we could do some of that with another GUC, lets call it
> > prepare_once = on. The system default is to have a prepared statement
> > bound to a plan on its first parameter bind. If we set this to "off",
> > then the statement will replan each time we bind. This would give us
> > both flexibility and predictability. (As ever, someone suggest a better
> > name?).
>
> Why would all statements behave the same?

They would be flexible and predictable, but not the same.

prepare_once = off
would reoptimize each statement, so each could have a potentially
different plan. Which, in the case I cited, is the only optimal
behaviour: sticking to any one plan, by any method, would be wrong.

The plans would be predictable because performance never exceeds the
worst case SeqScan; planning would be flexible because it will always
take the best plan.

> I think an important
> percentage of cases would require a fixed plan (thus planning at first
> sight is a good idea), while a limited number of cases would require
> planning every time the sentence is called.

Yes, that is exactly what I see. Hence a GUC with a default the same as
it is now: they would only be prepared once. You would only set the GUC
to another value when you have a statement that looks like it needs
"hinting" i.e. the plan flips from SeqScan to IndexScan and back
depending upon the input data.

> Your idea of qualifying it
> by table name does not make too much sense to me, because you can have
> both types of queries for each table, and further any query where this
> is necessary will involve more than one table anyway, so which one do
> you choose to make the decision?

That was a different idea later down my note, not a variation of the
same one: that had nothing to do with the prepare_once concept. Those
options were meant to be set on a per statement basis, not at the server
level.

I was trying to solve Neil's stated problem: How to force one part of a
query to avoid a SeqScan, yet without touching the others.

> So we would provide a protocol/libpq option to allow first-params-
> planning (the default and current behavior), and another to allow
> planning-every-time. The latter would tell the server to save only the
> parsetree of the query and replan each time it is invoked.

Or some function similar. I prefer the GUC because it does not imply a
protocol change.

Best Regards, Simon Riggs


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 08:44:55
Message-ID: 1133772295.2906.873.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2005-12-05 at 01:53 -0500, Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> > There is no such thing as a plan
> > that is good for every case --- outlying data values can make a
> > usually-good plan blow out your performance guarantee anyway.
>
> But outlying data is something the user has control over.

Unfortunately, the DBA cannot choose the data distribution in his
database. So the appearance of control is somewhat illusory.

> The user when
> approving plans needs to be aware not just that the plan is experimentally
> good, but that it will perform reliably within the constraints based on his
> knowledge of the application and the data.

Greg's idea to have a plan comparator is a good one, for most
situations.

What you'll see if you run it though is no matter what you do, there
will be a few queries that are resistant to tuning. Their stored plans
will flip from SeqScan to IndexScan and back depending upon the
parameters used; neither will be suitable all the time and either
setting will cause very variable response times.

For those queries only, I seek a solution.

["Priming" the cache by executing IndexScan causing queries does not
work for all cases, so again the appearance of control is illusory.]

My solution is to replan the queries each time, rather than just once on
first parameter bind. By some mechanism; the GUC is just one of those.

Best Regards, Simon Riggs


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 09:36:09
Message-ID: FF3B93B7-1915-45E9-A774-CCABBB6DC420@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Dec 5, 2005, at 4:17 AM, Tom Lane wrote:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> Plan stability is also an important feature, especially for OLTP
>> systems which have hard real-time requirements. OLTP systems
>> typically
>> don't care about getting the "best" plan for a query, only a plan
>> that
>> is "good enough".
>
>> "Good enough" means it can keep up with the rate of incoming
>> requests; it
>> doesn't matter whether it keeps up with 10% headroom or 20%
>> headroom. But if
>> one incoming query even one in a thousand takes 1000% of the time
>> available
>> then the entire system risks falling down.
>
> Is it worth pointing out that using the same plan all the time is *no*
> recipe for guaranteeing response time? There is no such thing as a
> plan
> that is good for every case --- outlying data values can make a
> usually-good plan blow out your performance guarantee anyway.
> Disabling
> the planner is just a recipe for ensuring that that will happen, IMHO.
>
> regards, tom lane
>

I think I know what Greg is trying to say: I think in this plan
stability does not mean that the plan has to be completely fixed -
usually it is all about indexing. People start with an empty
perfectly analyzed database and data is added. However, some day some
cron job doing ANALYZE or whatever fails and the system will slow
down or even break down because data is added to some table which is
still seq-scanned. This is what usually happens and which leads to
support cases.

Adding hints to some comments or to the statement itself is not a
good solution as well. This is why I proposed a table or some flag
telling the planner what to favour (= always use a certain index). So
the basic idea is not to turn index of in general but to have the
chance to do it on a per index basis. I guess this would not be to
complex to implement and it solves 90% of all problems without having
to hide some information inside comments (which is no good at all).

best regards,

hans


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 17:12:17
Message-ID: 87slt7bh8u.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Hans-Juergen Schoenig <postgres(at)cybertec(dot)at> writes:

> I think I know what Greg is trying to say: I think in this plan stability
> does not mean that the plan has to be completely fixed - usually it is all
> about indexing.

"Usually" problems occur because someone hasn't run analyze at all. That's not
what I'm talking about. I'm talking about a large mature system where the DBA
has everything tuned and adjusted properly and just wants to get a good
night's sleep, confident that the nightly analyze isn't going to suddenly
change the performance of existing queries.

> Adding hints to some comments or to the statement itself is not a good
> solution as well. This is why I proposed a table or some flag telling the
> planner what to favour (= always use a certain index). So the basic idea is
> not to turn index of in general but to have the chance to do it on a per index
> basis. I guess this would not be to complex to implement and it solves 90% of
> all problems without having to hide some information inside comments (which is
> no good at all).

I disagree that this is a reasonable solution.

I want to be sure my existing queries keep using the plans they've been using
until I allow them to change.

I don't want to sit down and type "select count(*) from users" and have it not
work correctly (ie, use a sequential scan) because the system is so single
mindedly tuned for the OLTP application.

--
greg


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-05 17:31:05
Message-ID: 1133803864.4779.179.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[snip]
> I want to be sure my existing queries keep using the plans they've been using
> until I allow them to change.
>
> I don't want to sit down and type "select count(*) from users" and have it not
> work correctly (ie, use a sequential scan) because the system is so single
> mindedly tuned for the OLTP application.
>

Now this is exactly what I've had in mind... it would be nice to
"fixate" a plan for some of the queries, and let the planner choose the
best for all the rest. I think some other data bases have something like
an "optimizer plan stability" feature, providing "outlines" of query
plan bundles. This is maybe too much, but specifying that for a certain
query I definitely want to use one index and not the other would be
nice...

On another note, it might be interesting to have some kind of "prepare
analyze", where the planner is allowed to go and get some more detailed
estimation from the actual table data based on the hard-coded parameter
values, and produce some more detailed statistics for the parameterized
values so it can then produce hot-shot plans for the actual parameter
values on each execution... I wonder if this makes any sense. This way
we could have some very detailed statistics directly supporting the
queries we actually use. I would call this kind of prepare for the most
used/problematic queries from time to time, and the planner should
decide what statistics it needs to support it and go and get it...

Cheers,
Csaba.


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-08 05:35:45
Message-ID: 20051208053545.GI16053@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 01, 2005 at 12:32:12PM -0500, Qingqing Zhou wrote:
>
> "Neil Conway" <neilc(at)samurai(dot)com> wrote
> >
> > This would also be useful when diagnosing bad query plans: for example,
> > setting enable_seqscan=false often causes the planner to disregard the
> > use of *any* sequential scan, anywhere in the plan. The ability to
> > slightly bump up the cost of particular operations would allow more
> > alternative plans to be examined.
> >
>
> This method also has the problem of "enable_seqscan=false" in some
> situations. I would vote we implement the final general solution like query
> plan hints directly.

BTW, there's another end to the 'enable_seqscan=false' problem... it
sometimes doesn't work! Last I looked, enable_seqscan=false only added a
fixed overhead cost to a seqscan (1000000 IIRC). The problem is, some
queries will produce estimates for other methodes that are more
expensive than a seqscan even with the added burden. If instead of
adding a fixed amount enable_seqscan=false multiplied by some amount
then this would probably be impossible to occur.

(And before someone asks, no, I don't remember which query was actually
faster...)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Jaime Casanova <systemguards(at)gmail(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-08 17:35:25
Message-ID: c2d9e70e0512080935m24823f9eke50dff8c2bacb3c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> BTW, there's another end to the 'enable_seqscan=false' problem... it
> sometimes doesn't work! Last I looked, enable_seqscan=false only added a
> fixed overhead cost to a seqscan (1000000 IIRC). The problem is, some
> queries will produce estimates for other methodes that are more
> expensive than a seqscan even with the added burden. If instead of
> adding a fixed amount enable_seqscan=false multiplied by some amount
> then this would probably be impossible to occur.
>
> (And before someone asks, no, I don't remember which query was actually
> faster...)
> --
> Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
> Pervasive Software http://pervasive.com work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
>

I have often considered that this is an indication that seq scan is
actually the better plan... although, i have to admit that is a little
confusing that enable_seqscan = false actually let you use a seqscan
if the other plans are bad enough

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jaime Casanova <systemguards(at)gmail(dot)com>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-08 18:07:10
Message-ID: 12408.1134065230@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jaime Casanova <systemguards(at)gmail(dot)com> writes:
>> BTW, there's another end to the 'enable_seqscan=false' problem... it
>> sometimes doesn't work!

> I have often considered that this is an indication that seq scan is
> actually the better plan...

There are cases where it is the *only* plan, eg, you have no relevant
indexes. I am not sure that applies to Jim's complaint though.

regards, tom lane


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <systemguards(at)gmail(dot)com>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-08 19:18:10
Message-ID: 20051208191810.GE58449@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 08, 2005 at 01:07:10PM -0500, Tom Lane wrote:
> Jaime Casanova <systemguards(at)gmail(dot)com> writes:
> >> BTW, there's another end to the 'enable_seqscan=false' problem... it
> >> sometimes doesn't work!
>
> > I have often considered that this is an indication that seq scan is
> > actually the better plan...
>
> There are cases where it is the *only* plan, eg, you have no relevant
> indexes. I am not sure that applies to Jim's complaint though.

IIRC I ran into this when I was working on generating some numbers about
how well a high correlation improves the performance of an index scan
(since afaict the cost estimator for index scan is rather broken :( ) In
that case, I had defined an index on a ~120M row table on a collumn with
a very low correlation. It's pretty much a given that a seqscan and sort
would be faster than the index scan, but it would have still been good
to be able to verify that. Because of how enable_seqscan works, I
couldn't.

BTW,
http://archives.postgresql.org/pgsql-performance/2005-04/msg00669.php is
where I first mentioned this, including the cost function that I think
is broken.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461