Re: generalizing the planner knobs

Lists: pgsql-hackers
From: "Pollard, Mike" <mpollard(at)cincom(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-01 21:14:11
Message-ID: 6418CC03D0FB1943A464E1FEFB3ED46B01B220F2@im01.cincom.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On the other hand the type I would prefer to see are hints that feed
directly
> into filling in information the planner lacks. This only requires that
the
> user understand his own data and still lets the planner pick the best
plan
> based on the provided information.

Optimizer hints were added because some databases just don't have a very
smart optimizer. But you are much better served tracking down cases in
which the optimizer makes a bad choice, and teaching the optimizer how
to make a better one. That way, all users get the benefit of the fix.
Remember, the purpose of SQL is to isolate the end user from having to
care about how the data is retrieved; that is the RDBMS' problem. (the
other thing forgotten was that it was supposed to be a natural language.
NVL. Bah.)

Mike Pollard
SUPRA Server SQL Engineering and Support
Cincom Systems, Inc.


From: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To: "Pollard, Mike" <mpollard(at)cincom(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 02:01:34
Message-ID: e692861c0512011801v91461aepa007a93f02c5e68e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/1/05, Pollard, Mike <mpollard(at)cincom(dot)com> wrote:
> Optimizer hints were added because some databases just don't have a very
> smart optimizer. But you are much better served tracking down cases in
> which the optimizer makes a bad choice, and teaching the optimizer how
> to make a better one. That way, all users get the benefit of the fix.
> Remember, the purpose of SQL is to isolate the end user from having to
> care about how the data is retrieved; that is the RDBMS' problem. (the
> other thing forgotten was that it was supposed to be a natural language.
> NVL. Bah.)

The flipside there is that a good set of hinting options may increase
the amount of detailed feedback we get from users on improvements
needed in the optimizer. The current knobs are pretty blunt and don't
do as much as I'd like when trying to track down exactly where the
optimiser has gone wrong.

If we'd really like to avoid people using the knobs to rig queries,
how about making them only work with explain analyze, useful for
debugging but not so useful for actual queries.


From: Neil Conway <neilc(at)samurai(dot)com>
To: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
Cc: "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 03:37:26
Message-ID: 1133494646.9942.3.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
> If we'd really like to avoid people using the knobs to rig queries,
> how about making them only work with explain analyze, useful for
> debugging but not so useful for actual queries.

That seems a pretty arbitrary limitation. I agree that it's not ideal to
have users adjust planner behavior via this means, but until we have
something better, I think applying that limitation would only make the
status quo worse.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Gregory Maxwell <gmaxwell(at)gmail(dot)com>, "Pollard, Mike" <mpollard(at)cincom(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 03:56:10
Message-ID: 27415.1133495770@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> On Thu, 2005-12-01 at 21:01 -0500, Gregory Maxwell wrote:
>> If we'd really like to avoid people using the knobs to rig queries,
>> how about making them only work with explain analyze, useful for
>> debugging but not so useful for actual queries.

> That seems a pretty arbitrary limitation. I agree that it's not ideal to
> have users adjust planner behavior via this means, but until we have
> something better, I think applying that limitation would only make the
> status quo worse.

Yeah, I agree. Adding code to prevent people from using a facility
doesn't seem very reasonable, even if it's our policy that using the
facility for production purposes is not a good idea. In fact, we just
today had a counterexample --- see this thread:
http://archives.postgresql.org/pgsql-performance/2005-12/msg00015.php
Being able to use enable_nestloop got Markus out of a short-term bind,
which to me is exactly what you want to be able to do with this sort
of thing.

I don't have any problem with expending small amounts of work to make
it easier to hack the planner in small ways. The real problem I have
with a "planner hints" facility (in the form that I think most people
who ask for it have in mind) is that it would be a *very large* amount
of work to do it reasonably well, and I think that amount of effort
would be better spent in other ways.

regards, tom lane


From: Trent Shipley <tshipley(at)deru(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 07:06:28
Message-ID: 200512020006.28307.tshipley@deru.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 2005-12-01 19:01, Gregory Maxwell wrote:
> On 12/1/05, Pollard, Mike <mpollard(at)cincom(dot)com> wrote:
> > Optimizer hints were added because some databases just don't have a very
> > smart optimizer. But you are much better served tracking down cases in
> > which the optimizer makes a bad choice, and teaching the optimizer how
> > to make a better one. That way, all users get the benefit of the fix.
> > Remember, the purpose of SQL is to isolate the end user from having to
> > care about how the data is retrieved; that is the RDBMS' problem. (the
> > other thing forgotten was that it was supposed to be a natural language.
> > NVL. Bah.)
>
> The flipside there is that a good set of hinting options may increase
> the amount of detailed feedback we get from users on improvements
> needed in the optimizer. The current knobs are pretty blunt and don't
> do as much as I'd like when trying to track down exactly where the
> optimiser has gone wrong.
>
> If we'd really like to avoid people using the knobs to rig queries,
> how about making them only work with explain analyze, useful for
> debugging but not so useful for actual queries.

I'm all in favor of sticking to the declarative language ideal.

Also, I'm much in favor of protecting people from themselves.

On the other hand, if folks insist on engaging in extreme sports (like second
guessing the optimizer) I'm against regulating their freedom. I think
exposing planner variables would be a good thing, on net. Naturally, you
would warn everyone not to touch them. (Safety and freedom are both
necessary.)

If you can play with the knobs, you should let them be used to return real
result sets. That way, when you get feedback, you will be able to tell if
the cost estimator is "broken". Just returning a modified plan won't
challenge costing assumptions.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: "Pollard, Mike" <mpollard(at)cincom(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, "Neil Conway" <neilc(at)samurai(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 07:09:49
Message-ID: 87d5kgf002.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Pollard, Mike" <mpollard(at)cincom(dot)com> writes:

> Optimizer hints were added because some databases just don't have a very
> smart optimizer. But you are much better served tracking down cases in
> which the optimizer makes a bad choice, and teaching the optimizer how
> to make a better one.

You more or less missed my entire point.

You can always teach the optimizer to make better decisions based on good
data. Your statement is basically right when talking about tweaking the
optimizer's decisions to ignore its best judgement.

But there are many many cases where the data the optimizer has available isn't
good and for good reason. And in plenty of those cases the data the optimizer
has available *can't* be good.

In the extreme, no amount of added intelligence in the optimizer is going to
help it come up with any sane selectivity estimate for something like

WHERE radius_authenticate(user) = 'OK'

--
greg


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 15:20:07
Message-ID: 1133536807.12218.168.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> In the extreme, no amount of added intelligence in the optimizer is going to
> help it come up with any sane selectivity estimate for something like
>
> WHERE radius_authenticate(user) = 'OK'

Why not?

The missing capability in this case is to be able to provide or generate
(self learning?) statistics for a function that describe a typical
result and the cost of getting that result.
--


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 20:49:02
Message-ID: 87y833dy2p.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rod Taylor <pg(at)rbt(dot)ca> writes:

> > In the extreme, no amount of added intelligence in the optimizer is going to
> > help it come up with any sane selectivity estimate for something like
> >
> > WHERE radius_authenticate(user) = 'OK'
>
> Why not?
>
> The missing capability in this case is to be able to provide or generate
> (self learning?) statistics for a function that describe a typical result
> and the cost of getting that result.

Ok, try "WHERE radius_authenticate(user, (select ...), ?)"

The point is that you can improve the estimates the planner gets. But you can
never make them omniscient. There will always be cases where the user knows
his data more than the planner. And those hints are still valid when a new
optimizer has new plans available.

This is different from hints that tell the planner what plan to use. Every
situation where the predicted cost is inaccurate despite accurate estimates
represents a fixable bug in the optimizer's cost model. When a new version of
the optimizer is available with a more accurate cost model or new available
plans those kinds of hints will only get in the way.

--
greg


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 20:51:52
Message-ID: 1133556712.28999.24.camel@home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2005-12-02 at 15:49 -0500, Greg Stark wrote:
> Rod Taylor <pg(at)rbt(dot)ca> writes:
>
> > > In the extreme, no amount of added intelligence in the optimizer is going to
> > > help it come up with any sane selectivity estimate for something like
> > >
> > > WHERE radius_authenticate(user) = 'OK'
> >
> > Why not?
> >
> > The missing capability in this case is to be able to provide or generate
> > (self learning?) statistics for a function that describe a typical result
> > and the cost of getting that result.
>
> Ok, try "WHERE radius_authenticate(user, (select ...), ?)"
>
> The point is that you can improve the estimates the planner gets. But you can
> never make them omniscient. There will always be cases where the user knows
> his data more than the planner. And those hints are still valid when a new
> optimizer has new plans available.

You missed my point. If the user knows there data there is absolutely no
reason, aside from missing functionality in PostgreSQL, that statistics
cannot be generated to represent what the user knows about their data.

Once the planner knows the statistics it can make the right decision
without any hints.

The missing feature here is the ability to generate or provide
statistics and costs for functions.

--


From: Gregory Maxwell <gmaxwell(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, "Pollard, Mike" <mpollard(at)cincom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 21:23:27
Message-ID: e692861c0512021323j2909bff8hb13095309acee6c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02 Dec 2005 15:49:02 -0500, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> Rod Taylor <pg(at)rbt(dot)ca> writes:
> > The missing capability in this case is to be able to provide or generate
> > (self learning?) statistics for a function that describe a typical result
> > and the cost of getting that result.
>
> Ok, try "WHERE radius_authenticate(user, (select ...), ?)"
>
> The point is that you can improve the estimates the planner gets. But you can
> never make them omniscient. There will always be cases where the user knows
> his data more than the planner. And those hints are still valid when a new
> optimizer has new plans available.

Actually... If a statistics engine stores the entire query as well
and used that as a key I don't see why it couldn't figure this out.
I.e. in queries that look like Z operation X has historically had
selectivity Y.

The the instruction to the user is simple: 'make sure that queries
with different results look different' . This is often naturally the
case.

The challenge becomes how do you group together queries which are
mostly the same so that you get enough data, but not falsely cluster
queries with different statistics.

The simplest way check the statistics list for the most similar query
match, and use that information. If the result is similar to what is
expected, use it to update the statistics record. If the measured
selectivity is too different make a new record which will then attract
similar queries.

Sounds like a good research project for someone.


From: Trent Shipley <tshipley(at)deru(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: generalizing the planner knobs
Date: 2005-12-02 23:05:24
Message-ID: 200512021605.25458.tshipley@deru.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Is it possible to submit a hand written or arbitrary execution plan to the
retrieval engine? (That is, can one bypass the SQL parser and planner or
optimizer and just provide instructions to nested loop join table a to table
b ...)