Re: Hints proposal

Lists: pgsql-hackerspgsql-performance
From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Hints proposal
Date: 2006-10-12 15:14:39
Message-ID: 20061012151439.GT28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Since this is such a controversial topic, I've left this at a 'rough
draft' stage - it's meant more as a framework for discussion than a
final proposal for implementation.

Forcing a Plan
--------------
These hints would outright force the planner to do things a certain way.

... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This would force the planner to access table via a seqscan or
index_name. For the index case, you can also specify if the access must
or must not be via a bitmap scan. If neither is specified, the planner
is free to choose either one.

Theoretically, we could also allow "ACCESS INDEX" without an index name,
which would simply enforce that a seqscan not be used, but I'm not sure
how useful that would be.

... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */

Force the specified join mechanism on the join. The first form would not
enforce a join order, it would only force table b to be joined to the
rest of the relations using the specified join type. The second form
would specify that a joins to b in that order, and optionally specify
what type of join to use.

... GROUP BY ... /* {HASH|SORT} AGGREGATE */

Specify how aggregation should be handled.

Cost Tweaking
-------------
It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of

node operator value

where node would be a planner node/hint (ie: ACCESS INDEX), operator
would be +, -, *, /, and value would be the amount to change the
estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
cut the estimated cost of any index scan on a given table in half.

(I realize the syntax will probably need to change to avoid pain in the
grammar code.)

Unlike the hints above that are ment to force a certain behavior on an
operation, you could potentially have multiple cost hints in a single
location, ie:

FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */

The first comment block would apply to any joins against a, while the
second one would apply only to joins between a and b. The effects would
be cumulative, so this example means that any merge join against a gets
an added cost of 5000, unless it's a join with b (because +5000 + -5000
= 0). I think you could end up with odd cases if the second form just
over-rode the first, which is why it should be cummulative.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 15:19:10
Message-ID: 200610121519.k9CFJA416544@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either. The question is whether we can
get there quickly enough for our userbase.

I perfer attacking the problem at the table definition level, like
something like "volatile", or adding to the existing table statistics.

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> Posting here instead of hackers since this is where the thread got
> started...
>
> The argument has been made that producing a hints system will be as hard
> as actually fixing the optimizer. There's also been clamoring for an
> actual proposal, so here's one that (I hope) wouldn't be very difficult
> to implemen.
>
> My goal with this is to keep the coding aspect as simple as possible, so
> that implementation and maintenance of this isn't a big burden. Towards
> that end, these hints either tell the planner specifically how to handle
> some aspect of a query, or they tell it to modify specific cost
> estimates. My hope is that this information could be added to the
> internal representation of a query without much pain, and that the
> planner can then use that information when generating plans.
>
> The syntax these hints is something arbitrary. I'm borrowing Oracle's
> idea of embedding hints in comments, but we can use some other method if
> desired. Right now I'm more concerned with getting the general idea
> across.
>
> Since this is such a controversial topic, I've left this at a 'rough
> draft' stage - it's meant more as a framework for discussion than a
> final proposal for implementation.
>
> Forcing a Plan
> --------------
> These hints would outright force the planner to do things a certain way.
>
> ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
>
> This would force the planner to access table via a seqscan or
> index_name. For the index case, you can also specify if the access must
> or must not be via a bitmap scan. If neither is specified, the planner
> is free to choose either one.
>
> Theoretically, we could also allow "ACCESS INDEX" without an index name,
> which would simply enforce that a seqscan not be used, but I'm not sure
> how useful that would be.
>
> ... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
> ... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */
>
> Force the specified join mechanism on the join. The first form would not
> enforce a join order, it would only force table b to be joined to the
> rest of the relations using the specified join type. The second form
> would specify that a joins to b in that order, and optionally specify
> what type of join to use.
>
> ... GROUP BY ... /* {HASH|SORT} AGGREGATE */
>
> Specify how aggregation should be handled.
>
> Cost Tweaking
> -------------
> It would also be useful to allow tweaking of planner cost estimates.
> This would take the general form of
>
> node operator value
>
> where node would be a planner node/hint (ie: ACCESS INDEX), operator
> would be +, -, *, /, and value would be the amount to change the
> estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
> cut the estimated cost of any index scan on a given table in half.
>
> (I realize the syntax will probably need to change to avoid pain in the
> grammar code.)
>
> Unlike the hints above that are ment to force a certain behavior on an
> operation, you could potentially have multiple cost hints in a single
> location, ie:
>
> FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
> JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */
>
> The first comment block would apply to any joins against a, while the
> second one would apply only to joins between a and b. The effects would
> be cumulative, so this example means that any merge join against a gets
> an added cost of 5000, unless it's a join with b (because +5000 + -5000
> = 0). I think you could end up with odd cases if the second form just
> over-rode the first, which is why it should be cummulative.
> --
> Jim Nasby jim(at)nasby(dot)net
> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Joshua Marsh" <icub3d(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 15:26:24
Message-ID: 38242de90610120826l5328759bt41468644e54badae@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/06, Jim C. Nasby <jim(at)nasby(dot)net> wrote:
>
> Posting here instead of hackers since this is where the thread got
> started...
>
> The argument has been made that producing a hints system will be as hard
> as actually fixing the optimizer. There's also been clamoring for an
> actual proposal, so here's one that (I hope) wouldn't be very difficult
> to implemen.
>
> My goal with this is to keep the coding aspect as simple as possible, so
> that implementation and maintenance of this isn't a big burden. Towards
> that end, these hints either tell the planner specifically how to handle
> some aspect of a query, or they tell it to modify specific cost
> estimates. My hope is that this information could be added to the
> internal representation of a query without much pain, and that the
> planner can then use that information when generating plans.

I've been following the last thread with a bit of interest. I like the
proposal. It seems simple and easy to use. What is it about hinting that
makes it so easily breakable with new versions? I don't have any experience
with Oracle, so I'm not sure how they screwed logic like this up. Hinting
to use a specific merge or scan seems fairly straight forward; if the query
requests to use an index on a join, I don't see how hard it is to go with
the suggestion. It will become painfully obvious to the developer if his
hinting is broken.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 15:42:32
Message-ID: 10866.1160667752@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

[ This is off-topic for -performance, please continue the thread in
-hackers ]

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> These hints would outright force the planner to do things a certain way.
> ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This proposal seems to deliberately ignore every point that has been
made *against* doing things that way. It doesn't separate the hints
from the queries, it doesn't focus on fixing the statistical or cost
misestimates that are at the heart of the issue, and it takes no account
of the problem of hints being obsoleted by system improvements.

> It would also be useful to allow tweaking of planner cost estimates.
> This would take the general form of
> node operator value

This is at least focusing on the right sort of thing, although I still
find it completely misguided to be attaching hints like this to
individual queries.

What I would like to see is information *stored in a system catalog*
that affects the planner's cost estimates. As an example, the DBA might
know that a particular table is touched sufficiently often that it's
likely to remain RAM-resident, in which case reducing the page fetch
cost estimates for just that table would make sense. (BTW, this is
something the planner could in principle know, but we're unlikely to
do it anytime soon, for a number of reasons including a desire for plan
stability.) The other general category of thing I think we need is a
way to override selectivity estimates for particular forms of WHERE
clauses.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 15:55:17
Message-ID: 452E6565.4010400@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Bruce Momjian wrote:
> Because DB2 doesn't like hints, and the fact that they have gotten to a
> point where they feel they do not need them, I feel we too can get to a
> point where we don't need them either. The question is whether we can
> get there quickly enough for our userbase.

In all fairness, when I used to work with DB2 we often had to rewrite
queries to persuade the planner to choose a different plan. Often it was
more of an issue of plan stability; a query would suddenly become
horribly slow in production because a table had grown slowly to the
point that it chose a different plan than before. Then we had to modify
the query again, or manually set the statistics. In extreme cases we had
to split a query to multiple parts and use temporary tables and move
logic to the application to get a query to perform consistently and fast
enough. I really really missed hints.

Because DB2 doesn't have MVCC, an accidental table scan is very serious,
because with stricter isolation levels that keeps the whole table locked.

That said, I really don't like the idea of hints like "use index X"
embedded in a query. I do like the idea of hints that give the planner
more information about the data. I don't have a concrete proposal, but
here's some examples of hints I'd like to see:

"table X sometimes has millions of records and sometimes it's empty"
"Expression (table.foo = table2.bar * 2) has selectivity 0.99"
"if foo.bar = 5 then foo.field2 IS NULL"
"Column X is unique"
"function foobar() always returns either 1 or 2, and it returns 2 90% of
the time."
"if it's Monday, then table NEW_ORDERS has a cardinality of 100000,
otherwise 10."

BTW: Do we make use of CHECK constraints in the planner? In DB2, that
was one nice and clean way of hinting the planner about things. If I
remember correctly, you could even define CHECK constraints that weren't
actually checked at run-time, but were used by the planner.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 16:22:10
Message-ID: b42b73150610120922x53bbc14atd9591f06f6e4e274@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> [ This is off-topic for -performance, please continue the thread in
> -hackers ]

> This proposal seems to deliberately ignore every point that has been
> made *against* doing things that way. It doesn't separate the hints
> from the queries, it doesn't focus on fixing the statistical or cost
> misestimates that are at the heart of the issue, and it takes no account
> of the problem of hints being obsoleted by system improvements.

what about extending the domain system so that we can put in ranges
that override the statistics or (imo much more importantly) provide
information when the planner would have to restort to a guess. my case
for this is prepared statements with a parameterized limit clause.

prepare foo(l int) as select * from bar limit $1;

maybe:
create domain foo_lmt as int hint 1; -- probably needs to be fleshed out
prepare foo(l foolmt) as select * from bar limit $1;

this says: "if you have to guess me, please use this"

what I like about this over previous attempts to persuade you is the
grammar changes are localized and also imo future proofed. planner can
ignore the hints if they are not appropriate for the oparation.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 16:24:16
Message-ID: 11204.1160670256@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> BTW: Do we make use of CHECK constraints in the planner?

Only for "constraint exclusion", and at the moment that's off by default.

The gating problem here is that if the planner relies on a CHECK
constraint, and then you drop the constraint, the previously generated
plan might start to silently deliver wrong answers. So I'd like to see
a plan invalidation mechanism in place before we go very far down the
path of relying on constraints for planning. That's something I'm going
to try to make happen for 8.3, though.

regards, tom lane


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 16:25:25
Message-ID: 20061012162525.GF28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 11:42:32AM -0400, Tom Lane wrote:
> [ This is off-topic for -performance, please continue the thread in
> -hackers ]
>
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > These hints would outright force the planner to do things a certain way.
> > ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
>
> This proposal seems to deliberately ignore every point that has been
> made *against* doing things that way. It doesn't separate the hints
> from the queries, it doesn't focus on fixing the statistical or cost
> misestimates that are at the heart of the issue, and it takes no account
> of the problem of hints being obsoleted by system improvements.

Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
also my comment below.

> > It would also be useful to allow tweaking of planner cost estimates.
> > This would take the general form of
> > node operator value
>
> This is at least focusing on the right sort of thing, although I still
> find it completely misguided to be attaching hints like this to
> individual queries.

Yes, but as I mentioned the idea here was to come up with something that
is (hopefully) easy to define and implement. In other words, something
that should be doable for 8.3. Because this proposal essentially amounts
to limiting plans the planner will consider and tweaking it's cost
estimates, I'm hoping that it should be (relatively) easy to implement.

> What I would like to see is information *stored in a system catalog*
> that affects the planner's cost estimates. As an example, the DBA might
> know that a particular table is touched sufficiently often that it's
> likely to remain RAM-resident, in which case reducing the page fetch
> cost estimates for just that table would make sense. (BTW, this is
> something the planner could in principle know, but we're unlikely to
> do it anytime soon, for a number of reasons including a desire for plan
> stability.)

All this stuff is great and I would love to see it! But this is all so
abstract that I'm doubtful this could make it into 8.4, let alone 8.3.
Especially if we want a comprehensive system that will handle most/all
cases. I don't know if we even have a list of all the cases we need to
handle.

> The other general category of thing I think we need is a
> way to override selectivity estimates for particular forms of WHERE
> clauses.

I hadn't thought about that for hints, but it would be a good addition.
I think the stats-tweaking model would work, but we'd probably want to
allow "=" as well (which could go into the other stats tweaking hints as
well).

... WHERE a = b /* SELECTIVITY {+|-|*|/|=} value */
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 16:34:25
Message-ID: 1160670865.28751.58.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

OK, I just have to comment...

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > These hints would outright force the planner to do things a certain way.
> > ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
>
> This proposal seems to deliberately ignore every point that has been
> made *against* doing things that way. It doesn't separate the hints
> from the queries, it doesn't focus on fixing the statistical or cost
> misestimates that are at the heart of the issue, and it takes no account
> of the problem of hints being obsoleted by system improvements.

But whatever arguments you made about planner improvements and the like,
it will NEVER be possible to correctly estimate in all cases the
statistics for a query, even if you perfectly know WHAT statistics you
need, which is also not the case all the time.

Tom, you're the one who knows best how the planner works... can you bet
anything you care about on the fact that one day the planner will never
ever generate a catastrophic plan without DBA tweaking ? And how far in
time we'll get to that point ?

Until that point is achieved, the above proposal is one of the simplest
to understand for the tweaking DBA, and the fastest to deploy when faced
with catastrophic plans. And I would guess it is one of the simplest to
be implemented and probably not very high maintenance either, although
this is just a guess.

If I could hint some of my queries, I would enable anonymous prepared
statements to take into account the parameter values, but I can't
because that results in runaway queries every now and then, so I had to
force postgres generate generic queries without knowing anything about
parameter values... so the effect for me is an overall slower postgres
system because I couldn't fix the particular problems I had and had to
tweak general settings. And when I have a problem I can't wait until the
planner is fixed, I have to solve it immediately... the current means to
do that are suboptimal.

The argument that planner hints would hide problems from being solved is
a fallacy. To put a hint in place almost the same amount of analysis is
needed from the DBA as solving the problem now, so users who ask now for
help will further do it even in the presence of hints. The ones who
wouldn't are not coming for help now either, they know their way out of
the problems... and the ones who still report a shortcoming of the
planner will do it with hints too.

I would even say it would be an added benefit, cause then you could
really see how well a specific plan will do without having the planner
capable to generate alone that plan... so knowledgeable users could come
to you further down the road when they know where the planner is wrong,
saving you time.

I must say it again, this kind of query-level hinting would be the
easiest to understand for the developers... there are many
trial-end-error type of programmers out there, if you got a hint wrong,
you fix it and move on, doesn't need to be perfect, it just have to be
good enough. I heavily doubt that postgres will get bad publicity
because user Joe sot himself in the foot by using bad hints... the
probability for that is low, you must actively put those hints there,
and if you take the time to do that then you're not the average Joe, and
probably not so lazy either, and if you're putting random hints, then
you would probably mess it up some other way anyway.

And the thing about missing new features is also not very founded. If I
would want to exclude a full table scan on a specific table for a
specific query, than that's about for sure that I want to do that
regardless what new features postgres will offer in the future. Picking
one specific access method is more prone to missing new access methods,
but even then, when I upgrade the DB server to a new version, I usually
have enough other compatibility problems (till now I always had some on
every upgrade I had) that making a round of upgrading hints is not an
outstanding problem. And if the application works good enough with
suboptimal plans, why would I even take that extra effort ?

I guess the angle is: I, as a practicing DBA would like to be able to
experiment and get most out of the imperfect tool I have, and you, the
developers, want to make the tool perfect... I don't care about perfect
tools, it just have to do the job... hints or anything else, if I can
make it work GOOD ENOUGH, it's all fine. And hints is something I would
understand and be able to use.

Thanks for your patience if you're still reading this...

Cheers,
Csaba.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-12 16:37:46
Message-ID: 11366.1160671066@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> Yes, but as I mentioned the idea here was to come up with something that
> is (hopefully) easy to define and implement. In other words, something
> that should be doable for 8.3.

Sorry, but that is not anywhere on my list of criteria for an important
feature. Having to live with a quick-and-dirty design for the
foreseeable future is an ugly prospect --- and anything that puts hints
into application code is going to lock us down to supporting it forever.

regards, tom lane


From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: "Joshua Marsh" <icub3d(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 16:40:13
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D4209C9A@MAIL.corp.lumeta.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

>What is it about hinting that makes it so easily breakable with new versions?  I >don't have any experience with Oracle, so I'm not sure how they screwed logic like >this up.  

I don't have a ton of experience with oracle either, mostly DB2, MSSQL and PG. So, I thought I'd do some googling, and maybe others might find this useful info.

http://asktom.oracle.com/pls/ask/f?p=4950:8:2177642270773127589::NO::F4950_P8_DISPLAYID,F4950_P8_CRITERIA:7038986332061

Interesting quote: "In Oracle Applications development (11i apps - HR, CRM, etc) Hints are strictly forbidden. We find the underlying cause and fix it." and
"Hints -- only useful if you are in RBO and you want to make use of an access
path."

Maybe because I haven't had access to hints before, I've never been tempted to use them. However, I can't remember having to re-write SQL due to a PG upgrade either.

Oh, and if you want to see everything that gets broken/depreciated with new versions, just take a look at oracle's release notes for 9i and 10g. I particularly dislike how they rename stuff for no apparent reason (e.g. NOPARALLEL is now NO_PARALLEL - http://www.oracle-base.com/articles/10g/PerformanceTuningEnhancements10g.php)

At the very least, I agree it is important to separate the query (what data do I want) from performance options (config, indexes, hints, etc). The data I want doesn't change unless I have a functionality/requirements change. So I'd prefer not to have to go back and change that code just to tweak performance. In addition, this creates an even bigger mess for dynamic queries. I would be much more likely to consider hints if they could be applied separately.

- Bucky


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 16:40:30
Message-ID: 452E6FFE.7010504@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim,

>>> These hints would outright force the planner to do things a certain way.
>>> ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
>> This proposal seems to deliberately ignore every point that has been
>> made *against* doing things that way. It doesn't separate the hints
>> from the queries, it doesn't focus on fixing the statistical or cost
>> misestimates that are at the heart of the issue, and it takes no account
>> of the problem of hints being obsoleted by system improvements.
>
> Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
> also my comment below.

I don't see how adding extra tags to queries is easier to implement than
an ability to modify the system catalogs. Quite the opposite, really.

And, as I said, if you're going to push for a feature that will be
obsolesced in one version, then you're going to have a really rocky row
to hoe.

> Yes, but as I mentioned the idea here was to come up with something that
> is (hopefully) easy to define and implement. In other words, something
> that should be doable for 8.3. Because this proposal essentially amounts
> to limiting plans the planner will consider and tweaking it's cost
> estimates, I'm hoping that it should be (relatively) easy to implement.

Even I, the chief marketing geek, am more concerned with getting a
feature that we will still be proud of in 5 years than getting one in
the next nine months. Keep your pants on!

I actually think the way to attack this issue is to discuss the kinds of
errors the planner makes, and what tweaks we could do to correct them.
Here's the ones I'm aware of:

-- Incorrect selectivity of WHERE clause
-- Incorrect selectivity of JOIN
-- Wrong estimate of rows returned from SRF
-- Incorrect cost estimate for index use

Can you think of any others?

I also feel that a tenet of the design of the "planner tweaks" system
ought to be that the tweaks are collectible and analyzable in some form.
This would allow DBAs to mail in their tweaks to -performance or
-hackers, and then allow us to continue improving the planner.

--Josh Berkus


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 16:42:55
Message-ID: 1160671375.31966.84.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
> The syntax these hints is something arbitrary. I'm borrowing Oracle's
> idea of embedding hints in comments, but we can use some other method if
> desired. Right now I'm more concerned with getting the general idea
> across.
>

Is there any advantage to having the hints in the queries? To me that's
asking for trouble with no benefit at all. It would seem to me to be
better to have a system catalog that defined hints as something like:

"If user A executes a query matching regex R, then coerce (or force) the
planner in this way."

I'm not suggesting that we do that, but it seems better then embedding
the hints in the queries themselves.

Regards,
Jeff Davis


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 16:45:23
Message-ID: 452E7123.1090302@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Csaba,

> I guess the angle is: I, as a practicing DBA would like to be able to
> experiment and get most out of the imperfect tool I have, and you, the
> developers, want to make the tool perfect... I don't care about perfect
> tools, it just have to do the job... hints or anything else, if I can
> make it work GOOD ENOUGH, it's all fine. And hints is something I would
> understand and be able to use.

Hmmm, if you already understand Visual Basic syntax, should we support
that too? Or maybe we should support MySQL's use of '0000-00-00' as the
"zero" date because people "understand" that?

We're just not going to adopt a bad design because Oracle DBAs are used
to it. If we wanted to do that, we could shut down the project and
join a proprietary DB staff.

The current discussion is:

a) Planner tweaking is sometimes necessary;
b) Oracle HINTS are a bad design for planner tweaking;
c) Can we come up with a good design for planner tweaking?

So, how about suggestions for a good design?

--Josh Berkus


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Joshua Marsh <icub3d(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 16:46:07
Message-ID: 20061012164607.GH28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 09:26:24AM -0600, Joshua Marsh wrote:
> On 10/12/06, Jim C. Nasby <jim(at)nasby(dot)net> wrote:
> >
> >Posting here instead of hackers since this is where the thread got
> >started...
> >
> >The argument has been made that producing a hints system will be as hard
> >as actually fixing the optimizer. There's also been clamoring for an
> >actual proposal, so here's one that (I hope) wouldn't be very difficult
> >to implemen.
> >
> >My goal with this is to keep the coding aspect as simple as possible, so
> >that implementation and maintenance of this isn't a big burden. Towards
> >that end, these hints either tell the planner specifically how to handle
> >some aspect of a query, or they tell it to modify specific cost
> >estimates. My hope is that this information could be added to the
> >internal representation of a query without much pain, and that the
> >planner can then use that information when generating plans.
>
>
> I've been following the last thread with a bit of interest. I like the
> proposal. It seems simple and easy to use. What is it about hinting that
> makes it so easily breakable with new versions? I don't have any experience
> with Oracle, so I'm not sure how they screwed logic like this up. Hinting
> to use a specific merge or scan seems fairly straight forward; if the query
> requests to use an index on a join, I don't see how hard it is to go with
> the suggestion. It will become painfully obvious to the developer if his
> hinting is broken.

The problem is that when you 'hint' (which is actually not a great name
for the first part of my proposal, since it's really forcing the planner
to do something), you're tying the planner's hands. As the planner
improves in newer versions, it's very possible to end up with forced
query plans that are much less optimal than what the newer planner could
come up with. This is especially true as new query execution nodes are
created, such as hashaggregate.

The other downside is that it's per-query. It would certainly be useful
to be able to nudge the planner in the right direction on a per-table
level, but it's just not clear how to accomplish that. Like I said, the
idea behind my proposal is to have something that can be done soon, like
for 8.3.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 16:53:27
Message-ID: 20061012165326.GI28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 04:55:17PM +0100, Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> >Because DB2 doesn't like hints, and the fact that they have gotten to a
> >point where they feel they do not need them, I feel we too can get to a
> >point where we don't need them either. The question is whether we can
> >get there quickly enough for our userbase.
>
> In all fairness, when I used to work with DB2 we often had to rewrite
> queries to persuade the planner to choose a different plan. Often it was
> more of an issue of plan stability; a query would suddenly become
> horribly slow in production because a table had grown slowly to the
> point that it chose a different plan than before. Then we had to modify
> the query again, or manually set the statistics. In extreme cases we had
> to split a query to multiple parts and use temporary tables and move
> logic to the application to get a query to perform consistently and fast
> enough. I really really missed hints.

Oracle has an interesting way to deal with this, in that you can store a
plan that the optimizer generates and tell it to always use it for that
query. There's some other management tools built on top of that. I don't
know how commonly it's used, though...

Also, on the DB2 argument... I'm wondering what happens when people end
up with a query that they can't get to execute the way it should? Is the
planner *that* good that it never happens? Do you have to wait for a
fixpack when it does happen? I'm all for having a super-smart planner,
but I'm highly doubtful it will always know exactly what to do.

> That said, I really don't like the idea of hints like "use index X"
> embedded in a query. I do like the idea of hints that give the planner
> more information about the data. I don't have a concrete proposal, but

Which is part of the problem... there's nothing to indicate we'll have
support for these improved hints anytime soon, especially if a number of
them depend on plan invalidation.

> here's some examples of hints I'd like to see:
>
> "table X sometimes has millions of records and sometimes it's empty"
> "Expression (table.foo = table2.bar * 2) has selectivity 0.99"
> "if foo.bar = 5 then foo.field2 IS NULL"
> "Column X is unique"
> "function foobar() always returns either 1 or 2, and it returns 2 90% of
> the time."
> "if it's Monday, then table NEW_ORDERS has a cardinality of 100000,
> otherwise 10."
>
> BTW: Do we make use of CHECK constraints in the planner? In DB2, that
> was one nice and clean way of hinting the planner about things. If I
> remember correctly, you could even define CHECK constraints that weren't
> actually checked at run-time, but were used by the planner.

I think you're right... and it is an elegant way to hint the planner.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 17:04:46
Message-ID: 1160672686.28751.76.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> Hmmm, if you already understand Visual Basic syntax, should we support
> that too? Or maybe we should support MySQL's use of '0000-00-00' as the
> "zero" date because people "understand" that?

You completely misunderstood me... I have no idea about oracle hints,
never used Oracle in fact. My company uses oracle, but I have only very
very limited contact with oracle issues, and never touched a hint.

I'm only talking about ease of use, learning curves, and complexity in
general. While I do like the idea of an all automatic system optimizer
which takes your query portofolio and analyzes the data based on those
queries and creates you all the indexes you need and all that, that's
not gonna happen soon, because it's a very complex thing to implement.

The alternative is that you take your query portofolio, analyze it
yourself, figure out what statistics you need, create indexes, tweak
queries, hint the planner for correlations and stuff... which is a
complex task, and if you have to tell the server about some correlations
with the phase of the moon, you're screwed cause there will never be any
DB engine which will understand that.

But you always can put the corresponding hint in the query when you know
the correlation is there...

The problem is that the application sometimes really knows better than
the server, when the correlations are not standard.

> We're just not going to adopt a bad design because Oracle DBAs are used
> to it. If we wanted to do that, we could shut down the project and
> join a proprietary DB staff.

I have really nothing to do with Oracle. I think you guys are simply too
blinded by Oracle hate... I don't care about Oracle.

> The current discussion is:
>
> a) Planner tweaking is sometimes necessary;
> b) Oracle HINTS are a bad design for planner tweaking;

While there are plenty of arguments you made against query level hints
(can we not call them Oracle-hints ?), there are plenty of users of
postgres who expressed they would like them. I guess they were tweaking
postgres installations when they needed it, and not Oracle
installations. I expressed it clearly that for me query level hinting
would give more control and better understanding of what I have to do
for the desired result. Perfect planning -> forget it, I only care about
good enough with reasonable tuning effort. If I have to tweak statistics
I will NEVER be sure postgres will not backfire on me again. On the
other hand if I say never do a seq scan on this table for this query, I
could be sure it won't...

> c) Can we come up with a good design for planner tweaking?

Angles again: good enough now is better for end users, but programmers
always go for perfect tomorrow... pity.

Cheers,
Csaba.


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 17:15:40
Message-ID: 1160673340.28751.83.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> I'm not suggesting that we do that, but it seems better then embedding
> the hints in the queries themselves.

OK, what about this: if I execute the same query from a web client, I
want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
let the planner choose the
best-overall-performance-but-sometimes-may-be-slow plan ?

What kind of statistics/table level hinting will get you this ?

I would say only query level hinting will buy you query level control.
And that's perfectly good in some situations.

I really can't see why a query-level hinting mechanism is so evil, why
it couldn't be kept forever, and augmented with the possibility of
correlation hinting, or table level hinting.

These are really solving different problems, with some overlapping...

Cheers,
Csaba.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 17:18:17
Message-ID: 13954.1160673497@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Csaba Nagy <nagy(at)ecircle-ag(dot)com> writes:
> Until that point is achieved, the above proposal is one of the simplest
> to understand for the tweaking DBA, and the fastest to deploy when faced
> with catastrophic plans. And I would guess it is one of the simplest to
> be implemented and probably not very high maintenance either, although
> this is just a guess.

That guess is wrong ... but more to the point, if you think that "simple
and easy to implement" should be the overriding concern for designing a
new feature, see mysql. They've used that design approach for years and
look what a mess they've got. This project has traditionally done
things differently and I feel no need to change that mindset now.

regards, tom lane


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-12 17:45:03
Message-ID: 20061012174503.GA29203@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 11:25:25AM -0500, Jim C. Nasby wrote:
> Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
> also my comment below.

If I may argue in the other direction, speaking as one whose career
(if we may be generous enough to call it that) has been pretty much
exclusively on the operations end of things, I think that's an awful
idea.

There are two ways that quick-fix solve-the-problem-now hints are
going to be used. One is in the sort of one-off query that a DBA has
to run from time to time, that takes a long time, but that isn't
really a part of regular application load. The thing is, if you
already know your data well enough to provide a useful hint, you also
know your data well enough to work around the problem in the short
run (with some temp table tricks and the like).

The _other_ way it's going to be used is as a stealthy alteration to
regular behaviour, to solve a particular nasty performance problem
that happens to result on a given day. And every single time I've
seen anything like that done, the long term effect is always
monstrous. Two releases later, all your testing and careful
inspection and planning goes to naught one Saturday night at 3 am
(because we all know computers know what time it is _where you are_)
when the one-off trick that you pulled last quarter to solve the
manager's promise (which was made while out golfing, so nobody wrote
anything down) turns out to have a nasty effect now that the data
distribution is different. Or you think so. But now you're not
sure, because the code was tweaked a little to take some advantage of
something you now have because of the query plans that you ended up
getting because of the hint that was there because of the golf game,
so now if you start fiddling with the hints, maybe you break
something else. And you're tired, but the client is on the phone
from Hong King _right now_.

The second case is, from my experience, exactly the sort of thing you
want really a lot when the golf game is just over, and the sort of
thing you end up kicking yourself for in run-on sentences in the
middle of the night six months after the golf game is long since
forgotten.

The idea for knobs on the planner that allows the DBA to give
directed feedback, from which new planner enhancements can also come,
seems to me a really good idea. But any sort of quick and dirty hint
for right now gives me the willies.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
"The year's penultimate month" is not in truth a good way of saying
November.
--H.W. Fowler


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-12 18:21:55
Message-ID: b42b73150610121121o50c2b082l2b13d38fc3e00820@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 10/12/06, Andrew Sullivan <ajs(at)crankycanuck(dot)ca> wrote:
> On Thu, Oct 12, 2006 at 11:25:25AM -0500, Jim C. Nasby wrote:
> > Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
> > also my comment below.
>
> If I may argue in the other direction, speaking as one whose career
> (if we may be generous enough to call it that) has been pretty much
> exclusively on the operations end of things, I think that's an awful
> idea.
>
> There are two ways that quick-fix solve-the-problem-now hints are
> going to be used. One is in the sort of one-off query that a DBA has

third way: to solve the problem of data (especially constants) not
being available to the planner at the time the plan was generated.
this happens most often with prepared statements and sql udfs. note
that changes to the plan generation mechanism (i think proposed by
peter e a few weeks back) might also solve this.

In a previous large project I had to keep bitmap scan and seqscan off
all the time because of this problem (the project used a lot of
prepared statements).

or am i way off base here?

merlin


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 19:03:47
Message-ID: 20061012190347.GD29203@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 02:21:55PM -0400, Merlin Moncure wrote:
> third way: to solve the problem of data (especially constants) not
> being available to the planner at the time the plan was generated.
> this happens most often with prepared statements and sql udfs. note
> that changes to the plan generation mechanism (i think proposed by
> peter e a few weeks back) might also solve this.

You're right about this, but you also deliver the reason why we don't
need hints for that: the plan generation mechanism is a better
solution to that problem. It's this latter thing that I keep coming
back to. As a user of PostgreSQL, the thing that I really like about
it is its pragmatic emphasis on correctness. In my experience, it's
a system that feels very UNIX-y: there's a willingness to accept
"80/20" answers to a problem in the event you at least have a way to
get the last 20, but the developers are opposed to anything that
seems really kludgey.

In the case you're talking about, it seems to me that addressing the
problems where they come from is a better solution that trying to
find some way to work around them. And most of the use-cases I hear
for a statement-level hints system fall into this latter category.

A
--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
Unfortunately reformatting the Internet is a little more painful
than reformatting your hard drive when it gets out of whack.
--Scott Morris


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 19:07:11
Message-ID: 1160680031.31966.128.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
> > I'm not suggesting that we do that, but it seems better then embedding
> > the hints in the queries themselves.
>
> OK, what about this: if I execute the same query from a web client, I
> want the not-so-optimal-but-safe plan, if I execute it asynchronously, I
> let the planner choose the
> best-overall-performance-but-sometimes-may-be-slow plan ?
>

Connect as a different user to control whether the hint matches or not.
If this doesn't work for you, read below.

> What kind of statistics/table level hinting will get you this ?
>

It's based not just on the table, but on environment as well, such as
the user/role.

> I would say only query level hinting will buy you query level control.
> And that's perfectly good in some situations.

My particular proposal allows arbitrary regexes on the raw query. You
could add a comment with a "query id" in it.

My proposal has these advantages over query comments:
(1) Most people's needs would be solved by just matching the query
form.
(2) If the DBA really wanted to separate out queries individually (not
based on the query form), he could do it, but it would have an extra
step that might encourage him to reconsider the necessity
(3) If someone went to all that work to shoot themselves in the foot
with unmanagable hints that are way too specific, the postgres
developers are unlikely to be blamed
(4) No backwards compatibility issues that I can see, aside from people
making their own hints unmanagable. If someone started getting bad
plans, they could just remove all the hints from the system catalogs and
it would be just as if they had never used hints. If they added ugly
comments to their queries it wouldn't really have a bad effect.

To formalize the proposal a litte, you could have syntax like:

CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;

Where "some_hint" would be a hinting language perhaps like Jim's, except
not guaranteed to be compatible between versions of PostgreSQL. The
developers could change the hinting language at every release and people
can just re-write the hints without changing their application.

> I really can't see why a query-level hinting mechanism is so evil, why
> it couldn't be kept forever, and augmented with the possibility of
> correlation hinting, or table level hinting.

Well, I wouldn't say "evil". Query hints are certainly against the
principles of a relational database, which separate the logical query
from the physical storage.

Regards,
Jeff Davis


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 19:24:10
Message-ID: 20061012192410.GU28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 09:40:30AM -0700, Josh Berkus wrote:
> Jim,
>
> >>>These hints would outright force the planner to do things a certain way.
> >>>... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
> >>This proposal seems to deliberately ignore every point that has been
> >>made *against* doing things that way. It doesn't separate the hints
> >>from the queries, it doesn't focus on fixing the statistical or cost
> >>misestimates that are at the heart of the issue, and it takes no account
> >>of the problem of hints being obsoleted by system improvements.
> >
> >Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See
> >also my comment below.
>
> I don't see how adding extra tags to queries is easier to implement than
> an ability to modify the system catalogs. Quite the opposite, really.
>
> And, as I said, if you're going to push for a feature that will be
> obsolesced in one version, then you're going to have a really rocky row
> to hoe.

Unless you've got a time machine or a team of coders in your back
pocket, I don't see how the planner will suddenly become perfect in
8.4...

> >Yes, but as I mentioned the idea here was to come up with something that
> >is (hopefully) easy to define and implement. In other words, something
> >that should be doable for 8.3. Because this proposal essentially amounts
> >to limiting plans the planner will consider and tweaking it's cost
> >estimates, I'm hoping that it should be (relatively) easy to implement.
>
> Even I, the chief marketing geek, am more concerned with getting a
> feature that we will still be proud of in 5 years than getting one in
> the next nine months. Keep your pants on!

Hey, I wrote that email while dressed! :P

We've been seeing the same kinds of problems that are very difficult (or
impossible) to fix cropping up for literally years... it'd be really
good to at least be able to force the planner to do the sane thing even
if we don't have the manpower to fix it right now...

> I actually think the way to attack this issue is to discuss the kinds of
> errors the planner makes, and what tweaks we could do to correct them.
> Here's the ones I'm aware of:
>
> -- Incorrect selectivity of WHERE clause
> -- Incorrect selectivity of JOIN
> -- Wrong estimate of rows returned from SRF
> -- Incorrect cost estimate for index use
>
> Can you think of any others?

There's a range of correlations where the planner will incorrectly
choose a seqscan over an indexscan.

Function problems aren't limited to SRFs... we have 0 statistics ability
for functions.

There's the whole issue of multi-column statistics.

> I also feel that a tenet of the design of the "planner tweaks" system
> ought to be that the tweaks are collectible and analyzable in some form.
> This would allow DBAs to mail in their tweaks to -performance or
> -hackers, and then allow us to continue improving the planner.

Well, one nice thing about the per-query method is you can post before
and after EXPLAIN ANALYZE along with the hints. But yes, as we move
towards a per-table/index/function solution, there should be an easy way
to see how those hints are affecting the system and to report that data
back to the community.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 19:34:15
Message-ID: 20061012193414.GV28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 09:42:55AM -0700, Jeff Davis wrote:
> On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
> > The syntax these hints is something arbitrary. I'm borrowing Oracle's
> > idea of embedding hints in comments, but we can use some other method if
> > desired. Right now I'm more concerned with getting the general idea
> > across.
> >
>
> Is there any advantage to having the hints in the queries? To me that's
> asking for trouble with no benefit at all. It would seem to me to be
> better to have a system catalog that defined hints as something like:
>
> "If user A executes a query matching regex R, then coerce (or force) the
> planner in this way."
>
> I'm not suggesting that we do that, but it seems better then embedding
> the hints in the queries themselves.

My experience is that on the occasions when I want to beat the planner
into submission, it's usually a pretty complex query that's the issue,
and that it's unlikely to have more than a handful of them in the
application. That makes me think a regex facility would just get in the
way, but perhaps others have much more extensive need of hinting.

I also suspect that writing that regex could become a real bear.

Having said that... I see no reason why it couldn't work... but the real
challenge is defining the hints.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 20:58:22
Message-ID: 200610121358.22739.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim,

> > I don't see how adding extra tags to queries is easier to implement
> > than an ability to modify the system catalogs. Quite the opposite,
> > really.
> >
> > And, as I said, if you're going to push for a feature that will be
> > obsolesced in one version, then you're going to have a really rocky
> > row to hoe.
>
> Unless you've got a time machine or a team of coders in your back
> pocket, I don't see how the planner will suddenly become perfect in
> 8.4...

Since you're not a core code contributor, I really don't see why you
continue to claim that query hints are going to be easier to implement
than relation-level statistics modification. You think it's easier, but
the people who actually work on the planner don't believe that it is.

> We've been seeing the same kinds of problems that are very difficult (or
> impossible) to fix cropping up for literally years... it'd be really
> good to at least be able to force the planner to do the sane thing even
> if we don't have the manpower to fix it right now...

As I've said to other people on this thread, you keep making the incorrect
assumption that Oracle-style query hints are the only possible way of
manual nuts-and-bolts query tuning. They are not.

> > I actually think the way to attack this issue is to discuss the kinds
> > of errors the planner makes, and what tweaks we could do to correct
> > them. Here's the ones I'm aware of:
> >
> > -- Incorrect selectivity of WHERE clause
> > -- Incorrect selectivity of JOIN
> > -- Wrong estimate of rows returned from SRF
> > -- Incorrect cost estimate for index use
> >
> > Can you think of any others?
>
> There's a range of correlations where the planner will incorrectly
> choose a seqscan over an indexscan.

Please list some if you have ones which don't fall into one of the four
problems above.

> Function problems aren't limited to SRFs... we have 0 statistics ability
> for functions.
>
> There's the whole issue of multi-column statistics.

Sure, but again that falls into the category of "incorrect selectivity for
WHERE/JOIN". Don't make things more complicated than they need to be.

> Well, one nice thing about the per-query method is you can post before
> and after EXPLAIN ANALYZE along with the hints.

One bad thing is that application designers will tend to use the hint, fix
the immediate issue, and never report a problem at all. And query hints
would not be collectable in any organized way except the query log, which
would then require very sophisticated text parsing to get any useful
information at all.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 21:07:12
Message-ID: 452EAE80.5030501@tweakers.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On 12-10-2006 21:07 Jeff Davis wrote:
> On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
>
> To formalize the proposal a litte, you could have syntax like:
>
> CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
>
> Where "some_hint" would be a hinting language perhaps like Jim's, except
> not guaranteed to be compatible between versions of PostgreSQL. The
> developers could change the hinting language at every release and people
> can just re-write the hints without changing their application.

There are some disadvantages of not writing the hints in a query. But of
course there are disadvantages to do as well ;)

One I can think of is that it can be very hard to define which hint
should apply where. Especially in complex queries, defining at which
point exaclty you'd like your hint to work is not a simple matter,
unless you can just place a comment right at that position.

Say you have a complex query with several joins of the same table. And
in all but one of those joins postgresql actually chooses the best
option, but somehow you keep getting some form of join while a nested
loop would be best. How would you pinpoint just that specific clause,
while the others remain "unhinted" ?

Your approach seems to be a bit similar to aspect oriented programming
(in java for instance). You may need a large amount of information about
the queries and it is likely a "general" regexp with "general" hint will
not do much good (at least I expect a hinting-system to be only useable
in corner cases and very specific points in a query).

By the way, wouldn't it be possible if the planner learned from a query
execution, so it would know if a choice for a specific plan or estimate
was actually correct or not for future reference? Or is that in the line
of DB2's complexity and a very hard problem and/or would it add too much
overhead?

Best regards,

Arjen


From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: <josh(at)agliodbs(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 21:19:29
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D4209CA5@MAIL.corp.lumeta.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> > Well, one nice thing about the per-query method is you can post
before
> > and after EXPLAIN ANALYZE along with the hints.
>
> One bad thing is that application designers will tend to use the hint,
fix
> the immediate issue, and never report a problem at all. And query
hints
> would not be collectable in any organized way except the query log,
which
> would then require very sophisticated text parsing to get any useful
> information at all.
>
Or they'll report it when the next version of Postgres "breaks" their
app because the hints changed, or because the planner does something
else which makes those hints obsolete.

My main concern with hints (aside from the fact I'd rather see more
intelligence in the planner/stats) is managing them appropriately. I
have two general types of SQL where I'd want to use hints- big OLAP
stuff (where I have a lot of big queries, so it's not just one or two
where I'd need them) or large dynamically generated queries (Users
building custom queries). Either way, I don't want to put them on a
query itself.

What about using regular expressions, plus, if you have a function
(views, or any other statement that is stored), you can assign a rule to
that particular function. So you get matching, plus explicit selection.
This way it's easy to find all your hints, turn them off, manage them,
etc. (Not to mention dynamically generated SQL is ugly enough without
having to put hints in there).

- Bucky


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgreSQL(dot)org, postgres performance list <pgsql-performance(at)postgreSQL(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 21:28:48
Message-ID: 19693.1160688528@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

[ trying once again to push this thread over to -hackers where it belongs ]

Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
> On 12-10-2006 21:07 Jeff Davis wrote:
>> On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
>> To formalize the proposal a litte, you could have syntax like:
>> CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
>>
>> Where "some_hint" would be a hinting language perhaps like Jim's, except
>> not guaranteed to be compatible between versions of PostgreSQL. The
>> developers could change the hinting language at every release and people
>> can just re-write the hints without changing their application.

Do you have any idea how much push-back there would be to that? In
practice we'd be bound by backwards-compatibility concerns for the hints
too.

> There are some disadvantages of not writing the hints in a query. But of
> course there are disadvantages to do as well ;)

> One I can think of is that it can be very hard to define which hint
> should apply where. Especially in complex queries, defining at which
> point exaclty you'd like your hint to work is not a simple matter,
> unless you can just place a comment right at that position.

The problems that you are seeing all come from the insistence that a
hint should be textually associated with a query. Using a regex is a
little better than putting it right into the query, but the only thing
that really fixes is not having the hints directly embedded into
client-side code. It's still wrong at the conceptual level.

The right way to think about it is to ask why is the planner not picking
the right plan to start with --- is it missing a statistical
correlation, or are its cost parameters wrong for a specific case, or
is it perhaps unable to generate the desired plan at all? (If the
latter, no amount of hinting is going to help.) If it's a statistics or
costing problem, I think the right thing is to try to fix it with hints
at that level. You're much more likely to fix the behavior across a
class of queries than you will be with a hint textually matched to a
specific query.

regards, tom lane


From: Richard Broersma Jr <rabroersma(at)yahoo(dot)com>
To: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 21:41:11
Message-ID: 20061012214111.90031.qmail@web31807.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> By the way, wouldn't it be possible if the planner learned from a query
> execution, so it would know if a choice for a specific plan or estimate
> was actually correct or not for future reference? Or is that in the line
> of DB2's complexity and a very hard problem and/or would it add too much
> overhead?

Just thinking out-loud here...

Wow, a learning cost based planner sounds a-lot like problem for control & dynamical systems
theory. As I understand it, much of the advice given for setting PostgreSQL's tune-able
parameters are from "RULES-OF-THUMB." I am sure that effect on server performance from all of the
parameters could be modeled and an adaptive feed-back controller could be designed to tuned these
parameters as demand on the server changes.

Al-thought, I suppose that a controller like this would have limited success since some of the
most affective parameters are non-run-time tune-able.

In regards to query planning, I wonder if there is way to model a controller that could
adjust/alter query plans based on a comparison of expected and actual query execution times.

Regards,

Richard Broersma Jr.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bucky Jordan <bjordan(at)lumeta(dot)com>
Cc: josh(at)agliodbs(dot)com, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-12 22:02:07
Message-ID: 20061012220207.GI18764@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Bucky Jordan wrote:

> What about using regular expressions, plus, if you have a function
> (views, or any other statement that is stored), you can assign a rule to
> that particular function. So you get matching, plus explicit selection.
> This way it's easy to find all your hints, turn them off, manage them,
> etc. (Not to mention dynamically generated SQL is ugly enough without
> having to put hints in there).

The regular expression idea that's being floated around makes my brain
feel like somebody is screeching a blackboard nearby. I don't think
it's a sane idea. I think you could achieve something similar by using
stored plan representations, like we do for rewrite rules. So you'd
look for, say, a matching join combination in a catalog, and get a
selectivity from a function that would get the selectivities of the
conditions on the base tables. Or something like that anyway.

That gets ugly pretty fast when you have to extract selectivities for
all the possible join paths in any given query.

But please don't talk about regular expressions.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, postgres performance list <pgsql-performance(at)postgreSQL(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-12 22:15:03
Message-ID: 1160691303.31966.139.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 17:28 -0400, Tom Lane wrote:
> [ trying once again to push this thread over to -hackers where it belongs ]
>
> Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
> > On 12-10-2006 21:07 Jeff Davis wrote:
> >> On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
> >> To formalize the proposal a litte, you could have syntax like:
> >> CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
> >>
> >> Where "some_hint" would be a hinting language perhaps like Jim's, except
> >> not guaranteed to be compatible between versions of PostgreSQL. The
> >> developers could change the hinting language at every release and people
> >> can just re-write the hints without changing their application.
>
> Do you have any idea how much push-back there would be to that? In
> practice we'd be bound by backwards-compatibility concerns for the hints
> too.
>

No, I don't have any idea, except that it would be less push-back than
changing a language that's embedded in client code. Also, I see no
reason to think that a hint would not be obsolete upon a new release
anyway.

> The problems that you are seeing all come from the insistence that a
> hint should be textually associated with a query. Using a regex is a
> little better than putting it right into the query, but the only thing

"Little better" is all I was going for. I was just making the
observation that we can separate two concepts:
(1) Embedding code in the client's queries, which I see as very
undesirable and unnecessary
(2) Providing very specific hints

which at least gives us a place to talk about the debate more
reasonably.

> that really fixes is not having the hints directly embedded into
> client-side code. It's still wrong at the conceptual level.
>

I won't disagree with that. I will just say it's no more wrong than
applying the same concept in addition to embedding the hints in client
queries.

> The right way to think about it is to ask why is the planner not picking
> the right plan to start with --- is it missing a statistical
> correlation, or are its cost parameters wrong for a specific case, or
> is it perhaps unable to generate the desired plan at all? (If the
> latter, no amount of hinting is going to help.) If it's a statistics or
> costing problem, I think the right thing is to try to fix it with hints
> at that level. You're much more likely to fix the behavior across a
> class of queries than you will be with a hint textually matched to a
> specific query.
>

Agreed.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-12 22:41:00
Message-ID: 1160692860.31966.153.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 14:34 -0500, Jim C. Nasby wrote:
> On Thu, Oct 12, 2006 at 09:42:55AM -0700, Jeff Davis wrote:
> > On Thu, 2006-10-12 at 10:14 -0500, Jim C. Nasby wrote:
> > > The syntax these hints is something arbitrary. I'm borrowing Oracle's
> > > idea of embedding hints in comments, but we can use some other method if
> > > desired. Right now I'm more concerned with getting the general idea
> > > across.
> > >
> >
> > Is there any advantage to having the hints in the queries? To me that's
> > asking for trouble with no benefit at all. It would seem to me to be
> > better to have a system catalog that defined hints as something like:
> >
> > "If user A executes a query matching regex R, then coerce (or force) the
> > planner in this way."
> >
> > I'm not suggesting that we do that, but it seems better then embedding
> > the hints in the queries themselves.
>
> My experience is that on the occasions when I want to beat the planner
> into submission, it's usually a pretty complex query that's the issue,
> and that it's unlikely to have more than a handful of them in the
> application. That makes me think a regex facility would just get in the
> way, but perhaps others have much more extensive need of hinting.
>
> I also suspect that writing that regex could become a real bear.
>

Well, writing the regex is just matching criteria to apply the hint. If
you really need a quick fix, you can just write a comment with a query
id number in the query. The benefit there is that when the hint is
obsolete later (as the planner improves, or data changes
characteristics) you drop the hint and the query is planned without
interference. No application changes required.

Also, and perhaps more importantly, let's say you are trying to improve
the performance of an existing application where it's impractical to
change the query text (24/7 app, closed source, etc.). You can still
apply a hint if you're willing to write the regex. Just enable query
logging or some such to capture the query, and copy it verbatim except
for a few parameters which are unknown. Instant regex. If you have to
change the query text to apply the hint, it would be impossible in this
case.

> Having said that... I see no reason why it couldn't work... but the real
> challenge is defining the hints.

Right. The only thing I was trying to solve was the problems associated
with the hint itself embedded in the client code. I view that as a
problem that doesn't need to exist.

I'll leave it to smarter people to either improve the planner or develop
a hinting language. I don't even need hints myself, just offering a
suggestion.

Regards,
Jeff Davis


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-13 02:54:02
Message-ID: 87r6xd9b91.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Quoth rabroersma(at)yahoo(dot)com (Richard Broersma Jr):
>> By the way, wouldn't it be possible if the planner learned from a query
>> execution, so it would know if a choice for a specific plan or estimate
>> was actually correct or not for future reference? Or is that in the line
>> of DB2's complexity and a very hard problem and/or would it add too much
>> overhead?
>
> Just thinking out-loud here...
>
> Wow, a learning cost based planner sounds a-lot like problem for
> control & dynamical systems theory.

Alas, dynamic control theory, home of considerable numbers of
Hamiltonian equations, as well as Pontryagin's Minimum Principle, is
replete with:
a) Gory multivariate calculus
b) Need for all kinds of continuity requirements (e.g. - continuous,
smooth functions with no discontinuities or other "nastiness")
otherwise the math gets *really* nasty

We don't have anything even resembling "continuous" because our
measures are all discrete (e.g. - the base values are all integers).

> As I understand it, much of the advice given for setting
> PostgreSQL's tune-able parameters are from "RULES-OF-THUMB." I am
> sure that effect on server performance from all of the parameters
> could be modeled and an adaptive feed-back controller could be
> designed to tuned these parameters as demand on the server changes.

Optimal control theory loves the "bang-bang" control, where you go to
one extreme or another, which requires all those continuity conditions
I mentioned, and is almost certainly not the right answer here.

> Al-thought, I suppose that a controller like this would have limited
> success since some of the most affective parameters are non-run-time
> tune-able.
>
> In regards to query planning, I wonder if there is way to model a
> controller that could adjust/alter query plans based on a comparison
> of expected and actual query execution times.

I think there would be something awesomely useful about recording
expected+actual statistics along with some of the plans.

The case that is easiest to argue for is where Actual >>> Expected
(e.g. - Actual "was a whole lot larger than" Expected); in such cases,
you've already spent a LONG time on the query, which means that
spending millisecond recording the moral equivalent to "Explain
Analyze" output should be an immaterial cost.

If we could record a whole lot of these cases, and possibly, with some
anonymization / permissioning, feed the data to a central place, then
some analysis could be done to see if there's merit to particular
modifications to the query plan cost model.

Part of the *really* fundamental query optimization problem is that
there seems to be some evidence that the cost model isn't perfectly
reflective of the costs of queries. Improving the quality of the cost
model is one of the factors that would improve the performance of the
query optimizer. That would represent a fundamental improvement.
--
let name="cbbrowne" and tld="gmail.com" in name ^ "@" ^ tld;;
http://linuxdatabases.info/info/languages.html
"If I can see farther it is because I am surrounded by dwarves."
-- Murray Gell-Mann


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 03:12:29
Message-ID: 87lknkaoyq.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Quoth pgsql(at)j-davis(dot)com (Jeff Davis):
> On Thu, 2006-10-12 at 17:28 -0400, Tom Lane wrote:
>> [ trying once again to push this thread over to -hackers where it belongs ]
>>
>> Arjen van der Meijden <acmmailing(at)tweakers(dot)net> writes:
>> > On 12-10-2006 21:07 Jeff Davis wrote:
>> >> On Thu, 2006-10-12 at 19:15 +0200, Csaba Nagy wrote:
>> >> To formalize the proposal a litte, you could have syntax like:
>> >> CREATE HINT [FOR USER username] MATCHES regex APPLY HINT some_hint;
>> >>
>> >> Where "some_hint" would be a hinting language perhaps like
>> >> Jim's, except not guaranteed to be compatible between versions
>> >> of PostgreSQL. The developers could change the hinting language
>> >> at every release and people can just re-write the hints without
>> >> changing their application.
>>
>> Do you have any idea how much push-back there would be to that? In
>> practice we'd be bound by backwards-compatibility concerns for the
>> hints too.
>
> No, I don't have any idea, except that it would be less push-back
> than changing a language that's embedded in client code. Also, I see
> no reason to think that a hint would not be obsolete upon a new
> release anyway.

I see *plenty* of reason.

1. Suppose the scenario where Hint h was useful hasn't been affected
by *any* changes in how the query planner works in the new
version, it *obviously* continues to be necessary.

2. If Version n+0.1 hasn't resolved all/most cases where Hint h was
useful in Version n, then people will entirely reasonably expect
for Hint h to continue to be in effect in version n+0.1

3. Suppose support for Hint h is introduced in PostgreSQL version
n, and an optimization that makes it obsolete does not arrive
until version n+0.3, which is quite possible. That hint has been
carried forward for 2 versions already, long enough for client
code that contains it to start to ossify. (After all, if
developers get promoted to new projects every couple of years,
two versions is plenty of time for the original programmer to
be gone...)

That's not just one good reason, but three.

>> The problems that you are seeing all come from the insistence that a
>> hint should be textually associated with a query. Using a regex is a
>> little better than putting it right into the query, but the only thing
>
> "Little better" is all I was going for. I was just making the
> observation that we can separate two concepts:
> (1) Embedding code in the client's queries, which I see as very
> undesirable and unnecessary
> (2) Providing very specific hints
>
> which at least gives us a place to talk about the debate more
> reasonably.

It seems to me that there is a *LOT* of merit in trying to find
alternatives to embedding code into client queries, to be sure.

>> that really fixes is not having the hints directly embedded into
>> client-side code. It's still wrong at the conceptual level.
>
> I won't disagree with that. I will just say it's no more wrong than
> applying the same concept in addition to embedding the hints in client
> queries.
>
>> The right way to think about it is to ask why is the planner not
>> picking the right plan to start with --- is it missing a
>> statistical correlation, or are its cost parameters wrong for a
>> specific case, or is it perhaps unable to generate the desired plan
>> at all? (If the latter, no amount of hinting is going to help.)
>> If it's a statistics or costing problem, I think the right thing is
>> to try to fix it with hints at that level. You're much more likely
>> to fix the behavior across a class of queries than you will be with
>> a hint textually matched to a specific query.
>
> Agreed.

That's definitely a useful way to look at the issue, which seems to be
lacking in many of the cries for hints.

Perhaps I'm being unfair, but it often seems that people demanding
hinting systems are uninterested in why the planner is getting things
wrong. Yes, they have an immediate problem (namely the wrong plan
that is getting generated) that they want to resolve.

But I'm not sure that you can get anything out of hinting without
coming close to answering "why the planner got it wrong."
--
"cbbrowne","@","gmail.com"
http://linuxfinances.info/info/lsf.html
"Optimization hinders evolution." -- Alan Perlis


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 08:41:36
Message-ID: E1539E0ED7043848906A8FF995BDA579016A0981@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> The _other_ way it's going to be used is as a stealthy
> alteration to regular behaviour, to solve a particular nasty
> performance problem that happens to result on a given day.
> And every single time I've seen anything like that done, the
> long term effect is always monstrous.

Funny, I very seldom use Informix hints (mostly none, maybe 2 per
project),
but I have yet to see one that backfires on me, even lightly.
I use hints like: don't use that index, use that join order, use that
index

Can you give us an example that had such a monstrous effect in Oracle,
other than that the hint was a mistake in the first place ?

Andreas


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 09:07:29
Message-ID: E1539E0ED7043848906A8FF995BDA579016A0990@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> I actually think the way to attack this issue is to discuss the kinds
of errors the planner makes, and what tweaks we could do to correct
them.
> Here's the ones I'm aware of:
>
> -- Incorrect selectivity of WHERE clause
> -- Incorrect selectivity of JOIN
> -- Wrong estimate of rows returned from SRF
> -- Incorrect cost estimate for index use
>
> Can you think of any others?

I think your points are too generic, there is no way to get them all
100% correct from statistical
data even with data hints (and it is usually not at all necessary for
good enough plans).
I think we need to more precisely define the problems of our system with
point in time statistics

-- no reaction to degree of other concurrent activity
-- no way to react to abnormal skew that only persists for a very short
duration
-- too late reaction to changing distribution (e.g. current date column
when a new year starts)
and the variant: too late adaption when a table is beeing filled
-- missing cost/selectivity estimates for several parts of the system

Andreas


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 14:09:15
Message-ID: 20061013140915.GC31912@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Oct 13, 2006 at 10:41:36AM +0200, Zeugswetter Andreas ADI SD wrote:
> Can you give us an example that had such a monstrous effect in Oracle,
> other than that the hint was a mistake in the first place ?

Of course the hint was a mistake in the first place; the little story
I told was exactly an example of such a case. The hint shouldn't
have been put in place at the beginning; instead, the root cause
should have been uncovered. It was not, the DBA added a hint, and
later that hint turned out to have unfortunate consequences for
some other use case. And it's a long-term monstrosity, remember,
not a short one: the problem is in maintenance overall.

This is a particularly sensitive area for PostgreSQL, because the
planner has been making giant leaps forward with every release.
Indeed, as Oracle's planner got better, the hints people had in place
sometimes started to cause them to have to re-tune everything. My
Oracle-using acquaintances tell me this has gotten better in recent
releases; but in the last two days, one person pointed out that hints
are increasingly relied on by one part of Oracle, even as another
Oracle application insists that they never be used. That's exactly
the sort of disagreement I'd expect to see when people have come to
rely on what is basically a kludge in the first place.

And remember, the places where PostgreSQL is getting used most
heavily are still the sort of environments where people will take a
lot of short cuts to achieve an immediate result, and be annoyed when
that short cut later turns out to have been expensive. Postgres will
get a black eye from that ("Too hard to manage! Upgrades cause all
sorts of breakage!").

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
The plural of anecdote is not data.
--Roger Brinner


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: postgres hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 14:20:08
Message-ID: 1160749207.28751.115.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> And remember, the places where PostgreSQL is getting used most
> heavily are still the sort of environments where people will take a
> lot of short cuts to achieve an immediate result, and be annoyed when
> that short cut later turns out to have been expensive. Postgres will
> get a black eye from that ("Too hard to manage! Upgrades cause all
> sorts of breakage!").

Those guys will do their shortcuts anyway, and possibly reject postgres
as not suitable even before that if they can't take any shortcuts.

And upgrades are always causing breakage, I didn't have one upgrade
without some things to fix, so I would expect people is expecting that.
And that's true for Oracle too, our oracle guys always have something to
fix after an upgrade. And I repeat, I always had something to fix for
postgres too on all upgrades I've done till now.

Cheers,
Csaba.


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 15:04:18
Message-ID: E1539E0ED7043848906A8FF995BDA579016A0ADC@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> > Can you give us an example that had such a monstrous effect in
Oracle,
> > other than that the hint was a mistake in the first place ?
>
> Of course the hint was a mistake in the first place; the
> little story I told was exactly an example of such a case.
> The hint shouldn't have been put in place at the beginning;
> instead, the root cause should have been uncovered.

This is not an example. For us to understand, we need an actual case
with syntax and all, and what happened.

Imho the use of a stupid hint, that was added without analyzing
the cause and background of the problem is no proof that statement hints
are bad,
only that the person involved was not doing his job.

Andreas


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 16:16:14
Message-ID: 20061013161614.GT28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, Oct 12, 2006 at 01:58:22PM -0700, Josh Berkus wrote:
> > Unless you've got a time machine or a team of coders in your back
> > pocket, I don't see how the planner will suddenly become perfect in
> > 8.4...
>
> Since you're not a core code contributor, I really don't see why you
> continue to claim that query hints are going to be easier to implement
> than relation-level statistics modification. You think it's easier, but
> the people who actually work on the planner don't believe that it is.

Well, that's not what I said (my point being that until the planner and
stats are perfect you need a way to over-ride them)... but I've also
never said hints would be faster or easier than stats modification (I
said I hope they would). But we'll never know which will be faster or
easier until there's actually a proposal for improving the stats.

> > We've been seeing the same kinds of problems that are very difficult (or
> > impossible) to fix cropping up for literally years... it'd be really
> > good to at least be able to force the planner to do the sane thing even
> > if we don't have the manpower to fix it right now...
>
> As I've said to other people on this thread, you keep making the incorrect
> assumption that Oracle-style query hints are the only possible way of
> manual nuts-and-bolts query tuning. They are not.

No, I've never said that. What I've said is a) I doubt that any system
will always be correct for every query, meaning you need to be able to
change things on a per-query basis, and b) I'm hoping that simple hints
will be easy enough to implement that they can go into 8.3.

I completely agree that it's much better *in the long run* to improve
the planner and the statistics system so that we don't need hints. But
there's been no plan put forward for how to do that, which means we also
have no idea when some of these problems will be resolved. If someone
comes up with a plan for that, then we can actually look at which options
are better and how soon we can get fixes for these problems in place.

Unfortunately, this problem is difficult enough that I suspect it could
take a long time just to come up with an idea of how to fix these
problems, which means that without some way to override the planner our
users are stuck in the same place for the foreseeable future. If that
turns out to be the case, then I think we should implement per-query
hints now so that users can handle bad plans while we focus on how to
improve the stats and planner so that in the future hints will become
pointless.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 16:36:04
Message-ID: 25352.1160757364@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> I completely agree that it's much better *in the long run* to improve
> the planner and the statistics system so that we don't need hints. But
> there's been no plan put forward for how to do that, which means we also
> have no idea when some of these problems will be resolved.

You keep arguing on the assumption that the planner is static and
there's no one working on it. That is false --- although this thread
is certainly wasting a lot of time that could have been used more
productively ;-).

I also dispute your assumption that hints of the style you propose
will be easier to implement or maintain than the sort of
statistical-assumption tweaking that's been counter-proposed. Just for
starters, how are you going to get those hints through the parser and
rewriter? That's going to take an entire boatload of very ugly code
that isn't needed at all in a saner design.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 16:45:54
Message-ID: 1160757954.31966.215.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 23:12 -0400, Christopher Browne wrote:
> > No, I don't have any idea, except that it would be less push-back
> > than changing a language that's embedded in client code. Also, I see
> > no reason to think that a hint would not be obsolete upon a new
> > release anyway.
>
> I see *plenty* of reason.
>
> 1. Suppose the scenario where Hint h was useful hasn't been affected
> by *any* changes in how the query planner works in the new
> version, it *obviously* continues to be necessary.
>
> 2. If Version n+0.1 hasn't resolved all/most cases where Hint h was
> useful in Version n, then people will entirely reasonably expect
> for Hint h to continue to be in effect in version n+0.1
>

Fair enough. I had considered those situations, but a lot of people are
talking about "I need a better plan now, can't wait for planner
improvements". Also, even if the hint is still useful, I would think
that on a new version you'd want to test to see how useful it still is.

> 3. Suppose support for Hint h is introduced in PostgreSQL version
> n, and an optimization that makes it obsolete does not arrive
> until version n+0.3, which is quite possible. That hint has been
> carried forward for 2 versions already, long enough for client
> code that contains it to start to ossify. (After all, if
> developers get promoted to new projects every couple of years,
> two versions is plenty of time for the original programmer to
> be gone...)

Ok, that is a good reason. But it's not helped at all by putting the
hints in the queries themselves.

> > "Little better" is all I was going for. I was just making the
> > observation that we can separate two concepts:
> > (1) Embedding code in the client's queries, which I see as very
> > undesirable and unnecessary
> > (2) Providing very specific hints
> >
> > which at least gives us a place to talk about the debate more
> > reasonably.
>
> It seems to me that there is a *LOT* of merit in trying to find
> alternatives to embedding code into client queries, to be sure.
>

I think almost any alternative to client query hints is worth
considering.

> >> The right way to think about it is to ask why is the planner not
> >> picking the right plan to start with --- is it missing a
> >> statistical correlation, or are its cost parameters wrong for a
> >> specific case, or is it perhaps unable to generate the desired plan
> >> at all? (If the latter, no amount of hinting is going to help.)
> >> If it's a statistics or costing problem, I think the right thing is
> >> to try to fix it with hints at that level. You're much more likely
> >> to fix the behavior across a class of queries than you will be with
> >> a hint textually matched to a specific query.
> >
> > Agreed.
>
> That's definitely a useful way to look at the issue, which seems to be
> lacking in many of the cries for hints.
>
> Perhaps I'm being unfair, but it often seems that people demanding
> hinting systems are uninterested in why the planner is getting things
> wrong. Yes, they have an immediate problem (namely the wrong plan
> that is getting generated) that they want to resolve.
>
> But I'm not sure that you can get anything out of hinting without
> coming close to answering "why the planner got it wrong."

Right. And it's not always easy to determine why the planner got it
wrong without making it execute other plans through hinting :)

Note: I'll restate this just to be clear. I'm not advocating an overly-
specific, band-aid style hinting language. My only real concern is that
if one appears, I would not like it to appear in the client's queries.

Same goes for more general kinds of hints. We don't want a bunch of
client queries to contain comments like "table foo has a
random_page_cost of 1.1". That belongs in the system catalogs.

Regards,
Jeff Davis


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: josh(at)agliodbs(dot)com
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 16:46:45
Message-ID: 87odsg17uy.fsf@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Josh Berkus <josh(at)agliodbs(dot)com> writes:

>> > I actually think the way to attack this issue is to discuss the kinds
>> > of errors the planner makes, and what tweaks we could do to correct
>> > them. Here's the ones I'm aware of:
>> >
>> > -- Incorrect selectivity of WHERE clause
>> > -- Incorrect selectivity of JOIN
>> > -- Wrong estimate of rows returned from SRF
>> > -- Incorrect cost estimate for index use
>> >
>> > Can you think of any others?

-- Incorrect estimate for result of DISTINCT or GROUP BY.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Bucky Jordan <bjordan(at)lumeta(dot)com>, josh(at)agliodbs(dot)com, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:00:29
Message-ID: 1160758829.31966.221.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thu, 2006-10-12 at 18:02 -0400, Alvaro Herrera wrote:
> Bucky Jordan wrote:
>
> > What about using regular expressions, plus, if you have a function
> > (views, or any other statement that is stored), you can assign a rule to
> > that particular function. So you get matching, plus explicit selection.
> > This way it's easy to find all your hints, turn them off, manage them,
> > etc. (Not to mention dynamically generated SQL is ugly enough without
> > having to put hints in there).
>
> The regular expression idea that's being floated around makes my brain
> feel like somebody is screeching a blackboard nearby. I don't think
> it's a sane idea. I think you could achieve something similar by using
> stored plan representations, like we do for rewrite rules. So you'd
> look for, say, a matching join combination in a catalog, and get a
> selectivity from a function that would get the selectivities of the
> conditions on the base tables. Or something like that anyway.
>
> That gets ugly pretty fast when you have to extract selectivities for
> all the possible join paths in any given query.
>
> But please don't talk about regular expressions.
>

It sounds horrible to me too, and I'm the one that thought of it (or at
least I'm the one that introduced it to this thread).

However, everything is relative. Since the other idea floating around is
to put the same hinting information into the client queries themselves,
regexes look great by comparison (in my opinion).

Regards,
Jeff Davis


From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: <josh(at)agliodbs(dot)com>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:08:27
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D4209CBD@MAIL.corp.lumeta.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance


> -----Original Message-----
> From: Jeff Davis [mailto:pgsql(at)j-davis(dot)com]
> Sent: Friday, October 13, 2006 1:00 PM
> To: Alvaro Herrera
> Cc: Bucky Jordan; josh(at)agliodbs(dot)com; Jim C. Nasby; pgsql-
> hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [PERFORM] Hints proposal
>
> On Thu, 2006-10-12 at 18:02 -0400, Alvaro Herrera wrote:
> > Bucky Jordan wrote:
> >
> > > What about using regular expressions, plus, if you have a function
> > > (views, or any other statement that is stored), you can assign a
rule
> to
> > > that particular function. So you get matching, plus explicit
> selection.
> > > This way it's easy to find all your hints, turn them off, manage
them,
> > > etc. (Not to mention dynamically generated SQL is ugly enough
without
> > > having to put hints in there).
> >
> > The regular expression idea that's being floated around makes my
brain
> > feel like somebody is screeching a blackboard nearby. I don't think
> > it's a sane idea. I think you could achieve something similar by
using
> > stored plan representations, like we do for rewrite rules. So you'd
> > look for, say, a matching join combination in a catalog, and get a
> > selectivity from a function that would get the selectivities of the
> > conditions on the base tables. Or something like that anyway.
> >
> > That gets ugly pretty fast when you have to extract selectivities
for
> > all the possible join paths in any given query.
> >
> > But please don't talk about regular expressions.
> >
>
> It sounds horrible to me too, and I'm the one that thought of it (or
at
> least I'm the one that introduced it to this thread).
>
> However, everything is relative. Since the other idea floating around
is
> to put the same hinting information into the client queries
themselves,
> regexes look great by comparison (in my opinion).

I was merely expressing the same opinion. But I'm not one of those
working on the planner, and all I can say to those of you who are is
your efforts on good design are most appreciated, even if they do take
longer than we users would like at times.

My only point was that they should *NOT* be put in queries themselves as
this scatters the nightmare into user code as well. Of course, other
more sane ideas are most welcome. I don't like screeching on blackboards
either. (regular expressions, although very valuable at times, seem to
have that effect quite often...)

- Bucky


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bucky Jordan <bjordan(at)lumeta(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:23:31
Message-ID: 1160760211.31966.231.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, 2006-10-13 at 13:08 -0400, Bucky Jordan wrote:
> > It sounds horrible to me too, and I'm the one that thought of it (or
> at
> > least I'm the one that introduced it to this thread).
> >
> > However, everything is relative. Since the other idea floating around
> is
> > to put the same hinting information into the client queries
> themselves,
> > regexes look great by comparison (in my opinion).
>
> I was merely expressing the same opinion. But I'm not one of those

I didn't mean to imply otherwise.

> working on the planner, and all I can say to those of you who are is
> your efforts on good design are most appreciated, even if they do take
> longer than we users would like at times.
>
> My only point was that they should *NOT* be put in queries themselves as
> this scatters the nightmare into user code as well. Of course, other
> more sane ideas are most welcome. I don't like screeching on blackboards
> either. (regular expressions, although very valuable at times, seem to
> have that effect quite often...)

Right. And I think the sane ideas are along the lines of estimate & cost
corrections (like Tom is saying).

Regards,
Jeff Davis


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:30:24
Message-ID: 20061013173024.GE28647@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
> Right. And I think the sane ideas are along the lines of estimate & cost
> corrections (like Tom is saying).

Let me ask this... how long do you (and others) want to wait for those?
It's great that the planner is continually improving, but it also
appears that there's still a long road ahead. Having a dune-buggy to get
to your destination ahead of the road might not be a bad idea... :)
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:33:24
Message-ID: 452FCDE4.6090701@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim C. Nasby wrote:
> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
>> Right. And I think the sane ideas are along the lines of estimate & cost
>> corrections (like Tom is saying).
>
> Let me ask this... how long do you (and others) want to wait for those?
> It's great that the planner is continually improving, but it also
> appears that there's still a long road ahead. Having a dune-buggy to get
> to your destination ahead of the road might not be a bad idea... :)

It's all about resources Jim.. I have yet to see anyone step up and
offer to help work on the planner in this thread (except Tom of course).

Sincerely,

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


From: "Bucky Jordan" <bjordan(at)lumeta(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 17:33:43
Message-ID: 78ED28FACE63744386D68D8A9D1CF5D4209CBF@MAIL.corp.lumeta.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> I completely agree that it's much better *in the long run* to improve
> the planner and the statistics system so that we don't need hints. But
> there's been no plan put forward for how to do that, which means we
also
> have no idea when some of these problems will be resolved. If someone
> comes up with a plan for that, then we can actually look at which
options
> are better and how soon we can get fixes for these problems in place.
>

Would it be helpful to have a database of EXPLAIN ANALYZE results and
related details that developers could search through? I guess we sort of
have that on the mailing list, but search/reporting features on that are
pretty limited. Something like the "Report Bug" feature that seems to be
growing popular in other software (Windows, OS X, Firefox, etc) might
allow collection of useful data. The goal would be to identify the most
common problems, and some hints at what's causing them.

Maybe have a form based submission so you could ask the user some
required questions, ensure that they aren't just submitting EXPLAIN
results (parse and look for times maybe?), etc?

I guess the general question is, what information could the users
provide developers to help with this, and how can it be made easy for
the users to submit the information, and easy for the developers to
access in a meaningful way?

As a developer/contributor, what questions would you want to ask a user?
>From reading the mailing lists, these seem to be common ones:
- Copy of your postgres.conf
- Basic hardware info
- Explain Analyze Results of poor performing query
- Explain Analyze Results of anything you've gotten to run better
- Comments

If there's interest- web development is something I can actually do
(unlike pg development) so I might actually be able to help with
something like this.

- Bucky


From: David Fetter <david(at)fetter(dot)org>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:43:57
Message-ID: 20061013174357.GO15009@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Oct 13, 2006 at 12:30:24PM -0500, Jim C. Nasby wrote:
> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
> > Right. And I think the sane ideas are along the lines of estimate
> > & cost corrections (like Tom is saying).
>
> Let me ask this... how long do you (and others) want to wait for
> those?

That's a good question, but see below.

> It's great that the planner is continually improving, but it
> also appears that there's still a long road ahead. Having a
> dune-buggy to get to your destination ahead of the road might not be
> a bad idea... :)

What evidence do you have that adding per-query hints would take less
time and be less work, even in the short term, than the current
strategy of continuously improving the planner and optimizer?

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:44:41
Message-ID: 1160761481.31966.242.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, 2006-10-13 at 12:30 -0500, Jim C. Nasby wrote:
> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
> > Right. And I think the sane ideas are along the lines of estimate & cost
> > corrections (like Tom is saying).
>
> Let me ask this... how long do you (and others) want to wait for those?
> It's great that the planner is continually improving, but it also
> appears that there's still a long road ahead. Having a dune-buggy to get
> to your destination ahead of the road might not be a bad idea... :)

Fair enough. I can wait indefinitely right now, because I don't have any
serious problems with the planner as-is.

I am trying to empathize with people who are desperate to force plans
sometimes. Your original proposal included hints in the client queries.
I suggested that regexes on the server can accomplish the same goal
while avoiding a serious drawback. Don't you think some kind of server
matching rule is better?

I think an idea to get the hints into the server, regardless of the
types of hints you want to use, make it more likely to be accepted.
Don't you think that's a better road to take?

Regards,
Jeff Davis


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 17:58:33
Message-ID: 452FD3C9.4000303@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Joshua D. Drake wrote:
> Jim C. Nasby wrote:
>
>> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
>>
>>> Right. And I think the sane ideas are along the lines of estimate & cost
>>> corrections (like Tom is saying).
>>>
>> Let me ask this... how long do you (and others) want to wait for those?
>> It's great that the planner is continually improving, but it also
>> appears that there's still a long road ahead. Having a dune-buggy to get
>> to your destination ahead of the road might not be a bad idea... :)
>>
>
> It's all about resources Jim.. I have yet to see anyone step up and
> offer to help work on the planner in this thread (except Tom of course).
>
>
>

It's worse than that. Dune buggies do not run cost free. They require
oil, petrol, and maintenance. Somebody wants to build a Maserati and you
want to divert resources to maintaining dune buggies?

cheers

andrew


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Bucky Jordan <bjordan(at)lumeta(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 18:07:17
Message-ID: 452FD5D5.3040500@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim C. Nasby wrote:
> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote:
>> Right. And I think the sane ideas are along the lines of estimate & cost
>> corrections (like Tom is saying).
>
> Let me ask this... how long do you (and others) want to wait for those?

well - we waited and got other features in the past and we will wait and
get new features in the future too...

> It's great that the planner is continually improving, but it also
> appears that there's still a long road ahead. Having a dune-buggy to get
> to your destination ahead of the road might not be a bad idea... :)

well the planner has improved dramatically in the last years - we have
apps here that are magnitudes faster with 8.1(even faster with 8.2) then
they were with 7.3/7.4 - most of that is pure plain planner improvements.

And I don't really believe that adding proper per statement hint support
is any easier then continuing to improve the planner or working on (imho
much more useful) improvements to the statistics infrastructure or
functionality to tweak the statistics usage of the planner.
The later however would prove to be much more useful for most of the
current "issues" and have benefits for most of the userbase.

Stefan


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 22:57:23
Message-ID: 200610131557.24420.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim,

> Well, that's not what I said (my point being that until the planner and
> stats are perfect you need a way to over-ride them)... but I've also
> never said hints would be faster or easier than stats modification (I
> said I hope they would).

Yes, you did. Repeatedly. On this and other threads, you've made the
statement at least three times that per-query hints are the only way to go
for 8.3. Your insistence on this view has been so strident that if I
didn't know you better, I would assume some kind of hidden agenda.

Stop harping on the "per-query hints are the true way and the only way", or
prepare to have people start simply ignoring you.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 22:59:17
Message-ID: 200610131559.18328.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Andreas,

> I think we need to more precisely define the problems of our system with
> point in time statistics
>
> -- no reaction to degree of other concurrent activity
> -- no way to react to abnormal skew that only persists for a very short
> duration
> -- too late reaction to changing distribution (e.g. current date column
> when a new year starts)
> and the variant: too late adaption when a table is beeing filled
> -- missing cost/selectivity estimates for several parts of the system

How would we manage point-in-time statistics? How would we collect them &
store them? I think this is an interesting idea, but very, very hard to
do ...

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: [PERFORM] Hints proposal
Date: 2006-10-13 23:34:32
Message-ID: 200610131634.32731.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Csaba,

> And upgrades are always causing breakage, I didn't have one upgrade
> without some things to fix, so I would expect people is expecting that.
> And that's true for Oracle too, our oracle guys always have something to
> fix after an upgrade. And I repeat, I always had something to fix for
> postgres too on all upgrades I've done till now.

Really? Since 7.4, I've been able to do most upgrades without any
troubleshooting.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-13 23:36:15
Message-ID: 20061013233615.GA381@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Fri, Oct 13, 2006 at 03:57:23PM -0700, Josh Berkus wrote:
> Jim,
>
> > Well, that's not what I said (my point being that until the planner and
> > stats are perfect you need a way to over-ride them)... but I've also
> > never said hints would be faster or easier than stats modification (I
> > said I hope they would).
>
> Yes, you did. Repeatedly. On this and other threads, you've made the
> statement at least three times that per-query hints are the only way to go
> for 8.3. Your insistence on this view has been so strident that if I
> didn't know you better, I would assume some kind of hidden agenda.

Let me clarify, because that's not what I meant. Right now, there's not
even a shadow of a design for anything else, and this is a tough nut to
crack. That means it doesn't appear that anything else could be done for
8.3. If I'm wrong, great. If not, we should get something in place for
users now while we come up with something better.

So, does anyone out there have a plan for how we could give user's the
ability to control the planner at a per-table level in 8.3 or even 8.4?
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-14 01:29:01
Message-ID: 20061014012901.GC28725@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Jim C. Nasby wrote:

> So, does anyone out there have a plan for how we could give user's the
> ability to control the planner at a per-table level in 8.3 or even 8.4?

Per-table level? Some of the problems that have been put forward have
to do with table combinations (for example selectivity of joins), so not
all problems will be solved with a per-table design.

I think if it were per table, you could get away with storing stuff in
pg_statistics or some such. But how do you express statistics for
joins? How do you express cross-column correlation?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-14 04:19:46
Message-ID: 25999.1160799586@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> Let me clarify, because that's not what I meant. Right now, there's not
> even a shadow of a design for anything else, and this is a tough nut to
> crack.

I think you are not exactly measuring on a level playing field. On the
textually-embedded-hints side, I see a very handwavy suggestion of a
syntax and absolutely nothing about how it might be implemented --- in
particular, nothing about how the information would be transmitted
through to the planner, and nothing about exactly how the planner would
use it if it had it. (No, I don't think "the planner will obey the
hints" is an implementation sketch.) On the other side, the concept of
system catalog(s) containing overrides for statistical or costing
estimates is pretty handwavy too, but at least it's perfectly clear
where it would plug into the planner: before running one of the current
stats estimation or costing functions, we'd look for a matching override
command in the catalogs. The main question seems to be what we'd like
to be able to match on ... but that doesn't sound amazingly harder than
specifying what an embedded hint does.

IMO a textual hint facility will actually require *more* infrastructure
code to be written than what's being suggested for alternatives.

regards, tom lane


From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-15 23:55:54
Message-ID: 4532CA8A.5050308@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Josh Berkus wrote:
> I actually think the way to attack this issue is to discuss the kinds of
> errors the planner makes, and what tweaks we could do to correct them.
> Here's the ones I'm aware of:
>
> -- Incorrect selectivity of WHERE clause
> -- Incorrect selectivity of JOIN
> -- Wrong estimate of rows returned from SRF
> -- Incorrect cost estimate for index use
>
> Can you think of any others?

The one that started this discussion: Lack of cost information for functions. I think this feature is a good idea independent of the whole HINTS discussion.

At a minimum, a rough categorization is needed, such as "Lighning fast / Fast / Medium / Slow / Ludicrously slow", with some sort if milliseconds or CPU cycles associated with each category. Or perhaps something like, "This is (much faster|faster|same as|slower|much slower) than reading a block from the disk."

If I understand Tom and others, the planner already is capable of taking advantage of this information, it just doesn't have it yet. It could be part of the CREATE FUNCTION command.

CREATE OR REPLACE FUNCTION foobar(text, text, text) RETURNS text
AS '/usr/local/pgsql/lib/foobar.so', 'foobar'
COST LUDICROUSLY_SLOW
LANGUAGE 'C' STRICT;

Better yet ('tho I have no idea how hard this would be to implement...) would be an optional second function with the same parameter signature as the main function, but it would return a cost estimate:

CREATE OR REPLACE FUNCTION foobar(text, text, text) RETURNS text
AS '/usr/local/pgsql/lib/foobar.so', 'foobar'
COST foobar_cost
LANGUAGE 'C' STRICT;

The planner could call it with the same parameters it was about to use, and get an accurate estimate for the specific operation that is about to be done. In my particular case (running an NP-complete problem), there are cases where I can determine ahead of time that the function will be fast, but in most cases it is *really* slow.

Craig


From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-16 00:25:31
Message-ID: 4532D17B.20106@modgraph-usa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

So let's cut to the bone: If someone thinks a proposal is a bad idea, and they're volunteering their time on an open-source project, why would they implement the proposal?

In all the heat and smoke, I believe there are two basic conclusions we all agree on.

1. Optimizer:
a) A perfect optimizer would be a wonderful thing
b) Optimization is a hard problem
c) Any problem that can be solve by improving the optimizer *should*
be solved by improving the optimizer.

2. Hints
a) On a aesthetic/theoretical level, hints suck. They're ugly and rude
b) On a practical level, introducing hints will cause short- and long-term problems
c) Hints would help DBAs solve urgent problems for which there is no other solution

The disagreements revolve around the degree to which 1 conflicts with 2.

1. Developers feel very strongly about 2(a) and 2(b).
2. DBAs "in the trenches" feel very strongly about 2(c).

So my question is: Is there any argument that can be made to persuade those of you who are volunteering your time on the optimizer to even consider a HINTS proposal? Has all this discussion changed your perspective on 2(c), and why it really matters to some of us? Are we just wasting our time, or is this a fruitful discussion?

Thanks,
Craig


From: mark(at)mark(dot)mielke(dot)cc
To: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-16 11:00:23
Message-ID: 20061016110023.GC6709@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Sun, Oct 15, 2006 at 05:25:31PM -0700, Craig A. James wrote:
> So my question is: Is there any argument that can be made to persuade those
> of you who are volunteering your time on the optimizer to even consider a
> HINTS proposal? Has all this discussion changed your perspective on 2(c),
> and why it really matters to some of us? Are we just wasting our time, or
> is this a fruitful discussion?

They're waiting for an idea that captures their imagination. So far,
it seems like a re-hashing of old ideas that have been previously shot
down, none of which seem overly imaginative, or can be shown to
provide significant improvement short term or long term... :-)

Haha. That's my take on it. Sorry if it is harsh.

To get very competent people to volunteer their time, you need to make
them believe. They need to dream about it, and wake up the next morning
filled with a desire to try out some of their ideas.

You need to brain wash them... :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-16 12:34:30
Message-ID: 45337C56.8000209@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Craig A. James wrote:
>
>
> 2. Hints
> a) On a aesthetic/theoretical level, hints suck. They're ugly and rude
> b) On a practical level, introducing hints will cause short- and
> long-term problems
> c) Hints would help DBAs solve urgent problems for which there is no
> other solution

Pretty good summary!

Maybe there should be a 2d), 2e) and 2f).

2d) Hints will damage the ongoing development of the optimizer by
reducing or eliminating test cases for its improvement.
2e) Hints will divert developer resource away from ongoing development
of the optimizer.
2f) Hints may demoralize the developer community - many of whom will
have been attracted to Postgres precisely because this was a realm where
crude solutions were discouraged.

I understand that these points may seem a bit 'feel-good' and intangible
- especially for the DBA's moving to Pg from Oracle, but I think they
illustrate the mindset of the Postgres developer community, and the
developer community is, after all - the primary reason why Pg is such a
good product.

Of course - if we can find a way to define 'hint like' functionality
that is more in keeping with the 'Postgres way' (e.g. some of the
relation level statistical additions as discussed), then some of 2d-2f)
need not apply.

Best wishes

Mark


From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-16 13:27:46
Message-ID: 1161005266.28751.145.camel@coppola.muc.ecircle.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

> 2d) Hints will damage the ongoing development of the optimizer by
> reducing or eliminating test cases for its improvement.

You have no evidence for this. The mindset of the postgres community you
cite further below usually mandates that you say things if you have
evidence for them... and this one could be even backwards, by putting
such a tool in normal mortals hands that they can experiment with
execution plans to see which one works better, thus giving more data to
the developers than it is possible now. This is of course a speculation
too, but not at all weaker than yours.

> 2e) Hints will divert developer resource away from ongoing development
> of the optimizer.

This is undebatable, although the long term cost/benefit is not clear.
And I would guess simple hinting would not need a genius to implement it
as planner optimizations mostly do... so it could possibly be done by
somebody else than the core planner hackers (is there any more of them
than Tom ?), and such not detract them too much from the planner
optimization tasks.

> 2f) Hints may demoralize the developer community - many of whom will
> have been attracted to Postgres precisely because this was a realm where
> crude solutions were discouraged.

I still don't get it why are you so against hints. Hints are a crude
solution only if you design them to be like that... otherwise they are
just yet another tool to get the work done, preferably now.

> I understand that these points may seem a bit 'feel-good' and intangible
> - especially for the DBA's moving to Pg from Oracle, but I think they
> illustrate the mindset of the Postgres developer community, and the
> developer community is, after all - the primary reason why Pg is such a
> good product.

I fail to see why would be a "hinted" postgres an inferior product...

> Of course - if we can find a way to define 'hint like' functionality
> that is more in keeping with the 'Postgres way' (e.g. some of the
> relation level statistical additions as discussed), then some of 2d-2f)
> need not apply.

I bet most of the users who wanted hints are perfectly fine with any
variations of it, if it solves the problems at hand.

Cheers,
Csaba.


From: Brian Hurt <bhurt(at)janestcapital(dot)com>
To: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-16 15:36:23
Message-ID: 4533A6F7.7010708@janestcapital.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

I haven't weighed in on this because 1) I'm not a postgresql developer,
and am firmly of the opinion that they who are doing the work get to
decide how the work gets done (especially when you aren't paying them
for the work), and 2) I don't have any experience as a developer with
hints, and thus don't know any of the pluses or minuses. I do, however,
know my fellow developers. As general rules:

1) If you give developers a feature, they will use it. The implicit
assumption seems to be that if you're given a feature, you've been given
it for a good reason, use it whenever possible. Therefor, any hints
feature *will* be used widely an in "inappropriate" circumstances.
Protestations that this wasn't what the feature was meant for will fall
on deaf ears.

2) Taking away a feature is painfull. Of course the developers will
*say* that they're doing it in a portable way that'll be easy to change
in the future, but we lie like cheap rugs. This is is often just a case
of stupidity and/or ignorance, but even the best developers can get
caught- 99 out of 100 uses of the feature are portable and easy to
update, it's #100 that's a true pain, and #100 was an accident, or a
kludge to get the app out the door under shipping schedule, etc. Taking
away, or breaking, a feature then just becomes a strong disincentive to
upgrade.

3) Developers are often astonishingly bad at predicting what is or is
not good for performance. A good example of this for databases is the
assumption that index scans are always faster than sequential scans.
The plan the programmer thinks they want is often not the plan the
programmer really wants. Especially considering generally the program
has so many other things they're having to deal with (the "it's hard to
remember you're out to drain the swamp when you're up to your ass in
alligators" problem) that we generally don't have the spare brainpower
left over for query optimization. Thus the strong tendancy to want to
adopt simple, rough and ready, mostly kinda true rules (like "index
scans are always faster than sequential scans") about what is or is not
good for performance.

Or, in shorter forms:
1) If you make it convient to use, expect it to be used a lot. If it
shouldn't be used a lot, don't make it convient.
2) Breaking features means that people won't upgrade.
3) Programmers are idiots- design accordingly.

Brian


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Brian Hurt <bhurt(at)janestcapital(dot)com>
Cc: postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-16 16:43:43
Message-ID: 200610161643.k9GGhhi05769@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Brian Hurt wrote:
> Or, in shorter forms:
> 1) If you make it convient to use, expect it to be used a lot. If it
> shouldn't be used a lot, don't make it convient.
> 2) Breaking features means that people won't upgrade.
> 3) Programmers are idiots- design accordingly.

The PostgreSQL project has had a philosophy of trying to limit user
choice when we can _usually_ make the right choice automatically.
Historically Oracle and others have favored giving users more choices,
but this adds complexity when using the database.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Shaun Thomas <sthomas(at)leapfrogonline(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Hints proposal
Date: 2006-10-16 17:00:01
Message-ID: 200610161200.01913.sthomas@leapfrogonline.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Monday 16 October 2006 10:36, Brian Hurt wrote:

> ... Therefor, any hints feature *will* be used widely
> and in "inappropriate" circumstances. Protestations that
> this wasn't what the feature was meant for will fall on
> deaf ears.

I don't really care about this topic, as I've used Oracle and never
actually made use of its hint system, but I liked knowing it was there.
But what's better here, asking the optimizer to use what is tested with
explain analyze to be a better plan, or to convolute a query so
horribly it's hardly recognizable, in an effort to "trick" the
optimizer?

Someone made a note earlier that any hints made irrelevant by optimizer
improvements would probably need to be removed, citing that as a
maintenence nightmare. But the same point holds for queries that have
been turned into unmaintainable spaghetti or a series of cursors to
circumvent the optimizer. Personally, I'd rather grep my code for a
couple deprecated key-words than re-check every big query between
upgrades to see if any optimizer improvements have been implemented.

Query planning is a very tough job, and SQL is a very high-level
language, making it doubly difficult to get the intended effect of a
query across to the optimizer. C allows inline assembler for exactly
this reason; sometimes the compiler is wrong about something, or
experience and testing shows a better way is available that no compiler
takes into account. As such a high-level language, SQL is inherently
flawed for performace tuning, relying almost entirely on the optimizer
knowing the best path. Here we have no recourse if the planner is just
plain wrong.

I almost wish the SQL standards committee would force syntax for sending
low-level commands to the optimizer for exactly this reason. C has
the "inline" keyword, so why can't SQL have something similar? I
agree, hints are essentially retarded comments to try and persuade the
optimizer to take a different action... what I'd actually like to see
is some way of directly addressing the query-planner's API and
circumvent SQL entirely for really nasty or otherwise convoluted
result-sets, but of course I know that's rather unreasonable.

C'mon, some of us DBAs have math degrees and know set theory... ;)

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hints proposal
Date: 2006-10-16 20:25:05
Message-ID: 4533EAA1.9020400@paradise.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Csaba Nagy wrote:
>> 2d) Hints will damage the ongoing development of the optimizer by
>> reducing or eliminating test cases for its improvement.
>
> You have no evidence for this.

My evidence (which I think I've mentioned in a couple of previous
postings), is the experience with the optimizer of that... err.. other
database that has hints, plus the experience of that (different) other
database that does not allow them :-) Several others have posted similar
comments.

>
>> 2f) Hints may demoralize the developer community - many of whom will
>> have been attracted to Postgres precisely because this was a realm where
>> crude solutions were discouraged.
>
> I still don't get it why are you so against hints. Hints are a crude
> solution only if you design them to be like that... otherwise they are
> just yet another tool to get the work done, preferably now.
>
>
> I fail to see why would be a "hinted" postgres an inferior product...
>

A rushed. and crude implementation will make it an inferior product -
now not every hint advocate is demanding them to be like that, but the
tone of many of the messages is "I need hints because they can help me
*now*, whereas optimizer improvements will take too long...". That
sounds to me like a quick fix. I think if we provide hint-like
functionality it must be *part of* our optimizer improvement plan, not
instead of it!

Now I may have come on a bit strong about this - and apologies if that's
the case, but one of the things that attracted me to Postgres originally
was the community attitude of "doing things properly or sensibly", I
think it would be a great loss - for the product, not just for me - if
that changes to something more like "doing things quickly".

best wishes

Mark


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, josh(at)agliodbs(dot)com, "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Hints proposal
Date: 2006-10-18 02:18:52
Message-ID: 200610172218.53194.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Friday 13 October 2006 12:46, Gregory Stark wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> > I actually think the way to attack this issue is to discuss the kinds
> >> > of errors the planner makes, and what tweaks we could do to correct
> >> > them. Here's the ones I'm aware of:
> >> >
> >> > -- Incorrect selectivity of WHERE clause
> >> > -- Incorrect selectivity of JOIN
> >> > -- Wrong estimate of rows returned from SRF
> >> > -- Incorrect cost estimate for index use
> >> >
> >> > Can you think of any others?
>
> -- Incorrect estimate for result of DISTINCT or GROUP BY.

Yeah, that one is bad. I also ran into one the other day where the planner
did not seem to understand the distinctness of a columns values across table
partitions...

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Cc: "Bucky Jordan" <bjordan(at)lumeta(dot)com>, "Joshua Marsh" <icub3d(at)gmail(dot)com>
Subject: Re: Hints proposal
Date: 2006-10-18 02:40:19
Message-ID: 200610172240.20142.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Thursday 12 October 2006 12:40, Bucky Jordan wrote:
> >What is it about hinting that makes it so easily breakable with new
> > versions?  I >don't have any experience with Oracle, so I'm not sure how
> > they screwed logic like >this up.  
>
> I don't have a ton of experience with oracle either, mostly DB2, MSSQL and
> PG. So, I thought I'd do some googling, and maybe others might find this
> useful info.
>
> http://asktom.oracle.com/pls/ask/f?p=4950:8:2177642270773127589::NO::F4950_
>P8_DISPLAYID,F4950_P8_CRITERIA:7038986332061
>
> Interesting quote: "In Oracle Applications development (11i apps - HR, CRM,
> etc) Hints are strictly forbidden. We find the underlying cause and fix
> it." and "Hints -- only useful if you are in RBO and you want to make use
> of an access path."
>
> Maybe because I haven't had access to hints before, I've never been tempted
> to use them. However, I can't remember having to re-write SQL due to a PG
> upgrade either.
>

When it happens it tends to look something like this:
http://archives.postgresql.org/pgsql-performance/2006-01/msg00154.php

Funny that for all the people who claim that improving the planner should be
the primary goal that no one ever took interest in the above case.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org, "Bucky Jordan" <bjordan(at)lumeta(dot)com>, "Joshua Marsh" <icub3d(at)gmail(dot)com>
Subject: Re: Hints proposal
Date: 2006-10-18 02:55:29
Message-ID: 14539.1161140129@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> When it happens it tends to look something like this:
> http://archives.postgresql.org/pgsql-performance/2006-01/msg00154.php

> Funny that for all the people who claim that improving the planner should be
> the primary goal that no one ever took interest in the above case.

Well, you didn't provide sufficient data for anyone else to reproduce
the problem ...

regards, tom lane


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org, "Bucky Jordan" <bjordan(at)lumeta(dot)com>, "Joshua Marsh" <icub3d(at)gmail(dot)com>
Subject: Re: Hints proposal
Date: 2006-10-18 16:59:20
Message-ID: 200610181259.21389.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Tuesday 17 October 2006 22:55, Tom Lane wrote:
> Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> > When it happens it tends to look something like this:
> > http://archives.postgresql.org/pgsql-performance/2006-01/msg00154.php
> >
> > Funny that for all the people who claim that improving the planner should
> > be the primary goal that no one ever took interest in the above case.
>
> Well, you didn't provide sufficient data for anyone else to reproduce
> the problem ...
>

Geez Tom, cut me some slack... no one even bothered to respond that that post
with a "hey we can't tell cause we need more information"...

not that it matters because here is where I reposted the problem with more
information
http://archives.postgresql.org/pgsql-performance/2006-01/msg00248.php
where you'll note that Josh agreed with my thinking that there was an issue
with the planner and he specifically asked for comments from you.

And here is where I reposted the problem to -bugs
http://archives.postgresql.org/pgsql-bugs/2006-01/msg00134.php where I make
note of discussing this with several other people, got Bruce to hazard a
guess which was debunked, and where I noted to Bruce about 10 days later that
there had been no further action and no one had asked for the _sample
database_ I was able to put together.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL