Review remove {join,from}_collapse_limit, add enable_join_ordering

Lists: pgsql-hackers
From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 08:48:45
Message-ID: 200907161048.45311.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Robert, Hi all,

The patch applies cleanly and works as intended - no surprise here. After the
changes the documentation is at least as easy to understand as before and the
code changes look sensible
Also not surprisingly that's not the area I expected problems I guess ;-)

For performance testing I replayed query logs from sites I easily could get my
hands on (3 different, halfway interesting ones). I found no relevant
differences on the first site which is sensible because
{from,join}_collapse_limit wasn't reached anyway.
More interesting are the queries from the two sites having reporting queries:

On the first, simpler, schema I found on average 30% plan time increase and 40%
execution time decrease. Most of the queries stayed the same, only a few
changed radically (in both directions). No big differences between geqo=on/off.

The queries on the second reporting schema unfortunately are different. Its the
one were I copied the crazy example I attached in the original thread.
With geqo=off a good part of the queries used daily use too much memory to plan
sensibly and geqo=on outright fails with:
"Error: Failed to make a valid plan"
on some.
I stopped trying to make performance measurements there.
Noticeable even some plans which were plannable in reasonable time before now
are problematic with enable_join_ordering=false!

I agree that those queries are crazy, but I am not sure how many of those are
out there...

So, while I think the changes are principally a good idea, as
{from,join}_collapse_limit are a bit confusing options, I personally! do not
think geqo is ready for it today, especially as the benefit is relatively
small.

If I am the only one having access to such complicated queries its fine - I am
working on the sites query generation/schema anyway.

Could perhaps some other people having complicated queries check how they work
out with those changes? It should be enough to check with a very big
{join,from}_collapse_limit? Kevin?

I have also to admit that I somewhat like the current behaviour in theory.
Currently you can have a view with hand-optimized JOIN order which will not
get inlined and/or reordered use it together with something unoptimized and
the unoptimized part will be reordered in many cases...

I found it somewhat hard to review a patch were my meaning was biased from
beginning. As Tom listed himself listed himself as a reviewer I will happiliy
(err?) concede to his and your judgement.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 13:13:02
Message-ID: 23200.1247749982@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> The queries on the second reporting schema unfortunately are different. Its the
> one were I copied the crazy example I attached in the original thread.
> With geqo=off a good part of the queries used daily use too much memory to plan
> sensibly and geqo=on outright fails with:
> "Error: Failed to make a valid plan"
> on some.

We're not going to be able to fix this unless you show us examples.

> Noticeable even some plans which were plannable in reasonable time before now
> are problematic with enable_join_ordering=false!

And this even more so --- it doesn't make any sense at all.

> So, while I think the changes are principally a good idea, as
> {from,join}_collapse_limit are a bit confusing options, I personally! do not
> think geqo is ready for it today, especially as the benefit is relatively
> small.

In general I think that any such bugs are there anyway and need to be
fixed anyway.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 13:18:01
Message-ID: 200907161518.02401.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > The queries on the second reporting schema unfortunately are different.
> > Its the one were I copied the crazy example I attached in the original
> > thread. With geqo=off a good part of the queries used daily use too much
> > memory to plan sensibly and geqo=on outright fails with:
> > "Error: Failed to make a valid plan"
> > on some.
> We're not going to be able to fix this unless you show us examples.
In the other thread I attached a similar to the real schema + example query.
Not enough? And why?

> > Noticeable even some plans which were plannable in reasonable time before
> > now are problematic with enable_join_ordering=false!
> And this even more so --- it doesn't make any sense at all.
Why? With a high from_collapse_limit more subqueries get inlined - before
inlining they get planned separately.

> > So, while I think the changes are principally a good idea, as
> > {from,join}_collapse_limit are a bit confusing options, I personally! do
> > not think geqo is ready for it today, especially as the benefit is
> > relatively small.
> In general I think that any such bugs are there anyway and need to be
> fixed anyway.
Understandable.

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 13:25:33
Message-ID: 200907161525.34304.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 15:18:01 Andres Freund wrote:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > The queries on the second reporting schema unfortunately are different.
> > > Its the one were I copied the crazy example I attached in the original
> > > thread. With geqo=off a good part of the queries used daily use too
> > > much memory to plan sensibly and geqo=on outright fails with:
> > > "Error: Failed to make a valid plan"
> > > on some.
> >
> > We're not going to be able to fix this unless you show us examples.
>
> In the other thread I attached a similar to the real schema + example
> query. Not enough? And why?
For reference:
http://archives.postgresql.org/message-
id/200907091700(dot)43411(dot)andres(at)anarazel(dot)de

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:16:31
Message-ID: 9850.1247757391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>>> "Error: Failed to make a valid plan"

>> We're not going to be able to fix this unless you show us examples.

> In the other thread I attached a similar to the real schema + example query.
> Not enough? And why?

I tried the example query and couldn't get "Failed to make a valid plan"
out of it ... what settings do you need for that?

However, I do observe that this seems a sufficient counterexample
against the theory that we can just remove the collapse limits and let
GEQO save us on very complex queries. On my machine, the example query
takes about 22 seconds to plan using CVS HEAD w/ all default settings.
If I set both collapse_limit variables to very high values (I used 999),
it takes ... um ... not sure; I gave up waiting after half an hour.
I also tried with geqo_effort reduced to the minimum of 1, but that
didn't produce a plan in reasonable time either (I gave up after ten
minutes). So if we remove the collapse limits, Postgres will completely
fail on this query --- the only way out would be enable_join_ordering =
off, which is hardly likely to produce a decent plan.

Maybe we should leave the collapse_limit logic alone and address
Robert's gripes by just raising the default values a lot (I'm thinking
100 or so). That way there's an escape hatch for anyone who has
pathological queries to deal with --- just dial the settings down.

>>> Noticeable even some plans which were plannable in reasonable time before
>>> now are problematic with enable_join_ordering=false!

>> And this even more so --- it doesn't make any sense at all.

> Why? With a high from_collapse_limit more subqueries get inlined - before
> inlining they get planned separately.

Okay, I misunderstood which two cases you were comparing there.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:27:39
Message-ID: 407d949e0907160827j45cc0a8ct98a49f56ddcd3768@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries.  On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.

What's the point of GEQO if it doesn't guarantee to produce the
optimal plana and *also* doesn't guarantee to produce some plan, any
plan, within some reasonable amount of time? Either we need to fix
that or else I don't see what it's buying us over our regular planner
which also might not produce a plan within a reasonable amount of time
but at least if it does it'll be the right plan.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:32:24
Message-ID: 10163.1247758344@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes).

After I gave up letting the machine be idle to get a fair timing,
I turned on oprofile monitoring. It looks a bit interesting:

samples % image name symbol name
886498 53.8090 postgres have_relevant_eclass_joinclause
460596 27.9574 postgres bms_overlap
142764 8.6655 postgres bms_is_subset
126274 7.6646 postgres have_join_order_restriction
14205 0.8622 postgres list_nth_cell
2721 0.1652 postgres generate_join_implied_equalities
2445 0.1484 libc-2.9.so memset
2202 0.1337 postgres have_relevant_joinclause
1678 0.1019 postgres make_canonical_pathkey
1648 0.1000 postgres pfree
884 0.0537 postgres bms_union
762 0.0463 postgres gimme_tree
660 0.0401 libc-2.9.so memcpy
571 0.0347 postgres AllocSetFree
475 0.0288 postgres AllocSetAlloc
431 0.0262 postgres has_relevant_eclass_joinclause
389 0.0236 postgres check_list_invariants
260 0.0158 postgres join_is_legal
238 0.0144 postgres bms_copy

So maybe a redesign of the equivalence-class joinclause mechanism is in
order. Still, this is unlikely to fix the fundamental issue that the
time for large join problems grows nonlinearly.

regards, tom lane


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:35:26
Message-ID: 20090716153526.GA20661@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 04:27:39PM +0100, Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries. ?On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
>
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plana and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that or else I don't see what it's buying us over our regular planner
> which also might not produce a plan within a reasonable amount of time
> but at least if it does it'll be the right plan.
>

I do agree that we should have an actually time limit cap for
GEQO that would have it return the best plan so far at that time.
Then you can at least bound your planning time.

Regards,
Ken


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:46:43
Message-ID: 407d949e0907160846l3afc6f04r71a3b80ffcfb0f5a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> samples  %        image name               symbol name
> 886498   53.8090  postgres                 have_relevant_eclass_joinclause
> 460596   27.9574  postgres                 bms_overlap
>
> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Perhaps it's GEQO's fault that it's using these functions
inappropriately, calling them often to calculate these answers
whenever it needs them instead of looking once for join clauses and
then optimizing based on the results. But I've never actually looked
at geqo, mabe that's inherent in the design?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:46:47
Message-ID: 200907161746.48355.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On Thursday 16 July 2009 15:13:02 Tom Lane wrote:
> >> Andres Freund <andres(at)anarazel(dot)de> writes:
> >>> "Error: Failed to make a valid plan"
> >>
> >> We're not going to be able to fix this unless you show us examples.
> >
> > In the other thread I attached a similar to the real schema + example
> > query. Not enough? And why?
>
> I tried the example query and couldn't get "Failed to make a valid plan"
> out of it ... what settings do you need for that?
It unfortunately depends on settings and luck. This dependence on luck was the
reason why I liked geqo to behave "somewhat" deterministically...

With {join,from}_collapse_limit = 100 it seems to be triggered reliably. With
lower values it seems harder trigger, with bigger it simply takes too long to
even get there.

Efficiencywise using geqo with higher limits nearly all time is spent in:

geqo
gimme_tree
have_join_order_restriction
has_legal_joinclause
have_relevant_joinclause
have_relevant_eclass (30% self)
bms_overlap (50%self)

I am not yet fully understanding geqo, but it looks like there are some
possibilities to improve this.
Although such efficiency improvements would no not explain the completely
failing plans...

Do you have an idea which kind of plans benefit most from using geqo? I had a
somewhat hard time finding any query were geqo was substantially faster than
the standard join search.

That also somewhat explains why I saw improvements with 64bit bitmapsets...

> However, I do observe that this seems a sufficient counterexample
> against the theory that we can just remove the collapse limits and let
> GEQO save us on very complex queries. On my machine, the example query
> takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> If I set both collapse_limit variables to very high values (I used 999),
> it takes ... um ... not sure; I gave up waiting after half an hour.
> I also tried with geqo_effort reduced to the minimum of 1, but that
> didn't produce a plan in reasonable time either (I gave up after ten
> minutes). So if we remove the collapse limits, Postgres will completely
> fail on this query --- the only way out would be enable_join_ordering =
> off, which is hardly likely to produce a decent plan.
> Maybe we should leave the collapse_limit logic alone and address
> Robert's gripes by just raising the default values a lot (I'm thinking
> 100 or so). That way there's an escape hatch for anyone who has
> pathological queries to deal with --- just dial the settings down.
Yes, I think thats sensible. I don't know if there are any queries out there
that benefit from a higher limits.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:52:30
Message-ID: 10541.1247759550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Thu, Jul 16, 2009 at 4:32 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order. Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Perhaps it's GEQO's fault that it's using these functions
> inappropriately, calling them often to calculate these answers
> whenever it needs them instead of looking once for join clauses and
> then optimizing based on the results. But I've never actually looked
> at geqo, mabe that's inherent in the design?

geqo isn't doing anything the regular planner wouldn't do under similar
conditions. It might well be that better caching is the answer to this
particular problem, but I don't have time to look closer today.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:52:52
Message-ID: 200907161752.53069.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 17:27:39 Greg Stark wrote:
> On Thu, Jul 16, 2009 at 4:16 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > However, I do observe that this seems a sufficient counterexample
> > against the theory that we can just remove the collapse limits and let
> > GEQO save us on very complex queries. On my machine, the example query
> > takes about 22 seconds to plan using CVS HEAD w/ all default settings.
> > If I set both collapse_limit variables to very high values (I used 999),
> > it takes ... um ... not sure; I gave up waiting after half an hour.
> What's the point of GEQO if it doesn't guarantee to produce the
> optimal plana and *also* doesn't guarantee to produce some plan, any
> plan, within some reasonable amount of time? Either we need to fix
> that or else I don't see what it's buying us over our regular planner
> which also might not produce a plan within a reasonable amount of time
> but at least if it does it'll be the right plan.
Well, I could not find a plan where it errored out with the old limits. So one
could argue its just not adapted.
Although I also could not find a single case where geqo was relevantly faster
with the default settings even if it was used.
The default settings currently make it relatively hard to trigger geqo at all.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 15:59:58
Message-ID: 10709.1247759998@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> The default settings currently make it relatively hard to trigger geqo at all.

Yes, and that was intentional. One of the implications of what we're
discussing here is that geqo would get used a lot more for "typical
complex queries" (if there is any such thing as a typical one). So
it's fully to be expected that the fallout would be pressure to improve
geqo in various ways.

Given that we are at the start of the development cycle, that prospect
doesn't scare me --- there's plenty of time to fix whatever needs
fixing. However, I am leaning to the feeling that I don't want to be
putting people in a position where they have no alternative but to use
geqo. So adjusting rather than removing the collapse limits is seeming
like a good idea.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 16:23:06
Message-ID: 11094.1247761386@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
>> I tried the example query and couldn't get "Failed to make a valid plan"
>> out of it ... what settings do you need for that?

> It unfortunately depends on settings and luck. This dependence on luck was the
> reason why I liked geqo to behave "somewhat" deterministically...

> With {join,from}_collapse_limit = 100 it seems to be triggered reliably. With
> lower values it seems harder trigger, with bigger it simply takes too long to
> even get there.

OK, I see it at 100. Would you confirm that what you get is the failure
in random_init_pool (geqo_pool.c) not the identically-phrased message
elsewhere? (If you have VERBOSITY = verbose you should see the error
location info.)

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 16:30:16
Message-ID: 200907161830.16898.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 18:23:06 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On Thursday 16 July 2009 17:16:31 Tom Lane wrote:
> >> I tried the example query and couldn't get "Failed to make a valid plan"
> >> out of it ... what settings do you need for that?
> >
> > It unfortunately depends on settings and luck. This dependence on luck
> > was the reason why I liked geqo to behave "somewhat" deterministically...
> >
> > With {join,from}_collapse_limit = 100 it seems to be triggered reliably.
> > With lower values it seems harder trigger, with bigger it simply takes
> > too long to even get there.
>
> OK, I see it at 100. Would you confirm that what you get is the failure
> in random_init_pool (geqo_pool.c) not the identically-phrased message
> elsewhere? (If you have VERBOSITY = verbose you should see the error
> location info.)
Yes. I should have seen that. Its not exactly surprising...

Andres


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 16:49:08
Message-ID: 200907161849.09354.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > The default settings currently make it relatively hard to trigger geqo at
> > all.
>
> Yes, and that was intentional. One of the implications of what we're
> discussing here is that geqo would get used a lot more for "typical
> complex queries" (if there is any such thing as a typical one). So
> it's fully to be expected that the fallout would be pressure to improve
> geqo in various ways.
>
> Given that we are at the start of the development cycle, that prospect
> doesn't scare me --- there's plenty of time to fix whatever needs
> fixing. However, I am leaning to the feeling that I don't want to be
> putting people in a position where they have no alternative but to use
> geqo. So adjusting rather than removing the collapse limits is seeming
> like a good idea.
Hm. I see a, a bit more fundamental problem with geqo:
I tried several queries, and I found not a single one, where the whole
genetical process did any significant improvments to the 'worth'.
It seems that always the best variant out of the pool is either the path
choosen in the end, or at least the cost difference is _really_ low.

Andres


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 17:13:55
Message-ID: 603c8f070907161013o1c7d4dd0u45cf7fd907d18f0a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
>> Andres Freund <andres(at)anarazel(dot)de> writes:
>> > The default settings currently make it relatively hard to trigger geqo at
>> > all.
>>
>> Yes, and that was intentional.  One of the implications of what we're
>> discussing here is that geqo would get used a lot more for "typical
>> complex queries" (if there is any such thing as a typical one).  So
>> it's fully to be expected that the fallout would be pressure to improve
>> geqo in various ways.
>>
>> Given that we are at the start of the development cycle, that prospect
>> doesn't scare me --- there's plenty of time to fix whatever needs
>> fixing.  However, I am leaning to the feeling that I don't want to be
>> putting people in a position where they have no alternative but to use
>> geqo.  So adjusting rather than removing the collapse limits is seeming
>> like a good idea.
> Hm. I see a, a bit more fundamental problem with geqo:
> I tried several queries, and I found not a single one, where the whole
> genetical process did any significant improvments to the 'worth'.
> It seems that always the best variant out of the pool is either the path
> choosen in the end, or at least the cost difference is _really_ low.

Ouch. Did you insert some debugging code to get that information, or
how did you come to that conclusion?

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 17:22:30
Message-ID: 603c8f070907161022g14360f0i52c00ac3dc5a522a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> If I set both collapse_limit variables to very high values (I used 999),
>> it takes ... um ... not sure; I gave up waiting after half an hour.
>> I also tried with geqo_effort reduced to the minimum of 1, but that
>> didn't produce a plan in reasonable time either (I gave up after ten
>> minutes).
>
> After I gave up letting the machine be idle to get a fair timing,
> I turned on oprofile monitoring.  It looks a bit interesting:

That is interesting, but there's not really enough detail here to see
what is going on. I'm more interested in what the high-level
functions are doing that's causing these guys to be called so many
times. As Greg says, if the planning time curve for GEQO isn't better
than the one for the standard planner, it's the epitome of pointless.

> So maybe a redesign of the equivalence-class joinclause mechanism is in
> order.  Still, this is unlikely to fix the fundamental issue that the
> time for large join problems grows nonlinearly.

Nonlinear is one thing, but this looks more like exponential. I
understand that the standard planner is exponential; GEQO should not
be.

...Robert


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 17:26:00
Message-ID: 200907161926.00591.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 19:13:55 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 12:49 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> > On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> >> Andres Freund <andres(at)anarazel(dot)de> writes:
> >> > The default settings currently make it relatively hard to trigger geqo
> >> > at all.
> >>
> >> Yes, and that was intentional. One of the implications of what we're
> >> discussing here is that geqo would get used a lot more for "typical
> >> complex queries" (if there is any such thing as a typical one). So
> >> it's fully to be expected that the fallout would be pressure to improve
> >> geqo in various ways.
> >>
> >> Given that we are at the start of the development cycle, that prospect
> >> doesn't scare me --- there's plenty of time to fix whatever needs
> >> fixing. However, I am leaning to the feeling that I don't want to be
> >> putting people in a position where they have no alternative but to use
> >> geqo. So adjusting rather than removing the collapse limits is seeming
> >> like a good idea.
> >
> > Hm. I see a, a bit more fundamental problem with geqo:
> > I tried several queries, and I found not a single one, where the whole
> > genetical process did any significant improvments to the 'worth'.
> > It seems that always the best variant out of the pool is either the path
> > choosen in the end, or at least the cost difference is _really_ low.
> Ouch. Did you insert some debugging code to get that information, or
> how did you come to that conclusion?
Yes, I enabled GEQO_DEBUG and added some more debugging output.

Btw, a higher generation count does not change that.

Andres


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 17:26:22
Message-ID: 12841.1247765182@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So maybe a redesign of the equivalence-class joinclause mechanism is in
>> order. Still, this is unlikely to fix the fundamental issue that the
>> time for large join problems grows nonlinearly.

> Nonlinear is one thing, but this looks more like exponential. I
> understand that the standard planner is exponential; GEQO should not
> be.

Well, the equivclass code is new as of 8.3. It's possible that this
got broken relatively recently ...

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join, from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 17:29:09
Message-ID: 200907161929.09848.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 16 July 2009 19:22:30 Robert Haas wrote:
> On Thu, Jul 16, 2009 at 11:32 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I wrote:
> >> If I set both collapse_limit variables to very high values (I used 999),
> >> it takes ... um ... not sure; I gave up waiting after half an hour.
> >> I also tried with geqo_effort reduced to the minimum of 1, but that
> >> didn't produce a plan in reasonable time either (I gave up after ten
> >> minutes).
> >
> > After I gave up letting the machine be idle to get a fair timing,
> > I turned on oprofile monitoring. It looks a bit interesting:
> That is interesting, but there's not really enough detail here to see
> what is going on. I'm more interested in what the high-level
> functions are doing that's causing these guys to be called so many
> times. As Greg says, if the planning time curve for GEQO isn't better
> than the one for the standard planner, it's the epitome of pointless.
It is not the actual genetic searching I now found out (or more precisely,
read the trace correctly).

At the start of the query GEQO fills a pool with random paths through the
searchspace. Unfortunately a random path is not very likely to succeed. So it
checks and checks and...

Thats why that problem is not visible with a simple join out of 100 or so
tables - all paths are valid there...

Andres


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject: Re: Review remove {join,from}_collapse_limit, add enable_join_ordering
Date: 2009-07-16 20:22:30
Message-ID: 20090716202230.GA1452@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 16, 2009 at 06:49:08PM +0200, Andres Freund wrote:
> On Thursday 16 July 2009 17:59:58 Tom Lane wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > The default settings currently make it relatively hard to trigger geqo at
> > > all.
> >
> > Yes, and that was intentional. One of the implications of what we're
> > discussing here is that geqo would get used a lot more for "typical
> > complex queries" (if there is any such thing as a typical one). So
> > it's fully to be expected that the fallout would be pressure to improve
> > geqo in various ways.
> >
> > Given that we are at the start of the development cycle, that prospect
> > doesn't scare me --- there's plenty of time to fix whatever needs
> > fixing. However, I am leaning to the feeling that I don't want to be
> > putting people in a position where they have no alternative but to use
> > geqo. So adjusting rather than removing the collapse limits is seeming
> > like a good idea.
> Hm. I see a, a bit more fundamental problem with geqo:
> I tried several queries, and I found not a single one, where the whole
> genetical process did any significant improvments to the 'worth'.
> It seems that always the best variant out of the pool is either the path
> choosen in the end, or at least the cost difference is _really_ low.
>
>
> Andres
>

Hi Andres,

From some of my reading of the literature on join order
optimization via random sampling, such as what would establish
the initial GEQO pool, there is a very good possibility of having
a "pretty good" plan in the first pool, especially for our larger
initial pool sizes of 100-1000. And in fact, the final plan has
a good chance of being of approximately the same cost as a member
of the initial pool. Uniform sampling alone can give you a close
to optimum plan 80% of the time with an initial sample size of
100. And using biased sampling raises that to 99% or better.

Regards,
Ken