from_collapse_limit vs. geqo_threshold

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 03:40:31
Message-ID: 603c8f070905202040v66cd3054t434c0b73aa844c63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The docs contain the following sage advice concerning from_collapse_limit:

"It is usually wise to keep this less than geqo_threshold."

I've been thinking through this advice on and off for about 2 years
and I still don't understand it. The point of either
from_collapse_limit and geqo_threshold is to avoid exponential growth
in planning time when planning large join nests. Either of them has
the effect of reducing planning time at the expense of possibly
getting an inferior plan. However, it's my experience that the plans
that result when the from_collapse_limit kicks in are almost
invariably terrible when fetching only a small number of rows. On the
other hand, the few GEQO plans I've had experience with have been
pretty reasonable.

It appears that this statement has been in our documentation since Tom
Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
January 25, 2003 (9bf97ff426de9), but I can't find any justification
for it anywhere. I think we either need to justify this advice, or
remove it.

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 04:21:40
Message-ID: 4A14D6D4.4090205@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert,

> It appears that this statement has been in our documentation since Tom
> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
> January 25, 2003 (9bf97ff426de9), but I can't find any justification
> for it anywhere. I think we either need to justify this advice, or
> remove it.

... trying to remember why I wrote that ... what would happen if
FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 04:31:43
Message-ID: 603c8f070905202131n65dabc3cnb74b9f0b9524d151@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 21, 2009 at 12:21 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> It appears that this statement has been in our documentation since Tom
>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>> for it anywhere.  I think we either need to justify this advice, or
>> remove it.
>
> ... trying to remember why I wrote that ... what would happen if
> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?

The two variables do different things, so there's nothing particularly
magical about which one is larger AFAICS. I believe that if you make
from_collapse_limit larger than geqo_threshold, then GEQO might be
asked to plan a query into which subqueries have been pulled up. But
that's not obviously bad; the alternative is planning the subquery
separately and first, which at least for the very small number of
cases that I've tested seems to be quite a bit worse.

Apparently before from_collapse_limit was added the behavior existed,
but the thereshold was geqo_threshold/2. So someone had a reason for
believing that when the join nest got too large, not pulling up
subqueries was a superior coping strategy versus invoking GEQO. I
just don't know what the reason is, or whether it's still valid.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 11:50:56
Message-ID: 8847.1242906656@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Robert,
>> It appears that this statement has been in our documentation since Tom
>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>> for it anywhere. I think we either need to justify this advice, or
>> remove it.

> ... trying to remember why I wrote that ... what would happen if
> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?

I think I wrote it, not you. The point of the advice is to keep
subquery collapsation (hm, what's the right noun form? Need caffeine)
from turning a non-GEQO query into a GEQO one, and thus subjecting
you to unpredictable plans. Maybe the resulting plans would be better
on average, or maybe they wouldn't, but in any case they'd be
unpredictable.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 12:13:09
Message-ID: 603c8f070905210513m29b1a916la3d0e53e8bf02f8d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Robert,
>>> It appears that this statement has been in our documentation since Tom
>>> Lane added FROM_COLLAPSE_LIMIT (back then, it was capitalized) on
>>> January 25, 2003 (9bf97ff426de9), but I can't find any justification
>>> for it anywhere.  I think we either need to justify this advice, or
>>> remove it.
>
>> ... trying to remember why I wrote that ... what would happen if
>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>
> I think I wrote it, not you.  The point of the advice is to keep
> subquery collapsation (hm, what's the right noun form?  Need caffeine)
> from turning a non-GEQO query into a GEQO one, and thus subjecting
> you to unpredictable plans.  Maybe the resulting plans would be better
> on average, or maybe they wouldn't, but in any case they'd be
> unpredictable.

That's more or less what I figured, but my real world experience is
that pulling up subqueries and using GEQO leads to plans that are
random but tolerable, whereas not pulling up subqueries leads to plans
that are almost uniformly bad. Actually, it works OK if really would
have needed to materialize the entire subquery, but otherwise it
stinks. My real unvarnished opinion on this topic is that
from_collapse_limit is a loaded foot-gun waiting to go off. We might
as well have an option where if the number of tables in the query
exceeds a certain threshold, we'll just sequential-scan the table
rather than considering the use of indices. That option would
actually be better, because everyone who read the documentation would
be absolutely certain that they wanted to turn that option OFF,
whereas the behavior of from_collapse_limit is sufficiently complex
that it isn't obvious that it's a terrible idea.

...Robert


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 12:51:20
Message-ID: 4136ffa0905210551u22eeb31bn5655dbe7c9a3aed5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Having just raised the statistics targets I wonder if we should look
at raising these two parameters too. The experience on the lists are
that when people run into either of these two limits we recommend
raising them and we've never seen anyone come back complaining that
their planning time goes through the roof.

To benchmark this like we did for the statistics target will be tricky
though. I suppose we can gather candidate queries by trolling the
lists for recommendations, though we didn't always get schemas to go
along with the queries.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-21 13:38:12
Message-ID: 603c8f070905210638h2e60068fw7efcb96427eb84c5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 21, 2009 at 8:51 AM, Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> Having just raised the statistics targets I wonder if we should look
> at raising these two parameters too. The experience on the lists are
> that when people run into either of these two limits we recommend
> raising them and we've never seen anyone come back complaining that
> their planning time goes through the roof.
>
> To benchmark this like we did for the statistics target will be tricky
> though. I suppose we can gather candidate queries by trolling the
> lists for recommendations, though we didn't always get schemas to go
> along with the queries.

I think that's a good idea, too, but it's unrelated to the question of
which one should be set to the higher value.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-25 22:15:06
Message-ID: 9134.1243289706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 21, 2009 at 7:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> ... trying to remember why I wrote that ... what would happen if
>>> FROM_COLLAPSE_LIMIT was *more* than GEQO_THRESHOLD?
>>
>> I think I wrote it, not you. The point of the advice is to keep
>> subquery collapsation (hm, what's the right noun form? Need caffeine)
>> from turning a non-GEQO query into a GEQO one, and thus subjecting
>> you to unpredictable plans. Maybe the resulting plans would be better
>> on average, or maybe they wouldn't, but in any case they'd be
>> unpredictable.

> That's more or less what I figured, but my real world experience is
> that pulling up subqueries and using GEQO leads to plans that are
> random but tolerable, whereas not pulling up subqueries leads to plans
> that are almost uniformly bad.

I went back and looked at the CVS history to try to refresh my memory
about how we got here. As best I can find, there were two steps:

1. The original commit of the ability to have subqueries at all,
during 7.1 development:

2000-09-29 14:21 tgl

Subselects in FROM clause, per
ISO syntax: FROM (SELECT ...) [AS] alias. (Don't forget that an
alias is required.) Views reimplemented as expanding to
subselect-in-FROM. Grouping, aggregates, DISTINCT in views
actually work now (he says optimistically). No UNION support in
subselects/views yet, but I have some ideas about that.
Rule-related permissions checking moved out of rewriter and into
executor. INITDB REQUIRED!

This introduced the ability to pull up subqueries, but with an arbitrary
limit of geqo_threshold/2 on the number of relations that would be
collected into a single planning problem.

2. During 7.4 development, we did this:

2003-01-25 18:10 tgl

Allow the planner to collapse explicit inner JOINs together, rather
than necessarily following the JOIN syntax to develop the query
plan. The old behavior is still available by setting GUC variable
JOIN_COLLAPSE_LIMIT to 1. Also create a GUC variable
FROM_COLLAPSE_LIMIT to control the similar decision about when to
collapse sub-SELECT lists into their parent lists. (This behavior
existed already, but the limit was always GEQO_THRESHOLD/2; now
it's separately adjustable.)

The excuse for join_collapse_limit to exist at all is largely one of
backwards compatibility. Up to then, we had not-infrequently suggested
that people could force a desired join order by writing an explicit JOIN
nest, and eliminating that escape hatch altogether didn't seem like a
good idea. I think from_collapse_limit was added largely on grounds of
symmetry.

Now, as to why the original commit had the geqo_threshold/2 restriction:
it was obviously not based on field experience with flattening, because
we didn't have any. What I think it *was* based on was that GEQO sucked
really badly back then, and I wanted to avoid having it kick in for
queries that it had never kicked in for in previous releases. Some
quick comparisons say that 7.1 in GEQO mode was about 5X slower than
HEAD (despite its planning being a lot more simplistic), and tended to
find considerably worse plans. Some of the significant improvements
since then:

2004-01-23 18:54 tgl

Revise GEQO planner to make use of some heuristic knowledge about
SQL, namely that it's good to join where there are join clauses
rather than where there are not. Also enable it to generate bushy
plans at need, so that it doesn't fail in the presence of multiple
IN clauses containing sub-joins.

2004-01-21 18:33 tgl

Repair error apparently introduced in the initial
coding of GUC: the default value for geqo_effort is supposed to be
40, not 1. The actual 'genetic' component of the GEQO algorithm
has been practically disabled since 7.1 because of this mistake.

Also, up to 7.0 there were some nasty memory leaks in the planner and
especially in GEQO, because we didn't have the memory context mechanism.
I think those were actually fixed as of 2000-09-29, but GEQO still had a
reputation for blowing out backend memory.

Now I'm still not exactly happy with GEQO, but it's surely a lot better
than it was in the fall of 2000. So on the whole it does seem that the
current relationships between from_collapse_limit, join_collapse_limit,
and geqo_threshold are based on obsolete information and should be
revisited. I don't have any data at hand to suggest specific new
default values, though.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-05-26 01:00:21
Message-ID: 603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation. I think this
sentence could just be deleted from the section on
from_collapse_limit:

It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say. I guess the point
is that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good. Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definition of the parameters. So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test. As
a practical matter, the correct level of effort depends a lot on how
long the query figures to run. For OLAP queries, planning times of
more than 50 ms or so start to add noticeably to the overall runtime
of the query, but if the query is expected to run for several minutes,
we'd presumably be happy to spend several seconds planning it, which
might make it feasible to use the standard planner even for very, very
big queries.

I'm not 100% convinced of the value of join_collapse_limit for
anything other than explicit control over the join order. I have yet
to meet a PostgreSQL who thought that it was intuitive that it might
matter whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3
[etc] or A, B, C, D, [etc] WHERE P1, P2, P3. I suspect there are many
people who, if they knew that the latter might optimize better than
the former in some circumstances, would simply always write it in the
latter fashion, which makes the whole thing look a lot like a
concealed foot-gun, since whether or not it actually protects you
against exponential planning-time growth has a lot to do with how you
happen to like to write your queries (myself, I've switched styles in
the last few years).

...Robert


From: Selena Deckelmann <selena(at)endpoint(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-06-01 19:34:51
Message-ID: 4A242D5B.50604@endpoint.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In the spirit of helping wrap-up 8.4 todo items...

Robert Haas wrote:
> On Mon, May 25, 2009 at 6:15 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Now I'm still not exactly happy with GEQO, but it's surely a lot better
>> than it was in the fall of 2000. So on the whole it does seem that the
>> current relationships between from_collapse_limit, join_collapse_limit,
>> and geqo_threshold are based on obsolete information and should be
>> revisited. I don't have any data at hand to suggest specific new
>> default values, though.
>
> For 8.4, I'd be happy to just improve the documentation. I think this
> sentence could just be deleted from the section on
> from_collapse_limit:
>
> It is usually wise to keep this less than<xref linkend="guc-geqo-threshold">.
>
> We could put some other explanation in place of that sentence, but I'm
> not exactly sure what that explanation would say. I guess the point
> is that setting from_collapse_limit< geqo_threshold may delay GEQO
> planning considerably in the face of complex subqueries, because
> pulling up subqueries increases the size of the FROM list (I think).
> That could be good if you want your query plans to be more
> deterministic, but there's no guarantee they'll be good. Setting
> from_collapse_limit> geqo_threshold is basically saying that the
> standard planner will always have subqueries pulled up, so
> from_collapse_limit should be based on what the pain point will be for
> GEQO.

My vote would be to provide some information.

Suggested revision of Robert's prose:

Because genetic query optimization may be triggered, increasing
from_collapse_limit should be considered relative to <xref
linkend="guc-geqo-threshold">.

-selena

--
Selena Deckelmann
End Point Corporation
selena(at)endpoint(dot)com
503-282-2512


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Selena Deckelmann <selena(at)endpoint(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-06-02 03:46:39
Message-ID: 603c8f070906012046t5d741cflceda8b513f4a5d67@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena(at)endpoint(dot)com> wrote:
> In the spirit of helping wrap-up 8.4 todo items...
> Robert Haas wrote:
>> For 8.4, I'd be happy to just improve the documentation.  I think this
>> sentence could just be deleted from the section on
>> from_collapse_limit:
> My vote would be to provide some information.
>
> Suggested revision of Robert's prose:
>
> Because genetic query optimization may be triggered, increasing
> from_collapse_limit should be considered relative to <xref
> linkend="guc-geqo-threshold">.

Here's my attempt.

...Robert

Attachment Content-Type Size
from-collapse-limit-doc.patch text/x-patch 1.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Selena Deckelmann <selena(at)endpoint(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: from_collapse_limit vs. geqo_threshold
Date: 2009-06-02 17:39:45
Message-ID: 15470.1243964385@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jun 1, 2009 at 3:34 PM, Selena Deckelmann <selena(at)endpoint(dot)com> wrote:
>> Suggested revision of Robert's prose:
>>
>> Because genetic query optimization may be triggered, increasing
>> from_collapse_limit should be considered relative to <xref
>> linkend="guc-geqo-threshold">.

> Here's my attempt.

I applied the attached, along with some other wordsmithing.

regards, tom lane

***************
*** 2252,2261 ****
The planner will merge sub-queries into upper queries if the
resulting <literal>FROM</literal> list would have no more than
this many items. Smaller values reduce planning time but might
! yield inferior query plans. The default is eight. It is usually
! wise to keep this less than <xref linkend="guc-geqo-threshold">.
For more information see <xref linkend="explicit-joins">.
</para>
</listitem>
</varlistentry>

--- 2261,2275 ----
The planner will merge sub-queries into upper queries if the
resulting <literal>FROM</literal> list would have no more than
this many items. Smaller values reduce planning time but might
! yield inferior query plans. The default is eight.
For more information see <xref linkend="explicit-joins">.
</para>
+
+ <para>
+ Setting this value to <xref linkend="guc-geqo-threshold"> or more
+ may trigger use of the GEQO planner, resulting in nondeterministic
+ plans. See <xref linkend="runtime-config-query-geqo">.
+ </para>
</listitem>
</varlistentry>