Re: [GENERAL] Performance of full outer join in 8.3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 11:35:25
Message-ID: 603c8f070904160435v27bdb791s4e6844b8d84c72a2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

2009/4/16 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On Wed, 2009-04-15 at 20:58 -0400, Robert Haas wrote:
>> On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > The output of EXPLAIN is nowhere near stable enough to use within the
>> > current exact-match regression test framework.  I'm not sure it would
>> > be stable even if we suppressed the rowcount and cost figures.  Those
>> > figures vary across platforms (because of alignment effects and probably
>> > other things) and are also sensitive to the timing of autovacuums.  It
>> > is known that a nontrivial fraction of the existing regression test
>> > cases do suffer from uninteresting plan changes across platforms or
>> > as a result of various phase-of-the-moon effects; that's why we keep
>> > having to add "ORDER BY" clauses now and then.
>>
>> Interesting.  I suppose you could insulate yourself from this somewhat
>> by populating pg_statistic with a particular set of values rather than
>> relying on ANALYZE to gather them, but this would have the substantial
>> downside of being way more work to maintain, especially if anyone ever
>> changed pg_statistic.
>>
>> On a more practical level, I do think we need to give real
>> consideration to some kind of options syntax for EXPLAIN, maybe
>> something as simple as:
>>
>> EXPLAIN (option_name, ...) query
>>
>> Or maybe:
>>
>> EXPLAIN (option_name = value, ...) query
>>
>> It may or may not be the case that generating a useful regression test
>> suite for the planner is too much work for anyone to bother, but they
>> certainly won't if the tools aren't available.  It seems we get at
>> least one request a month for some kind of explain-output option:
>> suppress row counts, suppress costs, gather I/O statistics, show
>> outputs, show # of batches for a hash join, and on and on and on.  I
>> think we should implement a very basic version that maybe does nothing
>> more than let you optionally suppress some of the existing output, but
>> which provides an extensible syntax for others to build on.
>
> I think the way to do this is to introduce plan output in XML (that
> matches the node structure of the plan). We can then filter away any
> junk we don't want to see for regression tests, or better still augment
> the exact-match framework with a fuzzy-match spec that allows us to
> specify a range of values.

I think XML explain output is a good idea, but I don't think it's a
substitute for better options to control the human-readable form. But
the nice thing is that with an extensible syntax, this is not an
either/or proposition.

> The skill would be in constructing a set of tests that was not sensitive
> to minor changes. The OP's join for example had a huge cost range
> difference that would have clearly shown up in a regression test.
>
> This will only move forward if it adds value directly for Tom, so if
> it's worth doing then he needs to specify it and ask for someone to do
> it. There will be someone available if the task is well defined.

I'm not sure if by this you mean the EXPLAIN changes or the regression
tests, but either way I think you're half right: it's probably not
necessary for Tom to provide the spec, but it would sure be nice if he
could at least indicate his lack of objection to accepting a
well-designed patch in one of these areas - because no one is going to
want to go to the trouble of doing either of these things and then
have Tom say "well, I never liked that idea anyway".

...Robert

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sam Mason 2009-04-16 11:44:53 Re: Performance of full outer join in 8.3
Previous Message Christian Schröder 2009-04-16 11:31:45 Re: Performance of full outer join in 8.3

Browse pgsql-hackers by date

  From Date Subject
Next Message Sam Mason 2009-04-16 11:44:53 Re: Performance of full outer join in 8.3
Previous Message Christian Schröder 2009-04-16 11:31:45 Re: Performance of full outer join in 8.3