Re: [GENERAL] Performance of full outer join in 8.3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 00:58:18
Message-ID: 603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> We could add some regression tests that create a sample data set,
>> ANALYZE it, and then EXPLAIN various things.  The results should be
>> deterministic,
>
> Sorry, you're wrong.
>
> The output of EXPLAIN is nowhere near stable enough to use within the
> current exact-match regression test framework.  I'm not sure it would
> be stable even if we suppressed the rowcount and cost figures.  Those
> figures vary across platforms (because of alignment effects and probably
> other things) and are also sensitive to the timing of autovacuums.  It
> is known that a nontrivial fraction of the existing regression test
> cases do suffer from uninteresting plan changes across platforms or
> as a result of various phase-of-the-moon effects; that's why we keep
> having to add "ORDER BY" clauses now and then.

Interesting. I suppose you could insulate yourself from this somewhat
by populating pg_statistic with a particular set of values rather than
relying on ANALYZE to gather them, but this would have the substantial
downside of being way more work to maintain, especially if anyone ever
changed pg_statistic.

On a more practical level, I do think we need to give real
consideration to some kind of options syntax for EXPLAIN, maybe
something as simple as:

EXPLAIN (option_name, ...) query

Or maybe:

EXPLAIN (option_name = value, ...) query

It may or may not be the case that generating a useful regression test
suite for the planner is too much work for anyone to bother, but they
certainly won't if the tools aren't available. It seems we get at
least one request a month for some kind of explain-output option:
suppress row counts, suppress costs, gather I/O statistics, show
outputs, show # of batches for a hash join, and on and on and on. I
think we should implement a very basic version that maybe does nothing
more than let you optionally suppress some of the existing output, but
which provides an extensible syntax for others to build on.

Would you support such a change?

> The other problem with any large set of such tests is that any time you
> intentionally change the optimizer, a great deal of careful analysis
> would be needed to determine if the resulting EXPLAIN changes were good,
> bad, or indifferent; not to mention whether the change *should* have
> changed some plans that did not change.

Arguably it would be a good thing to examine planner changes with this
level of scrutiny, but I agree that the prospect is pretty
intimidating.

> There might be net value in maintaining such a test suite, but it would
> be a lot of work with no certain benefit, and I don't see anyone
> stepping up to do it.

...Robert

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message mrLami 2009-04-16 00:59:57 Re: PgAdmin & PosgreSQL Plus Setup
Previous Message Tom Lane 2009-04-15 23:39:51 Re: [GENERAL] Performance of full outer join in 8.3

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2009-04-16 02:41:42 Re: psql with "Function Type" in \df
Previous Message Alvaro Herrera 2009-04-16 00:54:10 Re: psql with "Function Type" in \df