Re: [GENERAL] Performance of full outer join in 8.3

From: Christian Schröder <cs(at)deriva(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Performance of full outer join in 8.3
Date: 2009-04-15 11:03:53
Message-ID: 49E5BF19.2040209@deriva.de
Lists: pgsql-general pgsql-hackers

Hi list,
we have just migrated one of our databases from 8.2.12 to 8.3.7. We now
experience a strange problem: A query that was really fast on the 8.2
server is now much slower on the 8.3 server (1 ms vs. 60 sec). I had a
look at the query plan and it is completely different. Both servers run
on the same machine. The configuration (planner constants etc.) is
identical. The database has been vacuum analyzed after the migration. So
why the difference?

This is the query:
select isin from ts_frontend.attachment_isins full OUTER JOIN
ts_frontend.rec_isins using (attachment,isin) WHERE attachment=2698120
GROUP BY isin limit 1000;

Here is the explain analyze in 8.2:


                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=826.44..826.61 rows=17 width=32) (actual time=0.163..0.172 rows=2 loops=1)
   ->  HashAggregate  (cost=826.44..826.61 rows=17 width=32) (actual time=0.159..0.162 rows=2 loops=1)
         ->  Merge Full Join  (cost=799.62..826.40 rows=17 width=32) (actual time=0.122..0.144 rows=2 loops=1)
               Merge Cond: (("outer"."?column3?" = "inner"."?column3?") AND (attachment_isins.attachment = rec_isins.attachment))
               Filter: (COALESCE(attachment_isins.attachment, rec_isins.attachment) = 2698120)
               ->  Sort  (cost=13.39..13.74 rows=138 width=20) (actual time=0.065..0.067 rows=1 loops=1)
                     Sort Key: (attachment_isins.isin)::bpchar, attachment_isins.attachment
                     ->  Index Scan using attachment_isins_attachment_idx on attachment_isins  (cost=0.00..8.49 rows=138 width=20) (actual time=0.042..0.047 rows=1 loops=1)
                           Index Cond: (attachment = 2698120)
               ->  Sort  (cost=786.23..794.80 rows=3429 width=20) (actual time=0.045..0.049 rows=2 loops=1)
                     Sort Key: (rec_isins.isin)::bpchar, rec_isins.attachment
                     ->  Index Scan using idx_rec_isins_attachment on rec_isins  (cost=0.00..584.89 rows=3429 width=20) (actual time=0.019..0.024 rows=2 loops=1)
                           Index Cond: (attachment = 2698120)
 Total runtime: 0.302 ms
(14 rows)

And this is the 8.3 plan:

                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=345890.35..345900.35 rows=1000 width=26) (actual time=53926.706..53927.071 rows=2 loops=1)
   ->  HashAggregate  (cost=345890.35..346296.11 rows=40576 width=26) (actual time=53926.702..53927.061 rows=2 loops=1)
         ->  Merge Full Join  (cost=71575.91..345788.91 rows=40576 width=26) (actual time=10694.727..53926.559 rows=2 loops=1)
               Merge Cond: (((rec_isins.isin)::bpchar = (attachment_isins.isin)::bpchar) AND (rec_isins.attachment = attachment_isins.attachment))
               Filter: (COALESCE(attachment_isins.attachment, rec_isins.attachment) = 2698120)
               ->  Index Scan using rec_isin_pkey on rec_isins  (cost=0.00..229562.97 rows=8115133 width=17) (actual time=0.141..18043.605 rows=8036226 loops=1)
               ->  Materialize  (cost=71575.91..78318.19 rows=539383 width=17) (actual time=10181.074..14471.215 rows=539101 loops=1)
                     ->  Sort  (cost=71575.91..72924.36 rows=539383 width=17) (actual time=10181.064..13019.906 rows=539101 loops=1)
                           Sort Key: attachment_isins.isin, attachment_isins.attachment
                           Sort Method:  external merge  Disk: 18936kB
                           ->  Seq Scan on attachment_isins  (cost=0.00..13111.83 rows=539383 width=17) (actual time=0.036..912.963 rows=539101 loops=1)
 Total runtime: 53937.213 ms
(12 rows)

These are the table definitions:
          Table "ts_frontend.attachment_isins"
    Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
 attachment   | integer                        | not null
 isin         | isin                           | not null
 editor       | name                           |
 last_changed | timestamp(0) without time zone |
Indexes:
    "attachment_isins_pkey" PRIMARY KEY, btree (attachment, isin)
    "attachment_isins_attachment_idx" btree (attachment)
    "attachment_isins_attachment_isin" btree (attachment, isin)
    "attachment_isins_isin_idx" btree (isin)
Foreign-key constraints:
    "attachment_isins_attachment_fkey" FOREIGN KEY (attachment) REFERENCES ts_frontend.attachments(id) ON UPDATE CASCADE ON DELETE CASCADE

   Table "ts_frontend.rec_isins"
   Column   |  Type   | Modifiers
------------+---------+-----------
 attachment | integer | not null
 isin       | isin    | not null
Indexes:
    "rec_isin_pkey" PRIMARY KEY, btree (isin, attachment)
    "idx_rec_isins_attachment" btree (attachment)
Foreign-key constraints:
    "rec_isins_attachment_fkey" FOREIGN KEY (attachment) REFERENCES ts_frontend.attachments(id) ON UPDATE CASCADE ON DELETE CASCADE

Thanks for any ideas!

Regards
Christian

P.S.: I think the full outer join is not what the developer really
wanted to do. Instead, he should have done a union (which is pretty
fast, by the way). However, I still want to understand why the query
plan of his query changed between both database releases.

--
Deriva GmbH Tel.: +49 551 489500-42
Financial IT and Consulting Fax: +49 551 489500-91
Hans-Böckler-Straße 2 http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 11:43:04
Message-ID: 2f4958ff0904150443s595b54dfg2ccbdf973a99f051@mail.gmail.com

set work_mem=24000; before running the query.

postgres is doing merge and sort on disc, that's always slow.

is there an index on column isin ?


From: Christian Schröder <cs(at)deriva(dot)de>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 12:04:03
Message-ID: 49E5CD33.6070406@deriva.de

Grzegorz Jaśkiewicz wrote:
> set work_mem=24000; before running the query.
>
> postgres is doing merge and sort on disc, that's always slow.
>
Ok, but why is the plan different in 8.2? As you can see the same query
is really fast in 8.2, but slow in 8.3.
> is there an index on column isin ?
>
There is a separate index on the isin column of the attachment_isins
table (attachment_isins_isin_idx). The other table (rec_isins) has the
combination of attachment and isin as primary key which creates an
implicit index. Can this index be used for the single column isin? And
again: Why doesn't this matter in 8.2??

Regards,
Christian



From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 12:11:44
Message-ID: 2f4958ff0904150511r2593325eu46627b173915006a@mail.gmail.com

2009/4/15 Christian Schröder <cs(at)deriva(dot)de>:
> Grzegorz Jaśkiewicz wrote:
>>
>> set work_mem=24000; before running the query.
>>
>> postgres is doing merge and sort on disc, that's always slow.
>>
>
> Ok, but why is the plan different in 8.2? As you can see the same query is
> really fast in 8.2, but slow in 8.3.

Did that set help ?

I think Tom will know more about it, but probably (and I am guessing
here, to be honest) the Materialize plan either wasn't available or
didn't appear to be the planner's favourite.
On 8.2 the two loops were much faster instead.

Can you try increasing the stat target to 100, running vacuum analyze,
and seeing whether a different plan is chosen?

Again, I don't know at this point why it is so; I'm just suggesting
things that I would try.

>> is there an index on column isin ?
>>
>
> There is a separate index on the isin column of the attachment_isins table
> (attachment_isins_isin_idx). The other table (rec_isins) has the combination
> of attachment and isin as primary key which creates an implicit index. Can
> this index be used for the single column isin? And again: Why doesn't this
> matter in 8.2??

well, it is a different major release, and differences between
8.2->8.3 are vast.

--
GJ


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 12:25:40
Message-ID: 1239798340.23905.21.camel@ebony.2ndQuadrant


On Wed, 2009-04-15 at 14:04 +0200, Christian Schröder wrote:
> Grzegorz Jaśkiewicz wrote:
> > set work_mem=24000; before running the query.
> >
> > postgres is doing merge and sort on disc, that's always slow.
> >
> Ok, but why is the plan different in 8.2? As you can see the same query
> is really fast in 8.2, but slow in 8.3.

The cost of the query seems accurate, so the absence of
attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
There's no way it would choose to scan 8115133 rows on the pkey if the
other index was available and usable.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Christian Schröder <cs(at)deriva(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 12:31:33
Message-ID: 2f4958ff0904150531h32358fex5cb705472abd271e@mail.gmail.com

On Wed, Apr 15, 2009 at 1:25 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The cost of the query seems accurate, so the absence of
> attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
> There's no way it would choose to scan 8115133 rows on the pkey if the
> other index was available and usable.

Hence my question whether there's an index on it in the 8.3 version of the db.

--
GJ


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-15 15:33:01
Message-ID: 11212.1239809581@sss.pgh.pa.us

Christian Schröder <cs(at)deriva(dot)de> writes:
> This is the query:
> select isin from ts_frontend.attachment_isins full OUTER JOIN
> ts_frontend.rec_isins using (attachment,isin) WHERE attachment=2698120
> GROUP BY isin limit 1000;

Hmm. It seems 8.3 is failing to push the attachment=2698120 condition
down to the input relations. Not sure why. All that code got massively
rewritten in 8.3, but I thought it still understood about pushing
equalities through a full join ...

regards, tom lane
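
A minimal sketch, in Python rather than planner code, of why pushing the condition down is legitimate for this query. With USING (attachment, isin) the merged attachment column is COALESCE(a.attachment, r.attachment), and matched rows agree on attachment, so restricting each input to the constant before the join should give the same rows as filtering the join output. The helper names and the sample rows are made up for illustration; the sketch assumes no duplicate rows within an input, which the primary keys guarantee.

```python
def full_outer_join(left, right):
    """Full outer join of (attachment, isin) rows on both columns.

    Returns (left_row_or_None, right_row_or_None) pairs.
    """
    left_set, right_set = set(left), set(right)
    pairs = [(row, row if row in right_set else None) for row in left]
    pairs += [(None, row) for row in right if row not in left_set]
    return pairs


def merged_attachment(pair):
    """COALESCE(a.attachment, r.attachment) for one joined pair."""
    a, r = pair
    return a[0] if a is not None else r[0]


def filter_after_join(left, right, wanted):
    """Shape of the slow 8.3 plan: join everything, then filter."""
    return [p for p in full_outer_join(left, right)
            if merged_attachment(p) == wanted]


def filter_before_join(left, right, wanted):
    """Shape of the 8.2 plan: restrict each input, then join."""
    return full_outer_join([row for row in left if row[0] == wanted],
                           [row for row in right if row[0] == wanted])


# Hypothetical sample data: (attachment, isin)
attachment_isins = [(1, "AAA"), (2, "BBB"), (2, "CCC")]
rec_isins = [(2, "BBB"), (2, "DDD"), (3, "EEE")]

slow = filter_after_join(attachment_isins, rec_isins, 2)
fast = filter_before_join(attachment_isins, rec_isins, 2)
print(slow == fast)
```

Both functions produce the same pairs; the second one only ever touches rows for the requested attachment, which is what the index scans in the 8.2 plan achieve.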


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 16:34:12
Message-ID: 12408.1239813252@sss.pgh.pa.us

I wrote:
> Christian Schröder <cs(at)deriva(dot)de> writes:
>> This is the query:
>> select isin from ts_frontend.attachment_isins full OUTER JOIN
>> ts_frontend.rec_isins using (attachment,isin) WHERE attachment=2698120
>> GROUP BY isin limit 1000;

> Hmm. It seems 8.3 is failing to push the attachment=2698120 condition
> down to the input relations. Not sure why. All that code got massively
> rewritten in 8.3, but I thought it still understood about pushing
> equalities through a full join ...

On further review, this did work in 8.3 when released. I think it got
broken here:

http://archives.postgresql.org/pgsql-committers/2008-06/msg00336.php

because that change is preventing the "mergedvar = constant" clause from
being seen as an equivalence, when it should be seen as one. Need to
think about a tighter fix for the bug report that prompted that change.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 21:31:18
Message-ID: 719.1239831078@sss.pgh.pa.us

I wrote:
> On further review, this did work in 8.3 when released. I think it got
> broken here:
> http://archives.postgresql.org/pgsql-committers/2008-06/msg00336.php
> because that change is preventing the "mergedvar = constant" clause from
> being seen as an equivalence, when it should be seen as one. Need to
> think about a tighter fix for the bug report that prompted that change.

The original bug report involved create_or_index_quals() pulling out
an index condition from an OR clause that appeared above an outer join
that could null the relation it wanted to indexscan. (In practice this
only arises if at least one arm of the OR has an IS NULL clause for the
target relation --- if all arms have ordinary strict index clauses
then we'd have determined during reduce_outer_joins that the outer join
could be simplified to a plain join.) I tried to fix this by altering
the meaning of the outerjoin_delayed flag slightly, but what Christian's
complaint shows is that that was a bad idea because it breaks valid
equivalence deductions.

Using outerjoin_delayed in create_or_index_quals() was always pretty
much of a crude hack anyway --- there are other cases in which it
prevents us from extracting index conditions that *would* be legitimate.
In particular, there's no reason why we should not extract an index
condition for the outer relation of the same outer join.

So I'm thinking the right thing to do is to eliminate outerjoin_delayed
from RestrictInfo in favor of storing a bitmapset that shows exactly
which relations referenced by the clause are nullable by outer joins
that are below the clause. Then create_or_index_quals() could ignore
an OR, or not, depending on whether the target relation is nullable
below the OR clause. This might permit finer-grain analysis in the
other places that currently depend on outerjoin_delayed too, though
for the moment I'll just make them check for empty-or-nonempty-set.

outerjoin_delayed should revert to its longstanding meaning within
distribute_qual_to_rels, but right at the moment there seems no
application for preserving it beyond that point. (On the other hand,
eliminating it from RestrictInfo isn't going to save any space because
of alignment considerations, so maybe we should keep it there in case
we need it in future.)

The main objection I can see to this is the expansion of RestrictInfo,
but it's a pretty large struct already and one more pointer isn't
going to make much difference.

Comments?

regards, tom lane
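
A rough sketch of the proposal above, in Python rather than the planner's C. It models the proposed set of nullable relations as a frozenset and shows how the check in create_or_index_quals() might depend on the target relation instead of on a single flag. The class, field, and function names are hypothetical stand-ins, not the actual RestrictInfo layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestrictInfoSketch:
    """Stand-in for the planner's RestrictInfo; fields are illustrative."""
    clause: str
    # Relations referenced by the clause that can be nulled by an outer
    # join located below the clause (the proposed bitmapset).
    nullable_relids: frozenset = frozenset()


def can_extract_or_index_qual(rinfo, target_rel):
    """Allow pulling an index condition out of an OR clause only if the
    target relation is not nullable below that clause."""
    return target_rel not in rinfo.nullable_relids


def is_delayed(rinfo):
    """Coarse empty-or-nonempty check, standing in for the other callers
    that currently look at outerjoin_delayed."""
    return bool(rinfo.nullable_relids)


# a LEFT JOIN b, with an OR clause above the join; only b is nullable.
rinfo = RestrictInfoSketch("(a.y = 1 AND b.x IS NULL) OR a.y = 2",
                           frozenset({"b"}))

print(can_extract_or_index_qual(rinfo, "a"))  # outer relation
print(can_extract_or_index_qual(rinfo, "b"))  # nullable relation
print(is_delayed(rinfo))
```

Under these assumptions the outer relation a remains eligible for an extracted index condition, while b is refused, which is the finer-grained behaviour a single boolean flag cannot express.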


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 21:55:13
Message-ID: 1239832513.23905.82.camel@ebony.2ndQuadrant


On Wed, 2009-04-15 at 12:34 -0400, Tom Lane wrote:

> On further review, this did work in 8.3 when released. I think it got
> broken here:
>
> http://archives.postgresql.org/pgsql-committers/2008-06/msg00336.php
>
> because that change is preventing the "mergedvar = constant" clause from
> being seen as an equivalence, when it should be seen as one. Need to
> think about a tighter fix for the bug report that prompted that change.

I've always been scared to ask this question, in case the answer is No,
but: Do we have a set of regression tests for the optimizer anywhere?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 22:04:13
Message-ID: 1344.1239833053@sss.pgh.pa.us

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> I've always been scared to ask this question, in case the answer is No,
> but: Do we have a set of regression tests for the optimizer anywhere?

Nothing beyond what is in the standard tests. While that's okay at
catching wrong answers --- and we have memorialized a number of such
issues in the tests --- the framework is not good for catching things
that run slower than they ought.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 23:23:01
Message-ID: 603c8f070904151623ne07d744k615edd4aa669a64a@mail.gmail.com

On Wed, Apr 15, 2009 at 6:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> I've always been scared to ask this question, in case the answer is No,
>> but: Do we have a set of regression tests for the optimizer anywhere?
>
> Nothing beyond what is in the standard tests.  While that's okay at
> catching wrong answers --- and we have memorialized a number of such
> issues in the tests --- the framework is not good for catching things
> that run slower than they ought.

We could add some regression tests that create a sample data set,
ANALYZE it, and then EXPLAIN various things. The results should be
deterministic, but creating a reasonably comprehensive set of tests
might be a fair amount of work, and would likely add significantly to
the runtime of the tests. Maybe it would need to be a separate suite
just for optimizer testing.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-15 23:39:51
Message-ID: 3010.1239838791@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> We could add some regression tests that create a sample data set,
> ANALYZE it, and then EXPLAIN various things. The results should be
> deterministic,

Sorry, you're wrong.

The output of EXPLAIN is nowhere near stable enough to use within the
current exact-match regression test framework. I'm not sure it would
be stable even if we suppressed the rowcount and cost figures. Those
figures vary across platforms (because of alignment effects and probably
other things) and are also sensitive to the timing of autovacuums. It
is known that a nontrivial fraction of the existing regression test
cases do suffer from uninteresting plan changes across platforms or
as a result of various phase-of-the-moon effects; that's why we keep
having to add "ORDER BY" clauses now and then.

The other problem with any large set of such tests is that any time you
intentionally change the optimizer, a great deal of careful analysis
would be needed to determine if the resulting EXPLAIN changes were good,
bad, or indifferent; not to mention whether the change *should* have
changed some plans that did not change.

There might be net value in maintaining such a test suite, but it would
be a lot of work with no certain benefit, and I don't see anyone
stepping up to do it.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 00:58:18
Message-ID: 603c8f070904151758w6af25641xac831b4cb71c4184@mail.gmail.com

On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> We could add some regression tests that create a sample data set,
>> ANALYZE it, and then EXPLAIN various things.  The results should be
>> deterministic,
>
> Sorry, you're wrong.
>
> The output of EXPLAIN is nowhere near stable enough to use within the
> current exact-match regression test framework.  I'm not sure it would
> be stable even if we suppressed the rowcount and cost figures.  Those
> figures vary across platforms (because of alignment effects and probably
> other things) and are also sensitive to the timing of autovacuums.  It
> is known that a nontrivial fraction of the existing regression test
> cases do suffer from uninteresting plan changes across platforms or
> as a result of various phase-of-the-moon effects; that's why we keep
> having to add "ORDER BY" clauses now and then.

Interesting. I suppose you could insulate yourself from this somewhat
by populating pg_statistic with a particular set of values rather than
relying on ANALYZE to gather them, but this would have the substantial
downside of being way more work to maintain, especially if anyone ever
changed pg_statistic.

On a more practical level, I do think we need to give real
consideration to some kind of options syntax for EXPLAIN, maybe
something as simple as:

EXPLAIN (option_name, ...) query

Or maybe:

EXPLAIN (option_name = value, ...) query

It may or may not be the case that generating a useful regression test
suite for the planner is too much work for anyone to bother, but they
certainly won't if the tools aren't available. It seems we get at
least one request a month for some kind of explain-output option:
suppress row counts, suppress costs, gather I/O statistics, show
outputs, show # of batches for a hash join, and on and on and on. I
think we should implement a very basic version that maybe does nothing
more than let you optionally suppress some of the existing output, but
which provides an extensible syntax for others to build on.

Would you support such a change?

> The other problem with any large set of such tests is that any time you
> intentionally change the optimizer, a great deal of careful analysis
> would be needed to determine if the resulting EXPLAIN changes were good,
> bad, or indifferent; not to mention whether the change *should* have
> changed some plans that did not change.

Arguably it would be a good thing to examine planner changes with this
level of scrutiny, but I agree that the prospect is pretty
intimidating.

> There might be net value in maintaining such a test suite, but it would
> be a lot of work with no certain benefit, and I don't see anyone
> stepping up to do it.

...Robert
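
To make the "suppress row counts, suppress costs" request concrete, here is a small sketch that post-processes EXPLAIN ANALYZE text output in Python. It is not the proposed EXPLAIN option syntax, only an external filter; the function name and regular expressions are assumptions based on the plan format shown earlier in this thread, and whether the remaining plan shape would be stable enough across platforms is exactly the open question raised above.

```python
import re

# Matches "(cost=0.00..8.49 rows=138 width=20)" annotations.
_ESTIMATE = re.compile(
    r"\s*\(cost=\d+\.\d+\.\.\d+\.\d+ rows=\d+ width=\d+\)")
# Matches "(actual time=0.042..0.047 rows=1 loops=1)" annotations.
_ACTUAL = re.compile(
    r"\s*\(actual time=\d+\.\d+\.\.\d+\.\d+ rows=\d+ loops=\d+\)")


def strip_figures(plan_text):
    """Return the plan with cost, row-count and timing figures removed,
    and the "Total runtime" footer dropped."""
    kept = []
    for line in plan_text.splitlines():
        if line.strip().startswith("Total runtime:"):
            continue
        line = _ESTIMATE.sub("", line)
        line = _ACTUAL.sub("", line)
        kept.append(line.rstrip())
    return "\n".join(kept)


plan = """\
Sort  (cost=13.39..13.74 rows=138 width=20) (actual time=0.065..0.067 rows=1 loops=1)
  Sort Key: (attachment_isins.isin)::bpchar, attachment_isins.attachment
  ->  Index Scan using attachment_isins_attachment_idx on attachment_isins  (cost=0.00..8.49 rows=138 width=20) (actual time=0.042..0.047 rows=1 loops=1)
        Index Cond: (attachment = 2698120)
Total runtime: 0.302 ms"""

print(strip_figures(plan))
```

What is left is the node types, scan targets, and conditions, which is the part a plan-shape comparison would care about.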


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 05:12:10
Message-ID: 1239858730.23905.125.camel@ebony.2ndQuadrant


On Wed, 2009-04-15 at 20:58 -0400, Robert Haas wrote:
> On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > The output of EXPLAIN is nowhere near stable enough to use within the
> > current exact-match regression test framework. I'm not sure it would
> > be stable even if we suppressed the rowcount and cost figures. Those
> > figures vary across platforms (because of alignment effects and probably
> > other things) and are also sensitive to the timing of autovacuums. It
> > is known that a nontrivial fraction of the existing regression test
> > cases do suffer from uninteresting plan changes across platforms or
> > as a result of various phase-of-the-moon effects; that's why we keep
> > having to add "ORDER BY" clauses now and then.
>
> Interesting. I suppose you could insulate yourself from this somewhat
> by populating pg_statistic with a particular set of values rather than
> relying on ANALYZE to gather them, but this would have the substantial
> downside of being way more work to maintain, especially if anyone ever
> changed pg_statistic.
>
> On a more practical level, I do think we need to give real
> consideration to some kind of options syntax for EXPLAIN, maybe
> something as simple as:
>
> EXPLAIN (option_name, ...) query
>
> Or maybe:
>
> EXPLAIN (option_name = value, ...) query
>
> It may or may not be the case that generating a useful regression test
> suite for the planner is too much work for anyone to bother, but they
> certainly won't if the tools aren't available. It seems we get at
> least one request a month for some kind of explain-output option:
> suppress row counts, suppress costs, gather I/O statistics, show
> outputs, show # of batches for a hash join, and on and on and on. I
> think we should implement a very basic version that maybe does nothing
> more than let you optionally suppress some of the existing output, but
> which provides an extensible syntax for others to build on.

I think the way to do this is to introduce plan output in XML (that
matches the node structure of the plan). We can then filter away any
junk we don't want to see for regression tests, or better still augment
the exact-match framework with a fuzzy-match spec that allows us to
specify a range of values.

The skill would be in constructing a set of tests that was not sensitive
to minor changes. The OP's join for example had a huge cost range
difference that would have clearly shown up in a regression test.

This will only move forward if it adds value directly for Tom, so if
it's worth doing then he needs to specify it and ask for someone to do
it. There will be someone available if the task is well defined.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support
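
As a rough illustration of the fuzzy-match idea, the sketch below pulls the top-level total cost out of text-format EXPLAIN output and checks that it falls within a tolerated range of an expected value. It works on the existing text format rather than the XML output suggested above; the function names and the factor of 10 are arbitrary assumptions. The two sample lines are the top nodes of the 8.2 and 8.3 plans from the original report.

```python
import re

# Captures startup and total cost from the first "(cost=a..b " found.
_COST = re.compile(r"\(cost=(\d+\.\d+)\.\.(\d+\.\d+) ")


def top_level_total_cost(plan_text):
    """Total estimated cost of the topmost plan node."""
    match = _COST.search(plan_text)
    if match is None:
        raise ValueError("no cost figure found in plan")
    return float(match.group(2))


def within_range(plan_text, expected_cost, factor=10.0):
    """True if the plan's total cost is within a factor of expected_cost
    in either direction."""
    cost = top_level_total_cost(plan_text)
    return expected_cost / factor <= cost <= expected_cost * factor


plan_82 = "Limit  (cost=826.44..826.61 rows=17 width=32)"
plan_83 = "Limit  (cost=345890.35..345900.35 rows=1000 width=26)"

print(within_range(plan_82, expected_cost=826.61))
print(within_range(plan_83, expected_cost=826.61))
```

With a tolerance this loose, small platform-dependent differences would pass, while a jump of several hundred times in estimated cost, as in the reported query, would fail the check.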


From: Christian Schröder <cs(at)deriva(dot)de>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-16 11:31:45
Message-ID: 49E71721.4000800@deriva.de

Grzegorz Jaśkiewicz wrote:
> On Wed, Apr 15, 2009 at 1:25 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> The cost of the query seems accurate, so the absence of
>> attachment_isins_attachment_idx on the 8.3 plan looks to be the reason.
>> There's no way it would choose to scan 8115133 rows on the pkey if the
>> other index was available and usable.
>>
>
> Hence my question whether there's an index on it in the 8.3 version of the db.
>
I added an index on this column, but it didn't change the query plan.
Stupid question: Do I have to analyze again or perform a reindex after
adding the index?

Regards,
Christian



From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 11:35:25
Message-ID: 603c8f070904160435v27bdb791s4e6844b8d84c72a2@mail.gmail.com

2009/4/16 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> On Wed, 2009-04-15 at 20:58 -0400, Robert Haas wrote:
>> On Wed, Apr 15, 2009 at 7:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > The output of EXPLAIN is nowhere near stable enough to use within the
>> > current exact-match regression test framework.  I'm not sure it would
>> > be stable even if we suppressed the rowcount and cost figures.  Those
>> > figures vary across platforms (because of alignment effects and probably
>> > other things) and are also sensitive to the timing of autovacuums.  It
>> > is known that a nontrivial fraction of the existing regression test
>> > cases do suffer from uninteresting plan changes across platforms or
>> > as a result of various phase-of-the-moon effects; that's why we keep
>> > having to add "ORDER BY" clauses now and then.
>>
>> Interesting.  I suppose you could insulate yourself from this somewhat
>> by populating pg_statistic with a particular set of values rather than
>> relying on ANALYZE to gather them, but this would have the substantial
>> downside of being way more work to maintain, especially if anyone ever
>> changed pg_statistic.
>>
>> On a more practical level, I do think we need to give real
>> consideration to some kind of options syntax for EXPLAIN, maybe
>> something as simple as:
>>
>> EXPLAIN (option_name, ...) query
>>
>> Or maybe:
>>
>> EXPLAIN (option_name = value, ...) query
>>
>> It may or may not be the case that generating a useful regression test
>> suite for the planner is too much work for anyone to bother, but they
>> certainly won't if the tools aren't available.  It seems we get at
>> least one request a month for some kind of explain-output option:
>> suppress row counts, suppress costs, gather I/O statistics, show
>> outputs, show # of batches for a hash join, and on and on and on.  I
>> think we should implement a very basic version that maybe does nothing
>> more than let you optionally suppress some of the existing output, but
>> which provides an extensible syntax for others to build on.
>
> I think the way to do this is to introduce plan output in XML (that
> matches the node structure of the plan). We can then filter away any
> junk we don't want to see for regression tests, or better still augment
> the exact-match framework with a fuzzy-match spec that allows us to
> specify a range of values.

I think XML explain output is a good idea, but I don't think it's a
substitute for better options to control the human-readable form. But
the nice thing is that with an extensible syntax, this is not an
either/or proposition.

> The skill would be in constructing a set of tests that was not sensitive
> to minor changes. The OP's join for example had a huge cost range
> difference that would have clearly shown up in a regression test.
>
> This will only move forward if it adds value directly for Tom, so if
> it's worth doing then he needs to specify it and ask for someone to do
> it. There will be someone available if the task is well defined.

I'm not sure if by this you mean the EXPLAIN changes or the regression
tests, but either way I think you're half right: it's probably not
necessary for Tom to provide the spec, but it would sure be nice if he
could at least indicate his lack of objection to accepting a
well-designed patch in one of these areas - because no one is going to
want to go to the trouble of doing either of these things and then
have Tom say "well, I never liked that idea anyway".

...Robert


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-16 11:44:53
Message-ID: 20090416114452.GL12225@frubble.xen.chris-lamb.co.uk
Lists: pgsql-general pgsql-hackers

On Thu, Apr 16, 2009 at 01:31:45PM +0200, Christian Schröder wrote:
> Stupid question: Do I have to analyze again or perform a reindex after
> adding the index?

No, it's a regression in PG's handling of outer joins---it used to
realise that this was a possible optimisation, but now it doesn't.

Tom Lane started discussion on -hackers about this issue:

http://archives.postgresql.org/pgsql-hackers/2009-04/msg00849.php

It looks as though performance in 8.3 is going to be bad until this
behaviour is changed. A possible fix is to rewrite your query to work
around the problem:

SELECT isin
FROM (SELECT * FROM ts_frontend.attachment_isins WHERE attachment = 2698120) a
FULL OUTER JOIN (SELECT * FROM ts_frontend.rec_isins WHERE attachment = 2698120) b USING (isin)
GROUP BY isin
LIMIT 1000;

It looks as though what you're trying to do could also be expressed as:

SELECT isin FROM ts_frontend.rec_isins WHERE attachment = 2698120
UNION
SELECT isin FROM ts_frontend.attachment_isins WHERE attachment = 2698120;

I'm not sure whether it's part of something larger, so this may not be
a useful transform.
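
A minimal sketch of why the two rewrites return the same set of isin values, in Python purely for illustration. The sample rows and function names are made up, isin is assumed to be non-NULL, and the LIMIT is ignored:

```python
# Hand-made (attachment, isin) rows for illustration; not data from the thread.
attachment_isins = [(2698120, "AAA"), (2698120, "BBB"), (1, "ZZZ")]
rec_isins = [(2698120, "BBB"), (2698120, "CCC"), (2, "YYY")]


def isins_via_full_join(left, right, attachment):
    """Filter each input, FULL OUTER JOIN ... USING (isin), then GROUP BY isin."""
    l = [isin for att, isin in left if att == attachment]
    r = [isin for att, isin in right if att == attachment]
    joined = [(a, b) for a in l for b in r if a == b]   # matched pairs
    joined += [(a, None) for a in l if a not in r]       # left-only rows
    joined += [(None, b) for b in r if b not in l]       # right-only rows
    # USING (isin) exposes COALESCE(left.isin, right.isin); GROUP BY dedups it.
    return {a if a is not None else b for a, b in joined}


def isins_via_union(left, right, attachment):
    """The UNION form: distinct isin values from either filtered input."""
    return ({isin for att, isin in left if att == attachment}
            | {isin for att, isin in right if att == attachment})


same = (isins_via_full_join(attachment_isins, rec_isins, 2698120)
        == isins_via_union(attachment_isins, rec_isins, 2698120))
print(same)  # True
```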

--
Sam http://samason.me.uk/


From: David Fetter <david(at)fetter(dot)org>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 15:21:38
Message-ID: 20090416152138.GB22988@fetter.org
Lists: pgsql-general pgsql-hackers

On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:
> >
> > EXPLAIN (option_name, ...) query
> >
> > Or maybe:
> >
> > EXPLAIN (option_name = value, ...) query
> >
> > It may or may not be the case that generating a useful regression
> > test suite for the planner is too much work for anyone to bother,
> > but they certainly won't if the tools aren't available. It seems
> > we get at least one request a month for some kind of
> > explain-output option: suppress row counts, suppress costs, gather
> > I/O statistics, show outputs, show # of batches for a hash join,
> > and on and on and on. I think we should implement a very basic
> > version that maybe does nothing more than let you optionally
> > suppress some of the existing output, but which provides an
> > extensible syntax for others to build on.
>
> I think the way to do this is to introduce plan output in XML

If we're going with a serialization, which I think would be an
excellent idea, how about one that's light-weight and human-readable
like JSON?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 15:36:54
Message-ID: 603c8f070904160836h1ca9f1d6n90dd653e32a2916a@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Thu, Apr 16, 2009 at 11:21 AM, David Fetter <david(at)fetter(dot)org> wrote:
> If we're going with a serialization, which I think would be an
> excellent idea, how about one that's light-weight and human-readable
> like JSON?

Wow, that's a great idea for another option to EXPLAIN. Wouldn't it
be nice if EXPLAIN supported an options syntax?!!!

:-)

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 15:50:51
Message-ID: 18888.1239897051@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think XML explain output is a good idea, but I don't think it's a
> substitute for better options to control the human-readable form.

Yeah. I think a well-designed XML output format for EXPLAIN is a fine
thing to work on, but I don't believe it would make the "create a
planner test suite" problem noticeably easier.

I see the purpose of an XML format as being to allow tools like
Red Hat's old Visual Explain (now maintained by EDB IIRC) to parse
EXPLAIN's output with somewhat better odds of not breaking from
one release to the next.

regards, tom lane


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: David Fetter <david(at)fetter(dot)org>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 18:04:34
Message-ID: ACAEDC49-CDC9-4E3E-8264-87DA78B1E09A@pointblue.com.pl
Lists: pgsql-general pgsql-hackers


On 16 Apr 2009, at 16:21, David Fetter wrote:

> On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:
>>
>> I think the way to do this is to introduce plan output in XML
>
> If we're going with a serialization, which I think would be an
> excellent idea, how about one that's light-weight and human-readable
> like JSON?
+1

XML/JSON is machine-readable.
Personally, I don't think that explain (analyze) is hard for a human to
read; quite the contrary.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 18:41:15
Message-ID: b42b73150904161141p2421e82cw55b59a92f402cddc@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Thu, Apr 16, 2009 at 2:04 PM, Grzegorz Jaskiewicz
<gj(at)pointblue(dot)com(dot)pl> wrote:
>
> On 16 Apr 2009, at 16:21, David Fetter wrote:
>
>> On Thu, Apr 16, 2009 at 06:12:10AM +0100, Simon Riggs wrote:
>>>
>>> I think the way to do this is to introduce plan output in XML
>>
>> If we're going with a serialization, which I think would be an
>> excellent idea, how about one that's light-weight and human-readable
>> like JSON?
>
> +1
>
> XML/JSON is machine-readable.
> Personally, I don't think that explain (analyze) is hard for a human to
> read; quite the contrary.

Is that because of how the output is formatted though, or because the
concepts are difficult to express? (I agree though, json is better
especially for structures that are possibly highly nested).

merlin


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-16 18:45:47
Message-ID: 9B0D1547-B358-426B-A6DC-4A3D2E97F236@pointblue.com.pl
Lists: pgsql-general pgsql-hackers


On 16 Apr 2009, at 19:41, Merlin Moncure wrote:
>
> Is that because of how the output is formatted though, or because the
> concepts are difficult to express? (I agree though, json is better
> especially for structures that are possibly highly nested).
What I mean is that what PostgreSQL currently displays for
explain (analyze [verbose]) is clear and understandable.
Also, it is getting better and better from version to version. So I
personally don't agree that it is unreadable, and I am all for JSON or
XML output (as, I am sure, are many users like me).


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christian Schröder <cs(at)deriva(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-16 20:48:23
Message-ID: 4827.1239914903@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

I wrote:
> Christian Schröder <cs(at)deriva(dot)de> writes:
>> This is the query:
>> select isin from ts_frontend.attachment_isins full OUTER JOIN
>> ts_frontend.rec_isins using (attachment,isin) WHERE attachment=2698120
>> GROUP BY isin limit 1000;

> Hmm. It seems 8.3 is failing to push the attachment=2698120 condition
> down to the input relations. Not sure why. All that code got massively
> rewritten in 8.3, but I thought it still understood about pushing
> equalities through a full join ...

I've applied a patch for this. It will be in 8.3.8, or if you're
in a hurry you can grab it from our CVS server or here:

http://archives.postgresql.org/message-id/20090416204228.579317540E2@cvs.postgresql.org

regards, tom lane


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christian Schröder <cs(at)deriva(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-17 08:22:27
Message-ID: 2f4958ff0904170122r6d55f006qafc31e3ffd162b2f@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Thu, Apr 16, 2009 at 9:48 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> I've applied a patch for this.  It will be in 8.3.8, or if you're
> in a hurry you can grab it from our CVS server or here:
>
> http://archives.postgresql.org/message-id/20090416204228.579317540E2@cvs.postgresql.org

Just out of curiosity: when was it introduced, i.e. which version was
the first one affected? We're still on 8.3.5 here.

--
GJ


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christian Schröder <cs(at)deriva(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-17 08:24:33
Message-ID: 2f4958ff0904170124wb40fa96w5126fd5470d1e437@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Fri, Apr 17, 2009 at 9:22 AM, Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com> wrote:

> Just out of curiosity: when was it introduced, i.e. which version was
> the first one affected? We're still on 8.3.5 here.
(I had no idea the release notes carry dates.) It got in by 8.3.4
(changed right after 8.3.3 was released).

>
> --
> GJ
>

--
GJ


From: Christian Schröder <cs(at)deriva(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Performance of full outer join in 8.3
Date: 2009-04-17 10:52:30
Message-ID: 49E85F6E.6000704@deriva.de
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> I've applied a patch for this. It will be in 8.3.8, or if you're
> in a hurry you can grab it from our CVS server or here:
>
Thanks a lot for your effort and the quick response!

Regards,
Christian

--
Deriva GmbH Tel.: +49 551 489500-42
Financial IT and Consulting Fax: +49 551 489500-91
Hans-Böckler-Straße 2 http://www.deriva.de
D-37079 Göttingen

Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-17 23:39:41
Message-ID: E7DEF98D-ADA4-4DC0-9254-857DA0D7F1F4@pointblue.com.pl
Lists: pgsql-general pgsql-hackers

Btw, there was an "EXPLAIN XML" Summer of Code project, wasn't there?


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 10:50:50
Message-ID: 1240051850.7401.7.camel@huvostro
Lists: pgsql-general pgsql-hackers

On Wed, 2009-04-15 at 18:04 -0400, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > I've always been scared to ask this question, in case the answer is No,
> > but: Do we have a set of regression tests for the optimizer anywhere?
>
> Nothing beyond what is in the standard tests. While that's okay at
> catching wrong answers --- and we have memorialized a number of such
> issues in the tests --- the framework is not good for catching things
> that run slower than they ought.

Can't we make a first cut at it by just running with timings on and then
comparing ratios of running times, maybe with 2-3X tolerance, to catch
the most obvious regressions?
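
A rough sketch of such a ratio check, in Python only for illustration. The function name, the query labels and the second timing pair are assumptions; the first pair mirrors the roughly 1 ms vs. 60 sec reported at the start of this thread:

```python
def timing_regressions(baseline_ms, current_ms, tolerance=3.0):
    """Return the queries whose runtime grew by more than `tolerance` times."""
    flagged = []
    for name, base in baseline_ms.items():
        cur = current_ms.get(name)
        if cur is None or base <= 0:
            continue  # nothing to compare against
        if cur / base > tolerance:
            flagged.append(name)
    return sorted(flagged)


# Timings in milliseconds; "simple_lookup" is an invented control query.
baseline = {"full_join_isin": 1.0, "simple_lookup": 0.5}
current = {"full_join_isin": 60000.0, "simple_lookup": 0.9}

print(timing_regressions(baseline, current))  # ['full_join_isin']
```

Wall-clock timings vary with machine load, so a check like this would presumably need repeated runs and a generous tolerance to avoid false alarms.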

> regards, tom lane

--
Hannu Krosing http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability
Services, Consulting and Training


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 12:07:23
Message-ID: 49E9C27B.1020606@dunslane.net
Lists: pgsql-general pgsql-hackers

Hannu Krosing wrote:
> On Wed, 2009-04-15 at 18:04 -0400, Tom Lane wrote:
>
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>>
>>> I've always been scared to ask this question, in case the answer is No,
>>> but: Do we have a set of regression tests for the optimizer anywhere?
>>>
>> Nothing beyond what is in the standard tests. While that's okay at
>> catching wrong answers --- and we have memorialized a number of such
>> issues in the tests --- the framework is not good for catching things
>> that run slower than they ought.
>>
>
> Can't we make a first cut at it by just running with timings on and then
> comparing ratios of running times, maybe with 2-3X tolerance, to catch
> the most obvious regressions?
>
>

The current regression tests are a series of yes/no answers to this
question: does the actual output match the expected output. Nothing like
as fuzzy as what you are suggesting is supported at all. From time to
time suggestions are made for a performance farm as a kind of analog to
the buildfarm, which would look at quantitative timing tests rather than
just success/failure tests. It is on my (very long) list of things to do,
but it is not something we can simply tack on to the current regression
suite.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 12:32:07
Message-ID: 14553.1240057927@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Hannu Krosing wrote:
>> Can't we make a first cut at it by just running with timings on and then
>> comparing ratios of running times, maybe with 2-3X tolerance, to catch
>> the most obvious regressions?

> The current regression tests are a series of yes/no answers to this
> question: does the actual output match the expected output. Nothing like
> as fuzzy as what you are suggesting is supported at all.

Quite aside from that, I don't think that's really the framework we
want. The issues that I think would be worth having tests for are
questions like "will the planner push comparisons to constants down
through a full join?" (which was the bug that started this thread).
With a test methodology like the above, it wouldn't be enough to
write a test case that exercised the behavior; you'd have to make
sure that any alternative plan was an order of magnitude worse.

I'm inclined to think that some sort of fuzzy examination of EXPLAIN
output (in this example, "are there constant-comparison conditions in
the relation scans?") might do the job, but I'm not sure how we'd
go about that.
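
One possible shape for such a check, sketched in Python over text-format EXPLAIN output. This is only an illustration under stated assumptions: both plans below are hand-written and simplified rather than real server output, the index name rec_isins_attachment_idx is hypothetical, and the regular expressions cover only the node types shown:

```python
import re

# Hand-written, simplified plans; not actual EXPLAIN output.
PUSHED_DOWN = """\
Merge Full Join
  Merge Cond: (a.isin = b.isin)
  ->  Index Scan using attachment_isins_attachment_idx on attachment_isins a
        Index Cond: (attachment = 2698120)
  ->  Index Scan using rec_isins_attachment_idx on rec_isins b
        Index Cond: (attachment = 2698120)
"""
NOT_PUSHED_DOWN = """\
Merge Full Join
  Merge Cond: (a.isin = b.isin)
  Filter: (COALESCE(a.attachment, b.attachment) = 2698120)
  ->  Seq Scan on attachment_isins a
  ->  Seq Scan on rec_isins b
"""

SCAN_RE = re.compile(r"^\s*(?:->\s*)?(?:Seq|Index|Bitmap Heap) Scan\b")
CONST_COND_RE = re.compile(r"^\s*(?:Index Cond|Filter|Recheck Cond):.*=\s*\d+\)")


def scans_with_constant_condition(plan_text):
    """Return (number of scan nodes, number of scans with a constant comparison)."""
    total = 0
    constrained = 0
    scan_indent = None  # indentation of the scan node we are inside, if any
    counted = False
    for line in plan_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if SCAN_RE.match(line):
            total += 1
            scan_indent = indent
            counted = False
        elif scan_indent is not None and indent > scan_indent:
            # A detail line that belongs to the current scan node.
            if not counted and CONST_COND_RE.match(line):
                constrained += 1
                counted = True
        else:
            scan_indent = None  # left the scan node
    return total, constrained


print(scans_with_constant_condition(PUSHED_DOWN))      # (2, 2)
print(scans_with_constant_condition(NOT_PUSHED_DOWN))  # (2, 0)
```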

regards, tom lane


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 12:39:27
Message-ID: 4136ffa0904180539p64e3afc5s6ce52a371d2d3387@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Sat, Apr 18, 2009 at 1:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm inclined to think that some sort of fuzzy examination of EXPLAIN
> output (in this example, "are there constant-comparison conditions in
> the relation scans?") might do the job, but I'm not sure how we'd
> go about that.

If we just removed all the costs and other metrics from the explain
plan and verified that the plan structure was the same would you be
happy with that? It would still be work to maintain every time the
planner changed.
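
A small sketch of that stripping step, in Python for illustration only. The first sample reuses two node lines from the 8.2 plan at the top of this thread; the second sample has invented numbers, and the helper name is an assumption:

```python
import re

# Matches the "(cost=...)" and "(actual time=...)" groups of text-format EXPLAIN.
METRICS_RE = re.compile(r"\s*\((?:cost|actual time)=[^)]*\)")


def plan_structure(plan_text):
    """Drop cost and timing annotations, keeping node names and indentation."""
    return [METRICS_RE.sub("", line).rstrip()
            for line in plan_text.splitlines() if line.strip()]


plan_a = """\
Limit (cost=826.44..826.61 rows=17 width=32) (actual time=0.163..0.172 rows=2 loops=1)
  -> HashAggregate (cost=826.44..826.61 rows=17 width=32) (actual time=0.159..0.162 rows=2 loops=1)
"""
# Same shape, invented numbers.
plan_b = """\
Limit (cost=900.00..900.20 rows=20 width=32) (actual time=0.200..0.210 rows=2 loops=1)
  -> HashAggregate (cost=900.00..900.20 rows=20 width=32) (actual time=0.190..0.195 rows=2 loops=1)
"""

print(plan_structure(plan_a))                            # ['Limit', '  -> HashAggregate']
print(plan_structure(plan_a) == plan_structure(plan_b))  # True
```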

I suppose if we had explain-to-a-table then we could run explain and
then run an sql query to verify the specific properties we were
looking for.

A similar thing could be done with xml if we had powerful enough xml
predicates but we have a lot more sql skills in-house than xml.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 21:41:21
Message-ID: 23669.1240090881@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> writes:
> On Sat, Apr 18, 2009 at 1:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm inclined to think that some sort of fuzzy examination of EXPLAIN
>> output (in this example, "are there constant-comparison conditions in
>> the relation scans?") might do the job, but I'm not sure how we'd
>> go about that.

> If we just removed all the costs and other metrics from the explain
> plan and verified that the plan structure was the same would you be
> happy with that? It would still be work to maintain every time the
> planner changed.

> I suppose if we had explain-to-a-table then we could run explain and
> then run an sql query to verify the specific properties we were
> looking for.

> A similar thing could be done with xml if we had powerful enough xml
> predicates but we have a lot more sql skills in-house than xml.

Yeah, I suspect the only really good answers involve the ability to
apply programmable checks to the EXPLAIN output. A SQL-based solution
shouldn't need any external moving parts, whereas analyzing XML output
presumably would.

I guess then one criterion for whether you've built a good output
definition for explain-to-table is whether it's feasible to check this
type of question using SQL predicates.
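
To make that criterion concrete, here is a hedged sketch in Python using an in-memory SQLite database as a stand-in. The plan_nodes table, its columns and the sample rows are all hypothetical; nothing here is an existing or proposed PostgreSQL output format:

```python
import sqlite3

# Hypothetical explain-to-table layout: one row per plan node.
GOOD_ROWS = [
    (1, None, "Merge Full Join", None, "a.isin = b.isin"),
    (2, 1, "Index Scan", "attachment_isins", "attachment = 2698120"),
    (3, 1, "Index Scan", "rec_isins", "attachment = 2698120"),
]
BAD_ROWS = [
    (1, None, "Merge Full Join", None, "a.isin = b.isin"),
    (2, 1, "Seq Scan", "attachment_isins", None),
    (3, 1, "Seq Scan", "rec_isins", None),
]


def unconstrained_scans(rows):
    """Return the relations whose scan node lacks the constant comparison."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE plan_nodes (node_id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " node_type TEXT, relation TEXT, condition TEXT)"
    )
    conn.executemany("INSERT INTO plan_nodes VALUES (?, ?, ?, ?, ?)", rows)
    # The test itself is a plain SQL predicate over the plan rows.
    found = conn.execute(
        "SELECT relation FROM plan_nodes"
        " WHERE node_type LIKE '%Scan'"
        "   AND (condition IS NULL OR condition NOT LIKE '%= 2698120%')"
        " ORDER BY relation"
    ).fetchall()
    conn.close()
    return [relation for (relation,) in found]


print(unconstrained_scans(GOOD_ROWS))  # []
print(unconstrained_scans(BAD_ROWS))   # ['attachment_isins', 'rec_isins']
```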

regards, tom lane


From: Tino Wildenhain <tino(at)wildenhain(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-18 22:36:54
Message-ID: 49EA5606.5020002@wildenhain.de
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Greg Stark <stark(at)enterprisedb(dot)com> writes:
...
>> I suppose if we had explain-to-a-table then we could run explain and
>> then run an sql query to verify the specific properties we were
>> looking for.
>
>> A similar thing could be done with xml if we had powerful enough xml
>> predicates but we have a lot more sql skills in-house than xml.
>
> Yeah, I suspect the only really good answers involve the ability to
> apply programmable checks to the EXPLAIN output. A SQL-based solution
> shouldn't need any external moving parts, whereas analyzing XML output
> presumably would.

It would be great if explain-to-a-table were just one of the available
options and not the only one. The big O only has this option, and it is
a real pain if you want to explain a query on a production environment
where you can't just create tables here and there.

Tino


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Christian Schröder <cs(at)deriva(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] Performance of full outer join in 8.3
Date: 2009-04-20 06:09:40
Message-ID: 1240207780.23905.186.camel@ebony.2ndQuadrant
Lists: pgsql-general pgsql-hackers


On Sat, 2009-04-18 at 08:32 -0400, Tom Lane wrote:

> The issues that I think would be worth having tests for are
> questions like "will the planner push comparisons to constants down
> through a full join?" (which was the bug that started this thread).

Yes, that sounds good.

> With a test methodology like the above, it wouldn't be enough to
> write a test case that exercised the behavior; you'd have to make
> sure that any alternative plan was an order of magnitude worse.
>
> I'm inclined to think that some sort of fuzzy examination of EXPLAIN
> output (in this example, "are there constant-comparison conditions in
> the relation scans?") might do the job, but I'm not sure how we'd
> go about that.

We can compose unit tests that have plans where the presence/absence of
the optimizer action is critical to a good plan. i.e. if the
constant-comparison is *not* pushed down it will be unable to use an
index created for it and so run cost will be much greater. We can then
define success in terms of a reduction in plan cost below a threshold.

So for each test we specify
* SQL
* a success threshold for cost

e.g.

For a piece of SQL we have cost = 60002.2 without optimisation or 12.45
with optimisation, so we make the threshold 20.0. Enough slack to allow
for changes in plan costs on platforms/over time, yet sufficient to
discriminate between working/non-working optimisation.
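
A minimal sketch of such a threshold test, in Python for illustration. The EXPLAIN lines are hand-written around the example costs above, and the helper names are assumptions:

```python
import re

# Captures the total (upper) cost from "(cost=startup..total ...".
TOP_COST_RE = re.compile(r"\(cost=\d+(?:\.\d+)?\.\.(\d+(?:\.\d+)?)")


def total_cost(explain_text):
    """Return the total estimated cost of the top plan node."""
    first_line = explain_text.splitlines()[0]
    match = TOP_COST_RE.search(first_line)
    if match is None:
        raise ValueError("no cost found on the first plan line")
    return float(match.group(1))


def optimisation_applied(explain_text, threshold):
    """Pass when the estimated plan cost stays below the test's threshold."""
    return total_cost(explain_text) < threshold


# Hand-written top lines using the example costs; not real planner output.
with_optimisation = "Limit  (cost=0.00..12.45 rows=17 width=32)"
without_optimisation = "Limit  (cost=0.00..60002.2 rows=17 width=32)"

print(optimisation_applied(with_optimisation, 20.0))     # True
print(optimisation_applied(without_optimisation, 20.0))  # False
```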

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support