Possible planner bug/regression introduced in 8.2.5

Lists: pgsql-bugs
From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 08:57:18
Message-ID: 471F08EE.7040009@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

we switched from 8.2.4 to 8.2.5 and were forced to switch back. Our main
goal was to improve planner perfomance with outer joins per this change
from release notes:

Fix some planner problems with outer joins, notably poor size estimation
for t1 LEFT JOIN t2 WHERE t2.col IS NULL (Tom)

But it seems that this change introduced (or exhibited) new bug/regression.

We can't get even query plan (explain) for some queries. The server
process starts to chew up memory then server starts to swap and then we
have to kill the process. On 8.2.4 the query works fine.

I don't have self-contained test case but I'll try to provide as much
details as possible.

The simplified query but still causing the problem looks like this:

SELECT
1 -- select list doesn't matter
FROM
a JOIN b ON (pk = fk)
b JOIN c ON (...)

[ 13 joins like this in total, not exactly star query ]

LEFT OUTER JOIN x ON (pk = fk)
LEFT OUTER JOIN y ON (pk = fk)

-- WHERE clause doesn't matter
;

There are cca 15 tables involved, if I remove one of the outer joins,
query is planned ok. Some of the tables are big (millions), some are
small. There are many fks between the tables and many indexes (most if
not all join conditions are indexed).

The behaviour is exhibited when "normal" planner is used, not geqo.

Statistics target is set to 500.

I tried to simplify the query even more and now I have query which on
8.2.4 is planned instantly and on 8.2.5 takes cca 8 seconds. The query
and query plan is attached. All entities are tables, views were eliminated.

What more info should I provide? What can I do to debug this problem?
Can someone see the cause offhand from the planner changes in 8.2.5?

On (possibly) unrelated note: what is recommanded maximum for
geqo_threshold? We don't care if planning takes few seconds and produces
good plan - our experience with non-deterministic geqo for queries with
many big tables is not the best one (the plans vary too much between
subsequent runs).

Thanks,

Kuba

Attachment Content-Type Size
problem.txt text/plain 6.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 13:27:14
Message-ID: 1549.1193232434@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
> I tried to simplify the query even more and now I have query which on
> 8.2.4 is planned instantly and on 8.2.5 takes cca 8 seconds.

Are you sure you were using the same planner parameters (particularly
join_collapse_limit and the geqo threshold) in both cases?

regards, tom lane


From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 13:46:38
Message-ID: 471F4CBE.5060208@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

>> I tried to simplify the query even more and now I have query which on
>> 8.2.4 is planned instantly and on 8.2.5 takes cca 8 seconds.
>
> Are you sure you were using the same planner parameters (particularly
> join_collapse_limit and the geqo threshold) in both cases?

thanks for the reply.

Yes, I hope I'm using same parameters. To exhibit the behaviour I set
geqo to off and join_collapse_limit (and from_collapse_limit) to 1000.

On the test server, I can start postgresql 8.2.4 and 8.2.5 on the same
datafiles and there is still this big difference in planning time. I
hope I'm not missing something obvious and not wasting your time...

What can I do to help to debug it?

Thanks,

Kuba


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 14:01:22
Message-ID: 2018.1193234482@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
> What can I do to help to debug it?

Either poke into the code yourself, or submit a self-contained test case
(the query alone does not a test case make). I can't offhand think of a
reason for 8.2.5 to be slower than 8.2.4 ...

regards, tom lane


From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 16:51:22
Message-ID: 471F780A.2020408@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

> Either poke into the code yourself, or submit a self-contained test
> case (the query alone does not a test case make). I can't offhand
> think of a reason for 8.2.5 to be slower than 8.2.4 ...

preparing the test case was easier than I expected. It's attached. Fast
planning on 8.2.4, very slow on 8.2.5.

Thanks,

Kuba

begin;

set geqo to off;

set join_collapse_limit to 1000;
set from_collapse_limit to 1000;

Create table a1
(
a1_pk Serial NOT NULL,
primary key (a1_pk)
) Without Oids;

Create table a2
(
a2_pk Serial NOT NULL,
a3_id Integer NOT NULL,
primary key (a2_pk)
) Without Oids;

Create table a4
(
a4_pk Serial NOT NULL,
a4_verze_pk_prvni Integer,
a4_verze_pk_aktualni Integer,
primary key (a4_pk)
) Without Oids;

Create table a6
(
a6_pk Serial NOT NULL,
primary key (a6_pk)
) Without Oids;

Create table a7_verze
(
a7_verze_pk Serial NOT NULL,
a7_pk Integer NOT NULL,
a7_verze_pk_predchozi Integer,
a7_verze_pk_dalsi Integer,
a9_pk Integer NOT NULL,
a1_pk_ulozil Integer NOT NULL,
primary key (a7_verze_pk)
) Without Oids;

Create table a7
(
a7_pk Serial NOT NULL,
a2_pk Integer NOT NULL,
a4_pk Integer NOT NULL,
a7_verze_pk_prvni Integer,
a7_verze_pk_aktualni Integer,
primary key (a7_pk)
) Without Oids;

Create table a8
(
a8_pk Serial NOT NULL,
a8_id Integer NOT NULL UNIQUE,
primary key (a8_pk)
) Without Oids;

Create table a9
(
a9_pk Serial NOT NULL,
a9_a10_id Integer NOT NULL,
a3_id Integer,
a8_id Integer NOT NULL,
primary key (a9_pk)
) Without Oids;

Create table a4_verze
(
a4_verze_pk Serial NOT NULL,
a4_pk Integer NOT NULL,
a4_verze_pk_predchozi Integer,
a4_verze_pk_dalsi Integer,
a1_pk_ulozil Integer NOT NULL,
a6_pk Integer,
primary key (a4_verze_pk)
) Without Oids;

Create table a9_a10
(
a9_a10_pk Serial NOT NULL,
a9_a10_id Integer NOT NULL UNIQUE,
primary key (a9_a10_pk)
) Without Oids;

CREATE TABLE abc_a7 (
abc_a7_pk integer NOT NULL,
a7_verze_pk integer NOT NULL,
a6_pk_trvala integer NOT NULL,
a3 int
);

CREATE TABLE abc_prirazeni_a10u (
abc_prirazeni_a10u_pk integer NOT NULL,
abc_a7_pk integer NOT NULL,
abc_a10_id integer NOT NULL
);

CREATE TABLE a3 (
a3_pk integer NOT NULL,
a3_id integer NOT NULL
);

CREATE TABLE abc_a10 (
abc_a10_pk integer NOT NULL,
abc_a10_id integer NOT NULL
);

select now();
explain
SELECT
1
FROM a7_verze ov
JOIN a7 op ON op.a7_pk = ov.a7_pk
JOIN a9 sp ON sp.a9_pk = ov.a9_pk
JOIN a9_a10 sc ON sc.a9_a10_id = sp.a9_a10_id
JOIN a8 fr ON fr.a8_id = sp.a8_id
JOIN a4 ko ON ko.a4_pk = op.a4_pk
JOIN a4_verze kov ON kov.a4_pk = ko.a4_pk
JOIN a2 ka ON ka.a2_pk = op.a2_pk
JOIN a3 pr ON pr.a3_id = ka.a3_id
JOIN a1 uz_ko ON uz_ko.a1_pk = kov.a1_pk_ulozil
JOIN abc_a7 abc_pripad ON abc_pripad.a7_verze_pk =
ov.a7_verze_pk
LEFT OUTER JOIN a6 a6_trvala ON a6_trvala.a6_pk =
abc_pripad.a6_pk_trvala
LEFT OUTER JOIN abc_a10 abc_a10_abc_pripad_a3 ON
abc_a10_abc_pripad_a3.abc_a10_id = abc_pripad.a3
;

select now();

rollback;


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-24 20:57:13
Message-ID: 13632.1193259433@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
> preparing the test case was easier than I expected. It's attached. Fast
> planning on 8.2.4, very slow on 8.2.5.

Hmm. I think there are two different bugs involved here. One is fixed
by the attached patch, and it masks the performance problem on your test
case, but I wonder whether it will be enough for your real application.
Can you try this and see if it fixes 8.2.5 for you?

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 678 bytes

From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-26 13:59:17
Message-ID: 4721F2B5.80803@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

> Hmm. I think there are two different bugs involved here. One is fixed
> by the attached patch, and it masks the performance problem on your test
> case, but I wonder whether it will be enough for your real application.
> Can you try this and see if it fixes 8.2.5 for you?

many thanks for the quick patch! PostreSQL support is truly unbeatable...

We did some test today with patched 8.2.5. For some cases it is ok but
for large ones it is still taking quite a big time (e.g. for 23 joins is
the planning time 107s). Maybe there's no regression from 8.2.4 - we'll
do the tests against 8.2.4 on Monday - it's not usual for us to run with
geqo switched off and we don't monitor the planning time - the total
runtime for these queries is in minutes so we don't know were the time
is spent.

I'll keep you updated.

Kuba


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-26 14:29:02
Message-ID: 17292.1193408942@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
> We did some test today with patched 8.2.5. For some cases it is ok but
> for large ones it is still taking quite a big time (e.g. for 23 joins is
> the planning time 107s). Maybe there's no regression from 8.2.4 - we'll
> do the tests against 8.2.4 on Monday - it's not usual for us to run with
> geqo switched off and we don't monitor the planning time - the total
> runtime for these queries is in minutes so we don't know were the time
> is spent.

Yeah, I was afraid that might happen. In the test case you sent, if
the first LEFT JOIN is changed to RIGHT JOIN then the runtime goes
right back up, because then it actually is the case make_outerjoininfo
is looking for where the two outer joins can't be reordered. So you
probably have some cases like that in your real application. But I'd
expect 8.2.4 to be equally slow because it also contains the code that
is slow in that scenario. I'm hoping to get some time today to think
about how that could be fixed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-26 18:12:55
Message-ID: 23523.1193422375@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I wrote:
> ...probably have some cases like that in your real application. But I'd
> expect 8.2.4 to be equally slow because it also contains the code that
> is slow in that scenario. I'm hoping to get some time today to think
> about how that could be fixed.

Please try the attached patch (in addition to the one I sent earlier).

regards, tom lane

Attachment Content-Type Size
join-order-2.patch application/octet-stream 17.5 KB

From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-29 15:27:48
Message-ID: 4725FBF4.1080403@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

>> ...probably have some cases like that in your real application. But I'd
>> expect 8.2.4 to be equally slow because it also contains the code that
>> is slow in that scenario. I'm hoping to get some time today to think
>> about how that could be fixed.
>
> Please try the attached patch (in addition to the one I sent earlier).

I can confirm that now there is no regression between 8.2.4 and
8.2.5+one-line-patch.

I've also tried your last patch but still for some queries there is long
planning time, e.g. more than 30s on a fast machine for query with 14
regularly JOINed tables plus 8 tables are LEFT OUTER JOINed (all normal
joins are followed by all outer joins). All joins are constrained on
pk/fk. Should this case run faster or is it simply too much for
geqo=off? Shall I prepare a test case?

Thanks,

Kuba


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-29 16:13:32
Message-ID: c94bd73628c655df40c3da997bdcaf65@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

Tom Lane wrote:

> Please try the attached patch (in addition to the one I sent earlier).

This is biting us too, quite badly. Any chance this can get pushed into a
8.2.6?

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200710291212
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFHJgZyvJuQZxSWSsgRA61KAJ9ZIz220HZD8u2fr0T+NEg+rFh/AwCgmZIA
iXhO5S+j3jGyqxamsQxqYlY=
=pR+g
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-29 18:18:15
Message-ID: 13324.1193681895@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
> I've also tried your last patch but still for some queries there is long
> planning time, e.g. more than 30s on a fast machine for query with 14
> regularly JOINed tables plus 8 tables are LEFT OUTER JOINed (all normal
> joins are followed by all outer joins). All joins are constrained on
> pk/fk. Should this case run faster or is it simply too much for
> geqo=off? Shall I prepare a test case?

By "last patch" you mean
http://archives.postgresql.org/pgsql-committers/2007-10/msg00409.php
? If so that's about as fast as it's likely to get. 22 tables is well
beyond what I'd consider reasonable to run through the exhaustive
planner. You should try geqo again.

regards, tom lane


From: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-29 18:35:21
Message-ID: 472627E9.5060206@comgate.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


> By "last patch" you mean
> http://archives.postgresql.org/pgsql-committers/2007-10/msg00409.php
> ?

Sorry for confusion, I meant this one:

http://archives.postgresql.org/pgsql-bugs/2007-10/msg00217.php

Is it the same as the commited one?

> If so that's about as fast as it's likely to get. 22 tables is well
> beyond what I'd consider reasonable to run through the exhaustive
> planner. You should try geqo again.

OK, no problem...

Anyway, many thanks for the quick patches!

Kuba


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jakub Ouhrabka <kuba(at)comgate(dot)cz>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-29 19:02:43
Message-ID: 14911.1193684563@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Jakub Ouhrabka <kuba(at)comgate(dot)cz> writes:
>> By "last patch" you mean
>> http://archives.postgresql.org/pgsql-committers/2007-10/msg00409.php

> Sorry for confusion, I meant this one:
> http://archives.postgresql.org/pgsql-bugs/2007-10/msg00217.php
> Is it the same as the commited one?

Yeah, should be the same.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-10-30 04:09:03
Message-ID: 15599.1193717343@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Greg Sabino Mullane" <greg(at)turnstep(dot)com> writes:
> Tom Lane wrote:
>> Please try the attached patch (in addition to the one I sent earlier).

> This is biting us too, quite badly. Any chance this can get pushed into a
> 8.2.6?

Those patches are certainly already in the 8.2 CVS branch, so your
question seems to mean "are we going to push 8.2.6 immediately to fix
this". My vote would be no --- 8.2.5 is less than six weeks old and
we don't have that many bugs against it. Given the overhead involved
in a release, both from our point of view as packagers and users' point
of view in having to install it, a single bug has to be pretty darn
catastrophic to force an update by itself. This doesn't seem to me
to reach that level...

regards, tom lane


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-11-05 18:04:49
Message-ID: 37b43c9a0baf9a954154d27aa6386668@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160

> Tom Lane wrote:
>> Please try the attached patch (in addition to the one I sent earlier).

> This is biting us too, quite badly. Any chance this can get pushed into a
> 8.2.6?

> Those patches are certainly already in the 8.2 CVS branch, so your
> question seems to mean "are we going to push 8.2.6 immediately to fix
> this". My vote would be no --- 8.2.5 is less than six weeks old and
> we don't have that many bugs against it. Given the overhead involved
> in a release, both from our point of view as packagers and users' point
> of view in having to install it, a single bug has to be pretty darn
> catastrophic to force an update by itself. This doesn't seem to me
> to reach that level...

I suppose catastophic is in the eye of the beholder, but this is very,
very severe to one of our clients. So much so that they may end up going
back to 8.1. The patches to 8.2.5 fix some of the queries, but not all;
some of the problems seem to exist on 8.2.4 as well. We'll try to
develop a self-contained test case that shows the problem, but until then
wanted to give a heads up that a problem may still exist.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
End Point Corporation
PGP Key: 0x14964AC8 200711051303
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFHL1rpvJuQZxSWSsgRAzNuAJ9xkww4QwjoavHO9SkTf9Zm7Jl9PgCfQ77k
UXPer7AeI0xXe/f3XkMkUps=
=imfW
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possible planner bug/regression introduced in 8.2.5
Date: 2007-11-05 19:08:13
Message-ID: 25827.1194289693@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Greg Sabino Mullane" <greg(at)turnstep(dot)com> writes:
>> Tom Lane wrote:
>> Those patches are certainly already in the 8.2 CVS branch, so your
>> question seems to mean "are we going to push 8.2.6 immediately to fix
>> this". My vote would be no --- 8.2.5 is less than six weeks old and

> I suppose catastophic is in the eye of the beholder, but this is very,
> very severe to one of our clients. So much so that they may end up going
> back to 8.1. The patches to 8.2.5 fix some of the queries, but not all;
> some of the problems seem to exist on 8.2.4 as well. We'll try to
> develop a self-contained test case that shows the problem, but until then
> wanted to give a heads up that a problem may still exist.

All the more reason not to push 8.2.6 immediately ...

regards, tom lane