Re: [PERFORM] Posible planner improvement?

Lists: pgsql-hackerspgsql-performance
From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: <albert(at)sedifa(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Posible planner improvement?
Date: 2008-05-21 11:52:28
Message-ID: 014F2941B0A1EA47BD61D21526B806E90162C3D2@MI8NYCMAIL08.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

The problem is that the implied join predicate is not being propagated. This is definitely a planner deficiency.

- Luke

----- Original Message -----
From: pgsql-performance-owner(at)postgresql(dot)org <pgsql-performance-owner(at)postgresql(dot)org>
To: pgsql-performance(at)postgresql(dot)org <pgsql-performance(at)postgresql(dot)org>
Sent: Wed May 21 07:37:49 2008
Subject: Re: [PERFORM] Posible planner improvement?

A Dimecres 21 Maig 2008, Albert Cervera Areny va escriure:
> A Dimecres 21 Maig 2008, Mark Mielke va escriure:
> > A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> > >> Albert Cervera Areny wrote:
> > >>> I've got a query similar to this:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
> > >>>
> > >>> That took > 84 minutes (the query was a bit longer but this is the
> > >>> part that made the difference) after a little change the query took
> > >>> ~1 second:
> > >>>
> > >>> select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
> > >>> t1.id = t2.id;
> > >>
> > >> Try posting EXPLAIN ANALYSE SELECT ... for both of those queries and
> > >> we'll see why it's better at the second one.
> >
> > Even if the estimates were off (they look a bit off for the first
> > table), the above two queries are logically identical, and I would
> > expect the planner to make the same decision for both.
> >
> > I am curious - what is the result of:
> >
> > select * from t1, t2 where t2.id > 158507 and t1.id = t2.id;
> >
> > Is it the same speed as the first or second, or is a third speed
> > entirely?
>
> Attached the same file with the third result at the end. The result is
> worst than the other two cases. Note that I've analyzed both tables but
> results are the same. One order of magnitude between the two first queries.

Sorry, it's not worse than the other two cases as shown in the file. However,
after repetition it seems the other two seem to decrease more than the third
one whose times vary a bit more and some times take up to 5 seconds.

Other queries are running in the same machine, so take times with a grain of
salt. What's clear is that always there's a big difference between first and
second queries.

>
> > If t1.id = t2.id, I would expect the planner to substitute them freely
> > in terms of identities?
> >
> > Cheers,
> > mark

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12

====================================================================
........................ AVISO LEGAL ............................
La presente comunicación y sus anexos tiene como destinatario la
persona a la que va dirigida, por lo que si usted lo recibe
por error debe notificarlo al remitente y eliminarlo de su
sistema, no pudiendo utilizarlo, total o parcialmente, para
ningún fin. Su contenido puede tener información confidencial o
protegida legalmente y únicamente expresa la opinión del
remitente. El uso del correo electrónico vía Internet no
permite asegurar ni la confidencialidad de los mensajes
ni su correcta recepción. En el caso de que el
destinatario no consintiera la utilización del correo electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its attachments are intended exclusively for the
named addressee. If you receive this message in error, please
immediately delete it from your system and notify the sender. You
may not use this message or any part of it for any purpose.
The message may contain information that is confidential or
protected by law, and any opinions expressed are those of the
individual sender. Internet e-mail guarantees neither the
confidentiality nor the proper receipt of the message sent.
If the addressee of this message does not consent to the use
of internet e-mail, please inform us inmmediately.
====================================================================

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


From: Richard Huxton <dev(at)archonet(dot)com>
To: Luke Lonergan <LLonergan(at)greenplum(dot)com>
Cc: albert(at)sedifa(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Posible planner improvement?
Date: 2008-05-21 13:09:49
Message-ID: 48341F1D.3090304@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Luke Lonergan wrote:
> The problem is that the implied join predicate is not being
> propagated. This is definitely a planner deficiency.

IIRC only equality conditions are propagated and gt, lt, between aren't.
I seem to remember that the argument given was that the cost of
checking for the ability to propagate was too high for the frequency
when it ocurred.

Of course, what was true for code and machines of 5 years ago might not
be so today.

--
Richard Huxton
Archonet Ltd


From: PFC <lists(at)peufeu(dot)com>
To: "Richard Huxton" <dev(at)archonet(dot)com>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Cc: albert(at)sedifa(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: Posible planner improvement?
Date: 2008-05-21 16:18:27
Message-ID: op.ubih81yvcigqcu@apollo13.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

On Wed, 21 May 2008 15:09:49 +0200, Richard Huxton <dev(at)archonet(dot)com>
wrote:

> Luke Lonergan wrote:
>> The problem is that the implied join predicate is not being
>> propagated. This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between aren't.
> I seem to remember that the argument given was that the cost of
> checking for the ability to propagate was too high for the frequency
> when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might not
> be so today.
>

Suggestion : when executing a one-off sql statement, optimizer should try
to offer "best effort while being fast" ; when making a plan that will be
reused many times (ie PREPARE, functions...) planning time could be
muuuuch longer...


From: Albert Cervera Areny <albert(at)sedifa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Posible planner improvement?
Date: 2008-05-21 16:22:42
Message-ID: 200805211822.42528.albert@sedifa.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

A Dimecres 21 Maig 2008, Richard Huxton va escriure:
> Luke Lonergan wrote:
> > The problem is that the implied join predicate is not being
> > propagated. This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between aren't.
> I seem to remember that the argument given was that the cost of
> checking for the ability to propagate was too high for the frequency
> when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might not
> be so today.

Hope this can be revisited given the huge difference in this case: 80 minutes
to 1 second.

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12

====================================================================
........................ AVISO LEGAL ............................
La presente comunicación y sus anexos tiene como destinatario la
persona a la que va dirigida, por lo que si usted lo recibe
por error debe notificarlo al remitente y eliminarlo de su
sistema, no pudiendo utilizarlo, total o parcialmente, para
ningún fin. Su contenido puede tener información confidencial o
protegida legalmente y únicamente expresa la opinión del
remitente. El uso del correo electrónico vía Internet no
permite asegurar ni la confidencialidad de los mensajes
ni su correcta recepción. En el caso de que el
destinatario no consintiera la utilización del correo electrónico,
deberá ponerlo en nuestro conocimiento inmediatamente.
====================================================================
........................... DISCLAIMER .............................
This message and its attachments are intended exclusively for the
named addressee. If you receive this message in error, please
immediately delete it from your system and notify the sender. You
may not use this message or any part of it for any purpose.
The message may contain information that is confidential or
protected by law, and any opinions expressed are those of the
individual sender. Internet e-mail guarantees neither the
confidentiality nor the proper receipt of the message sent.
If the addressee of this message does not consent to the use
of internet e-mail, please inform us inmmediately.
====================================================================


From: Decibel! <decibel(at)decibel(dot)org>
To: Richard Huxton <dev(at)archonet(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Luke Lonergan <LLonergan(at)greenplum(dot)com>, albert(at)sedifa(dot)com
Subject: Re: Posible planner improvement?
Date: 2008-05-24 18:49:46
Message-ID: 1870DB98-E4E2-4DB5-8620-DB4A8309A73C@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Moving to -hackers...

On May 21, 2008, at 9:09 AM, Richard Huxton wrote:
> Luke Lonergan wrote:
>> The problem is that the implied join predicate is not being
>> propagated. This is definitely a planner deficiency.
>
> IIRC only equality conditions are propagated and gt, lt, between
> aren't. I seem to remember that the argument given was that the
> cost of checking for the ability to propagate was too high for the
> frequency when it ocurred.
>
> Of course, what was true for code and machines of 5 years ago might
> not be so today.

Definitely...

How hard would it be to propagate all conditions (except maybe
functions, though perhaps the new function cost estimates make that
more practical) in cases of equality?

For reference, the original query as posted to -performance:

select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;

That took > 84 minutes (the query was a bit longer but this is the
part that made the difference) after a little change the query took
~1 second:

select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and
t1.id = t2.id;

--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828


From: "Stephen R(dot) van den Berg" <srb(at)cuci(dot)nl>
To: Decibel! <decibel(at)decibel(dot)org>
Cc: Richard Huxton <dev(at)archonet(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, albert(at)sedifa(dot)com
Subject: Re: [PERFORM] Posible planner improvement?
Date: 2008-05-25 08:21:33
Message-ID: 20080525082133.GB27965@cuci.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-performance

Decibel! wrote:
>For reference, the original query as posted to -performance:

>select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;

>That took > 84 minutes (the query was a bit longer but this is the
>part that made the difference) after a little change the query took
>~1 second:

Just out of curiosity, would predefining the order of join have solved
the issue, as in:

a. select * from t1 join t2 using(id) where t1.id > 158507;
vs.
b. select * from t2 join t1 using(id) where t1.id > 158507;

I'd expect a to be faster than b, is it?
--
Sincerely, srb(at)cuci(dot)nl
Stephen R. van den Berg.
"Technology is stuff that doesn't work yet." -- Bran Ferren
"We no longer think of chairs as technology." -- Douglas Adams