Re: Possible bug in CASE evaluation

Lists: pgsql-hackers
From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Possible bug in CASE evaluation
Date: 2013-06-21 08:16:22
Message-ID: A737B7A37273E048B164557ADEF4A58B17BB4EF8@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I want to draw attention to this thread on -general:
CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g(at)mail(dot)gmail(dot)com

Would you concur that this is a bug?

The fine manual says about CASE:

If the condition's result is true, the value of the CASE expression
is the result that follows the condition, and the remainder of the
CASE expression is not processed.

But:

test=> CREATE FUNCTION zero() RETURNS integer IMMUTABLE LANGUAGE SQL AS 'SELECT 0';
CREATE FUNCTION
test=> SELECT CASE WHEN (SELECT zero()) = 0 THEN -1 ELSE 1/zero() END;
ERROR: division by zero

Yours,
Laurenz Albe


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 08:52:01
Message-ID: 20130621085201.GB12425@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
> I want to draw attention to this thread on -general:
> CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g(at)mail(dot)gmail(dot)com

There's also a bug reported for it:
#8237: E1Uovmc-0007FT-R4(at)wrigleys(dot)postgresql(dot)org

> Would you concur that this is a bug?

Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
though. If we can agree it is, the fix outlined over on -bugs seems to
be easily enough implemented...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Andres Freund *EXTERN*" <andres(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 09:20:21
Message-ID: A737B7A37273E048B164557ADEF4A58B17BB5236@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund wrote:
> On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
> > I want to draw attention to this thread on -general:
> > CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g(at)mail(dot)gmail(dot)com
>
> There's also a bug reported for it:
> #8237: E1Uovmc-0007FT-R4(at)wrigleys(dot)postgresql(dot)org
>
> > Would you concur that this is a bug?
>
> Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
> though. If we can agree it is, the fix outlined over on -bugs seems to
> be easily enough implemented...

I think that it is surprising behaviour.

If fixing the behaviour is undesirable, at least the documentation
should be fixed.

Yours,
Laurenz Albe


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 10:09:40
Message-ID: CAFj8pRBAmhXO1ne-Xc2EjgPeM5S+R=CzZt5M50rMU8BjhyOW0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/6/21 Andres Freund <andres(at)2ndquadrant(dot)com>:
> Hi,
>
>
> On 2013-06-21 08:16:22 +0000, Albe Laurenz wrote:
>> I want to draw attention to this thread on -general:
>> CAMQ5dGq4SuJPbhT2-9XLAPAsvKNUL2-bb0cPyci2Fp+pfSfc3g(at)mail(dot)gmail(dot)com
>
> There's also a bug reported for it:
> #8237: E1Uovmc-0007FT-R4(at)wrigleys(dot)postgresql(dot)org
>
>> Would you concur that this is a bug?
>
> Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
> though. If we can agree it is, the fix outlined over on -bugs seems to
> be easily enough implemented...

fix is not easy :-( you should to catch any possible exception, so it
means, so you should to do some optimalization under subtransactions -
or you should not do this kind of optimizations.

Pavel

>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Noah Misch <noah(at)leadboat(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Andres Freund *EXTERN* <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 13:51:05
Message-ID: 20130621135105.GA740984@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
> Andres Freund wrote:
> > Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
> > though. If we can agree it is, the fix outlined over on -bugs seems to
> > be easily enough implemented...

If you refer to this:

On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
> So it seems we need to stop processing after finding a single WHEN
> that's not const? Does anybody have a better idea?

eval_const_expressions() is not just an optimization: it performs mandatory
transformations such as the conversion of named-argument function calls to
positional function calls. Even if you could skip it, queries with expensive
constant expressions would notice the performance loss. The situations helped
by a change like this are too marginal to accept that cost.

Would it work to run eval_const_expressions() lazily on THEN clauses? The
plan-time eval_const_expressions() would not descend to them. The first
ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
before proceeding.

> I think that it is surprising behaviour.

That's about how I characterize it, too.

I question whether real applications care. It's important to have CASE usable
for avoiding data-dependent errors, but what's the use of including in your
query an expression that can do nothing but throw an error? Does anyone have
a real-world example? Perhaps some generated-query scenario.

That being said, if we discover a simple-enough fix that performs well, we may
as well incorporate it.

> If fixing the behaviour is undesirable, at least the documentation
> should be fixed.

A brief documentation mention sounds fine. Perhaps add a paragraph on
constant folding in general and reference that from the CASE page.

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 14:12:32
Message-ID: 20130621141232.GD19710@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
> > Andres Freund wrote:
> > > Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
> > > though. If we can agree it is, the fix outlined over on -bugs seems to
> > > be easily enough implemented...
>
> If you refer to this:
>
> On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
> > So it seems we need to stop processing after finding a single WHEN
> > that's not const? Does anybody have a better idea?
>
> eval_const_expressions() is not just an optimization: it performs mandatory
> transformations such as the conversion of named-argument function calls to
> positional function calls.

Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
that all that is done in a function calleed eval_const_expressions()...

> Even if you could skip it, queries with expensive
> constant expressions would notice the performance loss. The situations helped
> by a change like this are too marginal to accept that cost.

I have to say, that argument doesn't excite me mu8ch. It's not like we
don't want to do the constant expression evaluation at all anymore. Just
not inside CASE WHEN blocks which already are barring some optimizations
anyway...

> Would it work to run eval_const_expressions() lazily on THEN clauses? The
> plan-time eval_const_expressions() would not descend to them. The first
> ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
> before proceeding.

Ugh. Doesn't sound nice. I don't have any better ideas than to actually
split eval_const_expressions into one function that does the necessary
things like canonicalization of AND/OR and one that actually evaluates
expressions inside though.
So maybe that's the way to go :/

> > I think that it is surprising behaviour.
>
> That's about how I characterize it, too.
>
> I question whether real applications care. It's important to have CASE usable
> for avoiding data-dependent errors, but what's the use of including in your
> query an expression that can do nothing but throw an error? Does anyone have
> a real-world example? Perhaps some generated-query scenario.

It doesn't need to be an actual constant. Something that evaluates to
the value at plan time is enough:
PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
EXECUTE foo(0);

That example will most likely only crashes in 9.2+ because it will
replan it with the acutal parameter values in place. But you could have
the same in earlier versions e.g. using PQExecParams(), but that's
harder to demonstrate.

Now, that example only crashes because one place uses (SELECT $1) and
the other doesn't, but...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 14:45:28
Message-ID: 20130621144528.GE19710@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> That being said, if we discover a simple-enough fix that performs well, we may
> as well incorporate it.

What about passing another parameter down eval_const_expressions_mutator
(which is static, so changing the API isn't a problem) that basically
tells us whether we actually should evaluate expressions or just perform
the transformation?
There's seems to be basically a couple of places where we call dangerous
things:
* simplify_function (via ->evaluate_function->evaluate_expr) which is
called in a bunch of places
* evaluate_expr which is directly called in T_ArrayExpr
T_ArrayCoerceExpr

All places I've inspected so far need to deal with simplify_function
returning NULL anyway, so that seems like a viable fix.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 14:56:35
Message-ID: CAFj8pRBfHFVfsN8GRt6tjyPV-+1OJgcCeD-rSdwBHBabEjopww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/6/21 Andres Freund <andres(at)2ndquadrant(dot)com>:
> On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
>> On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:
>> > Andres Freund wrote:
>> > > Yes, I think it's pretty clearly a bug - Tom doesn't seem think so
>> > > though. If we can agree it is, the fix outlined over on -bugs seems to
>> > > be easily enough implemented...
>>
>> If you refer to this:
>>
>> On Tue, Jun 18, 2013 at 03:31:32PM +0200, Andres Freund wrote:
>> > So it seems we need to stop processing after finding a single WHEN
>> > that's not const? Does anybody have a better idea?
>>
>> eval_const_expressions() is not just an optimization: it performs mandatory
>> transformations such as the conversion of named-argument function calls to
>> positional function calls.
>
> Ah yes. Forgot about that... Scrap that. Although it surely isn't nice
> that all that is done in a function calleed eval_const_expressions()...
>
>> Even if you could skip it, queries with expensive
>> constant expressions would notice the performance loss. The situations helped
>> by a change like this are too marginal to accept that cost.

yes, I dislike it too - then we can have inconsistent behave of
constant between CASE and other statements.

We should to do without any performance lost, if we do some changes in
this area.

Regards

Pavel

>
> I have to say, that argument doesn't excite me mu8ch. It's not like we
> don't want to do the constant expression evaluation at all anymore. Just
> not inside CASE WHEN blocks which already are barring some optimizations
> anyway...
>
>> Would it work to run eval_const_expressions() lazily on THEN clauses? The
>> plan-time eval_const_expressions() would not descend to them. The first
>> ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
>> before proceeding.
>
> Ugh. Doesn't sound nice. I don't have any better ideas than to actually
> split eval_const_expressions into one function that does the necessary
> things like canonicalization of AND/OR and one that actually evaluates
> expressions inside though.
> So maybe that's the way to go :/
>
>> > I think that it is surprising behaviour.
>>
>> That's about how I characterize it, too.
>>
>> I question whether real applications care. It's important to have CASE usable
>> for avoiding data-dependent errors, but what's the use of including in your
>> query an expression that can do nothing but throw an error? Does anyone have
>> a real-world example? Perhaps some generated-query scenario.
>
> It doesn't need to be an actual constant. Something that evaluates to
> the value at plan time is enough:
> PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
> EXECUTE foo(0);
>
> That example will most likely only crashes in 9.2+ because it will
> replan it with the acutal parameter values in place. But you could have
> the same in earlier versions e.g. using PQExecParams(), but that's
> harder to demonstrate.
>
> Now, that example only crashes because one place uses (SELECT $1) and
> the other doesn't, but...
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-21 15:05:54
Message-ID: 20130621150554.GC740984@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 21, 2013 at 04:12:32PM +0200, Andres Freund wrote:
> On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> > On Fri, Jun 21, 2013 at 09:20:21AM +0000, Albe Laurenz wrote:

> > Even if you could skip it, queries with expensive
> > constant expressions would notice the performance loss. The situations helped
> > by a change like this are too marginal to accept that cost.
>
> I have to say, that argument doesn't excite me mu8ch. It's not like we
> don't want to do the constant expression evaluation at all anymore. Just
> not inside CASE WHEN blocks which already are barring some optimizations
> anyway...

Sure, it's a narrow loss. Before introducing a new narrow loss to fix an
existing one, we should consider which loss hurts more. Offhand, I sympathize
with the folks who would lose performance more than with the folks who want to
write the sorts of expressions under consideration.

> > Would it work to run eval_const_expressions() lazily on THEN clauses? The
> > plan-time eval_const_expressions() would not descend to them. The first
> > ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
> > before proceeding.
>
> Ugh. Doesn't sound nice.

Would you elaborate?

> > I question whether real applications care. It's important to have CASE usable
> > for avoiding data-dependent errors, but what's the use of including in your
> > query an expression that can do nothing but throw an error? Does anyone have
> > a real-world example? Perhaps some generated-query scenario.
>
> It doesn't need to be an actual constant. Something that evaluates to
> the value at plan time is enough:
> PREPARE foo AS SELECT CASE WHEN (SELECT $1::int)=0 THEN 0 ELSE 1/$1::int END;
> EXECUTE foo(0);

> Now, that example only crashes because one place uses (SELECT $1) and
> the other doesn't, but...

Not the "real-world" I was hoping for, but fair enough.

"Crash" in this context means "raise an error", right?

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-22 14:54:50
Message-ID: 20130622145450.GC1254@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
> On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> > That being said, if we discover a simple-enough fix that performs well, we may
> > as well incorporate it.
>
> What about passing another parameter down eval_const_expressions_mutator
> (which is static, so changing the API isn't a problem) that basically
> tells us whether we actually should evaluate expressions or just perform
> the transformation?
> There's seems to be basically a couple of places where we call dangerous
> things:
> * simplify_function (via ->evaluate_function->evaluate_expr) which is
> called in a bunch of places
> * evaluate_expr which is directly called in T_ArrayExpr
> T_ArrayCoerceExpr
>
> All places I've inspected so far need to deal with simplify_function
> returning NULL anyway, so that seems like a viable fix.

*Prototype* patch - that seems simple enough - attached. Opinions?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Don-t-evaluate-potentially-unreachable-parts-of-CASE.patch text/x-patch 5.8 KB

From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Noah Misch *EXTERN*" <noah(at)leadboat(dot)com>
Cc: Andres Freund *EXTERN* <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-24 13:41:51
Message-ID: A737B7A37273E048B164557ADEF4A58B17BB5F4F@ntex2010a.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch wrote:
>> If fixing the behaviour is undesirable, at least the documentation
>> should be fixed.
>
> A brief documentation mention sounds fine. Perhaps add a paragraph on
> constant folding in general and reference that from the CASE page.

How about the attached?

Yours,
Laurenz Albe

Attachment Content-Type Size
case-doc.patch application/octet-stream 824 bytes

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 01:35:53
Message-ID: 20130625013553.GA840390@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
> On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
> > On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> > > That being said, if we discover a simple-enough fix that performs well, we may
> > > as well incorporate it.
> >
> > What about passing another parameter down eval_const_expressions_mutator
> > (which is static, so changing the API isn't a problem) that basically
> > tells us whether we actually should evaluate expressions or just perform
> > the transformation?
> > There's seems to be basically a couple of places where we call dangerous
> > things:
> > * simplify_function (via ->evaluate_function->evaluate_expr) which is
> > called in a bunch of places
> > * evaluate_expr which is directly called in T_ArrayExpr
> > T_ArrayCoerceExpr
> >
> > All places I've inspected so far need to deal with simplify_function
> > returning NULL anyway, so that seems like a viable fix.
>
> *Prototype* patch - that seems simple enough - attached. Opinions?

Simple enough, yes. The other point still stands.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 13:01:52
Message-ID: 20130625130152.GA7716@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
> On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
> > On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
> > > On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
> > > > That being said, if we discover a simple-enough fix that performs well, we may
> > > > as well incorporate it.
> > >
> > > What about passing another parameter down eval_const_expressions_mutator
> > > (which is static, so changing the API isn't a problem) that basically
> > > tells us whether we actually should evaluate expressions or just perform
> > > the transformation?
> > > There's seems to be basically a couple of places where we call dangerous
> > > things:
> > > * simplify_function (via ->evaluate_function->evaluate_expr) which is
> > > called in a bunch of places
> > > * evaluate_expr which is directly called in T_ArrayExpr
> > > T_ArrayCoerceExpr
> > >
> > > All places I've inspected so far need to deal with simplify_function
> > > returning NULL anyway, so that seems like a viable fix.
> >
> > *Prototype* patch - that seems simple enough - attached. Opinions?
>
> Simple enough, yes. The other point still stands.

You mean performance? Primarily I still think we should first worry
about correctness first and then about performance. And CASE is the
documented (and really only, without writing procedual code) solution to
use for the cases where evaluation order actually *is* important.

But anyway, the question is to find realistic cases to measure the
performance of. Obviously you can just make arbitrarily expensive
expressions that can be computed full during constant folding. Which I
don't find very interesting, do you?

So, what I've done is to measure the performance difference when doing
full table queries of some CASE containing system views.

best of 5 everytime:
SELECT * FROM pg_stats;
master: 28.287 patched: 28.565

SELECT * FROM information_schema.applicable_roles;
master: 0.757 patched: 0.755

regression=# SELECT * FROM information_schema.attributes:
master: 8.392 patched: 8.555

SELECT * FROM information_schema.column_privileges;
master: 90.853 patched: 88.551

SELECT * FROM information_schema.columns;
master: 259.436 patched: 274.145

SELECT * FROM information_schema.constraint_column_usage ;
master: 14.736 patched 15.005

SELECT * FROM information_schema.parameters;
master: 76.173 patched: 79.850

SELECT * FROM information_schema.routines;
master: 45.102 patched: 46.517 ms

...

So, on those queries there's some difference (I've left out the ones
which are too short), but it's not big.

Now, for the other extreme, the following completely random query I just
typed out:
SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
master: 491.931 patched: 943.629

suffers way much worse because the division is so expensive...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 13:08:20
Message-ID: CAFj8pRDmqx8HgsYPT9h-5gQ_T0ZhCZTbzeA1+TJocKCijaj4xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/6/25 Andres Freund <andres(at)2ndquadrant(dot)com>:
> On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
>> On Sat, Jun 22, 2013 at 04:54:50PM +0200, Andres Freund wrote:
>> > On 2013-06-21 16:45:28 +0200, Andres Freund wrote:
>> > > On 2013-06-21 09:51:05 -0400, Noah Misch wrote:
>> > > > That being said, if we discover a simple-enough fix that performs well, we may
>> > > > as well incorporate it.
>> > >
>> > > What about passing another parameter down eval_const_expressions_mutator
>> > > (which is static, so changing the API isn't a problem) that basically
>> > > tells us whether we actually should evaluate expressions or just perform
>> > > the transformation?
>> > > There's seems to be basically a couple of places where we call dangerous
>> > > things:
>> > > * simplify_function (via ->evaluate_function->evaluate_expr) which is
>> > > called in a bunch of places
>> > > * evaluate_expr which is directly called in T_ArrayExpr
>> > > T_ArrayCoerceExpr
>> > >
>> > > All places I've inspected so far need to deal with simplify_function
>> > > returning NULL anyway, so that seems like a viable fix.
>> >
>> > *Prototype* patch - that seems simple enough - attached. Opinions?
>>
>> Simple enough, yes. The other point still stands.
>
> You mean performance? Primarily I still think we should first worry
> about correctness first and then about performance. And CASE is the
> documented (and really only, without writing procedual code) solution to
> use for the cases where evaluation order actually *is* important.
>
> But anyway, the question is to find realistic cases to measure the
> performance of. Obviously you can just make arbitrarily expensive
> expressions that can be computed full during constant folding. Which I
> don't find very interesting, do you?
>
> So, what I've done is to measure the performance difference when doing
> full table queries of some CASE containing system views.
>
> best of 5 everytime:
> SELECT * FROM pg_stats;
> master: 28.287 patched: 28.565
>
> SELECT * FROM information_schema.applicable_roles;
> master: 0.757 patched: 0.755
>
> regression=# SELECT * FROM information_schema.attributes:
> master: 8.392 patched: 8.555
>
> SELECT * FROM information_schema.column_privileges;
> master: 90.853 patched: 88.551
>
> SELECT * FROM information_schema.columns;
> master: 259.436 patched: 274.145
>
> SELECT * FROM information_schema.constraint_column_usage ;
> master: 14.736 patched 15.005
>
> SELECT * FROM information_schema.parameters;
> master: 76.173 patched: 79.850
>
> SELECT * FROM information_schema.routines;
> master: 45.102 patched: 46.517 ms
>
> ...
>
> So, on those queries there's some difference (I've left out the ones
> which are too short), but it's not big.
>
> Now, for the other extreme, the following completely random query I just
> typed out:
> SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
> master: 491.931 patched: 943.629
>
> suffers way much worse because the division is so expensive...

:-(

it is too high price

Pavel

>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 23:05:15
Message-ID: 20130625230515.GA848316@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 25, 2013 at 03:01:52PM +0200, Andres Freund wrote:
> On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
> > Simple enough, yes. The other point still stands.
>
> You mean performance? Primarily I still think we should first worry
> about correctness first and then about performance. And CASE is the
> documented (and really only, without writing procedual code) solution to
> use for the cases where evaluation order actually *is* important.

I largely share that sentiment, but it's tempered here by the incorrect
behavior's long tenure, the difficulty of encountering a problem without
constructing a test case for the purpose of doing so, the availability of
workarounds, and the open-ended negative performance implications of your
proposed correction.

> But anyway, the question is to find realistic cases to measure the
> performance of. Obviously you can just make arbitrarily expensive
> expressions that can be computed full during constant folding. Which I
> don't find very interesting, do you?

> Now, for the other extreme, the following completely random query I just
> typed out:
> SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
> master: 491.931 patched: 943.629
>
> suffers way much worse because the division is so expensive...

That's a clear indicator for this strategy being a dead end. It's not far
from that to a realistic use case; e.g. log(10,2)/g.i or g.i*(2.0/3).

I'm still interested in your answer to my first question here:
http://www.postgresql.org/message-id/20130621150554.GC740984@tornado.leadboat.com

nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 23:41:01
Message-ID: 20130625234101.GB22527@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Noah,

On 2013-06-25 19:05:15 -0400, Noah Misch wrote:
> On Tue, Jun 25, 2013 at 03:01:52PM +0200, Andres Freund wrote:
> > On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
> > > Simple enough, yes. The other point still stands.
> >
> > You mean performance? Primarily I still think we should first worry
> > about correctness first and then about performance. And CASE is the
> > documented (and really only, without writing procedual code) solution to
> > use for the cases where evaluation order actually *is* important.
>
> I largely share that sentiment, but it's tempered here by the incorrect
> behavior's long tenure, the difficulty of encountering a problem without
> constructing a test case for the purpose of doing so, the availability of
> workarounds, and the open-ended negative performance implications of your
> proposed correction.

The workaround being that you need to know which version of postgres can
inline/evaluate which parts of a query?

> > But anyway, the question is to find realistic cases to measure the
> > performance of. Obviously you can just make arbitrarily expensive
> > expressions that can be computed full during constant folding. Which I
> > don't find very interesting, do you?
>
> > Now, for the other extreme, the following completely random query I just
> > typed out:
> > SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
> > master: 491.931 patched: 943.629
> >
> > suffers way much worse because the division is so expensive...
>
> That's a clear indicator for this strategy being a dead end. It's not far
> from that to a realistic use case; e.g. log(10,2)/g.i or g.i*(2.0/3).

Sure. But in most realistic scenarios the actual overhead of the query
will be larger - the above example is that expensive because basically
nothing but those computation are happening.

> I'm still interested in your answer to my first question here:
> http://www.postgresql.org/message-id/20130621150554.GC740984@tornado.leadboat.com

I guess you're referring to:

> > > Would it work to run eval_const_expressions() lazily on THEN clauses? The
> > > plan-time eval_const_expressions() would not descend to them. The first
> > > ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
> > > before proceeding.
> >
> > Ugh. Doesn't sound nice.
>
> Would you elaborate?

Several things:

1) As you reminded me of, other parts of the planner rely on a good part
of the transformations in eval_const_expressions() having already been
performed. So we cannot simply not descend there.

So what we would have to do is to apply something akin to the proposed
and then remember for various nodes (at least the individual CaseWhen's
expr and result, potentially CaseTestExpr?) whether we already did
the full constant folding.

2) modifying the expression tree during execution seems unclean and a
layering violation and requires doing type lookups and all that during
runtime. That's rather subjective, I agree.

3) lists are cool and I don't remember the third thing I had in mind.

But I guess given the objections on performance the combined approach is
the way to go?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-26 01:32:36
Message-ID: 22484.1372210356@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> But I guess given the objections on performance the combined approach is
> the way to go?

I think the documentation approach is the way to go.

It was pointed out in the pgsql-general thread about this that a naive
user might expect that, say, syntax or datatype violations in a CASE arm
would not get raised if the CASE doesn't select that arm. We, who know
that all such errors are thrown in the parser, might say that's silly
--- but why? I think it's not that far from acknowledging that some
errors will be thrown pre-execution to acknowledging that errors
produced by constant-folding can be thrown pre-execution, even if
they're inside a CASE. Where is the bright line that says we can
complain about 42+'bogus' in that context but not about 1/0?

If there were realistic use-cases for this sort of thing then I'd have
more sympathy for contorting the planner code to support them, but I'm
not seeing that. The examples shown so far are all pretty artificial,
unless your intent is to throw an error when the CASE fails; and in that
case why not do it with a volatile function containing a RAISE? Not
only will the planner handle that as you want, but it lets you report
an actually-useful message.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Andres Freund *EXTERN* <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-27 00:41:30
Message-ID: 20130627004130.GA866697@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 24, 2013 at 01:41:51PM +0000, Albe Laurenz wrote:
> Noah Misch wrote:
> >> If fixing the behaviour is undesirable, at least the documentation
> >> should be fixed.
> >
> > A brief documentation mention sounds fine. Perhaps add a paragraph on
> > constant folding in general and reference that from the CASE page.
>
> How about the attached?

Works for me; committed. Thanks.

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com