Re: Possible bug in CASE evaluation

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Possible bug in CASE evaluation
Date: 2013-06-25 23:41:01
Message-ID: 20130625234101.GB22527@awork2.anarazel.de
Lists: pgsql-hackers

Hi Noah,

On 2013-06-25 19:05:15 -0400, Noah Misch wrote:
> On Tue, Jun 25, 2013 at 03:01:52PM +0200, Andres Freund wrote:
> > On 2013-06-24 21:35:53 -0400, Noah Misch wrote:
> > > Simple enough, yes. The other point still stands.
> >
> > You mean performance? Primarily I still think we should worry about
> > correctness first and then about performance. And CASE is the
> > documented (and really the only, short of writing procedural code)
> > solution for the cases where evaluation order actually *is* important.
>
> I largely share that sentiment, but it's tempered here by the incorrect
> behavior's long tenure, the difficulty of encountering a problem without
> constructing a test case for the purpose of doing so, the availability of
> workarounds, and the open-ended negative performance implications of your
> proposed correction.

The workaround being that you need to know which version of postgres can
inline/evaluate which parts of a query?
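For readers joining the thread, the class of failure under discussion can be sketched like this (table and column names are illustrative):

```sql
-- CASE is documented to evaluate only the arm that is selected, so this
-- looks safe. But if constant folding pre-evaluates constant
-- subexpressions in every arm at plan time, the 1/0 here can raise
-- "division by zero" even when x > 0 holds for every row.
SELECT CASE WHEN x > 0 THEN x ELSE 1/0 END FROM t;
```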

> > But anyway, the question is to find realistic cases to measure the
> > performance of. Obviously you can just make arbitrarily expensive
> > expressions that can be computed fully during constant folding. Which I
> > don't find very interesting, do you?
>
> > Now, for the other extreme, the following completely random query I just
> > typed out:
> > SELECT f FROM (SELECT (CASE g.i WHEN -1 THEN 0 WHEN 1 THEN 3.0/1 WHEN g.i THEN 2.0/3 END) f FROM generate_series(1, 1000000) g(i)) s WHERE f = 0;
> > master: 491.931 patched: 943.629
> >
> > suffers far worse because the division is so expensive...
>
> That's a clear indicator for this strategy being a dead end. It's not far
> from that to a realistic use case; e.g. log(10,2)/g.i or g.i*(2.0/3).

Sure. But in most realistic scenarios the actual overhead of the rest of
the query will be larger - the above example is that expensive only
because basically nothing but those computations is happening.
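To spell out why the patched numbers are so much worse, here is the same benchmark reformatted, with the constant arms marked:

```sql
-- master: folds 3.0/1 and 2.0/3 once at plan time.
-- patched: folding inside CASE arms is suppressed, so the numeric
-- divisions execute once per row, which roughly doubles the runtime of
-- this arithmetic-dominated query.
SELECT f
FROM (SELECT (CASE g.i
                WHEN -1  THEN 0
                WHEN 1   THEN 3.0/1    -- constant arm
                WHEN g.i THEN 2.0/3    -- constant arm
              END) AS f
      FROM generate_series(1, 1000000) g(i)) s
WHERE f = 0;
```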

> I'm still interested in your answer to my first question here:
> http://www.postgresql.org/message-id/20130621150554.GC740984@tornado.leadboat.com

I guess you're referring to:

> > > Would it work to run eval_const_expressions() lazily on THEN clauses? The
> > > plan-time eval_const_expressions() would not descend to them. The first
> > > ExecEvalCase() to use a given THEN clause would run eval_const_expressions()
> > > before proceeding.
> >
> > Ugh. Doesn't sound nice.
>
> Would you elaborate?

Several things:

1) As you reminded me, other parts of the planner rely on a good part
of the transformations in eval_const_expressions() having already been
performed. So we cannot simply skip descending into those clauses.

So what we would have to do is apply something akin to the proposed
patch, and then remember for various nodes (at least the individual
CaseWhen's expr and result, and potentially CaseTestExpr?) whether we
have already done the full constant folding.

2) Modifying the expression tree during execution seems unclean and a
layering violation, and it requires doing type lookups and the like at
runtime. That's rather subjective, I agree.

3) lists are cool and I don't remember the third thing I had in mind.

But given the objections on performance, I guess the combined approach
is the way to go?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
