Re: Text-any concatenation volatility acting as optimization barrier

Lists: pgsql-hackers
From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-07 20:18:13
Message-ID: CABRT9RBKy-OAjaxWMFMRSaj=1+4_=vmaTeCBbi2BJ-s195Fdyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi list,

Andrew Dunstan reported an awkward-seeming case on IRC where shifting
around a concatenation expression in a view made the planner choose a
good or a bad execution plan.

Simplified, it boils down to this:

db=# create table foo(i int);
db=# explain verbose select i from (select i, i::text || 'x' as asd
from foo) as subq;
Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4)
Output: foo.i

db=# explain verbose select i from (select i, i || 'x'::text as asd
from foo) as subq;
Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4)
Output: subq.i
-> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4)
Output: foo.i, ((foo.i)::text || 'x'::text)

Case #1 uses the normal textcat(text, text) operator by automatically
coercing 'x' as text.
However, case #2 uses the anytextcat(anynonarray, text), which is
marked as volatile thus acts as an optimization barrier. Later, the
anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has
no trace of what happened.

Is this something we can, or want, to fix?

One way would be doing preprocess_expression() before
pull_up_subqueries() so function inlining happens earlier, but I can't
imagine what unintended consequences that might have.

Another option would be creating explicit immutable text || foo
operators for common types, but that sounds pretty hacky.

Regards,
Marti


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-07 20:31:56
Message-ID: 4F318A3C.8010905@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/07/2012 03:18 PM, Marti Raudsepp wrote:
> Hi list,
>
> Andrew Dunstan reported an awkward-seeming case on IRC where shifting
> around a concatenation expression in a view made the planner choose a
> good or a bad execution plan.
>
> Simplified, it boils down to this:
>
> db=# create table foo(i int);
> db=# explain verbose select i from (select i, i::text || 'x' as asd
> from foo) as subq;
> Seq Scan on public.foo (cost=0.00..34.00 rows=2400 width=4)
> Output: foo.i
>
> db=# explain verbose select i from (select i, i || 'x'::text as asd
> from foo) as subq;
> Subquery Scan on subq (cost=0.00..76.00 rows=2400 width=4)
> Output: subq.i
> -> Seq Scan on public.foo (cost=0.00..52.00 rows=2400 width=4)
> Output: foo.i, ((foo.i)::text || 'x'::text)
>
> Case #1 uses the normal textcat(text, text) operator by automatically
> coercing 'x' as text.
> However, case #2 uses the anytextcat(anynonarray, text), which is
> marked as volatile thus acts as an optimization barrier. Later, the
> anytextcat SQL function is inlined and the EXPLAIN VERBOSE output has
> no trace of what happened.
>
> Is this something we can, or want, to fix?
>
> One way would be doing preprocess_expression() before
> pull_up_subqueries() so function inlining happens earlier, but I can't
> imagine what unintended consequences that might have.
>
> Another option would be creating explicit immutable text || foo
> operators for common types, but that sounds pretty hacky.
>

It gets worse if you replace the expression with a call to a (non-sql)
function returning text, which was in fact the original use case. Then
you're pretty much hosed.

cheers

andrew


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-07 20:36:56
Message-ID: CABRT9RB4-=ScOe00LN_N59b6Z84W0bz1Zc40Ccc93tyBKa7aKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> It gets worse if you replace the expression with a call to a (non-sql)
> function returning text, which was in fact the original use case. Then
> you're pretty  much hosed.

Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE
should solve it -- did you try that?

Regards,
Marti


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-07 21:01:21
Message-ID: 4F319121.6060608@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/07/2012 03:36 PM, Marti Raudsepp wrote:
> On Tue, Feb 7, 2012 at 22:31, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> It gets worse if you replace the expression with a call to a (non-sql)
>> function returning text, which was in fact the original use case. Then
>> you're pretty much hosed.
> Oh, if it's a non-SQL function then marking it as STABLE/IMMUTABLE
> should solve it -- did you try that?
>

Hmm, your're right. I could have sworn I tried that. That still leaves
the oddity you reported, though.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 04:21:21
Message-ID: 11520.1328674881@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> Case #1 uses the normal textcat(text, text) operator by automatically
> coercing 'x' as text.
> However, case #2 uses the anytextcat(anynonarray, text), which is
> marked as volatile thus acts as an optimization barrier.

Hmm ... since those operators were invented (in 8.3), we have adopted a
policy that I/O functions are presumed to be no worse than stable:
http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php
ISTM that would justify relabeling anytextcat/textanycat as stable,
which should fix this.

> One way would be doing preprocess_expression() before
> pull_up_subqueries() so function inlining happens earlier, but I can't
> imagine what unintended consequences that might have.

I'm pretty sure that would be a bad idea.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 09:53:01
Message-ID: CABRT9RAF+9EnC0qcZrBd=4g8GQGXcPQxvN2=LWJAtG3H06gkTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 06:21, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> Case #1 uses the normal textcat(text, text) operator by automatically
>> coercing 'x' as text.
>> However, case #2 uses the anytextcat(anynonarray, text), which is
>> marked as volatile thus acts as an optimization barrier.
>
> Hmm ... since those operators were invented (in 8.3), we have adopted a
> policy that I/O functions are presumed to be no worse than stable:
> http://archives.postgresql.org/pgsql-committers/2010-07/msg00307.php
> ISTM that would justify relabeling anytextcat/textanycat as stable,
> which should fix this.

Yes, we should definitely take advantage of that.

I scanned through all of pg_proc, there are 4 functions like this that
can be changed: textanycat, anytextcat, quote_literal and
quote_nullable. All of these have SQL wrappers to cast their argument
to ::text.

quote_literal | select pg_catalog.quote_literal($1::pg_catalog.text)
quote_nullable | select pg_catalog.quote_nullable($1::pg_catalog.text)
textanycat | select $1 || $2::pg_catalog.text
anytextcat | select $1::pg_catalog.text || $2

Patch attached (in git am format). Passes all regression tests (except
'json' which fails on my machine even on git master).

No documentation changes necessary AFAICT.

Regards,
Marti

Attachment Content-Type Size
0001-Mark-textanycat-quote_literal-quote_nullable-functio.patch text/x-patch 4.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 13:23:33
Message-ID: CA+TgmoZZfyutLkUeK9gWvVfVznYRt88pgrkYhbqdgoipQTBoyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
> Patch attached (in git am format). Passes all regression tests (except
> 'json' which fails on my machine even on git master).

Can you provide the diffs from the json test on your machine? I don't
see any build-farm failures off-hand...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 16:20:34
Message-ID: 26911.1328718034@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>> Patch attached (in git am format). Passes all regression tests (except
>> 'json' which fails on my machine even on git master).

> Can you provide the diffs from the json test on your machine? I don't
> see any build-farm failures off-hand...

I'm seeing diffs too after applying Marti's patch: instead of "z", "b",
etc, the field labels in the json values look like "f1", "f2", etc in
the output of queries such as

SELECT row_to_json(q)
FROM (SELECT $$a$$ || x AS b,
y AS c,
ARRAY[ROW(x.*,ARRAY[1,2,3]),
ROW(y.*,ARRAY[4,5,6])] AS z
FROM generate_series(1,2) x,
generate_series(4,5) y) q;

I believe what is happening is that now that the planner can flatten the
sub-select, the RowExprs are getting expanded differently, and that ties
into the "when do we lose column names" business that Andrew has been
on about.

However, I was not seeing that before applying the patch, so maybe Marti
has another issue too.

I am going to go ahead and commit the patch with the altered json
results, because IMO it is mere accident that these regression cases
were coming out with "nice" field labels anyway. When and if Andrew
gets the RowExpr cases fixed properly, that will show up as these cases
going back to nicer-looking output.

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 16:47:58
Message-ID: CABRT9RCL1=zrdynOcctV0q=jajDHVjY_N7C6z5pQsuyMLj3ZKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 18:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Feb 8, 2012 at 4:53 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:
>>> Patch attached (in git am format). Passes all regression tests (except
>>> 'json' which fails on my machine even on git master).
>
>> Can you provide the diffs from the json test on your machine?  I don't
>> see any build-farm failures off-hand...
>
> I'm seeing diffs too after applying Marti's patch: instead of "z", "b",
> etc, the field labels in the json values look like "f1", "f2", etc in
> the output of queries such as

Sorry, my bad. I guess I got the two versions (patched and unpatched)
mixed up. Indeed this difference disappears when I remove my patch.
I'll have to be more careful when checking regression diffs in the
future. :)

Regards,
Marti


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 17:23:44
Message-ID: 4F32AFA0.9010703@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/08/2012 11:20 AM, Tom Lane wrote:
> I am going to go ahead and commit the patch with the altered json
> results, because IMO it is mere accident that these regression cases
> were coming out with "nice" field labels anyway. When and if Andrew
> gets the RowExpr cases fixed properly, that will show up as these
> cases going back to nicer-looking output.

I take it this is your way of making me hurry up with that work :-)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Text-any concatenation volatility acting as optimization barrier
Date: 2012-02-08 17:53:54
Message-ID: 28131.1328723634@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 02/08/2012 11:20 AM, Tom Lane wrote:
>> I am going to go ahead and commit the patch with the altered json
>> results, because IMO it is mere accident that these regression cases
>> were coming out with "nice" field labels anyway. When and if Andrew
>> gets the RowExpr cases fixed properly, that will show up as these
>> cases going back to nicer-looking output.

> I take it this is your way of making me hurry up with that work :-)

Well, right now the behavior is more consistent than it was before;
we would surely have gotten questions about it as it was. Whether
you find time to improve it or not isn't my concern at the moment.

BTW, the order of the items in the json output doesn't match the column
order of the sub-select, with or without the patch. This seems ... odd.
Is it intentional?

regards, tom lane