Expression Pruning in postgress

Lists: pgsql-hackers
From: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Expression Pruning in postgress
Date: 2011-07-06 20:29:03
Message-ID: CALLwk6v4OwMbb6+Rr18hCvC+-Y4RY6KcVV1a2QxPTsz0pde6pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi .

Apology for the bandwith. Did not know who to ask the question . I
was interested in a brief detail on how postgress does expression
pruning between nodes . The basic question is as follows

Scenerio

In a plan where Node 1 is parent {say join) and Node 2 is child
(say scan) . If node 1 has a expression say foo(column) then scan
will project 'column' for sure and join will
evaluate foo(column). Now if the node above join does not need
column ? how does postgress remove the column from join's projection
list . I am seeing that it does not in many
cases specially when sort appears above.

Question
In general is there something in the code that can process a plan
tree and find out where expressions are generated and where there
references are not required - thus removing expressions not needed .
In my case the column is a huge column and hence it will matter if it
is removed from the projection list as soon as possible ?

Any insights welcome and thanks for all help

Regards
Harmeek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-07 14:24:49
Message-ID: 29948.1310048689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
> In a plan where Node 1 is parent {say join) and Node 2 is child
> (say scan) . If node 1 has a expression say foo(column) then scan
> will project 'column' for sure and join will
> evaluate foo(column). Now if the node above join does not need
> column ? how does postgress remove the column from join's projection
> list .

See build_joinrel_tlist() in relnode.c.

> I am seeing that it does not in many
> cases specially when sort appears above.

Please show a concrete example.

regards, tom lane


From: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-07 17:53:08
Message-ID: CALLwk6vj0V959K_F2QQM=TsL7cGw3=tYh-JE7p4eyzDtu56GJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks Tom. Here is a example. Just a background of things . I have made
changes in postgress execution and storage engine to make it a MPP style
engine - keeping all optimizer intact. Basically take pgress serial plan and
construct a parallel plan. The query I am running is below.

*Query*

# explain select ooj.local_time,ooj.is_timeout,ooj.capsulename,
regexp_split_to_table(
case
when ooj.isubstr is null then 'none'
when ooj.isubstr='' then 'none'
else ooj.isubstr
end ,';') as interest,
sum(ooj.impression) count_impressions,
sum(ooj.click) count_clicks
from (
select (impression.server_utc_time/3600000)*3600 as local_time,
impression.is_timeout_default_impression as is_timeout,
impression.capsule_name as capsulename,
substring(impression.website_variables from 'ybt=([0-9;]*)') as
isubstr,
1 as impression,
case click.impression_id when null then 0 else 1 end as click
from
impression_ytw2_row impression
left outer join clicks_row click
on impression.impression_id = click.impression_id) ooj
group by local_time, is_timeout, capsulename, interest;

*Now if you kindly bear with me and ignore the Parallel nodes in the plan
{which are just parallel distributors} . If you compare the two plans below
and check the output tupledesc of the Hash join you will see that some
columns are not gettign pruned out - In my case this is a big column.
*
*Case 1 Plan {inline view gets merged} *

Now when the inline view gets merged via pullup_subqueries I can see that
we have columns {impression.website_variables} which should get pruned out
but it does not after the JOIN { it gets pruned out after hash aggregate).

Parallel reciever RANDOM queue (nodenum=0 cost=132.79..133.12 rows=10
width=104)
Output: (sum(1)), impression.is_timeout_default_impression,
impression.capsule_name, (sum(1)), sum(1), sum(1)
-> Parallel sender RANDOM queue (nodenum=1 cost=132.79..133.12 rows=10
width=104)
Output: (sum(1)), impression.is_timeout_default_impression,
impression.capsule_name, (sum(1)), sum(1), sum(1)
-> HashAggregate (nodenum=2 cost=132.79..133.12 rows=10
width=104)
Output: (((impression.server_utc_time / 3600000) * 3600)),
impression.is_timeout_default_impression, impression.capsule_name,
(regexp_split_to_table(CASE WHEN
("substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS
NULL
THEN 'none'::text WHEN ("substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::text ELSE
"substring"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text)
END, ';'::text)), sum(1), sum(1)
-> Parallel reciever HASH-AGG queue (nodenum=3
cost=117.45..130.08 rows=181 width=104)
Output: impression.server_utc_time,
impression.is_timeout_default_impression, impression.capsule_name, *
impression.website_variables*, ((impression.server_utc_time / 3600000) *
3600), regexp_split_to_table(CASE WHEN ("substring"((impres
ion.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL) THEN
'none'::text WHEN ("substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::text ELSE
"substring"((impression.website_variables)::text, 'ybt=([0-9;
*)'::text) END, ';'::text)
-> Parallel sender HASH-AGG queue (nodenum=4
cost=117.45..130.08 rows=181 width=104)
Output: impression.server_utc_time,
impression.is_timeout_default_impression, impression.capsule_name, *
impression.website_variables*, ((impression.server_utc_time / 3600000) *
3600), regexp_split_to_table(CASE WHEN ("substring"((
mpression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL) THEN
'none'::text WHEN ("substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::text ELSE
"substring"((impression.website_variables)::text, 'ybt=
[0-9;]*)'::text) END, ';'::text)
* -> Hash Left Join (nodenum=5
cost=117.45..130.08 rows=181 width=104)
Output: impression.server_utc_time,
impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables, ((impression.server_utc_time / 3600000) *
3600), regexp_split_to_table(CASE WHEN ("substr
ng"((impression.website_variables)::text, 'ybt=([0-9;]*)'::text) IS NULL)
THEN 'none'::text WHEN ("substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text) = ''::text) THEN 'none'::text ELSE
"substring"((impression.website_variables)::text,*
'ybt=([0-9;]*)'::text) END, ';'::text)
-> Parallel reciever HASH-LEFT queue
(nodenum=6 cost=0.00..3.10 rows=10 width=136)
Output: impression.server_utc_time,
impression.is_timeout_default_impression, impression.capsule_name,
impression.website_variables, impression.impression_id
-> Parallel sender HASH-LEFT queue
(nodenum=7 cost=0.00..3.10 rows=10 width=136)
Output:
impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name, impression.website_variables,
impression.impression_id
-> Seq Scan on
impression_ytw2_row impression (nodenum=8 cost=0.00..3.10 rows=10
width=136)
Output:
impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name, impression.website_variables,
impression.impression_id
-> Hash (nodenum=9 cost=72.20..72.20
rows=3620 width=32)
Output: click.impression_id
-> Parallel reciever HASH-RIGHT
queue (nodenum=10 cost=0.00..72.20 rows=3620 width=32)
Output: click.impression_id
-> Parallel sender HASH-RIGHT
queue (nodenum=11 cost=0.00..72.20 rows=3620 width=32)
Output:
click.impression_id
-> Seq Scan on
clicks_row click (nodenum=12 cost=0.00..72.20 rows=3620 width=32)
Output:
click.impression_id

*Case 2 { PLan when I disable code to merge view - basically do not call
pullup_subqueries - it does the right thing }

* Parallel reciever RANDOM queue (nodenum=0 cost=133.70..137.32 rows=181
width=112)
Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (),
sum((sum((sum(ooj.impression))))), sum((sum((sum(ooj.click)))))
-> Parallel sender RANDOM queue (nodenum=1 cost=133.70..137.32 rows=181
width=112)
Output: ooj.local_time, ooj.is_timeout, ooj.capsulename, (),
sum((sum(ooj.impression))), sum((sum(ooj.click)))
-> HashAggregate (nodenum=2 cost=133.70..137.32 rows=181
width=112)
Output: ooj.local_time, ooj.is_timeout, ooj.capsulename,
(regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text
WHEN (ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr END,
';'::text)), sum(ooj.impression), sum(oo
.click)
-> Parallel reciever HASH-AGG queue (nodenum=3
cost=117.45..130.98 rows=181 width=112)
Output: ooj.local_time, ooj.is_timeout,
ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click,
regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN
(ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.isubstr
ND, ';'::text)
-> Parallel sender HASH-AGG queue (nodenum=4
cost=117.45..130.98 rows=181 width=112)
Output: ooj.local_time, ooj.is_timeout,
ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click,
regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN
(ooj.isubstr = ''::text) THEN 'none'::text ELSE ooj.is
bstr END, ';'::text)
-> Subquery Scan ooj (nodenum=5
cost=117.45..130.98 rows=181 width=112)
Output: ooj.local_time, ooj.is_timeout,
ooj.capsulename, ooj.isubstr, ooj.impression, ooj.click,
regexp_split_to_table(CASE WHEN (ooj.isubstr IS NULL) THEN 'none'::text WHEN
(ooj.isubstr = ''::text) THEN 'none'::text ELSE
oj.isubstr END, ';'::text)
*-> Hash Left Join (nodenum=6
cost=117.45..128.27 rows=181 width=104)
Output: ((impression.server_utc_time
/ 3600000) * 3600), impression.is_timeout_default_impression,
impression.capsule_name, "substring"((impression.website_variables)::text,
'ybt=([0-9;]*)'::text), 1, 1*
-> Parallel reciever HASH-LEFT
queue (nodenum=7 cost=0.00..3.10 rows=10 width=136)
Output:
impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name, impression.website_variables,
impression.impression_id
-> Parallel sender HASH-LEFT
queue (nodenum=8 cost=0.00..3.10 rows=10 width=136)
Output:
impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name, impression.website_variables,
impression.impression_id
-> Seq Scan on
impression_ytw2_row impression (nodenum=9 cost=0.00..3.10 rows=10
width=136)
Output:
impression.server_utc_time, impression.is_timeout_default_impression,
impression.capsule_name, impression.website_variables,
impression.impression_id
-> Hash (nodenum=10
cost=72.20..72.20 rows=3620 width=32)
Output: click.impression_id
-> Parallel reciever
HASH-RIGHT queue (nodenum=11 cost=0.00..72.20 rows=3620 width=32)
Output:
click.impression_id
-> Parallel sender
HASH-RIGHT queue (nodenum=12 cost=0.00..72.20 rows=3620 width=32)
Output:
click.impression_id
-> Seq Scan on
clicks_row click (nodenum=13 cost=0.00..72.20 rows=3620 width=32)
Output:
click.impression_id

*More analysis

*

1. Looked at code in make_join_rel and build_joinrel_tlist I can see in
one case the bms_nonempty_difference code kicks in and finds that the column
can be pruned out and other case it does not do the right thing.
2. I am still trying to work out why pullup_subquery should make a
difference.

Any pointers appreciated .

Regards
Harmeek

2011 at 7:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
>> In a plan where Node 1 is parent {say join) and Node 2 is child
>> (say scan) . If node 1 has a expression say foo(column) then scan
>> will project 'column' for sure and join will
>> evaluate foo(column). Now if the node above join does not need
>> column ? how does postgress remove the column from join's projection
>> list .
>
> See build_joinrel_tlist() in relnode.c.
>
>> I am seeing that it does not in many
>> cases specially when sort appears above.
>
> Please show a concrete example.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-10 17:28:37
Message-ID: 3562.1310318917@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
> Thanks Tom. Here is a example. Just a background of things . I have made
> changes in postgress execution and storage engine to make it a MPP style
> engine - keeping all optimizer intact. Basically take pgress serial plan and
> construct a parallel plan. The query I am running is below.

The output lists for the parallel nodes look pretty broken, but I guess
you weren't asking about those. As near as I can tell, what you're
unhappy about is that it's passing up both raw column values and
pre-evaluated placeholder expressions using those values, when only the
placeholders are really going to be needed. Yeah, that's probably true,
because the placeholder mechanism isn't (yet) taken into account by the
code that determines how far up a column value will be needed.

In standard Postgres this isn't much of an issue because passing up
by-reference Datums is really quite cheap ... it's only a pointer copy
in many cases, and even where it's not, it's probably just a
toast-pointer copy. I suspect it's costing you more because your
"parallel" nodes have to instantiate the tuples instead of just passing
virtual slots around ... but it's still not clear to me why you're
passing more than a toast pointer for big values. Maybe you're being
too enthusiastic about detoasting pointers early?

regards, tom lane


From: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-11 01:35:37
Message-ID: CALLwk6tSx3_rmBpLwYViGhBBeidy2n64gPZaPDcwKELNC1iQmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi tom .

Thanks for your input . Appreciate your taking time and responding . Just
some comments.

1. May be I am mistaken Kindly help me understand a bit more. I do agree
that passing datums up the node chain helps - but consider the case when
either Sort or Hash joins spills on disk - large columns that get written on
to the disk will still cause a lot of performance issues {as sorts spills
will detoast} - lot of unnecessary columns will cause lot of I/O. 1024
varchars and lot of rows and you can see that serial case detoriates due to
this.
2. The parallel case works - the parallel nodes inherit the target list
of the underlying nodes - but in my case the issue of non pruned column
becomes worse as it also adds to network payload which is worse.
3. Now coming to your detoast . I have to do that at parallel node
boundaries as the data flow operators {delimited by parallel operators} run
on different machines and hence has to pass by value.

I did make a fix at least to alleviate this case in the optimizer . But I am
going to work on a more general approach of expression pruning based on the
lifetime of an expression. Basically each node will either references or
generate an expression. Any expression that is generated and is not
referenced by any top on top will be eliminated.

Regards
Harmeek

On Sun, Jul 10, 2011 at 10:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
> > Thanks Tom. Here is a example. Just a background of things . I have made
> > changes in postgress execution and storage engine to make it a MPP style
> > engine - keeping all optimizer intact. Basically take pgress serial plan
> and
> > construct a parallel plan. The query I am running is below.
>
> The output lists for the parallel nodes look pretty broken, but I guess
> you weren't asking about those. As near as I can tell, what you're
> unhappy about is that it's passing up both raw column values and
> pre-evaluated placeholder expressions using those values, when only the
> placeholders are really going to be needed. Yeah, that's probably true,
> because the placeholder mechanism isn't (yet) taken into account by the
> code that determines how far up a column value will be needed.
>
> In standard Postgres this isn't much of an issue because passing up
> by-reference Datums is really quite cheap ... it's only a pointer copy
> in many cases, and even where it's not, it's probably just a
> toast-pointer copy. I suspect it's costing you more because your
> "parallel" nodes have to instantiate the tuples instead of just passing
> virtual slots around ... but it's still not clear to me why you're
> passing more than a toast pointer for big values. Maybe you're being
> too enthusiastic about detoasting pointers early?
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-13 15:59:21
Message-ID: 5621.1310572761@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
> 1. May be I am mistaken Kindly help me understand a bit more. I do agree
> that passing datums up the node chain helps - but consider the case when
> either Sort or Hash joins spills on disk - large columns that get written on
> to the disk will still cause a lot of performance issues {as sorts spills
> will detoast}

No, they don't. What gets sent to disk is normally just the toast
pointer datum (19 bytes or whatever it is these days).

> I did make a fix at least to alleviate this case in the optimizer . But I am
> going to work on a more general approach of expression pruning based on the
> lifetime of an expression. Basically each node will either references or
> generate an expression. Any expression that is generated and is not
> referenced by any top on top will be eliminated.

Sounds like overkill.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Expression Pruning in postgress
Date: 2011-07-13 17:43:09
Message-ID: 7620.1310578989@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> HarmeekSingh Bedi <harmeeksingh(at)gmail(dot)com> writes:
>> I did make a fix at least to alleviate this case in the optimizer . But I am
>> going to work on a more general approach of expression pruning based on the
>> lifetime of an expression. Basically each node will either references or
>> generate an expression. Any expression that is generated and is not
>> referenced by any top on top will be eliminated.

> Sounds like overkill.

BTW, I looked a little more closely at the example you posted earlier,
and it's a lot simpler than I initially thought. The actual issue seems
to not be anything to do with placeholders; it's just that
make_subplanTargetList() is lazy about deciding what to put into the
targetlist that will be passed to query_planner. Per its comment,

* For example, given a query like
* SELECT a+b,SUM(c+d) FROM table GROUP BY a+b;
* we want to pass this targetlist to the subplan:
* a,b,c,d,a+b
* where the a+b target will be used by the Sort/Group steps, and the
* other targets will be used for computing the final results. (In the
* above example we could theoretically suppress the a and b targets and
* pass down only c,d,a+b, but it's not really worth the trouble to
* eliminate simple var references from the subplan. We will avoid doing
* the extra computation to recompute a+b at the outer level; see
* fix_upper_expr() in setrefs.c.)

The extra variables you're complaining about are all used in GROUP BY
expressions, so the above describes exactly the behavior you're seeing.

Possibly it wouldn't be too hard to separate the GROUP BY targetlist
items from the rest and only apply flatten_tlist to the items that
aren't GROUP BY targets. I'm unconvinced that the use case is wide
enough to be worth the trouble, though.

regards, tom lane