Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pull up aggregate sublink (was: Parameterized aggregate subquery (was: Pull up aggregate subquery))
Date: 2011-07-27 15:50:20
Message-ID: CAP7Qgmnwry_AoSh0vBk1s=0gaA7UqfhOO5NgA0twQLnKsvAg1Q@mail.gmail.com
Lists: pgsql-hackers

2011/7/27 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Yeb Havinga <yebhavinga(at)gmail(dot)com> writes:
>> A few days ago I read Tomas Vondra's blog post about dss tpc-h queries
>> on PostgreSQL at
>> http://fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ - in
>> which he showed how to manually pull up a dss subquery to get a large
>> speed up. Initially I thought: cool, this is probably now handled by
>> Hitoshi's patch, but it turns out the subquery type in the dss query is
>> different.
>
> Actually, I believe this example is the exact opposite of the
> transformation Hitoshi proposes. Tomas was manually replacing an
> aggregated subquery by a reference to a grouped table, which can be
> a win if the subquery would be executed enough times to amortize
> calculation of the grouped table over all the groups (some of which
> might never be demanded by the outer query).  Hitoshi was talking about
> avoiding calculations of grouped-table elements that we don't need,
> which would be a win in different cases.  Or at least that was the
> thrust of his original proposal; I'm not sure where the patch went since
> then.

My first proposal, which was about pulling up an aggregate-like sublink
expression, is the exact opposite of this (Tomas pushed the sublink
expression down into a join subquery). But the latest proposal is built
on parameterized NestLoop, so I think my latest patch might help
with the *second* query. Actually the problem is the same: we
want to avoid grouping work that does not contribute to the
final output, by filtering with the other relations' conditions. In this
case, if the joined lineitem-part relation has very few rows because of
the WHERE conditions (p_brand, p_container), we don't want to calculate
avg over the huge lineitem table, because we know almost all of the avg
results will not appear in the upper result. However, the current
optimizer cannot pass the upper query's condition (like "it will have
only a few rows") down to the lower aggregate query.
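
To make the two query shapes concrete, here is a minimal sketch using
Python's sqlite3 module rather than PostgreSQL. The tiny schema, the
data, the brand/container values and the "l_quantity < avg" comparison
are all assumptions made up for illustration; only the table and column
names (lineitem, part, p_brand, p_container) come from the discussion.
It shows only that the sublink form and the grouped-table join form
return the same answer, not how either one is planned.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE part (
            p_partkey INTEGER PRIMARY KEY, p_brand TEXT, p_container TEXT);
        CREATE TABLE lineitem (
            l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL);
        """
    )
    con.executemany(
        "INSERT INTO part VALUES (?, ?, ?)",
        [(1, "Brand#23", "MED BOX"),
         (2, "Brand#11", "SM CASE"),
         (3, "Brand#11", "LG BOX")],
    )
    con.executemany(
        "INSERT INTO lineitem VALUES (?, ?, ?)",
        [(1, 1.0, 100.0), (1, 10.0, 900.0), (1, 19.0, 1500.0),
         (2, 5.0, 50.0), (2, 7.0, 70.0), (3, 2.0, 20.0)],
    )
    return con


# Sublink form: the aggregate is evaluated per outer row, so only the
# part keys that survive the WHERE conditions are ever averaged.
SUBLINK_FORM = """
SELECT sum(l.l_extendedprice)
FROM lineitem l JOIN part p ON p.p_partkey = l.l_partkey
WHERE p.p_brand = ? AND p.p_container = ?
  AND l.l_quantity < (SELECT avg(l2.l_quantity)
                      FROM lineitem l2
                      WHERE l2.l_partkey = p.p_partkey)
"""

# Grouped-table form: avg is computed for every l_partkey up front,
# including groups the outer query never asks for.
GROUPED_FORM = """
SELECT sum(l.l_extendedprice)
FROM lineitem l
JOIN part p ON p.p_partkey = l.l_partkey
JOIN (SELECT l_partkey, avg(l_quantity) AS avg_qty
      FROM lineitem GROUP BY l_partkey) g ON g.l_partkey = l.l_partkey
WHERE p.p_brand = ? AND p.p_container = ?
  AND l.l_quantity < g.avg_qty
"""


def run(con, sql):
    """Run one of the two query forms with the illustrative filter values."""
    return con.execute(sql, ("Brand#23", "MED BOX")).fetchone()[0]


if __name__ == "__main__":
    con = build_db()
    print(run(con, SUBLINK_FORM), run(con, GROUPED_FORM))
```

With this data only one part matches the filter, so the grouped form
averages three groups while the sublink form needs just one of them.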

> This leads me to think that we need to represent both cases as the same
> sort of query and make a cost-based decision as to which way to go.
> Thinking of it as a pull-up or push-down transformation is the wrong
> approach because those sorts of transformations are done too early to
> be able to use cost comparisons.

Putting my thoughts above together with this paragraph, making a
sublink expression look the same as a join might be a separate piece of
work. But I think both are trying to reach the same goal.
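
As a rough illustration of the cost-based decision described in the
quoted paragraph, here is a toy sketch in Python. It is not
PostgreSQL's cost model: the function name, the formulas and the
lookup_cost constant are all hypothetical, chosen only to show that the
cheaper strategy depends on how many outer rows demand an aggregate.

```python
def choose_plan(outer_rows, lineitem_rows, rows_per_group, lookup_cost=4.0):
    """Pick the cheaper of two strategies under a made-up cost model.

    grouped:       aggregate the whole table once (one unit per row).
    parameterized: for each outer row, pay a hypothetical index lookup
                   plus one unit per row in that row's group.
    """
    grouped_cost = float(lineitem_rows)
    param_cost = outer_rows * (lookup_cost + rows_per_group)
    if param_cost < grouped_cost:
        return ("parameterized", param_cost)
    return ("grouped", grouped_cost)


if __name__ == "__main__":
    # Few outer rows: evaluating the aggregate on demand is cheaper.
    print(choose_plan(outer_rows=10, lineitem_rows=1_000_000, rows_per_group=30))
    # Many outer rows: one pass over the whole table is cheaper.
    print(choose_plan(outer_rows=100_000, lineitem_rows=1_000_000, rows_per_group=30))
```

The point is only that neither form wins unconditionally, which is why
fixing the choice early as a pull-up or push-down cannot compare costs.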

Regards,

--
Hitoshi Harada
