Re: using custom scan nodes to prototype parallel sequential scan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using custom scan nodes to prototype parallel sequential scan
Date: 2014-11-14 11:02:28
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801077C8F@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> On 14 November 2014 07:37, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> > On 11/12/14, 1:54 AM, David Rowley wrote:
> >>
> >> On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >>
> >>
> >> This plan type is widely used in reporting queries, so will hit the
> >> mainline of BI applications and many Mat View creations.
> >> This will allow SELECT count(*) FROM foo to go faster also.
> >>
> >> We'd also need to add some infrastructure to merge aggregate states
> >> together for this to work properly. This means that it could also work
> >> for avg() and stddev etc. For max() and min() the merge functions would
> >> likely just be the same as the transition functions.
> >
> >
> > Sanity check: what % of a large aggregate query fed by a seqscan is
> > actually spent in the aggregate functions? Even if you look strictly
> > at CPU cost, isn't there more code involved to get data to the
> > aggregate function than in the aggregation itself, except maybe for
> > numeric?
>
> Yes, which is why I suggested pre-aggregating before collecting the streams
> together.
>
> The point is not that the aggregation is expensive, it's that the aggregation
> eats data and the required bandwidth for later steps is reduced and hence
> does not then become a bottleneck that renders the parallel Seq Scan
> ineffective.
>
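To make the merge-state idea in the quoted text concrete, here is a minimal Python sketch of two-phase aggregation, assuming each aggregate is split into a transition, a combine (merge) and a final function. All names (avg_trans, var_combine, parallel_aggregate and so on) are hypothetical and are not PostgreSQL APIs; the round-robin slicing merely stands in for workers sharing a parallel seq scan. Each simulated worker pre-aggregates its rows into one small state, and only those states, not the rows, reach the leader.

```python
import math
from functools import reduce

# Hypothetical sketch: none of these names are PostgreSQL APIs.

# avg(): transition state is (count, sum).
def avg_trans(state, x):
    n, s = state
    return (n + 1, s + x)

def avg_combine(a, b):
    # Merging two partial avg() states just adds counts and sums.
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    n, s = state
    return s / n if n else None

# stddev() (sample): state is (count, mean, M2), updated with Welford's method.
def var_trans(state, x):
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return (n, mean, m2)

def var_combine(a, b):
    # Chan et al. pairwise merge of two (count, mean, M2) states.
    na, mean_a, m2a = a
    nb, mean_b, m2b = b
    n = na + nb
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * nb / n
    m2 = m2a + m2b + delta * delta * na * nb / n
    return (n, mean, m2)

def stddev_final(state):
    n, _, m2 = state
    return math.sqrt(m2 / (n - 1)) if n > 1 else None

# max(): state is the largest value seen so far (None means no rows yet).
def max_trans(state, x):
    if x is None:
        return state
    return x if state is None or x > state else state

# For max() the merge function is simply the transition function.
max_combine = max_trans

def parallel_aggregate(rows, n_workers, init, trans, combine, final):
    # Each simulated worker scans a disjoint round-robin slice of the rows
    # and pre-aggregates it into a single transition state.
    partials = [reduce(trans, rows[i::n_workers], init) for i in range(n_workers)]
    # The leader receives n_workers states rather than len(rows) tuples
    # and merges them with the combine function before finalizing.
    return final(reduce(combine, partials, init))

if __name__ == "__main__":
    data = [float(x) for x in range(1, 101)]
    print(parallel_aggregate(data, 4, (0, 0.0), avg_trans, avg_combine, avg_final))  # 50.5
```

The stddev state here uses Welford's update with the Chan et al. merge rather than a plain sum-of-squares state; that is a numerical-stability choice made for this sketch, not a description of PostgreSQL's own accumulators. Note that max_combine is literally max_trans, as suggested above for max() and min().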
I'd like to put a question to the community.
Has there been any past discussion of pushing aggregates down below joins
between relations? It could reduce the number of rows to be joined.
If so, I'd like to review that discussion.
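As an illustration of what aggregate push-down below a join means, the following Python sketch (hypothetical tables sales(product_id, amount) and products(id, category), hypothetical function names) computes sum(amount) grouped by category over sales joined to products. It assumes products.id is unique, so each sales row matches at most one product; under that assumption, pre-aggregating sales by the join key gives the same totals while probing the join once per distinct key instead of once per row. Without that uniqueness, or for aggregates that are not simply additive, the rewrite needs more care.

```python
from collections import defaultdict

def join_then_aggregate(sales, products):
    """Plain plan: probe every sales row into the join, then GROUP BY category."""
    category_of = dict(products)      # hash table on the (assumed unique) products.id
    probe_rows = 0
    totals = defaultdict(int)
    for product_id, amount in sales:
        probe_rows += 1               # one join probe per sales row
        if product_id in category_of:
            totals[category_of[product_id]] += amount
    return dict(totals), probe_rows

def aggregate_then_join(sales, products):
    """Pushed-down plan: partial sum by join key, join, then re-aggregate."""
    partial = defaultdict(int)
    for product_id, amount in sales:  # partial sum(amount) GROUP BY product_id
        partial[product_id] += amount
    category_of = dict(products)
    totals = defaultdict(int)
    for product_id, subtotal in partial.items():  # one probe per distinct key
        if product_id in category_of:
            totals[category_of[product_id]] += subtotal
    return dict(totals), len(partial)
```

Both functions return the per-category totals together with the number of rows fed into the join, so the reduction in join input is visible directly.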

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
