Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-10 13:52:09
Message-ID: CA+Tgmoa9Q6zw6tDLAiNSY_muHCJuk_1_kMf8zjRVBh6LtFq9AQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 10, 2015 at 2:48 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Note that I'm not saying that Amit's patch is right - I haven't read it
> - but I don't think a 'scan this range of pages' heapscan API would
> be a bad idea. Not even just for parallelism, but for a bunch of
> use cases.

We do have that, already. heap_setscanlimits(). I'm just not
convinced that that's the right way to split up a parallel scan.
There's too much risk of ending up with a very uneven distribution of
work.
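
To make the concern concrete, here is a minimal sketch of a static split,
assuming each worker is handed one fixed (start, count) block range of the
kind one might pass to heap_setscanlimits(). The helper names and the toy
cost model (the first quarter of the table costs ten times as much per block)
are hypothetical and exist only for illustration; nothing here is PostgreSQL
code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t BlockNumber;   /* 32-bit block numbers, as in PostgreSQL */

/* Hypothetical: the fixed range assigned to one worker. */
typedef struct BlockRange
{
    BlockNumber start;
    BlockNumber count;
} BlockRange;

/* Split nblocks into nworkers contiguous ranges; return the range of worker w. */
static BlockRange
static_range(BlockNumber nblocks, unsigned nworkers, unsigned w)
{
    BlockRange  r;
    BlockNumber base;
    BlockNumber extra;

    assert(nworkers > 0 && w < nworkers);
    base = nblocks / nworkers;      /* blocks every worker gets */
    extra = nblocks % nworkers;     /* leftovers go to the first workers */
    r.start = w * base + (w < extra ? w : extra);
    r.count = base + (w < extra ? 1 : 0);
    return r;
}

/* Toy cost model (an assumption): early blocks are 10x more expensive. */
static uint64_t
block_cost(BlockNumber blk, BlockNumber nblocks)
{
    return (blk < nblocks / 4) ? 10 : 1;
}

/* Total work a worker does for its fixed range under the toy cost model. */
static uint64_t
range_cost(BlockRange r, BlockNumber nblocks)
{
    uint64_t    total = 0;

    for (BlockNumber b = r.start; b < r.start + r.count; b++)
        total += block_cost(b, nblocks);
    return total;
}
```

With 1000 blocks and 4 workers under this cost model, the first worker's
range carries ten times the work of each of the others, so the scan finishes
only when that one straggler does.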

>> Regarding tuple flow between backends, I've thought about that before,
>> I agree that we need it, and I don't think I know how to do it. I can
>> see how to have a group of processes executing a single node in
>> parallel, or a single process executing a group of nodes we break off
>> from the query tree and push down to it, but what you're talking about
>> here is a group of processes executing a group of nodes jointly.
>
> I don't think it really is that. I think you'd do it essentially by
> introducing a couple more nodes. Something like
>
>               SomeUpperLayerNode
>                       |
>                       |
>                AggCombinerNode
>                 /           \
>                /             \
>               /               \
> PartialHashAggNode   PartialHashAggNode .... .PartialHashAggNode ...
>         |                    |
>         |                    |
>         |                    |
>         |                    |
>  PartialSeqScan        PartialSeqScan
>
> The only thing that might potentially need to end up working jointly
> would be the block selection of the individual PartialSeqScans
> to avoid having to wait for stragglers for too long. E.g. each might
> just ask for a range of 16 megabytes or so that it scans sequentially.
>
> In such a plan - a pretty sensible and not that uncommon thing for
> parallelized aggregates - you'd need to be able to tell the heap scans
> which blocks to scan. Right?

For this case, what I would imagine is that there is one parallel heap
scan, and each PartialSeqScan attaches to it. The executor says "give
me a tuple" and heapam.c provides one. Details like the chunk size
are managed down inside heapam.c, and the executor does not know about
them. It just knows that it can establish a parallel scan and then
pull tuples from it.
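
A minimal sketch of that shape, assuming a shared counter that hands out the
next unclaimed chunk of blocks on request. SharedScan and both functions are
hypothetical names, not heapam.c APIs, and the sketch works at block
granularity rather than returning tuples. The point is only that the chunk
size lives inside the scan state, so the caller never sees it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Hypothetical shared scan state; a real one would live in shared memory. */
typedef struct SharedScan
{
    BlockNumber nblocks;            /* total blocks in the relation */
    BlockNumber chunk_size;         /* blocks handed out per request */
    atomic_uint_fast64_t next;      /* first block not yet claimed */
} SharedScan;

static void
shared_scan_init(SharedScan *scan, BlockNumber nblocks, BlockNumber chunk_size)
{
    assert(chunk_size > 0);
    scan->nblocks = nblocks;
    scan->chunk_size = chunk_size;
    atomic_init(&scan->next, 0);
}

/*
 * Claim the next chunk for the calling worker.  Returns false once every
 * block has been handed out.  The 64-bit counter cannot realistically wrap
 * even if workers keep asking after the scan is exhausted.
 */
static bool
shared_scan_next_chunk(SharedScan *scan, BlockNumber *start, BlockNumber *count)
{
    uint_fast64_t first = atomic_fetch_add(&scan->next, scan->chunk_size);
    uint_fast64_t remaining;

    if (first >= scan->nblocks)
        return false;

    remaining = scan->nblocks - first;
    *start = (BlockNumber) first;
    *count = (remaining < scan->chunk_size) ? (BlockNumber) remaining
                                            : scan->chunk_size;
    return true;
}
```

Each attached worker would loop on shared_scan_next_chunk() until it returns
false. A worker that draws cheap blocks simply comes back for more sooner, so
no fixed up-front split is needed.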

>> Maybe we designate nodes as can-generate-multiple-tuple-streams (seq
>> scan, mostly, I would think) and can-absorb-parallel-tuple-streams
>> (sort, hash, materialize), or something like that, but I'm really
>> fuzzy on the details.
>
> I don't think we really should have individual nodes that produce
> multiple streams - that seems like it'd end up being really
> complicated. I'd more say that we have distinct nodes (like the
> PartialSeqScan ones above) that do a teensy bit of coordination about
> which work to perform.

I think we're in violent agreement here, except for some
terminological confusion. Are there N PartialSeqScan nodes, one
running in each process, or is there one ParallelSeqScan node, which is
copied and run jointly across N processes? You can talk about either way
and have it make sense, but we haven't had enough conversations about
this on this list to have settled on a consistent set of vocabulary
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
