Re: Hash partitioning.

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Yuri Levinsky <yuril(at)celltick(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hash partitioning.
Date: 2013-06-27 21:13:02
Message-ID: CAMkU=1wsR3DvvsxmAsoiQAHGnW+_UFETcQsig-MP6JL9cK098Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 26, 2013 at 8:55 AM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:

> On 06/26/2013 05:46 PM, Heikki Linnakangas wrote:
> > We could also allow a large query to search a single table in parallel.
> > A seqscan would be easy to divide into N equally-sized parts that can be
> > scanned in parallel. It's more difficult for index scans, but even then
> > it might be possible at least in some limited cases.
>
> So far reading sequentially is still faster than hopping between
> different locations. Purely from the I/O perspective, that is.
>

Wouldn't the I/O subsystem on any high-end machine be fairly good at
making this work through interleaved read-ahead algorithms? Also,
hopefully the planner would be able to predict when parallelization has
nothing to add and avoid using it, although surely that is easier said than
done.

>
> For queries where the single CPU core turns into a bottle-neck and which
> we want to parallelize, we should ideally still do a normal, fully
> sequential scan and only fan out after the scan and distribute the
> incoming pages (or even tuples) to the multiple cores to process.
>

That sounds like it would be much more susceptible to lock contention, and
harder to get bug-free, than dividing the scan into bigger chunks, such as
whole 1 GB segments.
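
To make the "bigger chunks" idea concrete, here is a minimal sketch of
splitting a scan into contiguous ranges, one per worker. It is only an
illustration, not PostgreSQL code: the function name split_into_chunks and
the notion of numbered "blocks" are assumptions made up for the example.

```python
def split_into_chunks(total_blocks, workers):
    """Return one half-open (start, end) block range per worker.

    Hypothetical helper for illustration only. The ranges are contiguous,
    do not overlap, and differ in size by at most one block.
    """
    if workers <= 0:
        raise ValueError("workers must be positive")
    base, extra = divmod(total_blocks, workers)
    ranges = []
    start = 0
    for i in range(workers):
        # The first `extra` workers each take one additional block.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    for worker_id, (start, end) in enumerate(split_into_chunks(10, 3)):
        print(f"worker {worker_id}: blocks [{start}, {end})")
```

Each worker then reads only its own range, so the workers never need to
coordinate over which block comes next.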

Fanning out line by line (according to line_number % number_processes) was
my favorite parallelization method in Perl, but those files were read-only
and so had no concurrency issues.
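
For anyone unfamiliar with that pattern, a rough sketch of it follows, in
Python rather than Perl. The function name lines_for_worker is hypothetical,
and the input is assumed to be read-only, as my files were.

```python
def lines_for_worker(lines, worker_id, num_workers):
    """Return the lines whose index satisfies index % num_workers == worker_id.

    Hypothetical helper for illustration only. Every worker scans the whole
    input but keeps just its own share, so no coordination is needed as long
    as the input does not change underneath it.
    """
    if not 0 <= worker_id < num_workers:
        raise ValueError("worker_id must be in range(num_workers)")
    return [
        line
        for line_number, line in enumerate(lines)
        if line_number % num_workers == worker_id
    ]


if __name__ == "__main__":
    sample = ["a", "b", "c", "d", "e"]
    for worker_id in range(2):
        print(worker_id, lines_for_worker(sample, worker_id, 2))
```

Note that every worker still reads the entire file, so this only helps when
the per-line processing, not the I/O, is the bottleneck.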

Cheers,

Jeff
