Re: Support Parallel Query Execution in Executor

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Myron Scott <lister(at)sacadia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support Parallel Query Execution in Executor
Date: 2006-04-09 16:27:30
Message-ID: 9234.1144600050@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Myron Scott <lister(at)sacadia(dot)com> writes:
> Gregory Maxwell wrote:
>> There are other cases where it is useful to perform parallel I/O
>> without parallel processing.

> I have done some testing more along these lines with an old fork of
> postgres code (2001). In my tests, I used a thread to delegate out
> the actual heap scan of the SeqScan. The job of the "slave" thread
> was to fault in buffer pages and determine the time validity of
> the tuples. ItemPointers are passed back to the "master" thread via a
> common memory area guarded by mutex locking.

I was considering a variant idea in the shower this morning: suppose
that we invent one or more "background reader" processes that have
basically the same infrastructure as the background writer, but have
the responsibility of causing buffer reads to happen at useful times
(whereas the writer causes writes to happen at useful times). The
idea would be for backends to signal the readers when they know they
will need a given block soon, and then hopefully when they need it
it'll already be in shared buffers. For instance, in a seqscan it'd be
pretty trivial to request block N+1 just after reading block N, and then
do our actual processing on block N while (we hope) some reader
process is faulting in N+1. Bitmap indexscans could use this structure
too; I'm less sure about whether plain indexscans could do much with it
though.
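
A toy sketch of the seqscan case described above, written in Python
rather than backend C for brevity. Everything in it is hypothetical:
`BufferPool`, `background_reader`, `seqscan`, and `run_scan` are
illustrative names, a thread stands in for the reader process, and a
list lookup stands in for disk I/O. The pool lock is held across the
simulated read, so this shows only the request protocol, not real I/O
overlap.

```python
import queue
import threading


class BufferPool:
    """Toy stand-in for shared buffers: block number -> page contents."""

    def __init__(self, storage):
        self.storage = storage      # hypothetical "disk": a list of pages
        self.pages = {}             # blocks currently cached
        self.lock = threading.Lock()
        self.reads_by = {"reader": 0, "backend": 0}

    def read(self, blkno, who):
        """Return the page, faulting it in if nobody has done so yet."""
        with self.lock:
            if blkno not in self.pages:
                self.pages[blkno] = self.storage[blkno]  # the simulated I/O
                self.reads_by[who] += 1
            return self.pages[blkno]


def background_reader(pool, requests):
    """Service read requests until a None sentinel arrives."""
    while True:
        blkno = requests.get()
        if blkno is None:
            break
        pool.read(blkno, "reader")


def seqscan(pool, requests, nblocks):
    """Read block N, request block N+1, then process block N."""
    total = 0
    for blkno in range(nblocks):
        page = pool.read(blkno, "backend")
        if blkno + 1 < nblocks:
            requests.put(blkno + 1)  # hint: this block will be needed soon
        total += sum(page)           # the "actual processing" of block N
    return total


def run_scan(storage):
    """Run one scan with one reader thread; return (total, read counts)."""
    pool = BufferPool(storage)
    requests = queue.Queue()
    reader = threading.Thread(target=background_reader, args=(pool, requests))
    reader.start()
    total = seqscan(pool, requests, len(storage))
    requests.put(None)               # tell the reader to exit
    reader.join()
    return total, pool.reads_by
```

For example, `run_scan([[1, 2], [3, 4], [5, 6]])` returns a total of 21,
and the reader and backend read counts always add up to 3. How the three
reads split between the two depends on thread timing, which is the
"(we hope)" part of the idea.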

The major issues I can see are:

1. We'd need a shared-memory queue of read requests, probably much like
the queue of fsync requests. We've already seen problems with
contention for the fsync queue, IIRC, and that's used much less heavily
than the read request queue would be. So there might be some
performance issues with getting the block requests sent over to the
readers.

2. There are some low-level assumptions that no one reads in pages of
a relation without having some kind of lock on the relation (consider
eg the case where the relation is being dropped). A bgwriter-like
process wouldn't be able to hold lmgr locks, and we wouldn't really want
it to be thrashing the lmgr shared data structures for each read anyway.
So you'd have to design some interlock to guarantee that no backend
abandons a query (and releases its own lmgr locks) while an async read
request it made is still pending. Ugh.
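
The two issues above can be illustrated together with a minimal Python
sketch. It is not a design for the backend: `ReadRequestQueue` and its
methods are hypothetical names, threads stand in for processes, and the
policy of dropping a request when the queue is full is an assumption
(a read request is only a hint, so losing one costs performance, not
correctness). Note that the single lock used here is exactly the kind of
contention point issue 1 worries about; the sketch does nothing to
solve that.

```python
import threading
from collections import deque


class ReadRequestQueue:
    """Bounded queue of (backend_id, relation, blkno) read requests."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.requests = deque()
        self.pending = {}                  # backend_id -> unfinished requests
        self.cond = threading.Condition()  # one lock guards all state

    def try_enqueue(self, backend_id, relation, blkno):
        """Issue 1: never block the backend; drop the hint if full."""
        with self.cond:
            if len(self.requests) >= self.capacity:
                return False
            self.requests.append((backend_id, relation, blkno))
            self.pending[backend_id] = self.pending.get(backend_id, 0) + 1
            return True

    def take_batch(self):
        """Reader side: grab all queued requests in one lock acquisition."""
        with self.cond:
            batch = list(self.requests)
            self.requests.clear()
            return batch

    def mark_done(self, backend_id):
        """Reader side: call once the read has finished or been skipped."""
        with self.cond:
            self.pending[backend_id] -= 1
            if self.pending[backend_id] == 0:
                del self.pending[backend_id]
                self.cond.notify_all()

    def wait_until_idle(self, backend_id, timeout=None):
        """Issue 2: call before releasing relation locks.

        Returns True once the backend has no unfinished requests,
        or False if the timeout expires first.
        """
        with self.cond:
            return self.cond.wait_for(
                lambda: backend_id not in self.pending, timeout)
```

In this sketch a backend that abandons a query would call
`wait_until_idle()` before giving up its locks, so no reader can still
be touching a relation the backend no longer protects. The cost is that
query abort can now stall behind outstanding I/O, which is part of what
makes the interlock unattractive.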

regards, tom lane
