Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: johnlumby <johnlumby(at)hotmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Greg Stark <stark(at)mit(dot)edu>, "andres(at)2ndquadrant(dot)com" <andres(at)2ndquadrant(dot)com>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-08-24 21:49:31
Message-ID: BLU436-SMTP167F40CBAC7CD274E669274A3DE0@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

Thanks for the replies and thoughts.

On 08/19/14 18:27, Heikki Linnakangas wrote:
> On 08/20/2014 12:17 AM, John Lumby wrote:
>> I am attaching a new version of the patch for consideration in the
>> current commit fest.
>
> Thanks for working on this!
>
>> Relative to the one I submitted on 25 June in
>> BAY175-W412FF89303686022A9F16AA3190(at)phx(dot)gbl
>> the method for handling aio completion using sigevent has been
>> re-written to use
>> signals exclusively rather than a composite of signals and LWlocks,
>> and this has fixed the problem I mentioned before with the LWlock
>> method.
>
> ISTM the patch is still allocating stuff in shared memory that really
> doesn't belong there. Namely, the BufferAiocb structs. Or at least
> parts of it; there's also a waiter queue there which probably needs to
> live in shared memory, but the rest of it does not.

Actually the reason the BufferAiocb ( the postgresql block corresponding
to the aio's aiocb)
must be located in shared memory is that, as you said, it acts as
anchor for the waiter list.
See further comment below.

>
> At least BufAWaitAioCompletion is still calling aio_error() on an AIO
> request that might've been initiated by another backend. That's not OK.

Yes, you are right, and I agree with this one -
I will add a aio_error_return_code field in the BufferAiocb
and only the originator will set this from the real aiocb.

>
> Please write the patch without atomic CAS operation. Just use a spinlock.

Umm, this is a new criticism I think. I use CAS for things other
than locking,
such as add/remove from shared queue. I suppose maybe a spinlock on
the entire queue
can be used equivalently, but with more code (extra confusion) and
worse performance
(coarser serialization). What is your objection to using gcc's
atomic ops? Portability?

> There's a patch in the commitfest to add support for that,

sorry, support for what? There are already spinlocks in postgresql,
you mean some new kind? please point me at it with hacker msgid or
something.

> but it's not committed yet, and all those
> USE_AIO_ATOMIC_BUILTIN_COMP_SWAP ifdefs make the patch more difficult
> to read. Same with all the other #ifdefs; please just pick a method
> that works.

Ok, yes, the ifdefs are unpleasant. I will do something about that.
Ideally they would be entirely confined to header files and only macro
functions
in C files - maybe I can do that. And eventually when the dust has
settled
eliminate obsolete ifdef blocks altogether.

> Also, please split prefetching of regular index scans into a separate
> patch. It's orthogonal to doing async I/O;

actually not completely orthogonal, see next

> we could prefetch regular index scans with posix_fadvise too, and AIO
> should be useful for prefetching other stuff.

On 08/19/14 19:10, Claudio Freire wrote:
> On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> Also, please split prefetching of regular index scans into a separate patch. ...
> That patch already happened on the list, and it wasn't a win in many
> cases. I'm not sure it should be proposed independently of this one.
> Maybe a separate patch, but it should be considered dependent on this.
>
> I don't have an archive link at hand atm, but I could produce one later.
Several people have asked to split this patch into several smaller ones
and I
have thought about it. It would introduce some awkward dependencies.
E.g. splitting the scanner code (index, relation-heap) into separate
patch
from aio code would mean that the scanner patch becomes dependent
on the aio patch. They are not quite orthogonal.

The reason is that the scanners call a new function, DiscardBuffer(blockid)
when aio is in use. We can get around it by providing a stub for
that function
in the scanner patch, but then there is some risk of someone getting the
wrong version of that function in their build. It just adds yet more
complexity
and breakage opportunities.

>
> - Heikki
>
One further comment concerning these BufferAiocb and aiocb control blocks
being in shared memory :

I explained above that the BufferAiocb must be in shared memory for
wait/post.
The aiocb does not necessarily have to be in shared memory,
but since there is a one-to-one relationship between BufferAiocb and aiocb,
it makes the code much simpler , and the operation much more efficient,
if the aiocb is embedded in the BufferAiocb as I have it now.
E.g. if the aiocb is in private-process memory, then an additional
allocation
scheme is needed (fixed number? palloc()in'g extra ones as needed? ...)
which adds complexity, and probably some inefficiency since a shared
pool is usually
more efficient (allows higher maximum per process etc), and there
would have to be
some pointer de-referencing from BufferAiocb to aiocb adding some
(small) overhead.

I understood your objection to use of shared memory as being that you
don't want
a non-originator to access the originator's aiocb using aio_xxx calls,
and, with the
one change I've said I will do above (put the aio_error retcode in the
BufferAiocb)
I will have achieved that requirement. I am hoping this answers your
objections
concerning shared memory.

>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Huang, Suya 2014-08-25 02:05:17 how to query against nested hstore data type
Previous Message Adrian Klaver 2014-08-24 19:12:18 Re: How to insert either a value or the column default?

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-08-24 22:26:12 Code bug or doc bug?
Previous Message Thomas Munro 2014-08-24 21:04:09 Re: SKIP LOCKED DATA (work in progress)