Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: John Lumby <johnlumby(at)hotmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-06-24 14:02:38
Message-ID: 53A984FE.1070309@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On 06/24/2014 04:29 PM, John Lumby wrote:
>> On Mon, Jun 23, 2014 at 2:43 PM, John Lumby <johnlumby(at)hotmail(dot)com> wrote:
>>> It is when some *other* backend gets there first with the ReadBuffer that
>>> things are a bit trickier. The current version of the patch did polling for that case
>>> but that drew criticism, and so an imminent new version of the patch
>>> uses the sigevent mechanism. And there are other ways still.
>>
>> I'm a bit puzzled by this though. Postgres *already* has code for this
>> case. When you call ReadBuffer you set the bits on the buffer
>
> Good question. Let me explain.
> Yes, postgresql has code for the case of a backend is inside a synchronous
> read() or write(), performed from a ReadBuffer(), and some other backend
> wants that buffer. asynchronous aio is initiated not from ReadBuffer
> but from PrefetchBuffer, and performs its aio_read into an allocated, pinned,
> postgresql buffer. This is entirely different from the synchronous io case.
> Why? Because the issuer of the aio_read (the "originator") is unaware
> of this buffer pinned on its behalf, and is then free to do any other
> reading or writing it wishes, such as more prefetching or any other operation.
> And furthermore, it may *never* issue a ReadBuffer for the block which it
> prefetched.

I still don't see the difference. Once an asynchronous read is initiated
on the buffer, it can't be used for anything else until the read has
finished. This is exactly the same situation as with a synchronous read:
after read() is called, the buffer can't be used for anything else until
the call finishes.

In particular, consider the situation from another backend's point of
view. Looking from another backend (i.e. one that didn't initiate the
read), there's no difference between a synchronous and asynchronous
read. So why do we need a different IPC mechanism for the synchronous
and asynchronous cases? We don't.

I understand that *within the backend*, you need to somehow track the
I/O, and you'll need to treat synchronous and asynchronous I/Os
differently. But that's all within the same backend, and doesn't need to
involve the flags or locks in shared memory at all. The inter-process
communication doesn't need any changes.

>> The problem with using the Buffers I/O in progress bit is that the I/O
>> might complete while the other backend is busy doing stuff. As long as
>> you can handle the I/O completion promptly -- either in callback or
>> thread or signal handler then that wouldn't matter. But I'm not clear
>> that any of those will work reliably.
>
> They both work reliably, but the criticism was that backend B polling
> an aiocb of an aio issued by backend A is not documented as
> being supported (although it happens to work), hence the proposed
> change to use sigevent.

You didn't understand what Greg meant. You need to handle the completion
of the I/O in the same process that initiated it, by clearing the
in-progress bit of the buffer and releasing the I/O in-progress lwlock
on it. And you need to do that very quickly after the I/O has finished,
because there might be another backend waiting for the buffer and you
don't want him to wait longer than necessary.

The question is, if you receive the notification of the I/O completion
using a signal or a thread, is it safe to release the lwlock from the
signal handler or a separate thread?

> By the way, on the "will it actually work though?" question which several folks
> have raised, I should mention that this patch has been in semi-production
> use for almost 2 years now in different stages of completion on all postgresql
> releases from 9.1.4 to 9.5 devel. I would guess it has had around
> 500 hours of operation by now. I'm sure there are bugs still to be
> found but I am confident it is fundamentally sound.

Well, a committable version of this patch is going to look quite
different from the first version that you posted, so I don't put much
weight on how long you've tested the first version.

- Heikki

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John Lumby 2014-06-24 15:08:55 Re: Extended Prefetching using Asynchronous IO - proposal and patch
Previous Message David G Johnston 2014-06-24 13:40:11 Re: if row has property X, find all rows that has property X

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-06-24 14:04:03 Re: idle_in_transaction_timeout
Previous Message Vik Fearing 2014-06-24 13:37:22 Re: idle_in_transaction_timeout