Re: Extended Prefetching using Asynchronous IO - proposal and patch

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: johnlumby <johnlumby(at)hotmail(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, pgsql hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Greg Stark <stark(at)mit(dot)edu>, "andres(at)2ndquadrant(dot)com" <andres(at)2ndquadrant(dot)com>
Subject: Re: Extended Prefetching using Asynchronous IO - proposal and patch
Date: 2014-08-25 09:37:14
Message-ID: 53FB03CA.2070804@vmware.com
Lists: pgsql-general pgsql-hackers

On 08/25/2014 12:49 AM, johnlumby wrote:
> On 08/19/14 18:27, Heikki Linnakangas wrote:
>> Please write the patch without atomic CAS operation. Just use a spinlock.
>
> Umm, this is a new criticism I think.

Yeah. Be prepared that new issues will crop up as the patch gets slimmer
and easier to review :-). Right now there's still so much chaff that
it's difficult to see the wheat.

> I use CAS for things other than locking, such as add/remove from shared
> queue. I suppose maybe a spinlock on the entire queue can be used
> equivalently, but with more code (extra confusion) and worse performance
> (coarser serialization). What is your objection to using gcc's atomic
> ops? Portability?

Yeah, portability.

Atomic ops might make sense, but at this stage it's important to slim
down the patch to the bare minimum, so that it's easier to review. Once
the initial patch is in, you can improve it with additional patches.

>> There's a patch in the commitfest to add support for that,
>
> sorry, support for what? There are already spinlocks in postgresql,
> you mean some new kind? please point me at it with hacker msgid or
> something.

Atomic ops: https://commitfest.postgresql.org/action/patch_view?id=1314

Once that's committed, you can use the new atomic ops in your patch. But
until then, stick to spinlocks.

> On 08/19/14 19:10, Claudio Freire wrote:
>> On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>> Also, please split prefetching of regular index scans into a separate patch. ...
>> That patch already happened on the list, and it wasn't a win in many
>> cases. I'm not sure it should be proposed independently of this one.
>> Maybe a separate patch, but it should be considered dependent on this.
>>
>> I don't have an archive link at hand atm, but I could produce one later.
> Several people have asked to split this patch into several smaller ones
> and I have thought about it. It would introduce some awkward
> dependencies. E.g. splitting the scanner code (index, relation-heap)
> into a separate patch from the aio code would mean that the scanner
> patch becomes dependent on the aio patch. They are not quite orthogonal.

Right now, please focus on the main AIO patch. That should show a
benefit for bitmap heap scans too, so to demonstrate the benefits of
AIO, you don't need to prefetch regular index scans. The main AIO patch
can be written, performance tested, and reviewed without caring about
regular index scans at all.

> The reason is that the scanners call a new function, DiscardBuffer(blockid)
> when aio is in use. We can get around it by providing a stub for that
> function in the scanner patch, but then there is some risk of someone
> getting the wrong version of that function in their build. It just adds
> yet more complexity and breakage opportunities.

Regardless of the regular index scans, we'll need to discuss the new API
of PrefetchBuffer and DiscardBuffer.

It would be simpler for the callers if you didn't require the
DiscardBuffer calls. I think it would be totally feasible to write the
patch that way. Just drop the buffer pin after the asynchronous read
finishes. When the caller actually needs the page, it will call
ReadBuffer which will pin it again. You'll get a little bit more bufmgr
traffic that way, but I think it'll be fine in practice.

> One further comment concerning these BufferAiocb and aiocb control blocks
> being in shared memory :
>
> I explained above that the BufferAiocb must be in shared memory for
> wait/post. The aiocb does not necessarily have to be in shared memory,
> but since there is a one-to-one relationship between BufferAiocb and
> aiocb, it makes the code much simpler, and the operation much more
> efficient, if the aiocb is embedded in the BufferAiocb as I have it now.
> E.g. if the aiocb is in private-process memory, then an additional
> allocation scheme is needed (fixed number? palloc()in'g extra ones as
> needed? ...) which adds complexity,

Yep, use palloc or a fixed pool. There's nothing wrong with that.

> and probably some inefficiency since a shared pool is usually more
> efficient (allows higher maximum per process etc), and there would have
> to be some pointer de-referencing from BufferAiocb to aiocb adding some
> (small) overhead.

I think you're falling into the domain of premature optimization. A few
pointer dereferences are totally insignificant, and the amount of memory
you're saving pales in comparison to other similar non-shared pools and
caches we have (catalog caches, for example). And on the other side of
the coin, with a shared pool you'll waste memory when async I/O is not
used (e.g. because everything fits in RAM), and when it is used, you'll
have more contention on locks and cache lines when multiple processes
use the same objects.

The general guideline in PostgreSQL is that everything is
backend-private, except structures used to communicate between backends.

- Heikki
