Re: Does larger i/o size make sense?

From: Greg Stark <stark(at)mit(dot)edu>
To: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Does larger i/o size make sense?
Date: 2013-08-23 13:58:51
Message-ID: CAM-w4HOxZ71aG75n6ruRJaSM62CbFUjhHeNp8nsFC-M_sgVTHA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 22, 2013 at 8:53 PM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:

> An idea that I'd like to investigate is: PostgreSQL allocates a set of
> contiguous buffers to fit a larger i/o size when a block is referenced by a
> sequential scan, then issues a consolidated i/o request on those buffers.
> It probably makes sense if we can expect the upcoming block references to
> fall on neighboring blocks; that is the typical sequential read workload.
>

I think it makes more sense to use scatter gather i/o or async i/o to read
into regular sized buffers scattered around memory than to require the
buffers to be contiguous.
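
Just to illustrate what I mean by that -- a rough sketch, not actual code;
BLCKSZ, NBLOCKS and read_block_run are made-up names here -- preadv(), the
offset-taking variant of readv(), can pull one contiguous on-disk range of
blocks into any number of non-contiguous, regular sized buffers in a single
call:

#define _GNU_SOURCE             /* for preadv() on Linux */
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ  8192
#define NBLOCKS 16

/*
 * Read NBLOCKS consecutive blocks starting at "startblk" into NBLOCKS
 * separate buffers that need not be adjacent in memory.
 */
static ssize_t
read_block_run(int fd, off_t startblk, char *bufs[NBLOCKS])
{
    struct iovec iov[NBLOCKS];

    for (int i = 0; i < NBLOCKS; i++)
    {
        iov[i].iov_base = bufs[i];      /* any regular sized buffer */
        iov[i].iov_len = BLCKSZ;
    }

    /* one syscall, one sequential on-disk range, many scattered buffers */
    return preadv(fd, iov, NBLOCKS, startblk * BLCKSZ);
}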

As others have said, Postgres depends on the OS buffer cache to do readahead.
The scenario where the above becomes interesting is if it's paired with a
move to direct i/o or some other way of skipping the buffer cache. Double
caching is a huge waste and leads to lots of inefficiencies.
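
For what skipping the buffer cache looks like at the syscall level, here is a
minimal sketch (read_block_direct is a made-up name, and real code would keep
the file descriptor open and recycle the aligned buffers rather than allocate
per read). O_DIRECT requires the buffer, length and file offset to be
suitably aligned, hence posix_memalign():

#define _GNU_SOURCE             /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLCKSZ 8192

/*
 * Read one block directly from disk, bypassing the OS buffer cache.
 * Returns 0 and the newly allocated, aligned buffer in *buf_out,
 * or -1 on error.
 */
static int
read_block_direct(const char *path, off_t blkno, void **buf_out)
{
    void       *buf;
    int         fd;
    ssize_t     nread;

    /* O_DIRECT wants an aligned buffer; 4096 covers common sector sizes */
    if (posix_memalign(&buf, 4096, BLCKSZ) != 0)
        return -1;

    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0)
    {
        free(buf);
        return -1;
    }

    nread = pread(fd, buf, BLCKSZ, blkno * BLCKSZ);
    close(fd);

    if (nread != BLCKSZ)
    {
        free(buf);
        return -1;
    }

    *buf_out = buf;
    return 0;
}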

The blocking issue there is that Postgres doesn't understand much about the
underlying storage hardware. If there were APIs to find out more about it
from the kernel -- how far it is to the end of the raid chunk, how much
parallelism the device has, how congested the i/o channel is, etc. -- then
Postgres could be on par with the kernel, eliminate the double-buffering
inefficiency, and perhaps even do better, since it understands its own
workload better than the kernel does.

If Postgres did that, it would need to be able to initiate i/o on multiple
buffers in parallel. That can be done with scatter gather i/o such as
readv() and writev(), but that means blocking until blocks that may not be
needed until later have all been read. Or it could be done with libaio,
which initiates the i/o and returns control as soon as the data needed now
is available, while the rest of the i/o is still pending.
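
The libaio flavour would look roughly like the sketch below (again, the
helper name and constants are made up; to get genuinely asynchronous
behaviour the fd generally has to be opened with O_DIRECT, so the buffers
must be aligned, and you link with -laio). The whole batch goes to the
kernel in one io_submit(), and completions are reaped with io_getevents() as
they arrive, so the first block can be used while the rest is still in
flight:

#include <libaio.h>

#define BLCKSZ  8192
#define NBLOCKS 16

/*
 * Kick off reads of NBLOCKS consecutive blocks at once and wait for them
 * to complete, consuming completions as they become available.
 */
static int
read_blocks_aio(int fd, long startblk, char *bufs[NBLOCKS])
{
    io_context_t    ctx = 0;
    struct iocb     iocbs[NBLOCKS];
    struct iocb    *iocbps[NBLOCKS];
    struct io_event events[NBLOCKS];
    int             ndone = 0;

    if (io_setup(NBLOCKS, &ctx) < 0)
        return -1;

    for (int i = 0; i < NBLOCKS; i++)
    {
        io_prep_pread(&iocbs[i], fd, bufs[i], BLCKSZ,
                      (long long) (startblk + i) * BLCKSZ);
        iocbps[i] = &iocbs[i];
    }

    /* hand the whole batch to the kernel; control returns immediately */
    if (io_submit(ctx, NBLOCKS, iocbps) != NBLOCKS)
    {
        io_destroy(ctx);
        return -1;
    }

    /*
     * Reap completions one or more at a time; blocks that have arrived
     * can be processed while the rest of the batch is still in flight.
     */
    while (ndone < NBLOCKS)
    {
        int         n = io_getevents(ctx, 1, NBLOCKS - ndone, events, NULL);

        if (n <= 0)
            break;
        ndone += n;
    }

    io_destroy(ctx);
    return (ndone == NBLOCKS) ? 0 : -1;
}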

--
greg
