Re: Why we are going to have to go DirectIO

From: Greg Stark <stark(at)mit(dot)edu>
To: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Tatsuo Ishii <ishii(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why we are going to have to go DirectIO
Date: 2013-12-05 14:42:29
Message-ID: CAM-w4HMWf4J8ZKKBFhMy2EntXdKiGOhDKtdi0YDxggh-YY6fxQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 5, 2013 at 8:35 AM, KONDO Mitsumasa
<kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Yes. And using something efficiently DirectIO is more difficult than
> BufferedIO.
> If we change write() flag with direct IO in PostgreSQL, it will execute
> hardest ugly randomIO.

Using DirectIO presumes you're using libaio or threads to implement
prefetching and asynchronous I/O scheduling.
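
To make the threads variant concrete, here is a minimal sketch in Python
rather than anything resembling Postgres code. The names read_block,
prefetch_blocks and demo are hypothetical, and the 8192-byte block size
is only an illustrative assumption. It deliberately omits O_DIRECT and
the aligned buffers that flag requires, so it shows only the
"issue several reads concurrently" half of the idea.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 8192  # illustrative block size (assumption, not taken from the thread)


def read_block(fd, block_no):
    """Read one block at its offset; os.pread does not move the file position."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)


def prefetch_blocks(fd, block_numbers, workers=4):
    """Issue reads for several blocks concurrently and return {block_no: bytes}."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {n: pool.submit(read_block, fd, n) for n in block_numbers}
        return {n: f.result() for n, f in futures.items()}


def demo():
    # Build a scratch file of 16 blocks, each filled with its own block number.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        for n in range(16):
            tmp.write(bytes([n]) * BLOCK_SIZE)
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        return prefetch_blocks(fd, [3, 7, 11])
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling demo() returns the three requested blocks keyed by block number.
With the page cache bypassed, this kind of explicit concurrency is the
only source of read-ahead, which is why the application ends up owning
the scheduling.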

I think in the long term there are only two ways to go here. Either
(a) we use DirectIO and implement an I/O scheduler in Postgres, or
(b) we use mmap plus new system calls that give the kernel all the
information Postgres has available, so it can control the I/O scheduler.

(a) is by far the lower-risk option, as it's well trodden and doesn't
depend on other projects to do anything. The most that would help is
a kernel interface for learning about hardware properties such as the
RAID geometry and queue depth for different parts of the devices.

(b) is by far the more interesting research project, though. I don't
think anyone's tried it, and the kernel interface to provide the kinds
of information Postgres needs requires a lot of thought. If it's done
right then Postgres wouldn't need a buffer cache manager at all. It
would just mmap the entire database, tell the kernel when it's safe to
flush each buffer, and let the kernel pick the actual moment based on
what's convenient for the hardware.
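
For illustration only, a minimal Python sketch of the mmap side using
calls that exist today. The names update_page and demo are hypothetical.
Note the gap it exposes: flush() wraps msync(), which forces write-back
now, and nothing stops the kernel from writing a dirty page earlier.
The "safe to flush, at your convenience" hint described above is
precisely the part that has no existing system call.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def update_page(mm, page_no, payload):
    """Modify one page through the mapping, then ask the kernel to write it back."""
    start = page_no * PAGE
    mm[start:start + len(payload)] = payload
    # flush() wraps msync(); the offset must be page-aligned.
    mm.flush(start, PAGE)


def demo():
    # Scratch file of four zeroed pages standing in for a data file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (4 * PAGE))
        path = tmp.name
    try:
        with open(path, "r+b") as f:
            mm = mmap.mmap(f.fileno(), 0)  # length 0 maps the whole file
            try:
                update_page(mm, 2, b"hello")
            finally:
                mm.close()
        # Read the page back through ordinary file I/O to confirm the write.
        with open(path, "rb") as f:
            f.seek(2 * PAGE)
            return f.read(5)
    finally:
        os.unlink(path)
```

There is no separate buffer pool here: the mapping is the cache, and the
only copy of the page in memory is the kernel's.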

I don't think it's tenable in the long run to have Postgres manage
buffers that are then copied to a second buffer in memory, which is in
turn flushed to disk by yet another scheduler. That it works at all is
a testament to the quality of the code in Postgres and Linux, but it's
implausibly inefficient.

--
greg
