Re: Why we are going to have to go DirectIO

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Tatsuo Ishii <ishii(at)postgresql(dot)org>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why we are going to have to go DirectIO
Date: 2013-12-11 01:09:19
Message-ID: CAGTBQpajeB6w7o2ZpGrfubAR+Hd5gyfUGKyAYp2xuSW-1UX3qQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 9:22 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Communicating more with the kernel (through posix_fadvise, fallocate,
>> aio, iovec, etc...) would probably be good, but it does expose more
>> kernel issues. posix_fadvise, for instance, is a double-edged sword
>> ATM. I do believe, however, that exposing those issues and prompting a
>> fix is far preferable to silently working around them.
>
>
> Getting the kernel to improve those things so PostgreSQL can be changed to
> use them more aggressively seems almost hopeless to me. PostgreSQL would
> have to be coded to take advantage of the improved versions, while defending
> itself from the pre-improved versions. And my understanding is that
> different distributions of Linux cherry pick changes to the kernel back and
> forth into their code, so just looking at the kernel version number without
> also looking at the distribution doesn't mean very much about whether we
> have the improved feature or not. Or am I misinformed about that?
>
> If we can point out to the kernel hackers things that would be
> absolute improvements, where PostgreSQL and everything else just magically
> start working better if that improvement makes it in, that is great. But if
> both systems have to be changed in sync to derive any benefit, how do we
> coordinate that?

Well, posix_fadvise is one such thing. It's a cheap form of AIO used
by more than a few programs that want I/O performance, and its
current form is sub-optimal. The fix is rather simple; it just needs a
lot of testing.
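
For readers unfamiliar with the pattern, here is a minimal sketch of
posix_fadvise used as a "cheap form of AIO": hint every block you are
about to need, then read them. It assumes a POSIX system and Python's
os.posix_fadvise. The function name and the 8 kB block size are
illustrative assumptions, not PostgreSQL code.

```python
import os

BLOCK_SIZE = 8192  # assumption: matches PostgreSQL's default page size


def read_blocks_with_prefetch(path, block_numbers, block_size=BLOCK_SIZE):
    """Hint the kernel about every wanted block, then read them in order.

    POSIX_FADV_WILLNEED asks the kernel to start pulling each range into
    the page cache without blocking the caller, so the reads that follow
    may find their data already cached or in flight.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Phase 1: issue all the hints up front (non-blocking).
        for blkno in block_numbers:
            os.posix_fadvise(fd, blkno * block_size, block_size,
                             os.POSIX_FADV_WILLNEED)
        # Phase 2: do the actual (blocking) reads.
        return [os.pread(fd, block_size, blkno * block_size)
                for blkno in block_numbers]
    finally:
        os.close(fd)
```

How much this helps depends entirely on how the kernel handles the
hint, which is the sub-optimal part being discussed.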

But my report on LKML[0] spurred little actual work. So it's possible
this kind of thing will need patches attached.

On Tue, Dec 10, 2013 at 9:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.
>
> I don't think checkpoints are the critical problem with that, they are
> nicely in the background and we could easily add sorting.

Problem is, with DirectIO, they won't be so background.

Currently, checkpoints assume there's a background process catching
all I/O requests, sorting them, and flushing them as optimally as
possible. This makes the checkpoint's slow-paced write pattern
benignly background, since it will be scheduled opportunistically by
the kernel.
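
The sorting and merging described above can be illustrated with a
rough sketch. This is a hypothetical toy, not the kernel's I/O
scheduler and not PostgreSQL's checkpoint code. It assumes dirty pages
are identified by (file_id, block_number) pairs and that adjacent
blocks in one file can be written as a single run.

```python
def coalesce_writes(dirty_blocks):
    """Sort dirty blocks and merge adjacent ones into sequential runs.

    dirty_blocks: iterable of (file_id, block_number) pairs, in any order.
    Returns a list of (file_id, first_block, block_count) runs, ordered
    by file and then by position within the file.
    """
    runs = []
    for file_id, blkno in sorted(set(dirty_blocks)):
        if runs and runs[-1][0] == file_id and runs[-1][1] + runs[-1][2] == blkno:
            # This block directly follows the previous run: extend it.
            last_file, first, count = runs[-1]
            runs[-1] = (last_file, first, count + 1)
        else:
            # Gap or different file: start a new run.
            runs.append((file_id, blkno, 1))
    return runs


# Six scattered page writes collapse into three sequential runs.
print(coalesce_writes([(1, 7), (0, 3), (1, 5), (0, 2), (1, 6), (0, 9)]))
```

With a working kernel cache, something along these lines happens on
the checkpoint's behalf; without it, the checkpoint would need to do
it itself.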

If you use DirectIO, however, a write will pretty much physically move
the writing head (when it reaches the queue's head at least) of
rotating media, causing delays on all other pending I/O requests.
That's quite un-backgroundly of it.

A few blocks per second like that can pretty much kill sequential
scans (I've seen that effect happen with fadvise).

[0] https://lkml.org/lkml/2012/11/9/353
