From: | Claudio Freire <klaussfreire(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Why we are going to have to go DirectIO |
Date: | 2013-12-11 01:09:19 |
Message-ID: | CAGTBQpajeB6w7o2ZpGrfubAR+Hd5gyfUGKyAYp2xuSW-1UX3qQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Dec 10, 2013 at 9:22 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Communicating more with the kernel (through posix_fadvise, fallocate,
>> aio, iovec, etc...) would probably be good, but it does expose more
>> kernel issues. posix_fadvise, for instance, is a double-edged sword
>> ATM. I do believe, however, that exposing those issues and prompting a
>> fix is far preferable to silently working around them.
>
>
> Getting the kernel to improve those things so PostgreSQL can be changed to
> use them more aggressively seems almost hopeless to me. PostgreSQL would
> have to be coded to take advantage of the improved versions, while defending
> itself from the pre-improved versions. And my understanding is that
> different distributions of Linux cherry pick changes to the kernel back and
> forth into their code, so just looking at the kernel version number without
> also looking at the distribution doesn't mean very much about whether we
> have the improved feature or not. Or am I misinformed about that?
>
> If we can point out to the kernel hackers things that would be
> absolute improvements, where PostgreSQL and everything else just magically
> start working better if that improvement makes it in, that is great. But if
> both systems have to be changed in sync to derive any benefit, how do we
> coordinate that?
Well, posix_fadvise is one such thing. It's a cheap form of AIO used
by more than a few programs that want I/O performance, and in its
current form it is sub-optimal. The fix is rather simple; it just needs
a lot of testing.
But my report on LKML[0] spurred little actual work. So it's possible
this kind of thing will need patches attached.
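
For readers unfamiliar with the "cheap AIO" usage mentioned above, here is a minimal sketch of the pattern, assuming a Linux/Unix system where Python exposes os.posix_fadvise. The function name prefetch_then_read and its parameters are hypothetical, and the 8192-byte default is chosen only to match PostgreSQL's default page size. It is an illustration of the idea, not how PostgreSQL issues its hints.

```python
import os


def prefetch_then_read(path, offsets, block_size=8192):
    """Hint the kernel about upcoming reads, then read the blocks.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Issue every hint before the first read, so the kernel can start
        # fetching the blocks while we are still queueing the rest.
        if hasattr(os, "posix_fadvise"):
            for off in offsets:
                try:
                    os.posix_fadvise(fd, off, block_size,
                                     os.POSIX_FADV_WILLNEED)
                except OSError:
                    # The call is only advisory; a failed hint must not
                    # prevent the actual read below.
                    pass
        # The reads themselves are ordinary blocking reads; the hope is
        # that the data is already in the page cache by now.
        return [os.pread(fd, block_size, off) for off in offsets]
    finally:
        os.close(fd)
```

The point of the pattern is that the program never needs a real asynchronous I/O interface: it only tells the kernel what it will want soon and then reads normally.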
On Tue, Dec 10, 2013 at 9:34 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-04 05:39:23 -0200, Claudio Freire wrote:
>> Problem is, Postgres relies on a working kernel cache for checkpoints.
>> Checkpoint logic would have to be heavily reworked to account for an
>> impaired kernel cache.
>
> I don't think checkpoints are the critical problem with that, they are
> nicely in the background and we could easily add sorting.
Problem is, with DirectIO, they won't be so background.
Currently, checkpoints assume there's a background process catching
all I/O requests, sorting them, and flushing them as optimally as
possible. This makes the checkpoint's slow-paced write pattern
benignly background, since it will be scheduled opportunistically by
the kernel.
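
As a toy model of the sorting step described above (not the kernel's actual algorithm), the sketch below takes pending writes in arrival order, sorts them by file offset, and merges writes that turn out to be contiguous. The function name sort_and_merge and the (offset, data) request shape are assumptions made for illustration.

```python
def sort_and_merge(requests):
    """Sort pending writes by offset and merge contiguous ones.

    requests: iterable of (offset, data) tuples, data being bytes.
    Returns a list of (offset, data) tuples in ascending offset order.
    Toy model only; overlapping writes are not handled.
    """
    merged = []
    for offset, data in sorted(requests, key=lambda r: r[0]):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            # This write starts exactly where the previous one ends,
            # so both can go to the device as one larger write.
            prev_offset, prev_data = merged[-1]
            merged[-1] = (prev_offset, prev_data + data)
        else:
            merged.append((offset, data))
    return merged
```

With something like this sitting between the writer and the disk, slow-paced scattered writes can be turned into fewer, mostly ascending writes, which is the service a checkpoint currently gets from the kernel cache without asking for it.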
If you use DirectIO, however, a write will pretty much physically move
the write head of rotating media (at least once it reaches the head of
the queue), causing delays on all other pending I/O requests.
That's quite un-backgroundly of it.
A few blocks per second like that can pretty much kill sequential
scans (I've seen that effect happen with fadvise).