Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: didier <did447(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Date: 2013-07-22 04:21:33
Message-ID: CAJRYxuJkYme5xKXW5M280yWQijOAM1uv0UDkxVryP-PUDn_0Og@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Sat, Jul 20, 2013 at 6:28 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> On 7/20/13 4:48 AM, didier wrote:
>>
>
> That is the theory. In practice write caches are so large now, there is
> almost no pressure forcing writes to happen until the fsync calls show up.
> It's easily possible to enter the checkpoint fsync phase only to discover
> there are 4GB of dirty writes ahead of you, ones that have nothing to do
> with the checkpoint's I/O.

Isn't adding another layer of cache the usual answer?

The best place would be in the OS: a filesystem with a journal big enough
to write a large number of blocks sequentially.

Failing that, if you can spare at worst 2 bits in memory per data block,
don't mind preallocated data files (assuming the metadata is then stable),
and have working mmap(MAP_NONBLOCK) and mincore() syscalls, you could get a
checkpoint done in bounded time; in the worst case you sequentially write
the whole server RAM to a separate disk at every checkpoint.
Not sure I would trust such a beast with my data though :)

Didier
