Re: Spread checkpoint sync

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Spread checkpoint sync
Date: 2010-11-21 22:45:50
Message-ID: 201011212345.50499.andres@anarazel.de
Lists: pgsql-hackers

On Sunday 21 November 2010 23:19:30 Martijn van Oosterhout wrote:
> For a similar problem we had (kernel buffering too much) we had success
> using the fadvise and madvise DONTNEED syscalls to force the data to
> exit the cache much sooner than it would otherwise. This was on Linux
> and it had the side-effect that the data was deleted from the kernel
> cache, which we wanted, but probably isn't appropriate here.
Yep, that works fine, although it has the issue that the data will have to be
read again if archiving or streaming replication (SR) is enabled.
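
For reference, a minimal sketch of the fadvise approach described above. The
helper name flush_and_drop_cache is hypothetical and this is not how
PostgreSQL does it; it only assumes a POSIX system that provides
posix_fadvise. POSIX_FADV_DONTNEED does not discard pages that are still
dirty, so the sketch flushes first:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush the file's dirty pages, then advise the
 * kernel that its cached pages are no longer needed.
 *
 * Returns 0 on success, -1 if fdatasync() fails (errno is set), or a
 * positive error number if posix_fadvise() fails (it returns the error
 * directly instead of setting errno).
 */
int flush_and_drop_cache(int fd)
{
    assert(fd >= 0);

    /* Dirty pages are not dropped by DONTNEED, so write them out first. */
    if (fdatasync(fd) != 0)
        return -1;

    /* offset 0 with len 0 means "from the start to the end of the file". */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

The advice is only a hint, and anything that reads the file afterwards (the
archiver, for example) has to fetch it from disk again.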

> There is also sync_file_range, but that's Linux-specific, although
> close to what you want I think. It would allow you to work with blocks
> smaller than 1GB.
Unfortunately that puts the data under quite high write-out pressure inside
the kernel, which is not what you actually want because it significantly
limits reordering and the like.
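
To make the sync_file_range suggestion concrete, here is a rough sketch that
starts writeback for a file in chunks instead of syncing the whole segment at
once. The function name writeback_in_chunks and the chunk size are
assumptions made for illustration; the call is Linux-only, and
SYNC_FILE_RANGE_WRITE only initiates write-out without waiting for it or
flushing metadata, so it is not a substitute for fsync:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>   /* off_t */

/*
 * Hypothetical helper: ask the kernel to start writing out dirty pages
 * of fd, one chunk of chunk_size bytes at a time, without waiting for
 * the I/O to finish.
 *
 * Returns the number of chunks submitted, or -1 if sync_file_range()
 * fails (errno is set).
 */
long writeback_in_chunks(int fd, off_t file_size, off_t chunk_size)
{
    long chunks = 0;

    assert(chunk_size > 0);

    for (off_t off = 0; off < file_size; off += chunk_size) {
        /* The last chunk may be shorter than chunk_size. */
        off_t len = file_size - off;
        if (len > chunk_size)
            len = chunk_size;

        if (sync_file_range(fd, off, len, SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
        chunks++;
    }
    return chunks;
}
```

Each call queues the pages in that range for write-out right away, which is
the pressure on the kernel described above.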

It would be nicer if you could get a mix of both semantics (looking at it,
depending on the approach, that seems to be about a 10-line patch to the
kernel), i.e. indicate that you want the pages written soonish, but without
putting them at the head of the writeout queue.

Andres
