Re: Spreading full-page writes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>
Subject: Re: Spreading full-page writes
Date: 2014-06-02 12:34:10
Message-ID: CAHGQGwFQ5k_eXMOfkU-XwqdsuenjqrOpS=KQ8GSioM4C6Kmt0w@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 28, 2014 at 1:10 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, May 27, 2014 at 1:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, May 27, 2014 at 3:57 PM, Simon Riggs <simon(at)2ndquadrant(dot)com>
>> wrote:
>> > The requirements we were discussing were around
>> >
>> > A) reducing WAL volume
>> > B) reducing foreground overhead of writing FPWs - which spikes badly
>> > after checkpoint and the overhead is paid by the user processes
>> > themselves
>> > C) need for FPWs during base backup
>> >
>> > So that gives us a few approaches
>> >
>> > * Compressing FPWs gives A
>> > * Background FPWs gives us B
>> > which look like we can combine both ideas
>> >
>> > * Double-buffering would give us A and B, but not C
>> > and would be incompatible with other two ideas
>>
>> Double-buffering would allow us to disable FPW safely, but it would make
>> recovery slow.
>
> Is it due to the fact that during recovery, it needs to check the
> contents of double buffer as well as the page in original location
> for consistency or there is something else also which will lead
> to slow recovery?
>
> Won't DBW (double buffer write) reduce the number of pages that need
> to be read from disk compared to FPW, which would offset the performance
> degradation from any other impact?
>
> IIUC in DBW mechanism, we need to have a temporary sequential
> log file of fixed size which will be used to write data before the data
> gets written to its actual location in tablespace. Now as the temporary
> log file is of fixed size, the number of pages that need to be read
> during recovery should be less compared to FPW, because with FPW
> it needs to read all the pages written to the WAL after the last successful
> checkpoint.

Hmm... maybe I'm misunderstanding how WAL replay works in the DBW case.
Imagine that we replay two WAL records for page A, and the page has not
been cached in shared_buffers yet. If FPW is enabled, the first WAL record
is an FPW, so replay just copies its page image into shared_buffers.
The page doesn't need to be read from disk. Then the second WAL record
is applied.
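
As a rough illustration of the FPW case above, here is a toy Python model.
It is not PostgreSQL code: WalRecord, replay_fpw and the string-based "pages"
are made-up names for this sketch, and it assumes the full-page image already
reflects that record's own change.

```python
# Toy model of WAL replay with full-page writes (FPW); not PostgreSQL code.
# All names (WalRecord, replay_fpw) are hypothetical and for illustration only.
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class WalRecord:
    page_id: str
    delta: str                        # change to apply, modeled as appended text
    full_page: Optional[str] = None   # page image if this record is an FPW


def replay_fpw(records: List[WalRecord],
               disk: Dict[str, str]) -> Tuple[Dict[str, str], int]:
    """Replay records and return (shared_buffers, number of disk reads)."""
    shared_buffers: Dict[str, str] = {}
    disk_reads = 0
    for rec in records:
        if rec.full_page is not None:
            # Assumption: the image already contains this record's change,
            # so it simply replaces the page. The on-disk copy is not read.
            shared_buffers[rec.page_id] = rec.full_page
            continue
        if rec.page_id not in shared_buffers:
            # Only a non-FPW record on an uncached page forces a disk read.
            shared_buffers[rec.page_id] = disk[rec.page_id]
            disk_reads += 1
        shared_buffers[rec.page_id] += rec.delta
    return shared_buffers, disk_reads


# Two records for page A; the first one is an FPW.
records = [WalRecord("A", "", full_page="v1"), WalRecord("A", "+x")]
buffers, reads = replay_fpw(records, disk={"A": "possibly-torn"})
```

In this model the on-disk copy of page A is never touched, even if it is torn,
because the first record overwrites the page in shared_buffers.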

OTOH, how does this example work in the DBW case? I was thinking that we
first try to apply the first WAL record but find that page A is not in
shared_buffers yet. We read the page from disk, check whether its CRC is
valid, and if it's invalid, read the same page from the double buffer
instead. Once the page is in shared_buffers, the first WAL record can be
applied. Then the second WAL record is applied. That is at least one disk
read that the FPW case avoids. Is my understanding right?
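
The DBW replay path as I understand it can be sketched with a toy Python
model. This is only my assumption of how it would work, not an existing
design or PostgreSQL code: Page, make_page, crc_ok and replay_dbw are
made-up names, and zlib.crc32 stands in for whatever page checksum would
really be used.

```python
# Toy model of WAL replay with a double-write buffer (DBW); not PostgreSQL code.
# All names (Page, make_page, crc_ok, replay_dbw) are hypothetical.
import zlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Page:
    data: str
    crc: int   # checksum stored with the page when it was written


def make_page(data: str) -> Page:
    return Page(data, zlib.crc32(data.encode()))


def crc_ok(page: Page) -> bool:
    return zlib.crc32(page.data.encode()) == page.crc


def replay_dbw(records: List[Tuple[str, str]],
               disk: Dict[str, Page],
               double_buffer: Dict[str, Page]) -> Tuple[Dict[str, str], int]:
    """Replay (page_id, delta) records; return (shared_buffers, disk reads)."""
    shared_buffers: Dict[str, str] = {}
    disk_reads = 0
    for page_id, delta in records:
        if page_id not in shared_buffers:
            # No page image in WAL, so the page must come from disk.
            page = disk[page_id]
            disk_reads += 1
            if not crc_ok(page):
                # Torn page: fall back to the copy in the double buffer.
                page = double_buffer[page_id]
                disk_reads += 1
            shared_buffers[page_id] = page.data
        shared_buffers[page_id] += delta
    return shared_buffers, disk_reads


# Page A is torn on disk: its data no longer matches the stored checksum.
torn = Page("ba", make_page("base").crc)
buffers, reads = replay_dbw([("A", "+1"), ("A", "+2")],
                            disk={"A": torn},
                            double_buffer={"A": make_page("base")})
```

In this model an intact page costs one read and a torn page costs two,
and in both cases the read happens before the first record can be applied.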

Regards,

--
Fujii Masao
