Full page writes improvement

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Full page writes improvement
Date: 2007-02-01 02:07:27
Message-ID: 45C14B5F.4010406@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Here's an idea and a patch for full page writes improvement.

Idea:
(1) Keep full page writes in ordinary WAL so that they are available
during crash recovery -> recovery from inconsistent pages that a crash
can leave behind,
(2) Remove them from the archive log, except for those written during
online backup (between pg_start_backup and pg_stop_backup) -> smaller
archive log.

Implementation:
(1) Mark WAL records whose full-page writes can be removed,
(2) Remove full-page writes from the marked WAL records in the archive
command, and
(3) Restore the removed full-page writes to keep LSNs consistent.
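
The three steps above can be sketched with a toy model. This is only an illustration under stated assumptions, not the patch's code: the record layout, the `removable` flag, the `stripped_len` and `padding` fields, and the function names are all hypothetical, and real WAL records are binary structures. It also assumes that "restore" means padding each stripped record back to its original length so that later records keep their original offsets (used here as a stand-in for LSNs).

```python
from dataclasses import dataclass, replace
from typing import List, Optional

PAGE_SIZE = 8192  # PostgreSQL's default block size in bytes


@dataclass(frozen=True)
class Record:
    """Toy WAL record. Every field name here is hypothetical."""
    payload: bytes                      # per-tuple change data
    page_image: Optional[bytes] = None  # full-page image, if the record has one
    removable: bool = False             # step (1): set when the record is written
    stripped_len: int = 0               # bytes of image removed at archive time
    padding: int = 0                    # filler bytes added back at restore time

    def size(self) -> int:
        return len(self.payload) + len(self.page_image or b"") + self.padding


def archive(records: List[Record]) -> List[Record]:
    """Step (2): drop page images from marked records, remembering their length."""
    out = []
    for rec in records:
        if rec.removable and rec.page_image is not None:
            rec = replace(rec, page_image=None, stripped_len=len(rec.page_image))
        out.append(rec)
    return out


def restore(records: List[Record]) -> List[Record]:
    """Step (3): pad stripped records so every record regains its original size."""
    out = []
    for rec in records:
        if rec.stripped_len:
            rec = replace(rec, padding=rec.stripped_len, stripped_len=0)
        out.append(rec)
    return out


def offsets(records: List[Record]) -> List[int]:
    """Start offset of each record, standing in for its LSN."""
    positions, pos = [], 0
    for rec in records:
        positions.append(pos)
        pos += rec.size()
    return positions


wal = [
    Record(b"insert", bytes(PAGE_SIZE), removable=True),   # ordinary record
    Record(b"update", bytes(PAGE_SIZE), removable=False),  # written during online backup
    Record(b"delete"),                                     # no page image at all
]
archived = archive(wal)
restored = restore(archived)
print(sum(r.size() for r in wal), sum(r.size() for r in archived))
print(offsets(restored) == offsets(wal))
```

In this model the unmarked record keeps its page image in the archive, and the restored stream has the same record offsets as the original WAL even though the removed images themselves are not recovered.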

Included is a patch for this, as well as source for the archive and
restore commands.

The patch is very small, and I hope it can be included in 8.3.

--
Koichi Suzuki

Attachment: pg_lesslog.tar.gz (application/gzip, 34.9 KB)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Full page writes improvement
Date: 2007-02-01 19:59:07
Message-ID: 23241.1170359947@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Here's an idea and a patch for full page writes improvement.

> Idea:
> (1) keep full page writes for ordinary WAL, make them available during
> the crash recovery, -> recovery from inconsistent pages which can be
> made at the crash,
> (2) Remove them from the archive log except for those written during
> online backup (between pg_start_backup and pg_stop_backup) -> small size
> archive log.

Doesn't this break crash recovery on PITR slaves?

regards, tom lane


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement
Date: 2007-02-02 00:39:38
Message-ID: 45C2884A.50607@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> Here's an idea and a patch for full page writes improvement.
>
>> Idea:
>> (1) keep full page writes for ordinary WAL, make them available during
>> the crash recovery, -> recovery from inconsistent pages which can be
>> made at the crash,
>> (2) Remove them from the archive log except for those written during
>> online backup (between pg_start_backup and pg_stop_backup) -> small size
>> archive log.
>
> Doesn't this break crash recovery on PITR slaves?

The compressed archive log contains the same data as in the
full_page_writes off case, so the effect on PITR slaves is the same as
with full_page_writes off.

K.Suzuki

>
> regards, tom lane

--
Koichi Suzuki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement
Date: 2007-02-02 01:03:27
Message-ID: 2098.1170378207@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> Doesn't this break crash recovery on PITR slaves?

> Compressed archive log contains the same data as full_page_writes off
> case. So the influence to PITR slaves is the same as full_page_writes off.

Right. So what is the use-case for running your primary database with
full_page_writes on and the slaves with it off? It doesn't seem like
a very sensible combination to me.

Also, it seems to me that some significant performance hit would be
taken by having to grovel through the log files to remove and re-add the
full-page data. Plus you are actually writing *more* WAL data out of
the primary, not less, because you have to save both the full-page
images and the per-tuple data they normally replace. Do you have
numbers showing that there's actually any meaningful savings overall?
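
The trade-off described in the paragraph above can be made concrete with back-of-the-envelope arithmetic. Every count and size below is a made-up illustrative value, not a measurement; only the 8192-byte page is PostgreSQL's default block size. Record headers are ignored, and the sketch assumes a stock record with a full-page image carries the image instead of the per-tuple data, while a patched record carries both.

```python
PAGE_IMAGE = 8192     # bytes; PostgreSQL's default block size
TUPLE_DATA = 120      # hypothetical bytes of per-tuple data in one record
IMAGE_RECORDS = 1000  # hypothetical number of records carrying a full-page image
PLAIN_RECORDS = 9000  # hypothetical number of records without one


def wal_bytes(keep_tuple_data_with_image: bool) -> int:
    """Total WAL volume on the primary under the simplified model above."""
    per_image_record = PAGE_IMAGE + (TUPLE_DATA if keep_tuple_data_with_image else 0)
    return IMAGE_RECORDS * per_image_record + PLAIN_RECORDS * TUPLE_DATA


stock_wal = wal_bytes(keep_tuple_data_with_image=False)
patched_wal = wal_bytes(keep_tuple_data_with_image=True)
# Archive after every full-page image has been stripped (none kept for backup).
stripped_archive = (IMAGE_RECORDS + PLAIN_RECORDS) * TUPLE_DATA

print(f"stock WAL:        {stock_wal:>10,} bytes")
print(f"patched WAL:      {patched_wal:>10,} bytes")
print(f"stripped archive: {stripped_archive:>10,} bytes")
```

Under these invented numbers the primary writes slightly more WAL with the patch, while the stripped archive is far smaller; whether that holds for a real workload is exactly what the requested measurements would have to show.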

regards, tom lane


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement
Date: 2007-02-02 02:05:52
Message-ID: 45C29C80.1040200@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> Doesn't this break crash recovery on PITR slaves?
>
>> Compressed archive log contains the same data as full_page_writes off
>> case. So the influence to PITR slaves is the same as full_page_writes off.
>
> Right. So what is the use-case for running your primary database with
> full_page_writes on and the slaves with it off? It doesn't seem like
> a very sensible combination to me.
>
> Also, it seems to me that some significant performance hit would be
> taken by having to grovel through the log files to remove and re-add the
> full-page data. Plus you are actually writing *more* WAL data out of
> the primary, not less, because you have to save both the full-page
> images and the per-tuple data they normally replace. Do you have
> numbers showing that there's actually any meaningful savings overall?

Yes, I have some evaluations showing that we write less and use fewer
resources overall. Please give me a couple of days to translate them.

In the case of a PITR slave, archive logs are read within a short
period, so the amount of archive log may not be an issue. In the case
where an online backup and archive logs must be kept for a (relatively)
long period, archive log size is a major issue.

K.Suzuki

>
> regards, tom lane
>

--
Koichi Suzuki


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement
Date: 2007-02-09 04:13:38
Message-ID: 45CBF4F2.1080708@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Full_page_compress is not intended for use with a PITR slave, but for
the case where both an online backup and the archive log are kept for
archive recovery, which is a very popular PostgreSQL operation now.

I've just posted my evaluation of the patch as a reply in another
thread on the same proposal (sorry, I created a new thread because the
old one did not seem good).

It compares this log compression with the gzip case. Our proposal can
also be combined with gzip. Its overall overhead is slightly less than
just copying WAL using cat, so my proposal does not add serious
overhead.

Please refer to the thread "Archive log compression keeping physical log
available in the crash recovery". I would appreciate further opinions
and comments on this, and suggestions on which evaluations would be
useful.

I've posted two commands (archive and restore) and a small patch.
The two commands can be treated as contrib, and the patch itself still
works if WAL is simply copied to the archive directory.

Regards,
Koichi Suzuki

Tom Lane wrote:
> Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> Doesn't this break crash recovery on PITR slaves?
>
>> Compressed archive log contains the same data as full_page_writes off
>> case. So the influence to PITR slaves is the same as full_page_writes off.
>
> Right. So what is the use-case for running your primary database with
> full_page_writes on and the slaves with it off? It doesn't seem like
> a very sensible combination to me.
>
> Also, it seems to me that some significant performance hit would be
> taken by having to grovel through the log files to remove and re-add the
> full-page data. Plus you are actually writing *more* WAL data out of
> the primary, not less, because you have to save both the full-page
> images and the per-tuple data they normally replace. Do you have
> numbers showing that there's actually any meaningful savings overall?
>
> regards, tom lane
>

--
Koichi Suzuki