Re: [GENERAL] Slow PITR restore

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jeff Trout <threshar(at)threshar(dot)is-a-geek(dot)com>, pgsql-hackers(at)postgresql(dot)org, Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: [GENERAL] Slow PITR restore
Date: 2007-12-13 21:57:29
Message-ID: 4761AAC9.2050303@enterprisedb.com
Lists: pgsql-general pgsql-hackers

Tom Lane wrote:
> Also, I have not seen anyone provide a very credible argument why
> we should spend a lot of effort on optimizing a part of the system
> that is so little-exercised. Don't tell me about warm standby
> systems --- they are fine as long as recovery is at least as fast
> as the original transactions, and no evidence has been provided to
> suggest that it's not.

Koichi showed me & Simon graphs of DBT-2 runs in their test lab back in
May. They had set up two identical systems, one running the benchmark,
and another one as a warm stand-by. The stand-by couldn't keep up; it
couldn't replay the WAL as quickly as the primary server produced it.
IIRC, replaying WAL generated in a 1h benchmark run took 6 hours.

It sounds unbelievable at first, but the problem is that our WAL replay
doesn't scale. On the primary server, you can have (and they did) a huge
RAID array with dozens of disks, and a lot of concurrent activity
keeping it busy. On the standby, we do all the same work, but with a
single process. Every time we need to read in a page to modify it, we
block. No matter how many disks you have in the array, it won't help,
because we only issue one I/O request at a time.
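To make the scaling argument concrete, here is a toy model in Python. It is not PostgreSQL code. It assumes that every replayed record needs one uncached random page read with a fixed latency, and that on the primary, enough concurrent backends keep every disk busy in parallel. The function names and the numbers in the usage note are hypothetical.

```python
import math


def serial_replay_seconds(n_page_reads: int, read_latency_s: float) -> float:
    """One recovery process: each read blocks, so reads never overlap.

    The number of disks does not appear at all, which is the point.
    """
    return n_page_reads * read_latency_s


def concurrent_io_seconds(n_page_reads: int, read_latency_s: float, n_disks: int) -> float:
    """Idealized primary: enough concurrent backends to keep every disk busy,
    so up to n_disks reads are in flight at once."""
    return math.ceil(n_page_reads / n_disks) * read_latency_s


if __name__ == "__main__":
    reads, latency = 1_000_000, 0.005  # hypothetical workload and per-read latency
    print(f"serial replay:        {serial_replay_seconds(reads, latency):8.1f} s")
    for disks in (1, 8, 24):
        print(f"{disks:2d} disks, concurrent: {concurrent_io_seconds(reads, latency, disks):8.1f} s")
```

With 1,000,000 page reads at an assumed 5 ms each, serial replay takes about 5,000 s regardless of the array size. A 24-disk array kept busy by concurrent backends finishes the same reads in roughly 208 s.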

That said, I think the change we made in the spring to not read in pages for
full page writes will help a lot with that. It would be nice to see some
new benchmark results to measure that. However, it didn't fix the
underlying scalability problem.

One KISS approach would be to just do full page writes more often. It
would obviously bloat the WAL, but it would make the replay faster.
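The trade-off in the last two paragraphs can be sketched with the same kind of toy model. It assumes, hypothetically, that a record without a full-page image needs one uncached page read before it can be applied. A record that carries a full-page image restores the page directly from the WAL and skips the read, but adds a whole page to the WAL. The 8 kB page size is PostgreSQL's default; the record size, latency, and record counts are made up for illustration.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size (BLCKSZ), in bytes


def replay_cost(n_records: int, fpw_fraction: float,
                read_latency_s: float, small_record_bytes: int) -> tuple[float, int]:
    """Return (replay_seconds, wal_bytes) for a toy WAL stream.

    Hypothetical assumptions: a record without a full-page image needs one
    uncached page read before it can be applied; a record with one restores
    the page from the WAL, skips the read, but carries PAGE_SIZE extra bytes.
    """
    n_fpw = round(n_records * fpw_fraction)
    n_reads = n_records - n_fpw          # only these records block on I/O
    replay_seconds = n_reads * read_latency_s
    wal_bytes = n_records * small_record_bytes + n_fpw * PAGE_SIZE
    return replay_seconds, wal_bytes


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5, 1.0):
        secs, size = replay_cost(100_000, frac, 0.005, 100)
        print(f"FPW fraction {frac:4.2f}: replay {secs:6.1f} s, WAL {size / 2**20:7.1f} MiB")
```

Raising the fraction of records with full-page images cuts the blocking reads linearly, while the WAL grows by a full page per such record. That is the "bloat the WAL, but faster replay" trade-off in a single function.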

Another reason you would care about fast recovery is PITR. If you do
base backups only once a week, for example, when you need to recover
using the archive, you might have to replay a week's worth of WAL in the
worst case. You don't want to wait a week for the replay to finish.
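The worst case for PITR reduces to simple arithmetic. The sketch below assumes that replay time scales linearly with the wall-clock time over which the WAL was generated, which is a simplification. The slowdown factor of 6 in the usage loop borrows the recalled DBT-2 figure (1 h of WAL taking 6 h to replay) purely for illustration; the names are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def worst_case_replay_hours(backup_interval_hours: float, replay_slowdown: float) -> float:
    """Hours needed to replay archived WAL in the worst case.

    Worst case: the failure happens just before the next base backup, so WAL
    covering the whole interval must be replayed. replay_slowdown is replay
    time divided by the time it took to generate that WAL
    (1.0 means replay is exactly as fast as the original transactions).
    """
    return backup_interval_hours * replay_slowdown


if __name__ == "__main__":
    for slowdown in (1.0, 6.0):
        hours = worst_case_replay_hours(HOURS_PER_WEEK, slowdown)
        print(f"slowdown {slowdown:.0f}x: {hours:.0f} h (~{hours / HOURS_PER_WEEK:.0f} week(s))")
```

Even with replay exactly as fast as the original transactions, weekly base backups mean up to 168 hours of replay. Any slowdown multiplies that figure.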

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
