Re: Redesigning checkpoint_segments

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2013-06-07 19:14:51
Message-ID: 1370632491.1630.YahooMailNeo@web162905.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> (As to why smaller checkpoint_segments can help, here's my guess:
> if checkpoint_segments is relatively small, then when we recycle
> a segment we're likely to find its data already in cache.  That's
> a lot better than reading it back in from disk just to overwrite
> the data.)

My recollection on this topic is that before pg_upgrade Wisconsin
Courts had to upgrade all of the geographically distributed
databases to a new PostgreSQL version, and that was being done with
pg_dump piped to psql in conjunction with the rollout of new
hardware (according to the four-year replacement policy).  The
upgrade process involved a DBA staying late centrally while the
conversion ran, a field tech staying late on the client site to
haul off the old box once successful conversion was confirmed, a
business analyst staying late to confirm proper operation after the
conversion, and a web programmer staying late to confirm that all
web interfaces showed proper data flow post-conversion.  Every
minute shaved off of the upgrade process saved a lot of staff time,
so the DBA team tested the conversion process very carefully.

Some findings were unsurprising, like that a direct connection
between the servers using a cross-wired network patch cable was
faster than plugging both machines into the same switch.  But we
tested all of our assumptions, and re-tested the surprising ones.
One such surprise was that the conversion ran faster, even on a
"largish" database of around 200GB, with 3 checkpoint_segments than
with larger settings.  The difference was significant and
repeatable. My personal theory was that segments were being
recycled and overwritten while still in the battery-backed
controller cache, so writes from multiple cycles evaporated in the
cache, reducing total physical disk writes.  Greg Smith blew that
theory out of the water by finding the same behavior on his laptop,
which did not have a write-back cache.  AFAIK, this mystery remains
unsolved, although Robert's idea above sounds plausible.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2013-06-07 19:15:57 Re: Freezing without write I/O
Previous Message Josh Berkus 2013-06-07 19:14:32 Re: Hard limit on WAL space used (because PANIC sucks)