Re: checkpoints are duplicated even while the system is idle

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpoints are duplicated even while the system is idle
Date: 2011-10-06 15:55:37
Message-ID: CA+U5nMJyQ+E6Bxfz4_KrQ481ThfaifTAqU96ntXrD_uwimSB0w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> While the system is idle, we skip duplicate checkpoints for some
> reasons. But when wal_level is set to hot_standby, I found that
> checkpoints are wrongly duplicated even while the system is idle.
> The cause is that XLOG_RUNNING_XACTS WAL record always
> follows CHECKPOINT one when wal_level is set to hot_standby.
> So the subsequent checkpoint wrongly thinks that there is inserted
> record (i.e., XLOG_RUNNING_XACTS record) since the start of the
> last checkpoint, the system is not idle, and this checkpoint cannot
> be skipped. Is this intentional behavior? Or a bug?

I think it is avoidable behaviour, but not a bug.

Thinking some more about this, IMHO it is possible to improve the
situation greatly by returning to look at the true purpose of
checkpoints. Checkpoints exist to minimise the time taken during crash
recovery, and as starting points for backups/archive recoveries.

The current idea is that if there has been no activity then we skip
the checkpoint. But all it takes is a single WAL record and off we go
with another checkpoint. If there hasn't been much WAL activity, there
is not much point in writing another checkpoint record, since there is
little if any time to be saved in recovery.

So why not avoid checkpoints until we have written at least 1 WAL file
worth of data? That way checkpoint records are always in different
files, so we are safer with regard to primary and secondary checkpoint
records. Would that mean that in some cases dirty data could stay in
shared buffers for days or weeks? No, because the bgwriter would clean
it. But even if it did stay, so what? Recovery would still be
incredibly quick, which is the whole point.

Testing whether we're in a different segment is easy and much simpler
than trying to wriggle around trying to directly fix the problem you
mention. Patch attached.
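
To illustrate the kind of test meant here, below is a minimal
standalone sketch. It is not the attached patch and does not use the
real xlog.c data structures: the WalPos type, the function names, and
the fixed 16 MB segment size are all assumptions made for the example.
A WAL position is modelled as a plain byte offset, and a checkpoint is
skipped while the insert position is still in the same segment file as
the last checkpoint record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT the attached patch. */
typedef uint64_t WalPos;        /* byte offset into the WAL stream */

/* Assumed segment size (16 MB) for the sake of the example. */
#define WAL_SEGMENT_SIZE ((WalPos) 16 * 1024 * 1024)

/* Number of the WAL segment file that contains byte position pos. */
static uint64_t
wal_segment_of(WalPos pos)
{
    return pos / WAL_SEGMENT_SIZE;
}

/*
 * Return true if the insert position is still in the same segment as
 * the last checkpoint record, i.e. the next checkpoint may be skipped.
 */
static bool
checkpoint_can_be_skipped(WalPos last_checkpoint, WalPos insert_pos)
{
    /* WAL only grows, so the insert position cannot move backwards. */
    assert(insert_pos >= last_checkpoint);

    return wal_segment_of(insert_pos) == wal_segment_of(last_checkpoint);
}
```

Note that a test like this looks at segment boundaries rather than
counting bytes, so a checkpoint record written near the end of a
segment stops blocking the next checkpoint as soon as the insert
position crosses into the following file.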

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
spaced_checkpoints.v1.patch application/octet-stream 2.3 KB
