Quick Links

checkpoints are duplicated even while the system is idle

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	checkpoints are duplicated even while the system is idle
Date:	2011-10-05 05:19:44
Message-ID:	CAHGQGwFEjkNRMTjyvVtYWhuTJkDYTQBb95Oebm4yMXd_9CBx7A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

While the system is idle, we skip duplicate checkpoints for some
reasons. But when wal_level is set to hot_standby, I found that
checkpoints are wrongly duplicated even while the system is idle.
The cause is that XLOG_RUNNING_XACTS WAL record always
follows CHECKPOINT one when wal_level is set to hot_standby.
So the subsequent checkpoint wrongly thinks that there is inserted
record (i.e., XLOG_RUNNING_XACTS record) since the start of the
last checkpoint, the system is not idle, and this checkpoint cannot
be skipped. Is this intentional behavior? Or a bug?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 14:09:47
Message-ID:	CA+Tgmoamm58oSvqAAqUsOi2Leq55iQGA8-HWF=-iCcRARKZLhw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 5, 2011 at 1:19 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> While the system is idle, we skip duplicate checkpoints for some
> reasons. But when wal_level is set to hot_standby, I found that
> checkpoints are wrongly duplicated even while the system is idle.
> The cause is that XLOG_RUNNING_XACTS WAL record always
> follows CHECKPOINT one when wal_level is set to hot_standby.
> So the subsequent checkpoint wrongly thinks that there is inserted
> record (i.e., XLOG_RUNNING_XACTS record) since the start of the
> last checkpoint, the system is not idle, and this checkpoint cannot
> be skipped. Is this intentional behavior? Or a bug?

If we can eliminate it in some cases where it isn't necessary, that
seems like a good thing, but I'm not sure I have a good handle on when
it is or isn't necessary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 15:55:37
Message-ID:	CA+U5nMJyQ+E6Bxfz4_KrQ481ThfaifTAqU96ntXrD_uwimSB0w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> While the system is idle, we skip duplicate checkpoints for some
> reasons. But when wal_level is set to hot_standby, I found that
> checkpoints are wrongly duplicated even while the system is idle.
> The cause is that XLOG_RUNNING_XACTS WAL record always
> follows CHECKPOINT one when wal_level is set to hot_standby.
> So the subsequent checkpoint wrongly thinks that there is inserted
> record (i.e., XLOG_RUNNING_XACTS record) since the start of the
> last checkpoint, the system is not idle, and this checkpoint cannot
> be skipped. Is this intentional behavior? Or a bug?

I think it is avoidable behaviour, but not a bug.

Thinking some more about this, IMHO it is possible to improve the
situation greatly by returning to look at the true purpose of
checkpoints. Checkpoints exist to minimise the time taken during crash
recovery, and as starting points for backups/archive recoveries.

The current idea is that if there has been no activity then we skip
checkpoint. But all it takes is a single WAL record and off we go with
another checkpoint. If there hasn't been much WAL activity, there is
not much point in having another checkpoint record since there is
little if any time to be saved in recovery.

So why not avoid checkpoints until we have written at least 1 WAL file
worth of data? That way checkpoint records are always in different
files, so we are safer with regard to primary and secondary checkpoint
records. That would mean in some cases that dirty data would stay in
shared buffers for days or weeks? No, because the bgwriter would clean
it - but even if it did, so what? Recovery will still be incredibly
quick, which is the whole point.

Testing whether we're in a different segment is easy and much simpler
than trying to wriggle around trying to directly fix the problem you
mention. Patch attached.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
spaced_checkpoints.v1.patch	application/octet-stream	2.3 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 16:06:04
Message-ID:	3356.1317917164@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> The current idea is that if there has been no activity then we skip
> checkpoint. But all it takes is a single WAL record and off we go with
> another checkpoint. If there hasn't been much WAL activity, there is
> not much point in having another checkpoint record since there is
> little if any time to be saved in recovery.

> So why not avoid checkpoints until we have written at least 1 WAL file
> worth of data?

+1, but I think you need to compare to the last checkpoint's REDO
pointer, not to the position of the checkpoint record itself.
Otherwise, the argument falls down if there was a lot of activity
during the last checkpoint (which is not unlikely in these days of
spread checkpoints).

Also I think the comment needs more extensive revision than you gave it.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 16:32:27
Message-ID:	CA+TgmobY8Y+BU=BnhpBuYPC00PxfsMBMTsJmng67GF=HAWGk3A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 12:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> The current idea is that if there has been no activity then we skip
>> checkpoint. But all it takes is a single WAL record and off we go with
>> another checkpoint. If there hasn't been much WAL activity, there is
>> not much point in having another checkpoint record since there is
>> little if any time to be saved in recovery.
>
>> So why not avoid checkpoints until we have written at least 1 WAL file
>> worth of data?
>
> +1, but I think you need to compare to the last checkpoint's REDO
> pointer, not to the position of the checkpoint record itself.
> Otherwise, the argument falls down if there was a lot of activity
> during the last checkpoint (which is not unlikely in these days of
> spread checkpoints).
>
> Also I think the comment needs more extensive revision than you gave it.

If we go with this approach, we presumably also need to update the
documentation, especially for checkpoint_timeout (which will no longer
be a hard limit on the time between checkpoints).

I'm not entirely sure I understand the rationale, though. I mean, if
very little has happened since the last checkpoint, then the
checkpoint will be very cheap. In the totally degenerate case Fujii
Masao is reporting, where absolutely nothing has happened, it should
be basically free. We'll loop through a whole bunch of things, decide
there's nothing to fsync, and call it a day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 16:44:53
Message-ID:	3800.1317919493@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I'm not entirely sure I understand the rationale, though. I mean, if
> very little has happened since the last checkpoint, then the
> checkpoint will be very cheap. In the totally degenerate case Fujii
> Masao is reporting, where absolutely nothing has happened, it should
> be basically free. We'll loop through a whole bunch of things, decide
> there's nothing to fsync, and call it a day.

I think the point is that a totally idle database should not continue to
emit WAL, not even at a slow rate. There are also power-consumption
objections to allowing the checkpoint process to fire up to no purpose.

The larger issue is that we should not only be concerned with optimizing
for high load. Doing nothing when there's nothing to be done is an
important part of good data-center citizenship. See also the ongoing
work to avoid unnecessary process wakeups.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 16:54:53
Message-ID:	CA+TgmoYANByJr-ogfu9otHfk-BQR8zmqgY_8afohBOLDzgxczw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm not entirely sure I understand the rationale, though. I mean, if
>> very little has happened since the last checkpoint, then the
>> checkpoint will be very cheap. In the totally degenerate case Fujii
>> Masao is reporting, where absolutely nothing has happened, it should
>> be basically free. We'll loop through a whole bunch of things, decide
>> there's nothing to fsync, and call it a day.
>
> I think the point is that a totally idle database should not continue to
> emit WAL, not even at a slow rate. There are also power-consumption
> objections to allowing the checkpoint process to fire up to no purpose.

Hmm, OK. I still think it's a little funny to say that
checkpoint_timeout will force a checkpoint every N minutes except when
it doesn't, but maybe there's no real harm in that as long as we
document it properly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 17:20:01
Message-ID:	CA+U5nMLZp=idWZBh3K9vPYVow8iCpg5iMeodDRpA_5pbibu_XQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 5:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> The current idea is that if there has been no activity then we skip
>> checkpoint. But all it takes is a single WAL record and off we go with
>> another checkpoint. If there hasn't been much WAL activity, there is
>> not much point in having another checkpoint record since there is
>> little if any time to be saved in recovery.
>
>> So why not avoid checkpoints until we have written at least 1 WAL file
>> worth of data?
>
> +1, but I think you need to compare to the last checkpoint's REDO
> pointer, not to the position of the checkpoint record itself.
> Otherwise, the argument falls down if there was a lot of activity
> during the last checkpoint (which is not unlikely in these days of
> spread checkpoints).

Agreed, that's better.

> Also I think the comment needs more extensive revision than you gave it.

OK, will do.

New version attached. Docs later.

Do we want this backpatched? If so, suggest just 9.1 and 9.0?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
spaced_checkpoints.v2.patch	application/octet-stream	2.7 KB

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 17:26:56
Message-ID:	4E8D9E900200002500041B81@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>> I think the point is that a totally idle database should not
>> continue to emit WAL, not even at a slow rate. There are also
>> power-consumption objections to allowing the checkpoint process
>> to fire up to no purpose.
>
> Hmm, OK. I still think it's a little funny to say that
> checkpoint_timeout will force a checkpoint every N minutes except
> when it doesn't, but maybe there's no real harm in that as long as
> we document it properly.

What will be the best way to determine, from looking only at a
standby, whether replication is healthy and progressing? Going back
to the 8.1 warm standby days we have run pg_controldata and compared
the "Time of latest checkpoint" to current time; I'm hoping there's
a better way now, so that we can drop that kludge. If not, I would
like a way to kick out a checkpoint from the master at some finite
and configurable interval for monitoring purposes. I'm all for
having a nice, sharp fillet knife, but if that's not available,
please don't take away this rock I chipped at until it had an
edge.

Most likely there's something there which I've missed, but it's
really nice to be able to tell the difference between a quiet master
and broken replication.

-Kevin

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 17:56:21
Message-ID:	4671.1317923781@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> Do we want this backpatched? If so, suggest just 9.1 and 9.0?

-1 for backpatching; it's more an improvement than a bug fix.

In any case, I think we still need to respond to the point Kevin made
about how to tell an idle master from broken replication. Right now,
you will get at least a few bytes of data every checkpoint_timeout
seconds. If we change this, you won't.

I'm inclined to think that the way to deal with that is not to force out
useless WAL data, but to add some sort of explicit "I'm alive" heartbeat
signal to the walsender/walreceiver protocol. The hard part of that is
to figure out how to expose it where you can see it on the slave side
--- or do we have a status view that could handle that?

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 17:58:50
Message-ID:	4706.1317923930@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think the point is that a totally idle database should not continue to
>> emit WAL, not even at a slow rate. There are also power-consumption
>> objections to allowing the checkpoint process to fire up to no purpose.

> Hmm, OK. I still think it's a little funny to say that
> checkpoint_timeout will force a checkpoint every N minutes except when
> it doesn't, but maybe there's no real harm in that as long as we
> document it properly.

Well ... if we think that it's sane to only checkpoint once per WAL
segment, maybe we should just take out checkpoint_timeout.

We'd need some other mechanism to address replication use-cases, but see
my comments to Simon's followup patch just now.

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:05:16
Message-ID:	CA+U5nMJ2gnW8KG4rDfTT4E+vqr20Oddv1v-J_oJbgCQTi6GfAg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 6:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> Do we want this backpatched? If so, suggest just 9.1 and 9.0?
>
> -1 for backpatching; it's more an improvement than a bug fix.

OK, works for me.

> In any case, I think we still need to respond to the point Kevin made
> about how to tell an idle master from broken replication. Right now,
> you will get at least a few bytes of data every checkpoint_timeout
> seconds. If we change this, you won't.

> I'm inclined to think that the way to deal with that is not to force out
> useless WAL data, but to add some sort of explicit "I'm alive" heartbeat
> signal to the walsender/walreceiver protocol. The hard part of that is
> to figure out how to expose it where you can see it on the slave side
> --- or do we have a status view that could handle that?

Different but related issue and yes, am on it, and yes, the way you just said.

I foresee a function that tells you the delay based on a protocol
message of 'k' for keepalive.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:08:20
Message-ID:	4E8DEE94.7080507@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 06.10.2011 20:58, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Oct 6, 2011 at 12:44 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I think the point is that a totally idle database should not continue to
>>> emit WAL, not even at a slow rate. There are also power-consumption
>>> objections to allowing the checkpoint process to fire up to no purpose.
>
>> Hmm, OK. I still think it's a little funny to say that
>> checkpoint_timeout will force a checkpoint every N minutes except when
>> it doesn't, but maybe there's no real harm in that as long as we
>> document it properly.
>
> Well ... if we think that it's sane to only checkpoint once per WAL
> segment, maybe we should just take out checkpoint_timeout.

Huh? Surely not, in my mind checkpoint_timeout is the primary way of
controlling checkpoints, and checkpoint_segments you just set "high
enough" so that you never reach it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:18:29
Message-ID:	CA+TgmoZTnqk8vpODJJqW_Q3R0y1eSnoN3Y+LH6J=cyqq_cch4Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> Do we want this backpatched? If so, suggest just 9.1 and 9.0?
>
> -1 for backpatching; it's more an improvement than a bug fix.
>
> In any case, I think we still need to respond to the point Kevin made
> about how to tell an idle master from broken replication. Right now,
> you will get at least a few bytes of data every checkpoint_timeout
> seconds. If we change this, you won't.
>
> I'm inclined to think that the way to deal with that is not to force out
> useless WAL data, but to add some sort of explicit "I'm alive" heartbeat
> signal to the walsender/walreceiver protocol. The hard part of that is
> to figure out how to expose it where you can see it on the slave side
> --- or do we have a status view that could handle that?

As of 9.1, we already have something very much like this, in the
opposite direction. See wal_receiver_status_interval and
replication_timeout. I bet we could adapt that slightly to work in
the other direction, too. But that'll only work with streaming
replication - do we care about the WAL shipping case?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:26:50
Message-ID:	4964.1317925610@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm inclined to think that the way to deal with that is not to force out
>> useless WAL data, but to add some sort of explicit "I'm alive" heartbeat
>> signal to the walsender/walreceiver protocol. The hard part of that is
>> to figure out how to expose it where you can see it on the slave side
>> --- or do we have a status view that could handle that?

> As of 9.1, we already have something very much like this, in the
> opposite direction. See wal_receiver_status_interval and
> replication_timeout. I bet we could adapt that slightly to work in
> the other direction, too. But that'll only work with streaming
> replication - do we care about the WAL shipping case?

I can't get excited about the WAL-shipping case. The artifact that we'll
generate a checkpoint record every few minutes does not create enough
WAL volume for WAL-shipping to reliably generate a heartbeat signal.
It'd be way too long between filling up segments if that were the only
WAL data being generated.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:42:03
Message-ID:	CA+TgmoZsSgTwr42JDGCO3HuLyNaLu4BEBiakH+DUmT94ZEP_NQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 2:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Oct 6, 2011 at 1:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I'm inclined to think that the way to deal with that is not to force out
>>> useless WAL data, but to add some sort of explicit "I'm alive" heartbeat
>>> signal to the walsender/walreceiver protocol. The hard part of that is
>>> to figure out how to expose it where you can see it on the slave side
>>> --- or do we have a status view that could handle that?
>
>> As of 9.1, we already have something very much like this, in the
>> opposite direction. See wal_receiver_status_interval and
>> replication_timeout. I bet we could adapt that slightly to work in
>> the other direction, too. But that'll only work with streaming
>> replication - do we care about the WAL shipping case?
>
> I can't get excited about the WAL-shipping case. The artifact that we'll
> generate a checkpoint record every few minutes does not create enough
> WAL volume for WAL-shipping to reliably generate a heartbeat signal.
> It'd be way too long between filling up segments if that were the only
> WAL data being generated.

Depends how you set archive_timeout, I think.

Anyway, I'm not saying we *have* to care about that case. I'm just
asking whether we do, before we go start changing things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 18:46:29
Message-ID:	4E8DB1350200002500041B89@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> I foresee a function that tells you the delay based on a protocol
> message of 'k' for keepalive.

If the delay you mention is basically a "ping" time or something
similar, that would answer the need I've been on about. We need to
know, based on access to the replica, that the replication system is
alive and well even with an idle master; as it can otherwise be hard
to distinguish an idle master from a failed replication system
(broken connection, misconfigured replication, etc.).

Right now there is a periodic checkpoint which flows through the WAL
and affects the pg_controldata report on the replica -- we've been
using that for monitoring. Any sort of heartbeat or ping which
provides sign-of-life on the connection, accessible on the replica,
should do for our purposes. If it only works over streaming
replication on a hot standby, that's OK -- we plan to be running
everything that way before 9.2 comes out, as long as we can
materialize "traditional" WAL files on the receiving end from the SR
stream.

-1 on changing the checkpoint behavior before 9.2, though.

-Kevin

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: checkpoints are duplicated even while the system is idle
Date:	2011-10-06 19:11:57
Message-ID:	CA+U5nMJH3YYmKfCq1uaZTViEE4Cz3UjezhwevsxhVTKfTMdNwQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Oct 6, 2011 at 7:18 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> As of 9.1, we already have something very much like this, in the
> opposite direction.

Yes Robert, I wrote it.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services