Quick Links

Re: Streaming replication status

Lists:	pgsql-hackers

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Streaming replication status
Date:	2010-01-08 21:16:12
Message-ID:	4B47A09C.8070704@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've gone through the patch in detail now. Here's my list of remaining
issues:

* If there's no WAL to send, walsender doesn't notice if the client has
closed connection already. This is the issue Fujii reported already.
We'll need to add a select() call to the walsender main loop to check if
the socket has been closed.

* I removed the feature that archiver was started during recovery. The
idea of that was to enable archiving from a standby server, to relieve
the master server of that duty, but I found it annoying because it
causes trouble if the standby and master are configured to archive to
the same location; they will fight over which copies the file to the
archive first. Frankly the feature doesn't seem very useful as the patch
stands, because you still have to configure archiving in the master in
practice; you can't take an online base backup otherwise, and you have
the risk of standby falling too much behind and having to restore from
base backup whenever the standby is disconnected for any reason. Let's
revisit this later when it's truly useful.

* We still have a related issue, though: if standby is configured to
archive to the same location as master (as it always is on my laptop,
where I use the postgresql.conf of the master unmodified in the server),
right after failover the standby server will try to archive all the old
WAL files that were streamed from the master; but they exist already in
the archive, as the master archived them already. I'm not sure if this
is a pilot error, or if we should do something in the server to tell
apart WAL segments streamed from master and those generated in the
standby server after failover. Maybe we should immediately create a
.done file for every file received from master?

* I don't think we should require superuser rights for replication.
Although you see all WAL and potentially all data in the system through
that, a standby doesn't need any write access to the master, so it would
be good practice to create a dedicated account with limited privileges
for replication.

* A standby that connects to master, initiates streaming, and then sits
idle without stalls recycling of old WAL files in the master. That will
eventually lead to a full disk in master. Do we need some kind of a
emergency valve on that?

* Do we really need REPLICATION_DEBUG_ENABLED? The output doesn't seem
very useful to me.

* Need to add comments somewhere to note that ReadRecord depends on the
fact that a WAL record is always send as whole, never split across two
messages.

* Do we really need to split the sleep in walsender to NAPTIME_PER_CYCLE
increments?

* Walreceiver should flush less aggresively than after each received
piece of WAL as noted by XXX comment.

* Consider renaming PREPARE_REPLICATION to IDENTIFY_SYSTEM or something.

* What's the change in bgwriter.c for?

* ReadRecord/FetchRecord is a bit of mess. I earlier tried to refactor
it into something simpler a couple of times, but failed. So I'm going to
leave it as it is, but if someone else wants to give it a shot, that
would be good.

* Documentation. The patch used to move around some sections, but I
think that has been partially reverted so that it now just duplicates
them. It probably needs other work too, I haven't looked at the docs in
any detail.

These are all the issues I know of right now. Assuming no new issues
crop up (which often does happen), the patch is ready for committing
after those have been addressed.

Attached is my latest version as a patch, also available in my git
repository.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
replication-20100108.patch.gz	application/x-gzip	50.1 KB

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-08 22:20:47
Message-ID:	4B47AFBF.2030704@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
> * A standby that connects to master, initiates streaming, and then sits
> idle without stalls recycling of old WAL files in the master. That will
> eventually lead to a full disk in master. Do we need some kind of a
> emergency valve on that?

WARNING: I haven't thought about how this would work together with HS yes.

I think this needs to be administrator-configurable.

I'd suggest a GUC approach:

archiving_lag_action = { ignore, shutdown, stop }

"Ignore" would be the default. Some users would rather have the master
shut down if the slave has stopped taking segments; that's "shutdown".
Otherwise, it's "stop" which simply stops archiving and starts recylcing
when we reach that number of segments.

Better name for the GUC very welcome ...

--Josh Berkus

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-09 01:38:22
Message-ID:	407d949e1001081738j6a5217f8we2ec5d05c965d685@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jan 8, 2010 at 9:16 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> * We still have a related issue, though: if standby is configured to
> archive to the same location as master (as it always is on my laptop,
> where I use the postgresql.conf of the master unmodified in the server),
> right after failover the standby server will try to archive all the old
> WAL files that were streamed from the master; but they exist already in
> the archive, as the master archived them already. I'm not sure if this
> is a pilot error, or if we should do something in the server to tell
> apart WAL segments streamed from master and those generated in the
> standby server after failover. Maybe we should immediately create a
> .done file for every file received from master?

How do we know the master has finished archiving them? If the master
crashes suddenly and you fail over couldn't it have failed to archive
segments that have been received by the standby via streaming
replication?

> * Need to add comments somewhere to note that ReadRecord depends on the
> fact that a WAL record is always send as whole, never split across two
> messages.

What happens in the case of the very large records Tom was describing
recently. If the entire record doesn't fit in a WAL segment is it the
whole record or the partial record with the continuation bit that
needs to fit in a message?

--
greg

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-09 06:49:13
Message-ID:	3f0b79eb1001082249r2c410f5q8b1386fc8c765f61@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 9, 2010 at 6:16 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've gone through the patch in detail now. Here's my list of remaining
> issues:

Great! Thanks a lot!

> * If there's no WAL to send, walsender doesn't notice if the client has
> closed connection already. This is the issue Fujii reported already.
> We'll need to add a select() call to the walsender main loop to check if
> the socket has been closed.

We should reactivate pq_wait() and secure_poll()?

> * I removed the feature that archiver was started during recovery. The
> idea of that was to enable archiving from a standby server, to relieve
> the master server of that duty, but I found it annoying because it
> causes trouble if the standby and master are configured to archive to
> the same location; they will fight over which copies the file to the
> archive first. Frankly the feature doesn't seem very useful as the patch
> stands, because you still have to configure archiving in the master in
> practice; you can't take an online base backup otherwise, and you have
> the risk of standby falling too much behind and having to restore from
> base backup whenever the standby is disconnected for any reason. Let's
> revisit this later when it's truly useful.

Okey.

There is no guarantee that such file has already been archived by master.
This is just an idea, but new WAL record indicating the completion of the
archiving would be useful for the standby to create .done file. But, this idea
might kill the "archiving during recovery" idea discussed above.

Personally, I'm OK with that issue because we can avoid it by tweaking
archive_command. Could we revisit this discussion with the "archiving
during recovery" discussion later?

> * I don't think we should require superuser rights for replication.
> Although you see all WAL and potentially all data in the system through
> that, a standby doesn't need any write access to the master, so it would
> be good practice to create a dedicated account with limited privileges
> for replication.

Okey to just drop the superuser() check from walsender.c.

> * A standby that connects to master, initiates streaming, and then sits
> idle without stalls recycling of old WAL files in the master. That will
> eventually lead to a full disk in master. Do we need some kind of a
> emergency valve on that?

I think that we need the GUC parameter to specify the maximum number
of log file segments held in pg_xlog directory to send to the standby server.
The replication to the standby which falls more than that GUC value behind
is just terminated.
http://archives.postgresql.org/pgsql-hackers/2009-12/msg01901.php

> * Do we really need REPLICATION_DEBUG_ENABLED? The output doesn't seem
> very useful to me.

This was useful for me to debug the code. But, right now, Okey to drop it.

> * Need to add comments somewhere to note that ReadRecord depends on the
> fact that a WAL record is always send as whole, never split across two
> messages.

Okey.

> * Do we really need to split the sleep in walsender to NAPTIME_PER_CYCLE
> increments?

Yes. It's required for some platforms (probably HP-UX) in which signals
cannot interrupt the sleep.

> * Walreceiver should flush less aggresively than after each received
> piece of WAL as noted by XXX comment.

> * XXX: Flushing after each received message is overly aggressive. Should
> * implement some sort of lazy flushing. Perhaps check in the main loop
> * if there's any more messages before blocking and waiting for one, and
> * flush the WAL if there isn't, just blocking.

In this approach, if messages continuously arrive from master, the fsync
would be delayed until WAL segment is switched. Likewise, recovery also
would be delayed, which seems to be problem.

How about the straightforward approach; let the process which wants to
flush the buffer send the fsync-request to walreceiver and wait until WAL
is flushed up to the buffer's LSN?

> * Consider renaming PREPARE_REPLICATION to IDENTIFY_SYSTEM or something.

Okey.

> * What's the change in bgwriter.c for?

It's for the bgwriter to know the current timeline for recycling the WAL files.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-09 06:53:48
Message-ID:	3f0b79eb1001082253x3a35d5dala105a63497f76f81@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 9, 2010 at 10:38 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
>> * Need to add comments somewhere to note that ReadRecord depends on the
>> fact that a WAL record is always send as whole, never split across two
>> messages.
>
> What happens in the case of the very large records Tom was describing
> recently. If the entire record doesn't fit in a WAL segment is it the
> whole record or the partial record with the continuation bit that
> needs to fit in a message?

It's the partial record with the continuation bit.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-09 07:25:31
Message-ID:	4B482F6B.80900@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
> On Sat, Jan 9, 2010 at 6:16 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> * If there's no WAL to send, walsender doesn't notice if the client has
>> closed connection already. This is the issue Fujii reported already.
>> We'll need to add a select() call to the walsender main loop to check if
>> the socket has been closed.
>
> We should reactivate pq_wait() and secure_poll()?

I don't think we need all that, a simple select() should be enough.
Though I must admit I'm not very familiar with select/poll().

>> * We still have a related issue, though: if standby is configured to
>> archive to the same location as master (as it always is on my laptop,
>> where I use the postgresql.conf of the master unmodified in the server),
>> right after failover the standby server will try to archive all the old
>> WAL files that were streamed from the master; but they exist already in
>> the archive, as the master archived them already. I'm not sure if this
>> is a pilot error, or if we should do something in the server to tell
>> apart WAL segments streamed from master and those generated in the
>> standby server after failover. Maybe we should immediately create a
>> .done file for every file received from master?
>
> There is no guarantee that such file has already been archived by master.
> This is just an idea, but new WAL record indicating the completion of the
> archiving would be useful for the standby to create .done file. But, this idea
> might kill the "archiving during recovery" idea discussed above.
>
> Personally, I'm OK with that issue because we can avoid it by tweaking
> archive_command. Could we revisit this discussion with the "archiving
> during recovery" discussion later?

Ok. The workaround is to configure standby to archive to a different
location. If you need to restore from that, you'll need to stitch
together the logs from the old master and the new one.

>> * A standby that connects to master, initiates streaming, and then sits
>> idle without stalls recycling of old WAL files in the master. That will
>> eventually lead to a full disk in master. Do we need some kind of a
>> emergency valve on that?
>
> I think that we need the GUC parameter to specify the maximum number
> of log file segments held in pg_xlog directory to send to the standby server.
> The replication to the standby which falls more than that GUC value behind
> is just terminated.
> http://archives.postgresql.org/pgsql-hackers/2009-12/msg01901.php

Oh yes, sounds good.

>> * Do we really need to split the sleep in walsender to NAPTIME_PER_CYCLE
>> increments?
>
> Yes. It's required for some platforms (probably HP-UX) in which signals
> cannot interrupt the sleep.

I'm thinking that the wal_sender_delay is so small that maybe it's not
worth worrying about.

>> * Walreceiver should flush less aggresively than after each received
>> piece of WAL as noted by XXX comment.
>
>> * XXX: Flushing after each received message is overly aggressive. Should
>> * implement some sort of lazy flushing. Perhaps check in the main loop
>> * if there's any more messages before blocking and waiting for one, and
>> * flush the WAL if there isn't, just blocking.
>
> In this approach, if messages continuously arrive from master, the fsync
> would be delayed until WAL segment is switched. Likewise, recovery also
> would be delayed, which seems to be problem.

That seems OK to me. If messages are really coming in that fast,
fsyncing the whole WAL segment at a time is probably most efficient.

But if that really is too much, you could still do extra flushes within
XLogRecv() every few megabytes for example.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 11:13:35
Message-ID:	1263122015.19367.139042.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-01-08 at 23:16 +0200, Heikki Linnakangas wrote:

Agreed

That sounds like the right thing to do.

Agreed. I think we should have a predefined user, called "replication"
that has only the correct rights.

Can you explain how this could occur? My understanding was that the
walreceiver and startup processes were capable of independent action
specifically to avoid for this kind of effect.

> * Documentation. The patch used to move around some sections, but I
> think that has been partially reverted so that it now just duplicates
> them. It probably needs other work too, I haven't looked at the docs in
> any detail.

I believe the docs need urgent attention. We need more people to read
the docs and understand the implications so that people can then
comment. It is extremely non-obvious from the patch how things work at a
behaviour level.

I am very concerned that there is no thought given to monitoring
replication. This will make the feature difficult to use in practice.

--
Simon Riggs www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 11:17:41
Message-ID:	1263122262.19367.139091.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-01-08 at 14:20 -0800, Josh Berkus wrote:
> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
> > * A standby that connects to master, initiates streaming, and then sits
> > idle without stalls recycling of old WAL files in the master. That will
> > eventually lead to a full disk in master. Do we need some kind of a
> > emergency valve on that?
>
> WARNING: I haven't thought about how this would work together with HS yes.

I've been reviewing things as we go along, so I'm not that tense
overall. Having said that I don't understand why the problem above would
occur and the sentence seems to be missing a verb between "without" and
"stalls". More explanation please.

What could happen is that the standby could slowly lag behind master. We
don't have any way of monitoring that, as yet. Setting ps display is not
enough here.

--
Simon Riggs www.2ndQuadrant.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 16:40:37
Message-ID:	4B4A0305.4040905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Fri, 2010-01-08 at 14:20 -0800, Josh Berkus wrote:
>> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
>>> * A standby that connects to master, initiates streaming, and then sits
>>> idle without stalls recycling of old WAL files in the master. That will
>>> eventually lead to a full disk in master. Do we need some kind of a
>>> emergency valve on that?
>> WARNING: I haven't thought about how this would work together with HS yes.
>
> I've been reviewing things as we go along, so I'm not that tense
> overall. Having said that I don't understand why the problem above would
> occur and the sentence seems to be missing a verb between "without" and
> "stalls". More explanation please.

Yeah, that sentence was broken.

> What could happen is that the standby could slowly lag behind master.

Right, that's what I'm worried about. In the worst case it the
walreceiver process in the standby might stall completely for some
reason, e.g hardware problem or SIGSTOP by an administrator.

> We
> don't have any way of monitoring that, as yet. Setting ps display is not
> enough here.

Yeah, monitoring would be nice too. But what I was wondering is whether
we need some way of stopping that from filling the disk in master.
(Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
in the master for standbys feels good to me).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 17:40:54
Message-ID:	1263145254.19367.143332.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2010-01-10 at 18:40 +0200, Heikki Linnakangas wrote:

> > We
> > don't have any way of monitoring that, as yet. Setting ps display is not
> > enough here.
>
> Yeah, monitoring would be nice too. But what I was wondering is whether
> we need some way of stopping that from filling the disk in master.
> (Fujii-san's suggestion of a GUC to set the max. amount of WAL to keep
> in the master for standbys feels good to me).

OK, now I got you. I thought that was already agreed; guess it is now.

We need monitoring anywhere we have a max_* parameter. Otherwise we
won't know how close we are to disaster until we hit the limit and
things break down. Otherwise we will have to set parameters by trial and
error, or set them so high they are meaningless.

--
Simon Riggs www.2ndQuadrant.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 20:10:29
Message-ID:	4B4A3435.6070106@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> We need monitoring anywhere we have a max_* parameter. Otherwise we
> won't know how close we are to disaster until we hit the limit and
> things break down. Otherwise we will have to set parameters by trial and
> error, or set them so high they are meaningless.

I agree.

Thing is, though, we have a de-facto max already ... when pgxlog runs
out of disk space. And no monitoring *in postgresql* for that, although
obviously you can use OS monitoring for it.

I'm saying, even for plain PITR, it would be an improvement in
manageablity if the DBA could set a maximum number of checkpoint
segments before replication is abandonded or the master shuts down.
It's something we've been missing.

--Josh Berkus

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 22:30:53
Message-ID:	1263162653.19367.146739.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2010-01-10 at 12:10 -0800, Josh Berkus wrote:
> > We need monitoring anywhere we have a max_* parameter. Otherwise we
> > won't know how close we are to disaster until we hit the limit and
> > things break down. Otherwise we will have to set parameters by trial and
> > error, or set them so high they are meaningless.
>
> I agree.
>
> Thing is, though, we have a de-facto max already ... when pgxlog runs
> out of disk space.

What I mean is this: The purpose of monitoring is to avoid bad things
happening by being able to predict that a bad thing will happen before
it actually does happen. Cars have windows to allow us to see we are
about to hit something.

> And no monitoring *in postgresql* for that, although
> obviously you can use OS monitoring for it.

PostgreSQL doesn't need to monitor that. If the user wants to avoid
out-of-space they can write a script to monitor files/space. The info is
accessible, if you wish to monitor it.

Currently there is no way of knowing what the average/current transit
time is on replication, no way of knowing what is happening if we go
idle etc.. Those things need to be included because they are not
otherwise accessible. Cars need windows, not just a finely tuned engine.

--
Simon Riggs www.2ndQuadrant.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-10 23:14:18
Message-ID:	4B4A5F4A.3010501@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Currently there is no way of knowing what the average/current transit
> time is on replication, no way of knowing what is happening if we go
> idle etc.. Those things need to be included because they are not
> otherwise accessible. Cars need windows, not just a finely tuned engine.

Like I said, I agree. I'm just pointing out that the monitoring
deficiency already exists whether or not we add a max_* parameter.

--Josh Berkus

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-11 08:36:22
Message-ID:	4B4AE306.3010203@postnewspapers.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 9/01/2010 6:20 AM, Josh Berkus wrote:
> On 1/8/10 1:16 PM, Heikki Linnakangas wrote:
>> * A standby that connects to master, initiates streaming, and then sits
>> idle without stalls recycling of old WAL files in the master. That will
>> eventually lead to a full disk in master. Do we need some kind of a
>> emergency valve on that?
>
> WARNING: I haven't thought about how this would work together with HS yes.
>
> I think this needs to be administrator-configurable.
>
> I'd suggest a GUC approach:
>
> archiving_lag_action = { ignore, shutdown, stop }
>
> "Ignore" would be the default. Some users would rather have the master
> shut down if the slave has stopped taking segments; that's "shutdown".
> Otherwise, it's "stop" which simply stops archiving and starts recylcing
> when we reach that number of segments.

IMO "stop" would be *really* bad without some sort of administrator
alert support (scream for help) and/or the ability to refresh the
slave's base backup when it started responding again. We'd start seeing
mailing list posts along the lines of "my master failed over to the
slave, and it's missing the last 3 months of data! Help!".

Personally, I'd be uncomfortable enabling something like that without
_both_ an admin alert _and_ the ability to refresh the slave's base
backup without admin intervention.

It'd also be necessary to define what exactly "lag" means here,
preferably in a way that doesn't generally need admin tuning for most
users. Ideally there'd be separate thresholds for "scream to the admin
for help, something's wrong" and "forced to act, slave is holding up the
master".

--
Craig Ringer

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 01:16:57
Message-ID:	201001120116.o0C1GvH15599@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> > * I don't think we should require superuser rights for replication.
> > Although you see all WAL and potentially all data in the system through
> > that, a standby doesn't need any write access to the master, so it would
> > be good practice to create a dedicated account with limited privileges
> > for replication.
>
> Agreed. I think we should have a predefined user, called "replication"
> that has only the correct rights.

I am concerned that knowledge of this new read-only replication user
would have to be spread all over the backend code, which is really not
something we should be doing at this stage in 8.5 development. I am
also thinking such a special user might fall out of work on mandatory
access control, so maybe we should just require super-user for 8.5 and
revisit this for 8.6.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 02:59:20
Message-ID:	3f0b79eb1001111859h7702e48eld7699dbc9e03dd40@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jan 9, 2010 at 4:25 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I don't think we need all that, a simple select() should be enough.
> Though I must admit I'm not very familiar with select/poll().

I'm not sure whether poll(2) should be called for this purpose. But
poll(2) and select(2) seem to often come together in the existing code.
We should follow such custom?

>>> * Do we really need to split the sleep in walsender to NAPTIME_PER_CYCLE
>>> increments?
>>
>> Yes. It's required for some platforms (probably HP-UX) in which signals
>> cannot interrupt the sleep.
>
> I'm thinking that the wal_sender_delay is so small that maybe it's not
> worth worrying about.

The same problem exists in walwriter.c, too. Though we can expect that
wal_writer_delay is small, its sleep has been broken down into smaller bit.
We should follow such existing code? Or just remove that feature from
walwriter?

>>> * Walreceiver should flush less aggresively than after each received
>>> piece of WAL as noted by XXX comment.
>>
>>> * XXX: Flushing after each received message is overly aggressive. Should
>>> * implement some sort of lazy flushing. Perhaps check in the main loop
>>> * if there's any more messages before blocking and waiting for one, and
>>> * flush the WAL if there isn't, just blocking.
>>
>> In this approach, if messages continuously arrive from master, the fsync
>> would be delayed until WAL segment is switched. Likewise, recovery also
>> would be delayed, which seems to be problem.
>
> That seems OK to me. If messages are really coming in that fast,
> fsyncing the whole WAL segment at a time is probably most efficient.

OK, I'll implement your idea. But that seems to be inefficient in the
synchronous replication (especially "wait WAL-replay" mode). So let's
revisit this discussion later.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 03:45:41
Message-ID:	3f0b79eb1001111945m1d9aa6b6n1a7e62705a00e9fe@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jan 10, 2010 at 8:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> What could happen is that the standby could slowly lag behind master. We
> don't have any way of monitoring that, as yet. Setting ps display is not
> enough here.

I agree that the statistical information about replication activity is
very useful. But I think that it's not an urgent issue. Shall we think
it later?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 03:59:30
Message-ID:	3f0b79eb1001111959m2978c507n59fd490828aa3a8f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 11, 2010 at 5:36 PM, Craig Ringer
<craig(at)postnewspapers(dot)com(dot)au> wrote:
> Personally, I'd be uncomfortable enabling something like that without _both_
> an admin alert _and_ the ability to refresh the slave's base backup without
> admin intervention.

What feature do you specifically need as an alert? Just writing
the warning into the logfile is enough? Or need to notify by
using SNMP trap message? Though I'm not sure if this is a role
of Postgres.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 04:21:29
Message-ID:	4B4BF8C9.9030107@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
> On Sun, Jan 10, 2010 at 8:17 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
>> What could happen is that the standby could slowly lag behind master. We
>> don't have any way of monitoring that, as yet. Setting ps display is not
>> enough here.
>>
>
> I agree that the statistical information about replication activity is
> very useful. But I think that it's not an urgent issue. Shall we think
> it later?
>

I don't think anybody can deploy this feature without at least some very
basic monitoring here. I like the basic proposal you made back in
September for adding a pg_standbys_xlog_location to replace what you
have to get from ps right now:
http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php

That's basic, but enough that people could get by for a V1.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 04:24:24
Message-ID:	4B4BF978.3080602@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
> On Mon, Jan 11, 2010 at 5:36 PM, Craig Ringer
> <craig(at)postnewspapers(dot)com(dot)au> wrote:
>
>> Personally, I'd be uncomfortable enabling something like that without _both_
>> an admin alert _and_ the ability to refresh the slave's base backup without
>> admin intervention.
>>
>
> What feature do you specifically need as an alert? Just writing
> the warning into the logfile is enough? Or need to notify by
> using SNMP trap message? Though I'm not sure if this is a role
> of Postgres.
>

It's impossible for the database to have any idea whatsoever how people
are going to want to be alerted. Provide functions to monitor things
like replication lag, like the number of segments queued up to feed to
archive_command, and let people build their own alerting mechanism for
now. They're going to do that anyway, so why waste precious time here
building someone that's unlikely to fit any but a very narrow use case?

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 05:13:47
Message-ID:	3f0b79eb1001112113o795ad4e8n2b4fd318526bacd7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> I don't think anybody can deploy this feature without at least some very
> basic monitoring here. I like the basic proposal you made back in September
> for adding a pg_standbys_xlog_location to replace what you have to get from
> ps right now:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>
> That's basic, but enough that people could get by for a V1.

Yeah, I have no objection to add such simple capability which monitors
the lag into the first release. But I guess that, in addition to that,
Simon wanted the capability to collect the statistical information about
replication activity (e.g., a transfer time, a write time, replay time).
So I'd like to postpone it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 05:16:23
Message-ID:	3f0b79eb1001112116w43d7aa4va2e407c90ffa848b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 1:24 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> It's impossible for the database to have any idea whatsoever how people are
> going to want to be alerted. Provide functions to monitor things like
> replication lag, like the number of segments queued up to feed to
> archive_command, and let people build their own alerting mechanism for now.
> They're going to do that anyway, so why waste precious time here building
> someone that's unlikely to fit any but a very narrow use case?

Agreed.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 07:22:06
Message-ID:	4B4C231E.90508@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith wrote:
> I don't think anybody can deploy this feature without at least some very
> basic monitoring here. I like the basic proposal you made back in
> September for adding a pg_standbys_xlog_location to replace what you
> have to get from ps right now:
> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>
> That's basic, but enough that people could get by for a V1.

It would be more straightforward to have a function in the standby to
return the current replay location. It feels more logical to poll the
standby to get the status of the standby, instead of indirectly from the
master. Besides, the master won't know how far the standby is if the
connection to the standby is broken.

Maybe we should just change the existing pg_current_xlog_location()
function to return that when recovery is in progress. It currently
throws an error during hot standby.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 07:24:53
Message-ID:	4B4C23C5.2070703@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in September
>> for adding a pg_standbys_xlog_location to replace what you have to get from
>> ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>
> Yeah, I have no objection to add such simple capability which monitors
> the lag into the first release. But I guess that, in addition to that,
> Simon wanted the capability to collect the statistical information about
> replication activity (e.g., a transfer time, a write time, replay time).
> So I'd like to postpone it.

yeah getting that would all be nice and handy but we have to remember
that this is really our first cut at integrated replication. Being able
to monitor lag is what is needed as a minimum, more advanced stuff can
and will emerge once we get some actual feedback from the field.

Stefan

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 07:39:55
Message-ID:	1263281995.19367.170726.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
> Fujii Masao wrote:
> > On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> >> I don't think anybody can deploy this feature without at least some very
> >> basic monitoring here. I like the basic proposal you made back in September
> >> for adding a pg_standbys_xlog_location to replace what you have to get from
> >> ps right now:
> >> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
> >>
> >> That's basic, but enough that people could get by for a V1.
> >
> > Yeah, I have no objection to add such simple capability which monitors
> > the lag into the first release. But I guess that, in addition to that,
> > Simon wanted the capability to collect the statistical information about
> > replication activity (e.g., a transfer time, a write time, replay time).
> > So I'd like to postpone it.
>
> yeah getting that would all be nice and handy but we have to remember
> that this is really our first cut at integrated replication. Being able
> to monitor lag is what is needed as a minimum, more advanced stuff can
> and will emerge once we get some actual feedback from the field.

Though there won't be any feedback from the field because there won't be
any numbers to discuss. Just "it appears to be working". Then we will go
into production and the problems will begin to be reported. We will be
able to do nothing to resolve them because we won't know how many people
are affected.

--
Simon Riggs www.2ndQuadrant.com

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 08:04:35
Message-ID:	4B4C2D13.9060302@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Greg Smith wrote:
>
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in
>> September for adding a pg_standbys_xlog_location to replace what you
>> have to get from ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>>
>
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>

This is one reason I was talking in my other message about getting
simple stats on how bad the archive_command backlog is, which I'd think
is an easy way to inform the DBA "the standby isn't keeping up and disk
is filling" in a way that's more database-centric than just looking at
disk space getting gobbled.

I think that it's important to be able to get whatever useful
information you can from both the primary and the standby, because most
of the interesting (read: painful) situations here are when one or the
other is down. The fundamental questions here are:

-When things are running normally, how much is the standby lagging by?
This is needed for a baseline of good performance, by which you can
detect problems before they get too bad.
-If the standby is down altogether, how can I get more information about
the state of things from the primary?
-If the primary is down, how can I tell more from the standby?

Predicting what people are going to want to do when one of these bad
conditions pops up is a large step ahead of where I think this
discussion should be focusing on now. You have to show how you're going
to measure the badness here in the likely failure situations before you
can then take action on them. If you do the former well enough, admins
will figure out how to deal with the latter in a way compatible with
their business processes in the first version.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 10:15:21
Message-ID:	3f0b79eb1001120215j1beba61cy6d00a81617128f8c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 4:22 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>
> Maybe we should just change the existing pg_current_xlog_location()
> function to return that when recovery is in progress. It currently
> throws an error during hot standby.

Sounds good.

I'd like to hear from someone which location should be returned by
that function (WAL receive/write/flush/replay location?). I vote for
WAL flush location because it's important for me to know how far the
standby can replay the WAL, i.e., how much transactions might be lost
at failover. And, it's also OK to provide the dedicated function for
WAL replay location. Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 10:18:45
Message-ID:	9837222c1001120218x5584e0cfvef15c2e0fbd72949@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 08:22, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Greg Smith wrote:
>> I don't think anybody can deploy this feature without at least some very
>> basic monitoring here. I like the basic proposal you made back in
>> September for adding a pg_standbys_xlog_location to replace what you
>> have to get from ps right now:
>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>
>> That's basic, but enough that people could get by for a V1.
>
> It would be more straightforward to have a function in the standby to
> return the current replay location. It feels more logical to poll the
> standby to get the status of the standby, instead of indirectly from the
> master. Besides, the master won't know how far the standby is if the
> connection to the standby is broken.
>
> Maybe we should just change the existing pg_current_xlog_location()
> function to return that when recovery is in progress. It currently
> throws an error during hot standby.
>

Not sure. I don't really like to monitor functions that return
different things depending on a scenario.

Assume I monitor it, and then do a failover. Suddenly the values I
monitor mean something else.

I think I'd prefer a separate function to monitor this status on the
slave. Oh, and it'd be nice if that one worked in HS mode both when in
streaming and non-streaming mode :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 13:59:01
Message-ID:	2716.1263304741@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> I'm not sure whether poll(2) should be called for this purpose. But
> poll(2) and select(2) seem to often come together in the existing code.
> We should follow such custom?

Yes. poll() is usually more efficient, so it's preferred, but not all
platforms have it. (On the other side, I think Windows might have
only poll and not select.)

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 14:13:30
Message-ID:	4B4C838A.60501@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>
>> I'm not sure whether poll(2) should be called for this purpose. But
>> poll(2) and select(2) seem to often come together in the existing code.
>> We should follow such custom?
>>
>
> Yes. poll() is usually more efficient, so it's preferred, but not all
> platforms have it. (On the other side, I think Windows might have
> only poll and not select.)
>
>
>

No, other way around, I'm fairly sure.

cheers

andrew

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 14:20:10
Message-ID:	9837222c1001120620s522e1945x6c79eca4ec55baf6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 15:13, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Tom Lane wrote:
>>
>> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>>
>>>
>>> I'm not sure whether poll(2) should be called for this purpose. But
>>> poll(2) and select(2) seem to often come together in the existing code.
>>> We should follow such custom?
>>>
>>
>> Yes. poll() is usually more efficient, so it's preferred, but not all
>> platforms have it. (On the other side, I think Windows might have
>> only poll and not select.)
>>
>>
>>
>
> No, other way around, I'm fairly sure.

Yeah, the emulation layer has select, not poll. It basically
translates the select into what looks very much like a poll, so maybe
we should consider implementing poll as well/instead. But for now,
select() is what we have.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 14:39:50
Message-ID:	3398.1263307190@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Jan 12, 2010 at 08:22, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Maybe we should just change the existing pg_current_xlog_location()
>> function to return that when recovery is in progress. It currently
>> throws an error during hot standby.

> Not sure. I don't really like to monitor functions that return
> different things depending on a scenario.

Yeah. We should only use that function if we can define it to mean
something on the slave that is very close to what it means on the
master. Otherwise, pick another name.

It seems to me that we should have at least two functions available
on the slave: latest xlog location received and synced to disk by
walreceiver (ie, we are guaranteed to be able to replay up to here);
and latest xlog location actually replayed (ie, the state visible
to queries on the slave). The latter perhaps could be
pg_current_xlog_location().

regards, tom lane

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 18:48:38
Message-ID:	4B4CC406.30207@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>> Fujii Masao wrote:
>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>>> I don't think anybody can deploy this feature without at least some very
>>>> basic monitoring here. I like the basic proposal you made back in September
>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>> ps right now:
>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>
>>>> That's basic, but enough that people could get by for a V1.
>>> Yeah, I have no objection to add such simple capability which monitors
>>> the lag into the first release. But I guess that, in addition to that,
>>> Simon wanted the capability to collect the statistical information about
>>> replication activity (e.g., a transfer time, a write time, replay time).
>>> So I'd like to postpone it.
>> yeah getting that would all be nice and handy but we have to remember
>> that this is really our first cut at integrated replication. Being able
>> to monitor lag is what is needed as a minimum, more advanced stuff can
>> and will emerge once we get some actual feedback from the field.
>
> Though there won't be any feedback from the field because there won't be
> any numbers to discuss. Just "it appears to be working". Then we will go
> into production and the problems will begin to be reported. We will be
> able to do nothing to resolve them because we won't know how many people
> are affected.

field is also production usage in my pov, and I'm not sure how we would
know how many people are affected by some imaginary issue just because
there is a column that has some numbers in it.
All of the large features we added in the past got finetuned and
improved in the following releases, and I expect SR to be one of them
that will see a lot of improvement in 8.5+n.
Adding detailed monitoring of some random stuff (I don't think there was
a clear proposal of what kind of stuff you would like to see) while we
don't really know what the performance characteristics are might easily
lead to us provding a ton of data and nothing relevant :(
What I really think we should do for this first cut is to make it as
foolproof and easy to set up as possible and add the minimum required
monitoring knobs but not going overboard with doing too many stats.

Stefan

From:	Marko Kreen <markokr(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:06:10
Message-ID:	e51f66da1001121206w79394f1cx8f36c8587cdeda3b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 1/12/10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> > I'm not sure whether poll(2) should be called for this purpose. But
> > poll(2) and select(2) seem to often come together in the existing code.
> > We should follow such custom?
>
>
> Yes. poll() is usually more efficient, so it's preferred, but not all
> platforms have it. (On the other side, I think Windows might have
> only poll and not select.)

FYI: on PL/Proxy we use poll() exclusively and on platforms
that dont have it (win32) we emulate poll() with select():

http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/plproxy/plproxy/src/poll_compat.c?rev=1.3&content-type=text/x-cvsweb-markup

End result is efficient and clean #ifdef-less code.

Something to consider.

--
marko

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:11:54
Message-ID:	201001122011.o0CKBsq16953@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner wrote:
> Simon Riggs wrote:
> > On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
> >> Fujii Masao wrote:
> >>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> >>>> I don't think anybody can deploy this feature without at least some very
> >>>> basic monitoring here. I like the basic proposal you made back in September
> >>>> for adding a pg_standbys_xlog_location to replace what you have to get from
> >>>> ps right now:
> >>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
> >>>>
> >>>> That's basic, but enough that people could get by for a V1.
> >>> Yeah, I have no objection to add such simple capability which monitors
> >>> the lag into the first release. But I guess that, in addition to that,
> >>> Simon wanted the capability to collect the statistical information about
> >>> replication activity (e.g., a transfer time, a write time, replay time).
> >>> So I'd like to postpone it.
> >> yeah getting that would all be nice and handy but we have to remember
> >> that this is really our first cut at integrated replication. Being able
> >> to monitor lag is what is needed as a minimum, more advanced stuff can
> >> and will emerge once we get some actual feedback from the field.
> >
> > Though there won't be any feedback from the field because there won't be
> > any numbers to discuss. Just "it appears to be working". Then we will go
> > into production and the problems will begin to be reported. We will be
> > able to do nothing to resolve them because we won't know how many people
> > are affected.
>
> field is also production usage in my pov, and I'm not sure how we would
> know how many people are affected by some imaginary issue just because
> there is a column that has some numbers in it.
> All of the large features we added in the past got finetuned and
> improved in the following releases, and I expect SR to be one of them
> that will see a lot of improvement in 8.5+n.
> Adding detailed monitoring of some random stuff (I don't think there was
> a clear proposal of what kind of stuff you would like to see) while we
> don't really know what the performance characteristics are might easily
> lead to us provding a ton of data and nothing relevant :(
> What I really think we should do for this first cut is to make it as
> foolproof and easy to set up as possible and add the minimum required
> monitoring knobs but not going overboard with doing too many stats.

I totally agree. If SR isn't going to be useful without being
feature-complete, we might as well just drop it for 8.5 right now.

Let's get a reasonable feature set implemented and then come back in 8.6
to improve it. For example, there is no need for a special
'replication' user (just use super-user), and monitoring should be
minimal until we have field experience of exactly what monitoring we
need.

The final commit-fest is in 5 days --- this is not the time for design
discussion and feature additions. If we wait for SR to be feature
complete, with design discussions, etc, we will hopelessly delay 8.5 and
people will get frustrated. I am not saying we can't talk about design,
but none of this should be a requirement for 8.5.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:39:00
Message-ID:	603c8f071001121239i3136750csf83e14a394814037@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 3:11 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> The final commit-fest is in 5 days --- this is not the time for design

Actually just over 2 days at this point...

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Marko Kreen <markokr(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:40:34
Message-ID:	23560.1263328834@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Marko Kreen <markokr(at)gmail(dot)com> writes:
> FYI: on PL/Proxy we use poll() exclusively and on platforms
> that dont have it (win32) we emulate poll() with select():

Yeah, maybe. At the time we started adding poll() support there were
enough platforms with only select() that it didn't make sense to impose
any sort of penalty on the latter. But by now maybe it'd make sense.
Especially if someone fixes the Windows code --- two levels of emulation
on Windows probably won't fly ...

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:42:57
Message-ID:	23618.1263328977@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> The final commit-fest is in 5 days --- this is not the time for design
> discussion and feature additions.

+10 --- the one reason I can see for deciding to bounce SR is that there
still seem to be design discussions going on. It is WAY TOO LATE for
that folks. It's time to be thinking "what's the least we have to do to
make this shippable?"

regards, tom lane

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 20:49:01
Message-ID:	1263329342.19367.179665.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
> Stefan Kaltenbrunner wrote:
> > Simon Riggs wrote:
> > > On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
> > >> Fujii Masao wrote:
> > >>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> > >>>> I don't think anybody can deploy this feature without at least some very
> > >>>> basic monitoring here. I like the basic proposal you made back in September
> > >>>> for adding a pg_standbys_xlog_location to replace what you have to get from
> > >>>> ps right now:
> > >>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
> > >>>>
> > >>>> That's basic, but enough that people could get by for a V1.
> > >>> Yeah, I have no objection to add such simple capability which monitors
> > >>> the lag into the first release. But I guess that, in addition to that,
> > >>> Simon wanted the capability to collect the statistical information about
> > >>> replication activity (e.g., a transfer time, a write time, replay time).
> > >>> So I'd like to postpone it.
> > >> yeah getting that would all be nice and handy but we have to remember
> > >> that this is really our first cut at integrated replication. Being able
> > >> to monitor lag is what is needed as a minimum, more advanced stuff can
> > >> and will emerge once we get some actual feedback from the field.
> > >
> > > Though there won't be any feedback from the field because there won't be
> > > any numbers to discuss. Just "it appears to be working". Then we will go
> > > into production and the problems will begin to be reported. We will be
> > > able to do nothing to resolve them because we won't know how many people
> > > are affected.
> >
> > field is also production usage in my pov, and I'm not sure how we would
> > know how many people are affected by some imaginary issue just because
> > there is a column that has some numbers in it.
> > All of the large features we added in the past got finetuned and
> > improved in the following releases, and I expect SR to be one of them
> > that will see a lot of improvement in 8.5+n.
> > Adding detailed monitoring of some random stuff (I don't think there was
> > a clear proposal of what kind of stuff you would like to see) while we
> > don't really know what the performance characteristics are might easily
> > lead to us provding a ton of data and nothing relevant :(
> > What I really think we should do for this first cut is to make it as
> > foolproof and easy to set up as possible and add the minimum required
> > monitoring knobs but not going overboard with doing too many stats.
>
> I totally agree. If SR isn't going to be useful without being
> feature-complete, we might as well just drop it for 8.5 right now.
>
> Let's get a reasonable feature set implemented and then come back in 8.6
> to improve it. For example, there is no need for a special
> 'replication' user (just use super-user), and monitoring should be
> minimal until we have field experience of exactly what monitoring we
> need.
>
> The final commit-fest is in 5 days --- this is not the time for design
> discussion and feature additions. If we wait for SR to be feature
> complete, with design discussions, etc, we will hopelessly delay 8.5 and
> people will get frustrated. I am not saying we can't talk about design,
> but none of this should be a requirement for 8.5.

We can't add monitoring until we know what the performance
characteristics are. Hmmm. And how will we know what the performance
characteristics are, I wonder?

Anyway, I'll leave it to you now.

--
Simon Riggs www.2ndQuadrant.com

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 21:02:38
Message-ID:	4B4CE36E.3010603@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:11 -0500, Bruce Momjian wrote:
>> Stefan Kaltenbrunner wrote:
>>> Simon Riggs wrote:
>>>> On Tue, 2010-01-12 at 08:24 +0100, Stefan Kaltenbrunner wrote:
>>>>> Fujii Masao wrote:
>>>>>> On Tue, Jan 12, 2010 at 1:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>>>>>>> I don't think anybody can deploy this feature without at least some very
>>>>>>> basic monitoring here. I like the basic proposal you made back in September
>>>>>>> for adding a pg_standbys_xlog_location to replace what you have to get from
>>>>>>> ps right now:
>>>>>>> http://archives.postgresql.org/pgsql-hackers/2009-09/msg00889.php
>>>>>>>
>>>>>>> That's basic, but enough that people could get by for a V1.
>>>>>> Yeah, I have no objection to add such simple capability which monitors
>>>>>> the lag into the first release. But I guess that, in addition to that,
>>>>>> Simon wanted the capability to collect the statistical information about
>>>>>> replication activity (e.g., a transfer time, a write time, replay time).
>>>>>> So I'd like to postpone it.
>>>>> yeah getting that would all be nice and handy but we have to remember
>>>>> that this is really our first cut at integrated replication. Being able
>>>>> to monitor lag is what is needed as a minimum, more advanced stuff can
>>>>> and will emerge once we get some actual feedback from the field.
>>>> Though there won't be any feedback from the field because there won't be
>>>> any numbers to discuss. Just "it appears to be working". Then we will go
>>>> into production and the problems will begin to be reported. We will be
>>>> able to do nothing to resolve them because we won't know how many people
>>>> are affected.
>>> field is also production usage in my pov, and I'm not sure how we would
>>> know how many people are affected by some imaginary issue just because
>>> there is a column that has some numbers in it.
>>> All of the large features we added in the past got finetuned and
>>> improved in the following releases, and I expect SR to be one of them
>>> that will see a lot of improvement in 8.5+n.
>>> Adding detailed monitoring of some random stuff (I don't think there was
>>> a clear proposal of what kind of stuff you would like to see) while we
>>> don't really know what the performance characteristics are might easily
>>> lead to us provding a ton of data and nothing relevant :(
>>> What I really think we should do for this first cut is to make it as
>>> foolproof and easy to set up as possible and add the minimum required
>>> monitoring knobs but not going overboard with doing too many stats.
>> I totally agree. If SR isn't going to be useful without being
>> feature-complete, we might as well just drop it for 8.5 right now.
>>
>> Let's get a reasonable feature set implemented and then come back in 8.6
>> to improve it. For example, there is no need for a special
>> 'replication' user (just use super-user), and monitoring should be
>> minimal until we have field experience of exactly what monitoring we
>> need.
>>
>> The final commit-fest is in 5 days --- this is not the time for design
>> discussion and feature additions. If we wait for SR to be feature
>> complete, with design discussions, etc, we will hopelessly delay 8.5 and
>> people will get frustrated. I am not saying we can't talk about design,
>> but none of this should be a requirement for 8.5.
>
> We can't add monitoring until we know what the performance
> characteristics are. Hmmm. And how will we know what the performance
> characteristics are, I wonder?

well I would say we do exactly how we have done in the past with other
features - by debugging the stuff with low level tools until we fully
understand what it really is and then we can always add more
"accessible" stats.

Stefan

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 21:34:18
Message-ID:	201001122134.o0CLYIU19258@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > The final commit-fest is in 5 days --- this is not the time for design
> > discussion and feature additions.
>
> +10 --- the one reason I can see for deciding to bounce SR is that there
> still seem to be design discussions going on. It is WAY TOO LATE for
> that folks. It's time to be thinking "what's the least we have to do to
> make this shippable?"

I didn't know the plus meter went that high. ;-) LOL

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 21:35:05
Message-ID:	201001122135.o0CLZ5g20395@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner wrote:
> >> Let's get a reasonable feature set implemented and then come back in 8.6
> >> to improve it. For example, there is no need for a special
> >> 'replication' user (just use super-user), and monitoring should be
> >> minimal until we have field experience of exactly what monitoring we
> >> need.
> >>
> >> The final commit-fest is in 5 days --- this is not the time for design
> >> discussion and feature additions. If we wait for SR to be feature
> >> complete, with design discussions, etc, we will hopelessly delay 8.5 and
> >> people will get frustrated. I am not saying we can't talk about design,
> >> but none of this should be a requirement for 8.5.
> >
> > We can't add monitoring until we know what the performance
> > characteristics are. Hmmm. And how will we know what the performance
> > characteristics are, I wonder?
>
> well I would say we do exactly how we have done in the past with other
> features - by debugging the stuff with low level tools until we fully
> understand what it really is and then we can always add more
> "accessible" stats.

Right, so what is the risk of shipping without any fancy monitoring?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 21:36:05
Message-ID:	1263332165.4362.592.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-01-12 at 16:34 -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > The final commit-fest is in 5 days --- this is not the time for design
> > > discussion and feature additions.
> >
> > +10 --- the one reason I can see for deciding to bounce SR is that there
> > still seem to be design discussions going on. It is WAY TOO LATE for
> > that folks. It's time to be thinking "what's the least we have to do to
> > make this shippable?"
>
> I didn't know the plus meter went that high. ;-) LOL

Well, it is Tom. He has karma points.

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
Respect is earned, not gained through arbitrary and repetitive use or Mr. or Sir.

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 22:26:42
Message-ID:	4B4CF722.9080009@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Right, so what is the risk of shipping without any fancy monitoring?

We add monitoring in 9.1. er, 8.6.

--Josh Berkus

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 22:37:11
Message-ID:	1263335831.26654.288.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-01-12 at 15:42 -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > The final commit-fest is in 5 days --- this is not the time for design
> > discussion and feature additions.
>
> +10 --- the one reason I can see for deciding to bounce SR is that there
> still seem to be design discussions going on. It is WAY TOO LATE for
> that folks. It's time to be thinking "what's the least we have to do to
> make this shippable?"

I've not asked to bounce SR, I am strongly in favour of it going in,
having been supporting the project on and off for 18 months.

There is not much sense being talked here. I have asked for sufficient
monitoring to allow us to manage it in production, which is IMHO the
minimum required to make it shippable. This is a point I have mentioned
over the course of many months, not a sudden additional thought.

If the majority thinks that being able to find out the current replay
point of recovery is all we need to manage replication then I will
happily defer to that view, without changing my opinion that we need
more. It should be clear that we didn't even have that before I raised
the point.

Overall, it isn't sensible or appropriate to oppose my viewpoint by
putting words into my mouth that have never been said, which applies to
most people's comments to me on this recent thread.

--
Simon Riggs www.2ndQuadrant.com

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 22:41:21
Message-ID:	4B4CFA91.4050005@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian wrote:
> Right, so what is the risk of shipping without any fancy monitoring?
>

You can monitor the code right now by watching the output shown in the
ps display and by trolling the database logs. If I had to I could build
a whole monitoring system out of those components, it would just be very
fragile. I'd rather see one or two very basic bits of internals exposed
beyond those to reduce that effort. I think it's a stretch to say that
request represents a design change; a couple of UDFs to expose some
internals is all I think it would take to dramatically drop the amount
of process/log scraping required here to support a SR system.

I guess the slightly more ambitious performance monitoring bits that
Simon was suggesting may cross the line as being too late to implement
now though (depends on how productive the people actually coding on this
are I guess), and certainly the ideas thrown out for implementing any
smart behavior or alerting when replication goes bad like Josh's
"archiving_lag_action" seem based the deadline to get addressed
now--even though I agree with the basic idea.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 22:47:46
Message-ID:	1263336466.14547.8.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2010-01-12 at 17:41 -0500, Greg Smith wrote:
> Bruce Momjian wrote:
> > Right, so what is the risk of shipping without any fancy monitoring?
> >
>
> You can monitor the code right now by watching the output shown in the
> ps display and by trolling the database logs. If I had to I could build
> a whole monitoring system out of those components, it would just be very
> fragile. I'd rather see one or two very basic bits of internals exposed
> beyond those to reduce that effort.

Considering that is pretty much the best we can do with log shipping, I
would have to agree. We should either provide real monitoring facilities
(not necessarily tools, but at least queries or an api) for the feature
or the feature isn't ready to go in.

> I think it's a stretch to say that
> request represents a design change; a couple of UDFs to expose some
> internals is all I think it would take to dramatically drop the amount
> of process/log scraping required here to support a SR system.

Bingo.

Joshua D. Drake

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 22:54:35
Message-ID:	4B4CFDAB.1070901@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith wrote:
> Bruce Momjian wrote:
>> Right, so what is the risk of shipping without any fancy monitoring?
>>
>
> You can monitor the code right now by watching the output shown in the
> ps display and by trolling the database logs. If I had to I could build
> a whole monitoring system out of those components, it would just be very
> fragile. I'd rather see one or two very basic bits of internals exposed
> beyond those to reduce that effort. I think it's a stretch to say that
> request represents a design change; a couple of UDFs to expose some
> internals is all I think it would take to dramatically drop the amount
> of process/log scraping required here to support a SR system.

so is there an actually concrete proposal of _what_ interals to expose?

>
> I guess the slightly more ambitious performance monitoring bits that
> Simon was suggesting may cross the line as being too late to implement
> now though (depends on how productive the people actually coding on this
> are I guess), and certainly the ideas thrown out for implementing any
> smart behavior or alerting when replication goes bad like Josh's
> "archiving_lag_action" seem based the deadline to get addressed
> now--even though I agree with the basic idea.

I'm not convinced that embedding actual alerting functionality in the
database is a good idea. Any reasonable production deployment is
probably using a dedicated monitoring and alerting system that is
aggregating and qualifying all monitoring results (as wel as proper
ratelimiting and stuff) that just needs a way to read in basic data.
Initially something like archiving_lag_action sounds like an invitation
to do a send_mail_to_admin() thingy which is really the wrong way to
approach monitoring in large scale environments...
The database needs to prove very basic information like "we are 10min
behind in replication" or "3 wal files behind" - the decision if any of
that is an actual issue or not should be left to the actual monitoring
system.

Stefan

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Smith" <greg(at)2ndQuadrant(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>
Cc:	"Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-12 23:00:03
Message-ID:	4B4CAA93020000250002E3D4@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> wrote:

> The database needs to prove very basic information like "we are
> 10min behind in replication" or "3 wal files behind" - the
> decision if any of that is an actual issue or not should be left
> to the actual monitoring system.

+1

-Kevin

From:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject:	Re: Streaming replication status
Date:	2010-01-13 01:44:48
Message-ID:	201001122044.48955.xzilla@users.sourceforge.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 11 January 2010 23:24:24 Greg Smith wrote:
> Fujii Masao wrote:
> > On Mon, Jan 11, 2010 at 5:36 PM, Craig Ringer
> >
> > <craig(at)postnewspapers(dot)com(dot)au> wrote:
> >> Personally, I'd be uncomfortable enabling something like that without
> >> _both_ an admin alert _and_ the ability to refresh the slave's base
> >> backup without admin intervention.
> >
> > What feature do you specifically need as an alert? Just writing
> > the warning into the logfile is enough? Or need to notify by
> > using SNMP trap message? Though I'm not sure if this is a role
> > of Postgres.
>
> It's impossible for the database to have any idea whatsoever how people
> are going to want to be alerted. Provide functions to monitor things
> like replication lag, like the number of segments queued up to feed to
> archive_command, and let people build their own alerting mechanism for
> now. They're going to do that anyway, so why waste precious time here
> building someone that's unlikely to fit any but a very narrow use case?

That said, emitting the information to a log file makes for a crappy way to
retrieve the information. The ideal api is that I can find the information out
via result of some SELECT query; view, table ,function doesn't matter, as long
as I can select it out. Bonus points for being able to get information from
the hot standby.

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

From:	Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject:	Re: Streaming replication status
Date:	2010-01-13 02:18:50
Message-ID:	201001122118.51430.xzilla@users.sourceforge.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tuesday 12 January 2010 17:37:11 Simon Riggs wrote:
> There is not much sense being talked here. I have asked for sufficient
> monitoring to allow us to manage it in production, which is IMHO the
> minimum required to make it shippable. This is a point I have mentioned
> over the course of many months, not a sudden additional thought.
>

Even subscribing to this view point, there is sure to be a significant wiggle
room in what people find to be "sufficient monitoring". If I had to score the
monitoring facilities we have for PITR standby, I'd give it about a crap out
of 5, and yet somehow we seem to manage it.

> If the majority thinks that being able to find out the current replay
> point of recovery is all we need to manage replication then I will
> happily defer to that view, without changing my opinion that we need
> more. It should be clear that we didn't even have that before I raised
> the point.
>

I'm certainly interested in specifics of what you think need to be exposed for
monitoring, and I'd be interested in whether those things can be exposed as
either trace points or possibly as C functions. My guess is that we won't get
them into core for 8.5, but that we might be able to provide some additional
facilities after the fact as we get more of these systems deployed.

--
Robert Treat
Conjecture: http://www.xzilla.net
Consulting: http://www.omniti.com

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 03:05:26
Message-ID:	4B4D3876.9030604@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I guess the slightly more ambitious performance monitoring bits that
> Simon was suggesting may cross the line as being too late to implement
> now though (depends on how productive the people actually coding on this
> are I guess), and certainly the ideas thrown out for implementing any
> smart behavior or alerting when replication goes bad like Josh's
> "archiving_lag_action" seem based the deadline to get addressed
> now--even though I agree with the basic idea.

Well, honestly, I wasn't talking about monitoring at all. I was talking
about the general issue of "how should the system behave when it runs
out of disk space".

For the installation for which data integrity is paramount, when
replication becomes impossible because there is no more room for logs,
then the whole system, master and slaves, should shut down. For most
people, they'd just want the master to start ignoring the slave and
recycling logs. Presumably, the slave would notice this and shut down.

So I was talking about data integrity, not monitoring.

However, it's probably a better thing to simply expose a way to query
how much extra log data we have, in raw form (bytes or pages). From
this, an administration script could take appropriate action.

--Josh Berkus

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 03:55:24
Message-ID:	4B4D442C.4070304@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> However, it's probably a better thing to simply expose a way to query
> how much extra log data we have, in raw form (bytes or pages). From
> this, an administration script could take appropriate action.

Also: I think we could release without having this facility. We did
with PITR, after all.

--Josh

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 04:26:49
Message-ID:	201001130426.o0D4QnD21775@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Tue, 2010-01-12 at 15:42 -0500, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > The final commit-fest is in 5 days --- this is not the time for design
> > > discussion and feature additions.
> >
> > +10 --- the one reason I can see for deciding to bounce SR is that there
> > still seem to be design discussions going on. It is WAY TOO LATE for
> > that folks. It's time to be thinking "what's the least we have to do to
> > make this shippable?"
>
> I've not asked to bounce SR, I am strongly in favour of it going in,
> having been supporting the project on and off for 18 months.
>
> There is not much sense being talked here. I have asked for sufficient
> monitoring to allow us to manage it in production, which is IMHO the
> minimum required to make it shippable. This is a point I have mentioned

Let me explain why Simon feels he is misquoted --- Simon, you are saying
above that "sufficient monitoring" is a minimum requirement, meaning it
is necessary, and I and others are saying if we need to design a
monitoring system at this stage to ship SR, then let's forget about this
feature for 8.5.

In summary, by requiring monitoring, you are encouraging others to just
abandon SR completely for 8.5. We didn't say you were suggesting
abandonment SR, it is just that the monitoring requirement is making
abandonment of SR for 8.5 more likely because the addition of monitoring
could hopelessly delay 8.5 because we have no idea even how to implement
monitoring.

> over the course of many months, not a sudden additional thought.
>
> Overall, it isn't sensible or appropriate to oppose my viewpoint by
> putting words into my mouth that have never been said, which applies to
> most people's comments to me on this recent thread.

Yea, yea, everyone seems to misquote you Simon, at least from your
perspective. You must admit that you seem to feel that way a lot.

Good --- let's move forward with a minimal feature set to get SR in 8.5
in a reasonable timeframe. If we have extra time we can add stuff but
let's not require it from the start.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 05:10:27
Message-ID:	3f0b79eb1001122110m303fd1bbq4c28dea203bdb9ed@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 10:59 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
>> I'm not sure whether poll(2) should be called for this purpose. But
>> poll(2) and select(2) seem to often come together in the existing code.
>> We should follow such custom?
>
> Yes. poll() is usually more efficient, so it's preferred, but not all
> platforms have it. (On the other side, I think Windows might have
> only poll and not select.)

OK. I reactivated pq_wait() and secure_poll() which uses poll(2) to
check the socket if available, otherwise select(2).

Also the capability to check the socket for data to be written is not
unused for SR right now (it was provided previously). So I dropped it
for simplification.

http://archives.postgresql.org/pgsql-hackers/2010-01/msg00827.php
> Oh, I think we need to fix that, I'm thinking of doing a select() in the
> loop to check that the socket hasn't been closed yet. I meant we don't
> need to try reading the 'X' to tell apart e.g a network problem from a
> standby that's shut down cleanly.

Without reading the 'X' message from the standby, the walsender doesn't
detect the close of connection immediately in my environment. So I also
reactivated the subset of ProcessStreamMessage().

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 05:34:26
Message-ID:	3f0b79eb1001122134y67463a16m49960f2e98548ab3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 12, 2010 at 10:16 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I am concerned that knowledge of this new read-only replication user
> would have to be spread all over the backend code, which is really not
> something we should be doing at this stage in 8.5 development. I am
> also thinking such a special user might fall out of work on mandatory
> access control, so maybe we should just require super-user for 8.5 and
> revisit this for 8.6.

OK. I leave that code as it is. If the majority feel it's overkill to
require a superuser privilege when authenticating the standby, we can
just drop it later.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-13 08:47:33
Message-ID:	4B4D88A5.2050905@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner wrote:
> so is there an actually concrete proposal of _what_ interals to expose? '

The pieces are coming together...summary:

-Status quo: really bad, but could probably ship anyway because
existing PITR is no better and people manage to use it
-Add slave pg_current_xlog_location() and something like
pg_standby_received_xlog_location(): Much better, gets rid of the worst
issues here.
-Also add pg_standbys_xlog_location() on the master: while they could
live without it, this really helps out the "alert/monitor" script writer
whose use cases keep popping up here.

Details...the original idea from Fujii was:

"I'm thinking something like pg_standbys_xlog_location() [on the
primary] which returns
one row per standby servers, showing pid of walsender, host name/
port number/user OID of the standby, the location where the standby
has written/flushed WAL. DBA can measure the gap from the
combination of pg_current_xlog_location() and pg_standbys_xlog_location()
via one query on the primary."

After some naming quibbles and questions about what direction that
should happen in, Tom suggested the initial step here is:

"It seems to me that we should have at least two functions available
on the slave: latest xlog location received and synced to disk by
walreceiver (ie, we are guaranteed to be able to replay up to here);
and latest xlog location actually replayed (ie, the state visible
to queries on the slave). The latter perhaps could be
pg_current_xlog_location()."

So there's the first two of them: on the slave,
pg_current_xlog_location() giving the latest location replayed, and a
new one named something like pg_standby_received_xlog_location(). If
you take the position that an unreachable standby does provide answers
to these questions too (you just won't like them), this pair might be
sufficient to ship.

To help a lot at dealing with all the error situations where the standby
isn't reachable and segments are piling up (possibly leading to full
disk), the next figure that seems to answer the most questions is asking
the primary "what's the location of the last WAL segment file in the
pile of ones to be archived/distributed that has been requested (or
processed if that's the easier thing to note) by the standby?". That's
what is named pg_standbys_xlog_location() in the first paragraph I
quoted. If you know enough to identify that segment file on disk, you
can always look at its timestamp (and the ones on the rest of the files
in that directory) in a monitoring script to turn that information into
segments or a time measurement instead--xlog segments are nicely ordered
after all.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-14 08:33:41
Message-ID:	3f0b79eb1001140033p707862f4yabece81301ca609c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 13, 2010 at 5:47 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> The pieces are coming together...summary:

Thanks for the summary!

> -Also add pg_standbys_xlog_location() on the master: while they could live without it, this really helps out the "alert/monitor" script writer whose use cases keep popping up here.
>
> Details...the original idea from Fujii was:
>
> "I'm thinking something like pg_standbys_xlog_location() [on the primary] which returns
> one row per standby servers, showing pid of walsender, host name/
> port number/user OID of the standby, the location where the standby
> has written/flushed WAL. DBA can measure the gap from the
> combination of pg_current_xlog_location() and pg_standbys_xlog_location()
> via one query on the primary."

This function is useful but not essential for troubleshooting, I think.
So I'd like to postpone it.

> "It seems to me that we should have at least two functions available
> on the slave: latest xlog location received and synced to disk by
> walreceiver (ie, we are guaranteed to be able to replay up to here);
> and latest xlog location actually replayed (ie, the state visible
> to queries on the slave). The latter perhaps could be
> pg_current_xlog_location()."
>
> So there's the first two of them: on the slave, pg_current_xlog_location()
> giving the latest location replayed, and a new one named something like
> pg_standby_received_xlog_location(). If you take the position that an
> unreachable standby does provide answers to these questions too (you just
> won't like them), this pair might be sufficient to ship.

Done.

git://git.postgresql.org/git/users/fujii/postgres.git
branch: replication

I added two new functions;

(1) pg_last_xlog_receive_location() reports the last WAL location received
and synced by walreceiver. If streaming replication is still in progress
this will increase monotonically. If streaming replication has completed
then this value will remain static at the value of the last WAL record
received and synced. When the server has been started without a streaming
replication then the return value will be InvalidXLogRecPtr (0/0).

(2) pg_last_xlog_replay_location() reports the last WAL location replayed
during recovery. If recovery is still in progress this will increase
monotonically. If recovery has completed then this value will remain
static at the value of the last WAL record applied. When the server has
been started normally without a recovery then the return value will be
InvalidXLogRecPtr (0/0).

Since it's somewhat odd for me that pg_current_xlog_location() reports the
WAL replay location, I didn't do that. But if the majority feel that it's sane,
I'll merge pg_last_xlog_replay_location() into pg_current_xlog_location().

Thought? Better name?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 04:07:52
Message-ID:	4B4FEA18.5080705@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
>> "I'm thinking something like pg_standbys_xlog_location() [on the primary] which returns
>> one row per standby servers, showing pid of walsender, host name/
>> port number/user OID of the standby, the location where the standby
>> has written/flushed WAL. DBA can measure the gap from the
>> combination of pg_current_xlog_location() and pg_standbys_xlog_location()
>> via one query on the primary."
>>
>
> This function is useful but not essential for troubleshooting, I think.
> So I'd like to postpone it.
>

Sure; in a functional system where primary and secondary are both up,
you can assemble the info using the new functions you just added, so
this other one is certainly optional. I just took a brief look at the
code of the features you added, and it looks like it exposes the minimum
necessary to make this whole thing possible to manage. I think it's OK
if you postpone this other bit, more important stuff for you to work on.

So: the one piece of information I though was most important to expose
here at an absolute minimum is there now. Good progress. The other
popular request that keeps popping up here is providing an easy way to
see how backlogged the archive_command is, to make it easier to monitor
for out of disk errors that might prove catastrophic to replication.

I just spent some time looking through the WAL/archiving code in that
context. It looks to me that that this information isn't really stored
anywhere right now. The only thing that knows what segment is currently
queued up to copy over is pgarch_ArchiverCopyLoop via its call to
pgarch_readyXlog. Now, this is a pretty brute-force piece of code: it
doesn't remember its previous work at all, it literally walks the
archive_status directory looking for *.ready files that have names that
look like xlog files, then returns the earliest. That unfortunately
means that it's not even thinking in the same terms as all these other
functions, which are driven by the xlog_location advancing, and then the
filename is computed from that. All you've got is the filename at this
point, and it's not even guaranteed to be real--you could easily fool
this code if you dropped an inappropriately named file into that directory.

I could easily update this code path to save the name of the last
archived file in memory while all this directory scanning is going on
anyway, and then provide a UDF to expose that bit of information. The
result would need to have documentation that disclaims it like this:

pg_last_archived_xlogfile() text: Get the name of the last file the
archive_command [tried to|successfully] archived since the server was
started. If archiving is disabled or no xlog files have become ready to
archive since startup, a blank line will be returned. It is possible
for this function to return a result that does not reflect an actual
xlogfile if files are manually added to the server's archive_status
directory.

I'd find this extremely handy as a hook for monitoring scripts that want
to watch the server but don't have access to the filesystem directly,
even given those limitations. I'd prefer to have the "tried to"
version, because it will populate with the name of the troublesome file
it's stuck on even if archiving never gets its first segment delivered.

I'd happily write a patch to handle all that if I thought it would be
accepted. I fear that the whole approach will be considered a bit too
hackish and get rejected on that basis though. Not really sure of a
"right" way to handle this though. Anything better is going to be more
complicated because it requires passing more information into the
archiver, with little gain for that work beyond improving the quality of
this diagnostic routine. And I think most people would find what I
described above useful enough.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 04:20:41
Message-ID:	1263529241.26654.28801.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2010-01-14 at 23:07 -0500, Greg Smith wrote:

> pg_last_archived_xlogfile() text: Get the name of the last file the
> archive_command [tried to|successfully] archived since the server was
> started. If archiving is disabled or no xlog files have become ready
> to archive since startup, a blank line will be returned.

> It is possible for this function to return a result that does not
> reflect an actual xlogfile if files are manually added to the server's
> archive_status directory.

> I'd find this extremely handy as a hook for monitoring scripts that
> want to watch the server but don't have access to the filesystem
> directly, even given those limitations. I'd prefer to have the "tried
> to" version, because it will populate with the name of the troublesome
> file it's stuck on even if archiving never gets its first segment
> delivered.
>
> I'd happily write a patch to handle all that if I thought it would be
> accepted. I fear that the whole approach will be considered a bit too
> hackish and get rejected on that basis though. Not really sure of a
> "right" way to handle this though. Anything better is going to be
> more complicated because it requires passing more information into the
> archiver, with little gain for that work beyond improving the quality
> of this diagnostic routine. And I think most people would find what I
> described above useful enough.

Yes, please write it. It's separate from SR, so will not interfere.

--
Simon Riggs www.2ndQuadrant.com

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 06:53:18
Message-ID:	4B5010DE.50802@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith wrote:
> Fujii Masao wrote:
>>> "I'm thinking something like pg_standbys_xlog_location() [on the primary] which returns
>>> one row per standby servers, showing pid of walsender, host name/
>>> port number/user OID of the standby, the location where the standby
>>> has written/flushed WAL. DBA can measure the gap from the
>>> combination of pg_current_xlog_location() and pg_standbys_xlog_location()
>>> via one query on the primary."
>>>
>>
>> This function is useful but not essential for troubleshooting, I think.
>> So I'd like to postpone it.
>>
>
> Sure; in a functional system where primary and secondary are both up,
> you can assemble the info using the new functions you just added, so
> this other one is certainly optional. I just took a brief look at the
> code of the features you added, and it looks like it exposes the minimum
> necessary to make this whole thing possible to manage. I think it's OK
> if you postpone this other bit, more important stuff for you to work on.

agreed

>
> So: the one piece of information I though was most important to expose
> here at an absolute minimum is there now. Good progress. The other
> popular request that keeps popping up here is providing an easy way to
> see how backlogged the archive_command is, to make it easier to monitor
> for out of disk errors that might prove catastrophic to replication.

I tend to disagree - in any reasonable production setup basic stulff
like disk space usage is monitored by non-application specific matters.
While monitoring backlog might be interesting for other reasons, citing
disk space usage/exhaustions seems just wrong.

[...]
>
> I'd find this extremely handy as a hook for monitoring scripts that want
> to watch the server but don't have access to the filesystem directly,
> even given those limitations. I'd prefer to have the "tried to"
> version, because it will populate with the name of the troublesome file
> it's stuck on even if archiving never gets its first segment delivered.

While fancy at all I think this goes way to far for the first cut at
SR(or say this release), monitoring disk usage and tracking log files
for errors are SOLVED issues in estabilished production setups. If you
are in an environment that does neither for each and every server
independent on what you have running on it, or a setup where the
sysadmins are clueless and the poor DBA has to hack around that fact you
have way bigger issues anyway.

Stefan

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 16:50:39
Message-ID:	4B509CDF.7020902@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner wrote:
> Greg Smith wrote:
>>
>> The other popular request that keeps popping up here is providing an
>> easy way to see how backlogged the archive_command is, to make it
>> easier to monitor for out of disk errors that might prove
>> catastrophic to replication.
>
> I tend to disagree - in any reasonable production setup basic stulff
> like disk space usage is monitored by non-application specific matters.
> While monitoring backlog might be interesting for other reasons,
> citing disk space usage/exhaustions seems just wrong.

I was just mentioning that one use of the data, but there are others.
Let's say that your archive_command works by copying things over to a
NFS mount, and the mount goes down. It could be a long time before you
noticed this via disk space monitoring. But if you were monitoring "how
long has it been since the last time pg_last_archived_xlogfile()
changed?", this would jump right out at you.

Another popular question is "how far behind real-time is the archiver
process?" You can do this right now by duplicating the same xlog file
name scanning and sorting that the archiver does in your own code,
looking for .ready files. It would be simpler if you could call
pg_last_archived_xlogfile() and then just grab that file's timestamp.

I think it's also important to consider the fact that diagnostic
internals exposed via the database are far more useful to some people
than things you have to setup outside of it. You talk about reasonable
configurations above, but some production setups are not so reasonable.
In many of the more secure environments I've worked in (finance,
defense), there is *no* access to the database server beyond what comes
out of port 5432 without getting a whole separate team of people
involved. If the DBA can write a simple monitoring program themselves
that presents data via the one port that is exposed, that makes life
easier for them. This same issue pops up sometimes when we consider the
shared hosting case too, where the user may not have the option of
running a full-fledged monitoring script.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Smith" <greg(at)2ndquadrant(dot)com>, "Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>
Cc:	"Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 16:55:59
Message-ID:	4B5049BF020000250002E57B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> In many of the more secure environments I've worked in (finance,
> defense), there is *no* access to the database server beyond what
> comes out of port 5432 without getting a whole separate team of
> people involved. If the DBA can write a simple monitoring program
> themselves that presents data via the one port that is exposed,
> that makes life easier for them.

Right, we don't want to give the monitoring software an OS login for
the database servers, for security reasons.

-Kevin

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 17:24:58
Message-ID:	4B50A4EA.8040408@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith wrote:
> Stefan Kaltenbrunner wrote:
>> Greg Smith wrote:
>>>
>>> The other popular request that keeps popping up here is providing an
>>> easy way to see how backlogged the archive_command is, to make it
>>> easier to monitor for out of disk errors that might prove
>>> catastrophic to replication.
>>
>> I tend to disagree - in any reasonable production setup basic stulff
>> like disk space usage is monitored by non-application specific matters.
>> While monitoring backlog might be interesting for other reasons,
>> citing disk space usage/exhaustions seems just wrong.
>
> I was just mentioning that one use of the data, but there are others.
> Let's say that your archive_command works by copying things over to a
> NFS mount, and the mount goes down. It could be a long time before you
> noticed this via disk space monitoring. But if you were monitoring "how
> long has it been since the last time pg_last_archived_xlogfile()
> changed?", this would jump right out at you.

well from an syadmin perspective you have to monitor the NFS mount
anyway - so why do you need the database to do too(and not in a sane way
because there is no way the database can even figure out what the real
problem is and if there is one)?

>
> Another popular question is "how far behind real-time is the archiver
> process?" You can do this right now by duplicating the same xlog file
> name scanning and sorting that the archiver does in your own code,
> looking for .ready files. It would be simpler if you could call
> pg_last_archived_xlogfile() and then just grab that file's timestamp.

well that one seems a more reasonable reasoning to me however I'm not so
sure that the proposed implementation feels right - though can't come up
with a better suggestion for now.

>
> I think it's also important to consider the fact that diagnostic
> internals exposed via the database are far more useful to some people
> than things you have to setup outside of it. You talk about reasonable
> configurations above, but some production setups are not so reasonable.
> In many of the more secure environments I've worked in (finance,
> defense), there is *no* access to the database server beyond what comes
> out of port 5432 without getting a whole separate team of people
> involved. If the DBA can write a simple monitoring program themselves
> that presents data via the one port that is exposed, that makes life
> easier for them. This same issue pops up sometimes when we consider the
> shared hosting case too, where the user may not have the option of
> running a full-fledged monitoring script.

well again I consider stuff like "available diskspace" or "NFS mount
available" completely in the realm of the OS level management. The
database side should focus on the stuff that concerns the internal state
and operation of the database app itself.
If you continue your line of thought you will have to add all kind of
stuff to the database, like CPU usage tracking, getting information
about running processes, storage health.
As soon as you are done you have reimplemented nagios-plugins over SQL
on port 5432 instead of NRPE(or SNMP or whatnot).
Again I fully understand and know that there are environments where the
DBA does not have OS level (be it root or no shell at all) access has to
the OS but even if you had that "archiving is hanging" function you
would still have to go back to that "completely different group" and
have them diagnose again.
So my point is - that even if you have disparate groups of people being
responsible for different parts of a system solution you can't really
work around incompetency(or slownest or whatever) of the group
responsible for the lower layer by adding partial and inexact
functionality at the upper part that can only guess what the real issue is.

Stefan

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 17:30:27
Message-ID:	4B50A633.4060706@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>
>> In many of the more secure environments I've worked in (finance,
>> defense), there is *no* access to the database server beyond what
>> comes out of port 5432 without getting a whole separate team of
>> people involved. If the DBA can write a simple monitoring program
>> themselves that presents data via the one port that is exposed,
>> that makes life easier for them.
>
> Right, we don't want to give the monitoring software an OS login for
> the database servers, for security reasons.

depending on what you exactly mean by that I do have to wonder how you
monitor more complex stuff (or stuff that require elevated privs) - say
raid health, multipath configuration, status of OS level updates, "are
certain processes running or not" as well as basic parameters like CPU
or IO load. as in stuff you cannot know usless you have it exported
through "some" port.

Stefan

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Stefan Kaltenbrunner" <stefan(at)kaltenbrunner(dot)cc>
Cc:	"Greg Smith" <greg(at)2ndquadrant(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 18:03:53
Message-ID:	4B5059A9020000250002E591@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> wrote:
> Kevin Grittner wrote:

>> Right, we don't want to give the monitoring software an OS login
>> for the database servers, for security reasons.
>
> depending on what you exactly mean by that I do have to wonder how
> you monitor more complex stuff (or stuff that require elevated
> privs) - say raid health, multipath configuration, status of OS
> level updates, "are certain processes running or not" as well as
> basic parameters like CPU or IO load. as in stuff you cannot know
> usless you have it exported through "some" port.

Many of those are monitored on the server one way or another,
through a hardware card accessible only to the DBAs. The card sends
an email to the DBAs for any sort of distress, including impending
or actual drive failure, ambient temperature out of bounds, internal
or external power out of bounds, etc. OS updates are managed by the
DBAs through scripts. Ideally we would tie these in to our opcenter
software, which displays status through hundreds of "LED" boxes on
big plasma displays in our support areas (and can send emails and
jabber messages when things get to a bad state), but since the
messages are getting to the right people in a timely manner, this is
a low priority as far as monitoring enhancement requests go.

Only the DBAs have OS logins to database servers. Monitoring
software must deal with application ports (which have to be open
anyway, so that doesn't add any security risk). Since the hardware
monitoring doesn't know about file systems, and the disk space on
database servers is primarily an issue for the database, it made
sense to us to add the ability to check the space available to the
database through a database connection. Hence, fsutil.

-Kevin

From:	Greg Smith <greg(at)2ndquadrant(dot)com>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-15 18:44:18
Message-ID:	4B50B782.40805@2ndquadrant.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Stefan Kaltenbrunner wrote:
>>
>> Another popular question is "how far behind real-time is the archiver
>> process?" You can do this right now by duplicating the same xlog
>> file name scanning and sorting that the archiver does in your own
>> code, looking for .ready files. It would be simpler if you could
>> call pg_last_archived_xlogfile() and then just grab that file's
>> timestamp.
>
> well that one seems a more reasonable reasoning to me however I'm not
> so sure that the proposed implementation feels right - though can't
> come up with a better suggestion for now.

That's basically where I'm at, and I was looking more for feedback on
that topic rather than to get lost defending use-cases here. There are
a few of them, and you can debate their individual merits all day. As a
general comment to your line of criticism here, I feel the idea that
"we're monitoring that already via <x>" does not mean that an additional
check is without value. The kind of people who like redundancy in their
database like it in their monitoring, too. I feel there's at least one
unique thing exposing this bit buys you, and the fact that it can be a
useful secondary source of information too for systems monitoring is
welcome bonus--regardless of whether good practice already supplies a
primary one.

> If you continue your line of thought you will have to add all kind of
> stuff to the database, like CPU usage tracking, getting information
> about running processes, storage health.

I'm looking to expose something that only the database knows for
sure--"what is the archiver working on?"--via the standard way you ask
the database questions, a SELECT call. The database doesn't know
anything about the CPU, running processes, or storage, so suggesting
this path leads in that direction doesn't make any sense.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-16 17:18:07
Message-ID:	4B51F4CF.40400@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Smith wrote:
> Stefan Kaltenbrunner wrote:
>>>
>>> Another popular question is "how far behind real-time is the archiver
>>> process?" You can do this right now by duplicating the same xlog
>>> file name scanning and sorting that the archiver does in your own
>>> code, looking for .ready files. It would be simpler if you could
>>> call pg_last_archived_xlogfile() and then just grab that file's
>>> timestamp.
>>
>> well that one seems a more reasonable reasoning to me however I'm not
>> so sure that the proposed implementation feels right - though can't
>> come up with a better suggestion for now.
>
> That's basically where I'm at, and I was looking more for feedback on
> that topic rather than to get lost defending use-cases here. There are
> a few of them, and you can debate their individual merits all day. As a
> general comment to your line of criticism here, I feel the idea that
> "we're monitoring that already via <x>" does not mean that an additional
> check is without value. The kind of people who like redundancy in their
> database like it in their monitoring, too. I feel there's at least one
> unique thing exposing this bit buys you, and the fact that it can be a
> useful secondary source of information too for systems monitoring is
> welcome bonus--regardless of whether good practice already supplies a
> primary one.

well that might be true - but as somebody with an extensive sysadmin
background I was specifically ticked by the "disk full" stuff mentioned
upthread. Monitoring also means standardization and somebody who runs
hundreds (or dozends) of servers is much better of getting the basics
monitored the same on all systems and getting more specific as you move
upwards the (application)stack.

>
>> If you continue your line of thought you will have to add all kind of
>> stuff to the database, like CPU usage tracking, getting information
>> about running processes, storage health.
>
> I'm looking to expose something that only the database knows for
> sure--"what is the archiver working on?"--via the standard way you ask
> the database questions, a SELECT call. The database doesn't know
> anything about the CPU, running processes, or storage, so suggesting
> this path leads in that direction doesn't make any sense.

well the database does not really know much about "free diskspace" in
reality as well - the only thing it knows is that it might not be able
to write data or execute a script and unless you have shell/logfile
access you cannot diagnose those anyway even with all the proposed
functions.
However what I was really trying to say is that we should focus on
getting the code stable first and that prettying it up with fancy stat
functions is something that really can and should be done in a followup
release once we understand how the code behaves and maybe also how it is
likely going to evolve...

Stefan

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-16 17:22:57
Message-ID:	4B51F5F1.2050906@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> wrote:
>> Kevin Grittner wrote:
>
>>> Right, we don't want to give the monitoring software an OS login
>>> for the database servers, for security reasons.
>> depending on what you exactly mean by that I do have to wonder how
>> you monitor more complex stuff (or stuff that require elevated
>> privs) - say raid health, multipath configuration, status of OS
>> level updates, "are certain processes running or not" as well as
>> basic parameters like CPU or IO load. as in stuff you cannot know
>> usless you have it exported through "some" port.
>
> Many of those are monitored on the server one way or another,
> through a hardware card accessible only to the DBAs. The card sends
> an email to the DBAs for any sort of distress, including impending
> or actual drive failure, ambient temperature out of bounds, internal
> or external power out of bounds, etc. OS updates are managed by the
> DBAs through scripts. Ideally we would tie these in to our opcenter
> software, which displays status through hundreds of "LED" boxes on
> big plasma displays in our support areas (and can send emails and
> jabber messages when things get to a bad state), but since the
> messages are getting to the right people in a timely manner, this is
> a low priority as far as monitoring enhancement requests go.

well a lot of people (including myself) consider it a necessity to
aggregate all that stuff in your system monitoring, only that way you
can guarantee proper dependency handling (ie no need to page for
"webserver not running" if the whole server is down).
There is also a case to be made for statistics tracking and long term
monitoring of stuff.

>
> Only the DBAs have OS logins to database servers. Monitoring
> software must deal with application ports (which have to be open
> anyway, so that doesn't add any security risk). Since the hardware
> monitoring doesn't know about file systems, and the disk space on
> database servers is primarily an issue for the database, it made
> sense to us to add the ability to check the space available to the
> database through a database connection. Hence, fsutil.

still seems very backwards - there is much much more than can only be
monitored from within the OS(and not from an external
iLO/RSA/IMM/DRAC/whatever) that you cannot really do from within the
database (or any other application) so I'm still puzzled...

Stefan

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Greg Smith <greg(at)2ndquadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-16 23:53:54
Message-ID:	4B525192.6000103@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> I'd happily write a patch to handle all that if I thought it would be
> accepted. I fear that the whole approach will be considered a bit too
> hackish and get rejected on that basis though. Not really sure of a
> "right" way to handle this though. Anything better is going to be more
> complicated because it requires passing more information into the
> archiver, with little gain for that work beyond improving the quality of
> this diagnostic routine. And I think most people would find what I
> described above useful enough.

Yeah, I think we should focus right now on "what monitoring can we get
into this version without holding up release?" Your proposal sounds
like a good one in that respect.

In future versions, I think we'll want a host of granular data on including:

* amount of *time* since last successful archive (this would be a good
trigger for alerts)
* number of failed archive attempts
* number of archive files awaiting processing (presumably monitored by
the slave)
* last archive file processed by the slave, and when
* for HS: frequency and length of conflict delays in log processing, as
a stat
* for HS: number of query cancels due to write/lock conflicts from the
master, as a stat

However, *all* of the above can wait for the next version, especially
since by then we'll have user feedback from the field on required
monitoring. If we try to nail this all down now, not only will it delay
the release, but we'll get it wrong and have to re-do it anyway. Release
early and often, y'know?

I think it's key to keep our data as granular and low-level as possible;
with good low-level data people can write good tools, but if we
over-summarize they can't. Also, it would be nice to have all of our
archiving stuff grouped into something like pg_stat_archive rather than
being a bunch of disconnected functions.

--Josh Berkus

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-01-18 03:03:06
Message-ID:	3f0b79eb1001171903u3605d960w5ffbde27737b4377@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Jan 17, 2010 at 8:53 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> * amount of *time* since last successful archive (this would be a good
> trigger for alerts)
> * number of failed archive attempts
> * number of archive files awaiting processing (presumably monitored by
> the slave)
> * last archive file processed by the slave, and when

Are these for warm-standby, not SR? At least SR isn't so much
involved in WAL archiving, i.e, WAL is sent to the standby by
walsender instead of an archiver.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Greg Smith <greg(at)2ndquadrant(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-03-20 08:49:23
Message-ID:	1269074963.3556.1375.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, 2010-01-14 at 17:33 +0900, Fujii Masao wrote:

> I added two new functions;
>
> (1) pg_last_xlog_receive_location() reports the last WAL location received
> and synced by walreceiver. If streaming replication is still in progress
> this will increase monotonically. If streaming replication has completed
> then this value will remain static at the value of the last WAL record
> received and synced. When the server has been started without a streaming
> replication then the return value will be InvalidXLogRecPtr (0/0).
>
> (2) pg_last_xlog_replay_location() reports the last WAL location replayed
> during recovery. If recovery is still in progress this will increase
> monotonically. If recovery has completed then this value will remain
> static at the value of the last WAL record applied. When the server has
> been started normally without a recovery then the return value will be
> InvalidXLogRecPtr (0/0).

I just noticed that these functions have almost the same name as
functions I wrote for Hot Standby and Heikki removed from that patch.
The function code and docs are 99% identical.

I'm happy that the code was used and it is BSD, though it does seem
strange to have this credited to others in the release notes.

>From May 2 2009 the patch included

+ <entry>
+
<literal><function>pg_last_recovered_xlog_location</function>()</literal>
+ </entry>
+ <entry><type>text</type></entry>
+ <entry>Returns the transaction log location of the last WAL
record
+ in the current recovery. If recovery is still in progress this
+ will increase monotonically. If recovery is complete then this value
will
+ remain static at the value of the last transaction applied during
that
+ recovery. When the server has been started normally this will
return
+ InvalidXLogRecPtr (0/0).
+ (zero).
+ </entry>

with code

+ /*
+ * Returns xlog location of last recovered WAL record.
+ */
+ Datum
+ pg_last_recovered_xlog_location(PG_FUNCTION_ARGS)
+ {
+ char location[MAXFNAMELEN];
+
+ {
+ /* use volatile pointer to prevent code rearrangement */
+ volatile XLogCtlData *xlogctl = XLogCtl;
+
+ SpinLockAcquire(&xlogctl->info_lck);
+
+ LastRec = xlogctl->recoveryLastRecPtr;
+
+ SpinLockRelease(&xlogctl->info_lck);
+ }
+
+ snprintf(location, sizeof(location), "%X/%X",
+ LastRec.xlogid, LastRec.xrecoff);
+ PG_RETURN_TEXT_P(cstring_to_text(location));
+ }

--
Simon Riggs www.2ndQuadrant.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Greg Smith <greg(at)2ndQuadrant(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-03-22 22:56:33
Message-ID:	201003222256.o2MMuXQ18356@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Thu, 2010-01-14 at 17:33 +0900, Fujii Masao wrote:
>
> > I added two new functions;
> >
> > (1) pg_last_xlog_receive_location() reports the last WAL location received
> > and synced by walreceiver. If streaming replication is still in progress
> > this will increase monotonically. If streaming replication has completed
> > then this value will remain static at the value of the last WAL record
> > received and synced. When the server has been started without a streaming
> > replication then the return value will be InvalidXLogRecPtr (0/0).
> >
> > (2) pg_last_xlog_replay_location() reports the last WAL location replayed
> > during recovery. If recovery is still in progress this will increase
> > monotonically. If recovery has completed then this value will remain
> > static at the value of the last WAL record applied. When the server has
> > been started normally without a recovery then the return value will be
> > InvalidXLogRecPtr (0/0).
>
> I just noticed that these functions have almost the same name as
> functions I wrote for Hot Standby and Heikki removed from that patch.
> The function code and docs are 99% identical.
>
> I'm happy that the code was used and it is BSD, though it does seem
> strange to have this credited to others in the release notes.

Sorry, release notes updated:

Add <link
linkend="functions-recovery-info-table"><function>pg_last_xlog_receive_location()</></link>
and <function>pg_last_xlog_replay_location()</>, which
can be used to monitor standby server <acronym>WAL</>
activity (Simon)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-03-23 01:36:19
Message-ID:	3f0b79eb1003221836o32db91bbocb7b8abf06dbae99@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 23, 2010 at 7:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Sorry, release notes updated:
>
> Add <link
> linkend="functions-recovery-info-table"><function>pg_last_xlog_receive_location()</></link>
> and <function>pg_last_xlog_replay_location()</>, which
> can be used to monitor standby server <acronym>WAL</>
> activity (Simon)

Umm... though I'm not sure the policy about credit, I think that
three names should be put down with.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Streaming replication status
Date:	2010-03-23 02:29:00
Message-ID:	201003230229.o2N2T0l25578@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Fujii Masao wrote:
> On Tue, Mar 23, 2010 at 7:56 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Sorry, release notes updated:
> >
> > ? ? ? ? ? ? ?Add <link
> > ? ? ? ? ? ? ?linkend="functions-recovery-info-table"><function>pg_last_xlog_receive_location()</></link>
> > ? ? ? ? ? ? ?and <function>pg_last_xlog_replay_location()</>, which
> > ? ? ? ? ? ? ?can be used to monitor standby server <acronym>WAL</>
> > ? ? ? ? ? ? ?activity (Simon)
>
> Umm... though I'm not sure the policy about credit, I think that
> three names should be put down with.

OK, all three are there now:

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do