Lists: | pgsql-hackers |
---|
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | How can we tell how far behind the standby is? |
Date: | 2010-11-05 18:46:08 |
Message-ID: | 4CD450F0.9090405@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Folks,
I'm continuing in my efforts now to document how to deploy and manage
replication on our wiki. One of the things a DBA needs to do is to use
pg_current_xlog_location() (and related functions) to check how far
behind the master the standby is.
However, there's some serious problems with that:
(1) comparing these numbers is quite mathematically complex -- and, for
that matter, undocumented.
(2) pg_rotate_xlog and/or archive_timeout will create a "gap" in the
xlog positions, quite a large one if it happens near the beginning of a
file. There is no way for any monitoring on the standby to tell the
difference between a gap created by forced rotation as opposed to being
most of a file behind, until the next record shows up. Hello, nagios
false alerts!
(3) There is no easy way to relate a difference in log positions to an
amount of time.
I'll work on some tools to make this a bit more palatable, but I
disagree with earlier assertions that we have the replication monitoring
"done". There's still a *lot* of work to do.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-06 00:39:04 |
Message-ID: | AANLkTi=wmjqDHVNjLLRVsQgtkZx8TmCw8-8oBDt2Ejqx@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Nov 5, 2010 at 2:46 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I'm continuing in my efforts now to document how to deploy and manage
> replication on our wiki. One of the things a DBA needs to do is to use
> pg_current_xlog_location() (and related functions) to check how far
> behind the master the standby is.
>
> However, there's some serious problems with that:
>
> (1) comparing these numbers is quite mathematically complex -- and, for
> that matter, undocumented.
>
> (2) pg_rotate_xlog and/or archive_timeout will create a "gap" in the
> xlog positions, quite a large one if it happens near the beginning of a
> file. There is no way for any monitoring on the standby to tell the
> difference between a gap created by forced rotation as opposed to being
> most of a file behind, until the next record shows up. Hello, nagios
> false alerts!
>
> (3) There is no easy way to relate a difference in log positions to an
> amount of time.
>
> I'll work on some tools to make this a bit more palatable, but I
> disagree with earlier assertions that we have the replication monitoring
> "done". There's still a *lot* of work to do.
I've heard the same complaint, and I agree with your concerns.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Rob Wultsch <wultsch(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-06 03:28:24 |
Message-ID: | AANLkTinHYBSn7NpUEuccHb27X95ELwLNUiY7UAZxtZs2@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, Nov 5, 2010 at 5:39 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Nov 5, 2010 at 2:46 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I'm continuing in my efforts now to document how to deploy and manage
>> replication on our wiki. One of the things a DBA needs to do is to use
>> pg_current_xlog_location() (and related functions) to check how far
>> behind the master the standby is.
>>
>> However, there's some serious problems with that:
>>
>> (1) comparing these numbers is quite mathematically complex -- and, for
>> that matter, undocumented.
>>
>> (2) pg_rotate_xlog and/or archive_timeout will create a "gap" in the
>> xlog positions, quite a large one if it happens near the beginning of a
>> file. There is no way for any monitoring on the standby to tell the
>> difference between a gap created by forced rotation as opposed to being
>> most of a file behind, until the next record shows up. Hello, nagios
>> false alerts!
>>
>> (3) There is no easy way to relate a difference in log positions to an
>> amount of time.
>>
>> I'll work on some tools to make this a bit more palatable, but I
>> disagree with earlier assertions that we have the replication monitoring
>> "done". There's still a *lot* of work to do.
>
> I've heard the same complaint, and I agree with your concerns.
"All this has happened before, and all of it will happen again."
At this point pg has the equivalent of MySQL's "show slave status" in
4.0. The output of that change significantly over time:
http://dev.mysql.com/doc/refman/4.1/en/show-slave-status.html
http://dev.mysql.com/doc/refman/5.5/en/show-slave-status.html
Also of interest
http://dev.mysql.com/doc/refman/4.1/en/show-binary-logs.html
--
Rob Wultsch
wultsch(at)gmail(dot)com
From: | Brendan Jurd <direvus(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-06 13:45:33 |
Message-ID: | AANLkTi=oVKuZb7kDr-xrqPEt3oumo8d3AiiztHyeiBj3@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 6 November 2010 05:46, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I'm continuing in my efforts now to document how to deploy and manage
> replication on our wiki. One of the things a DBA needs to do is to use
> pg_current_xlog_location() (and related functions) to check how far
> behind the master the standby is.
>
> However, there's some serious problems with that:
>
> (1) comparing these numbers is quite mathematically complex -- and, for
> that matter, undocumented.
>
Our solution to this was to strip the slash out of the numbers and
then feed them to `bc` for comparison. The shell script for our
zabbix item looks something like this:
----
#!/bin/bash
errval=-1
primary=$(psql -At -h $1 -p $2 -c "SELECT
replace(pg_current_xlog_location(), '/', '');" postgres)
standby=$(psql -At -h $3 -p $4 -c "SELECT
replace(pg_last_xlog_receive_location(), '/', '');" postgres)
if [ -n "$primary" -a -n "$standby" ]
then
echo $(echo "ibase=16; obase=10; $primary-$standby" | bc)
else
echo $errval
fi
----
I'm posting this snippet a) in the hopes that it might help others,
and b) by way of agreement with Josh's point. Requiring every user
who wants to monitor replication to set something like this up for
themselves is ... not awesome.
Cheers,
BJ
From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-08 11:05:42 |
Message-ID: | AANLkTimLSd2bF8jRLoM2PkkyqChf+u6LK2bvuy_5HCN0@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Sat, Nov 6, 2010 at 3:46 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I'm continuing in my efforts now to document how to deploy and manage
> replication on our wiki. One of the things a DBA needs to do is to use
> pg_current_xlog_location() (and related functions) to check how far
> behind the master the standby is.
>
> However, there's some serious problems with that:
>
> (1) comparing these numbers is quite mathematically complex -- and, for
> that matter, undocumented.
>
> (2) pg_rotate_xlog and/or archive_timeout will create a "gap" in the
> xlog positions, quite a large one if it happens near the beginning of a
> file. There is no way for any monitoring on the standby to tell the
> difference between a gap created by forced rotation as opposed to being
> most of a file behind, until the next record shows up. Hello, nagios
> false alerts!
>
> (3) There is no easy way to relate a difference in log positions to an
> amount of time.
The patch which I'm proposing is helpful for you?
http://archives.postgresql.org/pgsql-hackers/2010-11/msg00167.php
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From: | Bernd Helmle <mailings(at)oopsware(dot)de> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-08 18:55:38 |
Message-ID: | 318FF7CC8303031664551D05@amenophis |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
--On 5. November 2010 11:46:08 -0700 Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I'll work on some tools to make this a bit more palatable, but I
> disagree with earlier assertions that we have the replication monitoring
> "done". There's still a *lot* of work to do.
While getting familiar with our SR/HS infrastructure i scripted this
<https://github.com/psoo/pg_standby_status/blob/master/pg_standby_status.pl>
Not sure if it does all things right, but it helped me a lot while load
testing SR. AFAIK Magnus has a monitoring script, too, i think this one is
it:
<https://github.com/mhagander/munin-plugins/blob/master/postgres/postgres_streaming_.in>
Maybe that helps, too.
--
Thanks
Bernd
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-08 22:57:43 |
Message-ID: | 4CD88067.6040804@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> The patch which I'm proposing is helpful for you?
> http://archives.postgresql.org/pgsql-hackers/2010-11/msg00167.php
Depends. Is that the timestamp on the master (when it was synced), or
the timestamp on the standby (when it was replayed)? It's only helpful
if it's the former.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-09 14:16:03 |
Message-ID: | AANLkTi=zK2PPrOk9KL=wdtmjfSF1i4KEPdKCyBHFHD6+@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Tue, Nov 9, 2010 at 7:57 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> The patch which I'm proposing is helpful for you?
>> http://archives.postgresql.org/pgsql-hackers/2010-11/msg00167.php
>
> Depends. Is that the timestamp on the master (when it was synced), or
> the timestamp on the standby (when it was replayed)? It's only helpful
> if it's the former.
It's the former. The function which I'm proposing returns the timestamp
of the last replayed commit/abort log record. That timestamp is given
when the commit/abort log record is generated in the master.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: How can we tell how far behind the standby is? |
Date: | 2010-11-09 18:26:27 |
Message-ID: | 4CD99253.8040107@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> It's the former. The function which I'm proposing returns the timestamp
> of the last replayed commit/abort log record. That timestamp is given
> when the commit/abort log record is generated in the master.
That would be *extremely* helpful for all kinds of monitoring tools.
Please complete/submit this.
--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com