Re: Streaming Recovery - Automated Monitoring

From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Streaming Recovery - Automated Monitoring
Date: 2010-10-03 06:34:56
Message-ID: 4CA82410.4020300@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Le 03/10/2010 07:07, Karl Denninger a écrit :
> On 10/2/2010 11:40 PM, Rajesh Kumar Mallah wrote:
>>
>> I hope u checked point #11
>> http://wiki.postgresql.org/wiki/Streaming_Replication#How_to_Use
>>
>> * *11.* You can calculate the replication lag by comparing the
>> current WAL write location on the primary with the last WAL
>> location received/replayed by the standby. They can be retrieved
>> using /pg_current_xlog_location/ on the primary and the
>> /pg_last_xlog_receive_location///pg_last_xlog_replay_location/
>> on the standby, respectively.
>>
>> $ psql -c "SELECT pg_current_xlog_location()" -h192.168.0.10 (primary host)
>> pg_current_xlog_location
>> --------------------------
>> 0/2000000
>> (1 row)
>>
>> $ psql -c "select pg_last_xlog_receive_location()" -h192.168.0.20 (standby host)
>> pg_last_xlog_receive_location
>> -------------------------------
>> 0/2000000
>> (1 row)
>>
>> $ psql -c "select pg_last_xlog_replay_location()" -h192.168.0.20 (standby host)
>> pg_last_xlog_replay_location
>> ------------------------------
>> 0/2000000
>> (1 row)
>>
>> Regds
>> Rajesh Kumar Mallah.
>
> Yes, I did.
>
> Now how do I get an arithmetic difference between the two? There will
> (usually) be a small difference between the master and slave on a busy
> system - what I want to do is query both and if the difference in their
> locations is greater than some defined size, start raising hell (e.g.
> sending SMS to people, etc)
>
> I can SEE the difference, but I don't see a way to COMPUTE a difference,
> and there does not appear to be a function that will accept the log file
> location as an argument for conversion - the one documented for offsets
> (which might otherwise work) does not work on the slave as I noted.
>
> With Slony there was a set of system tables that would tell me how many
> unapplied changes were in the queue. From this I could determine health
> - if the number was more than some reasonably-small amount, something
> was broken and alarms were to be raised.
>
> I'm looking for a way to implement the same sort of functionality here.
>
> ticker=# select pg_last_xlog_replay_location();
> pg_last_xlog_replay_location
> ------------------------------
> 37A/327D1888
> (1 row)
>
> ticker=# select pg_current_xlog_location();
> pg_current_xlog_location
> --------------------------
> 37A/3280DCB8
> (1 row)
>
> How do I get an arithmetic difference between these two
> programmatically, and will such always be monoatomically increasing
> (that is, will they ever roll over, thereby giving me a potential
> NEGATIVE difference?)
>

pgPool-II does that computation. You should check that in its source
code. File pool_worker_child.c, function check_replication_time_lag().
It creates a long value from the LSN returned by these functions. Here
is the computation:

lsn = xlogid * 16 * 1024 * 1024 * 255 + xrecoff;

In your example, xlogid is 37A and xrecoff is 327D1888 on the slave and
3280DCB8 on the master. The hexadecimal values were first converted to
their unsigned decimal integer representation (same file, function
text_to_lsn()).

You should really get a look at this file. It's probably a better
explanation than what I'm trying to do :)

--
Guillaume
http://www.postgresql.fr
http://dalibo.com

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Gurjeet Singh 2010-10-03 08:28:02 Re: error while autovacuuming
Previous Message Magnus Hagander 2010-10-03 06:28:31 Re: Streaming Recovery - Automated Monitoring