Re: Checking for stale reads on hot standby

Lists: pgsql-general
From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Checking for stale reads on hot standby
Date: 2010-09-26 23:51:38
Message-ID: AANLkTi=c75UAZ3u787fY2LKHQK4tJszWsm1xv0ceQ+uq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Say you have an application using PG asynchronous streaming
replication to some hot standbys, to distribute the read load. The
application itself is a typical web application consisting of multiple
servers, serving a number of sessions (perhaps belonging to different
users), and the workload is OLTP-ish, with each session continually
issuing a bunch of transactions. To guarantee session timeline
consistency for clients of the application, you want to make sure that
they can read data that's at least as new as anything they've
read/written previously, never traveling back in time.

With asynchronous replication, after seeing a new version of the data
from one standby, you may see an older version from a subsequent query
to another standby. The question: what are some ways to provide this
form of consistency in the context of PG asynchronous replication?

Is the standard/recommended approach to use a sequence representing
the global database version? Here, the application is responsible for
incrementing this from update transactions. In read transactions,
check that the sequence value is >= the session's highest-seen-value,
and raise the latter if necessary.

Thanks in advance.
--
Yang Zhang
http://yz.mit.edu/


From: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Checking for stale reads on hot standby
Date: 2010-09-27 00:09:31
Message-ID: AANLkTimofuRW2=J95cuTDeqGr01+oX3jvX_HtO30mwDV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Sep 27, 2010 at 1:51 AM, Yang Zhang <yanghatespam(at)gmail(dot)com> wrote:

> Say you have an application using PG asynchronous streaming
> replication to some hot standbys, to distribute the read load. The
> application itself is a typical web application consisting of multiple
> servers, serving a number of sessions (perhaps belonging to different
> users), and the workload is OLTP-ish, with each session continually
> issuing a bunch of transactions. To guarantee session timeline
> consistency for clients of the application, you want to make sure that
> they can read data that's at least as new as anything they've
> read/written previously, never traveling back in time.
>
> With asynchronous replication, after seeing a new version of the data
> from one standby, you may see an older version from a subsequent query
> to another standby. The question: what are some ways to provide this
> form of consistency in the context of PG asynchronous replication?
>
> Is the standard/recommended approach to use a sequence representing
> the global database version? Here, the application is responsible for
> incrementing this from update transactions. In read transactions,
> check that the sequence value is >= the session's highest-seen-value,
> and raise the latter if necessary.
>
>
See the nuggets hidden in section 25.2.5.2. "Monitoring" at
http://www.postgresql.org/docs/9.0/static/warm-standby.html#STREAMING-REPLICATION

After an UPDATE, your application can cache the info from
'pg_current_xlog_location()' result on the primary and then compare that
with the result of 'pg_last_xlog_receive_location()' on the standby to see
if it is seeing fresh enough data.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Checking for stale reads on hot standby
Date: 2010-09-27 00:20:29
Message-ID: AANLkTi=jgnO5neqzcoJ3+zjuP0Z5BxMFv0+DcxG+SGCo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Mon, Sep 27, 2010 at 9:09 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
> See the nuggets hidden in section 25.2.5.2. "Monitoring" at
> http://www.postgresql.org/docs/9.0/static/warm-standby.html#STREAMING-REPLICATION
>
> After an UPDATE, your application can cache the info from
> 'pg_current_xlog_location()' result on the primary and then compare that
> with the result of  'pg_last_xlog_receive_location()' on the standby to see
> if it is seeing fresh enough data.

Yep, but since recovery might fall behind WAL receiving,
pg_last_xlog_replay_location should be called instead of
pg_last_xlog_receive_location.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Guillaume Lelarge <guillaume(at)lelarge(dot)info>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, Yang Zhang <yanghatespam(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Checking for stale reads on hot standby
Date: 2010-09-27 05:06:43
Message-ID: 4CA02663.9010208@lelarge.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le 27/09/2010 02:20, Fujii Masao a écrit :
> On Mon, Sep 27, 2010 at 9:09 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
>> See the nuggets hidden in section 25.2.5.2. "Monitoring" at
>> http://www.postgresql.org/docs/9.0/static/warm-standby.html#STREAMING-REPLICATION
>>
>> After an UPDATE, your application can cache the info from
>> 'pg_current_xlog_location()' result on the primary and then compare that
>> with the result of 'pg_last_xlog_receive_location()' on the standby to see
>> if it is seeing fresh enough data.
>
> Yep, but since recovery might fall behind WAL receiving,
> pg_last_xlog_replay_location should be called instead of
> pg_last_xlog_receive_location.
>

pgPool-II can do that automatically for you in load balancing mode, and
not use a standby node if it lags too much.

--
Guillaume
http://www.postgresql.fr
http://dalibo.com