data loss with pg_standby when doing a controlled failover

Lists: pgsql-bugs
From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: data loss with pg_standby when doing a controlled failover
Date: 2009-04-06 11:37:42
Message-ID: 49D9E986.8010604@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Running 8.3.7, I have a warm standby configuration with a
archive_timeout of 10min.

It's obvious that there's a 10min period where data could be lost if the
master fails and the warm standby server has to take over. What's not
obvious is that this is true even if the master server is shut down
regularly, because it will not write out a last log segment to the
archive. As a consequence, when doing a controlled failover (for
maintenance purposes or so) all data changed after the last archive copy
will be lost.
IMHO this should be mentioned in the docs explicitly (I find it quite
surprising that data can be lost even if the system is shutdown
correctly), or better when shutting down the postmaster should spit all
log segments containing all changes when archiving is on so the warm
standby server can catch up.

Regards.
Andreas


From: Scott Mead <scott(dot)lists(at)enterprisedb(dot)com>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: data loss with pg_standby when doing a controlled failover
Date: 2009-04-06 13:42:15
Message-ID: d3ab2ec80904060642p6eb20a49u9aa952abc3f133b8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Apr 6, 2009 at 7:37 AM, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>wrote:

> Running 8.3.7, I have a warm standby configuration with a
> archive_timeout of 10min.
>
> It's obvious that there's a 10min period where data could be lost if the
> master fails and the warm standby server has to take over. What's not
> obvious is that this is true even if the master server is shut down
> regularly, because it will not write out a last log segment to the
> archive. As a consequence, when doing a controlled failover (for
> maintenance purposes or so) all data changed after the last archive copy
> will be lost.
> IMHO this should be mentioned in the docs explicitly (I find it quite
> surprising that data can be lost even if the system is shutdown
> correctly), or better when shutting down the postmaster should spit all
> log segments containing all changes when archiving is on so the warm
> standby server can catch up.

You make an excellent point. If you're looking for a way to mitigate this
risk, run:

select pg_switch_xlog() ;

Before shutting down.

--Scott

>
>
> Regards.
> Andreas
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>


From: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: data loss with pg_standby when doing a controlled failover
Date: 2009-04-06 13:51:33
Message-ID: 1d4e0c10904060651m18049be5qae1d09a42904d18b@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Apr 6, 2009 at 1:37 PM, Andreas Pflug <pgadmin(at)pse-consulting(dot)de> wrote:
> IMHO this should be mentioned in the docs explicitly (I find it quite
> surprising that data can be lost even if the system is shutdown
> correctly), or better when shutting down the postmaster should spit all
> log segments containing all changes when archiving is on so the warm
> standby server can catch up.

See also this thread which might be interesting for you:
http://archives.postgresql.org/message-id/3f0b79eb0903242329j12865d55s348f5c873a956e71@mail.gmail.com

--
Guillaume


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Scott Mead <scott(dot)lists(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: data loss with pg_standby when doing a controlled failover
Date: 2009-04-06 14:13:26
Message-ID: 49DA0E06.5030309@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Scott Mead wrote:
>
>
>
> On Mon, Apr 6, 2009 at 7:37 AM, Andreas Pflug
> <pgadmin(at)pse-consulting(dot)de <mailto:pgadmin(at)pse-consulting(dot)de>> wrote:
>
> Running 8.3.7, I have a warm standby configuration with a
> archive_timeout of 10min.
>
> It's obvious that there's a 10min period where data could be lost
> if the
> master fails and the warm standby server has to take over. What's not
> obvious is that this is true even if the master server is shut down
> regularly, because it will not write out a last log segment to the
> archive. As a consequence, when doing a controlled failover (for
> maintenance purposes or so) all data changed after the last
> archive copy
> will be lost.
> IMHO this should be mentioned in the docs explicitly (I find it quite
> surprising that data can be lost even if the system is shutdown
> correctly), or better when shutting down the postmaster should
> spit all
> log segments containing all changes when archiving is on so the warm
> standby server can catch up.
>
>
>
> You make an excellent point. If you're looking for a way to mitigate
> this risk, run:
>
> select pg_switch_xlog() ;
>
> Before shutting down.
Sort of, unless some other user succeeds to commit a transaction after
pg_switch_xlog, and before the database ceases operation.

My "graceful failover" procedure now includes this workaround:
- shutdown server
- restart server with --listen_addresses='' to prevent other users to
connect (there are no local users on the server machine)
- pg_switch_xlog()
- shutdown finally
- let the warm server continue

Regards,
Andreas


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: data loss with pg_standby when doing a controlled failover
Date: 2009-04-06 14:29:13
Message-ID: 49DA11B9.5010500@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Guillaume Smet wrote:
> On Mon, Apr 6, 2009 at 1:37 PM, Andreas Pflug <pgadmin(at)pse-consulting(dot)de> wrote:
>
>> IMHO this should be mentioned in the docs explicitly (I find it quite
>> surprising that data can be lost even if the system is shutdown
>> correctly), or better when shutting down the postmaster should spit all
>> log segments containing all changes when archiving is on so the warm
>> standby server can catch up.
>>
>
> See also this thread which might be interesting for you:
> http://archives.postgresql.org/message-id/3f0b79eb0903242329j12865d55s348f5c873a956e71@mail.gmail.com
>
It is, though not related to this problem. I'd expect pg_standby's
default behaviour to be like the pseudocode's in the warm-standby
documentation. To me, it's kind of unexpected that it won't continue
restoring if the trigger file is present (as Kevin said, what's the use
case for the current behaviour?).

Regards,
Andreas


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Scott Mead <scott(dot)lists(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: data loss with pg_standby when doing a controlled failover
Date: 2009-04-07 01:55:36
Message-ID: 3f0b79eb0904061855k48b61592w1937b90e5f181275@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

On Mon, Apr 6, 2009 at 11:13 PM, Andreas Pflug
<pgadmin(at)pse-consulting(dot)de> wrote:
> Scott Mead wrote:
>>
>>
>>
>> On Mon, Apr 6, 2009 at 7:37 AM, Andreas Pflug
>> <pgadmin(at)pse-consulting(dot)de <mailto:pgadmin(at)pse-consulting(dot)de>> wrote:
>>
>>     Running 8.3.7, I have a warm standby configuration with a
>>     archive_timeout of 10min.
>>
>>     It's obvious that there's a 10min period where data could be lost
>>     if the
>>     master fails and the warm standby server has to take over. What's not
>>     obvious is that this is true even if the master server is shut down
>>     regularly, because it will not write out a last log segment to the
>>     archive. As a consequence, when doing a controlled failover (for
>>     maintenance purposes or so) all data changed after the last
>>     archive copy
>>     will be lost.
>>     IMHO this should be mentioned in the docs explicitly (I find it quite
>>     surprising that data can be lost even if the system is shutdown
>>     correctly), or better when shutting down the postmaster should
>>     spit all
>>     log segments containing all changes when archiving is on so the warm
>>     standby server can catch up.
>>
>>
>>
>> You make an excellent point.  If you're looking for a way to mitigate
>> this risk, run:
>>
>>     select pg_switch_xlog() ;
>>
>>    Before shutting down.
> Sort of, unless some other user succeeds to commit a transaction after
> pg_switch_xlog, and before the database ceases operation.
>
> My "graceful failover" procedure now includes this workaround:
> - shutdown server
> - restart server with --listen_addresses='' to prevent other users to
> connect (there are no local users on the server machine)
> - pg_switch_xlog()
> - shutdown finally
> - let the warm server continue

What if new xlogs are generated by autovacuum or bgwriter
between pg_switch_xlog and final shutdown? Those xlogs
can be ignored?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center