Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

From: 乔志强 <qiaozhiqiang(at)leadcoretech(dot)com>
To: "Condor" <condor(at)stz-bg(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?
Date: 2012-04-10 08:09:17
Message-ID: E81554BCB8813E49A8916AACC0503A850B59F270@lc-shmail3.SHANGHAI.LEADCORETECH.COM
Lists: pgsql-general pgsql-hackers

> I see that if no standby is connected to the master when
> synchronous_standby_names = '*', all commits are delayed until a standby
> connects to the master. That is good.

So I think commits are synchronous between the master and the standby.

But why does the master delete the WAL segment before the standby has committed it, while the standby is connected?

-----Original Message-----
From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Condor
Sent: April 9, 2012 21:33
To: pgsql-general(at)postgresql(dot)org
Subject: Re: [GENERAL] [streaming replication] 9.1.3 streaming replication bug ?

On 09.04.2012 13:33, 乔志强 wrote:
> I use postgresql-9.1.3-1-windows-x64.exe on Windows 2008 R2 x64.
>
> There is 1 master and 1 standby. The standby is a synchronous standby
> using streaming replication (synchronous_standby_names = '*',
> archive_mode = off). The master outputs:
>     standby "walreceiver" is now the synchronous standby with priority 1
> The standby outputs:
>     LOG: streaming replication successfully connected to primary
>
> Then I run a test program that writes and commits large blobs (random
> sizes from 10 to 1000 MB) to the master server, using 40 threads
> (40 sessions) in a loop. The master and the standby run on the same
> machine, and the client runs on another machine over a 100 Mbps network.
>
>
> But after some minutes the master outputs:
>     requested WAL segment XXX has already been removed
> The standby outputs:
>     FATAL: could not receive data from WAL stream: FATAL:
>     requested WAL segment XXX has already been removed
>
>
> Question:
> Why does the master delete the WAL segment before sending it to the
> standby in synchronous mode? Is it a streaming replication bug?
>
>
> I see that if no standby is connected to the master when
> synchronous_standby_names = '*', all commits are delayed until a standby
> connects to the master. That is good.
>
> Should I use a bigger wal_keep_segments? But I think the master should
> keep all WAL segments not yet sent to an online standby (sync or async).
> wal_keep_segments should be only for offline standbys.
>
> If synchronous_standby_names is used for a sync standby and no standby
> is online, all commits are delayed until a standby connects to the
> master, so wal_keep_segments is actually only for offline async standbys.
>
>
>
> ////////////////////////////////////////
>
> master server output:
> LOG: database system was interrupted; last known up at 2012-03-30
> 15:37:03 HKT
> LOG: database system was not properly shut down; automatic recovery
> in progress
>
> LOG: redo starts at 0/136077B0
> LOG: record with zero length at 0/17DF1E10
> LOG: redo done at 0/17DF1D98
> LOG: last completed transaction was at log time 2012-03-30
> 15:37:03.148+08
> FATAL: the database system is starting up
> LOG: database system is ready to accept connections
> LOG: autovacuum launcher started
> ///////////////////// the standby is a synchronous standby
> LOG: standby "walreceiver" is now the synchronous standby with
> priority 1
> /////////////////////
> LOG: checkpoints are occurring too frequently (16 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (23 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (24 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (20 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (22 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> FATAL: requested WAL segment 000000010000000000000032 has already
> been removed
> FATAL: requested WAL segment 000000010000000000000032 has already
> been removed
> FATAL: requested WAL segment 000000010000000000000032 has already
> been removed
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> FATAL: requested WAL segment 000000010000000000000032 has already
> been removed
>
>
>
> ////////////////////////
> standby server output:
> LOG: database system was interrupted while in recovery at log time
> 2012-03-30 14:44:31 HKT
> HINT: If this has occurred more than once some data might be
> corrupted and you might need to choose an earlier recovery target.
> LOG: entering standby mode
> LOG: redo starts at 0/16E4760
> LOG: consistent recovery state reached at 0/12D984D8
> LOG: database system is ready to accept read only connections
> LOG: record with zero length at 0/17DF1E68
> LOG: invalid magic number 0000 in log file 0, segment 50, offset
> 6946816
> LOG: streaming replication successfully connected to primary
> FATAL: could not receive data from WAL stream: FATAL: requested WAL
> segment 000000010000000000000032 has already been removed

Well,
that is not a bug. Just set archive_mode = on on the master server and also set wal_keep_segments = 1000, for example, to avoid that situation. I had the same situation, and after digging through search engines those were the recommended settings. I forgot the real reason why; maybe sending/receiving data between master and slave was too slow, but this fixed the problem.

Regards,
Condor
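
To get a feel for what the wal_keep_segments = 1000 suggestion above costs, here is a rough back-of-the-envelope sketch. It assumes the default 16 MB WAL segment size, which the thread does not state, and a purely hypothetical WAL generation rate of 12.5 MB/s (the client's 100 Mbps link fully used, with WAL volume equal to the data written); the function names are illustrative only.

```python
# Assumption: default 16 MB WAL segment size (not stated in the thread).
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments, segment_bytes=WAL_SEGMENT_BYTES):
    """Minimum disk space the master holds back for standbys."""
    return wal_keep_segments * segment_bytes


def seconds_of_headroom(wal_keep_segments, wal_bytes_per_second,
                        segment_bytes=WAL_SEGMENT_BYTES):
    """Rough time a standby can fall behind before needed WAL may be removed."""
    if wal_bytes_per_second <= 0:
        raise ValueError("wal_bytes_per_second must be positive")
    return retained_wal_bytes(wal_keep_segments, segment_bytes) / wal_bytes_per_second


if __name__ == "__main__":
    keep = 1000
    print(retained_wal_bytes(keep) / 2**30, "GiB retained")
    # Hypothetical rate: 100 Mbps of incoming data turned 1:1 into WAL.
    print(seconds_of_headroom(keep, 12_500_000) / 60, "minutes of headroom")
```

With these assumptions, 1000 segments amount to 16,777,216,000 bytes (about 15.6 GiB) of retained WAL. The real headroom depends on how much WAL the workload actually generates, so the rate should be measured rather than taken from this sketch.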

