Streaming replication on 9.1-beta2 after pg_restore is very slow

From: David Hartveld <David(dot)Hartveld(at)mendix(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Streaming replication on 9.1-beta2 after pg_restore is very slow
Date: 2011-07-06 15:54:17
Message-ID: 0317654684C3CF48B06D8FF5AE5D2EE0CCE0@Win-Exchange-02.MENDIXDOMAIN.local
Lists: pgsql-general

Hi all,

I am experimenting with synchronous streaming replication on PostgreSQL 9.1 beta 2 and am running into performance problems. I initially set up an asynchronous streaming replication master cluster on PostgreSQL 9.0, which streamed to a single slave cluster. This seemed to work quite well. I then copied most of that configuration to a 9.1 beta 2 cluster (master and slave) to see how synchronous replication would behave.

The master cluster, when empty after an initdb (pg_createcluster on Debian), seems to stream changes properly to one or more correctly configured slave clusters. I monitor progress with pg_current_xlog_location() on the master and with pg_last_xlog_receive_location() and pg_last_xlog_replay_location() on the slaves. The slaves pick up small changes, such as creating a database, adding a role, or updating a role password. But when I then run pg_restore on the master, the slaves quickly fall behind and catch up only very slowly (after roughly an hour).
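
To put a number on "falling behind", the locations returned by those functions can be turned into a byte lag. The following is a minimal Python sketch, not part of PostgreSQL: the helper names are hypothetical, and it assumes the pre-9.3 location format "xlogid/offset" in hex, where each xlogid is taken to span 0xFF000000 bytes (255 segments of 16 MB).

```python
# Sketch: byte lag between two xlog locations of the form "X/Y" (hex).
# Assumption: pre-9.3 layout, where each xlogid covers 255 segments
# of 16 MB, i.e. 0xFF000000 bytes. Helper names are hypothetical.
BYTES_PER_XLOGID = 0xFF000000


def xlog_to_bytes(location):
    """Convert an 'xlogid/offset' string into an absolute byte position."""
    xlogid, xrecoff = location.split("/")
    return int(xlogid, 16) * BYTES_PER_XLOGID + int(xrecoff, 16)


def lag_bytes(master_location, standby_location):
    """Bytes of WAL the standby still has to receive or replay."""
    return xlog_to_bytes(master_location) - xlog_to_bytes(standby_location)
```

Feeding it the result of pg_current_xlog_location() from the master and pg_last_xlog_receive_location() (or the replay location) from a slave gives the receive (or replay) lag in bytes.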

The log on the slave is filled with statements similar to the following:
LOG: streaming replication successfully connected to primary
LOG: record with zero length at 0/9B7A010
FATAL: terminating walreceiver process due to administrator command

The slave log file also contains the following line a number of times (with the numbers of course a bit different every time):
LOG: invalid magic number 0000 in log file 0, segment 9, offset 10878976
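
To compare this message with the "record with zero length" position, the log file, segment, and offset can be converted into the same "X/Y" notation. This is a small Python sketch with a hypothetical helper name; it assumes the default WAL segment size of 16 MB, which may differ on a custom build.

```python
# Sketch: map "log file N, segment S, offset O" to an "X/Y" xlog location.
# Assumption: default WAL segment size of 16 MB.
XLOG_SEG_SIZE = 16 * 1024 * 1024


def segment_offset_to_location(log_file, segment, offset):
    """Return the location as 'xlogid/offset', both parts in hex."""
    return "%X/%X" % (log_file, segment * XLOG_SEG_SIZE + offset)


# The values from the log line above.
print(segment_offset_to_location(0, 9, 10878976))  # prints 0/9A60000
```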

The log on the master contains several lines with:
LOG: could not send data to client: Connection reset by peer

Is there a known issue with the beta, or do I have to configure my cluster differently for 9.1? I'm a bit lost and would appreciate any comments. I've added the server configurations below. I'm running PostgreSQL from the Debian packages in the experimental suite.

Thanks,
David Hartveld
--
== Master configuration ==
"version";"PostgreSQL 9.1beta2 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Debian 4.6.0-12) 4.6.1 20110608 (prerelease), 64-bit"
"archive_command";"cp %p /walshipping/9.1/sr-master/%f"
"archive_mode";"on"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"external_pid_file";"/var/run/postgresql/9.1-sr-master.pid"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_line_prefix";"%t "
"max_connections";"100"
"max_stack_depth";"2MB"
"max_wal_senders";"3"
"port";"5434"
"server_encoding";"UTF8"
"shared_buffers";"96MB"
"ssl";"on"
"synchronous_standby_names";"*"
"TimeZone";"localtime"
"unix_socket_directory";"/var/run/postgresql"
"wal_buffers";"3MB"
"wal_keep_segments";"32"
"wal_level";"hot_standby"

== Slave configuration ==
"version";"PostgreSQL 9.1beta2 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Debian 4.6.0-12) 4.6.1 20110608 (prerelease), 64-bit"
"bytea_output";"escape"
"client_encoding";"UNICODE"
"external_pid_file";"/var/run/postgresql/9.1-sr-slave0.pid"
"hot_standby";"on"
"lc_collate";"en_US.UTF-8"
"lc_ctype";"en_US.UTF-8"
"listen_addresses";"*"
"log_line_prefix";"%t "
"max_connections";"100"
"max_stack_depth";"2MB"
"port";"5434"
"server_encoding";"UTF8"
"shared_buffers";"96MB"
"ssl";"on"
"TimeZone";"localtime"
"unix_socket_directory";"/var/run/postgresql"
"wal_buffers";"3MB"
