Re: Streaming replication - unable to stop the standby

From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication - unable to stop the standby
Date: 2010-05-03 18:22:16
Message-ID: 4BDF1458.1040807@kaltenbrunner.cc
Lists: pgsql-hackers

Tom Lane wrote:
> Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
>> I'm currently testing SR/HS in 9.0beta1 and I noticed that it seems
>> quite easy to end up in a situation where you have a standby that seems
>> to be stuck in:
>
>> $ psql -p 5433
>> psql: FATAL: the database system is shutting down
>
>> but not actually shutting down, ever. I ran into that a few times now
>> (mostly because I'm trying to chase a recovery issue I hit during
>> earlier testing) by simply having the master alternate between a pgbench
>> run and "idle" while simply doing pg_ctl restart in a loop on the standby.
>> I do vaguely recall some discussions of that but I thought the issue got
>> settled somehow?
>
> Hm, I haven't pushed this hard but "pg_ctl stop" seems to stop the
> standby for me. Which subprocesses of the slave postmaster are still
> around? Could you attach to them with gdb and get stack traces?

It is not always failing to shut down - it only fails sometimes. I have
not yet pinpointed exactly what is causing this, but the standby is in
a weird state now:

* the master is currently idle
* the standby has no connections at all

logs from the standby:

FATAL: the database system is shutting down
FATAL: the database system is shutting down
FATAL: replication terminated by primary server
LOG: restored log file "000000010000001900000054" from archive
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG: record with zero length at 19/55000078
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
FATAL: could not connect to the primary server: could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?
could not connect to server: Connection refused
Is the server running on host "localhost" and accepting
TCP/IP connections on port 5432?

cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
cp: cannot stat `/mnt/space/wal-archive/000000010000001900000055': No such file or directory
LOG: streaming replication successfully connected to primary
FATAL: the database system is shutting down

The first two "FATAL: the database system is shutting down" lines are from me
trying to connect using psql after I noticed that pg_ctl had failed to
shut down the slave.
The next thing I tried was restarting the master, which led to the rest
of the log output: the standby noticed the restart and reconnected, but you
still cannot actually connect to it...

process tree for the standby is:

29523 pts/2 S 0:00 /home/postgres9/pginst/bin/postgres -D /mnt/space/pgdata_standby
29524 ? Ss 0:06 \_ postgres: startup process waiting for 000000010000001900000055
29529 ? Ss 0:00 \_ postgres: writer process
29835 ? Ss 0:00 \_ postgres: wal receiver process streaming 19/55000078
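
For anyone who wants to collect the requested stack traces from a standby in this state, here is a rough sketch of how the child PIDs could be pulled out of a process listing like the one above and turned into non-interactive gdb invocations. The helper names (parse_children, gdb_commands) are made up for illustration, and the parsing assumes the exact ps column layout shown in this mail (PID, TTY, STAT, TIME, command, with children marked by "\_"); other ps formats would need a different pattern.

```python
import re

# Sample listing copied from the process tree in this mail (wrapped lines joined).
PS_OUTPUT = """\
29523 pts/2 S 0:00 /home/postgres9/pginst/bin/postgres -D /mnt/space/pgdata_standby
29524 ? Ss 0:06 \\_ postgres: startup process waiting for 000000010000001900000055
29529 ? Ss 0:00 \\_ postgres: writer process
29835 ? Ss 0:00 \\_ postgres: wal receiver process streaming 19/55000078
"""

# Assumed layout: PID, TTY, STAT, TIME, then the command; child lines carry "\_".
LINE_RE = re.compile(r"^\s*(\d+)\s+\S+\s+\S+\s+\S+\s+(\\_\s+)?(.*)$")


def parse_children(ps_text):
    """Return a list of (pid, process title) for each child line in the listing."""
    children = []
    for line in ps_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(2):  # only lines marked as children
            children.append((int(match.group(1)), match.group(3).strip()))
    return children


def gdb_commands(children):
    """Build one batch-mode gdb command per child that prints a backtrace."""
    return [["gdb", "-p", str(pid), "-batch", "-ex", "bt"] for pid, _ in children]


if __name__ == "__main__":
    for (pid, title), cmd in zip(parse_children(PS_OUTPUT), gdb_commands(parse_children(PS_OUTPUT))):
        # Printing only; the commands could be passed to subprocess.run() instead.
        print(title)
        print("  " + " ".join(cmd))
```

The commands are only printed here; running them needs gdb installed and permission to attach to the postgres processes.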

Stefan