Re: Switching timeline over streaming replication

From: Thom Brown <thom(at)linux(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Switching timeline over streaming replication
Date: 2012-12-20 23:50:46
Message-ID: CAA-aLv7UZhOtrymHpxWM1KF_XHJjfAJDsxygwn26cAkDXLFxHA@mail.gmail.com
Lists: pgsql-hackers

On 20 December 2012 12:45, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> On 17.12.2012 15:05, Thom Brown wrote:
>
>> I just set up 120 chained standbys, and for some reason I'm seeing these
>> errors:
>>
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 1
>> LOG: record with zero length at 0/301EC10
>> LOG: fetching timeline history file for timeline 2 from primary server
>> LOG: restarted WAL streaming at 0/3000000 on timeline 1
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 1
>> LOG: new target timeline is 2
>> LOG: restarted WAL streaming at 0/3000000 on timeline 2
>> LOG: replication terminated by primary server
>> DETAIL: End of WAL reached on timeline 2
>> FATAL: error reading result of streaming command: ERROR: requested WAL
>> segment 000000020000000000000003 has already been removed
>>
>> ERROR: requested WAL segment 000000020000000000000003 has already been
>> removed
>> LOG: started streaming WAL from primary at 0/3000000 on timeline 2
>> ERROR: requested WAL segment 000000020000000000000003 has already been
>> removed
>>
>
> I just committed a patch that should make the "requested WAL segment
> 000000020000000000000003 has already been removed" errors go away. The
> trick was for walsenders to not switch to the new timeline until at least
> one record has been replayed on it. That closes the window where the
> walsender already considers the new timeline to be the latest, but the WAL
> file has not been created yet.
>
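
To make the window described above concrete, here is a toy model of the decision, not the code in walsender.c. The `StandbyState` class, its field names, and the `gate_on_replay` flag are all made up for illustration; the only thing taken from the description is the rule "don't treat the new timeline as the latest until at least one record has been replayed on it".

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical snapshot of a cascading standby (names are illustrative)."""
    target_tli: int         # timeline that recovery has decided to follow
    last_replayed_tli: int  # timeline of the last WAL record actually replayed


def timeline_to_serve(state: StandbyState, gate_on_replay: bool) -> int:
    """Return the timeline a walsender on this standby would treat as latest.

    Without the gate, the walsender follows the recovery target right away,
    even though no WAL file may exist for that timeline yet. With the gate,
    it stays on the timeline of the last replayed record.
    """
    if gate_on_replay:
        return state.last_replayed_tli
    return state.target_tli


# Right after learning about timeline 2, before any record on it is replayed.
just_switched = StandbyState(target_tli=2, last_replayed_tli=1)
# After the first record on timeline 2 has been replayed.
caught_up = StandbyState(target_tli=2, last_replayed_tli=2)

print(timeline_to_serve(just_switched, gate_on_replay=False))
print(timeline_to_serve(just_switched, gate_on_replay=True))
print(timeline_to_serve(caught_up, gate_on_replay=True))
```

In this model the ungated walsender would already advertise timeline 2 in the `just_switched` state, which is the situation in which a downstream standby can ask for a timeline 2 segment that has not been created yet.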

Now I'm getting this on all standbys after promoting the first standby in a
chain.

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 1
LOG: record with zero length at 0/301EC10
LOG: fetching timeline history file for timeline 2 from primary server
LOG: restarted WAL streaming at 0/3000000 on timeline 1
FATAL: could not receive data from WAL stream:
LOG: new target timeline is 2
FATAL: could not connect to the primary server: FATAL: the database
system is in recovery mode

LOG: started streaming WAL from primary at 0/3000000 on timeline 2
TRAP: FailedAssertion("!(((sentPtr) <= (SendRqstPtr)))", File:
"walsender.c", Line: 1425)
LOG: server process (PID 19917) was terminated by signal 6: Aborted
LOG: terminating any other active server processes
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted while in recovery at log time
2012-12-20 23:41:23 GMT
HINT: If this has occurred more than once some data might be corrupted and
you might need to choose an earlier recovery target.
LOG: entering standby mode
FATAL: the database system is in recovery mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/20000E8
LOG: database system is ready to accept read only connections
LOG: record with zero length at 0/301EC70
LOG: started streaming WAL from primary at 0/3000000 on timeline 2
LOG: unexpected EOF on standby connection

And if I restart the new primary, the first new standby connected to it
shows:

LOG: replication terminated by primary server
DETAIL: End of WAL reached on timeline 2
FATAL: error reading result of streaming command: server closed the
connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

LOG: record with zero length at 0/301F1E0

However, none of the other standbys show any additional log output.
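
For anyone matching the streaming positions in these logs against WAL file names, the sketch below maps a timeline and an LSN such as 0/3000000 to a 24-character segment name. It assumes the default 16 MB segment size and the usual layout of eight hex digits each for the timeline, the high part, and the segment within it; `parse_lsn` and `wal_file_name` are helper names invented here, not PostgreSQL functions.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'hi/lo' (both hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(tli: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Build the WAL segment file name that contains the given LSN."""
    segno = lsn // seg_size                   # absolute segment number
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (
        tli,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


start = parse_lsn("0/3000000")
print(wal_file_name(1, start))
print(wal_file_name(2, start))
```

Under those assumptions, 0/3000000 on timeline 2 falls in segment 000000020000000000000003, and the same position on timeline 1 falls in 000000010000000000000003.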

--
Thom
