Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block

Lists: pgsql-bugs
From: "Burgess, Freddie" <FBurgess(at)Radiantblue(dot)com>
To: PostgreSQL Bugs ‎[pgsql-bugs(at)postgresql(dot)org]‎ <pgsql-bugs(at)postgresql(dot)org>
Subject: Postgresql 9.3.4 Streaming Replication Standby invalid Page block
Date: 2014-07-01 23:03:54
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE02E66257B@NBSVR-MAIL01.radiantblue.local

PostgreSQL version: 9.3.4
Operating system: rhel 6.4 linux
Action: stream replication Master/Slave
Description:

These are the last entries in the PostgreSQL log file before the standby crashed; the primary seems unaffected.

LOG: restored log file "0000000100001127000000cc" from archive
FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
LOG: startup process (PID 27797) exited with exit code 1
LOG: terminating any other active server processes

We restarted the database and the restore of the archived log files has continued beyond this point, but is our standby server corrupted?
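
To inspect the page named in the FATAL line directly, it helps to know which segment file holds it and at what byte offset. The following is a minimal sketch, not part of the original report, assuming the default build settings of 8 kB blocks and 1 GB relation segments (131072 blocks per segment); `locate_block` is a hypothetical helper name.

```python
import re

# Assumed PostgreSQL build defaults; a custom build may differ.
BLCKSZ = 8192            # bytes per block
RELSEG_BLOCKS = 131072   # blocks per 1 GB segment file

FATAL_RE = re.compile(r"invalid page in block (\d+) of relation (\S+)")


def locate_block(log_line):
    """Return the segment file and byte offset of the block named in the log line."""
    match = FATAL_RE.search(log_line)
    if match is None:
        raise ValueError("not an 'invalid page' message")
    block = int(match.group(1))
    relpath = match.group(2)
    # Segment 0 has no suffix; later segments are named <relfilenode>.<n>.
    segment, block_in_segment = divmod(block, RELSEG_BLOCKS)
    seg_file = relpath if segment == 0 else "%s.%d" % (relpath, segment)
    return {"block": block, "file": seg_file, "offset": block_in_segment * BLCKSZ}


line = ("FATAL: invalid page in block 464698 of relation "
        "pg_tblspc/16435/PG_9.3_201306121/16444/125127698")
info = locate_block(line)
print(info["file"], info["offset"])
# prints: pg_tblspc/16435/PG_9.3_201306121/16444/125127698.3 585580544
```

The resulting path is relative to the data directory on the standby.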

thanks


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: "Burgess, Freddie" <FBurgess(at)Radiantblue(dot)com>, "PostgreSQL Bugs ‎[pgsql-bugs(at)postgresql(dot)org]‎" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block
Date: 2014-07-02 11:02:27
Message-ID: 53B3E6C3.40003@vmware.com

On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
> PostgreSQL version: 9.3.4
> Operating system: rhel 6.4 linux
> Action: stream replication Master/Slave
> Description:
>
> These are the last entries in the PostgreSQL log file before the standby crashed; the primary seems unaffected.
>
> LOG: restored log file "0000000100001127000000cc" from archive
> FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
> LOG: startup process (PID 27797) exited with exit code 1
> LOG: terminating any other active server processes
>
> We restarted the database and the restore of the archived log files has continued beyond this point, but is our standby server corrupted?

Sounds exactly like this bug:

http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg(at)mail(dot)gmail(dot)com

but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4
in the standby too?

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: "Burgess, Freddie" <FBurgess(at)Radiantblue(dot)com>, "PostgreSQL Bugs ‎[pgsql-bugs(at)postgresql(dot)org]‎" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block
Date: 2014-07-02 11:09:44
Message-ID: 20140702110944.GL21169@awork2.anarazel.de

On 2014-07-02 14:02:27 +0300, Heikki Linnakangas wrote:
> On 07/02/2014 02:03 AM, Burgess, Freddie wrote:
> > PostgreSQL version: 9.3.4
> > Operating system: rhel 6.4 linux
> > Action: stream replication Master/Slave
> > Description:
> >
> > These are the last entries in the PostgreSQL log file before the standby crashed; the primary seems unaffected.
> >
> > LOG: restored log file "0000000100001127000000cc" from archive
> > FATAL: invalid page in block 464698 of relation pg_tblspc/16435/PG_9.3_201306121/16444/125127698
> > CONTEXT: xlog redo vacuum: rel 16435/16444/125127698; blk 512019, lastBlockVacuumed 0
> > LOG: startup process (PID 27797) exited with exit code 1
> > LOG: terminating any other active server processes
> >
> > We restarted the database and the restore of the archived log files has continued beyond this point, but is our standby server corrupted?

Do you run with data checksums enabled?

> Sounds exactly like this bug:
>
> http://www.postgresql.org/message-id/flat/CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg(at)mail(dot)gmail(dot)com
>
> but that was fixed in 9.3.3 already. Are you sure you're running 9.3.4 in
> the standby too?

Hm - that bug was about uninitialized pages, not invalid ones. I don't
immediately see why it'd be legal to have an invalid page (as in
!PageIsVerified()) anywhere. At least not after reaching consistency.
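
For readers unfamiliar with the check referred to here, the following is a rough sketch of the header sanity test that PageIsVerified() applies, ignoring the data checksum part. It is not the server code: the 24-byte header layout, the flag mask 0x0007, the 8-byte alignment, and little-endian byte order are assumptions based on the 9.3-era bufpage.h on x86_64, and `page_header_looks_valid` is a hypothetical name.

```python
import struct

# Assumptions for a default 9.3 build on x86_64 (little-endian).
BLCKSZ = 8192
PD_VALID_FLAG_BITS = 0x0007
MAXIMUM_ALIGNOF = 8

# pd_lsn (two uint32), then pd_checksum, pd_flags, pd_lower, pd_upper,
# pd_special, pd_pagesize_version (uint16 each), then pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")


def page_header_looks_valid(page):
    """Approximate the non-checksum part of PageIsVerified() for one page."""
    if len(page) != BLCKSZ:
        raise ValueError("expected exactly one %d-byte page" % BLCKSZ)
    _, _, _, flags, lower, upper, special, _, _ = HEADER.unpack_from(page)
    if upper != 0:
        # Initialized page: the header offsets must be ordered and aligned.
        return ((flags & ~PD_VALID_FLAG_BITS) == 0
                and lower <= upper <= special <= BLCKSZ
                and special % MAXIMUM_ALIGNOF == 0)
    # pd_upper == 0 means a "new" page, accepted only if it is all zeroes.
    return page == bytes(BLCKSZ)


# Synthetic pages for illustration only, not data from this report.
empty_page = HEADER.pack(0, 0, 0, 0, 24, BLCKSZ, BLCKSZ, BLCKSZ | 4, 0)
empty_page += bytes(BLCKSZ - HEADER.size)
zero_page = bytes(BLCKSZ)
garbage_page = b"\xff" * BLCKSZ

print(page_header_looks_valid(empty_page))    # True
print(page_header_looks_valid(zero_page))     # True
print(page_header_looks_valid(garbage_page))  # False
```

A page that fails this test is what the server reports as "invalid page in block ...". An all-zero page passes it, which is why uninitialized and invalid pages are different cases.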

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Burgess, Freddie" <FBurgess(at)Radiantblue(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: "PostgreSQL Bugs ‎[pgsql-bugs(at)postgresql(dot)org]‎" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block
Date: 2014-07-02 20:04:27
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE02E662601@NBSVR-MAIL01.radiantblue.local

show data_checksums;
 data_checksums
----------------
 off

tabsdb=# select version();
                                                   version
--------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit

Both results are the same on the master and the standby.

The standby replayed all of the outstanding WAL overnight and has caught up with the primary; streaming replication is running fine now.

The relation "pg_tblspc/16435/PG_9.3_201306121/16444/125127698" lives in a partition tablespace holding data from 2007. I verified that the row counts match between the master and the standby for the tables that reside in that tablespace.

Is there anything else I can do to verify the consistency on the standby?

thanks



From: "Burgess, Freddie" <FBurgess(at)Radiantblue(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: "PostgreSQL Bugs ‎[pgsql-bugs(at)postgresql(dot)org]‎" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Postgresql 9.3.4 Streaming Replication Standby invalid Page block
Date: 2014-07-08 21:05:40
Message-ID: 3BBE635F64E28D4C899377A61DAA9FE034EDB06E@NBSVR-MAIL01.radiantblue.local

Today we have the same error in the logs, but now the standby server will not restart at all. This error refers to a static partition holding historical data from 2006, so the problem has to be related to autovacuum.

FATAL: invalid page in block 420538 of relation pg_tblspc/16434/PG_9.3_201306121/16444/125127662
CONTEXT: xlog redo vacuum: rel 16434/16444/125127662; blk 582590, lastBlockVacuumed 0
LOG: startup process (PID 14307) exited with exit code 1
LOG: terminating any other active server processes

Are there any solutions?

thanks
