Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages

Lists: pgsql-bugspgsql-hackers
From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-19 22:37:04
Message-ID: CAL_0b1s4QCkFy_55kk_8XWcJPs7wsgVWf8vn4=jXe6V4R7Hxmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

Linux 2.6.32
PostgreSQL 9.2.6

It was suffering from this problem on 9.2.4, mostly last couple of
weeks when I had to rebuild the replica almost every 3 days, and I
hoped it would be fixed in 9.2.6, but it is not.

Yesterday I upgraded both master and replica servers to 9.2.6,
performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild
the replica from scratch.

Below are replica logs, pg_controldata output, configuration and the
backtrace of the core dump. I can keep the replica untouched (without
rebuilding it) for several more hours, so please let me know if you
need me to show you something else.

Replica logs:

2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
page 14833 of relation base/16436/3321003988 is uninitialized
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
contains references to invalid pages
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup
process (PID 19938) was terminated by signal 6: Aborted
2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG:
terminating any other active server processes

pg_controldata output:

pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 5858109675396804534
Database cluster state: in archive recovery
pg_control last modified: Thu 19 Dec 2013 08:37:31 PM MSK
Latest checkpoint location: 3271/79F16848
Prior checkpoint location: 3270/77E32598
Latest checkpoint's REDO location: 3270/929671E0
Latest checkpoint's TimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 4/1572485244
Latest checkpoint's NextOID: 3349674903
Latest checkpoint's NextMultiXactId: 6730476
Latest checkpoint's NextMultiOffset: 30866630
Latest checkpoint's oldestXID: 1340099111
Latest checkpoint's oldestXID's DB: 16436
Latest checkpoint's oldestActiveXID: 1572038146
Time of latest checkpoint: Thu 19 Dec 2013 07:00:09 PM MSK
Minimum recovery ending location: 3272/666EE368
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 550
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value

Non default configuration:

application_name = psql
archive_command = (disabled)
autovacuum_analyze_scale_factor = 0.05
autovacuum_max_workers = 5
autovacuum_naptime = 5
autovacuum_vacuum_cost_delay = 5
autovacuum_vacuum_scale_factor = 0.05
bgwriter_delay = 10
bgwriter_lru_multiplier = 10
checkpoint_completion_target = 0.9
checkpoint_segments = 1024
checkpoint_timeout = 3600
checkpoint_warning = 300
client_encoding = UTF8
DateStyle = ISO, DMY
default_text_search_config = pg_catalog.russian
effective_cache_size = 17432576
effective_io_concurrency = 32
hot_standby = on
hot_standby_feedback = on
lc_collate = ru_RU.UTF-8
lc_ctype = ru_RU.UTF-8
lc_messages = en_US.UTF-8
lc_monetary = ru_RU.UTF-8
lc_numeric = ru_RU.UTF-8
lc_time = ru_RU.UTF-8
listen_addresses = *
log_checkpoints = on
log_directory = /var/log/pgsql
log_file_mode = 0600
log_filename = postgresql-%Y-%m-%d.log
log_line_prefix = %t %p %u(at)%d from %h [vxid:%v txid:%x] [%i]
log_lock_waits = on
log_rotation_size = 0
log_statement = ddl
log_timezone = W-SU
log_truncate_on_rotation = on
logging_collector = on
maintenance_work_mem = 1048576
max_connections = 550
max_stack_depth = 2048
max_standby_streaming_delay = 300000
max_wal_senders = 3
pg_stat_statements.track = all
port = 6432
server_encoding = UTF8
shared_buffers = 4587520
shared_preload_libraries = pg_stat_statements
synchronous_commit = off
TimeZone = W-SU
track_activity_query_size = 4096
transaction_isolation = read committed
unix_socket_permissions = 0777
wal_buffers = 2048
wal_keep_segments = 2048
wal_level = hot_standby
work_mem = 262144

Backtrace of the coredump:

Core was generated by `postgres: startup process recovering
000000020000327200000066 '.
Program terminated with signal 6, Aborted.
#0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install
postgresql92-server-9.2.6-1PGDG.el6.x86_64
(gdb) set print pretty on
(gdb) bt f
#0 0x00007f6a5a8818a5 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00007f6a5a883085 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x000000000071c580 in errfinish (dummy=<value optimized out>) at elog.c:534
edata = 0xb88fc0
elevel = 22
oldcontext = 0x134a740
econtext = 0x0
__func__ = "errfinish"
#3 0x000000000071d09c in elog_finish (elevel=<value optimized out>,
fmt=<value optimized out>) at elog.c:1213
edata = 0xb88fc0
oldcontext = 0x134a740
__func__ = "elog_finish"
#4 0x00000000004a6d6d in log_invalid_page (node=...,
forkno=MAIN_FORKNUM, blkno=14833, present=1 '\001') at xlogutils.c:91
key = {
node = {
spcNode = 1663,
dbNode = 16436,
relNode = 3321003988
},
forkno = 25867822,
blkno = 0
}
hentry = <value optimized out>
found = <value optimized out>
__func__ = "log_invalid_page"
#5 0x00000000004a6ea0 in XLogReadBufferExtended (rnode=...,
forknum=MAIN_FORKNUM, blkno=14833, mode=RBM_NORMAL) at xlogutils.c:358
page = <value optimized out>
lastblock = 38539
buffer = <value optimized out>
smgr = <value optimized out>
#6 0x00000000004897a2 in btree_xlog_vacuum (lsn=<value optimized
out>, record=0x7f6a4fb20010) at nbtxlog.c:510
blkno = 14833
xlrec = 0x7f6a4fb20030
buffer = <value optimized out>
page = <value optimized out>
opaque = <value optimized out>
#7 btree_redo (lsn=<value optimized out>, record=0x7f6a4fb20010) at
nbtxlog.c:1062
info = <value optimized out>
__func__ = "btree_redo"
#8 0x00000000004a3468 in StartupXLOG () at xlog.c:6931
recoveryContinue = 1 '\001'
recoveryApply = <value optimized out>
errcontext = {
previous = 0x0,
callback = 0x49d5a0 <rm_redo_error_callback>,
arg = 0x7f6a4fb20010
}
xtime = <value optimized out>
rmid = <value optimized out>
xlogctl = 0x7f6a6de08480
Insert = <value optimized out>
checkPoint = {
redo = {
xlogid = 12896,
xrecoff = 4261478784
},
ThisTimeLineID = 2,
fullPageWrites = 1 '\001',
nextXidEpoch = 4,
nextXid = 1452297001,
nextOid = 3344849815,
nextMulti = 6695025,
nextMultiOffset = 30699263,
oldestXid = 1256929431,
oldestXidDB = 12865,
time = 1387392623,
oldestActiveXid = 1452282892
}
wasShutdown = <value optimized out>
reachedStopPoint = 0 '\000'
haveBackupLabel = <value optimized out>
RecPtr = <value optimized out>
checkPointLoc = {
xlogid = 12896,
xrecoff = 4271100680
}
EndOfLog = <value optimized out>
endLogId = <value optimized out>
endLogSeg = <value optimized out>
record = 0x7f6a4fb20010
freespace = <value optimized out>
oldestActiveXID = <value optimized out>
backupEndRequired = <value optimized out>
backupFromStandby = <value optimized out>
dbstate_at_startup = <value optimized out>
__func__ = "StartupXLOG"
#9 0x00000000006153b8 in StartupProcessMain () at startup.c:220
__func__ = "StartupProcessMain"
#10 0x00000000004c3837 in AuxiliaryProcessMain (argc=2,
argv=0x7fffa02b0100) at bootstrap.c:418
progname = 0x854a1a "postgres"
flag = <value optimized out>
userDoption = 0x0
__func__ = "AuxiliaryProcessMain"
#11 0x00000000006105f3 in StartChildProcess (type=StartupProcess) at
postmaster.c:4522
pid = <value optimized out>
av = {0x854a1a "postgres", 0x7fffa02b0150 "-x2", 0x0,
0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H\003C\b1\322\061\311H\213\063H=\377ɚ;v\033\017\037\204",
0x7fffa02b0170 "\300\226\064\001", 0x0, 0x7fffa02b0150 "-x2",
0x7fffa02f0814
"H\213\025e\376\377\377H+B(\213J8H#B0H\017\257\301\213J<H\323\370D;\"u\245H\003C\b1\322\061\311H\213\063H=\377ɚ;v\033\017\037\204",
0xb41080 ">-\262R",
0x0}
ac = 2
typebuf =
"-x2\000\377\177\000\000\274\b/\240\377\177\000\000\005\000\000\000\000\000\000\000\005\000\000\000\000\000\000"
__func__ = "StartChildProcess"
#12 0x0000000000614c3c in PostmasterMain (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1123
opt = <value optimized out>
status = <value optimized out>
userDoption = <value optimized out>
listen_addr_saved = 1 '\001'
i = <value optimized out>
__func__ = "PostmasterMain"
#13 0x00000000005b37e0 in main (argc=5, argv=0x13496a0) at main.c:199
No locals.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-19 23:04:07
Message-ID: 20131219230407.GC11483@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi Sergey, Heikki,

On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
> It was suffering from this problem on 9.2.4, mostly last couple of
> weeks when I had to rebuild the replica almost every 3 days, and I
> hoped it would be fixed in 9.2.6, but it is not.

This actually is a separate bug from the one fixed. Not that the CONTEXT
message isn't talking about "visible" but about "vacuum".

> Yesterday I upgraded both master and replica servers to 9.2.6,
> performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild
> the replica from scratch.

> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
> page 14833 of relation base/16436/3321003988 is uninitialized
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
> contains references to invalid pages
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0

Heikki, that's the redo routine for XLOG_BTREE_VACUUM, you probably know
that code best atm, and it might influence on how the new split and
deletion code works...

Greetings,

Andres


From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-19 23:13:25
Message-ID: CAL_0b1tL-mciZZ8GyZj4HVbny5fPO0ZN8XkJ+RTyJb9oisyBYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>> It was suffering from this problem on 9.2.4, mostly last couple of
>> weeks when I had to rebuild the replica almost every 3 days, and I
>> hoped it would be fixed in 9.2.6, but it is not.
>
> This actually is a separate bug from the one fixed. Not that the CONTEXT
> message isn't talking about "visible" but about "vacuum".

Oh, I see. I have completely lost in these bugs last several months.

Are there anything I can do to prevent it except waiting for a patch?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-20 19:34:30
Message-ID: 52B49BC6.3030406@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 12/20/2013 01:13 AM, Sergey Konoplev wrote:
> On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>>> It was suffering from this problem on 9.2.4, mostly last couple of
>>> weeks when I had to rebuild the replica almost every 3 days, and I
>>> hoped it would be fixed in 9.2.6, but it is not.
>>
>> This actually is a separate bug from the one fixed. Not that the CONTEXT
>> message isn't talking about "visible" but about "vacuum".
>
> Oh, I see. I have completely lost in these bugs last several months.
>
> Are there anything I can do to prevent it except waiting for a patch?

I wonder if this might be the same bug Tomonari Katsumata reported
yesterday:
http://www.postgresql.org/message-id/E1VtTni-00082E-Jv@wrigleys.postgresql.org.
Since this keeps happening to you, could you try the patch he posted and
see if it helps?

- Heikki


From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-20 20:01:55
Message-ID: CAL_0b1vhCTWM2Le2+u4ycdCfTscuicdU=7HMss_cuokunQxh2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Dec 20, 2013 at 11:34 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 12/20/2013 01:13 AM, Sergey Konoplev wrote:
>> On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
>> wrote:
>>> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>>>>
>>>> It was suffering from this problem on 9.2.4, mostly last couple of
>>>> weeks when I had to rebuild the replica almost every 3 days, and I
>>>> hoped it would be fixed in 9.2.6, but it is not.
>>>
>>>
>>> This actually is a separate bug from the one fixed. Not that the CONTEXT
>>> message isn't talking about "visible" but about "vacuum".
>>
>>
>> Oh, I see. I have completely lost in these bugs last several months.
>>
>> Are there anything I can do to prevent it except waiting for a patch?
>
> I wonder if this might be the same bug Tomonari Katsumata reported
> yesterday:
> http://www.postgresql.org/message-id/E1VtTni-00082E-Jv@wrigleys.postgresql.org.
> Since this keeps happening to you, could you try the patch he posted and see
> if it helps?

For me it looks like something completely different from the
Tomonari's bug. I does not have problems with restarting and it always
has something to do with vacuum. The error message is the same every
time it breaks down:

2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
page 14833 of relation base/16436/3321003988 is uninitialized
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
contains references to invalid pages
2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
lastBlockVacuumed 0

Also, the issue is reproducible on the production site of a big
enterprise company and it is always a long lasting pain to negotiate
and deploy even a little patch. So I would like to ask you to review
this issue more carefully before I start the negotiation process,
please.

I saved the buggy cluster directory and the core dump in case if you
will need it.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-20 20:09:31
Message-ID: 20131220200931.GA31909@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2013-12-20 12:01:55 -0800, Sergey Konoplev wrote:
> For me it looks like something completely different from the
> Tomonari's bug. I does not have problems with restarting and it always
> has something to do with vacuum. The error message is the same every
> time it breaks down:

Well, you didn't mention that it wasn't related to restarts before. You
said that you just had re-cloned the standby, so assuming a restart
isn't unreasonable.

> Also, the issue is reproducible on the production site of a big
> enterprise company and it is always a long lasting pain to negotiate
> and deploy even a little patch. So I would like to ask you to review
> this issue more carefully before I start the negotiation process,
> please.

I have to say, attitude-wise, I find that statement a bit
problematic. You're working with the community here, not somebody you're
paying.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Tomonari Katsumata <katsumata(dot)tomonari(at)po(dot)ntts(dot)co(dot)jp>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-20 23:07:37
Message-ID: CAL_0b1tVmywdzJn=yA2dBY0sEF--rcrBAfZGCco3yFQRQ9wLLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Dec 20, 2013 at 12:09 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-20 12:01:55 -0800, Sergey Konoplev wrote:
>> For me it looks like something completely different from the
>> Tomonari's bug. I does not have problems with restarting and it always
>> has something to do with vacuum. The error message is the same every
>> time it breaks down:
>
> Well, you didn't mention that it wasn't related to restarts before. You
> said that you just had re-cloned the standby, so assuming a restart
> isn't unreasonable.

I tried to reproduce the Tomonari's bug after Heikki pointed me on it,
and nothing like described in those thread has happened.

>> Also, the issue is reproducible on the production site of a big
>> enterprise company and it is always a long lasting pain to negotiate
>> and deploy even a little patch. So I would like to ask you to review
>> this issue more carefully before I start the negotiation process,
>> please.
>
> I have to say, attitude-wise, I find that statement a bit
> problematic. You're working with the community here, not somebody you're
> paying.

Sure, I realize it. I am from the open-source world a little bit too.
Just wanted to make sure that everything that could be done is done.
Sorry if I said something wrong.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-21 09:04:00
Message-ID: CAL_0b1vxdJePwUVxdkfoeChAHFRDqTa=zOM-XM3yZvkYNhA_8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Dec 19, 2013 at 3:04 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>> It was suffering from this problem on 9.2.4, mostly last couple of
>> weeks when I had to rebuild the replica almost every 3 days, and I
>> hoped it would be fixed in 9.2.6, but it is not.
>
> This actually is a separate bug from the one fixed. Not that the CONTEXT
> message isn't talking about "visible" but about "vacuum".
>
>> Yesterday I upgraded both master and replica servers to 9.2.6,
>> performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild
>> the replica from scratch.
>
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
>> page 14833 of relation base/16436/3321003988 is uninitialized
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>
> Heikki, that's the redo routine for XLOG_BTREE_VACUUM, you probably know
> that code best atm, and it might influence on how the new split and
> deletion code works...

I just wonder if it might be wal_level = hot_standby problem only.
What if I switch it to just archive? Wont this problem gone?

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>
To: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, Мельковский Владимир Ярославович <Melkovskiy(at)gw(dot)tander(dot)ru>, Толстенко Илья <tolstenko_iv(at)gw(dot)tander(dot)ru>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2013-12-22 00:49:23
Message-ID: CAL_0b1tXw6cwFQv7LaU743sA+KjGvHKnOhjhfvLn-3YGLJ1x1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Thu, Dec 19, 2013 at 2:37 PM, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com> wrote:
> Yesterday I upgraded both master and replica servers to 9.2.6,
> performed SET vacuum_freeze_table_age = 0; VACUUM; and then rebuild
> the replica from scratch.
>
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
> page 14833 of relation base/16436/3321003988 is uninitialized
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
> contains references to invalid pages
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0

FYI

Today we noticed that the PANIC happens on the same relation with the
same block and page. This is an index, created on Dec 10 with its
daily partition that was before I upgraded Postgres from 9.2.4 to
9.2.6. So probably it was a kind of aftermath of the pre-9.2.6
problems.

I have done reindex of this particular index on production, but I
still have the buggy copy of the cluster in case if someone would like
to dig the problem deeper.

Will keep you informed.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray(dot)ru(at)gmail(dot)com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, matioli(dot)matheus(at)gmail(dot)com
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-06 13:48:15
Message-ID: 20140106134815.GE28320@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi,

On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
> page 14833 of relation base/16436/3321003988 is uninitialized
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
> contains references to invalid pages
> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
> lastBlockVacuumed 0
> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup
> process (PID 19938) was terminated by signal 6: Aborted
> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG:
> terminating any other active server processes

There just was another case of this reported on IRC by MatheusOl and for
some reason in his case I noticed the pertinent details and it quickly
clicked:
* page 14833 is the one with the error
* we're actually vacuuming page 38538
* lastBlockVacuumed is 0

In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
and the page vacuumed and acquire a cleanup lock on it. But there isn't
any guarantee that the intermediate pages are valid, filled pages,
afaics. ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

Alternatively we could devise a trivial routine that just does a
BufTableLookup() and the required hangups around that. But that seems
more invasive than necessary.

Neither makes me really happy, but I don't see a nicer solution right
now.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-06 14:35:42
Message-ID: 52CABF3E.2050004@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 01/06/2014 03:48 PM, Andres Freund wrote:
> Hi,
>
> On 2013-12-19 14:37:04 -0800, Sergey Konoplev wrote:
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] WARNING:
>> page 14833 of relation base/16436/3321003988 is uninitialized
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] PANIC: WAL
>> contains references to invalid pages
>> 2013-12-19 20:51:22 MSK 19938 @ from [vxid:1/0 txid:0] [] CONTEXT:
>> xlog redo vacuum: rel 1663/16436/3321003988; blk 38538,
>> lastBlockVacuumed 0
>> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG: startup
>> process (PID 19938) was terminated by signal 6: Aborted
>> 2013-12-19 20:51:22 MSK 19935 @ from [vxid: txid:0] [] LOG:
>> terminating any other active server processes
>
> There just was another case of this reported on IRC by MatheusOl and for
> some reason in his case I noticed the pertinent details and it quickly
> clicked:
> * page 14833 is the one with the error
> * we're actually vacuuming page 38538
> * lastBlockVacuumed is 0
>
> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
> and the page vacuumed and acquire a cleanup lock on it. But there isn't
> any guarantee that the intermediate pages are valid, filled pages,
> afaics.

Hmm. So the problem arises if there's an uninitialized page in the
middle of the b-tree relation for some reason. It's unusual for an
uninitialized page to be left in the middle of the relation, but it's
certainly possible, if e.g you crash just after extending the relation.
In a heap, vacuum will initialize such pages and emit a WARNING like
"page %u is uninitialized --- fixing", but we don't do that for b-tree.

> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

That'd be horrendously dangerous. It would silently zap any page with
any error on it. But we could add a new ReadBufferMode that returns
InvalidBuffer on error, without zeroing the page.

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-06 14:45:32
Message-ID: 20140106144532.GH28320@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-06 16:35:42 +0200, Heikki Linnakangas wrote:
> >In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
> >and the page vacuumed and acquire a cleanup lock on it. But there isn't
> >any guarantee that the intermediate pages are valid, filled pages,
> >afaics.
>
> Hmm. So the problem arises if there's an uninitialized page in the middle of
> the b-tree relation for some reason. It's unusual for an uninitialized page
> to be left in the middle of the relation, but it's certainly possible, if
> e.g you crash just after extending the relation. In a heap, vacuum will
> initialize such pages and emit a WARNING like "page %u is uninitialized ---
> fixing", but we don't do that for b-tree.

Yes. There's probably some resource exhaustion scenarios that can cause
this as well. Triggering *_timeout or out-of-memory errors at least...

> >ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.
>
> That'd be horrendously dangerous. It would silently zap any page with any
> error on it.

Well, the number of scenarios in which you can hit that is somewhat
limited - it's not going to be ENOSPC since the file is guaranteed to be
longer, and we don't create holes - so the scenario pretty much is disk
reads not returning (pretty rare) and checksums failing.

> But we could add a new ReadBufferMode that returns
> InvalidBuffer on error, without zeroing the page.

But yes, that's much better.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 20:19:06
Message-ID: 17450.1389644346@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> On 01/06/2014 03:48 PM, Andres Freund wrote:
>> There just was another case of this reported on IRC by MatheusOl and for
>> some reason in his case I noticed the pertinent details and it quickly
>> clicked:
>> * page 14833 is the one with the error
>> * we're actually vacuuming page 38538
>> * lastBlockVacuumed is 0
>>
>> In btree_xlog_vacuum() we scan all the pages between lastBlockVacuumed
>> and the page vacuumed and acquire a cleanup lock on it. But there isn't
>> any guarantee that the intermediate pages are valid, filled pages,
>> afaics.

> Hmm. So the problem arises if there's an uninitialized page in the
> middle of the b-tree relation for some reason. It's unusual for an
> uninitialized page to be left in the middle of the relation, but it's
> certainly possible, if e.g you crash just after extending the relation.

Right. This diagnosis is incomplete in itself, because if the slave has a
zeroed page there, shouldn't the master have one too? If the master does
have a zeroed page there, how come vacuum didn't fail on the master? The
answer is that btvacuumpage will skip over all-zero pages without doing
anything more than noting them as free in FSM. When btree_xlog_vacuum
rescans the relation, it will also skip over all-zero pages without doing
anything --- but XLogReadBufferExtended logs such a page as invalid, and
then bitches later when it doesn't see the page dropped or truncated away.

>> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.

> That'd be horrendously dangerous. It would silently zap any page with
> any error on it. But we could add a new ReadBufferMode that returns
> InvalidBuffer on error, without zeroing the page.

The important point is not just that it not damage the page, but that
it not log it as invalid. I concur that the right fix requires a
new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
I think the spec for this should be that if the page doesn't exist or
contains zeroes, we return InvalidBuffer without logging the page number
as invalid. The doesn't-exist case is justified by the expectation that
there will be a later RBM_NORMAL call for a larger page number, which will
result in a suitable complaint if the page range isn't there.

Will go fix this if there's not any objection to that plan.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 20:26:00
Message-ID: 20140113202600.GB14861@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >> ISTM we can just use RBM_ZERO_ON_ERROR instead of RBM_NORMAL.
>
> > That'd be horrendously dangerous. It would silently zap any page with
> > any error on it. But we could add a new ReadBufferMode that returns
> > InvalidBuffer on error, without zeroing the page.
>
> The important point is not just that it not damage the page, but that
> it not log it as invalid. I concur that the right fix requires a
> new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> I think the spec for this should be that if the page doesn't exist or
> contains zeroes, we return InvalidBuffer without logging the page number
> as invalid. The doesn't-exist case is justified by the expectation that
> there will be a later RBM_NORMAL call for a larger page number, which will
> result in a suitable complaint if the page range isn't there.

That's a sensible way to go, yes. But I wonder if we wouldn't end up
with less complicated code if we added a variant of ReadBuffer that only
returns a buffer from cache if already present in s_b.
I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
Heikki's message and it seems to quickly turn a bit ugly because the
surrounding code isn't really ready to deal with not returning a
buffer. And for the purpose of that redo routine, that'd actually be
better.

> Will go fix this if there's not any objection to that plan.

Fine with me!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 20:34:33
Message-ID: 17881.1389645273@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
>> The important point is not just that it not damage the page, but that
>> it not log it as invalid. I concur that the right fix requires a
>> new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
>> I think the spec for this should be that if the page doesn't exist or
>> contains zeroes, we return InvalidBuffer without logging the page number
>> as invalid. The doesn't-exist case is justified by the expectation that
>> there will be a later RBM_NORMAL call for a larger page number, which will
>> result in a suitable complaint if the page range isn't there.

> That's a sensible way to go, yes. But I wonder if we wouldn't end up
> with less complicated code if we added a variant of ReadBuffer that only
> returns a buffer from cache if already present in s_b.

That sounds rather invasive to me, and more to the point it requires
fooling with code that doesn't have anything to do with WAL replay.

> I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> Heikki's message and it seems to quickly turn a bit ugly because the
> surrounding code isn't really ready to deal with not returning a
> buffer. And for the purpose of that redo routine, that'd actually be
> better.

It looks easy enough to me to support the desired behavior in
XLogReadBufferExtended. I wasn't planning to support it in ReadBuffer
itself. (Probably the path of least resistance is to define it as meaning
the same thing as RBM_NORMAL, so far as bufmgr.c is concerned.)

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 20:40:32
Message-ID: 52D44F40.1000803@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 01/13/2014 10:26 PM, Andres Freund wrote:
> On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
>> I concur that the right fix requires a
>> new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
>> I think the spec for this should be that if the page doesn't exist or
>> contains zeroes, we return InvalidBuffer without logging the page number
>> as invalid. The doesn't-exist case is justified by the expectation that
>> there will be a later RBM_NORMAL call for a larger page number, which will
>> result in a suitable complaint if the page range isn't there.

I think it's more natural to return the empty page to the caller, rather
than InvalidBuffer.

> That's a sensible way to go, yes. But I wonder if we wouldn't end up
> with less complicated code if we added a variant of ReadBuffer that only
> returns a buffer from cache if already present in s_b.
> I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> Heikki's message and it seems to quickly turn a bit ugly because the
> surrounding code isn't really ready to deal with not returning a
> buffer. And for the purpose of that redo routine, that'd actually be
> better.

If it's trivial to add such a mode to buffer cache, then sure, let's do
that by all means. But I seriously doubt it's really simple enough to be
back-patchable.

With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the
new mode, as it's XLogReadBuffer that does the the check for zero pages.
Per attached patch (for demonstration purposes only, you also need to
add the new mode to the header file and adjust comments).

- Heikki

Attachment Content-Type Size
rbm_normal_zero_ok-1.patch text/x-diff 974 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 20:56:23
Message-ID: 18474.1389646583@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the
> new mode, as it's XLogReadBuffer that does the the check for zero pages.
> Per attached patch (for demonstration purposes only, you also need to
> add the new mode to the header file and adjust comments).

That looks like it will still log the page as invalid if it's past EOF,
which I think we don't want.

Did you want to finish this off and commit, or were you just passing
what you had to me?

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:02:45
Message-ID: 20140113210245.GD14861@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 10:26 PM, Andres Freund wrote:
> >On 2014-01-13 15:19:06 -0500, Tom Lane wrote:
> >>I concur that the right fix requires a
> >>new operating mode for XLogReadBufferExtended, perhaps RBM_NORMAL_ZERO_OK.
> >>I think the spec for this should be that if the page doesn't exist or
> >>contains zeroes, we return InvalidBuffer without logging the page number
> >>as invalid. The doesn't-exist case is justified by the expectation that
> >>there will be a later RBM_NORMAL call for a larger page number, which will
> >>result in a suitable complaint if the page range isn't there.
>
> I think it's more natural to return the empty page to the caller, rather
> than InvalidBuffer.

Hm. I don't see the advantage - and it makes it impossible to recognize
that case at the caller level.

> >That's a sensible way to go, yes. But I wonder if we wouldn't end up
> >with less complicated code if we added a variant of ReadBuffer that only
> >returns a buffer from cache if already present in s_b.
> >I started to prototype something like RBM_NORMAL_ZERO_OK shortly after
> >Heikki's message and it seems to quickly turn a bit ugly because the
> >surrounding code isn't really ready to deal with not returning a
> >buffer. And for the purpose of that redo routine, that'd actually be
> >better.
>
> If it's trivial to add such a mode to buffer cache, then sure, let's do that
> by all means. But I seriously doubt it's really simple enough to be
> back-patchable.

Isn't it (untested, written in the editor) just something like:

/* only works for pages in shared buffers */
Assert(!SmgrIsTemp(smgr));

/* Make sure we will have room to remember the buffer pin */
ResourceOwnerEnlargeBuffers(CurrentResourceOwner);

/* create a tag so we can lookup the buffer */
INIT_BUFFERTAG(newTag, smgr->smgr_rnode.node, forkNum, blockNum);

newHash = BufTableHashCode(&newTag);
newPartitionLock = BufMappingPartitionLock(newHash);

/* now, check if the block is in the buffer pool already */
LWLockAcquire(newPartitionLock, LW_SHARED);

buf_id = BufTableLookup(&newTag, newHash);
if (buf_id >= 0)
{
/*
* Found it. Now, pin the buffer so no one can steal it from the
* buffer pool, and check to see if the correct data has been loaded
* into the buffer.
*/
buf = &BufferDescriptors[buf_id];

valid = PinBuffer(buf, NULL);

/* Can release the mapping lock as soon as we've pinned it */
LWLockRelease(newPartitionLock);

if (!valid)
{
/*
* We can only get here if (a) someone else is still reading in
* the page, or (b) a previous read attempt failed. We have to
* wait for any active read attempt to finish, and then set up our
* own read attempt if the page is still not BM_VALID.
* StartBufferIO does it all.
*/
if (StartBufferIO(buf, true))
{
/*
* If we get here, previous attempts to read the buffer must
* have failed ...
*/
return InvalidBuffer;
}
}

return buf;
}

/*
* Didn't find it in the buffer pool, done.
*/
LWLockRelease(newPartitionLock);

return InvalidBuffer;

Now, that's not trivial, but also not too complicated.

> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
> attached patch (for demonstration purposes only, you also need to add the
> new mode to the header file and adjust comments).

I thought about that approach at first as well, but I am not so sure
it's sufficient. Isn't it quite possible that we'd end up reading a page
that was *partially* written during a crash and due to that has a
corrupted checksum? There won't be any protection due to WAL replay/full
page writes against that case here.
Now, you could argue that that shouldn't be the case because we're only
entering that codepath once STANDBY_SNAPSHOT_READY and you might be
right...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:25:48
Message-ID: 19262.1389648348@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
>> I think it's more natural to return the empty page to the caller, rather
>> than InvalidBuffer.

> Hm. I don't see the advantage - and it makes it impossible to recognize
> that case at the caller level.

Huh? The caller can do PageIsNew if it needs to tell the difference.
If we always return InvalidBuffer, it *can't* tell the difference.

> I thought about that approach at first as well, but I am not so sure
> it's sufficient. Isn't it quite possible that we'd end up reading a page
> that was *partially* written during a crash and due to that has a
> corrupted checksum?

If we did, then either the master would also have failed (so we'd not
be here), or there's a discrepancy between master and slave data,
in which case we've got bigger problems than this.

The addition of a bufmgr function to only return an already-in-cache
page is a possible future performance enhancement, but I do not care
for it as a bug fix that has to be back-patched as far as 9.0.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:29:59
Message-ID: 52D45AD7.9080406@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 01/13/2014 11:02 PM, Andres Freund wrote:
> On 2014-01-13 22:40:32 +0200, Heikki Linnakangas wrote:
>> With RBM_NORMAL_ZERO_OK, AFAICS we're talking about a tiny patch to
>> XLogReadBufferExtended. bufmgr.c doesn't need to do anything about the new
>> mode, as it's XLogReadBuffer that does the the check for zero pages. Per
>> attached patch (for demonstration purposes only, you also need to add the
>> new mode to the header file and adjust comments).
>
> I thought about that approach at first as well, but I am not so sure
> it's sufficient. Isn't it quite possible that we'd end up reading a page
> that was *partially* written during a crash and due to that has a
> corrupted checksum? There won't be any protection due to WAL replay/full
> page writes against that case here.

Good point. Normally, we expect the checksum to match on all pages that
we read during WAL replay, because full page writes will initialize any
page that is modified to an untorn state, before it's ever read. But we
can't rely on that in the extra read that btree_xlog_vacuum() does. It's
possible that there's a torn page on disk on block X, and we're
vacuuming page X + 1. The page will be fixed by a later record in the
WAL, before we reach consistency, but the ReadBuffer call from
btree_xlog_vacuum() will throw an error.

> Now, you could argue that that shouldn't be the case because we're only
> entering that codepath once STANDBY_SNAPSHOT_READY and you might be
> right...

I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
before we reach consistency. Adding a check for reachedConsistency,
though, ought to fix it.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:36:41
Message-ID: 19552.1389649001@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> Good point. Normally, we expect the checksum to match on all pages that
> we read during WAL replay, because full page writes will initialize any
> page that is modified to an untorn state, before it's ever read. But we
> can't rely on that in the extra read that btree_xlog_vacuum() does.

But it's not an "extra" read. It's replicating a read that was done
on the master in the btvacuumpage() scan. AFAICS the only way to fail
on the slave and not the master is if the slave has inconsistent data,
in which case you're at hazard of failing anyway.

>> Now, you could argue that that shouldn't be the case because we're only
>> entering that codepath once STANDBY_SNAPSHOT_READY and you might be
>> right...

> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
> before we reach consistency. Adding a check for reachedConsistency,
> though, ought to fix it.

Huh? Surely we're not letting queries in until we're consistent.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:36:59
Message-ID: 20140113213659.GC5838@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 16:25:48 -0500, Tom Lane wrote:
> > I thought about that approach at first as well, but I am not so sure
> > it's sufficient. Isn't it quite possible that we'd end up reading a page
> > that was *partially* written during a crash and due to that has a
> > corrupted checksum?
>
> If we did, then either the master would also have failed (so we'd not
> be here), or there's a discrepancy between master and slave data,
> in which case we've got bigger problems than this.

If we crashed during replay and then restart replaying it might, at
least without the STANDBY_SNAPSHOT_READY check. I think.

XLOG_BTREE_VACUUM block 10, lastVacuumed 1
XLOG_BTREE_INSERT_SPLIT
XLOG_BTREE_INSERT_LEAF block 5
XLOG_BTREE_VACUUM block 10, lastVacuumed 1
-- crash
if we now restart replaying block 5 can be in just about any state.

... tinker ...

Hm, the standbyMode check isn't sufficient, it's independent from other
consistency checks (like minRecoveryPoint). But we can extend that check
in the redo routine to check for full consistency, which should be fine
since there won't be any relevant pins on buffers before that anyway.

> The addition of a bufmgr function to only return an already-in-cache
> page is a possible future performance enhancement, but I do not care
> for it as a bug fix that has to be back-patched as far as 9.0.

Yea, the only argument I see is that it actually might end up being
simpler, that's why I brought it up.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:40:22
Message-ID: 20140113214022.GD5838@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 16:36:41 -0500, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > Good point. Normally, we expect the checksum to match on all pages that
> > we read during WAL replay, because full page writes will initialize any
> > page that is modified to an untorn state, before it's ever read. But we
> > can't rely on that in the extra read that btree_xlog_vacuum() does.
>
> But it's not an "extra" read. It's replicating a read that was done
> on the master in the btvacuumpage() scan. AFAICS the only way to fail
> on the slave and not the master is if the slave has inconsistent data,
> in which case you're at hazard of failing anyway.

I tried to explain which scenario I see as dangerous nearby.

> >> Now, you could argue that that shouldn't be the case because we're only
> >> entering that codepath once STANDBY_SNAPSHOT_READY and you might be
> >> right...
>
> > I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
> > before we reach consistency. Adding a check for reachedConsistency,
> > though, ought to fix it.
>
> Huh? Surely we're not letting queries in until we're consistent.

We don't, but STANDBY_SNAPSHOT_READY isn't the only variable controlling
that. It just determines whether we'd have the necessary visibility
information. The full check is:
*/
if (standbyState == STANDBY_SNAPSHOT_READY &&
!LocalHotStandbyActive &&
reachedConsistency &&
IsUnderPostmaster)
{
...
xlogctl->SharedHotStandbyActive = true;
...
SendPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY);
}

So we need to mimick that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 21:43:45
Message-ID: 19746.1389649425@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-13 16:25:48 -0500, Tom Lane wrote:
>> If we did, then either the master would also have failed (so we'd not
>> be here), or there's a discrepancy between master and slave data,
>> in which case we've got bigger problems than this.

> If we crashed during replay and then restart replaying it might, at
> least without the STANDBY_SNAPSHOT_READY check. I think.

Perhaps, but the code in question isn't needed during crash recovery, only
during hot standby; so modulo the possibility that the check for that is
broken, I don't see a problem.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 22:04:40
Message-ID: 20140113220440.GE5838@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote:
> On 01/13/2014 11:02 PM, Andres Freund wrote:
> >Now, you could argue that that shouldn't be the case because we're only
> >entering that codepath once STANDBY_SNAPSHOT_READY and you might be
> >right...
>
> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
> before we reach consistency. Adding a check for reachedConsistency, though,
> ought to fix it.

Maybe we should just put LocalHotStandbyActive into xlog_internal.h?
That'd be enough for the startup process. Or alternatively add a check
for EnableHotStandby into HotStandbyActive() so we don't take the
spinlock if it's not enabled?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-13 23:55:39
Message-ID: 22892.1389657339@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> Maybe we should just put LocalHotStandbyActive into xlog_internal.h?

Or a macro, or something. nbtxlog.c shouldn't know quite that much
about what allows hot standby ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-14 01:15:59
Message-ID: 25048.1389662159@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote:
>> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
>> before we reach consistency. Adding a check for reachedConsistency, though,
>> ought to fix it.

> Maybe we should just put LocalHotStandbyActive into xlog_internal.h?
> That'd be enough for the startup process. Or alternatively add a check
> for EnableHotStandby into HotStandbyActive() so we don't take the
> spinlock if it's not enabled?

After a little bit of research, I propose that we add a function like
this in xlog.c:

/*
* Like HotStandbyActive(), but to be used only in WAL replay code,
* where we don't need to ask any other process what the state is.
*/
bool
HotStandbyActiveStartup(void)
{
Assert(AmStartupProcess());
return LocalHotStandbyActive;
}

Any objections? Bikeshedding on the name?

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-14 01:18:51
Message-ID: 20140114011851.GA20335@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 2014-01-13 20:15:59 -0500, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-01-13 23:29:59 +0200, Heikki Linnakangas wrote:
> >> I don't think that saves us. standbyMode can be STANDBY_SNAPSHOT_READY,
> >> before we reach consistency. Adding a check for reachedConsistency, though,
> >> ought to fix it.
>
> > Maybe we should just put LocalHotStandbyActive into xlog_internal.h?
> > That'd be enough for the startup process. Or alternatively add a check
> > for EnableHotStandby into HotStandbyActive() so we don't take the
> > spinlock if it's not enabled?
>
> After a little bit of research, I propose that we add a function like
> this in xlog.c:

> Any objections? Bikeshedding on the name?

Fine with me.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-14 01:59:48
Message-ID: 26165.1389664788@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

BTW, while I'm looking at this ... the writing side of the code seems a
few bricks shy of a load too:

/*
* InHotStandby we need to scan right up to the end of the index for
* correct locking, so we may need to write a WAL record for the final
* block in the index if it was not vacuumed. It's possible that VACUUMing
* has actually removed zeroed pages at the end of the index so we need to
* take care to issue the record for last actual block and not for the
* last block that was scanned. Ignore empty indexes.
*/
if (XLogStandbyInfoActive() &&
num_pages > 1 && vstate.lastBlockVacuumed < (num_pages - 1))
{
Buffer buf;

/*
* We can't use _bt_getbuf() here because it always applies
* _bt_checkpage(), which will barf on an all-zero page. We want to
* recycle all-zero pages, not fail. Also, we want to use a
* nondefault buffer access strategy.
*/
buf = ReadBufferExtended(rel, MAIN_FORKNUM, num_pages - 1, RBM_NORMAL,
info->strategy);
LockBufferForCleanup(buf);
_bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed);
_bt_relbuf(rel, buf);
}

If the last physical page of the index is all-zero, the preceding loop
won't have any problem with that (nor should it); but this code sure will
have a problem, because _bt_delitems_vacuum isn't prepared to cope with an
all-zero page AFAICS.

Nor am I following the logic of the initial comment. If we know that
pages above N are all-zero, why are we worried about taking locks on them?
There shouldn't be any scans reaching such pages, and any that did would
error out anyhow in _bt_getbuf().

ISTM the right thing here is for the btvacuumpage loop to remember the
last ordinary, valid index page it saw, and point to that one in this
added WAL entry.

regards, tom lane


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-14 13:54:32
Message-ID: 52D54198.7090802@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 01/14/2014 03:59 AM, Tom Lane wrote:
> BTW, while I'm looking at this ... the writing side of the code seems a
> few bricks shy of a load too:
>
> /*
> * InHotStandby we need to scan right up to the end of the index for
> * correct locking, so we may need to write a WAL record for the final
> * block in the index if it was not vacuumed. It's possible that VACUUMing
> * has actually removed zeroed pages at the end of the index so we need to
> * take care to issue the record for last actual block and not for the
> * last block that was scanned. Ignore empty indexes.
> */
> if (XLogStandbyInfoActive() &&
> num_pages > 1 && vstate.lastBlockVacuumed < (num_pages - 1))
> {
> Buffer buf;
>
> /*
> * We can't use _bt_getbuf() here because it always applies
> * _bt_checkpage(), which will barf on an all-zero page. We want to
> * recycle all-zero pages, not fail. Also, we want to use a
> * nondefault buffer access strategy.
> */
> buf = ReadBufferExtended(rel, MAIN_FORKNUM, num_pages - 1, RBM_NORMAL,
> info->strategy);
> LockBufferForCleanup(buf);
> _bt_delitems_vacuum(rel, buf, NULL, 0, vstate.lastBlockVacuumed);
> _bt_relbuf(rel, buf);
> }
>
> If the last physical page of the index is all-zero, the preceding loop
> won't have any problem with that (nor should it); but this code sure will
> have a problem, because _bt_delitems_vacuum isn't prepared to cope with an
> all-zero page AFAICS.

Yep, that's broken.

> Nor am I following the logic of the initial comment. If we know that
> pages above N are all-zero, why are we worried about taking locks on them?
> There shouldn't be any scans reaching such pages, and any that did would
> error out anyhow in _bt_getbuf().

Also, btree relations are never truncated, so the comment is wrong when
it claims that VACUUM might've removed zero pages at the end of the index.

> ISTM the right thing here is for the btvacuumpage loop to remember the
> last ordinary, valid index page it saw, and point to that one in this
> added WAL entry.

Agreed.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, matioli(dot)matheus(at)gmail(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Сизов Сергей Павлович <sizov_sp(at)gw(dot)tander(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-01-14 22:52:52
Message-ID: 1164.1389739972@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I've pushed patches fixing the issues discussed in this thread.

While I was working on it, it occurred to me that the whole darn mess
might be dead code as of the removal of SnapshotNow. The reason we're
doing this at all is the fear that, while some indexscan is in flight
from an index entry to the relevant heap tuple, VACUUM could remove
said tuple and then somebody else could insert a new, unrelated tuple
at the now-vacant TID. With ordinary MVCC snapshot checking, this
would not be a problem because the new tuple could not possibly
pass the snapshot check, so it wouldn't get returned. However, with
non-MVCC snapshots, it's possible we could return a heap tuple that
doesn't match the index qual. The fear of this happening for SnapshotNow
catalog accesses was what drove the original locking design, and that is
also why we don't bother with similar locking for other index types ---
only btree indexes are used for system catalog fetches.

So now that we use MVCC snapshots for system catalog fetches, it seems
like maybe we don't need to sweat about cleanup-locking index pages
we're not going to remove tuples from; indeed maybe we don't need to
acquire super-exclusive btree page locks at all.

I'm not totally sure about this; we still have SnapshotSelf for instance.
But maybe the remaining non-MVCC snapshot types are never used in ways
where this type of aliasing would be a problem, or maybe we could cause
that to become true with a bit more work.

I'm not planning to pursue the question myself at the moment, but
I thought I should mention it in case anybody else wants to.

regards, tom lane


From: jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-06-09 11:27:25
Message-ID: 1402313245515-5806482.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi all,

I am also facing this issue. I set up streaming replication for a PG 9.2.6
database in both master and slave. I have used pg_basebackup to dump master
database to slave. It took two hours to complete pg_basebackup. During this
time a lot of wal files were archived. As expected, when slave database
started up it went through archived wal files for 5 hours to catch the
master. During recovery from archived wal files appeared some warnings like
this:

...
Jun 6 14:55:50 VNWID1-4 postgres[17024]: [177-1] [2014-06-06 14:55:50 CEST]
17024@ LOG: restored log file "000000010000066C000000E4" from archive
Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-1] [2014-06-06 14:55:51 CEST]
17032@ WARNING: xlog min recovery request 67B/B36ABD78 is past current
point 66C/E4FFFFE8
Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-2] [2014-06-06 14:55:51 CEST]
17032@ CONTEXT: writing block 0 of relation
pg_tblspc/16756/PG_9.2_201204301/16757/1116517382_vm
Jun 6 14:55:51 VNWID1-4 postgres[17024]: [178-1] [2014-06-06 14:55:51 CEST]
17024@ LOG: restored log file "000000010000066C000000E5" from archive
...

It seems they are harmless and can be safely ignored, but I prefer to state
this in the post in case be relevant for any reason.

After 5 hours, slave database catched master database and streaming
replication started:

...
Jun 6 19:59:10 VNWID1-4 postgres[17024]: [15763-1] [2014-06-06 19:59:10
CEST] 17024@ LOG: restored log file "00000001000006AA00000003" from
archive
Jun 6 19:59:11 VNWID1-4 postgres[17024]: [15764-1] [2014-06-06 19:59:11
CEST] 17024@ LOG: unexpected pageaddr 6A7/CC000000 in log file 1706,
segment 4, offset 0
Jun 6 19:59:11 VNWID1-4 postgres[21691]: [2-1] [2014-06-06 19:59:11 CEST]
21691@ LOG: streaming replication successfully connected to primary

It seems that all is ok until now. And no other message was logged until the
crash. After some hours of replication being working properly, Postgres
stopped and log contains this:

Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-1] [2014-06-07 17:52:45
CEST] 17024@ WARNING: page 916 of relation
pg_tblspc/16756/PG_9.2_201204301/16757/402007830 is uninitialized
Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-2] [2014-06-07 17:52:45
CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk
917, lastBlockVacuumed 915
Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-1] [2014-06-07 17:52:45
CEST] 17024@ PANIC: WAL contains references to invalid pages
Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-2] [2014-06-07 17:52:45
CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk
917, lastBlockVacuumed 915
Jun 7 17:52:45 VNWID1-4 postgres[17021]: [2-1] [2014-06-07 17:52:45 CEST]
17021@ LOG: startup process (PID 17024) was terminated by signal 6: Aborted
Jun 7 17:52:45 VNWID1-4 postgres[17021]: [3-1] [2014-06-07 17:52:45 CEST]
17021@ LOG: terminating any other active server processes

I think that this issue is exactly the same that is being discussed in this
thread and I have seen that a patch was released for this bug. Could anyone
tell me how to get / apply this patch?

Thanks in advace.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-9-2-6-9-2-6-PANIC-WAL-contains-references-to-invalid-pages-tp5784147p5806482.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-06-09 12:41:27
Message-ID: 5395AB77.4080209@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On 06/09/2014 02:27 PM, jlrando wrote:
> Hi all,
>
> I am also facing this issue. I set up streaming replication for a PG 9.2.6
> database in both master and slave. I have used pg_basebackup to dump master
> database to slave. It took two hours to complete pg_basebackup. During this
> time a lot of wal files were archived. As expected, when slave database
> started up it went through archived wal files for 5 hours to catch the
> master. During recovery from archived wal files appeared some warnings like
> this:
>
> ...
> Jun 6 14:55:50 VNWID1-4 postgres[17024]: [177-1] [2014-06-06 14:55:50 CEST]
> 17024@ LOG: restored log file "000000010000066C000000E4" from archive
> Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-1] [2014-06-06 14:55:51 CEST]
> 17032@ WARNING: xlog min recovery request 67B/B36ABD78 is past current
> point 66C/E4FFFFE8
> Jun 6 14:55:51 VNWID1-4 postgres[17032]: [1-2] [2014-06-06 14:55:51 CEST]
> 17032@ CONTEXT: writing block 0 of relation
> pg_tblspc/16756/PG_9.2_201204301/16757/1116517382_vm
> Jun 6 14:55:51 VNWID1-4 postgres[17024]: [178-1] [2014-06-06 14:55:51 CEST]
> 17024@ LOG: restored log file "000000010000066C000000E5" from archive
> ...
>
> It seems they are harmless and can be safely ignored, but I prefer to state
> this in the post in case be relevant for any reason.

Hmm. That warning shouldn't happen, but looking at the visibility map
code, I see how it could:

During the WAL replay, a heap update is replayed, which clears a bit in
the visibility map. That dirties the VM page, but we intentionally don't
check or update the LSN on the VM page. The VM buffer is now dirty in
the buffer pool, but if it was backed up later during the backup, it has
a greater LSN than what we're currently replaying. When that buffer gets
written out, you get the warning.

Yeah, I think that's harmless. But we should look into fixing that...

> It seems that all is ok until now. And no other message was logged until the
> crash. After some hours of replication being working properly, Postgres
> stopped and log contains this:
>
> Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-1] [2014-06-07 17:52:45
> CEST] 17024@ WARNING: page 916 of relation
> pg_tblspc/16756/PG_9.2_201204301/16757/402007830 is uninitialized
> Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15765-2] [2014-06-07 17:52:45
> CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk
> 917, lastBlockVacuumed 915
> Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-1] [2014-06-07 17:52:45
> CEST] 17024@ PANIC: WAL contains references to invalid pages
> Jun 7 17:52:45 VNWID1-4 postgres[17024]: [15766-2] [2014-06-07 17:52:45
> CEST] 17024@ CONTEXT: xlog redo vacuum: rel 16756/16757/402007830; blk
> 917, lastBlockVacuumed 915
> Jun 7 17:52:45 VNWID1-4 postgres[17021]: [2-1] [2014-06-07 17:52:45 CEST]
> 17021@ LOG: startup process (PID 17024) was terminated by signal 6: Aborted
> Jun 7 17:52:45 VNWID1-4 postgres[17021]: [3-1] [2014-06-07 17:52:45 CEST]
> 17021@ LOG: terminating any other active server processes
>
> I think that this issue is exactly the same that is being discussed in this
> thread and I have seen that a patch was released for this bug.

Yeah, that looks like the same bug.

> Could anyone tell me how to get / apply this patch?

It's included in 9.2.8. Just update the binaries.

- Heikki


From: jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-06-09 14:10:03
Message-ID: 1402323003717-5806506.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi Heikki,

Thanks a lot for your answer. Perform an upgrade of this DB will require a
lot of time because the only certified DB version for this application is
9.2.6. A very complex flow will be triggered to ask for a new PG release
certification. So my question is, do you know if there is any workaround for
9.2.6 we can use in the meanwhile?

Thanks in advance and best regards!

Jose.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Hot-standby-9-2-6-9-2-6-PANIC-WAL-contains-references-to-invalid-pages-tp5784147p5806506.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.


From: Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>
To: jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-06-09 18:58:17
Message-ID: CAJghg4+p2aJaZEg3-awKd0Yd15HR5VMrzpKdJoV3yyVS_Mn1wA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Jun 9, 2014 at 11:10 AM, jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com
> wrote:

> Perform an upgrade of this DB will require a
> lot of time because the only certified DB version for this application is
> 9.2.6.
>

This simple makes no sense for PG release versioning schema/policy, see [1].

Following schema X.Y.Z of a PG release, you may "certifies" an application
to work on version X.Y only, the Z number means the release (bug-fix/patch
basically), and you should **always** be using the most higher Z for your
X.Y version (in the case of 9.2, you should be using 9.2.8 already, no
excuses). In summary, you can always upgrade from Z to Z+1 without
compatibility issues. Of course, read the release notes to check if there
is some maintenance needed after the upgrade.

[1] http://www.postgresql.org/support/versioning/

--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Matheus de Oliveira <matioli(dot)matheus(at)gmail(dot)com>
Cc: jlrando <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: Hot standby 9.2.6 -> 9.2.6 PANIC: WAL contains references to invalid pages
Date: 2014-06-12 03:14:54
Message-ID: CAB7nPqQ_u7Up-nJzsBKU7R+oc8acdq3D0jm2VXbJCfc53oMBLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 10, 2014 at 3:58 AM, Matheus de Oliveira
<matioli(dot)matheus(at)gmail(dot)com> wrote:
>
> On Mon, Jun 9, 2014 at 11:10 AM, jlrando
> <jose(dot)luis(dot)rando(dot)calvo(at)ericsson(dot)com> wrote:
>>
>> Perform an upgrade of this DB will require a
>> lot of time because the only certified DB version for this application is
>> 9.2.6.
>
>
> This simple makes no sense for PG release versioning schema/policy, see [1].
>
> Following schema X.Y.Z of a PG release, you may "certifies" an application
> to work on version X.Y only, the Z number means the release (bug-fix/patch
> basically), and you should **always** be using the most higher Z for your
> X.Y version (in the case of 9.2, you should be using 9.2.8 already, no
> excuses). In summary, you can always upgrade from Z to Z+1 without
> compatibility issues. Of course, read the release notes to check if there is
> some maintenance needed after the upgrade.
>
> [1] http://www.postgresql.org/support/versioning/
The same major versions (X.Y releases) are guaranteed to be
disk-compatible and new minor releases only contain bug fixes. So
updating is only a matter of installing the new binaries and then
restart the server. By updating always to the latest available version
you protect as well your application from the latest bugs found.
--
Michael