BUG #10432: failed to re-find parent key in index

From: m(dot)sakrejda(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #10432: failed to re-find parent key in index
Date: 2014-05-23 17:52:17
Message-ID: 20140523175217.2692.50568@wrigleys.postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 10432
Logged by: Maciek Sakrejda
Email address: m(dot)sakrejda(at)gmail(dot)com
PostgreSQL version: 9.3.4
Operating system: Ubuntu 12.04 LTS 64-bit
Description:

Hi,

I just got the error `failed to re-find parent key in index "1665279" for
split pages 175193/193740`. From the list archives [1], it looks like
there's no easy recovery, but this happened on a replica right after the
recovery trigger file was touched and recovery completed, so that's not a
huge concern (unless it also happens to the next replica). Is there something
to be learned from the failed cluster? (I've sketched a couple of diagnostic
queries after the logs below.) The only odd thing I see in
the logs is that it looks like we tried to take a base backup before
recovery actually completed, though the attempt appears to have failed
cleanly. Here are the logs from around the time of the failure, including
WAL-E logs:

May 23 16:21:28 postgres[10]: [1244-1] LOG: trigger file found:
/etc/postgresql/wal-e.d/pull-env/STANDBY_OFF
May 23 16:21:28 postgres[7545]: [5-1] FATAL: terminating walreceiver
process due to administrator command
May 23 16:21:28 wal_e.operator.backup INFO MSG: begin wal restore#012
STRUCTURED: time=2014-05-23T16:21:28.169498-00 pid=7757 action=wal-fetch
key=s3://.../wal_005/000000020000033900000066.lzo prefix=.../
seg=000000020000033900000066 state=begin
May 23 16:21:28 wal_e.blobstore.s3.s3_util WARNING MSG: could no longer
locate object while performing wal restore#012 DETAIL: The absolute
URI that could not be located is
s3://.../wal_005/000000020000033900000066.lzo.#012 HINT: This can be
normal when Postgres is trying to detect what timelines are available during
restoration.#012 STRUCTURED: time=2014-05-23T16:21:28.364020-00
pid=7757
May 23 16:21:28 wal_e.operator.backup INFO MSG: complete wal restore#012
STRUCTURED: time=2014-05-23T16:21:28.365570-00 pid=7757
action=wal-fetch key=s3://.../wal_005/000000020000033900000066.lzo
prefix=.../ seg=000000020000033900000066 state=complete
May 23 16:21:28 postgres[10]: [1245-1] LOG: redo done at 339/650000C8
May 23 16:21:28 postgres[10]: [1246-1] LOG: last completed transaction was
at log time 2014-05-23 14:43:31.251167+00
May 23 16:21:28 wal_e.operator.backup INFO MSG: begin wal restore#012
STRUCTURED: time=2014-05-23T16:21:28.535075-00 pid=7763 action=wal-fetch
key=s3://.../wal_005/000000020000033900000065.lzo prefix=.../
seg=000000020000033900000065 state=begin
May 23 16:21:29 wal_e.blobstore.s3.s3_util INFO MSG: completed download
and decompression#012 DETAIL: Downloaded and decompressed
"s3://.../wal_005/000000020000033900000065.lzo" to
"pg_xlog/RECOVERYXLOG"#012 STRUCTURED:
time=2014-05-23T16:21:29.094826-00 pid=7763
May 23 16:21:29 wal_e.operator.backup INFO MSG: complete wal restore#012
STRUCTURED: time=2014-05-23T16:21:29.096962-00 pid=7763
action=wal-fetch key=s3://.../wal_005/000000020000033900000065.lzo
prefix=.../ seg=000000020000033900000065 state=complete
May 23 16:21:29 postgres[10]: [1247-1] LOG: restored log file
"000000020000033900000065" from archive
May 23 16:21:29 wal_e.operator.backup INFO MSG: begin wal restore#012
STRUCTURED: time=2014-05-23T16:21:29.269909-00 pid=7769 action=wal-fetch
key=s3://.../wal_005/00000003.history.lzo prefix=.../ seg=00000003.history
state=begin
May 23 16:21:29 wal_e.blobstore.s3.s3_util WARNING MSG: could no longer
locate object while performing wal restore#012 DETAIL: The absolute
URI that could not be located is s3://.../wal_005/00000003.history.lzo.#012
HINT: This can be normal when Postgres is trying to detect what
timelines are available during restoration.#012 STRUCTURED:
time=2014-05-23T16:21:29.424936-00 pid=7769
May 23 16:21:29 wal_e.operator.backup INFO MSG: complete wal restore#012
STRUCTURED: time=2014-05-23T16:21:29.426295-00 pid=7769
action=wal-fetch key=s3://.../wal_005/00000003.history.lzo prefix=.../
seg=00000003.history state=complete
May 23 16:21:29 postgres[10]: [1248-1] LOG: selected new timeline ID: 3
May 23 16:21:29 wal_e.operator.backup INFO MSG: begin wal restore#012
STRUCTURED: time=2014-05-23T16:21:29.593188-00 pid=7775 action=wal-fetch
key=s3://.../wal_005/00000002.history.lzo prefix=.../ seg=00000002.history
state=begin
May 23 16:21:29 postgres[7779]: [4-1] [unknown] LOG: connection received:
host=[local]
May 23 16:21:29 postgres[7779]: [5-1] postgres LOG: connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7781]: [4-1] [unknown] LOG: connection received:
host=[local]
May 23 16:21:29 postgres[7781]: [5-1] postgres LOG: connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7781]: [6-1] postgres ERROR: recovery is in
progress
May 23 16:21:29 postgres[7781]: [6-2] postgres HINT: WAL control functions
cannot be executed during recovery.
May 23 16:21:29 postgres[7781]: [6-3] postgres STATEMENT: COPY (SELECT
file_name, lpad(file_offset::text, 8, '0') AS file_offset FROM
pg_xlogfile_name_offset(
pg_start_backup('freeze_start_2014-05-23T16:21:29.681458+00:00'))) TO STDOUT
WITH CSV HEADER;
May 23 16:21:29 wal_e.operator.backup WARNING MSG: blocking on sending WAL
segments#012 DETAIL: The backup was not completed successfully, but
we have to wait anyway. See README: TODO about pg_cancel_backup#012
STRUCTURED: time=2014-05-23T16:21:29.764625-00 pid=18790
May 23 16:21:29 postgres[7782]: [4-1] [unknown] LOG: connection received:
host=[local]
May 23 16:21:29 postgres[7782]: [5-1] postgres LOG: connection authorized:
user=postgres database=postgres
May 23 16:21:29 postgres[7782]: [6-1] postgres ERROR: recovery is in
progress
May 23 16:21:29 postgres[7782]: [6-2] postgres HINT: WAL control functions
cannot be executed during recovery.
May 23 16:21:29 postgres[7782]: [6-3] postgres STATEMENT: COPY (SELECT
file_name, lpad(file_offset::text, 8, '0') AS file_offset FROM
pg_xlogfile_name_offset( pg_stop_backup())) TO STDOUT WITH CSV HEADER;
May 23 16:21:29 wal_e.main ERROR MSG: Could not stop hot backup#012
STRUCTURED: time=2014-05-23T16:21:29.844455-00 pid=18790
May 23 16:21:29 wal_e.blobstore.s3.s3_util INFO MSG: completed download
and decompression#012 DETAIL: Downloaded and decompressed
"s3://.../wal_005/00000002.history.lzo" to "pg_xlog/RECOVERYHISTORY"#012
STRUCTURED: time=2014-05-23T16:21:29.866686-00 pid=7775
May 23 16:21:29 wal_e.operator.backup INFO MSG: complete wal restore#012
STRUCTURED: time=2014-05-23T16:21:29.868954-00 pid=7775
action=wal-fetch key=s3://.../wal_005/00000002.history.lzo prefix=.../
seg=00000002.history state=complete
May 23 16:21:29 postgres[10]: [1249-1] LOG: restored log file
"00000002.history" from archive
May 23 16:21:30 postgres[10]: [1250-1] LOG: archive recovery complete
May 23 16:21:30 postgres[10]: [1251-1] FATAL: failed to re-find parent key
in index "1665279" for split pages 175193/193740
May 23 16:21:31 postgres[8]: [4-1] LOG: startup process (PID 10) exited
with exit code 1
May 23 16:21:31 postgres[8]: [5-1] LOG: terminating any other active
server processes
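
In case it helps with diagnosis: assuming the "1665279" in the FATAL message is
the index's relfilenode (recovery can't resolve the real relation name), this is
roughly how I'd map it back to an index on a healthy copy of the same database.
I haven't run this against the failed cluster, so treat it as a sketch:

    -- run while connected to the database that owns the index;
    -- relfilenode values are only unique per database
    SELECT oid::regclass AS index_name, relkind
    FROM pg_class
    WHERE relfilenode = 1665279;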

[1]: http://www.postgresql.org/message-id/7753.1291075332@sss.pgh.pa.us
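
As for the base-backup oddity: a wrapper could in principle check recovery state
before calling pg_start_backup(). This is only an illustration of that check, not
a claim about what WAL-E actually does:

    -- returns true while the server would still reject WAL control functions
    -- with "recovery is in progress", as in the HINTs in the log above
    SELECT pg_is_in_recovery();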
