Re: WAL logging problem in 9.4.3?

From: Andres Freund <andres(at)anarazel(dot)de>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL logging problem in 9.4.3?
Date: 2015-07-02 22:21:02
Message-ID: 20150702222102.GG30708@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote:
> === Start with an empty database

My guess is you have wal_level = minimal?

> ctmp=# begin;
> BEGIN
> ctmp=# create table test(id serial primary key);
> CREATE TABLE
> ctmp=# truncate table test;
> TRUNCATE TABLE
> ctmp=# commit;
> COMMIT
> ctmp=# select relname, relfilenode from pg_class where relname like 'test%';
> relname | relfilenode
> -------------+-------------
> test | 16389
> test_id_seq | 16387
> test_pkey | 16393
> (3 rows)
>

> === Note the index file is 8KB.
> === At this point nuke the database server (in this case it was simply
> === destroying the container it was running in.

How did you continue from there? The container has persistent storage?
Or are you repapplying the WAL to somewhere else?

> === Dump the xlogs just to show what got recorded. Note there's a
> === truncate for the data file and the index file.

That should be ok.

> martijn(at)martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/ 000000010000000000000001 |grep -wE '16389|16387|16393'
> rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint: redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown
> rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file create: base/16385/16387
> rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log: rel 1663/16385/16387
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file create: base/16385/16389
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file create: base/16385/16393
> rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log: rel 1663/16385/16387
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file truncate: base/16385/16389 to 0 blocks
> rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file truncate: base/16385/16393 to 0 blocks
> pg_xlogdump: FATAL: error in WAL record at 0/16BE710: record with zero length at 0/16BE740

Note that the truncate will lead to a new, different, relfilenode.

> === Start the DB up again
>
> database_1 | LOG: database system was interrupted; last known up at 2015-07-02 21:08:05 UTC
> database_1 | LOG: database system was not properly shut down; automatic recovery in progress
> database_1 | LOG: redo starts at 0/16A92A8
> database_1 | LOG: record with zero length at 0/16BE740
> database_1 | LOG: redo done at 0/16BE710
> database_1 | LOG: last completed transaction was at log time 2015-07-02 21:34:45.664989+00
> database_1 | LOG: database system is ready to accept connections
> database_1 | LOG: autovacuum launcher started
>
> === Oops, the index file is empty now

That's probably just the old index file?

> martijn(at)martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393}
> -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:37 /data/postgres/base/16385/16387
> -rw------- 1 messagebus ssl-cert 0 Jul 2 23:34 /data/postgres/base/16385/16389
> -rw------- 1 messagebus ssl-cert 0 Jul 2 23:37 /data/postgres/base/16385/16393
>
> martijn(at)martijn-jessie:$ psql ctmp -h localhost -U username
> Password for user username:
> psql (9.4.3)
> Type "help" for help.
>
> === And now the index is broken. I think the only reason it doesn't
> === complain about the data file is because zero bytes there is OK. But if
> === the table had data before it would be gone now.
>
> ctmp=# select * from test;
> ERROR: could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes

Hm. I can't reproduce this. Can you include a bit more details about how
to reproduce?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-07-02 23:07:45 Re: bugfix: incomplete implementation of errhidecontext
Previous Message Jim Nasby 2015-07-02 22:20:16 Determine operator from it's function