Need help with autovacuuming error.

From: "S(dot) Horio" <fory(at)twinsun(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Need help with autovacuuming error.
Date: 2008-02-07 03:39:34
Message-ID: 47AA7D76.70504@twinsun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


We are investigating the following error:

ERROR: XX000: xlog flush request 1/50B823D8 is not satisfied --- flushed only to 1/50A711B0

OS : Windows XP SP2
Version : PostgreSQL8.1.4

It seems that "postmaster" or "autovacuum" process is
executed simultaneously, and it is affecting the WAL
related controls. Is there such a case that the "postmaster"
or "autovacuum" process are executed simultaneously
which are referring the same directory?

Here are what we have observed:

- We compiled PostgreSQL with WAL_DEBUG option so that it can
report WAL related information. Here are the logs which
had the error.

2008-02-05 12:37:13 4764 LOG: 00000: xlog flush request 1/50C55198; write 1/50C54AC0; flush 1/50C54AC0
2008-02-05 12:37:13 4764 CONTEXT: writing block 1212 of relation 1663/18096/18763
2008-02-05 12:37:13 4764 LOCATION: XLogFlush, xlog.c:1596
2008-02-05 12:37:15 3240 LOG: 00000: xlog flush request 1/50B823D8; write 0/0; flush 0/0
2008-02-05 12:37:15 3240 CONTEXT: writing block 588 of relation 1663/18096/27221
2008-02-05 12:37:15 3240 LOCATION: XLogFlush, xlog.c:1596
(*1) 2008-02-05 12:37:15 3240 ERROR: XX000: xlog flush request 1/50B823D8 is not satisfied --- flushed only to 1/50A711B0
2008-02-05 12:37:15 3240 CONTEXT: writing block 588 of relation 1663/18096/27221
2008-02-05 12:37:15 3240 LOCATION: XLogFlush, xlog.c:1687

- Then, we looked for "1/50A711B0" which was indicated in
the line (*1) "flushed only to", and found that the "autovacuum"
process was requesting "xlog flush" till "1/50A711B0" (*2)

2008-02-05 12:36:50 4468 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:36:50 4468 LOCATION: AutoVacMain, autovacuum.c:412
2008-02-05 12:36:51 4468 LOG: 00000: INSERT @ 1/50A5B9C8: prev 1/50A5B980; xid 1060766; bkpb 2: Heap - update: rel 1663/18096/2619; tid 942/18; new 944/34
2008-02-05 12:36:51 4468 LOCATION: XLogInsert, xlog.c:864
:
:
2008-02-05 12:36:51 4468 LOG: 00000: INSERT @ 1/50A70C50: prev 1/50A70B80; xid 1060766; bkpb 1: Btree - insert: rel 1663/18096/2696; tid 46/1
2008-02-05 12:36:51 4468 LOCATION: XLogInsert, xlog.c:864
2008-02-05 12:36:51 4468 LOG: 00000: INSERT @ 1/50A71180: prev 1/50A70C50; xid 1060766: Transaction - commit: 2008-02-05 12:36:51
2008-02-05 12:36:51 4468 LOCATION: XLogInsert, xlog.c:864
(*2) 2008-02-05 12:36:51 4468 LOG: 00000: xlog flush request 1/50A711B0; write 1/50A6A000; flush 1/50A5B9C8
2008-02-05 12:36:51 4468 LOCATION: XLogFlush, xlog.c:1596

- The following logs had been taken from the other process
before the "autovacuum" process was started. The logs from
"INSERT" reported that XLogInsert's write and flush position
were newer than the values specified by the "autovacuum"
process.

2008-02-05 12:36:49 4764 LOG: 00000: INSERT @ 1/50C54080: prev 1/50C54048; xid 1060970: Transaction - commit: 2008-02-05 12:36:49
2008-02-05 12:36:49 4764 LOCATION: XLogInsert, xlog.c:864
2008-02-05 12:36:49 4764 LOG: 00000: xlog flush request 1/50C540B0; write 1/50C53ED0; flush 1/50C53ED0
2008-02-05 12:36:49 4764 LOCATION: XLogFlush, xlog.c:1596

- We looked for the position, 1/50A5B980, which is XLogInsert
stating position by the autovacuum process. And found that the value
was reported right after PostgreSQL was restarted.
Also, we found that the some of the logs are reported twice.

2008-02-05 12:28:40 2748 LOG: 00000: database system was interrupted at 2008-02-05 12:07:00
2008-02-05 12:28:40 2748 LOCATION: StartupXLOG, xlog.c:4373
2008-02-05 12:28:40 2716 LOG: 00000: database system was interrupted at 2008-02-05 12:07:00
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4373
2008-02-05 12:28:40 2716 LOG: 00000: checkpoint record is at 1/50A5B938
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4441
2008-02-05 12:28:40 2716 LOG: 00000: redo record is at 1/50A5B938; undo record is at 0/0; shutdown TRUE
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4468
2008-02-05 12:28:40 2716 LOG: 00000: next transaction ID: 1060753; next OID: 282105
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4471
2008-02-05 12:28:40 2716 LOG: 00000: next MultiXactId: 1; next MultiXactOffset: 0
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4474
2008-02-05 12:28:40 2716 LOG: 00000: database system was not properly shut down; automatic recovery in progress
2008-02-05 12:28:40 2716 LOCATION: StartupXLOG, xlog.c:4531
2008-02-05 12:28:41 2748 LOG: 00000: record with zero length at 1/50A5B980
2008-02-05 12:28:41 2748 LOCATION: ReadRecord, xlog.c:2763
2008-02-05 12:28:41 2748 LOG: 00000: redo is not required
2008-02-05 12:28:41 2748 LOCATION: StartupXLOG, xlog.c:4633
2008-02-05 12:28:41 2716 LOG: 00000: INSERT @ 1/50A5B980: prev 1/50A5B938; xid 0: XLOG - checkpoint: redo 1/50A5B980; undo 0/0; tli 1; xid 1060753; oid 282105; multi 1; offset 0; shutdown
2008-02-05 12:28:41 2716 LOCATION: XLogInsert, xlog.c:864
2008-02-05 12:28:41 2716 LOG: 00000: xlog flush request 1/50A5B9C8; write 1/50A5B980; flush 1/50A5B980
2008-02-05 12:28:41 2716 LOCATION: XLogFlush, xlog.c:1596
2008-02-05 12:28:42 2748 LOG: 00000: INSERT @ 1/50A5B980: prev 1/50A5B938; xid 0: XLOG - checkpoint: redo 1/50A5B980; undo 0/0; tli 1; xid 1060753; oid 282105; multi 1; offset 0; shutdown
2008-02-05 12:28:42 2748 LOCATION: XLogInsert, xlog.c:864
2008-02-05 12:28:42 2748 LOG: 00000: xlog flush request 1/50A5B9C8; write 1/50A5B980; flush 1/50A5B980
2008-02-05 12:28:42 2748 LOCATION: XLogFlush, xlog.c:1596
2008-02-05 12:28:42 2748 LOG: 00000: database system is ready
2008-02-05 12:28:42 2748 LOCATION: StartupXLOG, xlog.c:4820
2008-02-05 12:28:42 2716 LOG: 00000: database system is ready
2008-02-05 12:28:42 2716 LOCATION: StartupXLOG, xlog.c:4820

- Also, we found that autovacuum was executed more frequently than the value
set by autovacuum_naptime(= 60).

2008-02-05 12:30:46 4512 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:30:51 4552 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:31:47 5224 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:32:10 5620 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:32:47 6100 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:33:50 3340 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:33:50 4156 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:34:50 5196 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:34:50 5192 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:35:50 5064 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:35:50 5068 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:36:50 3848 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:36:50 4468 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:37:51 5344 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:37:51 5528 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:38:54 4536 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:39:12 4168 LOG: 00000: autovacuum: processing database "hoge"
2008-02-05 12:39:54 3304 LOG: 00000: autovacuum: processing database "hoge"

- It seems that the postmaster process is simultaneously executed and
breaking the WAL. Have anyone observed such a case?
We are starting PostgreSQL with "pg_ctl" command and if it does not
start up within a certain period, we are restarting it with "pg_ctl"
again.

---
Shigenori Horio 360 N. Sepulveda Blvd., Suite 1040
Twin Sun Inc. El Segundo, CA 90245
shorio(at)twinsun(dot)com Tel: (310) 524-1800 ext 5, Fax: (310) 524-1818

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2008-02-07 04:25:41 Re: PostgreSQL 8.4 development plan
Previous Message Mark Mielke 2008-02-07 01:33:08 Re: PostgreSQL 8.4 development plan