Re: Disaster!

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martín Marqués <martin(at)bugs(dot)unl(dot)edu(dot)ar>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Disaster!
Date: 2004-01-23 21:21:04
Message-ID: 4221.1074892864@sss.pgh.pa.us
Lists: pgsql-hackers

Martín Marqués <martin(at)bugs(dot)unl(dot)edu(dot)ar> writes:
> Tom, could you give a small insight on what occurred here, why those
> 8k of zeros fixed it, and what is a "WAL replay"?

I think what happened is that there was insufficient space to write out
a new page of the clog (transaction commit) file. This would result in
a database panic, which is fine --- you're not gonna get much done
anyway if you are down to zero free disk space. However, after Chris
freed up space, the system needed to replay the WAL from the last
checkpoint to ensure consistency. The WAL entries evidently included
references to transactions whose commit bits were in the unwritten page.
Now there would also be WAL entries recording those commits, so once the
replay was complete everything would be cool. But the clog access code
evidently got confused by being asked to read a page that didn't exist
in the file. I'm not sure yet how that sequence of events occurred,
which is why I asked Chris for a stack trace.
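The failure mode described above can be illustrated with a minimal sketch. This is not PostgreSQL's clog code: `read_page`, `append_zero_page`, `ShortReadError`, and `demo` are hypothetical names, and the 8192-byte page size is assumed from the "8k of zeros" mentioned in the question. It only shows that a fixed-size page read fails when the file ends before that page, and succeeds once a zero-filled page is appended.

```python
import os
import tempfile

PAGE_SIZE = 8192  # assumed page size, matching the "8k of zeros" in the thread


class ShortReadError(Exception):
    """Hypothetical error: the requested page is not fully present in the file."""


def read_page(path, pageno):
    """Read one fixed-size page; fail if the file ends before the page does."""
    with open(path, "rb") as f:
        f.seek(pageno * PAGE_SIZE)
        data = f.read(PAGE_SIZE)
    if len(data) != PAGE_SIZE:
        raise ShortReadError(
            f"page {pageno}: got {len(data)} of {PAGE_SIZE} bytes"
        )
    return data


def append_zero_page(path):
    """Extend the file by one page of zero bytes."""
    with open(path, "ab") as f:
        f.write(bytes(PAGE_SIZE))


def demo():
    """Return (read failed before the fix, page is all zeroes after the fix)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(bytes(PAGE_SIZE))  # only page 0 was ever written
        try:
            read_page(path, 1)  # page 1 was never written out
            failed_before = False
        except ShortReadError:
            failed_before = True
        append_zero_page(path)
        page = read_page(path, 1)
        return failed_before, page == bytes(PAGE_SIZE)
    finally:
        os.remove(path)
```

Calling `demo()` runs the whole sequence against a temporary file, so nothing here touches a real data directory.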

Adding a page of zeroes fixed it by eliminating the read error
condition. It was okay to do so because zeroes is the correct initial
state for a clog page (all transactions in it "still in progress").
After WAL replay, any completed transactions would be updated in the page.
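As a rough illustration of why an all-zero page is a valid starting state, here is a sketch that assumes the clog layout of two status bits per transaction, four transactions per byte, with the value 0 meaning "in progress". The constant and function names are illustrative and are not taken from the PostgreSQL source.

```python
PAGE_SIZE = 8192                              # assumed page size in bytes
BITS_PER_XACT = 2                             # assumed: two status bits per transaction
XACTS_PER_BYTE = 8 // BITS_PER_XACT           # 4 transactions per byte
XACTS_PER_PAGE = PAGE_SIZE * XACTS_PER_BYTE   # 32768 transactions per page

# Assumed status encoding; 0 is the value an all-zero page yields.
IN_PROGRESS, COMMITTED, ABORTED = 0x00, 0x01, 0x02


def get_status(page, slot):
    """Return the 2-bit status of transaction `slot` within a page."""
    byte = page[slot // XACTS_PER_BYTE]
    shift = (slot % XACTS_PER_BYTE) * BITS_PER_XACT
    return (byte >> shift) & 0x03


def set_status(page, slot, status):
    """Overwrite the 2-bit status of transaction `slot`, as replay would."""
    idx = slot // XACTS_PER_BYTE
    shift = (slot % XACTS_PER_BYTE) * BITS_PER_XACT
    cleared = page[idx] & ~(0x03 << shift) & 0xFF
    page[idx] = cleared | ((status & 0x03) << shift)


def fresh_page():
    """A newly added page: every transaction reads as in progress."""
    return bytearray(PAGE_SIZE)
```

With this layout, every slot of `fresh_page()` reads as `IN_PROGRESS` until a later `set_status` call records a commit or abort, which mirrors how replay fills in the completed transactions.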

regards, tom lane
