Re: Expose checkpoint start/finish times into SQL.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Expose checkpoint start/finish times into SQL.
Date: 2008-04-04 06:36:35
Message-ID: 6413.1207290995@sss.pgh.pa.us
Lists: pgsql-patches

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> On Fri, 4 Apr 2008, Tom Lane wrote:
>> (And you still didn't tell me what the actual failure case was.)

> Database stops checkpointing. WAL files pile up. In the middle of
> backup, system finally dies, and when it starts recovery there's a bad
> record in the WAL files--which there are now thousands of to apply, and
> the bad one is 4 hours of replay in. Believe it or not, it goes downhill
> from there.

> It's what kicked off the first step that's the big mystery.

Indeed :-(. But given those observations, I'd still have about zero
faith in the usefulness of this patch. If the bgwriter is not able to
complete checkpoints, is it able to tell you the truth about what it's
doing?

The actual advice I'd give to a DBA faced with such a case is to
kill -ABRT the bgwriter and send the stack trace to -hackers.
That's not in the proposed patch though...
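
[Editor's sketch, not part of the original message.] The `kill -ABRT` step above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: that the background writer shows up in `ps` with the title "postgres: writer process", and that the script runs as a user allowed to signal it. The helper names `find_bgwriter_pid` and `abort_bgwriter` are hypothetical and not part of PostgreSQL or the proposed patch.

```python
import os
import signal
import subprocess
from typing import Optional

# Assumption: the background writer's process title contains this text.
# Adjust it to whatever `ps` actually shows on your system.
BGWRITER_TITLE = "postgres: writer process"


def find_bgwriter_pid(ps_output: str, title: str = BGWRITER_TITLE) -> Optional[int]:
    """Return the PID of the first `ps -eo pid,args` line whose command contains `title`."""
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        # Skip the header line, blank lines, and anything without a numeric PID.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        if title in args:
            return int(pid)
    return None


def abort_bgwriter() -> None:
    """Send SIGABRT to the background writer (equivalent to `kill -ABRT <pid>`)."""
    ps_output = subprocess.run(
        ["ps", "-eo", "pid,args"],
        capture_output=True, text=True, check=True,
    ).stdout
    pid = find_bgwriter_pid(ps_output)
    if pid is None:
        raise RuntimeError("no background writer process found")
    os.kill(pid, signal.SIGABRT)
```

Calling `abort_bgwriter()` only delivers the signal. Whether a core file is left behind to pull a stack trace from depends on the core-size limit the server was started with, so that is worth checking before relying on this.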

regards, tom lane
