Re: Hard limit on WAL space used (because PANIC sucks)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2013-06-07 04:30:51
Message-ID: CAMkU=1wR3R_P2s6=J6qmL+V6ox62UBSAWscyycii-soU6YfHMQ@mail.gmail.com
Lists: pgsql-hackers

On Thursday, June 6, 2013, Josh Berkus wrote:

> Let's talk failure cases.
>
> There's actually three potential failure cases here:
>
> - One Volume: WAL is on the same volume as PGDATA, and that volume is
> completely out of space.
>
> - XLog Partition: WAL is on its own partition/volume, and fills it up.
>
> - Archiving: archiving is failing or too slow, causing the disk to fill
> up with waiting log segments.
>
> I'll argue that these three cases need to be dealt with in three
> different ways, and no single solution is going to work for all three.
>
> Archiving
> ---------
>
> In some ways, this is the simplest case. Really, we just need a way to
> know when the available WAL space has become 90% full, and abort
> archiving at that stage. Once we stop attempting to archive, we can
> clean up the unneeded log segments.
>

I would oppose that as the solution, whether as unconditional behavior or as
a configurable option with it as the default. Those segments are not unneeded.
I need them. That is why I set up archiving in the first place. If you
need to shut down the database rather than violate my established retention
policy, then shut down the database.

> What we need is a better way for the DBA to find out that archiving is
> falling behind when it first starts to fall behind. Tailing the log and
> examining the rather cryptic error messages we give out isn't very
> effective.
>

The archive command can be made a shell script (or, for that matter, a compiled
program) which can do anything it wants upon failure, including emailing
people. Of course maybe whatever causes the archive to fail will also
cause the delivery of the message to fail, but I don't see a real solution
to this that doesn't start down an infinite regress. If it is not failing
outright, but merely falling behind, then I don't really know how to go
about detecting that, either in archive_command, or through tailing the
PostgreSQL log. I guess archive_command, each time it is invoked, could
count the files in the pg_xlog directory and warn if it thinks the number
is unreasonable.
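
A minimal sketch of that last idea, in Python rather than shell. The function
names, the threshold of 100 files, and the plain copy into an archive
directory are all assumptions for illustration, not an existing tool. It
counts only files whose names look like WAL segments (24 hex digits) and warns
before attempting the copy.

```python
import os
import re
import shutil
import sys

# WAL segment files have 24-hex-digit names; anything else in pg_xlog
# (for example the archive_status directory) is ignored.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def warn_stderr(message):
    """Default warning sink. A real script might send email instead."""
    print(message, file=sys.stderr)


def count_wal_segments(xlog_dir):
    """Return the number of WAL segment files in xlog_dir."""
    return sum(1 for name in os.listdir(xlog_dir) if WAL_NAME.match(name))


def archive(src_path, archive_dir, xlog_dir, warn_threshold=100, warn=warn_stderr):
    """Copy one segment into archive_dir, warning if pg_xlog looks too full.

    Returns 0 on success and 1 on failure, suitable as a process exit status.
    The threshold is a hypothetical value; pick one that fits the installation.
    """
    count = count_wal_segments(xlog_dir)
    if count > warn_threshold:
        warn("WARNING: %d WAL segments in %s (threshold %d)"
             % (count, xlog_dir, warn_threshold))

    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        # Never overwrite a segment that has already been archived.
        warn("ERROR: %s already exists in the archive" % dest)
        return 1
    try:
        shutil.copy2(src_path, dest)
    except OSError as exc:
        warn("ERROR: archiving %s failed: %s" % (src_path, exc))
        return 1
    return 0
```

A small command-line wrapper that passes the %p path from archive_command to
archive() and exits with its return value would be needed to wire this in; a
non-zero exit status tells PostgreSQL the segment was not archived, so it is
kept and retried.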

>
> XLog Partition
> --------------
>
> As Heikki pointed out, a full dedicated WAL drive is hard to fix once
> it gets full, since there's nothing you can safely delete to clear
> space, even enough for a checkpoint record.
>

Although the DBA probably wouldn't know it from reading the manual, it is
almost always safe to delete the oldest WAL file, as PostgreSQL keeps two
complete checkpoints' worth of WAL around. Copy it to a different partition
first, just in case something goes wrong; if WAL is on its own partition, it
is hard to imagine you can't scrounge up 16MB on a different one. I think the
only reason you would not be able to recover after removing the oldest file
is if the controldata file is damaged such that the most recent checkpoint
record cannot be found, so that recovery has to fall back to the previous
one. Or at least, this is my understanding.
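
A rough sketch of the copy-then-delete step, under the same caveats as above.
It assumes a single timeline, so that the segment with the lowest 24-hex-digit
name is the oldest one; the function names and the rescue directory are
hypothetical. The file is removed from the WAL directory only after the copy
exists and has the same size as the original.

```python
import os
import re
import shutil

# WAL segment files have 24-hex-digit names.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def oldest_segment(xlog_dir):
    """Return the lowest-named WAL segment in xlog_dir, or None if there is none.

    Assumption: a single timeline, so name order matches age order.
    """
    names = sorted(n for n in os.listdir(xlog_dir) if WAL_NAME.match(n))
    return names[0] if names else None


def move_oldest_segment(xlog_dir, rescue_dir):
    """Copy the oldest segment to rescue_dir, verify the copy, then delete it.

    Returns the path of the rescued copy, or None if no segment was found.
    """
    name = oldest_segment(xlog_dir)
    if name is None:
        return None
    src = os.path.join(xlog_dir, name)
    dest = os.path.join(rescue_dir, name)
    shutil.copy2(src, dest)
    if os.path.getsize(dest) != os.path.getsize(src):
        raise OSError("copy of %s is incomplete; original left in place" % name)
    os.remove(src)
    return dest
```

If recovery turns out to need the file after all, the rescued copy can be put
back by hand.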

>
> On the other hand, it should be easy to prevent full status; we could
> simply force a non-spread checkpoint whenever the available WAL space
> gets 90% full. We'd also probably want to be prepared to switch to a
> read-only mode if we get full enough that there's only room for the
> checkpoint records.
>

I think that that last sentence could also be applied without modification
to the "one volume" case as well.

So what would that look like? Before accepting a (non-checkpoint) WAL
Insert that fills up the current segment to a high enough level that a
checkpoint record will no longer fit, it must first verify that a recycled
file exists, or if not it must successfully init a new file.

If that init fails, then it must do what? Signal for a checkpoint, release
its locks, and then ERROR out? That would be better than a PANIC, but can
it do better? Enter a retry loop so that, once the checkpoint has finished
and assuming it has freed up enough WAL files for recycling/removal, it
can try the original WAL Insert again?
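
To make the question concrete, here is a toy model of that control flow in
Python. It is not PostgreSQL code: the class, its fields, and the byte
counts are invented for illustration, locking is left out entirely, and the
toy checkpoint simply recycles every old segment. An insert that would eat
into the space reserved for a checkpoint record is accepted only once a spare
segment is known to exist; otherwise it requests a checkpoint and retries, and
raises an error (standing in for ERROR rather than PANIC) if that does not
help.

```python
class WalSpaceError(Exception):
    """Stands in for an ERROR raised instead of a PANIC."""


class ToyWal:
    def __init__(self, segment_size, disk_segments, checkpoint_reserve):
        self.segment_size = segment_size
        self.disk_segments = disk_segments            # segment files the volume can hold
        self.checkpoint_reserve = checkpoint_reserve  # room kept for a checkpoint record
        self.used = 0             # bytes used in the current segment
        self.old_segments = 0     # filled segments not yet recycled
        self.spare_segments = 0   # recycled or freshly initialized, ready for use
        self.checkpoints = 0

    def _ensure_spare(self):
        """Verify a spare segment exists, or try to init one. False if disk is full."""
        if self.spare_segments > 0:
            return True
        if 1 + self.old_segments + self.spare_segments < self.disk_segments:
            self.spare_segments += 1
            return True
        return False

    def checkpoint(self):
        """Toy checkpoint: every old segment becomes a recycled spare."""
        self.checkpoints += 1
        self.spare_segments += self.old_segments
        self.old_segments = 0

    def _append(self, nbytes):
        self.used += nbytes
        if self.used > self.segment_size:
            # Spill into the spare segment that was guaranteed beforehand.
            self.used -= self.segment_size
            self.old_segments += 1
            self.spare_segments -= 1

    def insert(self, nbytes, max_retries=1):
        if nbytes > self.segment_size:
            raise ValueError("record larger than a segment is not modeled")
        retries = 0
        while True:
            limit = self.segment_size - self.checkpoint_reserve
            crosses_reserve = self.used + nbytes > limit
            if not crosses_reserve or self._ensure_spare():
                self._append(nbytes)
                return
            if retries >= max_retries:
                raise WalSpaceError("no WAL space even after a checkpoint")
            retries += 1
            self.checkpoint()  # a real backend would release its locks and wait here
```

With ToyWal(segment_size=100, disk_segments=2, checkpoint_reserve=10),
inserting 50, 45, 20 and then 80 bytes makes the last insert hit a full disk,
trigger one checkpoint, and then succeed on retry. With disk_segments=1 the
checkpoint frees nothing and the insert ends in WalSpaceError.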

Cheers,

Jeff
