Re: Hard limit on WAL space used (because PANIC sucks)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2013-06-10 22:45:46
Message-ID: CAMkU=1wCsTvMt=XrqHroaRCynYJZBQKW6hqL948UAVk=mLkr5g@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 8, 2013 at 11:07 AM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:

>
> On 06/08/2013 07:36 AM, MauMau wrote:
>
>> 1. If the machine or postgres crashes while archive_command is copying a
>> WAL file, later archive recovery fails.
>> This is because cp leaves a file of less than 16MB in archive area, and
>> postgres refuses to start when it finds such a small archive WAL file.
>>
>
Should that be changed? If the file is 16MB but it turns to gibberish
after 3MB, recovery proceeds up to the gibberish. Given that, why should
it refuse to start if the file is only 3MB to start with?

>> The solution, which IIRC Tomas san told me here, is to do like "cp %p
>> /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f".
>>
>

This will overwrite /archive/dir/%f if it already exists, which is usually
recommended against. Although I don't know that it necessarily should be.
One common problem with archiving is for a network glitch to occur during
the archive command, so the archive command fails and tries again later.
But the later tries will always fail, because the target was created
before/during the glitch. Perhaps a more full-featured archive command
would detect and rename an existing file, rather than either overwriting it
or failing.
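
As a rough illustration of the "detect and rename" idea above, here is a minimal shell sketch. The function name archive_wal, the ".tmp.<pid>" and ".stale.<timestamp>" suffixes, and the three-argument calling convention are all assumptions made for this example; nothing here ships with PostgreSQL. It also assumes a `date` that supports `+%s`.

```shell
# Hypothetical helper: archive one WAL segment without silently
# overwriting or permanently failing on a leftover file.
archive_wal() {
    src=$1      # path of the segment to archive (what %p expands to)
    name=$2     # file name of the segment (what %f expands to)
    dir=$3      # archive directory
    dest="$dir/$name"
    tmp="$dest.tmp.$$"

    # Copy under a temporary name so an interrupted copy never appears
    # under the final name.
    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }

    if [ -e "$dest" ]; then
        if cmp -s "$tmp" "$dest"; then
            # A previous attempt already archived identical content.
            rm -f "$tmp"
            return 0
        fi
        # Contents differ: move the existing file aside rather than
        # overwriting it or failing forever.
        mv "$dest" "$dest.stale.$(date +%s)" || { rm -f "$tmp"; return 1; }
    fi

    # Rename into place; within one filesystem this is atomic.
    mv "$tmp" "$dest"
}
```

A wrapper script that ends with `archive_wal "$@"` could then be named in archive_command with %p, %f and the archive directory as its arguments. Whether renaming a mismatching file is the right policy is a judgment call; the sketch only shows that the retry-after-glitch case need not fail forever.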

If we have no compunction about overwriting the file, then I don't see a
reason to use the cp + mv combination. If the simple cp fails to copy the
entire file, it will be tried again until it succeeds.

> Well it seems to me that one of the problems here is we tell people to use
> copy. We should be telling people to use a command (or supply a command)
> that is smarter than that.
>

Actually we describe what archive_command needs to fulfill, and tell them
to use something that accomplishes that. The example with cp is explicitly
given as an example, not a recommendation.

>
>
>
>> 3. You cannot know the reason of archive_command failure (e.g. archive
>> area full) if you don't use PostgreSQL's server logging.
>> This is because archive_command failure is not logged in syslog/eventlog.
>>
>
> Wait, what? Is this true (someone else?)

It is kind of true. PostgreSQL does not automatically arrange for the
stderr of the archive_command to be sent to syslog. But archive_command
can do whatever it wants, including arranging for its own failure messages
to go to syslog.
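
For instance, a wrapper along the following lines could forward the copy's error output to syslog itself. This is only a sketch: the function name archive_with_syslog, the tag pg_archive, and the LOGGER override variable are hypothetical, and the -t and -p options are the ones commonly provided by the util-linux and BSD logger utilities rather than anything PostgreSQL defines.

```shell
# Hypothetical wrapper: copy a WAL segment and report failures to syslog.
archive_with_syslog() {
    src=$1      # what %p expands to
    dest=$2     # full target path, e.g. /archive/dir/%f

    # Capture stderr of cp; the assignment keeps cp's exit status.
    err=$(cp "$src" "$dest" 2>&1)
    status=$?

    if [ "$status" -ne 0 ]; then
        # LOGGER is an illustrative override (handy for testing);
        # by default the system's logger command is used.
        ${LOGGER:-logger} -t pg_archive -p user.err \
            "archiving $src failed (exit $status): $err"
    fi

    # Pass the original status back so the server retries on failure.
    return "$status"
}
```

The important part is that the wrapper still returns a nonzero status on failure, so the server keeps the segment and tries again later.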

Cheers,

Jeff
