Re: [PATCH] Report exit code from external recovery commands properly

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PATCH] Report exit code from external recovery commands properly
Date: 2013-11-13 11:42:48
Message-ID: 1384342968.15453.2.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

When an external recovery command such as restore_command or
archive_cleanup_command fails, it just reports "return code 34567" or
something, but we have facilities to do decode this properly, so use
them.

Attachment Content-Type Size
0001-Report-exit-code-from-external-recovery-commands-pro.patch text/x-patch 1.8 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Report exit code from external recovery commands properly
Date: 2013-11-14 03:14:43
Message-ID: CAM3SWZQapjCsOzVphJGbz4t3s0LbWbzeiE95K+1E25W_3AyWgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 13, 2013 at 3:42 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> When an external recovery command such as restore_command or
> archive_cleanup_command fails, it just reports "return code 34567" or
> something, but we have facilities to do decode this properly, so use
> them.

I think this is a very good idea, but you should go a bit further:
document the special relationship restore_command has to special
return codes. Currently, the documentation says:

"It is important that the archive command return zero exit status if
and only if it succeeded. Upon getting a zero result, PostgreSQL will
assume that the WAL segment file has been successfully archived, and
will remove or recycle it. However, a nonzero status tells PostgreSQL
that the file was not archived; it will try again periodically until
it succeeds."

Yes, this concerns archive_command (where return code values that are
non-zero *are* never distinguished), but nothing much is said about
the return code of restore_command specifically anywhere else, so it's
implied that it's exactly inverse to archive_command. In reality, some
special return codes have a significance to restore_command: they make
recovery abort, because they're taking as proxies for various failures
that it isn't sensible to continue recovery in the event of.

We're talking about the difference between recovery aborting, and
recovery having conceptually "reached the end of the WAL stream", so
it's very surprising that this isn't documented currently.

--
Peter Geoghegan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Report exit code from external recovery commands properly
Date: 2013-11-24 13:11:58
Message-ID: 1385298718.31144.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2013-11-13 at 19:14 -0800, Peter Geoghegan wrote:
> I think this is a very good idea, but you should go a bit further:
> document the special relationship restore_command has to special
> return codes.

How about this?

Attachment Content-Type Size
recovery-command-exit-doc.patch text/x-patch 3.3 KB

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Report exit code from external recovery commands properly
Date: 2013-11-30 23:56:00
Message-ID: CAM3SWZQ-w7uGiF5-6gvvm3=4-s6rpPkJRJXiZVaR6d_VX3Zedg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 24, 2013 at 5:11 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> How about this?

Hmm. You say:

+ If the command returns a nonzero exit status then a warning log
+ message will be written. An exception is that if the command was
+ terminated by a signal or an error by the shell (such as command not
+ found), a fatal error will be raised.

But in the case of the archiver, in contrast to the startup process,
this isn't really a big deal. It'll just pick up where it left off.
Whereas the reaper code shuts down the system if this happens in the
startup process. In my opinion that's a distinction that bears
emphasizing.

--
Peter Geoghegan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Report exit code from external recovery commands properly
Date: 2013-12-03 02:23:39
Message-ID: 1386037419.26167.1.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2013-11-30 at 15:56 -0800, Peter Geoghegan wrote:
> On Sun, Nov 24, 2013 at 5:11 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > How about this?
>
> Hmm. You say:
>
> + If the command returns a nonzero exit status then a warning log
> + message will be written. An exception is that if the command was
> + terminated by a signal or an error by the shell (such as command not
> + found), a fatal error will be raised.
>
> But in the case of the archiver, in contrast to the startup process,
> this isn't really a big deal. It'll just pick up where it left off.
> Whereas the reaper code shuts down the system if this happens in the
> startup process. In my opinion that's a distinction that bears
> emphasizing.

That snippet you quote is about archive_cleanup_command. My patch
doesn't touch archive_command at all.

The current documentation about archive_command contains

<para>
It is important that the archive command return zero exit status if and
only if it succeeds. Upon getting a zero result,
<productname>PostgreSQL</> will assume that the file has been
successfully archived, and will remove or recycle it. However, a nonzero
status tells <productname>PostgreSQL</> that the file was not archived;
it will try again periodically until it succeeds.
</para>

which I think addresses your point.


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Report exit code from external recovery commands properly
Date: 2013-12-03 02:46:43
Message-ID: CAM3SWZRFdNrAgtuR9XHmmm-1DPxgKHJxu-o_g8XRaZgHzV5GkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 2, 2013 at 6:23 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> which I think addresses your point.

Fair enough. Marked "ready for committer".

--
Peter Geoghegan