"pg_ctl promote" exit status

Lists: pgsql-hackers
From: Dhruv Ahuja <dhruvahuja(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: "pg_ctl promote" exit status
Date: 2012-10-23 10:39:08
Message-ID: CANv0W23fX3Uhn8RhJbXCsToshmZZa=RVeUW+38r+yfSVOGSdOQ@mail.gmail.com

Hello

The "pg_ctl promote" command returns an exit code of 1 when the server
is not in standby mode, and the same exit code of 1 when the server
isn't started at all. For the time being, the only difference is the
error messages printed, which, FYI, are...

pg_ctl: cannot promote server; server is not in standby mode

...and...

pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
Is server running?

...respectively.

I am developing a clustering solution around luci and rgmanager
(on Red Hat EL 6) and, for the time being, am basing it on the string
output. Perhaps each distinct exit reason should have a unique exit
code, regardless of the logic and approach I take to solve this
problem?
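Until distinct exit codes exist, a wrapper like the one described above can only tell these two cases apart by matching pg_ctl's error text. A minimal Python sketch, assuming the 9.1 English messages quoted above are written to stderr; the helper names and the returned labels are hypothetical:

```python
import subprocess

# Message fragments as quoted in this thread (PostgreSQL 9.1, English
# messages). Other versions or a non-English lc_messages setting may
# print different text, which is why string matching is fragile.
NOT_STANDBY_MSG = "server is not in standby mode"
MISSING_PID_MSG = 'postmaster.pid" does not exist'


def classify_promote_failure(returncode: int, stderr: str) -> str:
    """Label a pg_ctl promote result (hypothetical helper)."""
    if returncode == 0:
        return "promoted"
    # Both failures below exit with status 1, so only the text differs.
    if NOT_STANDBY_MSG in stderr:
        return "not_standby"
    if MISSING_PID_MSG in stderr:
        return "not_running"
    return "unknown_error"


def promote(datadir: str) -> str:
    """Run pg_ctl promote and classify the outcome from its stderr."""
    result = subprocess.run(
        ["pg_ctl", "promote", "-D", datadir],
        capture_output=True,
        text=True,
    )
    return classify_promote_failure(result.returncode, result.stderr)
```

A unique exit code per reason would reduce this to a comparison on `result.returncode`.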

Thanks


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dhruv Ahuja <dhruvahuja(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2012-10-23 16:29:11
Message-ID: CA+TgmoZY5L-tWo66qL74=1xON_yGsiLpMyMvuv64zBR0fysDhw@mail.gmail.com

On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja(at)gmail(dot)com> wrote:
> The "pg_ctl promote" command returns an exit code of 1 when the server
> is not in standby mode, and the same exit code of 1 when the server
> isn't started at all. The only difference at the time being is the
> string output at the time, which FYI are...
>
> pg_ctl: cannot promote server; server is not in standby mode
>
> ...and...
>
> pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> Is server running?
>
> ...respectively.
>
> I am in the process of developing a clustering solution around luci
> and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> off the string output. Maybe each different exit reason should have a
> unique exit code, whatever my logic and approach to solving this
> problem be?

That doesn't seem like a bad idea. Got a patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Aaron W(dot) Swenson" <titanofold(at)gentoo(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-12 20:30:12
Message-ID: 20130112203010.GA2162@gengoff.gsmr.local

On Tue, Oct 23, 2012 at 12:29:11PM -0400, Robert Haas wrote:
> On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja(at)gmail(dot)com> wrote:
> > The "pg_ctl promote" command returns an exit code of 1 when the server
> > is not in standby mode, and the same exit code of 1 when the server
> > isn't started at all. The only difference at the time being is the
> > string output at the time, which FYI are...
> >
> > pg_ctl: cannot promote server; server is not in standby mode
> >
> > ...and...
> >
> > pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> > Is server running?
> >
> > ...respectively.
> >
> > I am in the process of developing a clustering solution around luci
> > and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> > off the string output. Maybe each different exit reason should have a
> > unique exit code, whatever my logic and approach to solving this
> > problem be?
>
> That doesn't seem like a bad idea. Got a patch?
>

The Linux Standard Base Core Specification 3.1 says this should return
'3'. [1]

[1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

--
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email : titanofold(at)gentoo(dot)org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0

Attachment: pg_ctl.c-exit_status.patch (text/plain, 1.8 KB)

From: Dhruv Ahuja <dhruvahuja(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-25 18:33:13
Message-ID: CANv0W209uM_j2_bGrFPO7mxMkh4RT_pq+wFUikbQRCn9vDA6rg@mail.gmail.com

May I propose the attached patch.

Points to note and possibly discuss:
(a) Only exit codes in do_* functions have been changed.
(b) The link to, and the version of, the LSB specification have been updated.
(c) A significant change is the exit code of do_stop() on stopping a
stopped server. Previous return is 1. Proposed return is 0. If this is
accepted, I would highly suggest a mention in the Release Notes.
(d) The exit code that raised this issue was the return of promoting a
promoted server. If promotion fails because the server is running but not
as standby, should that be considered a case of starting a started service,
or an application-specific failure? I could equally well opt for the
former, but have proposed the latter in the patch.

On 23 October 2012 17:29, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Oct 23, 2012 at 6:39 AM, Dhruv Ahuja <dhruvahuja(at)gmail(dot)com> wrote:
> > The "pg_ctl promote" command returns an exit code of 1 when the server
> > is not in standby mode, and the same exit code of 1 when the server
> > isn't started at all. The only difference at the time being is the
> > string output at the time, which FYI are...
> >
> > pg_ctl: cannot promote server; server is not in standby mode
> >
> > ...and...
> >
> > pg_ctl: PID file "/var/lib/pgsql/9.1/data/postmaster.pid" does not exist
> > Is server running?
> >
> > ...respectively.
> >
> > I am in the process of developing a clustering solution around luci
> > and rgmanager (in Red Hat EL 6) and for the time being, am basing it
> > off the string output. Maybe each different exit reason should have a
> > unique exit code, whatever my logic and approach to solving this
> > problem be?
>
> That doesn't seem like a bad idea. Got a patch?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: titanofold(at)gentoo(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-25 18:54:06
Message-ID: 5102D4CE.7000308@gmx.net

On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
> The Linux Standard Base Core Specification 3.1 says this should return
> '3'. [1]
>
> [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

The LSB spec doesn't say anything about a "promote" action.

And for the stop and reload actions that you tried to change, 3 is
"unimplemented".

There is an ongoing discussion about the exit status of the stop action
under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so
let's keep this item about the "promote" action.


From: "Aaron W(dot) Swenson" <titanofold(at)gentoo(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-26 21:44:24
Message-ID: 20130126214424.GA14960@gengoff.grandmasfridge.org

On Fri, Jan 25, 2013 at 01:54:06PM -0500, Peter Eisentraut wrote:
> On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
> > The Linux Standard Base Core Specification 3.1 says this should return
> > '3'. [1]
> >
> > [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>
> The LSB spec doesn't say anything about a "promote" action.
>
> And for the stop and reload actions that you tried to change, 3 is
> "unimplemented".
>
> There is an ongoing discussion about the exit status of the stop action
> under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so
> let's keep this item about the "promote" action.

You are right. Had I read a little further down, it seems that the
exit status should actually be 7.
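The move from 3 to 7 follows from the spec having two tables: one for the status action and one for every other action. A sketch encoding both as listed in LSB Core 3.1 (the dictionary and function names are made up for illustration). Looking up "promote" in the non-status table is itself an assumption, since LSB defines no promote action:

```python
# Exit codes from the LSB Core 3.1 "Init Script Actions" section.
# The status action has its own table, so the same number can mean
# different things depending on the action.
LSB_ACTION_CODES = {
    0: "success",
    1: "generic or unspecified error",
    2: "invalid or excess argument(s)",
    3: "unimplemented feature",
    4: "user had insufficient privilege",
    5: "program is not installed",
    6: "program is not configured",
    7: "program is not running",
}

LSB_STATUS_CODES = {
    0: "program is running or service is OK",
    1: "program is dead and /var/run pid file exists",
    2: "program is dead and /var/lock lock file exists",
    3: "program is not running",
    4: "program or service status is unknown",
}


def describe_lsb_code(action: str, code: int) -> str:
    """Look up the LSB meaning of an init-script exit code."""
    table = LSB_STATUS_CODES if action == "status" else LSB_ACTION_CODES
    if code in table:
        return table[code]
    # Codes beyond the defined ones are reserved in both tables.
    if code < 100:
        return "reserved for future LSB use"
    if code < 150:
        return "reserved for distribution use"
    if code < 200:
        return "reserved for application use"
    return "reserved"
```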

--
Mr. Aaron W. Swenson
Gentoo Linux Developer
Email : titanofold(at)gentoo(dot)org
GnuPG FP : 2C00 7719 4F85 FB07 A49C 0E31 5713 AA03 D1BB FDA0
GnuPG ID : D1BBFDA0


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-28 09:18:25
Message-ID: 51064261.3000806@vmware.com

On 26.01.2013 23:44, Aaron W. Swenson wrote:
> On Fri, Jan 25, 2013 at 01:54:06PM -0500, Peter Eisentraut wrote:
>> On 1/12/13 3:30 PM, Aaron W. Swenson wrote:
>>> The Linux Standard Base Core Specification 3.1 says this should return
>>> '3'. [1]
>>>
>>> [1] http://refspecs.freestandards.org/LSB_3.1.1/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>>
>> The LSB spec doesn't say anything about a "promote" action.
>>
>> And for the stop and reload actions that you tried to change, 3 is
>> "unimplemented".
>>
>> There is an ongoing discussion about the exit status of the stop action
>> under <https://commitfest.postgresql.org/action/patch_view?id=1045>, so
>> let's keep this item about the "promote" action.
>
> You are right. Had I read a little further down, it seems that the
> exit status should actually be 7.

Not sure if that LSB section is relevant anyway. It specifies the exit
codes for init scripts, but pg_ctl is not an init script.

- Heikki


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-28 14:28:43
Message-ID: 1359383323.73622.YahooMailNeo@web162905.mail.bf1.yahoo.com

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> Not sure if that LSB section is relevant anyway. It specifies the
> exit codes for init scripts, but pg_ctl is not an init script.

Except that when I went to the trouble of wrapping pg_ctl with an
init script which was thoroughly LSB compliant (according to my
reading) and offered it to the community, everyone said that rather
than have such a complicated script it would be better to change
pg_ctl to include that logic and exit with an LSB compliant exit
code.

-Kevin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-28 14:46:32
Message-ID: 51068F48.3020809@gmx.net

On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> You are right. Had I read a little further down, it seems that the
> exit status should actually be 7.

7 is OK for "not running", but what should we use when the server is not
in standby mode? Using the idempotent argument that we are discussing
for the stop action, promoting a server that is not a standby should be
a noop and exit successfully. Not sure if that is what we want, though.
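The two readings differ only in the already-primary case. A sketch, assuming a wrapper can first determine whether the server is down, a standby, or a primary (the ServerState enum and the function are hypothetical, not part of pg_ctl):

```python
from enum import Enum


class ServerState(Enum):
    # Hypothetical states a wrapper might detect before promoting.
    NOT_RUNNING = "not running"
    STANDBY = "standby"
    PRIMARY = "primary"


def promote_exit_status(state: ServerState, idempotent: bool) -> int:
    """Exit status for a promote request under one of two policies."""
    if state is ServerState.NOT_RUNNING:
        return 7  # LSB "program is not running"
    if state is ServerState.STANDBY:
        return 0  # the promotion request was issued
    # Already a primary: a no-op success under the idempotent reading,
    # otherwise a generic failure (1, as pg_ctl returned in this thread).
    return 0 if idempotent else 1
```

The idempotent policy suits a cluster manager that simply retries "make this node primary"; the strict policy surfaces a probable misconfiguration instead of hiding it.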


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "pg_ctl promote" exit status
Date: 2013-01-28 15:08:59
Message-ID: 15332.1359385739@sss.pgh.pa.us

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>> Not sure if that LSB section is relevant anyway. It specifies the
>> exit codes for init scripts, but pg_ctl is not an init script.

> Except that when I went to the trouble of wrapping pg_ctl with an
> init script which was thoroughly LSB compliant (according to my
> reading) and offered it to the community, everyone said that rather
> than have such a complicated script it would be better to change
> pg_ctl to include that logic and exit with an LSB compliant exit
> code.

Right. The start and stop actions are commonly used in initscripts
so it'd be handy if the exit codes for those didn't need to be
remapped.
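Much of that remapping comes from LSB's idempotency rule: running start on a service that is already running, or stop on one that is already stopped, must be reported as success. Per Dhruv's point (c), pg_ctl stop on a stopped server returned 1 at the time, so a wrapper had to compensate. A minimal sketch of that compensation; the function name and the was_running input are hypothetical:

```python
def lsb_exit_for(action: str, was_running: bool, pg_ctl_code: int) -> int:
    """Remap a pg_ctl start/stop exit code to an LSB init-script code.

    was_running is a hypothetical input: the server state as determined
    by the wrapper before pg_ctl ran (for example via pg_ctl status).
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action in this sketch: {action}")
    # LSB counts starting a running service, or stopping a stopped one,
    # as success, whatever exit code pg_ctl itself produced.
    if action == "start" and was_running:
        return 0
    if action == "stop" and not was_running:
        return 0
    # Otherwise pass success through and collapse failures to 1
    # ("generic or unspecified error").
    return 0 if pg_ctl_code == 0 else 1
```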

On the other hand, it's not at all clear to me that anyone would try
to put the promote action into an initscript, or that LSB would have
anything to say about the exit codes for such a nonstandard action
anyway.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-06-29 02:50:33
Message-ID: 20130629025033.GI13790@momjian.us

On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> > You are right. Had I read a little further down, it seems that the
> > exit status should actually be 7.
>
> 7 is OK for "not running", but what should we use when the server is not
> in standby mode? Using the idempotent argument that we are discussing
> for the stop action, promoting a server that is not a standby should be
> a noop and exit successfully. Not sure if that is what we want, though.

I looked at all the LSB return codes listed here and mapped them to
pg_ctl error situations:

https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

Patch attached. I did not touch the start/stop return codes.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment: pg_ctl.diff (text/x-diff, 8.8 KB)

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-07-01 14:11:23
Message-ID: 51D18E0B.10500@gmx.net

On 6/28/13 10:50 PM, Bruce Momjian wrote:
> On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
>> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
>>> You are right. Had I read a little further down, it seems that the
>>> exit status should actually be 7.
>>
>> 7 is OK for "not running", but what should we use when the server is not
>> in standby mode? Using the idempotent argument that we are discussing
>> for the stop action, promoting a server that is not a standby should be
>> a noop and exit successfully. Not sure if that is what we want, though.
>
> I looked at all the LSB return codes listed here and mapped them to
> pg_ctl error situations:
>
> https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
>
> Patch attached. I did not touch the start/stop return codes.

Approximately none of these changes seem correct to me. For example,
why is failing to open the PID file 6, or failing to start the server 7?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-07-01 16:47:46
Message-ID: 20130701164746.GB16348@momjian.us

On Mon, Jul 1, 2013 at 10:11:23AM -0400, Peter Eisentraut wrote:
> On 6/28/13 10:50 PM, Bruce Momjian wrote:
> > On Mon, Jan 28, 2013 at 09:46:32AM -0500, Peter Eisentraut wrote:
> >> On 1/26/13 4:44 PM, Aaron W. Swenson wrote:
> >>> You are right. Had I read a little further down, it seems that the
> >>> exit status should actually be 7.
> >>
> >> 7 is OK for "not running", but what should we use when the server is not
> >> in standby mode? Using the idempotent argument that we are discussing
> >> for the stop action, promoting a server that is not a standby should be
> >> a noop and exit successfully. Not sure if that is what we want, though.
> >
> > I looked at all the LSB return codes listed here and mapped them to
> > pg_ctl error situations:
> >
> > https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html
> >
> > Patch attached. I did not touch the start/stop return codes.
>
> Approximately none of these changes seem correct to me. For example,
> why is failing to open the PID file 6, or failing to start the server 7?

Well, according to that URL, we have:

6 program is not configured
7 program is not running

I just updated the pg_ctl.c comments to at least point to a valid URL
for this. I think we can just call this item closed because I am still
unclear if these return codes should be returned by pg_ctl or the
start/stop script.

Anyway, while I do think pg_ctl could pass a little more information
back about failure via its return code, I am unclear if LSB is the right
approach.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: "pg_ctl promote" exit status
Date: 2013-07-01 20:20:18
Message-ID: 51D1E482.5090602@gmx.net

On 7/1/13 12:47 PM, Bruce Momjian wrote:
>> Approximately none of these changes seem correct to me. For example,
>> why is failing to open the PID file 6, or failing to start the server 7?
>
> Well, according to that URL, we have:
>
> 6 program is not configured
> 7 program is not running

There is also

4 user had insufficient privilege
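Choosing among 1, 4, 6 and 7 depends on why the operation failed. For the PID-file case, one possible mapping is sketched below, assuming the wrapper can see the errno from the failed open (the constant and function names are hypothetical; this is not how pg_ctl behaves):

```python
import errno

# LSB non-status action codes discussed in this thread.
LSB_GENERIC_ERROR = 1
LSB_INSUFFICIENT_PRIVILEGE = 4
LSB_NOT_RUNNING = 7


def exit_code_for_pid_file_error(err: OSError) -> int:
    """Pick an exit code from the reason postmaster.pid could not be opened."""
    if err.errno in (errno.EACCES, errno.EPERM):
        return LSB_INSUFFICIENT_PRIVILEGE
    if err.errno == errno.ENOENT:
        # Simplification: a missing PID file may also mean a wrong -D path.
        return LSB_NOT_RUNNING
    return LSB_GENERIC_ERROR


def check_pid_file(path: str) -> int:
    """Return 0 if the PID file is readable, else a mapped exit code."""
    try:
        with open(path) as pid_file:
            pid_file.readline()
    except OSError as err:
        return exit_code_for_pid_file_error(err)
    return 0
```

The ENOENT branch shows the ambiguity raised above: a missing file could equally be read as "not configured" (6), which is part of why a single generic code is tempting.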

> I just updated the pg_ctl.c comments to at least point to a valid URL
> for this. I think we can just call this item closed because I am still
> unclear if these return codes should be returned by pg_ctl or the
> start/stop script.
>
> Anyway, while I do think pg_ctl could pass a little more information
> back about failure via its return code, I am unclear if LSB is the right
> approach.

Yeah, a lot of these things are unclear and not used in practice, so
it's probably better to stick to exit code 1, unless there is a clear
use case. The "status" case is different, because there the exit code
can be passed out by the init script directly.