Re: FATAL: bogus data in lock file "postmaster.pid": ""

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 13:18:48
Message-ID: 4F05A338.7000409@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

My laptop ran out of battery and turned itself off while I was just
starting up postmaster. After plugging in the charger and rebooting, I
got the following error when I tried to restart PostgreSQL:

FATAL: bogus data in lock file "postmaster.pid": ""

postmaster.pid file was present in the data directory, but had zero
length. Looking at the way the file is created and written, that can
happen if you crash after the file is created, but before it's
written/fsync'd (my laptop might have write-cache enabled, which would
make the window larger).

I was a bit surprised by that. That's probably not a big deal in
practice, but I wonder if there was some easy way to avoid that. First I
thought we could create the new postmaster.pid file with a temporary
name and rename it in place, but rename(2) will merrily overwrite any
existing file which is not what we want. We could use link(2), I guess.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 13:23:57
Message-ID: CABUevEx6tCsg0Upc+P8o+VH8qYvHO5WCWUSj0OQeSxHQR+GwCA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> My laptop ran out of battery and turned itself off while I was just starting
> up postmaster. After plugging in the charger and rebooting, I got the
> following error when I tried to restart PostgreSQL:
>
> FATAL:  bogus data in lock file "postmaster.pid": ""
>
> postmaster.pid file was present in the data directory, but had zero length.
> Looking at the way the file is created and written, that can happen if you
> crash after the file is created, but before it's written/fsync'd (my laptop
> might have write-cache enabled, which would make the window larger).
>
> I was a bit surprised by that. That's probably not a big deal in practice,
> but I wonder if there was some easy way to avoid that. First I thought we
> could create the new postmaster.pid file with a temporary name and rename it
> in place, but rename(2) will merrily overwrite any existing file which is
> not what we want. We could use link(2), I guess.

Is this really a problem big enough to spend even that much effort on?

Perhaps a special-case in the error message instead is enough?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 14:28:30
Message-ID: CA+TgmoZW4H7W5-t5_iiGPaQ+0kgtXDSn5zf+mAoxQDRLmLkJpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 5, 2012 at 8:23 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Thu, Jan 5, 2012 at 14:18, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> My laptop ran out of battery and turned itself off while I was just starting
>> up postmaster. After plugging in the charger and rebooting, I got the
>> following error when I tried to restart PostgreSQL:
>>
>> FATAL:  bogus data in lock file "postmaster.pid": ""
>>
>> postmaster.pid file was present in the data directory, but had zero length.
>> Looking at the way the file is created and written, that can happen if you
>> crash after the file is created, but before it's written/fsync'd (my laptop
>> might have write-cache enabled, which would make the window larger).
>>
>> I was a bit surprised by that. That's probably not a big deal in practice,
>> but I wonder if there was some easy way to avoid that. First I thought we
>> could create the new postmaster.pid file with a temporary name and rename it
>> in place, but rename(2) will merrily overwrite any existing file which is
>> not what we want. We could use link(2), I guess.
>
> Is this really a problem big enough to spend even that much effort on?
>
> Perhaps a special-case in the error message instead is enough?

On a laptop it's not a big deal, but it sort of sucks if your
production database fails to start automatically after an unexpected
power outage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 16:13:16
Message-ID: 15592.1325779996@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> My laptop ran out of battery and turned itself off while I was just
> starting up postmaster. After plugging in the charger and rebooting, I
> got the following error when I tried to restart PostgreSQL:

> FATAL: bogus data in lock file "postmaster.pid": ""

> postmaster.pid file was present in the data directory, but had zero
> length. Looking at the way the file is created and written, that can
> happen if you crash after the file is created, but before it's
> written/fsync'd (my laptop might have write-cache enabled, which would
> make the window larger).

> I was a bit surprised by that. That's probably not a big deal in
> practice, but I wonder if there was some easy way to avoid that. First I
> thought we could create the new postmaster.pid file with a temporary
> name and rename it in place, but rename(2) will merrily overwrite any
> existing file which is not what we want. We could use link(2), I guess.

I think link(2) would create race conditions of its own. I'd be
inclined to suggest that maybe we should just special-case a zero length
postmaster.pid file as meaning "okay to proceed". In general, garbage
data in postmaster.pid is something I'm happy to insist on manual
recovery from, but maybe we could safely make an exception for empty
files.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 20:06:31
Message-ID: CABUevExKSQ-7Rs2w9uPkuRd9vDEBWRf42YW6JhJndSD_A68EhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 5, 2012 at 17:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> My laptop ran out of battery and turned itself off while I was just
>> starting up postmaster. After plugging in the charger and rebooting, I
>> got the following error when I tried to restart PostgreSQL:
>
>> FATAL:  bogus data in lock file "postmaster.pid": ""
>
>> postmaster.pid file was present in the data directory, but had zero
>> length. Looking at the way the file is created and written, that can
>> happen if you crash after the file is created, but before it's
>> written/fsync'd (my laptop might have write-cache enabled, which would
>> make the window larger).
>
>> I was a bit surprised by that. That's probably not a big deal in
>> practice, but I wonder if there was some easy way to avoid that. First I
>> thought we could create the new postmaster.pid file with a temporary
>> name and rename it in place, but rename(2) will merrily overwrite any
>> existing file which is not what we want. We could use link(2), I guess.
>
> I think link(2) would create race conditions of its own.  I'd be
> inclined to suggest that maybe we should just special-case a zero length
> postmaster.pid file as meaning "okay to proceed".  In general, garbage

That's pretty much what I meant - but with a warning message.

> data in postmaster.pid is something I'm happy to insist on manual
> recovery from, but maybe we could safely make an exception for empty
> files.

+1.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-05 22:19:54
Message-ID: 21846.1325801994@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Thu, Jan 5, 2012 at 17:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I think link(2) would create race conditions of its own. I'd be
>> inclined to suggest that maybe we should just special-case a zero length
>> postmaster.pid file as meaning "okay to proceed". In general, garbage

> That's pretty much what I meant - but with a warning message.

Actually ... wait a minute. If we allow that, don't we create a race
condition between two postmasters starting at almost the same instant?
The second one could see the lock file when the first has created but
not yet filled it.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-06 11:13:25
Message-ID: CABUevEwhH22SHK2aCJVWM75UG0QHLL9D+mGPaqt=XtGYDagVkw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 5, 2012 at 23:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Thu, Jan 5, 2012 at 17:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I think link(2) would create race conditions of its own.  I'd be
>>> inclined to suggest that maybe we should just special-case a zero length
>>> postmaster.pid file as meaning "okay to proceed".  In general, garbage
>
>> That's pretty much what I meant - but with a warning message.
>
> Actually ... wait a minute.  If we allow that, don't we create a race
> condition between two postmasters starting at almost the same instant?
> The second one could see the lock file when the first has created but
> not yet filled it.

Good point, yeah, it should do that. But I still think it's rare
enough that just special-casing the error message should be enough...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Michael Beattie <mtbeedee(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-01-06 13:28:48
Message-ID: CAJ=Gs0Kjw9iEnuxOL6J_8NvR6SH6rrsSbL9Gy1t_-pFKNR8dtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> On Thu, Jan 5, 2012 at 23:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> >> On Thu, Jan 5, 2012 at 17:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> I think link(2) would create race conditions of its own. I'd be
> >>> inclined to suggest that maybe we should just special-case a zero
> length
> >>> postmaster.pid file as meaning "okay to proceed". In general, garbage
> >
> >> That's pretty much what I meant - but with a warning message.
> >
> > Actually ... wait a minute. If we allow that, don't we create a race
> > condition between two postmasters starting at almost the same instant?
> > The second one could see the lock file when the first has created but
> > not yet filled it.
>
> Good point, yeah, it should do that. But I still think it's rare
> enough that just special-casing the error message should be enough...
>
>
so just something that does like

stat(filename, &st);
size = st.st_size;
if (size == 0)
elog(ERROR, "lock file \"%s\" has 0 length.", filename);

somewhere in CreateLockFile in miscinit.c?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Michael Beattie <mtbeedee(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 04:02:13
Message-ID: 20120827040213.GF28780@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 6, 2012 at 08:28:48AM -0500, Michael Beattie wrote:
> On Fri, Jan 6, 2012 at 6:13 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> On Thu, Jan 5, 2012 at 23:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Magnus Hagander <magnus(at)hagander(dot)net> writes:
> >> On Thu, Jan 5, 2012 at 17:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> I think link(2) would create race conditions of its own. I'd be
> >>> inclined to suggest that maybe we should just special-case a zero
> length
> >>> postmaster.pid file as meaning "okay to proceed". In general, garbage
> >
> >> That's pretty much what I meant - but with a warning message.
> >
> > Actually ... wait a minute. If we allow that, don't we create a race
> > condition between two postmasters starting at almost the same instant?
> > The second one could see the lock file when the first has created but
> > not yet filled it.
>
> Good point, yeah, it should do that. But I still think it's rare
> enough that just special-casing the error message should be enough...
>
>
>
> so just something that does like
>
> stat(filename, &st);
> size = st.st_size;
> if (size == 0)
> elog(ERROR, "lock file \"%s\" has 0 length.", filename);
>
> somewhere in CreateLockFile in miscinit.c?

I have developed the attached patch to report a zero-length file, as you
suggested.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 534 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 20:29:46
Message-ID: 8764.1346099386@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I have developed the attached patch to report a zero-length file, as you
> suggested.

DIRECTORY_LOCK_FILE is entirely incorrect there.

Taking a step back, I don't think this message is much better than the
existing behavior of reporting "bogus data". Either way, it's not
obvious to typical users what the problem is or what to do about it.
If we're going to emit a special message I think it should be more user
friendly than this.

Perhaps something like:

FATAL: lock file "foo" is empty
HINT: This may mean that another postmaster was starting at the
same time. If not, remove the lock file and try again.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 22:02:06
Message-ID: CA+TgmoaKDUfMQT7_uSNCcU92+CktvUy8RqvpXt=Qk+J=CJiV=Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> I have developed the attached patch to report a zero-length file, as you
>> suggested.
>
> DIRECTORY_LOCK_FILE is entirely incorrect there.
>
> Taking a step back, I don't think this message is much better than the
> existing behavior of reporting "bogus data". Either way, it's not
> obvious to typical users what the problem is or what to do about it.
> If we're going to emit a special message I think it should be more user
> friendly than this.
>
> Perhaps something like:
>
> FATAL: lock file "foo" is empty
> HINT: This may mean that another postmaster was starting at the
> same time. If not, remove the lock file and try again.

The problem with this is that it gives the customer only one remedy,
which they will (if experience is any guide) try whether it is
actually correct to do so or not.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 22:16:53
Message-ID: 1346105371-sup-9216@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Robert Haas's message of lun ago 27 18:02:06 -0400 2012:
> On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> I have developed the attached patch to report a zero-length file, as you
> >> suggested.
> >
> > DIRECTORY_LOCK_FILE is entirely incorrect there.
> >
> > Taking a step back, I don't think this message is much better than the
> > existing behavior of reporting "bogus data". Either way, it's not
> > obvious to typical users what the problem is or what to do about it.
> > If we're going to emit a special message I think it should be more user
> > friendly than this.
> >
> > Perhaps something like:
> >
> > FATAL: lock file "foo" is empty
> > HINT: This may mean that another postmaster was starting at the
> > same time. If not, remove the lock file and try again.
>
> The problem with this is that it gives the customer only one remedy,
> which they will (if experience is any guide) try whether it is
> actually correct to do so or not.

How about having it sleep for a short while, then try again? I would
expect that it would cause the second postmaster to fail during the
second try, which is okay because the first one is then operational.
The problem, of course, is how long to sleep so that this doesn't fail
when load is high enough that the first postmaster still hasn't written
the file after the sleep.

Maybe

LOG: lock file "foo" is empty, sleeping to retry
-- sleep 100ms and recheck
LOG: lock file "foo" is empty, sleeping to retry
-- sleep, dunno, 1s, recheck
LOG: lock file "foo" is empty, sleeping to retry
-- sleep maybe 5s? recheck
FATAL: lock file "foo" is empty
HINT: Is another postmaster running on data directory "bar"?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 23:35:14
Message-ID: 12375.1346110514@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Perhaps something like:
>>
>> FATAL: lock file "foo" is empty
>> HINT: This may mean that another postmaster was starting at the
>> same time. If not, remove the lock file and try again.

> The problem with this is that it gives the customer only one remedy,
> which they will (if experience is any guide) try whether it is
> actually correct to do so or not.

Which other remedy(s) do you think the hint should suggest?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 23:39:35
Message-ID: 12468.1346110775@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> How about having it sleep for a short while, then try again?

I could get behind that, but I don't think the delay should be more than
100ms or so. It's important for the postmaster to acquire the lock (or
not) pretty quickly, or pg_ctl is going to get confused. If we keep it
short, we can also dispense with the log spam you were suggesting.

(Actually, I wonder if this type of scenario isn't going to confuse
pg_ctl already --- it might think the lockfile belongs to the postmaster
*it* started, not some pre-existing one. Does that matter?)

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 01:49:27
Message-ID: 20120828014927.GB6786@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > How about having it sleep for a short while, then try again?
>
> I could get behind that, but I don't think the delay should be more than
> 100ms or so. It's important for the postmaster to acquire the lock (or
> not) pretty quickly, or pg_ctl is going to get confused. If we keep it
> short, we can also dispense with the log spam you were suggesting.
>
> (Actually, I wonder if this type of scenario isn't going to confuse
> pg_ctl already --- it might think the lockfile belongs to the postmaster
> *it* started, not some pre-existing one. Does that matter?)

I took Alvaro's approach of a sleep. The file test was already in a
loop that went 100 times. Basically, if the lock file exists, this
postmaster isn't going to succeed, so I figured there is no reason to
rush in the testing. I gave it 5 tries with one second between
attempts. Either the file is being populated, or it is stale and empty.

I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
to exit out of the postmaster should be fine.

Patch attached.

FYI, I noticed we have a similar 5-second creation time requirement in
pg_ctl:

/*
* The postmaster should create postmaster.pid very soon after being
* started. If it's not there after we've waited 5 or more seconds,
* assume startup failed and give up waiting. (Note this covers both
* cases where the pidfile was never created, and where it was created
* and then removed during postmaster exit.) Also, if there *is* a
* file there but it appears stale, issue a suitable warning and give
* up waiting.
*/
if (i >= 5)

This is for the case where the file has an old pid, rather than it is
empty.

FYI, I fixed the filename problem Tom found.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 1.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 01:59:10
Message-ID: 15392.1346119150@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
>> I could get behind that, but I don't think the delay should be more than
>> 100ms or so.

> I took Alvaro's approach of a sleep. The file test was already in a
> loop that went 100 times. Basically, if the lock file exists, this
> postmaster isn't going to succeed, so I figured there is no reason to
> rush in the testing. I gave it 5 tries with one second between
> attempts. Either the file is being populated, or it is stale and empty.

How did "100ms" translate to 5 seconds?

> I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
> to exit out of the postmaster should be fine.

pg_ctl is not the only consideration here. In particular, there are a
lot of initscripts out there (all of Red Hat's, for instance) that don't
use pg_ctl and expect the postmaster to come up (or not) in a couple of
seconds.

I don't see a need for more than about one retry with 100ms delay.
There is no evidence that the case we're worried about has ever occurred
in the real world anyway, so slowing down error failures to make really
really really sure there's not a competing postmaster doesn't seem like
a good tradeoff.

I'm not terribly impressed with that errhint, either.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 02:17:43
Message-ID: 20120828021743.GC6786@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> >> I could get behind that, but I don't think the delay should be more than
> >> 100ms or so.
>
> > I took Alvaro's approach of a sleep. The file test was already in a
> > loop that went 100 times. Basically, if the lock file exists, this
> > postmaster isn't going to succeed, so I figured there is no reason to
> > rush in the testing. I gave it 5 tries with one second between
> > attempts. Either the file is being populated, or it is stale and empty.
>
> How did "100ms" translate to 5 seconds?

That was the "no need to rush, let's just be sure of what we report".

> > I checked pg_ctl and that has a default wait of 60 second, so 5 seconds
> > to exit out of the postmaster should be fine.
>
> pg_ctl is not the only consideration here. In particular, there are a
> lot of initscripts out there (all of Red Hat's, for instance) that don't
> use pg_ctl and expect the postmaster to come up (or not) in a couple of
> seconds.
>
> I don't see a need for more than about one retry with 100ms delay.
> There is no evidence that the case we're worried about has ever occurred
> in the real world anyway, so slowing down error failures to make really
> really really sure there's not a competing postmaster doesn't seem like
> a good tradeoff.
>
> I'm not terribly impressed with that errhint, either.

I am concerned at 100ms that we can't be sure if it is still being
created, and if we can't be sure, I am not sure there is much point in
trying to clarify the odd error message we omit.

FYI, here is what the code does now with a zero-length pid file, with my
patch:

$ postmaster
[ wait 5 seconds ]
FATAL: lock file "postmaster.pid" is empty
HINT: Empty lock file probably left from operating system crash during
database startup; file deletion suggested.
$ pg_ctl start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"
$ pg_ctl -w start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"

Seems pg_ctl would also need some cleanup if we change the error
message and/or timing.

I am thinking we should just change the error message in the postmaster
and pg_ctl to say the file is empty, and call it done (no hint message).
If we do want a hint, say that either the file is stale from a crash or
another postmaster is starting up, and let the user diagnose it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 19:06:15
Message-ID: 20120828190615.GH16116@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 10:17:43PM -0400, Bruce Momjian wrote:
> Seems pg_ctl would also need some cleanup if we change the error
> message and/or timing.
>
> I am thinking we should just change the error message in the postmaster
> and pg_ctl to say the file is empty, and call it done (no hint message).
> If we do want a hint, say that either the file is stale from a crash or
> another postmaster is starting up, and let the user diagnose it.

Updated patch attached which just reports the file as empty. I assume
we don't want the extra text output for pg_ctl like we do for the
backend.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 1.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 20:25:36
Message-ID: 8665.1346185536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Updated patch attached which just reports the file as empty. I assume
> we don't want the extra text output for pg_ctl like we do for the
> backend.

The backend side of this looks mostly sane to me (but drop the \n,
messages are not supposed to contain those). But the feof test proposed
for pg_ctl is no good: consider a file containing just, say, "-".
fscanf would eat the "-", then hit eof, and this would complain the file
is empty. Possibly checking for ftell(pidf) == 0 would do, though I'm
not sure whether it's portable to assume fscanf would eat a non-numeric
character before complaining.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 02:21:27
Message-ID: 20120829022127.GA26103@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Updated patch attached which just reports the file as empty. I assume
> > we don't want the extra text output for pg_ctl like we do for the
> > backend.
>
> The backend side of this looks mostly sane to me (but drop the \n,
> messages are not supposed to contain those). But the feof test proposed

Removed. I thought we needed to add \n so that strings >80 would wrap
properly. How do we handle this?

> for pg_ctl is no good: consider a file containing just, say, "-".
> fscanf would eat the "-", then hit eof, and this would complain the file
> is empty. Possibly checking for ftell(pidf) == 0 would do, though I'm
> not sure whether it's portable to assume fscanf would eat a non-numeric
> character before complaining.

ftell() seems to work fine when combined with feof(), so I used that in
the attached patch. ftell() alone remains at zero if the file contains
"A", so feof() is also needed.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 1.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 02:28:55
Message-ID: 24583.1346207335@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
>> The backend side of this looks mostly sane to me (but drop the \n,
>> messages are not supposed to contain those). But the feof test proposed

> Removed. I thought we needed to add \n so that strings >80 would wrap
> properly. How do we handle this?

We don't. Per the message style guidelines, it's the responsibility of
a client frontend to line-wrap such text if it feels the need to. The
backend has no business assuming that 80 characters (or any other
number) is where to wrap.

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 04:24:26
Message-ID: 1346214040-sup-2010@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
> On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > Updated patch attached which just reports the file as empty. I assume
> > > we don't want the extra text output for pg_ctl like we do for the
> > > backend.
> >
> > The backend side of this looks mostly sane to me (but drop the \n,
> > messages are not supposed to contain those). But the feof test proposed
>
> Removed.

It's a pretty strange line wrap you got in this version of the patch.
Normally we just let the string run past the 78 char limit, without
cutting it in any way. And moving the start of the string to the line
following "errhint(" looks very odd to me.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 04:52:25
Message-ID: 20120829045225.GC26103@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of mar ago 28 22:21:27 -0400 2012:
> > On Tue, Aug 28, 2012 at 04:25:36PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > > Updated patch attached which just reports the file as empty. I assume
> > > > we don't want the extra text output for pg_ctl like we do for the
> > > > backend.
> > >
> > > The backend side of this looks mostly sane to me (but drop the \n,
> > > messages are not supposed to contain those). But the feof test proposed
> >
> > Removed.
>
> It's a pretty strange line wrap you got in this version of the patch.
> Normally we just let the string run past the 78 char limit, without
> cutting it in any way. And moving the start of the string to the line
> following "errhint(" looks very odd to me.

OK, updated patch attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 1.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 04:56:26
Message-ID: 27691.1346216186@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
>> It's a pretty strange line wrap you got in this version of the patch.
>> Normally we just let the string run past the 78 char limit, without
>> cutting it in any way. And moving the start of the string to the line
>> following "errhint(" looks very odd to me.

> OK, updated patch attached.

I agree with Alvaro's complaint that moving the whole string literal to
the next line isn't conformant to our usual coding style. Definitely
nitpicky, but why would you do it like that?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 12:51:40
Message-ID: 20120829125140.GD26103@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> >> It's a pretty strange line wrap you got in this version of the patch.
> >> Normally we just let the string run past the 78 char limit, without
> >> cutting it in any way. And moving the start of the string to the line
> >> following "errhint(" looks very odd to me.
>
> > OK, updated patch attached.
>
> I agree with Alvaro's complaint that moving the whole string literal to
> the next line isn't conformant to our usual coding style. Definitely
> nitpicky, but why would you do it like that?

I remember now why I added "\n". I am used to writing pg_upgrade output
strings, which are obviously not sent to log files. Seems I forgot that
distinction. As far as moving the string to the next line, I was trying
to keep the line from getting too long.

Updated patch has everyone on the same line. I am fine with nitpicky.
Frankly, I have applied so many patches in the past few weeks, I am glad
someone is watching.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pid.diff text/x-diff 1.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-29 21:05:37
Message-ID: 20120829210537.GB8753@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Applied.

---------------------------------------------------------------------------

On Wed, Aug 29, 2012 at 08:51:40AM -0400, Bruce Momjian wrote:
> On Wed, Aug 29, 2012 at 12:56:26AM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > > On Wed, Aug 29, 2012 at 12:24:26AM -0400, Alvaro Herrera wrote:
> > >> It's a pretty strange line wrap you got in this version of the patch.
> > >> Normally we just let the string run past the 78 char limit, without
> > >> cutting it in any way. And moving the start of the string to the line
> > >> following "errhint(" looks very odd to me.
> >
> > > OK, updated patch attached.
> >
> > I agree with Alvaro's complaint that moving the whole string literal to
> > the next line isn't conformant to our usual coding style. Definitely
> > nitpicky, but why would you do it like that?
>
> I remember now why I added "\n". I am used to writing pg_upgrade output
> strings, which are obviously not sent to log files. Seems I forgot that
> distinction. As far as moving the string to the next line, I was trying
> to keep the line from getting too long.
>
> Updated patch has everyone on the same line. I am fine with nitpicky.
> Frankly, I have applied so many patches in the past few weeks, I am glad
> someone is watching.
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + It's impossible for everything to be true. +

> diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
> new file mode 100644
> index 775d71f..9a0f92c
> *** a/src/backend/utils/init/miscinit.c
> --- b/src/backend/utils/init/miscinit.c
> *************** CreateLockFile(const char *filename, boo
> *** 766,771 ****
> --- 766,779 ----
> filename)));
> close(fd);
>
> + if (len == 0)
> + {
> + ereport(FATAL,
> + (errcode(ERRCODE_LOCK_FILE_EXISTS),
> + errmsg("lock file \"%s\" is empty", filename),
> + errhint("Either another server is starting, or the lock file is the remnant of a previous server startup crash.")));
> + }
> +
> buffer[len] = '\0';
> encoded_pid = atoi(buffer);
>
> diff --git a/src/bin/pg_ctl/pg_ctl.c b/src/bin/pg_ctl/pg_ctl.c
> new file mode 100644
> index af8d8b2..81ba39e
> *** a/src/bin/pg_ctl/pg_ctl.c
> --- b/src/bin/pg_ctl/pg_ctl.c
> *************** get_pgpid(void)
> *** 292,299 ****
> }
> if (fscanf(pidf, "%ld", &pid) != 1)
> {
> ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
> ! progname, pid_file);
> exit(1);
> }
> fclose(pidf);
> --- 292,304 ----
> }
> if (fscanf(pidf, "%ld", &pid) != 1)
> {
> ! /* Is the file empty? */
> ! if (ftell(pidf) == 0 && feof(pidf))
> ! write_stderr(_("%s: the PID file \"%s\" is empty\n"),
> ! progname, pid_file);
> ! else
> ! write_stderr(_("%s: invalid data in PID file \"%s\"\n"),
> ! progname, pid_file);
> exit(1);
> }
> fclose(pidf);

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +