Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve

Lists: pgsql-bugspgsql-hackers
From: giomac(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-18 22:19:48
Message-ID: E1TwKHs-0005yp-39@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 7815
Logged by: George Machitidze
Email address: giomac(at)gmail(dot)com
PostgreSQL version: 9.2.2
Operating system: Fedora 18 Linux
Description:

https://bugzilla.redhat.com/show_bug.cgi?id=896161
Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails
with invalid message "There seems to be a postmaster servicing the old
cluster". Looks like pg_upgrade is checking pid file too early without
waiting for master process to exit:

open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: giomac(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 03:32:57
Message-ID: 20130119033257.GC2857@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac(at)gmail(dot)com wrote:
> The following bug has been logged on the website:
>
> Bug reference: 7815
> Logged by: George Machitidze
> Email address: giomac(at)gmail(dot)com
> PostgreSQL version: 9.2.2
> Operating system: Fedora 18 Linux
> Description:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails
> with invalid message "There seems to be a postmaster servicing the old
> cluster". Looks like pg_upgrade is checking pid file too early without
> waiting for master process to exit:
>
> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5

How are you shutting down the postmaster? Are you use pg_ctl -w stop?
If not, you have to wait for the server to actually shut down before
starting pg_upgrade. pg_upgrade is not going to do that waiting.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 05:02:31
Message-ID: 14155.1358571751@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac(at)gmail(dot)com wrote:
>> https://bugzilla.redhat.com/show_bug.cgi?id=896161
>> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails
>> with invalid message "There seems to be a postmaster servicing the old
>> cluster". Looks like pg_upgrade is checking pid file too early without
>> waiting for master process to exit:
>>
>> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5

> How are you shutting down the postmaster? Are you use pg_ctl -w stop?
> If not, you have to wait for the server to actually shut down before
> starting pg_upgrade. pg_upgrade is not going to do that waiting.

The backstory on this is at the cited Red Hat bug ... apparently the OP
decided I was clueless and he needed to consult some real authorities.

The existing pg_control clearly says that the cluster was shut down,
so it's not clear why there's still a postmaster.pid file there.
There's some debugging to be done yet about how that got to be that way.
(AFAICS the RPM upgrade process ought to shut down the old postmaster
before installing a new one; but somehow that went wrong, or else a
doppelganger postmaster.pid rose from the dead. Anyway, that's not a
matter for this list because it involves Red Hat upgrade processes, not
anything supplied by the community.)

In the meantime, I was wondering a bit why pg_upgrade looks at the
postmaster.pid file at all. Generally we recommend that startup scripts
*not* look at the lock file but just try to start a postmaster, and
leave it to the postmaster to decide if there's a valid lockfile
present. Is it really appropriate for pg_upgrade to do this
differently? I think the complained-of case would have gone through
cleanly if that error check weren't there, or in any case the postmaster
would have done a better job of checking for a conflicting postmaster.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 05:15:49
Message-ID: 20130119051549.GD2857@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Fri, Jan 18, 2013 at 10:19:48PM +0000, giomac(at)gmail(dot)com wrote:
> >> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> >> Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails
> >> with invalid message "There seems to be a postmaster servicing the old
> >> cluster". Looks like pg_upgrade is checking pid file too early without
> >> waiting for master process to exit:
> >>
> >> open("/var/lib/pgsql/data-old/postmaster.pid", O_RDONLY) = 5
>
> > How are you shutting down the postmaster? Are you use pg_ctl -w stop?
> > If not, you have to wait for the server to actually shut down before
> > starting pg_upgrade. pg_upgrade is not going to do that waiting.
>
> The backstory on this is at the cited Red Hat bug ... apparently the OP
> decided I was clueless and he needed to consult some real authorities.

Yes, it was clear there was some backstory in reading that thread.

> The existing pg_control clearly says that the cluster was shut down,
> so it's not clear why there's still a postmaster.pid file there.
> There's some debugging to be done yet about how that got to be that way.
> (AFAICS the RPM upgrade process ought to shut down the old postmaster
> before installing a new one; but somehow that went wrong, or else a
> doppelganger postmaster.pid rose from the dead. Anyway, that's not a
> matter for this list because it involves Red Hat upgrade processes, not
> anything supplied by the community.)
>
> In the meantime, I was wondering a bit why pg_upgrade looks at the
> postmaster.pid file at all. Generally we recommend that startup scripts
> *not* look at the lock file but just try to start a postmaster, and
> leave it to the postmaster to decide if there's a valid lockfile
> present. Is it really appropriate for pg_upgrade to do this
> differently? I think the complained-of case would have gone through
> cleanly if that error check weren't there, or in any case the postmaster
> would have done a better job of checking for a conflicting postmaster.

The reason we check for postmaster.pid is so we can give the user a clue
about which postmaster is running. We want to make sure everything is
super-clean before we do anything. What we could do is to first try to
start each cluster, and then fail if the start fails, but the start
could fail for all sorts of reasons so it doesn't really seems like a
win.

Also, we don't want to start on a non-clean shutdown, so the missing pid
file tells us it was clean.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 05:47:03
Message-ID: 14955.1358574423@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
>> In the meantime, I was wondering a bit why pg_upgrade looks at the
>> postmaster.pid file at all.

> The reason we check for postmaster.pid is so we can give the user a clue
> about which postmaster is running.

[ scratches head... ] I failed to detect any such clue in the error
message it prints. Had you printed the PID from the file, or even
better looked to see if that process was actually still alive, this
argument would be reasonable. But pg_upgrade does neither of those,
whereas if it had started a postmaster the postmaster would have done
both of those things.

> Also, we don't want to start on a non-clean shutdown, so the missing pid
> file tells us it was clean.

I agree that super paranoia is not unreasonable in pg_upgrade. But it
would be useful to print something similar to what the backend prints,
about checking whether PID N is still there and manually removing the
lock file if not. Or (ahem) you could let the existing backend-side
logic do that for you, rather than reimplementing that logic badly.

Meanwhile I still have to figure out how come the postmaster.pid file
is still there in the OP's case ...

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 15:02:42
Message-ID: 20130119150242.GE2857@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jan 19, 2013 at 12:47:03AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Sat, Jan 19, 2013 at 12:02:31AM -0500, Tom Lane wrote:
> >> In the meantime, I was wondering a bit why pg_upgrade looks at the
> >> postmaster.pid file at all.
>
> > The reason we check for postmaster.pid is so we can give the user a clue
> > about which postmaster is running.
>
> [ scratches head... ] I failed to detect any such clue in the error
> message it prints. Had you printed the PID from the file, or even
> better looked to see if that process was actually still alive, this
> argument would be reasonable. But pg_upgrade does neither of those,
> whereas if it had started a postmaster the postmaster would have done
> both of those things.
>
> > Also, we don't want to start on a non-clean shutdown, so the missing pid
> > file tells us it was clean.
>
> I agree that super paranoia is not unreasonable in pg_upgrade. But it
> would be useful to print something similar to what the backend prints,
> about checking whether PID N is still there and manually removing the
> lock file if not. Or (ahem) you could let the existing backend-side
> logic do that for you, rather than reimplementing that logic badly.

The current output is:

There seems to be a postmaster servicing the old cluster.
Please shutdown that postmaster and try again.

You are right that it is inaccurate. I should reword that to say the
server is running or was not properly shut down:

There seems to be a postmaster servicing the old cluster, or
it was not properly shut down. Please cleanly shutdown that
postmaster and try again.

Why is a clean shutdown important? If the server crashed, we would have
committed transactions in the WAL files which are not transfered to the
new server, and would be lost.

I am hesistant to even start such an old server because pg_upgrade never
modifies the old server. Even starting it in that case would be
modifying it.

The other problem is that if the server start fails, how do we know if
the failure was due to a running postmaster? I could later check the
postmaster.pid file, but it might have failed not yet getting to the
section where we remove that file.

The server-still-running is a common cause of failure, so I wanted
something that was very clear, rather than a generic
can't-start-the-server.

I am open to ideas.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 16:27:28
Message-ID: 25216.1358612848@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Why is a clean shutdown important? If the server crashed, we would have
> committed transactions in the WAL files which are not transfered to the
> new server, and would be lost.

> I am hesistant to even start such an old server because pg_upgrade never
> modifies the old server. Even starting it in that case would be
> modifying it.

I'm not really following this logic. If the old cluster was in a
crashed state, why would we not expect that starting a postmaster would
be the best (only) way to repair the damage and make everything good
again? Isn't that exactly what the user would have to do anyway? What
other action would you expect him to take instead?

(But, at least with the type of packaging I'm using in Fedora, he would
first have to go through a package downgrade/reinstallation process,
because the packaging provides no simple scripted way of manually
starting the old postgres executable, only the new one. Moreover, what
pg_upgrade is printing provides no help in figuring out whether that's
the next step.)

I do sympathize with taking a paranoid attitude here, but I'm failing
to see what advantage there is in not attempting to start the old
postmaster. In the *only* case that pg_upgrade is successfully
protecting against with this logic, namely there's-an-active-postmaster-
already, the postmaster is equally able to protect itself. In other
cases it would be more helpful not less to let the postmaster analyze
the situation.

> The other problem is that if the server start fails, how do we know if
> the failure was due to a running postmaster?

Because we read the postmaster's log file, or at least tell the user to
do so. That report would be unambiguous, unlike pg_upgrade's.

regards, tom lane


From: George Machitidze <giomac(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 18:45:15
Message-ID: CA+nv05jPr8TorZejo9=KSZd8krMTD4YY3RSxob8LW+imCZLqdw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hi Bruce, Tom

>The backstory on this is at the cited Red Hat bug ... apparently the OP
>decided I was clueless and he needed to consult some real authorities.
Oh come on, I'm very sure you both are good guys and know what you are
doing, none of us is ignorant bastard :)
Decided to open case here too, because of simple reason - maybe someone had
same issue, or knows how pg_upgrade works (in details) better than me,
because I am clueless.
This is test DB and I can erase it, but I'm very sure there's something
wrong in upgrade process - this is what I want to be solved.

Now, we can open a bottle of whiskey and go back to the problem:
1. I didn't run postmaster before/during pg_upgrade, it was never invoked
manually in this process
2. There is no pid file AFTER application is stopped, but looks like it's
there while pg_upgrade is running - strace showed that and there is no need
to run FAM to verify that

I don't know how pg_upgrade works, looks like it's trying to start
postmaster, which runs, postmaster.pid is created, then postmaster fails
stop or needs some more time bedore pg_upgrade is checking it's pid. That's
what I see.

So, is pg_upgrade starting postmaster? If yes, then when (at which step)
and why pid file check is done. That's all what we all want to know, right?

Best regards,
George Machitidze

On Sat, Jan 19, 2013 at 8:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Why is a clean shutdown important? If the server crashed, we would have
> > committed transactions in the WAL files which are not transfered to the
> > new server, and would be lost.
>
> > I am hesistant to even start such an old server because pg_upgrade never
> > modifies the old server. Even starting it in that case would be
> > modifying it.
>
> I'm not really following this logic. If the old cluster was in a
> crashed state, why would we not expect that starting a postmaster would
> be the best (only) way to repair the damage and make everything good
> again? Isn't that exactly what the user would have to do anyway? What
> other action would you expect him to take instead?
>
> (But, at least with the type of packaging I'm using in Fedora, he would
> first have to go through a package downgrade/reinstallation process,
> because the packaging provides no simple scripted way of manually
> starting the old postgres executable, only the new one. Moreover, what
> pg_upgrade is printing provides no help in figuring out whether that's
> the next step.)
>
> I do sympathize with taking a paranoid attitude here, but I'm failing
> to see what advantage there is in not attempting to start the old
> postmaster. In the *only* case that pg_upgrade is successfully
> protecting against with this logic, namely there's-an-active-postmaster-
> already, the postmaster is equally able to protect itself. In other
> cases it would be more helpful not less to let the postmaster analyze
> the situation.
>
> > The other problem is that if the server start fails, how do we know if
> > the failure was due to a running postmaster?
>
> Because we read the postmaster's log file, or at least tell the user to
> do so. That report would be unambiguous, unlike pg_upgrade's.
>
> regards, tom lane
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: George Machitidze <giomac(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-19 20:24:34
Message-ID: 20130119202434.GB24541@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jan 19, 2013 at 10:45:15PM +0400, George Machitidze wrote:
> Hi Bruce, Tom
>
> >The backstory on this is at the cited Red Hat bug ... apparently the OP
> >decided I was clueless and he needed to consult some real authorities.
> Oh come on, I'm very sure you both are good guys and know what you are doing,
> none of us is ignorant bastard :)
> Decided to open case here too, because of simple reason - maybe someone had
> same issue, or knows how pg_upgrade works (in details) better than me, because
> I am clueless.
> This is test DB and I can erase it, but I'm very sure there's something wrong
> in upgrade process - this is what I want to be solved.
>
> Now, we can open a bottle of whiskey and go back to the problem:
> 1. I didn't run postmaster before/during pg_upgrade, it was never invoked
> manually in this process
> 2. There is no pid file AFTER application is stopped, but looks like it's there
> while pg_upgrade is running - strace showed that and there is no need to run
> FAM to verify that
>
> I don't know how pg_upgrade works, looks like it's trying to start postmaster,
> which runs, postmaster.pid is created, then postmaster fails stop or needs some
> more time bedore pg_upgrade is checking it's pid. That's what I see.
>
> So, is pg_upgrade starting postmaster? If yes, then when (at which step) and
> why pid file check is done. That's all what we all want to know, right?

The pid check is done before pg_upgrade starts or stops any postmaster,
to make sure both servers are down before it starts. Tom wants that
testing improved.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: giomac(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-20 02:56:48
Message-ID: 20130120025648.GE24541@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jan 19, 2013 at 11:27:28AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Why is a clean shutdown important? If the server crashed, we would have
> > committed transactions in the WAL files which are not transfered to the
> > new server, and would be lost.
>
> > I am hesistant to even start such an old server because pg_upgrade never
> > modifies the old server. Even starting it in that case would be
> > modifying it.
>
> I'm not really following this logic. If the old cluster was in a
> crashed state, why would we not expect that starting a postmaster would
> be the best (only) way to repair the damage and make everything good
> again? Isn't that exactly what the user would have to do anyway? What
> other action would you expect him to take instead?
>
> (But, at least with the type of packaging I'm using in Fedora, he would
> first have to go through a package downgrade/reinstallation process,
> because the packaging provides no simple scripted way of manually
> starting the old postgres executable, only the new one. Moreover, what
> pg_upgrade is printing provides no help in figuring out whether that's
> the next step.)
>
> I do sympathize with taking a paranoid attitude here, but I'm failing
> to see what advantage there is in not attempting to start the old
> postmaster. In the *only* case that pg_upgrade is successfully
> protecting against with this logic, namely there's-an-active-postmaster-
> already, the postmaster is equally able to protect itself. In other
> cases it would be more helpful not less to let the postmaster analyze
> the situation.
>
> > The other problem is that if the server start fails, how do we know if
> > the failure was due to a running postmaster?
>
> Because we read the postmaster's log file, or at least tell the user to
> do so. That report would be unambiguous, unlike pg_upgrade's.

Attached is a WIP patch to give you an idea of how I am going to solve
this problem. This comment says it all:

! /*
! * If we have a postmaster.pid file, try to start the server. If
! * it starts, the pid file was stale, so stop the server. If it
! * doesn't start, assume the server is running.
! */

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
check.diff text/x-diff 6.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-23 21:58:57
Message-ID: 20130123215857.GA6657@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sat, Jan 19, 2013 at 09:56:48PM -0500, Bruce Momjian wrote:
> > (But, at least with the type of packaging I'm using in Fedora, he would
> > first have to go through a package downgrade/reinstallation process,
> > because the packaging provides no simple scripted way of manually
> > starting the old postgres executable, only the new one. Moreover, what
> > pg_upgrade is printing provides no help in figuring out whether that's
> > the next step.)
> >
> > I do sympathize with taking a paranoid attitude here, but I'm failing
> > to see what advantage there is in not attempting to start the old
> > postmaster. In the *only* case that pg_upgrade is successfully
> > protecting against with this logic, namely there's-an-active-postmaster-
> > already, the postmaster is equally able to protect itself. In other
> > cases it would be more helpful not less to let the postmaster analyze
> > the situation.
> >
> > > The other problem is that if the server start fails, how do we know if
> > > the failure was due to a running postmaster?
> >
> > Because we read the postmaster's log file, or at least tell the user to
> > do so. That report would be unambiguous, unlike pg_upgrade's.
>
> Attached is a WIP patch to give you an idea of how I am going to solve
> this problem. This comment says it all:

Attached is a ready-to-apply version of the patch. I continued to use
postmaster.pid to determine if I need to try the start/stop test ---
that allows me to know which servers _might_ be running, because a
server start failure might be for many reasons and I would prefer not to
suggest the server is running if I can avoid it, and the pid file gives
me that.

The patch is longer than I expected because the code needed to be
reordered. The starting banner indicates if it a live check, but to
check for a live server we need to start/stop the servers and we needed
to know where the binaries are, and we didn't do that until after the
start banner. I removed the 'check' line for binary checks, and moved
that before the banner printing. postmaster_start also now needs an
option to supress an error.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 10.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-01-24 20:37:25
Message-ID: 20130124203725.GA21914@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Wed, Jan 23, 2013 at 04:58:57PM -0500, Bruce Momjian wrote:
> Attached is a ready-to-apply version of the patch. I continued to use
> postmaster.pid to determine if I need to try the start/stop test ---
> that allows me to know which servers _might_ be running, because a
> server start failure might be for many reasons and I would prefer not to
> suggest the server is running if I can avoid it, and the pid file gives
> me that.
>
> The patch is longer than I expected because the code needed to be
> reordered. The starting banner indicates if it a live check, but to
> check for a live server we need to start/stop the servers and we needed
> to know where the binaries are, and we didn't do that until after the
> start banner. I removed the 'check' line for binary checks, and moved
> that before the banner printing. postmaster_start also now needs an
> option to supress an error.

Applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Pavel Raiskup <praiskup(at)redhat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: pgsql-bugs(at)postgresql(dot)org, bruce(at)momjian(dot)us, tgl(at)sss(dot)pgh(dot)pa(dot)us, giomac(at)gmail(dot)com
Subject: [PATCH] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-08-12 10:37:29
Message-ID: 1376303849-17344-1-git-send-email-praiskup@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

The latest update on original bug report at Red Hat bugzilla shows that the
reproducer is just disable the peer access for 'postgres' user in
pg_hba.conf. So — the old server was most probably still running for OP
(not shut down properly as was originally said).

But basically, this fix is relevant to this thread so posting here.

8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<

From f2df7fb281b6346f3feeb5f0f8d2d0ee7fb13f6c Mon Sep 17 00:00:00 2001
From: Pavel Raiskup <praiskup(at)redhat(dot)com>
Date: Mon, 12 Aug 2013 10:45:08 +0200
Subject: [PATCH] pg_upgrade: stop postmaster properly

Setup the os_info.running_cluster little bit earlier just to allow the
stop_postmaster_atexit callback success when error occurs during the
authentication checks.

https://bugzilla.redhat.com/show_bug.cgi?id=896161
http://www.postgresql.org/message-id/E1TwKHs-0005yp-39@wrigleys.postgresql.org
---
contrib/pg_upgrade/server.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
index c1d459d..c082d0e 100644
--- a/contrib/pg_upgrade/server.c
+++ b/contrib/pg_upgrade/server.c
@@ -239,6 +239,8 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
if (!pg_ctl_return && !throw_error)
return false;

+ os_info.running_cluster = cluster;
+
/* Check to see if we can connect to the server; if not, report it. */
if ((conn = get_db_conn(cluster, "template1")) == NULL ||
PQstatus(conn) != CONNECTION_OK)
@@ -258,8 +260,6 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
pg_log(PG_FATAL, "pg_ctl failed to start the %s server, or connection failed\n",
CLUSTER_NAME(cluster));

- os_info.running_cluster = cluster;
-
return true;
}

--
1.8.3.1


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavel Raiskup <praiskup(at)redhat(dot)com>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [PATCH] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-08-12 19:33:47
Message-ID: 20130812193347.GD12510@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


[ fixed hackers CC address ]

On Mon, Aug 12, 2013 at 12:31:10PM +0200, Pavel Raiskup wrote:
> The latest update on original bug report at Red Hat bugzilla shows that the
> reproducer is just disable the peer access for 'postgres' user in
> pg_hba.conf. So — the old server was most probably still running for OP
> (not shut down properly as was originally said).
>
> But basically, this fix is relevant to this thread so posting here.
>
> 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8< 8<
>
> >From f2df7fb281b6346f3feeb5f0f8d2d0ee7fb13f6c Mon Sep 17 00:00:00 2001
> From: Pavel Raiskup <praiskup(at)redhat(dot)com>
> Date: Mon, 12 Aug 2013 10:45:08 +0200
> Subject: [PATCH] pg_upgrade: stop postmaster properly
>
> Setup the os_info.running_cluster little bit earlier just to allow the
> stop_postmaster_atexit callback success when error occurs during the
> authentication checks.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=896161
> http://www.postgresql.org/message-id/E1TwKHs-0005yp-39@wrigleys.postgresql.org
> ---
> contrib/pg_upgrade/server.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/pg_upgrade/server.c b/contrib/pg_upgrade/server.c
> index c1d459d..c082d0e 100644
> --- a/contrib/pg_upgrade/server.c
> +++ b/contrib/pg_upgrade/server.c
> @@ -239,6 +239,8 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
> if (!pg_ctl_return && !throw_error)
> return false;
>
> + os_info.running_cluster = cluster;
> +
> /* Check to see if we can connect to the server; if not, report it. */
> if ((conn = get_db_conn(cluster, "template1")) == NULL ||
> PQstatus(conn) != CONNECTION_OK)
> @@ -258,8 +260,6 @@ start_postmaster(ClusterInfo *cluster, bool throw_error)
> pg_log(PG_FATAL, "pg_ctl failed to start the %s server, or connection failed\n",
> CLUSTER_NAME(cluster));
>
> - os_info.running_cluster = cluster;
> -
> return true;
> }

Wow, this is an interesting bug report, and it certainly is a bug. What
is complex is the solution.

The existing code is written so we only shutdown the server on exit when
we know we have started the server _and_ can connect to it. As you
stated, this leaves the server running in cases where we _did_ start the
server, but can't connect to it, e.g. used MD5 authentication.

The patch moves the atexit setting up, as you suggested, but only does
that when pg_ctl succeeds (we know we started the server), because we
don't want to stop the server on exit if we didn't start it. PG 9.1+
will allow pg_ctl -w start to succeed even if there are permissions
problems; earlier versions will not and will keep the server running
--- the user will have to stop the server after pg_upgrade says it is
running.

I am not going to backpatch this beyond 9.3 as it is risky code. I have
improved the comments in this area.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
pg_upgrade.diff text/x-diff 2.4 KB

From: Pavel Raiskup <praiskup(at)redhat(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-08-12 20:08:07
Message-ID: 2173089.HqHtjyoHSD@nb.usersys.redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

> The patch moves the atexit setting up, as you suggested, but only does
> that when pg_ctl succeeds (we know we started the server),

Yes, of course!

> PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> permissions problems; earlier versions will not and will keep the
> server running --- the user will have to stop the server after
> pg_upgrade says it is running.

This makes it a complex, really.. We may not easily make the
stop_postmaster resistant to non-running server. Thus your solution must
be good enough.

> I am not going to backpatch this beyond 9.3 as it is risky code. I have
> improved the comments in this area.

Agree, it is OK for me — thanks for your work.

Pavel


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavel Raiskup <praiskup(at)redhat(dot)com>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-08-12 20:44:12
Message-ID: 20130812204412.GF12510@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote:
> > The patch moves the atexit setting up, as you suggested, but only does
> > that when pg_ctl succeeds (we know we started the server),
>
> Yes, of course!
>
> > PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> > permissions problems; earlier versions will not and will keep the
> > server running --- the user will have to stop the server after
> > pg_upgrade says it is running.
>
> This makes it a complex, really.. We may not easily make the
> stop_postmaster resistant to non-running server. Thus your solution must
> be good enough.

Well, stop_postmaster can run just fine with a stopped server, as we
allow the atexit() shutdown to ignore errors. The larger question is
whether we should ever stop a server we are not sure we started.

The existing pg_upgrade logic checks if the servers are running first
with start_postmaster(throw_error = false), so in our existing code, we
could probably unconditionally shutdown the server even with a pg_ctl
error when using throw_error = true, but pg_upgrade is complex so I am
hesitant to make such a bold change. Does anyone else have an opinion?

> > I am not going to backpatch this beyond 9.3 as it is risky code. I have
> > improved the comments in this area.
>
> Agree, it is OK for me — thanks for your work.

Sure. You gave me something to study today, and highlighted an area of
the code that was very unclear.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavel Raiskup <praiskup(at)redhat(dot)com>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, giomac(at)gmail(dot)com, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [PATCH] Re: [BUGS] BUG #7815: Upgrading PostgreSQL from 9.1 to 9.2 with pg_upgrade/postgreql-setup fails - invalid status retrieve
Date: 2013-08-16 15:43:20
Message-ID: 20130816154320.GA8972@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Aug 12, 2013 at 04:44:12PM -0400, Bruce Momjian wrote:
> On Mon, Aug 12, 2013 at 10:08:07PM +0200, Pavel Raiskup wrote:
> > > The patch moves the atexit setting up, as you suggested, but only does
> > > that when pg_ctl succeeds (we know we started the server),
> >
> > Yes, of course!
> >
> > > PG 9.1+ will allow pg_ctl -w start to succeed even if there are
> > > permissions problems; earlier versions will not and will keep the
> > > server running --- the user will have to stop the server after
> > > pg_upgrade says it is running.
> >
> > This makes it a complex, really.. We may not easily make the
> > stop_postmaster resistant to non-running server. Thus your solution must
> > be good enough.
>
> Well, stop_postmaster can run just fine with a stopped server, as we
> allow the atexit() shutdown to ignore errors. The larger question is
> whether we should ever stop a server we are not sure we started.
>
> The existing pg_upgrade logic checks if the servers are running first
> with start_postmaster(throw_error = false), so in our existing code, we
> could probably unconditionally shutdown the server even with a pg_ctl
> error when using throw_error = true, but pg_upgrade is complex so I am
> hesitant to make such a bold change. Does anyone else have an opinion?
>
> > > I am not going to backpatch this beyond 9.3 as it is risky code. I have
> > > improved the comments in this area.
> >
> > Agree, it is OK for me — thanks for your work.
>
> Sure. You gave me something to study today, and highlighted an area of
> the code that was very unclear.

I have applied a patch to shutdown the server on successful pg_ctl
start, but authentication failure. I have also added code that we might
want to be more aggressive someday.

I backpatched this to 9.3, but not further back as this is a risky area
of the code. Does anyone want to advocate further backpatching?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +