Re: bgwriter never dies

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: bgwriter never dies
Date: 2004-02-23 17:51:51
Message-ID: 5300.1077558711@sss.pgh.pa.us

I noticed while doing some debugging this morning that if the postmaster
crashes for some reason (eg kill -9) the bgwriter process never goes
away. Backends will eventually exit when their clients quit, and the
stats collection processes shut down nicely, but the bgwriter process
has to be killed by hand. This doesn't seem like a real good thing.
Maybe there should be a provision similar to the stats collector's
check-for-read-ready-from-a-pipe?

regards, tom lane
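
For illustration, a pipe-based check of the kind suggested above might look roughly like the sketch below. It assumes the postmaster holds the only open write end of a pipe and never writes to it, so the read end becomes ready (at EOF) only once the postmaster is gone. The function name parent_gone_after_nap and the idea of folding the check into the nap timeout are assumptions made for this sketch; this is not the stats collector's actual code.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Nap for up to timeout_ms while watching the read end of a pipe.
 * Assumption: only the parent keeps the write end open (every child must
 * close its inherited copy), and nothing is ever written to the pipe.
 * Returns true if the pipe reports EOF, meaning the parent has gone away.
 */
static bool
parent_gone_after_nap(int watch_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = watch_fd, .events = POLLIN };
    char          c;

    /* 0 means the nap timed out, -1 means error or interruption */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return false;

    /* A read of zero bytes means all write ends have been closed */
    return read(watch_fd, &c, 1) == 0;
}
```

The check adds no extra wakeups: the poll() call replaces whatever sleep the process would have done anyway.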


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-24 03:42:50
Message-ID: 403AC83A.9080008@Yahoo.com

Tom Lane wrote:
> I noticed while doing some debugging this morning that if the postmaster
> crashes for some reason (eg kill -9) the bgwriter process never goes
> away. Backends will eventually exit when their clients quit, and the
> stats collection processes shut down nicely, but the bgwriter process
> has to be killed by hand. This doesn't seem like a real good thing.
> Maybe there should be a provision similar to the stats collector's
> check-for-read-ready-from-a-pipe?

Hmmmm,

the case of the bgwriter is a bit of a twist here. In contrast to the
collectors it is connected to the shared memory. So it can hold on to
resources and, even worse, it could write() after the postmaster has died.

Maybe there is a chance to create a watchdog for free here. Do we
currently create our own process group, with all processes under the
postmaster belonging to it? If the bgwriter checked, each time it naps,
whether its parent process is init (Win32 note: check whether the
postmaster no longer exists instead), it could kill the entire process
group on behalf of the dead postmaster. This costs one more system call
at a point where the bgwriter already makes a system call with a timeout
in order to nap.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
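
A minimal sketch of the parent check described in this message, assuming a POSIX system. It compares getppid() against the postmaster PID remembered at startup instead of testing literally for PID 1, because an orphaned process is not reparented to PID 1 on every system. The function name, the saved-PID parameter and the use of poll() as the nap are hypothetical choices for this sketch.

```c
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Nap for nap_ms milliseconds, then report whether the process has been
 * orphaned. postmaster_pid is the value getppid() returned at startup
 * (an assumption of this sketch). One extra system call per nap.
 */
static bool
postmaster_gone_after_nap(pid_t postmaster_pid, int nap_ms)
{
    /* poll() with no descriptors is simply a sleep with a timeout */
    (void) poll(NULL, 0, nap_ms);

    /* If the parent changed, the original parent has exited */
    return getppid() != postmaster_pid;
}
```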


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-24 05:08:05
Message-ID: 18185.1077599285@sss.pgh.pa.us

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Tom Lane wrote:
>> Maybe there should be a provision similar to the stats collector's
>> check-for-read-ready-from-a-pipe?

> the case of the bgwriter is a bit of a twist here. In contrast to the
> collectors it is connected to the shared memory. So it can hold on to
> resources and, even worse, it could write() after the postmaster has died.

That's not "worse", really. Any backends that are still alive are
committing real live transactions --- they're telling their clients
they committed, so we'd better commit. I don't mind if performance gets
worse or if we lose pg_stats statistics, but we'd better not adopt the
attitude that transaction correctness no longer matters after a
postmaster crash.

So one thing we ought to think about here is whether system correctness
depends on the bgwriter continuing to run until the last backend is
gone. AFAICS that is not true now --- the bgwriter just improves
performance --- but we'd better decide what our plan for the future is.

> Maybe there is a chance to create a watchdog for free here. Do we
> currently create our own process group, with all processes under the
> postmaster belonging to it?

We do not; I'm not sure the notion of a process group is even portable,
and I am pretty sure that the API to control process grouping isn't.

> If the bgwriter checked, each time it naps, whether its parent
> process is init (Win32 note: check whether the postmaster no longer
> exists instead), it could kill the entire process group on behalf of
> the dead postmaster.

I don't think we want that. IMHO the preferred behavior if the
postmaster crashes should be like a "smart shutdown" --- you don't spawn
any more backends (obviously) but existing backends should be allowed to
run until their clients exit. That's how things have always worked
anyway...

[ thinks ... ] If we do want it we don't need any process-group
assumptions. The bgwriter is connected to shmem so it can scan the
PGPROC array and issue kill() against each sibling.

We oughta debate the desired behavior first though.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-24 11:08:36
Message-ID: 403B30B4.109@Yahoo.com

Tom Lane wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
>> Tom Lane wrote:
>>> Maybe there should be a provision similar to the stats collector's
>>> check-for-read-ready-from-a-pipe?
>
>> the case of the bgwriter is a bit of a twist here. In contrast to the
>> collectors it is connected to the shared memory. So it can hold on to
>> resources and, even worse, it could write() after the postmaster has died.
>
> That's not "worse", really. Any backends that are still alive are
> committing real live transactions --- they're telling their clients
> they committed, so we'd better commit. I don't mind if performance gets
> worse or if we lose pg_stats statistics, but we'd better not adopt the
> attitude that transaction correctness no longer matters after a
> postmaster crash.
>
> So one thing we ought to think about here is whether system correctness
> depends on the bgwriter continuing to run until the last backend is
> gone. AFAICS that is not true now --- the bgwriter just improves
> performance --- but we'd better decide what our plan for the future is.
>
>> Maybe there is a chance to create a watchdog for free here. Do we
>> currently create our own process group, with all processes under the
>> postmaster belonging to it?
>
> We do not; I'm not sure the notion of a process group is even portable,
> and I am pretty sure that the API to control process grouping isn't.
>
>> If the bgwriter checked, each time it naps, whether its parent
>> process is init (Win32 note: check whether the postmaster no longer
>> exists instead), it could kill the entire process group on behalf of
>> the dead postmaster.
>
> I don't think we want that. IMHO the preferred behavior if the
> postmaster crashes should be like a "smart shutdown" --- you don't spawn
> any more backends (obviously) but existing backends should be allowed to
> run until their clients exit. That's how things have always worked
> anyway...
>
> [ thinks ... ] If we do want it we don't need any process-group
> assumptions. The bgwriter is connected to shmem so it can scan the
> PGPROC array and issue kill() against each sibling.

Right. Which can change the backend behaviour from a smart shutdown to
an immediate shutdown. In the case of a postmaster crash, I think
something in the system is so wrong that I'd prefer an immediate shutdown.

Jan



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-25 04:07:15
Message-ID: 269.1077682035@sss.pgh.pa.us

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Tom Lane wrote:
>> I don't think we want that. IMHO the preferred behavior if the
>> postmaster crashes should be like a "smart shutdown" --- you don't spawn
>> any more backends (obviously) but existing backends should be allowed to
>> run until their clients exit. That's how things have always worked
>> anyway...

> ... In the case of a postmaster crash, I think
> something in the system is so wrong that I'd prefer an immediate shutdown.

Surely some other people have opinions on this? Hello out there?

regards, tom lane


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-25 04:21:57
Message-ID: Pine.LNX.4.58.0402251518060.7382@linuxworld.com.au

> > I don't think we want that. IMHO the preferred behavior if the
> > postmaster crashes should be like a "smart shutdown" --- you don't spawn
> > any more backends (obviously) but existing backends should be allowed to
> > run until their clients exit. That's how things have always worked
> > anyway...
> >
> > [ thinks ... ] If we do want it we don't need any process-group
> > assumptions. The bgwriter is connected to shmem so it can scan the
> > PGPROC array and issue kill() against each sibling.
>
> Right. Which can change the backend behaviour from a smart shutdown to
> an immediate shutdown. In the case of a postmaster crash, I think
> something in the system is so wrong that I'd prefer an immediate shutdown.

I agree that if the postmaster dies something bad is definitely happening.
However, there will be a period of time X between the postmaster dying and
the bgwriter (or another process, perhaps) discovering this. Which means
that the bug/hardware problem/condition which killed the postmaster may
affect other live backends. Hmmm. Still, if we can minimise impact then
we're probably assisting. We could always add a GUC variable ;-).

Gavin


From: Neil Conway <neilc(at)samurai(dot)com>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-25 04:47:58
Message-ID: 87k72bahoh.fsf@mailbox.samurai.com

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> In the case of a postmaster crash, I think something in the system
> is so wrong that I'd prefer an immediate shutdown.

I agree. Allowing existing backends to commit transactions after the
postmaster has died doesn't strike me as being that useful, and is
probably more confusing than anything else.

That said, if it takes some period of time between the death of the
postmaster and the shutdown of any backends, we *need* to ensure that
any transactions committed during that period still make it to durable
storage.

-Neil


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Neil Conway <neilc(at)samurai(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-25 13:19:34
Message-ID: 200402250819.34197.xzilla@users.sourceforge.net

On Tuesday 24 February 2004 23:47, Neil Conway wrote:
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > In the case of a postmaster crash, I think something in the system
> > is so wrong that I'd prefer an immediate shutdown.
>
> I agree. Allowing existing backends to commit transactions after the
> postmaster has died doesn't strike me as being that useful, and is
> probably more confusing than anything else.
>
> That said, if it takes some period of time between the death of the
> postmaster and the shutdown of any backends, we *need* to ensure that
> any transactions committed during that period still make it to durable
> storage.
>

Yes, roll back any existing/uncommitted transactions and shut down those
connections, but make sure that committed transactions are stored on disk
before exiting completely.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Neil Conway <neilc(at)samurai(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-26 03:43:01
Message-ID: 6.0.0.22.0.20040226144050.05433a70@203.8.195.10

At 12:19 AM 26/02/2004, Robert Treat wrote:
>Yes, roll back any existing/uncommitted transactions and shut down

I'm not even sure I'd go with the rollback; whatever killed the PM may
make the rest of the system unstable. I'd prefer to see the transactions
rolled back (if necessary) as part of the log recovery on PM startup, not
by possibly dying PG processes.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd.
(A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 03 5330 3172
Http://www.rhyme.com.au
PGP key available upon request,
and from pgp.mit.edu:11371


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Neil Conway <neilc(at)samurai(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-26 05:01:41
Message-ID: 9142.1077771701@sss.pgh.pa.us

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> I'm not even sure I'd go with the rollback; whatever killed the PM may
> make the rest of the system unstable. I'd prefer to see the transactions
> rolled back (if necessary) as part of the log recovery on PM startup, not
> by possibly dying PG processes.

Well, in the first place "rollback" is not an explicit action in
Postgres; you're thinking of Oracle or some other old-line technology.
There's nothing that has to happen to undo the effects of a failed
transaction.

But my real problem with the above line of reasoning is that there is
no basis for assuming that a postmaster failure has anything to do with
problems at the backend level. We have always gone out of our way to
ensure that the postmaster is disconnected from backend failure causes
--- it doesn't touch any but the simplest shared-memory datastructures,
for example. This design rule exists mostly to try to ensure that the
postmaster will survive backend crashes, but the effects cut both ways:
there is no reason that a backend won't survive a postmaster crash.
In practice, the few postmaster crashes I've seen have been due to
localized bugs in postmaster-only code or a Linux kernel randomly
seizing on the postmaster as the victim for an out-of-memory kill.
I have never seen the postmaster crash as a result of backend-level
problems, and if I did I'd be out to fix it immediately.

So my opinion is that "kill all the backends when the postmaster
crashes" is a bad idea that will only result in a net reduction in
system reliability. There is no point in building insulated independent
components if you then put in logic to force the system uptime to be the
minimum of the component uptimes.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Neil Conway <neilc(at)samurai(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-26 05:48:24
Message-ID: 6.0.0.22.0.20040226164450.04323ea0@203.8.195.10

At 04:01 PM 26/02/2004, Tom Lane wrote:
>there is no basis for assuming that a postmaster failure has
>anything to do with problems at the backend level....So my
>opinion is that "kill all the backends when the postmaster
>crashes" is a bad idea

Sounds fine. Then a system that will allow a new PM to start ASAP and serve
other connections would be great. I assume that means an orderly shutdown &
restart, but I can't see a way to make the restart work.



From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: bgwriter never dies
Date: 2004-02-26 21:44:45
Message-ID: 002701c3fcb1$c18d74f0$0200000a@LaptopDellXP

>Tom Lane
> Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> > Tom Lane wrote:
> >> I don't think we want that. IMHO the preferred behavior if the
> >> postmaster crashes should be like a "smart shutdown" --- you don't
> >> spawn any more backends (obviously) but existing backends should be
> >> allowed to run until their clients exit. That's how things have
> >> always worked anyway...
>
> > ... In the case of a postmaster crash, I think
> > something in the system is so wrong that I'd prefer an immediate
> > shutdown.
>
> Surely some other people have opinions on this? Hello out there?

I would prefer that all backends be allowed (how would you stop them?)
to carry on working on their current transaction only, then quit.
But that sounds like each backend would need to check postmaster status
at the end of every transaction - yuk! Is there another way to get that
behaviour?

Your comment about the least reliable component bringing the rest down
is appropriate. You should assume that the backends are doing something
very important and should never be interfered with - like a very long
running transaction that is mere seconds away from committing. Besides,
this might encourage some kind of denial of service attacks... Overall,
my feeling is that a broken postmaster could be bad, but so could a
malfunctioning "fail safe" feature, so immediate shutdown wouldn't
necessarily get you out of the **** in the way that it seems it might.

If the postmaster crashes, then you might get the situation that you
have one person still connected, yet are unable to connect others. That
would be very annoying with many connected users - admittedly not much
problem if you're using external session pooling. You can't restart the
postmaster with one backend still up, can you? I hope not, that sounds
bad: convince me! But if you can't, it's in everybody else's interests for
that last guy to stop cleanly, even if earlier than is convenient for them,
to allow the whole system to be restarted.

Oracle uses a PMON process to monitor for this situation: Oracle SMON
process is similar to postmaster/bg_writer. If SMON dies, PMON will
restart it. Should we have a pgmon process that watches the postmaster
and restarts it if required?

Overall, we need a clear statement of how this works, so things like the
archiver process for PITR knows when to stop/start etc. My suggestion
would be to draw out the finite state machine, so there's never a case
when we accidentally turn off archiving when there's some part of pg
still up.

Best regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: simon(at)2ndquadrant(dot)com
Cc: "'Jan Wieck'" <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-26 22:09:21
Message-ID: 14768.1077833361@sss.pgh.pa.us

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> Should we have a pgmon process that watches the postmaster
> and restarts it if required?

I doubt it; in practice the postmaster is *very* reliable (because it
doesn't do much), and so I'm not sure that adding a watchdog is going to
increase the net reliability of the system. The sorts of things that
could take out the postmaster are likely to take out a watchdog too.

regards, tom lane


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: simon(at)2ndquadrant(dot)com, 'Jan Wieck' <JanWieck(at)Yahoo(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bgwriter never dies
Date: 2004-02-27 14:29:17
Message-ID: 20040227142917.GA22237@wolff.to

On Thu, Feb 26, 2004 at 17:09:21 -0500,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > Should we have a pgmon process that watches the postmaster
> > and restarts it if required?
>
> I doubt it; in practice the postmaster is *very* reliable (because it
> doesn't do much), and so I'm not sure that adding a watchdog is going to
> increase the net reliability of the system. The sorts of things that
> could take out the postmaster are likely to take out a watchdog too.

Running postgres under daemontools is an easy way to accomplish this.
The one case it won't handle is if some process gets the process id
from the old postgres process before the new one starts up.
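
What a supervisor of this kind does can be sketched, in much simplified form, as a fork-and-wait loop. This is not daemontools code: supervise_sketch, the max_starts bound and the stand-in service function are assumptions made for illustration, and a real supervisor would exec the server binary and run indefinitely.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a service that crashes right away */
static int
exit_immediately(void)
{
    return 1;
}

/*
 * Run child_main in a child process and start it again each time it
 * exits, up to max_starts times. Returns the number of starts made,
 * or -1 if fork() or waitpid() fails.
 */
static int
supervise_sketch(int (*child_main)(void), int max_starts)
{
    int starts = 0;

    while (starts < max_starts)
    {
        pid_t pid = fork();
        int   status;

        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(child_main());    /* child: run the service, then exit */
        starts++;

        /* parent: block until the service dies, retrying if interrupted */
        while (waitpid(pid, &status, 0) < 0)
        {
            if (errno != EINTR)
                return -1;
        }
    }
    return starts;
}
```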