Re: why not kill -9 postmaster

Lists: pgsql-general
From: "Harpreet Dhaliwal" <harpreet(dot)dhaliwal01(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: why not kill -9 postmaster
Date: 2006-10-20 08:59:03
Message-ID: d86a77ef0610200159x1ed23da2v70fc0c7fea992c8f@mail.gmail.com

It's always said that you shouldn't kill -9 the postmaster.
What's the reason not to do it? Why is it so strictly prohibited?

Thanks,
~Harpreet.


From: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>
To: "Harpreet Dhaliwal" <harpreet(dot)dhaliwal01(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 10:27:38
Message-ID: 87ejt346zp.fsf@gate450.dyndns.org

Harpreet Dhaliwal writes:

> It's always said that you shouldn't kill -9 the postmaster.
> What's the reason not to do it? Why is it so strictly prohibited?

,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
| It is best not to use SIGKILL to shut down the server. Doing so will
| prevent the server from releasing shared memory and semaphores,
| which may then have to be done manually before a new server can be
| started. Furthermore, SIGKILL kills the postmaster process without
| letting it relay the signal to its subprocesses, so it will be
| necessary to kill the individual subprocesses by hand as well.
`----
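For contrast with SIGKILL, here is a minimal sketch (Python, POSIX only) of the signals the postmaster does accept as shutdown requests, as listed in the "Shutting Down the Server" documentation: SIGTERM for smart, SIGINT for fast, and SIGQUIT for immediate shutdown. All three can be caught, so the postmaster gets a chance to relay the request to its children. The helper name `request_shutdown` is hypothetical.

```python
import os
import signal

# Shutdown requests the postmaster understands. Unlike SIGKILL, all three
# can be caught, so the postmaster can relay the request to its children.
SHUTDOWN_SIGNALS = {
    "smart": signal.SIGTERM,      # refuse new connections, let sessions finish
    "fast": signal.SIGINT,        # roll back open transactions, then exit cleanly
    "immediate": signal.SIGQUIT,  # quit at once; WAL replay runs on next start
}


def request_shutdown(postmaster_pid: int, mode: str = "fast") -> None:
    """Hypothetical helper: signal only the postmaster, never with SIGKILL."""
    if mode not in SHUTDOWN_SIGNALS:
        raise ValueError(f"unknown shutdown mode: {mode!r}")
    os.kill(postmaster_pid, SHUTDOWN_SIGNALS[mode])
```

In day-to-day use, pg_ctl stop -D <datadir> -m fast sends the same signal without requiring you to look up the PID yourself.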

regards,
andreas


From: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 11:12:23
Message-ID: 4538AF17.9020400@cox.net

On 10/20/06 05:27, Andreas Seltenreich wrote:
> Harpreet Dhaliwal writes:
>
>> It's always said that you shouldn't kill -9 the postmaster.
>> What's the reason not to do it? Why is it so strictly prohibited?
>
> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
> | It is best not to use SIGKILL to shut down the server. Doing so will
> | prevent the server from releasing shared memory and semaphores,
> | which may then have to be done manually before a new server can be
> | started. Furthermore, SIGKILL kills the postmaster process without
> | letting it relay the signal to its subprocesses, so it will be
> | necessary to kill the individual subprocesses by hand as well.
> `----

But it can't be fatal, can it? After all, that's what a system
crash is, right?

--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid?
For example, it is "common sense" to white-power racists that
whites are superior to blacks, and that those with brown skins
are mud people.
However, that "common sense" is obviously wrong.


From: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>
To: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 12:10:48
Message-ID: 87y7rb2nnb.fsf@gate450.dyndns.org

Ron Johnson writes:

> On 10/20/06 05:27, Andreas Seltenreich wrote:
>> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
>> | It is best not to use SIGKILL to shut down the server. Doing so will
>> | prevent the server from releasing shared memory and semaphores,
>> | which may then have to be done manually before a new server can be
>> | started. Furthermore, SIGKILL kills the postmaster process without
>> | letting it relay the signal to its subprocesses, so it will be
>> | necessary to kill the individual subprocesses by hand as well.
>> `----
>
> But it can't be fatal, can it?

While it could be fixed by hand, the list archives show that it was
fatal enough for some people to shoot themselves in the foot.

> After all, that's what a system crash is, right?

A system crash is safer in that it won't leave orphaned child
processes or IPC/synchronization resources around, making it more
comparable to a SIGQUIT than a SIGKILL.

regards,
andreas


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: Ron Johnson <ron(dot)l(dot)johnson(at)cox(dot)net>
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 13:22:22
Message-ID: 200610201522.23685.peter_e@gmx.net

On Friday, 20 October 2006 13:12, Ron Johnson wrote:
> But it can't be fatal, can it? After all, that's what a system
> crash is, right?

Perhaps we should add another tip not to crash the system.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Shane Ambler <pgsql(at)007Marketing(dot)com>
To: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 13:26:09
Message-ID: 4538CE71.4050806@007Marketing.com

Andreas Seltenreich wrote:
> Ron Johnson writes:
>
>> On 10/20/06 05:27, Andreas Seltenreich wrote:
>>> ,----[ <http://www.postgresql.org/docs/8.1/static/postmaster-shutdown.html#AEN18182> ]
>>> | It is best not to use SIGKILL to shut down the server. Doing so will
>>> | prevent the server from releasing shared memory and semaphores,
>>> | which may then have to be done manually before a new server can be
>>> | started. Furthermore, SIGKILL kills the postmaster process without
>>> | letting it relay the signal to its subprocesses, so it will be
>>> | necessary to kill the individual subprocesses by hand as well.
>>> `----
>> But it can't be fatal, can it?
>
> While it could be fixed by hand, the list archives show that it was
> fatal enough for some people to shoot themselves in the foot.
>
>> After all, that's what a system crash is, right?
>
> A system crash is safer in that it won't leave orphaned child
> processes or IPC/synchronization resources around, making it more
> comparable to a SIGQUIT than a SIGKILL.
>

The one thing worse than doing kill -9 on the postmaster is pulling the power
cord out of the server, which is what makes UPSes so good.

If your server is changing the data file on disk and you pull the power
cord, what chance do you expect of reading that data file again?

While every attempt is made to make the server as reliable as possible
and to recover as much as possible when things go wrong,
abrupt stops (whether from kill -9 or otherwise) at the worst time will make
you dig out your backup copies or spend hours or days manually fixing
what is left to salvage as much data as you can.

If you are testing and developing, that probably won't matter, but what
would it cost you or your company if you lost all the data in your
database? What about lost productivity during the time spent recovering?
Is it worth risking all that?

--

Shane Ambler
Postgres(at)007Marketing(dot)com

Get Sheeky @ http://Sheeky.Biz


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Shane Ambler <pgsql(at)007Marketing(dot)com>
Cc: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 13:29:54
Message-ID: 20061020132954.GD27869@alvh.no-ip.org

Shane Ambler wrote:

> The one thing worse than doing kill -9 on the postmaster is pulling the power
> cord out of the server, which is what makes UPSes so good.
>
> If your server is changing the data file on disk and you pull the power
> cord, what chance do you expect of reading that data file again?

1. That's what we have WAL for. The only thing that can really kill
you is the use of non-battery-backed write cache.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 13:37:29
Message-ID: 7be3f35d0610200637p747f81e5s32af230419a9dc05@mail.gmail.com

>
> >
> > If your server is changing the data file on disk and you pull the power
> > cord, what chance do you expect of reading that data file again?
>
> 1. That's what we have WAL for. The only thing that can really kill
> you is the use of non-battery-backed write cache.
>

Just for information: I had to suffer numerous BSODs (blue screens of death) on
a W2k3 Server running PostgreSQL 8.0 and 8.1 for Windows.

Every time, the database restarted without data loss and without operator
intervention.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Python: the only language with more web frameworks than keywords.


From: Ray Stell <stellr(at)cns(dot)vt(dot)edu>
To: Shane Ambler <pgsql(at)007Marketing(dot)com>
Cc: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 14:07:55
Message-ID: 20061020140755.GA20901@cns.vt.edu

On Fri, Oct 20, 2006 at 10:56:09PM +0930, Shane Ambler wrote:

Someone in the thread mentioned having to clean up shared memory. I've had
to do this often with Oracle:

root# ipcs
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0xe97c83ac 5505024    oracle     640        807403520  10
0x0052f649 3538945    postgresql 600        10461184   2

------ Semaphore Arrays --------
key        semid      owner      perms      nsems
0xfb5e028c 25690112   oracle     640        154
0x0052f649 17629185   postgresq  600        17
0x0052f64a 17661954   postgresq  600        17
0x0052f64b 17694723   postgresq  600        17
0x0052f64c 17727492   postgresq  600        17
0x0052f64d 17760261   postgresq  600        17
0x0052f64e 17793030   postgresq  600        17
0x0052f64f 17825799   postgresq  600        17

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages

$ ipcrm shm 2588672
resource(s) deleted

This removal example was not taken from the shared memory report above.
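Before reaching for ipcrm, it helps to see which segments belong to the database user and whether anything is still attached. Below is a minimal sketch (Python) that parses the text of a Linux util-linux style `ipcs -m` listing like the one above. The function name `shm_segments` and the assumption about the column layout (key, shmid, owner, perms, bytes, nattch) are illustrative and not from the thread.

```python
from typing import List, Tuple


def shm_segments(ipcs_output: str, owner_prefix: str) -> List[Tuple[int, int]]:
    """Return (shmid, nattch) for shared memory segments whose owner
    starts with owner_prefix. Assumes the util-linux `ipcs` text layout."""
    found = []
    in_shm_section = False
    for line in ipcs_output.splitlines():
        if line.startswith("------"):
            # Section banners look like "------ Shared Memory Segments --------".
            in_shm_section = "Shared Memory" in line
            continue
        fields = line.split()
        # Data rows start with a hex key; skip headers, blanks, other sections.
        if not in_shm_section or len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        _key, shmid, owner, _perms, _size, nattch = fields[:6]
        # ipcs may truncate long user names, hence the prefix match.
        if owner.startswith(owner_prefix):
            found.append((int(shmid), int(nattch)))
    return found
```

Only a segment that shows nattch 0, with no postmaster or backend left running, is a reasonable candidate for manual removal. In the listing above, the postgres segment still has 2 attached processes, so removing it would be premature.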


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Shane Ambler <pgsql(at)007Marketing(dot)com>, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 14:18:41
Message-ID: 8058.1161353921@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Shane Ambler wrote:
>> The one thing worse than doing kill -9 on the postmaster is pulling the power
>> cord out of the server, which is what makes UPSes so good.
>>
>> If your server is changing the data file on disk and you pull the power
>> cord, what chance do you expect of reading that data file again?

> 1. That's what we have WAL for. The only thing that can really kill
> you is the use of non-battery-backed write cache.

The important distinction here is "will you lose data" vs "can you start
a new server without tedious manual intervention" (ipcrm etc). kill -9
won't lose data, but you may have to clean up after it. And, as Andreas
already noted, some people have been seen to mess up the manual
intervention part badly enough to cause data loss by themselves.
Personally I think the TIP that's really needed is "never remove
postmaster.pid by hand".

regards, tom lane


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Shane Ambler" <pgsql(at)007marketing(dot)com>
Cc: "Andreas Seltenreich" <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 14:28:08
Message-ID: 758d5e7f0610200728uf0c0841oee77573c2ef5a7f4@mail.gmail.com

On 10/20/06, Shane Ambler <pgsql(at)007marketing(dot)com> wrote:

> >> After all, that's what a system crash is, right?
> >
> > A system crash is safer in that it won't leave orphaned child
> > processes or IPC/synchronization resources around, making it more
> > comparable to a SIGQUIT than a SIGKILL.
> >
>
> The one thing worse than doing kill -9 on the postmaster is pulling the power
> cord out of the server, which is what makes UPSes so good.

Well, I think that pulling the power cord is much safer than doing kill -9
on the postmaster. If you pull the plug, then during bootup PostgreSQL
will just replay every committed transaction, so there won't be any
data loss or downtime.

If you kill -9 postmaster... well, it's messy. ;-))) I feel safer when
everything goes down at the same time. ;)

> If your server is changing the data file on disk and you pull the power
> cord, what chance do you expect of reading that data file again?

With PostgreSQL? I expect to read all committed transactions. And
those not committed... well, they weren't committed in the first place,
so I won't see them anyway.

This all assumes that you are running your DB with fsync on,
on a reliable filesystem, and that your hardware doesn't lie to you about
fsyncing data (and it's best if you have a battery for the controller's cache).
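Those assumptions matter because a commit only counts once its WAL has actually been forced to stable storage. Here is a minimal sketch (Python, POSIX only) of what "forcing to disk" means at the system-call level. `durable_write` is a hypothetical helper, not PostgreSQL's own code, which has its own configurable sync method (wal_sync_method).

```python
import os


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and fsync it (and its directory) before returning.

    POSIX-only sketch; fsync is only as trustworthy as the storage below it:
    a drive with a volatile write cache may acknowledge data it hasn't stored.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        # Block until the kernel reports the file contents are on stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Also flush the directory so a newly created file's entry survives a crash.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```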

Regards,
Dawid


From: "Ian Harding" <iharding(at)destinydata(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Shane Ambler" <pgsql(at)007marketing(dot)com>, "Andreas Seltenreich" <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 14:45:26
Message-ID: 725602300610200745s13ad1f1dy3e53c5a127fb666e@mail.gmail.com

On 10/20/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Shane Ambler wrote:
> >> The one thing worse than doing kill -9 on the postmaster is pulling the power
> >> cord out of the server, which is what makes UPSes so good.
> >>
> >> If your server is changing the data file on disk and you pull the power
> >> cord, what chance do you expect of reading that data file again?
>
> > 1. That's what we have WAL for. The only thing that can really kill
> > you is the use of non-battery-backed write cache.
>
> The important distinction here is "will you lose data" vs "can you start
> a new server without tedious manual intervention" (ipcrm etc). kill -9
> won't lose data, but you may have to clean up after it. And, as Andreas
> already noted, some people have been seen to mess up the manual
> intervention part badly enough to cause data loss by themselves.
> Personally I think the TIP that's really needed is "never remove
> postmaster.pid by hand".
>

When the machine crashes, don't you have to remove the pid file by
hand to get Postgres to start? I seem to remember having to do
that...

- Ian Never-Say-Never Harding


From: Shane Ambler <pgsql(at)007Marketing(dot)com>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 14:50:35
Message-ID: 4538E23B.30506@007Marketing.com

Dawid Kuroczko wrote:
> On 10/20/06, Shane Ambler <pgsql(at)007marketing(dot)com> wrote:

>> The one thing worse than doing kill -9 on the postmaster is pulling the power
>> cord out of the server, which is what makes UPSes so good.
>
>
> Well, I think that pulling the power cord is much safer than doing kill -9
> on the postmaster. If you pull the plug, then during bootup PostgreSQL
> will just replay every committed transaction, so there won't be any
> data loss or downtime.

If you kill -9 the postmaster, the system can still finish sending
changes to disk and close the file, but pulling the power cord can stop a
write in the middle of a block, giving you half new data and half old
data in the same file.

It's all a matter of timing.



From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Shane Ambler <pgsql(at)007Marketing(dot)com>
Cc: Dawid Kuroczko <qnex42(at)gmail(dot)com>, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 15:12:46
Message-ID: 20061020151246.GE6718@alvh.no-ip.org

Shane Ambler wrote:
> Dawid Kuroczko wrote:
> >On 10/20/06, Shane Ambler <pgsql(at)007marketing(dot)com> wrote:
>
> >>The one thing worse than doing kill -9 on the postmaster is pulling the power
> >>cord out of the server, which is what makes UPSes so good.
> >
> >
> >Well, I think that pulling the power cord is much safer than doing kill -9
> >on the postmaster. If you pull the plug, then during bootup PostgreSQL
> >will just replay every committed transaction, so there won't be any
> >data loss or downtime.
>
> If you kill -9 the postmaster, the system can still finish sending
> changes to disk and close the file, but pulling the power cord can stop a
> write in the middle of a block, giving you half new data and half old
> data in the same file.

That case is protected against in the WAL code. That's what we save
whole page images for.

The only difference between doing kill -9 on the postmaster and an abrupt
shutdown is that in the former case there may be backends that continue to
run and commit transactions. Those will still be WAL-logged, though.
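The "whole page images" point can be illustrated with a toy model. This is only a sketch in Python: `ToyStorage`, the 8-byte page and the single record type are invented for illustration. Real WAL also logs incremental changes, and it takes a full-page image only on the first modification of a page after a checkpoint. The toy shows why a torn page does not matter once an image of the page is safely in the log before the data file is overwritten.

```python
from typing import List, Optional, Tuple

PAGE_SIZE = 8  # toy page; real PostgreSQL pages default to 8 kB


class ToyStorage:
    """Toy model: one data page plus a log assumed to survive a crash."""

    def __init__(self) -> None:
        self.page = bytearray(b"A" * PAGE_SIZE)   # on-disk data page
        self.wal: List[Tuple[str, bytes]] = []     # assumed already fsynced

    def update_page(self, new_contents: bytes, crash_after: Optional[int] = None) -> None:
        if len(new_contents) != PAGE_SIZE:
            raise ValueError("page image must be exactly PAGE_SIZE bytes")
        # WAL rule: the log record (here a full-page image) reaches the log
        # before the data page itself is overwritten.
        self.wal.append(("full_page_image", bytes(new_contents)))
        for i, byte in enumerate(new_contents):
            if crash_after is not None and i == crash_after:
                return  # simulated power loss mid-write: the page is now torn
            self.page[i] = byte

    def recover(self) -> None:
        # Replay overwrites the whole page from the image, so it does not
        # matter how much of the interrupted write reached the disk.
        for kind, image in self.wal:
            if kind == "full_page_image":
                self.page[:] = image


store = ToyStorage()
store.update_page(b"BBBBBBBB", crash_after=3)
print(bytes(store.page))  # b'BBBAAAAA' (half new, half old)
store.recover()
print(bytes(store.page))  # b'BBBBBBBB'
```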



From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Shane Ambler <pgsql(at)007Marketing(dot)com>
Cc: Dawid Kuroczko <qnex42(at)gmail(dot)com>, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 15:19:29
Message-ID: 20061020151929.GA31471@svana.org

On Sat, Oct 21, 2006 at 12:20:35AM +0930, Shane Ambler wrote:
> If you kill -9 the postmaster, the system can still finish sending
> changes to disk and close the file, but pulling the power cord can stop a
> write in the middle of a block, giving you half new data and half old
> data in the same file.

Well, if you kill -9 the postmaster, all the connections stay alive and
keep processing tuples and writing to disk, except that the coordination is
gone. Some queues won't be processed, some signals will be ignored, and if
the postmaster's PID gets reused, you'll have some fun.

In particular, the sinval-queue processing would break, which could
lead to some interesting issues. But I expect any number of issues to
start occurring.

A half-written disk block is a solved problem; PostgreSQL will recover
from that without blinking.

> It's all a matter of timing.

Pulling the plug is *way* safer; it's a known quantity. As Tom said,
killing the postmaster needs cleanup, and some people screw up the
cleanup badly enough to corrupt their own data.

Now, killall -9 postgres (killing the parent, all the clients,
autovacuum, bgwriter, etc. in one go) is much more like a crash. But
that's not what's being discussed here.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Shane Ambler <pgsql(at)007Marketing(dot)com>, Dawid Kuroczko <qnex42(at)gmail(dot)com>, Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 15:54:27
Message-ID: 15818.1161359667@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Well, if you kill -9 the postmaster, all the connections stay alive and
> keep processing tuples and writing to disk, except that the coordination is
> gone.

The postmaster isn't involved in any critical inter-backend coordination.
If you kill -9 the postmaster *and then kill or wait out all the
backends*, you won't lose data. This is not a desirable long-term
operating mode, because it cripples autovacuum and some other things,
but it's not dangerous.

The only really serious risk I'm aware of in this scenario is:

1. DBA does "kill -9" postmaster, but some backends are still alive and
processing.

2. DBA tries to start new postmaster, gets message about "shared memory
segment still in use".

3. DBA does "rm postmaster.pid" (this is the step that qualifies him
as an idiot).

4. DBA starts new postmaster. Since the interlock file is gone, it
starts up without any awareness that there are old backends still alive.

At this point, you have two separate sets of backends that are not
communicating (they're using two different shared memory segments)
but they are munging the same data files. It will not take long
to turn the data files into irrecoverable hash --- for just one
reason, transaction numbering will diverge between the two sets of
backends.
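The sequence above hinges on step 3. Below is a minimal, read-only sketch (Python, POSIX) of looking before you leap. It reads the PID from the first line of postmaster.pid and probes it with signal 0, which checks whether the process exists without delivering anything. The function names are hypothetical. The script deliberately never deletes the file. A dead PID also does not prove the old backends are gone, and Martijn's PID-reuse caveat applies as well. So the advice is always to retry the normal startup and let the postmaster's own shared-memory check decide.

```python
import os
from pathlib import Path
from typing import Optional


def recorded_postmaster_pid(datadir: str) -> Optional[int]:
    """Return the PID on the first line of DATADIR/postmaster.pid, or None."""
    try:
        lines = (Path(datadir) / "postmaster.pid").read_text().splitlines()
    except FileNotFoundError:
        return None
    if not lines or not lines[0].strip():
        return None
    # Older releases write a negative PID when a standalone backend holds the lock.
    pid = abs(int(lines[0]))
    return pid or None


def pid_is_alive(pid: int) -> bool:
    """Probe with signal 0: error checking only, nothing is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def describe_lock(datadir: str) -> str:
    """Report what the lock file suggests; never modifies anything."""
    pid = recorded_postmaster_pid(datadir)
    if pid is None:
        return "no postmaster.pid: let the postmaster start and run its own checks"
    if pid_is_alive(pid):
        return f"PID {pid} is alive: a postmaster (or a process that reused its PID) is running"
    # Orphaned backends may still be attached to the old shared memory
    # segment; the postmaster checks for that itself at startup.
    return f"PID {pid} is gone: retry the normal startup, do not rm postmaster.pid"
```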

regards, tom lane


From: "Harpreet Dhaliwal" <harpreet(dot)dhaliwal01(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Shane Ambler" <pgsql(at)007marketing(dot)com>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, "Andreas Seltenreich" <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-20 16:42:31
Message-ID: d86a77ef0610200942p1b0f7044s7de5dc559b4e8ab0@mail.gmail.com

After all the discussion that took place while I was sleeping, I have a few
more questions that are haunting me.

Sometimes, in fact most of the time, when I start postgres using pg_ctl, it
says another postmaster is running. Being totally naive about the hazards
of doing kill -9 on the postmaster, I simply used to kill -9 all
postmaster-related process IDs.
Now, what should I do to get rid of the postmaster that is already running,
safely?
Also, even though it says the postmaster is still running, I can't start
pgAdmin because it complains that the postgres server is not running.
Another thing that worries me is the importance of postmaster.pid.
What happens if I simply do rm postmaster.pid after killing all the
postmaster processes?
How big a pain in the neck is that going to be?

Thanks,
~Harpreet

On 10/20/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > Well, if you kill -9 the postmaster, all the connections stay alive and
> > keep processing tuples and writing to disk, except that the coordination is
> > gone.
>
> The postmaster isn't involved in any critical inter-backend coordination.
> If you kill -9 the postmaster *and then kill or wait out all the
> backends*, you won't lose data. This is not a desirable long-term
> operating mode, because it cripples autovacuum and some other things,
> but it's not dangerous.
>
> The only really serious risk I'm aware of in this scenario is:
>
> 1. DBA does "kill -9" postmaster, but some backends are still alive and
> processing.
>
> 2. DBA tries to start new postmaster, gets message about "shared memory
> segment still in use".
>
> 3. DBA does "rm postmaster.pid" (this is the step that qualifies him
> as an idiot).
>
> 4. DBA starts new postmaster. Since the interlock file is gone, it
> starts up without any awareness that there are old backends still alive.
>
> At this point, you have two separate sets of backends that are not
> communicating (they're using two different shared memory segments)
> but they are munging the same data files. It will not take long
> to turn the data files into irrecoverable hash --- for just one
> reason, transaction numbering will diverge between the two sets of
> backends.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ian Harding" <iharding(at)destinydata(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Shane Ambler" <pgsql(at)007marketing(dot)com>, "Andreas Seltenreich" <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-21 04:02:32
Message-ID: 21208.1161403352@sss.pgh.pa.us

"Ian Harding" <iharding(at)destinydata(dot)com> writes:
> On 10/20/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Personally I think the TIP that's really needed is "never remove
>> postmaster.pid by hand".

> When the machine crashes, don't you have to remove the pid file by
> hand to get Postgres to start? I seem to remember having to do
> that...

Given a properly written startup script and a reasonably recent
postmaster, that shouldn't be necessary. In any case, retrying the
startup script is a *far* safer habit to develop than manually removing
the pidfile (and putting an "rm" into the script itself is folly of the
first magnitude).
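Tom doesn't spell out the script he has in mind. One hedged reading of "retry rather than rm" is a wrapper like the following (Python). `start_with_retries`, the attempt count and the delay are all hypothetical choices. The wrapper reruns whatever start command it is given, typically something like pg_ctl start -w -D /path/to/data (the path is a placeholder), and it never touches postmaster.pid.

```python
import subprocess
import time
from typing import Sequence


def start_with_retries(cmd: Sequence[str], attempts: int = 3, delay_s: float = 5.0) -> bool:
    """Run a start command until it succeeds or the attempts run out.

    Deliberately does NOT delete postmaster.pid between attempts: the
    postmaster's own lock-file and shared-memory checks are what keep a
    second set of backends away from the same data directory.
    """
    for attempt in range(1, attempts + 1):
        if subprocess.run(list(cmd)).returncode == 0:
            return True
        if attempt < attempts:
            time.sleep(delay_s)  # give lingering processes time to exit
    return False
```

If every attempt fails, the right next step is to find out why (for example, which processes are still running) rather than to make the error go away by removing the lock file.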

regards, tom lane


From: "Harpreet Dhaliwal" <harpreet(dot)dhaliwal01(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Ian Harding" <iharding(at)destinydata(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Shane Ambler" <pgsql(at)007marketing(dot)com>, "Andreas Seltenreich" <andreas+pg(at)gate450(dot)dyndns(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: why not kill -9 postmaster
Date: 2006-10-21 22:56:58
Message-ID: d86a77ef0610211556u6c2de32bhec7df82f6ab5d9a9@mail.gmail.com

What type of startup script are you talking about here?

On 10/21/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> "Ian Harding" <iharding(at)destinydata(dot)com> writes:
> > On 10/20/06, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Personally I think the TIP that's really needed is "never remove
> >> postmaster.pid by hand".
>
> > When the machine crashes, don't you have to remove the pid file by
> > hand to get Postgres to start? I seem to remember having to do
> > that...
>
> Given a properly written startup script and a reasonably recent
> postmaster, that shouldn't be necessary. In any case, retrying the
> startup script is a *far* safer habit to develop than manually removing
> the pidfile (and putting an "rm" into the script itself is folly of the
> first magnitude).
>
> regards, tom lane
>