Re: server auto-restarts and ipcs

Lists: pgsql-general pgsql-hackers
From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: server auto-restarts and ipcs
Date: 2004-11-09 00:47:44
Message-ID: 200411081747.44968.pgsql@bluepolka.net


A power failure led to failed postmaster restart using 7.4.6 (see output
below). The short-term fix is usually to delete the pid file and restart.

I often wonder why ipcs never seems to show the shared memory
block in question? Am I using the wrong command? Does the key
mentioned by pgsql map to the key in the ipcs output? And if the
shared segment is simply not there, would it be possible for pgsql to
figure that out ala Apache, search the process table, and go ahead
and restart if it didn't see a postmaster already running? I'm sure this
has been asked and answered, I just couldn't find it via google...

TIA.

Ed

Database and process is pg746dba...

$ cat logs-pg746-7.4.6/server_log.Mon
pg_ctl: Another postmaster may be running. Trying to start postmaster anyway.
2004-11-08 17:17:22.398 [18038] FATAL: pre-existing shared memory block (key 9746001, ID 658210829) is still in use
HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcrm", or just delete the file "/users/pg746dba/dbclusters/pg746/postgresql-7.4.6/data/postmaster.pid".
pg_ctl: cannot start postmaster
Examine the log output.

$ ipcs

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 32768 ed 777 393216 2 dest
0x00000000 131073 root 644 110592 4 dest
0x00000000 3538946 ed 777 393216 2 dest
0x00000000 3670019 ed 777 393216 2 dest
0x00000000 4685828 ed 777 393216 2 dest
0x00000000 4816901 ed 777 393216 2 dest
0x00000000 4915206 ed 777 393216 2 dest
0x00000000 4980743 ed 777 393216 2 dest
0x00000000 5046280 ed 777 393216 2 dest
0x00000000 5111817 ed 777 393216 2 dest
0x00000000 5537802 root 644 110592 3 dest
0x00000000 6651915 ed 777 393216 2 dest
0x00000000 19595276 ed 666 14400 1 dest
0x00000000 11272205 root 644 110592 2 dest

------ Semaphore Arrays --------
key semid owner perms nsems

------ Message Queues --------
key msqid owner perms used-bytes messages


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 01:16:15
Message-ID: 13012.1099962975@sss.pgh.pa.us

"Ed L." <pgsql(at)bluepolka(dot)net> writes:
> A power failure led to failed postmaster restart using 7.4.6 (see output
> below). The short-term fix is usually to delete the pid file and restart.

> I often wonder why ipcs never seems to show the shared memory
> block in question?

The shared memory block would certainly not still exist after a system
reboot, so what we have here is a misleading error message. Looking at
the code, the most plausible explanation appears to be that
shmctl(IPC_STAT) is failing (which it ought to) and returning some errno
code different from EINVAL (which is the case we are expecting to see).
What platform are you on, and what does its shmctl(2) man page document
as error conditions?

regards, tom lane


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 02:24:31
Message-ID: 200411081924.31101.pgsql@bluepolka.net

On Monday November 8 2004 6:16, Tom Lane wrote:
> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > A power failure led to failed postmaster restart using 7.4.6 (see
> > output below). The short-term fix is usually to delete the pid file
> > and restart.
> >
> > I often wonder why ipcs never seems to show the shared memory
> > block in question?
>
> The shared memory block would certainly not still exist after a system
> reboot, so what we have here is a misleading error message. Looking at
> the code, the most plausible explanation appears to be that
> shmctl(IPC_STAT) is failing (which it ought to) and returning some errno
> code different from EINVAL (which is the case we are expecting to see).
> What platform are you on, and what does its shmctl(2) man page document
> as error conditions?

Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think).

From man 2 shmctl:

ERRORS
On error, errno will be set to one of the following:

EACCES is returned if IPC_STAT is requested and
shm_perm.modes does not allow read access for shmid.

EFAULT The argument cmd has value IPC_SET or IPC_STAT but
the address pointed to by buf isn’t accessible.

EINVAL is returned if shmid is not a valid identifier, or cmd
is not a valid command.

EIDRM is returned if shmid points to a removed identifier.

EPERM is returned if IPC_SET or IPC_RMID is attempted, and
the effective user ID of the calling process is not the creator (as found
in shm_perm.cuid), the owner (as found in shm_perm.uid), or the
super-user.

EOVERFLOW is returned if IPC_STAT is attempted, and the gid or
uid value is too large to be stored in the structure pointed to by buf.

CONFORMING TO
SVr4, SVID. SVr4 documents additional error conditions EINVAL,
ENOENT, ENOSPC, ENOMEM, EEXIST. Neither SVr4 nor SVID documents an EIDRM
error condition.


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 02:28:57
Message-ID: 200411081928.57211.pgsql@bluepolka.net

On Monday November 8 2004 7:24, Ed L. wrote:
> On Monday November 8 2004 6:16, Tom Lane wrote:
> > "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > > A power failure led to failed postmaster restart using 7.4.6 (see
> > > output below). The short-term fix is usually to delete the pid file
> > > and restart.
> > >
> > > I often wonder why ipcs never seems to show the shared memory
> > > block in question?
> >
> > The shared memory block would certainly not still exist after a system
> > reboot, so what we have here is a misleading error message. Looking at
> > the code, the most plausible explanation appears to be that
> > shmctl(IPC_STAT) is failing (which it ought to) and returning some
> > errno code different from EINVAL (which is the case we are expecting to
> > see). What platform are you on, and what does its shmctl(2) man page
> > document as error conditions?
>
> Platform is Linux 2.4.20-30.9 on i686 (Pentium 4, I think).

I recently saw this same thing happen from a power failure on several HPUX
boxes as well (I think running B.11.00/11.23 with 7.3.4/7.3.7, but not
sure).

Ed


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 03:41:55
Message-ID: 14346.1099971715@sss.pgh.pa.us

"Ed L." <pgsql(at)bluepolka(dot)net> writes:
> A power failure led to failed postmaster restart using 7.4.6 (see
> output below). The short-term fix is usually to delete the pid file
> and restart.

Thinking some more about this ... does anyone know the algorithm used
in Linux to assign shared memory segment IDs?

Your report shows about a dozen shmem segments in use; which would put
the probability of an accidental collision at pretty-tiny. But if the
kernel's assignment algorithm is nonrandom then it'd be plausible for
the Postgres shmem ID from the previous system boot cycle to match
one of the shmem IDs already handed out in the current boot cycle.
In that case we'd get EACCES from shmctl() which we take to be a trouble
indication. (This is probably over-conservatism, but I don't want to
relax it without knowing for sure that we need to.)

BTW, do you know what all those shmem segments are for? My Linux box
shows only one segment in use besides the ones Postgres is using.

regards, tom lane


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 04:09:37
Message-ID: 200411082109.37083.pgsql@bluepolka.net

On Monday November 8 2004 8:41, Tom Lane wrote:
>
> BTW, do you know what all those shmem segments are for? My Linux box
> shows only one segment in use besides the ones Postgres is using.

Looks like Ximian Evolution apps, X, Mozilla, Wombat, etc ...

Ed


From: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 09:16:57
Message-ID: 1099991818.29685.56.camel@linda

On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> I often wonder why ipcs never seems to show the shared memory
> block in question?

The permissions of the shared memory block and the semaphore arrays are
600. ipcs seems not to report objects which you cannot access. Run
ipcs as root and you should see the PostgreSQL shared memory segment and
semaphores.

--
Oliver Elphick olly(at)lfix(dot)co(dot)uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is
thy victory?" 1 Corinthians 15:55


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: olly(at)lfix(dot)co(dot)uk
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 14:00:12
Message-ID: 200411090700.12657.pgsql@bluepolka.net

On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
> On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> > I often wonder why ipcs never seems to show the shared memory
> > block in question?
>
> The permissions of the shared memory block and the semaphore arrays are
> 600. ipcs seems not to report objects which you cannot access. Run
> ipcs as root and you should see the PostgreSQL shared memory segment and
> semaphores.

I don't see them when running ipcs as root, either. Not sure that would
make sense given the shared memory is created as the same user running
ipcs...

Ed


From: Oliver Elphick <olly(at)lfix(dot)co(dot)uk>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 16:54:59
Message-ID: 1100019299.29685.62.camel@linda

On Tue, 2004-11-09 at 07:00 -0700, Ed L. wrote:
> On Tuesday November 9 2004 2:16, Oliver Elphick wrote:
> > On Mon, 2004-11-08 at 17:47 -0700, Ed L. wrote:
> > > I often wonder why ipcs never seems to show the shared memory
> > > block in question?
> >
> > The permissions of the shared memory block and the semaphore arrays are
> > 600. ipcs seems not to report objects which you cannot access. Run
> > ipcs as root and you should see the PostgreSQL shared memory segment and
> > semaphores.
>
> I don't see them when running ipcs as root, either. Not sure that would
> make sense given the shared memory is created as the same user running
> ipcs...

If neither root nor their creator can see them, I assume they don't
exist. Certainly, with Linux 2.6 and util-linux 2.12, ipcs sees the
postgres objects whether it is run by root or by the postgres user.

--
Oliver Elphick olly(at)lfix(dot)co(dot)uk
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
========================================
"O death, where is thy sting? O grave, where is
thy victory?" 1 Corinthians 15:55


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] server auto-restarts and ipcs
Date: 2004-11-09 18:44:32
Message-ID: 26671.1100025872@sss.pgh.pa.us

"Ed L." <pgsql(at)bluepolka(dot)net> writes:
> A power failure led to failed postmaster restart using 7.4.6 (see output
> below). The short-term fix is usually to delete the pid file and restart.
> I often wonder why ipcs never seems to show the shared memory
> block in question?

> 2004-11-08 17:17:22.398 [18038] FATAL: pre-existing shared memory block (key 9746001, ID 658210829) is still in use

I did a bit of experimentation and found that the Linux kernel does seem
to reproducibly assign similar shmem IDs from one boot cycle to the
next. Here's a smoking-gun case:

$ sudo ipcs -m

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 65536 postgres 600 10436608 1
0x00000000 131073 gdm 600 393216 2 dest
0x00530201 163842 tgl 600 10395648 2

[ reboot ]

$ sudo ipcs -m

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x0052e2c1 65536 postgres 600 10436608 1
0x00530201 98305 tgl 600 10395648 2
0x00000000 163842 gdm 600 393216 2 dest

The "tgl" entry is a manually-started postmaster, which in the second
boot cycle I was able to start before gdm came up. Notice that gdm has
been handed out a shmid that belonged to a different userID in the
previous boot cycle.

What this says is that given a little bit of variability in the boot
cycle, it is fairly likely for the postmaster.pid file to contain a
shared memory ID that has already been assigned to another daemon in
the current boot cycle. The way that PGSharedMemoryIsInUse() is coded,
this will result in a failure as exhibited by Ed, because shmctl() will
return EACCES and we interpret that as a conflicting shmem segment.
(The reason this is considered dangerous is it suggests that there might
be backends still alive from a crashed previous postmaster; we dare not
start new backends that are not in sync with the old ones.)

After thinking about this awhile, I believe that it is safe to consider
EACCES as a don't-care situation. EACCES could only happen if the shmem
ID belongs to a different userid, which implies that it is not a
postgres shared memory segment. Even if you are running postmasters
under multiple userids, this can be ignored, because all that we care
about is whether the shared memory segment could indicate the presence
of backends running in the current $PGDATA directory. With the file
permissions that we use, it is not possible for a shared memory segment
to belong to a userid different from the one that owns the data
directory, and so any postmaster having a different userid must be
managing a different data directory.

So we could reduce our exposure to failure-to-start conditions by
allowing the EACCES case in PGSharedMemoryIsInUse. Does anyone see
a flaw in this reasoning?
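
The proposed rule could be sketched as follows. This is a minimal illustration in Python, not the actual PGSharedMemoryIsInUse() code; the function name segment_may_be_in_use is hypothetical, and it only covers the case where shmctl(IPC_STAT) on the ID recorded in postmaster.pid has already failed.

```python
import errno


def segment_may_be_in_use(shmctl_errno):
    """Interpret the errno from a failed shmctl(IPC_STAT) on the old shmem ID.

    Hypothetical helper, for illustration only. Returns True when the
    failure should still be treated as a possible conflict.
    """
    if shmctl_errno == errno.EINVAL:
        # No segment with that ID exists: nothing left over from a crash.
        return False
    if shmctl_errno == errno.EACCES:
        # The ID now belongs to a different userid, so it cannot be a
        # segment serving backends in our data directory.
        return False
    # Any other error: stay conservative and refuse to start.
    return True


for code in (errno.EINVAL, errno.EACCES, errno.EPERM):
    print(errno.errorcode[code], segment_may_be_in_use(code))
```

Only EACCES changes behavior relative to the check described earlier in the thread; every errno other than EINVAL and EACCES still blocks startup.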

This isn't a complete solution, because if you are running multiple
postmasters under the *same* userid, they could still get confused.
We could probably fix that by marking each shmem seg to indicate which
data directory it goes with (eg, store the directory's inode number in
the seg header). If we see an apparently live shmem segment of our own
userid, we could attach to it and check the header to determine whether
it's really a conflict or not. There might be some portability issues
here though; didn't we find out that Windows doesn't really have inode
numbers?
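
A rough sketch of the data-directory marking idea, in Python rather than the server's C. The header is modeled as a plain dict, the names data_dir_identity and segment_belongs_to are hypothetical, and pairing the inode with the device number is an added assumption (the proposal above mentions only the inode).

```python
import os
import tempfile


def data_dir_identity(path):
    """Return an identifier for a directory that survives reboots.

    Assumption: (device, inode) is used here; the proposal itself
    only mentions the inode number.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def segment_belongs_to(header, data_dir):
    """True if the segment header was stamped for this data directory."""
    return header.get("data_dir_id") == data_dir_identity(data_dir)


with tempfile.TemporaryDirectory() as root:
    ours = os.path.join(root, "data_a")
    other = os.path.join(root, "data_b")
    os.mkdir(ours)
    os.mkdir(other)

    # Stand-in for the header written when the segment is created.
    header = {"data_dir_id": data_dir_identity(ours)}

    ours_matches = segment_belongs_to(header, ours)
    other_matches = segment_belongs_to(header, other)
    print(ours_matches, other_matches)
```

With a stamp like this, a live segment owned by the same userid is a conflict only when its header matches the data directory being started.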

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 20:30:39
Message-ID: 87actqix3k.fsf@stark.xeocode.com


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > A power failure led to failed postmaster restart using 7.4.6 (see
> > output below). The short-term fix is usually to delete the pid file
> > and restart.
>
> Thinking some more about this ... does anyone know the algorithm used
> in Linux to assign shared memory segment IDs?

At least in 2.6 it seems to avoid reuse of ids by keeping a global counter
that is incremented every time a segment is created which ranges from 0..128k
that it multiplies by 32k and adds to the array index (which is reused
quickly).

So it doesn't seem plausible that there was an id collision unless this was
different in 2.4.20. However looking at his list of ids they're all separated
by multiples of 32769 which is what you would expect from this algorithm at
least until they start being reused.
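
The arithmetic can be checked against the shmid column of Ed's ipcs output. The sketch below assumes the scheme described above, id = counter * 32768 + array index, with 32768 standing in for "32k"; split_shmid is a hypothetical helper, not a kernel or libc function.

```python
SEQ_MULTIPLIER = 32768  # assumed value of the "32k" multiplier


def split_shmid(shmid):
    """Split an ID into (counter, array index) under the assumed scheme."""
    return divmod(shmid, SEQ_MULTIPLIER)


# shmid column from the ipcs listing earlier in the thread, in listed order
listed_ids = [
    32768, 131073, 3538946, 3670019, 4685828, 4816901, 4915206,
    4980743, 5046280, 5111817, 5537802, 6651915, 19595276, 11272205,
]

for shmid in listed_ids:
    counter, index = split_shmid(shmid)
    print(f"{shmid:>9}  counter={counter:<4} index={index}")

# The ID reported in the FATAL message from the server log
stale_counter, stale_index = split_shmid(658210829)
print("stale ID:", stale_counter, stale_index)
```

Under this assumption the listed segments occupy array indexes 0 through 13 in order, so neighboring IDs differ by a multiple of 32768 plus one. The ID from the log message splits into counter 20087 and index 13.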

--
greg


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Stark <gsstark(at)MIT(dot)EDU>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 20:52:39
Message-ID: 874qjyiw2w.fsf@stark.xeocode.com


Greg Stark <gsstark(at)MIT(dot)EDU> writes:

> At least in 2.6 it seems to avoid reuse of ids by keeping a global counter
> that is incremented every time a segment is created which ranges from 0..128k
> that it multiplies by 32k and adds to the array index (which is reused
> quickly).
>
> So it doesn't seem plausible that there was an id collision unless this was
> different in 2.4.20. However looking at his list of ids they're all separated
> by multiples of 32769 which is what you would expect from this algorithm at
> least until they start being reused.

Oh I missed the fact that you were talking about after a reboot. So the
algorithm I described would produce exactly the same sequence of ids after any
reboot given the same sequence of creation and deletions. Even if there's a
different sequence as long as the n'th creation is for the m'th array slot it
would get the same id. So collisions would be very common.

(though it seems the sequence is shared across all the ipc objects.)
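
A toy model makes the determinism visible. It is a simplified sketch of the scheme described above, not kernel code: the class IdAllocator and the function boot are hypothetical, the multiplier is assumed to be 32768, the counter never wraps, and nothing is ever deleted.

```python
SEQ_MULTIPLIER = 32768  # assumed value of the "32k" multiplier


class IdAllocator:
    """Toy allocator: id = counter * SEQ_MULTIPLIER + lowest free slot."""

    def __init__(self):
        self.counter = 0
        self.slots = {}  # slot index -> owner

    def create(self, owner):
        slot = 0
        while slot in self.slots:
            slot += 1
        self.slots[slot] = owner
        shmid = self.counter * SEQ_MULTIPLIER + slot
        self.counter += 1  # incremented on every creation
        return shmid


def boot(creation_order):
    """Simulate one boot cycle; return a mapping of owner -> shmid."""
    alloc = IdAllocator()
    return {owner: alloc.create(owner) for owner in creation_order}


first = boot(["postgres", "gdm", "tgl"])
second = boot(["postgres", "tgl", "gdm"])  # start order varies slightly

print(first)
print(second)
```

In this model the same creation order reproduces the same IDs, and swapping the order of two owners hands one the ID the other held before the reboot, which is the pattern in the ipcs listings earlier in the thread.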

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)MIT(dot)EDU>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 21:36:17
Message-ID: 5353.1100036177@sss.pgh.pa.us

Greg Stark <gsstark(at)MIT(dot)EDU> writes:
> Oh I missed the fact that you were talking about after a reboot. So the
> algorithm I described would produce exactly the same sequence of ids after any
> reboot given the same sequence of creation and deletions. Even if there's a
> different sequence as long as the n'th creation is for the m'th array slot it
> would get the same id. So collisions would be very common.

This seems to square with Ed's complaint that he frequently sees a
collision after a reboot. I've just committed some code that makes a
more extensive check as to whether a pre-existing segment actually has
any relevance to our data directory; should fix the problem.

regards, tom lane


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 23:28:01
Message-ID: 200411091628.01401.pgsql@bluepolka.net

On Tuesday November 9 2004 1:37, Tom Lane wrote:
> >> The shared memory block would certainly not still exist after a system
> >> reboot, so what we have here is a misleading error message. Looking
> >> at the code, the most plausible explanation appears to be that
> >> shmctl(IPC_STAT) is failing (which it ought to) and returning some
> >> errno code different from EINVAL (which is the case we are expecting
> >> to see).
>
> I believe the attached patch will fix this problem for you, at least on
> the assumption that you are starting only one postmaster at system boot.

Just realizing we do start multiple postmasters under same user id when
upgrading a cluster (one on old port, one on new).

I noticed that ipcs on my linux box has a command-line option to list the
pid that created the segment. Not sure if such a library exists in usable
form, but looking for segments owned by the downed postmaster's pid would
seem to be what is needed. Just a thought...

Ed


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-09 23:35:34
Message-ID: 6417.1100043334@sss.pgh.pa.us

"Ed L." <pgsql(at)bluepolka(dot)net> writes:
> I noticed that ipcs on my linux box has a command-line option to list the
> pid that created the segment. Not sure if such a library exists in usable
> form, but looking for segments owned by the downed postmaster's pid would
> seem to be what is needed. Just a thought...

[ thinks about it... ] Nah, it's still not bulletproof, because in a
system reboot situation you can't trust the old PID either. It could
easily be that the other guy gets both the PID and the shmem ID that
belonged to you last time.

I've committed changes for 8.0 that mark a shmem segment with the inode
of the associated data directory; that should be a stable enough ID to
handle all routine-reboot cases. (If you had to restore your whole
filesystem from backup tapes, it might be wrong, but you're going to be
doing such recovery manually anyway ...)

regards, tom lane


From: "Ed L(dot)" <pgsql(at)bluepolka(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: server auto-restarts and ipcs
Date: 2004-11-10 00:20:47
Message-ID: 200411091720.47996.pgsql@bluepolka.net

On Tuesday November 9 2004 4:35, Tom Lane wrote:
> "Ed L." <pgsql(at)bluepolka(dot)net> writes:
> > I noticed that ipcs on my linux box has a command-line option to list
> > the pid that created the segment. Not sure if such a library exists in
> > usable form, but looking for segments owned by the downed postmaster's
> > pid would seem to be what is needed. Just a thought...
>
> [ thinks about it... ] Nah, it's still not bulletproof, because in a
> system reboot situation you can't trust the old PID either. It could
> easily be that the other guy gets both the PID and the shmem ID that
> belonged to you last time.

I see. Ipcs on my box also lists the date/time of shared memory segment
attach/detach/change (ipcs -t), but ...

> I've committed changes for 8.0 that mark a shmem segment with the inode
> of the associated data directory; that should be a stable enough ID to
> handle all routine-reboot cases. (If you had to restore your whole
> filesystem from backup tapes, it might be wrong, but you're going to be
> doing such recovery manually anyway ...)

...that will remove a major hassle for us and lots of others. Thanks.

Ed