Re: Stats collection on Windows

Lists: pgsql-hackers
From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Stats collection on Windows
Date: 2006-04-05 00:35:30
Message-ID: 4432CA82020000BE00002949@gwmta.wicourts.gov

Hi all,

I think I've found the cause (or one of the causes) why stats
collection is unreliable on Windows and I'm wondering about the best way
to go about fixing it.

The problem is that process IDs on Windows seem to be assigned without
much rhyme or reason and it seems to happen relatively frequently that a
new process will be assigned the same process ID as a process which
recently died. If this happens before the backend has been expired out
of pgstat.c's pgStatBeDead hash, the backend will be missed.

I was thinking the postmaster could maintain a backend sequence number
with similar semantics to a UNIXish process ID which could then be used
as the key for pgStatBeDead instead of the actual process ID. Does that
sound reasonable?
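
A minimal sketch of the failure and of the proposed fix, as a toy Python
model rather than the actual pgstat.c code. The Collector class, the
DEAD_TTL value and the counter standing in for a postmaster-assigned
sequence number are all assumptions made for illustration:

```python
import itertools

DEAD_TTL = 10.0  # seconds an entry stays in the "dead" table (assumed value)


class Collector:
    """Toy stand-in for the stats collector's dead-backend bookkeeping."""

    def __init__(self):
        self.dead = {}      # key -> time the backend was reported dead
        self.accepted = []  # (key, payload) pairs that were counted

    def backend_exit(self, key, now):
        self.dead[key] = now

    def message(self, key, payload, now):
        died = self.dead.get(key)
        if died is not None and now - died < DEAD_TTL:
            return False  # key is still considered dead: message is ignored
        self.dead.pop(key, None)
        self.accepted.append((key, payload))
        return True


# Keyed by PID: a new backend that inherits PID 1234 is dropped.
by_pid = Collector()
by_pid.backend_exit(1234, now=0.0)
pid_result = by_pid.message(1234, "stats from new backend", now=3.0)

# Keyed by a serial handed out at backend start: the new backend has a fresh key.
serials = itertools.count(1)
old_serial = next(serials)
by_serial = Collector()
by_serial.backend_exit(old_serial, now=0.0)
new_serial = next(serials)
serial_result = by_serial.message(new_serial, "stats from new backend", now=3.0)

print(pid_result, serial_result)
```

With the PID as key the new backend's message is rejected; with a serial
that is never handed out twice, it is accepted.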

Peter


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 03:02:11
Message-ID: 23293.1144206131@sss.pgh.pa.us

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> I think I've found the cause (or one of the causes) why stats
> collection is unreliable on Windows and I'm wondering about the best way
> to go about fixing it.

> The problem is that process IDs on Windows seem to be assigned without
> much rhyme or reason and it seems to happen relatively frequently that a
> new process will be assigned the same process ID as a process which
> recently died. If this happens before the backend has been expired out
> of pgstat.c's pgStatBeDead hash, the backend will be missed.

That's an interesting theory, but do you have any actual evidence for it?
The evidence I've seen says that our big problem on Windows is the stats
collector process just quitting due to unexplained piperead() failures.

(I mean, I'd love to blame Microsoft for everything, but even the
Redmond crowd should be able to figure out that recycling process IDs
instantly would be a stupid idea...)

regards, tom lane


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 03:07:12
Message-ID: e0vc92$ted$1@news.hub.org


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
>
>> The problem is that process IDs on Windows seem to be assigned without
>> much rhyme or reason and it seems to happen relatively frequently that a
>> new process will be assigned the same process ID as a process which
>> recently died.
>
> That's an interesting theory, but do you have any actual evidence for it?
>

I can confirm that on Windows 2000 the process ID is recycled instantly.

Regards,
Qingqing


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 03:13:48
Message-ID: e0vcld$v1a$1@news.hub.org


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
>
> Redmond crowd should be able to figure out that recycling process IDs
> instantly would be a stupid idea...)
>

Can you explain more of this? IMHO, if we rely on a feature like this, the
difference is unstable-every-day vs. unstable-every-year.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 03:17:49
Message-ID: 23436.1144207069@sss.pgh.pa.us

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
>> Redmond crowd should be able to figure out that recycling process IDs
>> instantly would be a stupid idea...)

> Can you explain more of this? IMHO, if we rely on a feature like this, the
> difference is unstable-every-day vs. unstable-every-year.

The mere existence of the kill() primitive should bring to mind reasons
why it's a bad idea.

regards, tom lane


From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Brant <Peter(dot)Brant(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 07:12:41
Message-ID: 20060405071241.GB7742@mark.mielke.cc

On Tue, Apr 04, 2006 at 11:02:11PM -0400, Tom Lane wrote:
> "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> > I think I've found the cause (or one of the causes) why stats
> > collection is unreliable on Windows and I'm wondering about the best way
> > to go about fixing it.
> > The problem is that process IDs on Windows seem to be assigned without
> > much rhyme or reason and it seems to happen relatively frequently that a
> > new process will be assigned the same process ID as a process which
> > recently died. If this happens before the backend has been expired out
> > of pgstat.c's pgStatBeDead hash, the backend will be missed.
> That's an interesting theory, but do you have any actual evidence for it?
> The evidence I've seen says that our big problem on Windows is the stats
> collector process just quitting due to unexplained piperead() failures.
> (I mean, I'd love to blame Microsoft for everything, but even the
> Redmond crowd should be able to figure out that recycling process IDs
> instantly would be a stupid idea...)

Why? :-)

They use HANDLE. The process ID isn't nearly as useful as it is on UNIX.
I haven't looked at that stuff in a long time, but process "ID" on Windows
may be a compatibility method.

Process "ID" isn't necessarily a good way of identifying tasks,
precisely because they may be reused. Using a serial allocated at
process start might make more sense. Relying on it not being reused
is not dissimilar to the old malloc() "tricks" of assuming that
malloc() will not return something recently free()d. It's bad.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com
Neighbourhood Coder, Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 07:20:47
Message-ID: 20060405072046.GC7742@mark.mielke.cc

On Tue, Apr 04, 2006 at 11:17:49PM -0400, Tom Lane wrote:
> "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote
> >> Redmond crowd should be able to figure out that recycling process IDs
> >> instantly would be a stupid idea...)
> > Can you explain more of this? IMHO, if we rely on a feature like this, the
> > difference is unstable-every-day vs. unstable-every-year.
> The mere existence of the kill() primitive should bring to mind reasons
> why it's a bad idea.

CreateProcess - "The process is assigned a process identifier. The
identifier is valid until the process terminates. It can be used to
identify the process, or specified in the OpenProcess function to open
a handle to the process. The initial thread in the process is also
assigned a thread identifier. ..."

TerminateProcess - "Terminates the specified process and all of its
threads."

TerminateProcess takes a HANDLE, not a process identifier. Yes, they
provide the "kill" primitive, but only as a compatibility measure. A
"good" Windows process should maintain a HANDLE to the process, and
kill the process using the HANDLE. This way, there is no race. The
HANDLE is also how you wait for the process to terminate normally.
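
As a rough illustration of "hold on to the handle", here is a small
Python sketch. It is not PostgreSQL code; it assumes Python's standard
subprocess module, whose Popen object plays the role of the owned handle
(on Windows it wraps a process HANDLE and, as far as I know, terminate()
ends up in TerminateProcess()). The sleeping child is just a placeholder
workload:

```python
import subprocess
import sys

# Start a child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

still_running = child.poll() is None  # None means "has not exited yet"

child.terminate()                     # acts on the object we hold, not a looked-up PID
exit_code = child.wait(timeout=30)    # the same object is used to wait for exit

print(still_running, exit_code is not None)
```

The point of the sketch is only that the owner never has to translate a
bare process ID back into a process at kill time.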

I prefer the "Redmond" way, in that I find UNIX's use of integer
identifiers to be encouraging of race conditions. UNIX requires hacks
like minimizing PID reuse, because UNIX is the one that is broken. :-)

Cheers,
mark



From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 07:30:06
Message-ID: 20060405073006.GA18401@svana.org

On Wed, Apr 05, 2006 at 03:20:47AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> TerminateProcess takes a HANDLE, not a process identifier. Yes, they
> provide the "kill" primitive, but only as a compatibility measure. A
> "good" Windows process should maintain a HANDLE to the process, and
> kill the process using the HANDLE. This way, there is no race. The
> HANDLE is also how you wait for the process to terminate normally.

Which presents the solution: we should use the HANDLE on Windows rather
than the process identifier.

> I prefer the "Redmond" way, in that I find UNIX's use of integer
> identifiers to be encouraging of race conditions. UNIX requires hacks
> like minimizing PID reuse, because UNIX is the one that is broken. :-)

Eh? A HANDLE is (or can be mapped to) an integer too. I don't see
anything on that page about handle reuse. If you run a machine long
enough I'm sure it can be reused also...

Have a nice day,

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: mark(at)mark(dot)mielke(dot)cc
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 07:38:28
Message-ID: 20060405073828.GA8069@mark.mielke.cc

On Wed, Apr 05, 2006 at 09:30:06AM +0200, Martijn van Oosterhout wrote:
> On Wed, Apr 05, 2006 at 03:20:47AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > TerminateProcess takes a HANDLE, not a process identifier. Yes, they
> > provide the "kill" primitive, but only as a compatibility measure. A
> > "good" Windows process should maintain a HANDLE to the process, and
> > kill the process using the HANDLE. This way, there is no race. The
> > HANDLE is also how you wait for the process to terminate normally.
> Which presents the solution: we should use the HANDLE on Windows rather
> than the process identifier.

Yes.

> > I prefer the "Redmond" way, in that I find UNIX's use of integer
> > identifiers to be encouraging of race conditions. UNIX requires hacks
> > like minimizing PID reuse, because UNIX is the one that is broken. :-)
> Eh? A HANDLE is (or can be mapped to) an integer too. I don't see
> anything on that page about handle reuse. If you run a machine long
> enough I'm sure it can be reused also...

Once upon a time, when I played with this stuff (I mostly use UNIX, not
Windows), I concluded to myself that HANDLE was process-local, and that
it was allocated. Meaning - it won't be re-used until you CloseHandle().
It's best, then, to think of HANDLE as an opaque object. Regardless of
whether it is process-local or not, until you run CloseHandle(), it is
yours to keep, and it won't be re-used.

Cheers,
mark



From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 07:58:54
Message-ID: 20060405075854.GB18401@svana.org

On Wed, Apr 05, 2006 at 03:38:28AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> Once upon a time, when I played with this stuff (I mostly use UNIX, not
> Windows), I concluded to myself that HANDLE was process-local, and that
> it was allocated. Meaning - it won't be re-used until you CloseHandle().
> It's best, then, to think of HANDLE as an opaque object. Regardless of
> whether it is process-local or not, until you run CloseHandle(), it is
> yours to keep, and it won't be re-used.

HANDLE is process local? That is worse then, because then there's no
guarantee that each process will see a different identifier.

The stats collector identifies processes by their process id, which
they get using getpid(). If instead they used a handle for their own
process (GetCurrentProcess() always returns -1, but you can apparently
clone it to get a real handle), you have no idea whether that handle is
unique amongst backends, because it's process local.

The stats collector doesn't have any open handles for the backend, it's
just a way for backends to identify themselves. It appears that process
handles are not up to the task either...

Do we have a plan C?


From: mark(at)mark(dot)mielke(dot)cc
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 10:03:31
Message-ID: 20060405100330.GA10082@mark.mielke.cc

On Wed, Apr 05, 2006 at 09:58:54AM +0200, Martijn van Oosterhout wrote:
> On Wed, Apr 05, 2006 at 03:38:28AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > Once upon a time, when I played with this stuff (I mostly use UNIX, not
> > Windows), I concluded to myself that HANDLE was process-local, and that
> > it was allocated. Meaning - it won't be re-used until you CloseHandle().
> > It's best, then, to think of HANDLE as an opaque object. Regardless of
> > whether it is process-local or not, until you run CloseHandle(), it is
> > yours to keep, and it won't be re-used.
> HANDLE is process local? That is worse then, because then there's no
> guarantee that each process will see a different identifier.

It's no different from a file descriptor on UNIX.
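
To make the file descriptor comparison concrete, a small Python sketch
(an illustration only, using the standard os module): descriptor numbers
are local to the process, stay distinct for as long as they are held
open, and can be handed out again only after a close.

```python
import os

# Two pipes held open at the same time never share a descriptor number.
r, w = os.pipe()
r2, w2 = os.pipe()
held = {r, w, r2, w2}

os.close(r)                                  # give one number back
reopened = os.open(os.devnull, os.O_RDONLY)  # may now receive the freed number

print(len(held), reopened == r)

# Clean up the descriptors that are still open.
for fd in (w, r2, w2, reopened):
    os.close(fd)
```

Whether the freed number comes back immediately depends on the platform;
what matters here is that it cannot come back while it is still held.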

Neither UNIX nor Windows promises that a process identifier is valid
beyond the life of the process. UNIX tries to keep quick reuse from
happening, as that is necessary to avoid races with system calls such
as kill(). Windows does not have this problem.

> The stats collector identifies processes by their process id, which
> they get using getpid(). If instead they used a handle for their own
> process (GetCurrentProcess() always returns -1, but you can apparently
> clone it to get a real handle), you have no idea whether that handle is
> unique amongst backends, because it's process local.

> The stats collector doesn't have any open handles for the backend, it's
> just a way for backends to identify themselves. It appears that process
> handles are not up to the task either...

> Do we have a plan C?

Sure. Serial. Allocate on process start.

Or, back to another topic from months ago - UUID generation... :-)

Process identifier should not be used beyond the life of the process.
As soon as wait() removes the process on UNIX, the process identifier
is no longer valid, and could be reused. That the operating system tries
to prevent problems by avoiding recycling isn't necessarily a good
reason to exploit this capability.

Cheers,
mark



From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 10:20:49
Message-ID: 20060405102049.GC18401@svana.org

On Wed, Apr 05, 2006 at 06:03:31AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> It's no different from a file descriptor on UNIX.
>
> Neither UNIX nor Windows promises that a process identifier is valid
> beyond the life of the process. UNIX tries to keep quick reuse from
> happening, as that is necessary to avoid races with system calls such
> as kill(). Windows does not have this problem.

But consider, why is the process id there? (Amongst other reasons) so
that users can monitor pg_stat_activity and kill a backend that's out
of control. The equivalent of this on Windows would be to:

1. Get HANDLE from the process ID.
2. TerminateProcess with that HANDLE.

Presumably users have a little GUI app that displays processes on the
system with the process ID so they can kill it. If a process ID is
quickly reused, they may end up killing the wrong one.

The race condition in this case involves the user and you can't solve
that programmatically. The non-reuse of pids is more for
user-friendliness than anything else. The Windows use of HANDLE doesn't
solve this problem at all.

> Sure. Serial. Allocate on process start.
>
> Or, back to another topic from months ago - UUID generation... :-)

Neither of which solves the "I'm a user and want to kill *that* backend"
problem. Because even on Windows you'll have to get the process id,
convert it to a handle and kill it. Same race condition.

> Process identifier should not be used beyond the life of the process.
> As soon as wait() removes the process on UNIX, the process identifier
> is no longer valid, and could be reused. That the operating system tries
> to prevent problems by avoiding recycling isn't necessarily a good
> reason to exploit this capability.

Yeah, but it's very useful as a user. Consider this scenario:

<process 1234 goes AWOL>
kill -INT 1234
<check if process still there>
kill -TERM 1234
<process still there, damn it!>
kill -9 1234

There, gone. With no quick PID reuse I can be sure I won't kill the
wrong one. This is presumably why recent versions of Windows don't reuse
pids quickly either... It's for *users*, not programs.


From: mark(at)mark(dot)mielke(dot)cc
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats collection on Windows
Date: 2006-04-05 10:35:34
Message-ID: 20060405103534.GA10421@mark.mielke.cc

On Wed, Apr 05, 2006 at 12:20:49PM +0200, Martijn van Oosterhout wrote:
> On Wed, Apr 05, 2006 at 06:03:31AM -0400, mark(at)mark(dot)mielke(dot)cc wrote:
> > It's no different from a file descriptor on UNIX.
> > Neither UNIX nor Windows promises that a process identifier is valid
> > beyond the life of the process. UNIX tries to keep quick reuse from
> > happening, as that is necessary to avoid races with system calls such
> > as kill(). Windows does not have this problem.
> But consider, why is the process id there? (Amongst other reasons) so
> that users can monitor pg_stat_activity and kill a backend that's out
> of control. The equivalent of this on Windows would be to:
> 1. Get HANDLE from the process ID.
> 2. TerminateProcess with that HANDLE.

You missed 1.5. Ensure that HANDLE matches the process you intend. :-)

This is a little bit of a distraction though, as any system that requires
the user to be able to kill broken backends is only indicative of a
broken backend. We're talking about how to deal with a broken process
after the process owner (PostgreSQL) has forgotten about it.

What would be wrong with having a PostgreSQL-generated serial associated
with each backend, which the backend process owner can map to a HANDLE
and then pass to TerminateProcess() underneath?
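
A hedged sketch of that idea in Python, not in PostgreSQL's C: the
BackendRegistry class and its method names are hypothetical, and
subprocess.Popen objects stand in for the process HANDLEs the owner
would keep. Serials come from a counter and are never handed out twice,
so a stale serial cannot hit an unrelated process:

```python
import itertools
import subprocess
import sys


class BackendRegistry:
    """Hypothetical owner-side table: serial number -> process object."""

    def __init__(self):
        self._next_serial = itertools.count(1)
        self._backends = {}

    def spawn(self, argv):
        proc = subprocess.Popen(argv)
        serial = next(self._next_serial)  # never reused within this registry
        self._backends[serial] = proc
        return serial

    def terminate(self, serial):
        proc = self._backends.pop(serial, None)
        if proc is None:
            return False  # unknown or already-terminated serial: nothing is killed
        proc.terminate()
        proc.wait(timeout=30)
        return True


registry = BackendRegistry()
sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
first = registry.spawn(sleeper)
second = registry.spawn(sleeper)

killed_first = registry.terminate(first)
killed_again = registry.terminate(first)   # stale serial is simply rejected
killed_second = registry.terminate(second)

print(first, second, killed_first, killed_again, killed_second)
```

A second request for the same serial is refused instead of landing on
whatever process happens to own the old PID by then.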

> Presumably users have a little GUI app that displays processes on the
> system with the process ID so they can kill it. If a process ID is
> quickly reused, they may end up killing the wrong one.

Sure. But the problem here, is that PostgreSQL is broken, so they find a
need to go to their process listing GUI. And who is to say that the GUI
doesn't do as I've suggested above? Ensure that TerminateProcess() is
against the intended process? I have no idea if it does - but it could.

> The race condition in this case involves the user and you can't solve
> that programmatically. The non-reuse of pids is more for
> user-friendliness than anything else. The Windows use of HANDLE doesn't
> solve this problem at all.

Sure you can. But it also shouldn't matter.

> > Sure. Serial. Allocate on process start.
> > Or, back to another topic from months ago - UUID generation... :-)
> Neither of which solves the "I'm a user and want to kill *that* backend"
> problem. Because even on Windows you'll have to get the process id,
> convert it to a handle and kill it. Same race condition.

No, not if PostgreSQL kills the backend for you cleanly, using the
HANDLE that it owns for the process.

> > Process identifier should not be used beyond the life of the process.
> > As soon as wait() removes the process on UNIX, the process identifier
> > is no longer valid, and could be reused. That the operating system tries
> > to prevent problems by avoiding recycling isn't necessarily a good
> > reason to exploit this capability.
> Yeah, but it's very useful as a user. Consider this scenario:
> <process 1234 goes AWOL>
> kill -INT 1234
> <check if process still there>
> kill -TERM 1234
> <process still there, damn it!>
> kill -9 1234
> There, gone. With no quick PID reuse I can be sure I won't kill the
> wrong one. This is presumably why recent versions of Windows don't reuse
> pids quickly either... It's for *users*, not programs.

Users shouldn't need to kill programs.

If they do - on the off chance that they do, they should be more
careful than you list above.

That's a kneejerk reaction to a problem that shouldn't occur in the
first place. It's a habit trained by UNIX users.

I rarely ever need to kill a process on my Windows box, and when I have,
the process listing in the GUI hasn't offered me a process ID. I click
on the item I wish to kill, and I right click "End Process". A lot more
user friendly, in my opinion. And if they happen to fix the race in the
background using the method I suggest above? All the better...

Cheers,
mark



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Stats collection on Windows
Date: 2006-04-05 14:49:00
Message-ID: 249.1144248540@sss.pgh.pa.us

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> I added some strategic printfs to pgstat.c. Attached is the output when
> a little program is run which, in a loop, makes 10 connections, sleeps 3
> seconds, closes them, sleeps another 3 seconds. My workstation (Windows
> XP) was otherwise idle.

> Search for "is known to be dead, ignoring" to find the re-used process
> IDs. Things start out clean, but after a few cycles anywhere between 1
> and 5 backends are being missed.

Looking at the pgstats code, I notice that once it makes an entry in the
dead-backends hashtable, it keeps that entry (rejecting any messages
with the same PID) for 10 seconds. That seems like approximately
forever on modern machines, certainly much more than any plausible
out-of-order condition in the UDP packet stream. It could easily be
enough to get us in trouble on Unix machines, never mind Windows.

A conservative suggestion would be to trim down the destroy interval.
A more radical one is to question whether we need the destroy delay
mechanism at all. What if we got rid of all that logic and simply let
the collector delete stuff when it's told to? Out-of-order messages
could cause entries to be re-created after they've been deleted, but
I'm not sure that I see any harm in that. Bogus DB and table entries
are already ignored in the pgstats views (because they won't join to
anything in the system catalogs) and we also have a filter for bogus
backend entries. There are also mechanisms that ensure these entries
will go away eventually: pgstat_vacuum_tabstat for DB and table
entries, and eventual re-use of a BackendId slot for backends.
So I'm sort of thinking that the destroy delay has outlived its
usefulness.
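
To illustrate the more radical option, a toy Python model (not the
pgstat.c code; the Collector class and its methods are made up for the
example). The collector deletes an entry as soon as it is told to, a
straggler message may re-create it, and a filter at view time hides
entries that do not belong to a live backend:

```python
class Collector:
    """Toy collector with no destroy delay: it deletes when told to."""

    def __init__(self):
        self.backends = {}  # pid -> number of messages counted

    def message(self, pid):
        # A late or out-of-order message may re-create a deleted entry.
        self.backends[pid] = self.backends.get(pid, 0) + 1

    def backend_exit(self, pid):
        self.backends.pop(pid, None)

    def view(self, live_pids):
        # Stand-in for the filter that hides bogus backend entries.
        return {pid: n for pid, n in self.backends.items() if pid in live_pids}


c = Collector()
c.message(1234)
c.backend_exit(1234)
c.message(1234)          # straggler from the dead backend: entry re-created
stale_view = c.view(live_pids=set())

c.backend_exit(1234)     # stand-in for the eventual cleanup of the stale entry
c.message(1234)          # a new backend that reuses PID 1234 right away
fresh_view = c.view(live_pids={1234})

print(stale_view, fresh_view)
```

In this model the re-created entry is invisible while no live backend
owns the PID, and a new backend that reuses the PID is counted at once
rather than being ignored for the length of a destroy delay.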

regards, tom lane


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>,"Jan Wieck" <JanWieck(at)Yahoo(dot)com>
Subject: Re: Stats collection on Windows
Date: 2006-04-05 16:55:30
Message-ID: 4433B032020000BE00002989@gwmta.wicourts.gov

I'm going to rip out the destroy code and see how it goes. Patch will
be forthcoming if things turn out well.

Pete

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 04/05/06 4:49 pm >>>
So I'm sort of thinking that the destroy delay has outlived its
usefulness.

regards, tom lane