pg_ctl vs. Windows locking

Lists: pgsql-hackers-win32
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: pg_ctl vs. Windows locking
Date: 2004-06-13 21:07:44
Message-ID: 40CCC220.3070004@dunslane.net


pg_ctl seems to be working for Windows now, except that there are occasional Windows file locking issues, especially on restart - see below.

cheers

andrew

C:\msys\1.0\local\pgsql>bin\pg_ctl -D data -l logfile restart
waiting for postmaster to shut down....done
postmaster stopped
postmaster starting

C:\msys\1.0\local\pgsql>
C:\msys\1.0\local\pgsql>
C:\msys\1.0\local\pgsql>
C:\msys\1.0\local\pgsql>
C:\msys\1.0\local\pgsql>bin\pg_ctl -D data -l logfile restart
waiting for postmaster to shut down....done
postmaster stopped
The process cannot access the file because it is being used by another process.
Unable to run the postmaster binary


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-13 23:44:58
Message-ID: 200406132344.i5DNiw025070@candle.pha.pa.us


Good report. Could you check whether a sleep, or a loop that waits until
the file is no longer in use, would help?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 14:54:02
Message-ID: 40CDBC0A.5000207@dunslane.net

Bruce Momjian wrote:

>Good report. Would you check to see if a sleep or loop checking if the
>file is unused would help?
>
>

I'll check later - I am assuming that this is the logfile it is
complaining about.

When I was watching it in the Task Manager, it was observable that the
postmaster exited before some of the other backend processes, by up to a
second or two. I suspect a sleep would cure most of the problem, but it
seems a bit arbitrary - is there a more robust way on Windows to check
that needed files are unlocked?

cheers

andrew
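
A polling loop of the kind discussed above might look like the following. This is only a minimal sketch in Python, not pg_ctl's actual code: the function name wait_until_openable, its timeout and interval defaults, and the injectable opener parameter are all hypothetical, and it assumes (as guessed above) that the log file is the locked file. It retries opening the file for append until the open succeeds or a deadline passes, instead of sleeping for a fixed time.

```python
import time


def wait_until_openable(path, timeout=5.0, interval=0.1, opener=open):
    """Poll until `path` can be opened for appending, or give up.

    Returns True if an open succeeded before `timeout` seconds elapsed,
    False otherwise. `opener` is a hypothetical hook that exists only so
    the loop can be exercised without a real lock.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A file still held by another process fails to open with an
            # OSError (a sharing violation on Windows); any other OSError
            # is also retried in this sketch.
            with opener(path, "a"):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

The check is itself racy: another process could take the file again between the successful probe and the real open, so this narrows the window rather than closing it.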


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 15:18:38
Message-ID: 384.1087226318@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> When I was watching it in the Task Manager, it was observable that the
> postmaster exited before some of the other backend processes, by up to a
> second or two.

The stats collector processes normally don't shut down until they
observe postmaster exit (and I think in CVS tip there's an
up-to-one-second delay before they'll even look). Everything else
should be gone first though.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 15:35:41
Message-ID: 40CDC5CD.30806@dunslane.net

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>
>>When I was watching it in the Task Manager, it was observable that the
>>postmaster exited before some of the other backend processes, by up to a
>>second or two.
>>
>>
>
>The stats collector processes normally don't shut down until they
>observe postmaster exit (and I think in CVS tip there's an
>up-to-one-second delay before they'll even look). Everything else
>should be gone first though.
>
>
>

I assumed it was them. Can't tell in Task Manager, that I can see - all
you get is the image name :-(

Sounds like a 2 second sleep should do the trick. Will test later.

Maybe we should do it everywhere, not just Windows, to ensure that we
don't get nasty interleaving on the log file?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 15:55:50
Message-ID: 756.1087228550@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> The stats collector processes normally don't shut down until they
>> observe postmaster exit (and I think in CVS tip there's an
>> up-to-one-second delay before they'll even look). Everything else
>> should be gone first though.

> Sounds like a 2 second sleep should do the trick. Will test later.
> Maybe we should do it everywhere, not just Windows, to ensure that we
> don't get nasty interleaving on the log file?

The stats processes won't write anything to the log file under normal
circumstances anyway, so I wouldn't support the above.

A possibly more relevant argument is that if one is using the option to
keep stats totals across postmaster restarts, the old totals aren't
necessarily valid until the old stats processes exit, and so a pg_ctl
restart sequence could fail to transfer the totals.

Maybe we should redesign the shutdown sequence so that the stats
processes get killed quicker. Offhand it seems to me that we could kill
the stats buffer process as soon as the last normal backend is gone, and
let the stats collector process do its shutdown in parallel with the
shutdown checkpoint. I don't believe that a checkpoint operation will
send anything to pgstats, so this wouldn't lose any stats data.
Arguably it would make it more likely that the stats data gets written
--- in the current scheme, the stats collector is the last out the door
and thus in pretty serious risk of being SIGKILL'd by init, if we are in
an init-driven shutdown.

Jan, any thoughts?

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 16:23:31
Message-ID: 40CDD103.8010909@Yahoo.com

On 6/14/2004 11:55 AM, Tom Lane wrote:

> Maybe we should redesign the shutdown sequence so that the stats
> processes get killed quicker. Offhand it seems to me that we could kill
> the stats buffer process as soon as the last normal backend is gone, and
> let the stats collector process do its shutdown in parallel with the
> shutdown checkpoint. I don't believe that a checkpoint operation will
> send anything to pgstats, so this wouldn't lose any stats data.
> Arguably it would make it more likely that the stats data gets written
> --- in the current scheme, the stats collector is the last out the door
> and thus in pretty serious risk of being SIGKILL'd by init, if we are in
> an init-driven shutdown.

Yes, killing (or rather closing the pipe to it) early is probably the
right idea.

The claim that it wouldn't carry over the stats is wrong. In the worst
case, an instantaneously restarting postmaster that fires off a new
collector before the old one has written the final stats can cause the
stats collected during the last 500 milliseconds to be lost. The new
collector could open the old stats file before the old collector did the
rename.

We don't guarantee loss-free stats by design anyway, so some stats lost
due to a postmaster that restarts too fast aren't any worse than stats
lost due to a dropped UDP packet.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
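
The rename race described in this message can be illustrated with a small sketch. It is written in Python under stated assumptions and is not the stats collector's real code: the helper names stage_stats, publish_stats and read_stats, the ".tmp" suffix, and the JSON file format are all hypothetical. The writer stages a complete new file and then renames it over the old one, so a reader that opens the file before the rename sees the old totals, complete but slightly stale.

```python
import json
import os


def stage_stats(path, stats):
    """Write `stats` to a temporary file next to `path`; return its name."""
    tmp = path + ".tmp"  # hypothetical naming scheme
    with open(tmp, "w") as f:
        json.dump(stats, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    return tmp


def publish_stats(tmp, path):
    """Rename the staged file over the live one."""
    os.replace(tmp, path)


def read_stats(path):
    """Return the stats currently published at `path` (empty if none yet)."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}
```

A reader that runs between stage_stats and publish_stats gets the previous totals rather than a half-written file, which corresponds to the bounded loss described above.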


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 16:27:02
Message-ID: 2480.1087230422@sss.pgh.pa.us

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> Yes, killing (or rather closing the pipe to it) early is probably the
> right idea.

Right, we can just close the pipe. I was thinking of adding another
signal but that's obviously easier. Will work on making this happen.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 16:39:08
Message-ID: 3698.1087231148@sss.pgh.pa.us

> Right, we can just close the pipe. I was thinking of adding another
> signal but that's obviously easier. Will work on making this happen.

... except there is no postmaster pipe anymore --- back to plan A.

regards, tom lane


From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-hackers-win32(at)postgresql(dot)org
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-14 20:50:39
Message-ID: cal32v$jad$1@sea.gmane.org

Andrew Dunstan wrote:
> Can't tell in Task Manager, that I can see - all
> you get is the image name :-(

Try Process Explorer from Sysinternals. It shows the complete command
line that was used to start each process, as well as its open file
handles, loaded DLLs, and more.

Download at:
http://www.sysinternals.com/ntw2k/freeware/procexp.shtml

Regards
Thomas


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-17 19:02:56
Message-ID: 40D1EAE0.8050909@dunslane.net

Tom Lane wrote:

>>Right, we can just close the pipe. I was thinking of adding another
>>signal but that's obviously easier. Will work on making this happen.
>>
>>
>
>... except there is no postmaster pipe anymore --- back to plan A.
>
>
>

Did we get a resolution on this?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jan Wieck <JanWieck(at)Yahoo(dot)com>, pgsql-hackers-win32 <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: pg_ctl vs. Windows locking
Date: 2004-06-18 16:18:21
Message-ID: 21345.1087575501@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>>> Right, we can just close the pipe. I was thinking of adding another
>>> signal but that's obviously easier. Will work on making this happen.
>>
>> ... except there is no postmaster pipe anymore --- back to plan A.

> Did we get a resolution on this?

Yes, I committed the change several days ago. The pgstat processes are
told to shut down at the same time the shutdown checkpoint starts.
I didn't actually make the postmaster wait for them, but I'd think that
under normal circumstances the shutdown checkpoint will take longer.

regards, tom lane
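
The effect of this change on the restart problem can be worked through with a toy timing model. The sketch below is Python arithmetic only, not PostgreSQL code: the function name straggler_window and every duration passed to it are illustrative assumptions, and it assumes the postmaster exits as soon as the shutdown checkpoint finishes. It computes how long the stats processes outlive the postmaster, which is the window in which a restart could still find a file in use.

```python
def straggler_window(checkpoint_s, stats_exit_s, notice_delay_s, early_signal):
    """Seconds the stats processes outlive the postmaster in this toy model.

    checkpoint_s   -- duration of the shutdown checkpoint
    stats_exit_s   -- time the stats processes need to exit once they react
    notice_delay_s -- delay before they notice the postmaster has gone
    early_signal   -- True: told to stop when the checkpoint starts;
                      False: they react only after the postmaster has exited
    """
    postmaster_exit = checkpoint_s  # assumption: exit right after checkpoint
    if early_signal:
        stats_exit = stats_exit_s
    else:
        stats_exit = postmaster_exit + notice_delay_s + stats_exit_s
    return max(0.0, stats_exit - postmaster_exit)


# Old ordering: stragglers linger for the notice delay plus their exit time.
print(straggler_window(2.0, 0.5, 1.0, early_signal=False))  # 1.5
# New ordering: nothing lingers if the checkpoint outlasts the stats exit.
print(straggler_window(2.0, 0.5, 1.0, early_signal=True))   # 0.0
# New ordering with a very short checkpoint: a small window can remain.
print(straggler_window(0.25, 0.5, 1.0, early_signal=True))  # 0.25
```

The last case matches the caveat above: since the postmaster does not wait for the stats processes, a window can remain whenever the checkpoint finishes first.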