[Win32] Problem with rename()

From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: [Win32] Problem with rename()
Date: 2006-04-17 23:53:56
Message-ID: 4443E444020000BE00002ED6@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

Hi all,

In the last couple of days, we've been bitten (a couple of times, on
different servers) by an apparent glitch or bad interaction in the
Windows implementation of rename().

The relevant log message is:

[2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file
"pg_xlog/000000010000010A000000BD" to
"pg_xlog/000000010000010A000000D7", continuing to try

It apparently just keeps on looping indefinitely. The "completed
rename" message from port/dirmod.c never shows up.
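
For readers decoding the file names in that log line: as far as I know, an 8.1-era WAL segment name is three 8-digit uppercase hex fields (timeline ID, log file ID, segment number). The sketch below splits a name on that assumption. WalSegName and parse_wal_seg_name are hypothetical names used for illustration, not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type for illustration, not a PostgreSQL struct. */
typedef struct
{
    unsigned int timeline;  /* first 8 hex digits */
    unsigned int log;       /* middle 8 hex digits */
    unsigned int seg;       /* last 8 hex digits */
} WalSegName;

/*
 * Parse a 24-hex-digit segment file name (without the "pg_xlog/" prefix),
 * e.g. "000000010000010A000000BD". Returns 0 on success, or -1 if the
 * name does not have the assumed shape.
 */
int
parse_wal_seg_name(const char *name, WalSegName *out)
{
    assert(name != NULL && out != NULL);

    /* Exactly 24 characters, all uppercase hex digits. */
    if (strlen(name) != 24 || strspn(name, "0123456789ABCDEF") != 24)
        return -1;
    if (sscanf(name, "%8X%8X%8X", &out->timeline, &out->log, &out->seg) != 3)
        return -1;
    return 0;
}
```

Applied to the two names in the log message, the source and target share the same timeline and log file, and the target segment number is 0x1A (26) higher, which is consistent with an old segment being recycled under a future name.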

Shortly thereafter, Postgres becomes unresponsive. Attempts to make a
new connection just block. Autovacuums block. A "pg_ctl ... stop -m
fast" doesn't work. Only "pg_ctl ... stop -m immediate" does.

With the last occurrence, I saved off the output of "handle -a" and
"pslist -x" in case that's helpful.

Any thoughts on what might be going wrong? If it happens again, what
other clues should I be looking for?

Pete


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Brant <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 00:58:36
Message-ID: 200604180058.k3I0was02090@candle.pha.pa.us
Lists: pgsql-bugs pgsql-patches

Peter Brant wrote:
> Hi all,
>
> In the last couple of days, we've been bitten (a couple of times, on
> different servers) by an apparent glitch or bad interaction in the
> Windows implementation of rename().
>
> The relevant log message is:
>
> [2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
>
> It apparently just keeps on looping indefinitely. The "completed
> rename" message from port/dirmod.c never shows up.
>
> Shortly thereafter, Postgres becomes unresponsive. Attempts to make a
> new connection just block. Autovacuums block. A "pg_ctl ... stop -m
> fast" doesn't work. Only "pg_ctl ... stop -m immediate" does.
>
> With the last occurrence, I saved off the output of "handle -a" and
> "pslist -x" in case that's helpful.
>
> Any thoughts on what might be going wrong? If it happens again, what
> other clues should I be looking for?

Yes, the comment I added to dirmod.c gives a hint:

/*
* We need these loops because even though PostgreSQL uses flags that
* allow rename while the file is open, other applications might have
* these files open without those flags.
*/

so someone else has the file opened, but didn't use the required flags.
As to what could have it open, I don't know.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 01:54:08
Message-ID: e21h2c$1c8$1@news.hub.org
Lists: pgsql-bugs pgsql-patches


""Peter Brant"" <Peter(dot)Brant(at)wicourts(dot)gov>
>
> In the last couple of days, we've been bitten (a couple of times, on
> different servers) by an apparent glitch or bad interaction in the
> Windows implementation of rename().
>
> The relevant log message is:
>
> [2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
>
> It apparently just keeps on looping indefinitely. The "completed
> rename" message from port/dirmod.c never shows up.
>

Similar problems have been reported before -- which PG version are you
running, and do you have any anti-virus software installed?

Regards,
Qingqing


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 14:15:04
Message-ID: 4444AE18020000BE00002F1F@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

Unfortunately, it's not that simple. It would be straightforward to
track down if it were.

In response to other questions:

It's Postgres 8.1.3 running on Windows 2003 Server. No anti-virus
software is installed. The servers are essentially bare except for the
OS and Postgres.

We have "handle -a" output from two occurrences (different servers):

For the first one:

LOG: could not rename file "pg_xlog/000000010000010A000000BD" to
"pg_xlog/000000010000010A000000D7", continuing to try

Only one process (postgres.exe) is holding a handle to
pg_xlog/000000010000010A000000BD:

F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(3306) BIND
FF4: File G:\pgsql\data\pg_xlog\000000010000010A000000BD

Nothing has the target file open.

The second is similar, except that two postgres.exe processes (and
nothing else) have the file open:

LOG: could not rename file "pg_xlog/000000010000010A0000006E" to
"pg_xlog/000000010000010A00000087", continuing to try

#1:
F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2367) SELECT
EFC: File G:\pgsql\data\pg_xlog\000000010000010A0000006E

#2:
F84: Event \BaseNamedObjects\pgident: postgres: bigbird bigbird 127.0.0.1(2420) SELECT
FF4: File G:\pgsql\data\pg_xlog\000000010000010A0000006E

Nothing has the target file open.

Pete

>>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> 04/18/06 2:58 am >>>
Yes, the comment I added to dirmod.c gives a hint:

/*
* We need these loops because even though PostgreSQL uses flags that
* allow rename while the file is open, other applications might have
* these files open without those flags.
*/

so someone else has the file opened, but didn't use the required flags.

As to what could have it open, I don't know.


From: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 14:35:19
Message-ID: 7be3f35d0604180735l71c2c8b3t209a56bdee00eea1@mail.gmail.com
Lists: pgsql-bugs pgsql-patches

Peter,

> G:\pgsql\data\pg_xlog\000000010000010A000000BD

Probably a very stupid question: "G" - is that really a LOCAL drive on that
server, or rather some NAS or similar?

I had the same error in a logfile once, but there were a large number
of possible culprits (virus scanner, login script changing permissions,
backups, access control software...) and we could not reproduce the error.

Harald
--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
PostgreSQL - supported by a community that does not put you on hold


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, "Magnus Hagander" <mha(at)sollentuna(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 14:50:58
Message-ID: 13220.1145371858@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> LOG: could not rename file "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
> ...
> Only one process (postgres.exe) is holding a handle to
> pg_xlog/000000010000010A000000BD:
> ...
> The second is similar, except that two postgres.exe processes (and
> nothing else) have the file open:

Hmm, could these be backends that have been sitting idle for some time?
I'd expect a backend to be holding open a handle for whichever WAL
segment it last wrote to. If the backend sits idle for a couple of
checkpoints while others are advancing the end of WAL, then that segment
could become a target for renaming.

The only workable fix I can think of is to allow the checkpointer to
simply fail to rename this segment and go on about its business,
figuring that we'll be able to rename/delete the WAL segment in some
future checkpoint cycle. Not sure how messy that would be to implement.
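
A minimal sketch of that idea, assuming a POSIX environment (rename() and nanosleep()) rather than the Win32 code in port/dirmod.c: try a limited number of times, then report failure so the caller can leave the segment for a later checkpoint. rename_bounded and its parameters are hypothetical and are not taken from any PostgreSQL patch.

```c
#define _POSIX_C_SOURCE 200809L     /* for nanosleep() under strict C modes */

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper, not PostgreSQL's pgrename(): call rename() up to
 * max_attempts times, sleeping delay_ms milliseconds between tries.
 * Returns 0 on success. On failure returns -1 with errno set from the
 * last attempt, so the caller can skip this file and try again during
 * a later checkpoint instead of looping forever.
 */
int
rename_bounded(const char *from, const char *to, int max_attempts, long delay_ms)
{
    struct timespec delay;
    int         attempt;
    int         saved_errno = 0;

    assert(from != NULL && to != NULL);
    assert(max_attempts > 0 && delay_ms >= 0);

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (attempt = 1; attempt <= max_attempts; attempt++)
    {
        if (rename(from, to) == 0)
            return 0;
        saved_errno = errno;
        /* No point sleeping after the final failed attempt. */
        if (attempt < max_attempts)
            nanosleep(&delay, NULL);
    }

    errno = saved_errno;
    return -1;
}
```

The design point is only that the loop terminates. Whether the caller can safely carry on after a failed rename is exactly the "how messy" question raised above, and this sketch does not answer it.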

regards, tom lane


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 15:42:43
Message-ID: 4444C2A3020000BE00002F4C@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

They are local.

Pete

>>> "Harald Armin Massa" <haraldarminmassa(at)gmail(dot)com> 04/18/06 4:35 pm >>>
"G" - is that really a LOCAL drive on that server, or rather some NAS
or similar?


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-bugs(at)postgresql(dot)org>, "Magnus Hagander" <mha(at)sollentuna(dot)net>
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 15:50:39
Message-ID: 4444C47F020000BE00002F59@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

It's definitely possible. Both failures occurred around the end of the
business day as update traffic would have been coasting to a stop. The
middle tier never closes a connection unless it's forced to (e.g. as a
result of a query error, connection going away, etc.)

Pete

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 04/18/06 4:50 pm >>>
Hmm, could these be backends that have been sitting idle for some time?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 17:09:09
Message-ID: 14774.1145380149@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> [2006-04-17 16:49:22.583 ] 2252 LOG: could not rename file
> "pg_xlog/000000010000010A000000BD" to
> "pg_xlog/000000010000010A000000D7", continuing to try
> It apparently just keeps on looping indefinitely. The "completed
> rename" message from port/dirmod.c never shows up.

> Shortly thereafter, Postgres becomes unresponsive. Attempts to make a
> new connection just block. Autovacuums block. A "pg_ctl ... stop -m
> fast" doesn't work. Only "pg_ctl ... stop -m immediate" does.

BTW, whatever we decide to do about the rename problem, I'd say that the
second point represents an independent bug. The rename loop would hang
up the bgwriter, which would probably cause performance to tank, but the
rest of the system shouldn't become completely unresponsive because of
an incomplete checkpoint. The checkpoint operation shouldn't be holding
any critical locks at this point.

Can you find out anything about what the other processes are blocking on?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 18:03:40
Message-ID: 17351.1145383420@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

I wrote:
> "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
>> Shortly thereafter, Postgres becomes unresponsive. Attempts to make a
>> new connection just block. Autovacuums block. A "pg_ctl ... stop -m
>> fast" doesn't work. Only "pg_ctl ... stop -m immediate" does.

> BTW, whatever we decide to do about the rename problem, I'd say that the
> second point represents an independent bug. The rename loop would hang
> up the bgwriter, which would probably cause performance to tank, but the
> rest of the system shouldn't become completely unresponsive because of
> an incomplete checkpoint. The checkpoint operation shouldn't be holding
> any critical locks at this point.

I looked into this and found out that in fact, InstallXLogFileSegment
holds the ControlFileLock while trying to rename the WAL segment file.
It does this specifically as an interlock against someone else trying
to create the same new WAL segment name. So once the system runs out
of already-created WAL segments, XLogFileInit hangs up on the lock,
and then anything that wants to generate WAL entries is blocked.

It's possible that we could avoid using a lock here, but it would
require accepting some errors in creation/renaming of WAL segments as
being expected rather than fatal conditions. That seems a bit risky to
me, particularly for the Windows port where I have zero confidence that
I understand what errors Windows might report :-(. Maybe such a cure
is worse than the disease, since we intend to do something about fixing
the rename problem anyway. Any comments?

regards, tom lane


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 18:32:43
Message-ID: 4444EA7B020000BE00002F76@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

Does that also explain why an attempt to make a new connection just
hangs?

One other thing regarding that is that the connection attempt seems to
kinda, sorta succeed. It never makes it as far as a command prompt, but
on the "stop -m immediate", psql does print the "HINT: In a moment you
should be able to reconnect to the database and repeat your command.",
etc. log messages.

Pete

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 04/18/06 8:03 pm >>>
I looked into this and found out that in fact, InstallXLogFileSegment
holds the ControlFileLock while trying to rename the WAL segment file.
It does this specifically as an interlock against someone else trying
to create the same new WAL segment name. So once the system runs out
of already-created WAL segments, XLogFileInit hangs up on the lock,
and then anything that wants to generate WAL entries is blocked.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 18:44:32
Message-ID: 17586.1145385872@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> Does that also explain why an attempt to make a new connection just
> hangs?

Actually, I was just wondering about that --- seems like a bare
connection attempt should not generate any WAL entries. Do you have any
nondefault actions in ~/.psqlrc or something like that?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-18 19:01:05
Message-ID: 17802.1145386865@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-patches

I wrote:
> "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
>> Does that also explain why an attempt to make a new connection just
>> hangs?

> Actually, I was just wondering about that --- seems like a bare
> connection attempt should not generate any WAL entries. Do you have any
> nondefault actions in ~/.psqlrc or something like that?

I just repeated the hangup scenario here, and confirmed that I can still
start and stop a plain-vanilla psql session (no ~/.psqlrc, no special
per-user or per-database settings) without it hanging. I can also do
simple read-only SELECTs. So I'm thinking your hang must involve some
additional non-read-only actions.

[ thinks for awhile longer ... ] No, I take that back. Once you'd
exhausted the current pg_clog page (32K transactions), even read-only
transactions would be blocked by the need to create a new pg_clog page
(which is a WAL-logged action). A read-only transaction never actually
makes a WAL entry, but it does still consume an XID and hence a slot on
the current pg_clog page. So I just hadn't tried enough transactions.
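
The 32K figure follows from the pg_clog layout, assuming the default 8 kB block size and 2 status bits per transaction. The sketch below works through that arithmetic. The macro names mirror the ones I believe clog.c uses, but the two functions are hypothetical, and the sketch ignores the reserved low XIDs and XID wraparound.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed defaults: 8 kB pages, 2 commit-status bits per transaction. */
#define BLCKSZ              8192
#define CLOG_BITS_PER_XACT  2
#define CLOG_XACTS_PER_BYTE (8 / CLOG_BITS_PER_XACT)
#define CLOG_XACTS_PER_PAGE (BLCKSZ * CLOG_XACTS_PER_BYTE)

/* True if this XID is the first slot of a pg_clog page, so assigning it
 * requires a fresh (WAL-logged) page to be set up first. */
int
xid_starts_new_clog_page(uint32_t xid)
{
    return xid % CLOG_XACTS_PER_PAGE == 0;
}

/* How many XIDs, starting with next_xid, can be handed out before one of
 * them lands on a new pg_clog page. */
uint32_t
xids_before_new_clog_page(uint32_t next_xid)
{
    uint32_t    used = next_xid % CLOG_XACTS_PER_PAGE;
    uint32_t    left = (used == 0) ? 0 : CLOG_XACTS_PER_PAGE - used;

    assert(left < CLOG_XACTS_PER_PAGE);
    return left;
}
```

With these assumptions a page holds 8192 * 4 = 32768 transactions, so a system stuck on the rename can keep serving read-only work for at most that many XIDs before it needs the next page.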

regards, tom lane


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-04-21 15:02:50
Message-ID: 4448ADCA020000BE00003135@gwmta.wicourts.gov
Lists: pgsql-bugs pgsql-patches

This is probably somewhat superfluous, but we had another one of these
incidents last night whose details confirm your explanation here.

[2006-04-21 00:22:19.500 ] 2452 LOG: could not rename file
"pg_xlog/000000010000011A0000004C" to
"pg_xlog/000000010000011A00000071", continuing to try

the autovacuums (which wouldn't actually have been vacuuming anything
since update traffic would have stopped by then) continued until:

[2006-04-21 01:57:35.968 ] 4048 LOG: autovacuum: processing database
"bigbird"

and the Web site first started noticing timeouts at 01:31:42,827.

Overnight traffic is so light that 70 minutes to work through 32K / 2
transactions is probably about right.
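
As a rough check of that estimate, the sketch below takes the two timestamps quoted above (rename failure logged at 00:22:19, first Web site timeout at 01:31:42) and the "32K / 2" guess of half a pg_clog page, and computes the implied average rate. The function names are hypothetical, and the half-page figure is an assumption, not a measurement.

```c
#include <assert.h>

/* Seconds since midnight for an hh:mm:ss log timestamp. */
long
seconds_of_day(int hh, int mm, int ss)
{
    assert(hh >= 0 && hh < 24 && mm >= 0 && mm < 60 && ss >= 0 && ss < 60);
    return hh * 3600L + mm * 60L + ss;
}

/* Average transactions per second needed to consume `xids` transaction
 * IDs between two same-day timestamps given in seconds since midnight. */
double
xid_rate(long xids, long start_sec, long end_sec)
{
    assert(xids >= 0 && end_sec > start_sec);
    return (double) xids / (double) (end_sec - start_sec);
}
```

The interval is 4163 seconds (a little over 69 minutes), so 16384 transactions works out to slightly under 4 transactions per second, which seems plausible for light overnight traffic.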

Pete

>>> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> 04/18/06 9:01 pm >>>
[ thinks for awhile longer ... ] No, I take that back. Once you'd
exhausted the current pg_clog page (32K transactions), even read-only
transactions would be blocked by the need to create a new pg_clog page
(which is a WAL-logged action). A read-only transaction never actually
makes a WAL entry, but it does still consume an XID and hence a slot on
the current pg_clog page. So I just hadn't tried enough transactions.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Brant <Peter(dot)Brant(at)wicourts(dot)gov>, Bugs for PostgreSQL <pgsql-bugs(at)postgreSQL(dot)org>, PostgreSQL-patches <pgsql-patches(at)postgreSQL(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-06-16 20:05:21
Message-ID: 200606162005.k5GK5LI28654@candle.pha.pa.us
Lists: pgsql-bugs pgsql-patches


I am assuming this problem and the other rash of Win32 problems reported
in March are now all fixed in 8.1.4. If not, please let me know.

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> writes:
> >> Does that also explain why an attempt to make a new connection just
> >> hangs?
>
> > Actually, I was just wondering about that --- seems like a bare
> > connection attempt should not generate any WAL entries. Do you have any
> > nondefault actions in ~/.psqlrc or something like that?
>
> I just repeated the hangup scenario here, and confirmed that I can still
> start and stop a plain-vanilla psql session (no ~/.psqlrc, no special
> per-user or per-database settings) without it hanging. I can also do
> simple read-only SELECTs. So I'm thinking your hang must involve some
> additional non-read-only actions.
>
> [ thinks for awhile longer ... ] No, I take that back. Once you'd
> exhausted the current pg_clog page (32K transactions), even read-only
> transactions would be blocked by the need to create a new pg_clog page
> (which is a WAL-logged action). A read-only transaction never actually
> makes a WAL entry, but it does still consume an XID and hence a slot on
> the current pg_clog page. So I just hadn't tried enough transactions.
>
> regards, tom lane

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Bugs for PostgreSQL" <pgsql-bugs(at)postgreSQL(dot)org>, "PostgreSQL-patches" <pgsql-patches(at)postgreSQL(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-06-16 20:45:37
Message-ID: 4493348E.E840.00BE.0@wicourts.gov
Lists: pgsql-bugs pgsql-patches

Really? If there was a patch, I missed it.

My recollection is that there was general agreement about this
particular problem (see, for example,
http://archives.postgresql.org/pgsql-bugs/2006-04/msg00189.php ), but
things kind of trailed off after that without a resolution.

As far as the complete list of Win32 problems which affected us:
- The stats collector crashing should indeed be fixed in 8.1.4
- Missing stats caused by Windows PID recycling is fixed in 8.2
- Various semaphore problems are probably all fixed with the new
Win32 semaphore implementation in 8.2
- The stuck log rename problem mentioned above is still an issue
- The "permission denied on fsync" (or something like that) problem
is still an issue. Unfortunately, IIRC, we could never really nail down
the underlying problem.

None of these problems affect us any more: the production servers now
run Linux. Great to have options! (and we were moving that direction
anyway)

Pete

>>> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> 16.06.2006 22:05 >>>

I am assuming this problem and the other rash of Win32 problems reported
in March are now all fixed in 8.1.4. If not, please let me know.


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Brant <Peter(dot)Brant(at)wicourts(dot)gov>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bugs for PostgreSQL <pgsql-bugs(at)postgreSQL(dot)org>, PostgreSQL-patches <pgsql-patches(at)postgreSQL(dot)org>
Subject: Re: [Win32] Problem with rename()
Date: 2006-06-16 21:21:21
Message-ID: 200606162121.k5GLLLw13054@candle.pha.pa.us
Lists: pgsql-bugs pgsql-patches

Peter Brant wrote:
> Really? If there was a patch, I missed it.
>
> My recollection is that there was general agreement about this
> particular problem (see, for example,
> http://archives.postgresql.org/pgsql-bugs/2006-04/msg00189.php ), but
> things kind of trailed off after that without a resolution.

Yeah. Were you using WAL archiving? We will have a fix in 8.1.5 to
prevent multiple archivers from starting. Perhaps that was a cause.

> As far as the complete list of Win32 problems which affected us:
> - The stats collector crashing should indeed be fixed in 8.1.4
> - Missing stats caused by Windows PID recycling is fixed in 8.2
> - Various semaphore problems are probably all fixed with the new
> Win32 semaphore implementation in 8.2
> - The stuck log rename problem mentioned above is still an issue

Yep. What has me baffled is why no one else is seeing the problem.
We had a rash of reports, and now all is quiet.

> - The "permission denied on fsync" (or something like that) problem
> is still an issue. Unfortunately, IIRC, we could never really nail down
> the underlying problem.

Yes, I just reread that thread. I also am confused where to go from
here.

> None of these problems affect us any more: the production servers now
> run Linux. Great to have options! (and we were moving that direction
> anyway)

Were you the only one using Win32 under heavy load? You were on Win2003.
Were there some bugs in the OS that got fixed later?

Yeah, stumped. Guess we will have to wait for more reports. I don't
even see how to document this as a TODO.

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "Bugs for PostgreSQL" <pgsql-bugs(at)postgreSQL(dot)org>, "PostgreSQL-patches" <pgsql-patches(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [Win32] Problem with rename()
Date: 2006-06-17 17:25:37
Message-ID: 44945708.E840.00BE.0@wicourts.gov
Lists: pgsql-bugs pgsql-patches

>>> On 16.06.2006 at 23:21:21, in message
<200606162121(dot)k5GLLLw13054(at)candle(dot)pha(dot)pa(dot)us>, Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
> Yeah. Were you using WAL archiving? We will have a fix in 8.1.5 to
> prevent multiple archivers from starting. Perhaps that was a cause.
>
Not at the time, no. The rename in question was just a regular WAL
segment rename.

> Yes, I just reread that thread. I also am confused where to go from
> here.
>
Yeah, it's unfortunate that our best theory (a _commit on a deleted
file) just didn't seem to be supported by the evidence. Although the
servers which see a heavy SELECT load are now Linux, we still have a
couple of Windows servers receiving the normal replication traffic. We
still get regular fsync errors after the scheduled CLUSTERs so if you do
find a fix (or come up with a new theory), there's a test bed there (at
least for now).

> Were you the only one using Win32 under heavy load? You were on Win2003.
> Were there some bugs in the OS that got fixed later?
...
> Yep. What has me baffled is why no one else is seeing the problem.
> We had a rash of reports, and now all is quiet.
>
We might be somewhat more susceptible than most too. Due to the way
our middle tier parcels out queries, some connections might sit idle for
a long time. Per Tom's explanation in the original thread, this is an
important factor. Ultimately if a concurrent rename isn't possible in
Windows (and that looks likely), it's going to be a problem as things
stand now.

Pete