Re: stats test on Windows is now failing repeatably?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: stats test on Windows is now failing repeatably?
Date: 2006-08-29 02:36:36
Message-ID: 23046.1156818996@sss.pgh.pa.us

I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines. I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.

[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-29 16:39:23
Message-ID: 44F46DBB.60808@kaltenbrunner.cc

Tom Lane wrote:
> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines. I still have no idea what the real problem is there
> --- but since it suddenly seems to have gotten very repeatable, I trust
> someone with a Windows box and a debugger will get after it before the
> source code drifts again.

maybe it's worth pointing out that leveret (Fedora Core 5/x86_64/icc)
manages to trigger that too on occasion - so maybe it is not a "windows
only" bug:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02

Stefan


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 02:46:49
Message-ID: 20060830112649.573C.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines.

I tested HEAD on Windows and saw some Windows-specific logs.

LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
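
The two (code, errno) pairs in the log lines above are consistent with a
simple lookup from Win32 error code to C errno. The following is only a
minimal sketch of such a lookup, not the mapping code in the PostgreSQL
tree: the function name win32_error_to_errno and the EINVAL fallback are
assumptions made for illustration, and the two Win32 constants are
redefined locally so the example compiles without <windows.h>.

```c
#include <errno.h>

/* Win32 error codes seen in the log (normally provided by <windows.h>). */
#define SKETCH_ERROR_FILE_NOT_FOUND     2UL
#define SKETCH_ERROR_SHARING_VIOLATION  32UL

/* Hypothetical helper: translate a Win32 error code into a C errno value. */
int
win32_error_to_errno(unsigned long code)
{
    switch (code)
    {
        case SKETCH_ERROR_FILE_NOT_FOUND:
            return ENOENT;      /* the log shows this as errno 2 */
        case SKETCH_ERROR_SHARING_VIOLATION:
            return EACCES;      /* the log shows this as errno 13 */
        default:
            return EINVAL;      /* assumption: fallback for unmapped codes */
    }
}
```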

We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
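
For readers unfamiliar with the trick, here is a minimal sketch of the
general pattern in portable C. It is not the PostgreSQL code: the helper
name replace_file_via_rename and the ".tmp" suffix are hypothetical, and
a production version would also flush the data to disk before renaming.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: replace "path" with new contents by writing a
 * temporary file next to it and renaming that file into place.
 * Returns 0 on success, -1 on failure.
 */
int
replace_file_via_rename(const char *path, const void *data, size_t len)
{
    char  tmppath[1024];
    FILE *fp;
    int   n;

    assert(path != NULL);

    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || n >= (int) sizeof(tmppath))
        return -1;              /* path does not fit in the buffer */

    fp = fopen(tmppath, "wb");
    if (fp == NULL)
        return -1;

    if (fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /*
     * On POSIX, rename() replaces an existing target atomically, even
     * while other processes have the old file open.  On Windows, the
     * replacement can fail if another process holds the target open
     * without FILE_SHARE_DELETE, which is the case this thread is about.
     */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

Readers of the target file see either the old contents or the new
contents, never a partially written file, provided the rename succeeds.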

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 15:02:24
Message-ID: 27555.1156950144@sss.pgh.pa.us

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> I tested HEAD on Windows and saw some Windows-specific logs.

> LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
> LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13

> The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
> specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
> cannot access the file because it is being used by another process."

The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.

> We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
> Is there any incompatible behavior in the trick between POSIX and Windows?

It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.

regards, tom lane


From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 15:07:17
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA355F0@algol.sollentuna.se

> > The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the
> > file specified." and the code 32 means ERROR_SHARING_VIOLATION, "The
> > process cannot access the file because it is being used by another
> > process."
>
> The first of those is probably normal operation --- we remove
> pg_internal.init whenever it is out-of-date. The second is bad
> though.
>
> > We use the tmpfile-and-rename trick on both pg_internal.init and
> > pgstat.stat.
> > Is there any incompatible behavior in the trick between POSIX
> > and Windows?
>
> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> flag for open() calls but not for fopen(). Isn't this a problem?
> We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue
shouldn't occur for fopen(). Do you want to work up a patch for that
based on open(), or do you want me to take a look at it?

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 15:10:19
Message-ID: 27752.1156950619@sss.pgh.pa.us

"Magnus Hagander" <mha(at)sollentuna(dot)net> writes:
>> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
>> flag for open() calls but not for fopen(). Isn't this a problem?
>> We do use fopen() for stuff like pgstat.stat.

> That definitely sounds like a problem, there is no reason why the issue
> shouldn't occur for fopen(). Do you want to work up a patch for that
> based on open(), or do you want me to take a look at it?

It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build environment
I couldn't test the patch. Please take a look at it.
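
The open()-then-fdopen() approach can be sketched as follows. This is an
illustration of the idea only, not the patch that was later applied: the
name sketch_fopen is hypothetical, plain POSIX open() stands in for the
Windows open() replacement, and binary/text mode letters are passed
through to fdopen() rather than translated.

```c
#define _POSIX_C_SOURCE 200809L     /* for fdopen() under strict ISO C modes */

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical fopen() replacement: translate the mode string into
 * open() flags, open a file descriptor, then wrap it with fdopen().
 * On Windows the open() call below would be the replacement that
 * passes FILE_SHARE_DELETE.
 */
FILE *
sketch_fopen(const char *path, const char *mode)
{
    int   plus;
    int   flags;
    int   fd;
    FILE *fp;

    assert(path != NULL && mode != NULL);

    plus = (strchr(mode, '+') != NULL);     /* "r+", "w+", "a+" */

    switch (mode[0])
    {
        case 'r':
            flags = plus ? O_RDWR : O_RDONLY;
            break;
        case 'w':
            flags = (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
            break;
        case 'a':
            flags = (plus ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
            break;
        default:
            errno = EINVAL;
            return NULL;
    }

    fd = open(path, flags, 0600);
    if (fd < 0)
        return NULL;            /* errno was set by open() */

    fp = fdopen(fd, mode);
    if (fp == NULL)
        close(fd);              /* do not leak the descriptor */
    return fp;
}
```

Callers would use it exactly like fopen(), for example
sketch_fopen("global/pgstat.stat", "rb"), and close the result with
fclose().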

regards, tom lane


From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 17:46:25
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA355F4@algol.sollentuna.se

> >> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> >> flag for open() calls but not for fopen(). Isn't this a problem?
> >> We do use fopen() for stuff like pgstat.stat.
>
> > That definitely sounds like a problem, there is no reason why the
> > issue shouldn't occur for fopen(). Do you want to work up a patch
> > for that based on open(), or do you want me to take a look at it?
>
> It looks straightforward to apply our reimplemented pgwin32_open()
> followed by fdopen(), but since I don't have a Windows build
> environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine.
I never managed to reproduce the original problem on this machine, so
don't know if it solves the problem, but I don't think it makes it worse
:-)

//Magnus

Attachment: win32_fopen.diff (application/octet-stream, 2.0 KB)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-30 18:07:26
Message-ID: 10509.1156961246@sss.pgh.pa.us

"Magnus Hagander" <mha(at)sollentuna(dot)net> writes:
>> It looks straightforward to apply our reimplemented pgwin32_open()
>> followed by fdopen(), but since I don't have a Windows build
>> environment I couldn't test the patch. Please take a look at it.

> I think this is what we want. It passes regression tests on my machine.
> I never managed to reproduce the original problem on this machine, so
> don't know if it solves the problem, but I don't think it makes it worse
> :-)

Applied, we'll see what happens ...

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: stats test on Windows is now failing repeatably?
Date: 2006-08-31 02:33:51
Message-ID: 20060831113031.54DF.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

"Magnus Hagander" <mha(at)sollentuna(dot)net> wrote:

> > FILE_SHARE_DELETE
>
> I think this is what we want. It passes regression tests on my machine.
> I never managed to reproduce the original problem on this machine, so
> don't know if it solves the problem, but I don't think it makes it worse
> :-)

It seems to work very well!
I ran the same workload on HEAD, and I no longer see any
pgstat.stat-related log messages.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center