Re: windows regression failure - prepared xacts

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: windows regression failure - prepared xacts
Date: 2005-07-07 14:30:29
Message-ID: 42CD3C85.4050507@dunslane.net


I am consistently seeing the regression failure shown below on my
Windows machine. See
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13

(On the plus side, I am now building happily and passing regression
tests with ASPerl, and hope to add ASPython and ASTcl to the list shortly).

cheers

andrew

================== pgsql.2072/src/test/regress/regression.diffs ===================
*** ./expected/prepared_xacts.out Thu Jul 7 09:55:18 2005
--- ./results/prepared_xacts.out Thu Jul 7 10:20:37 2005
***************
*** 179,189 ****
-- Commit table creation
COMMIT PREPARED 'regress-one';
\d pxtest2
! Table "public.pxtest2"
! Column | Type | Modifiers
! --------+---------+-----------
! a | integer |
!
SELECT * FROM pxtest2;
a
---
--- 179,185 ----
-- Commit table creation
COMMIT PREPARED 'regress-one';
\d pxtest2
! ERROR: cache lookup failed for relation 27240
SELECT * FROM pxtest2;
a
---

======================================================================


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: windows regression failure - prepared xacts
Date: 2005-07-13 19:51:10
Message-ID: 42D570AE.2070805@dunslane.net


I never got a reply to this, but I am still seeing it from time to time
- twice today in fact. Any suggestions?

cheers

andrew

Andrew Dunstan wrote:

>
> I am consistently seeing the regression failure shown below on my
> Windows machine. See
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13
>
>
>
> ================== pgsql.2072/src/test/regress/regression.diffs ===================
> *** ./expected/prepared_xacts.out Thu Jul 7 09:55:18 2005
> --- ./results/prepared_xacts.out Thu Jul 7 10:20:37 2005
> ***************
> *** 179,189 ****
> -- Commit table creation
> COMMIT PREPARED 'regress-one';
> \d pxtest2
> ! Table "public.pxtest2"
> ! Column | Type | Modifiers
> ! --------+---------+-----------
> ! a | integer |
> !
> SELECT * FROM pxtest2;
> a
> ---
> --- 179,185 ----
> -- Commit table creation
> COMMIT PREPARED 'regress-one';
> \d pxtest2
> ! ERROR: cache lookup failed for relation 27240
> SELECT * FROM pxtest2;
> a
> ---
>
> ======================================================================


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: windows regression failure - prepared xacts
Date: 2005-07-13 21:50:11
Message-ID: 21937.1121291411@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I never got a reply to this, but I am still seeing it from time to time
> - twice today in fact. Any suggestions?

I've been puzzled by that too. It seems to indicate that the syscache
inval message that the COMMIT should send is either not getting sent at
all, or is being processed too late. Neither of these ideas seems very
promising, especially considering that we're looking at a single backend
as both source and recipient of the message --- a race condition doesn't
seem credible. And there's nothing very platform-specific in that code
either. (I tried for a while to explain it as some kind of deficiency
in the signal emulation we use on Windows, but there are no signals used
for normal sinval processing, so that doesn't seem to hold water.)
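To make the two hypotheses above concrete (the inval message is never sent, or it is processed too late), here is a toy model in Python. Everything in it is a hypothetical simplification, not PostgreSQL's catcache or sinval code: the `Catalog` and `Backend` classes, `process_invals`, and the assumption that the stale entry is a cached "not found" result are all invented for illustration. It only shows why a per-backend cache entry that survives the COMMIT, because its invalidation was not processed before the next lookup, yields an error of the same shape as the one in the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical toy model, NOT PostgreSQL's actual syscache/sinval implementation.


@dataclass
class Catalog:
    """Stands in for the shared system catalog (e.g. pg_class), keyed by OID."""
    committed: Dict[int, str] = field(default_factory=dict)


@dataclass
class Backend:
    catalog: Catalog
    # Per-backend cache: OID -> relation name, or None for a cached "not found".
    cache: Dict[int, Optional[str]] = field(default_factory=dict)
    # Invalidations queued by this backend's own commits, not yet applied.
    pending_invals: List[int] = field(default_factory=list)

    def lookup(self, oid: int) -> str:
        if oid not in self.cache:
            # Cache miss: consult the catalog and remember the answer,
            # including a negative ("not found") answer.
            self.cache[oid] = self.catalog.committed.get(oid)
        name = self.cache[oid]
        if name is None:
            raise LookupError(f"cache lookup failed for relation {oid}")
        return name

    def commit_create(self, oid: int, relname: str, deliver_inval: bool = True) -> None:
        # Make the new catalog row visible and queue an invalidation for it.
        self.catalog.committed[oid] = relname
        self.pending_invals.append(oid)
        if deliver_inval:
            self.process_invals()

    def process_invals(self) -> None:
        # Drop any cached entry (positive or negative) for each invalidated OID.
        for oid in self.pending_invals:
            self.cache.pop(oid, None)
        self.pending_invals.clear()


def run(deliver_inval: bool) -> str:
    be = Backend(Catalog())
    try:
        be.lookup(27240)  # before the commit: caches a "not found" entry
    except LookupError:
        pass
    be.commit_create(27240, "pxtest2", deliver_inval=deliver_inval)
    try:
        return be.lookup(27240)
    except LookupError as e:
        return f"ERROR: {e}"


print(run(True))   # pxtest2
print(run(False))  # ERROR: cache lookup failed for relation 27240
```

In this model, calling `process_invals()` after the failed lookup makes the next lookup succeed, which corresponds to the "processed too late" case; with a single backend as both sender and receiver, nothing in the model itself can reorder those steps, which is consistent with the point that a race seems implausible.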

Are we sure that it only happens on Windows? Anyone else seen a similar
failure in the prepared_xacts test?

>> *** ./expected/prepared_xacts.out Thu Jul 7 09:55:18 2005
>> --- ./results/prepared_xacts.out Thu Jul 7 10:20:37 2005
>> ***************
>> *** 179,189 ****
>> -- Commit table creation
>> COMMIT PREPARED 'regress-one';
>> \d pxtest2
>> ! Table "public.pxtest2"
>> ! Column | Type | Modifiers
>> ! --------+---------+-----------
>> ! a | integer |
>> !
>> SELECT * FROM pxtest2;
>> a
>> ---
>> --- 179,185 ----
>> -- Commit table creation
>> COMMIT PREPARED 'regress-one';
>> \d pxtest2
>> ! ERROR: cache lookup failed for relation 27240
>> SELECT * FROM pxtest2;
>> a
>> ---

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: windows regression failure - prepared xacts
Date: 2005-07-13 23:39:20
Message-ID: 42D5A628.6060604@dunslane.net


Further (anecdotal) data point: I have usually seen this after doing a
number of builds. Rebooting seems to cure the problem (and that's
happened today again - I have just seen 2 builds work). Maybe some sort
of strange shmem corruption?

cheers

andrew

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>I never got a reply to this, but I am still seeing it from time to time
>>- twice today in fact. Any suggestions?
>
>I've been puzzled by that too. It seems to indicate that the syscache
>inval message that the COMMIT should send is either not getting sent at
>all, or is being processed too late. Neither of these ideas seems very
>promising, especially considering that we're looking at a single backend
>as both source and recipient of the message --- a race condition doesn't
>seem credible. And there's nothing very platform-specific in that code
>either. (I tried for a while to explain it as some kind of deficiency
>in the signal emulation we use on Windows, but there are no signals used
>for normal sinval processing, so that doesn't seem to hold water.)
>
>Are we sure that it only happens on Windows? Anyone else seen a similar
>failure in the prepared_xacts test?
>
>>>*** ./expected/prepared_xacts.out Thu Jul 7 09:55:18 2005
>>>--- ./results/prepared_xacts.out Thu Jul 7 10:20:37 2005
>>>***************
>>>*** 179,189 ****
>>>-- Commit table creation
>>>COMMIT PREPARED 'regress-one';
>>>\d pxtest2
>>>! Table "public.pxtest2"
>>>! Column | Type | Modifiers
>>>! --------+---------+-----------
>>>! a | integer |
>>>!
>>>SELECT * FROM pxtest2;
>>>a
>>>---
>>>--- 179,185 ----
>>>-- Commit table creation
>>>COMMIT PREPARED 'regress-one';
>>>\d pxtest2
>>>! ERROR: cache lookup failed for relation 27240
>>>SELECT * FROM pxtest2;
>>>a
>>>---
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: windows regression failure - prepared xacts
Date: 2005-07-14 01:31:39
Message-ID: 27766.1121304699@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> further (anecdotal) data point: I have usually seen this after doing a
> number of builds. Rebooting seems to cure the problem (and that's
> happened today again - I have just seen 2 builds work). Maybe some sort
> of strange shmem corruption?

Hmmm ... that still doesn't make any sense, given that the test is
being run on a freshly started postmaster. Unless it's a hardware
problem? Have you seen this on more than one machine?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: windows regression failure - prepared xacts
Date: 2005-07-14 12:35:18
Message-ID: 42D65C06.9060702@dunslane.net

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>further (anecdotal) data point: I have usually seen this after doing a
>>number of builds. Rebooting seems to cure the problem (and that's
>>happened today again - I have just seen 2 builds work). Maybe some sort
>>of strange shmem corruption?
>
>Hmmm ... that still doesn't make any sense, given that the test is
>being run on a freshly started postmaster. Unless it's a hardware
>problem? Have you seen this on more than one machine?
>
No :-( But I find it hard to believe that a hardware failure would lead
to this precise error repeatedly. Stranger things have happened, I guess.

On a related note, we need more Windows boxes on the buildfarm - ideally
living in a data center somewhere so we can automate builds, rather than
relying on my laptop and Jim's Windows box, which seems to build only
intermittently.

cheers

andrew