race condition for drop schema cascade?

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: race condition for drop schema cascade?
Date: 2004-12-15 21:05:36
Message-ID: 41C0A720.9050803@dunslane.net


I have seen this failure several times, but not consistently, on the
buildfarm member otter (Debian/MIPS) and possibly on others, and am
wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.

================== pgsql.30167/src/test/regress/regression.diffs ===================
*** ./expected/tablespace.out Sat Dec 11 13:05:32 2004
--- ./results/tablespace.out Sat Dec 11 14:35:24 2004
***************
*** 35,37 ****
--- 35,38 ----
NOTICE: drop cascades to table testschema.foo
-- Should succeed
DROP TABLESPACE testspace;
+ ERROR: tablespace "testspace" is not empty

======================================================================

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-15 21:29:01
Message-ID: 9085.1103146141@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I have seen this failure several times, but not consistently, on the
> buildfarm member otter (Debian/MIPS) and possibly on others, and am
> wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.

Hard to see what, considering that there's only one backend touching
that tablespace in the test. I'd be inclined to wonder if there's
a filesystem-level problem on that platform. What filesystem are you
running on anyway?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 03:12:56
Message-ID: 41D220B8.3080403@dunslane.net

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>
>>I have seen this failure several times, but not consistently, on the
>>buildfarm member otter (Debian/MIPS) and possibly on others, and am
>>wondering if it indicates a possible race condition on DROP SCHEMA CASCADE.
>>
>>
>
>Hard to see what, considering that there's only one backend touching
>that tablespace in the test. I'd be inclined to wonder if there's
>a filesystem-level problem on that platform. What filesystem are you
>running on anyway?
>
>

I have just seen this error again, this time on Cygwin. I did a trawl through the buildfarm history looking for other occurrences and found it happening on many platforms:

pgbuildfarm=# select name, operating_system, stage, count from buildsystems b, (select sysname, stage, count(*) as count from build_status where log ~ 'tablespace "testspace" is not empty' group by sysname, stage) as s where s.sysname=b.name;

   name    | operating_system |    stage     | count
-----------+------------------+--------------+-------
 spoonbill | OpenBSD          | Check        |     2
 lionfish  | Linux            | Check        |     9
 kudu      | Solaris          | InstallCheck |     1
 kudu      | Solaris          | Check        |     5
 emu       | OpenBSD          | Check        |   137
 loris     | Windows          | Check        |     2
 gibbon    | Cygwin           | InstallCheck |     1
 panda     | Linux Debian     | Check        |     3
 otter     | Debian Linux     | Check        |     2
 hare      | Debian Linux     | Check        |     3
 dog       | Fedora Core      | Check        |    17
 fantail   | Linux            | Check        |     3
 osprey    | NetBSD           | Check        |    15
(13 rows)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 07:53:19
Message-ID: 7217.1104306799@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I have just seen this error again, this time on Cygwin. I did a trawl through the buildfarm history looking for other occurrences and found it happening on many platforms:

[ yawning... ] I've got to go to bed now, but so far tonight my Fedora
Core 3 machine has completed 314 iterations of "make check" on CVS tip
with no such error. So whatever this is, there must be some
platform-specific issue involved...

regards, tom lane


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 12:45:04
Message-ID: 20041229124504.GA24680@ping.be

> pgbuildfarm=# select name, operating_system, stage, count from buildsystems
> b, (select sysname, stage, count(*) as count from build_status where log ~
> 'tablespace "testspace" is not empty' group by sysname, stage) as s where
> s.sysname=b.name;

Note that the expected log has that as an error message after a
"drop tablespace testspace;", while it should work with a
"drop tablespace testspace cascade;".

How many of those errors are because of some other error? dog,
for instance, ran out of disk space recently and had those in
the logs. I know panda also once ran out of disk space, but the
logs for that aren't available on the site anymore.

When was the last time this error actually happened? Looking at
emu (which seems to have it the most) shows that its last 30
builds were all successful.

PS: It might be nice to have an option to keep the last X days of
all logs around.

Kurt


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Kurt Roeckx <Q(at)ping(dot)be>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 13:16:04
Message-ID: 41D2AE14.20306@dunslane.net

Kurt Roeckx wrote:

>>pgbuildfarm=# select name, operating_system, stage, count from buildsystems
>>b, (select sysname, stage, count(*) as count from build_status where log ~
>>'tablespace "testspace" is not empty' group by sysname, stage) as s where
>>s.sysname=b.name;
>>
>>
>
>Note that the expected log has that as an error message after a
>"drop tablespace testspace;", while it should work with a
>"drop tablespace testspace cascade;".
>
>How many of those errors are because of some other error? dog,
>for instance, ran out of disk space recently and had those in
>the logs. I know panda also once ran out of disk space, but the
>logs for that aren't available on the site anymore.
>
>When was the last time this error actually happened? Looking at
>emu (which seems to have it the most) shows that its last 30
>builds were all successful.
>
>
You're right - my query was not sufficiently specific. There have in
fact been 4 failures:

pgbuildfarm=# select sysname, snapshot, stage, branch from build_status
where log ~ 'tablespace "testspace" is not empty.*tablespace "testspace" is not empty'
and not log ~ 'No space left';
 sysname |      snapshot       |    stage     | branch
---------+---------------------+--------------+--------
 hare    | 2004-12-09 05:15:05 | Check        | HEAD
 otter   | 2004-12-11 15:50:09 | Check        | HEAD
 otter   | 2004-12-15 15:50:10 | Check        | HEAD
 gibbon  | 2004-12-28 23:55:05 | InstallCheck | HEAD

gibbon is a Cygwin box, otter and hare are both Debian 3.1 boxes, hare
on Alpha and otter on MIPS.
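
For readers following the filter logic in the refined query: the expected test output already contains the "is not empty" error once, so only logs where it appears at least twice, and where the disk did not fill up, count as real failures. A rough Python sketch of the same test follows. The function name and the sample log strings are made up for illustration and are not taken from the buildfarm. `re.DOTALL` is used on the assumption that `.` in a PostgreSQL `~` match also crosses newlines by default.

```python
import re

ERR = 'tablespace "testspace" is not empty'

# The error must occur at least twice: once where the test expects it,
# and once more where DROP TABLESPACE should have succeeded.
TWICE = re.compile(re.escape(ERR) + ".*" + re.escape(ERR), re.DOTALL)


def is_unexpected_failure(log: str) -> bool:
    """Mirror the refined query: error seen twice, and no out-of-disk message."""
    return bool(TWICE.search(log)) and "No space left" not in log


# Hypothetical log snippets, for illustration only.
expected_only = f"DROP TABLESPACE testspace;\nERROR: {ERR}\n"
real_failure = expected_only + f"DROP TABLESPACE testspace;\n+ ERROR: {ERR}\n"
disk_full = real_failure + "No space left on device\n"

print(is_unexpected_failure(expected_only))  # error appears once
print(is_unexpected_failure(real_failure))   # error appears twice
print(is_unexpected_failure(disk_full))      # excluded by the disk-space check
```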

>
>PS: It might be nice to have an option to keep the last X days of
>all logs around.
>
>
>
>

You mean on the client? I'd rather not - the logs kept there are mostly
intended as debugging devices. The buildfarm db keeps the log from the
stage where an error occurred indefinitely. I intend to provide a way of
going back through that history - at the moment you can easily see the
last 30.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 17:26:56
Message-ID: 11170.1104341216@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> You're right - my query was not sufficiently specific. There have in
> fact been 4 failures:

> pgbuildfarm=# select sysname, snapshot, stage, branch from build_status
> where log ~ 'tablespace "testspace" is not empty.*tablespace "testspace"
> is not empty' and not log ~ 'No space left';
> sysname | snapshot | stage | branch
> --------+---------------------+--------------+--------
> hare | 2004-12-09 05:15:05 | Check | HEAD
> otter | 2004-12-11 15:50:09 | Check | HEAD
> otter | 2004-12-15 15:50:10 | Check | HEAD
> gibbon | 2004-12-28 23:55:05 | InstallCheck | HEAD

Why does the last show as an "install" failure?

Anyway, given the small number of machines involved, I'm once again
wondering what filesystem they are using. They wouldn't be running
the check over NFS, by any chance, for instance?

The theory that is in my mind is that the bgwriter could have written
out a page for the table in the test tablespace, and thereby be holding
an open file pointer for it. On standard Unix filesystems this would
not disrupt the backend's ability to unlink the table at the DROP stage,
but I'm wondering about nonstandard filesystems ...

regards, tom lane
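
The behaviour this theory relies on can be checked with a small Python sketch. It is not PostgreSQL code: the directory stands in for the tablespace, the file name `16384` is a made-up relation file, and the open descriptor plays the role of the bgwriter. On a local POSIX filesystem, unlinking a file that another descriptor still holds open removes its directory entry at once, so the directory can then be removed. On a filesystem without those semantics the entry can linger or the unlink can fail, and the directory stays non-empty. NFS clients, for example, typically rename such a file to a `.nfs...` placeholder until it is closed.

```python
import os
import shutil
import tempfile


def drop_with_open_handle() -> bool:
    """Return True if the directory can be removed while an unlinked file is still open."""
    tablespace_dir = tempfile.mkdtemp(prefix="testspace_")  # stand-in for the tablespace
    table_path = os.path.join(tablespace_dir, "16384")      # hypothetical relation file

    # Stand-in for the bgwriter: write one page and keep the descriptor open.
    writer_fd = os.open(table_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        os.write(writer_fd, b"\0" * 8192)
        # Stand-in for the backend: drop the table file, then the tablespace directory.
        os.unlink(table_path)
        os.rmdir(tablespace_dir)  # raises OSError if anything is left behind
        return True
    except OSError:
        return False
    finally:
        os.close(writer_fd)
        shutil.rmtree(tablespace_dir, ignore_errors=True)


result = drop_with_open_handle()
print("directory removed while file was still open:", result)
```

A `False` result on a given machine would be consistent with the "tablespace is not empty" symptom, though it would not by itself show that the bgwriter is the process holding the file.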


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 18:05:26
Message-ID: 41D2F1E6.7040303@dunslane.net

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>
>>You're right - my query was not sufficiently specific. There have in
>>fact been 4 failures:
>>
>>
>
>
>
>>pgbuildfarm=# select sysname, snapshot, stage, branch from build_status
>>where log ~ 'tablespace "testspace" is not empty.*tablespace "testspace"
>>is not empty' and not log ~ 'No space left';
>> sysname | snapshot | stage | branch
>> --------+---------------------+--------------+--------
>> hare | 2004-12-09 05:15:05 | Check | HEAD
>> otter | 2004-12-11 15:50:09 | Check | HEAD
>> otter | 2004-12-15 15:50:10 | Check | HEAD
>> gibbon | 2004-12-28 23:55:05 | InstallCheck | HEAD
>>
>>
>
>Why does the last show as an "install" failure?
>
>

We run the standard regression suite twice - the failure on Gibbon
occurred on the second of these. Clearly this is very transient.

>Anyway, given the small number of machines involved, I'm once again
>wondering what filesystem they are using. They wouldn't be running
>the check over NFS, by any chance, for instance?
>
>The theory that is in my mind is that the bgwriter could have written
>out a page for the table in the test tablespace, and thereby be holding
>an open file pointer for it. On standard Unix filesystems this would
>not disrupt the backend's ability to unlink the table at the DROP stage,
>but I'm wondering about nonstandard filesystems ...
>
>
>

Jim Buttafuoco reported on December 16th that he had rebuilt the
filesystem on his MIPS box - I assume this means that he isn't using
NFS. In any case, we have not seen the problem since then. His Alpha box
has not been reporting buildfarm results since before then.

The Cygwin box is running on NTFS - and we know we've encountered plenty
of problems with unlinking on Windows.

I know it's not much to go on.

cheers

andrew


From: "Jim Buttafuoco" <jim(at)contactbda(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 20:47:28
Message-ID: 20041229204422.M98299@contactbda.com

Andrew/all

I have not seen any problems on my MIPS systems since rebuilding the ext3 filesystem (I ran badblocks during fs creation). I should
have the Alpha running again soon; the disk died and I am waiting for a replacement. I do believe there is a floating
point problem with older Alphas out there. They seem to have a problem with INFINITY and NaNs. I did some checking
on the net and the problem seems to be known (with no solution). Maybe something can go into the README or such. If anyone
is interested in looking at this for > pg8.0 I can give SSH access in a week or so.

Jim



From: "Jim Buttafuoco" <jim(at)contactbda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 20:49:27
Message-ID: 20041229204848.M34598@contactbda.com

Tom,

My systems are all ext3 (Debian 3.1); Andrew can tell you which ones they are.

Jim



From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: jim(at)contactbda(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2004-12-29 21:04:25
Message-ID: 41D31BD9.1030809@dunslane.net

Jim Buttafuoco wrote:

>Andrew/all
>
>I have not seen any problems on my MIPS systems since rebuilding the ext3 filesystem (I ran badblocks during fs creation). I should
>have the Alpha running again soon; the disk died and I am waiting for a replacement. I do believe there is a floating
>point problem with older Alphas out there. They seem to have a problem with INFINITY and NaNs. I did some checking
>on the net and the problem seems to be known (with no solution). Maybe something can go into the README or such. If anyone
>is interested in looking at this for > pg8.0 I can give SSH access in a week or so.
>
>
>
>
>

I doubt that either of the problems (FP on old Alpha or failing 'drop
schema cascade + drop tablespace') is a showstopper. Maybe the
'platforms supported' notes should carry a mention.

cheers

andrew


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kurt Roeckx <Q(at)ping(dot)be>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: race condition for drop schema cascade?
Date: 2005-01-04 04:08:54
Message-ID: 200501040408.j0448sS00381@candle.pha.pa.us


Did this get resolved as an OS file system issue?


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073