Re: Isolation tests still falling over routinely

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <alvherre(at)commandprompt(dot)com>,<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Isolation tests still falling over routinely
Date: 2011-09-21 01:51:39
Message-ID: 4E78FCDB0200002500041455@gw.wicourts.gov

Tom Lane wrote:

> The buildfarm is still showing isolation test failures more days
> than not, eg
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=pika&dt=2011-09-17%2012%3A43%3A11
> and I've personally seen such failures when testing with
> CLOBBER_CACHE_ALWAYS. Could we please fix those tests to not have
> such fragile timing assumptions?

I went back over two months, and only found one failure related to an
SSI test, and that was because the machine ran out of disk space.
There should never be any timing-related failures on the SSI tests,
as there is no blocking or deadlocking.

If you have seen any failures on isolation tests other than the fk-*
tests, I'd be very interested in details.

The rest are not related to SSI; they test deadlock conditions related
to foreign keys. My only involvement with these was to provide
alternate result files for the REPEATABLE READ and SERIALIZABLE
isolation levels. (I test the installcheck-world target and the
isolation tests in those modes frequently, and the fk-deadlock tests
were failing every time at those levels.)

If I remember right, Alvaro chose these timings to balance run time
against chance of failure. Unless we want to remove these deadlock
handling tests or ignore failures (which both seem like bad ideas to
me), I think we need to bump the long timings by an order of
magnitude and just concede that those tests run for a while.

-Kevin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Isolation tests still falling over routinely
Date: 2011-09-21 02:04:45
Message-ID: 1316570348-sup-317@alvh.no-ip.org


Excerpts from Kevin Grittner's message of Tue Sep 20 22:51:39 -0300 2011:

> If I remember right, Alvaro chose these timings to balance run time
> against chance of failure. Unless we want to remove these deadlock
> handling tests or ignore failures (which both seem like bad ideas to
> me), I think we need to bump the long timings by an order of
> magnitude and just concede that those tests run for a while.

The main problem I have is that I haven't found a way to reproduce the
problems on my machine. I was playing with modifying the way the error
messages are reported, but that ended up unfinished in a local branch.

I'll give it a go once more and see if I can commit so that buildfarm
tells us if it works or not.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Isolation tests still falling over routinely
Date: 2011-09-21 03:42:14
Message-ID: 13028.1316576534@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> The main problem I have is that I haven't found a way to reproduce the
> problems on my machine.

Try -DCLOBBER_CACHE_ALWAYS.

regards, tom lane