Cause of occasional buildfarm failures in sequence test

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cause of occasional buildfarm failures in sequence test
Date: 2008-08-31 17:17:51
Message-ID: 11009.1220203071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The current result from buildfarm member pigeon
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=pigeon&dt=2008-08-31%2008:04:54
shows a symptom I've noticed a few times before: everything is fine
except that one "select * from sequence" shows log_cnt = 31 instead
of the expected 32. This is a bit odd since no other backend is
touching that sequence, so you'd expect perfectly reproducible results.

I finally got around to looking through the sequence code to see if I
could figure out what was happening, and indeed I did. The relevant
parts of the test are basically
create sequence foo;
select nextval('foo');
select nextval('foo');
select * from foo;
and you can reproduce the unexpected result if you insert a checkpoint
command between the create and the first nextval. So the buildfarm
failures are due to chance occurrences of a bgwriter-driven checkpoint
occurring just there during the test.

In the normal execution of this test, the CREATE SEQUENCE leaves the
sequence in a state where one nextval() can be done "for free", without
emitting a new WAL record. So the second nextval() is the one that
emits a WAL record and pushes log_cnt up to 32. But if a checkpoint
intervenes, then the first nextval() decides it'd better emit the WAL
record (cf lines 500ff in sequence.c as of CVS HEAD), so it pushes
log_cnt up to 32, and then the second one decreases it to 31.

So unless we want to just live with this test failing occasionally,
it seems we have two choices: redesign the behavior of nextval()
to be insensitive to checkpoint timing, or provide an alternate
regression "expected" file that matches the result with log_cnt = 31.
I favor the second answer --- I don't want to touch the nextval
logic, which has been stable for over six years.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hannu Krosing 2008-08-31 17:34:10 Re: Cause of occasional buildfarm failures in sequence test
Previous Message Simon Riggs 2008-08-31 09:27:47 rmgr hooks and contrib/rmgr_hook