Bizarre buildfarm failure on baiji: can't find pg_class_oid_index

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Bizarre buildfarm failure on baiji: can't find pg_class_oid_index
Date: 2010-02-24 21:23:59
Message-ID: 29914.1267046639@sss.pgh.pa.us
Lists: pgsql-hackers

There is a failure report here that seems worthy of notice:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=baiji&dt=2010-02-23%2023:00:03

Everything is fine except that the session running the "triggers" test
went completely bonkers: the first few commands are okay, and then
after that everything fails with

ERROR: could not find pg_class tuple for index 2662

2662 is pg_class_oid_index, so this basically means that relcache loads
are failing, which makes it unsurprising that nothing works.
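
For anyone scanning regression output for the same symptom, here is a
minimal sketch that pulls the index OID out of each such error line and
labels it where the name is known. The helper name failed_index_oids and
the lookup table are hypothetical; only the 2662 entry comes from this
report. (In a healthy session, "SELECT 2662::regclass;" gives the same
answer.)

```python
import re

# Only the 2662 entry is taken from the report above; add others as needed.
KNOWN_INDEX_OIDS = {2662: "pg_class_oid_index"}

# Matches the error text quoted above, tolerating variable spacing after "ERROR:".
ERROR_RE = re.compile(r"ERROR:\s+could not find pg_class tuple for index (\d+)")


def failed_index_oids(log_text):
    """Return (oid, name) pairs for each relcache-load error in log_text."""
    results = []
    for match in ERROR_RE.finditer(log_text):
        oid = int(match.group(1))
        # Fall back to "unknown" for OIDs not in the hand-written table.
        results.append((oid, KNOWN_INDEX_OIDS.get(oid, "unknown")))
    return results


if __name__ == "__main__":
    sample = "ERROR:  could not find pg_class tuple for index 2662\n"
    print(failed_index_oids(sample))
```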

However, the concurrent test sessions are okay, and so are the following
ones except for some diffs arising from the trigger test's failure to
add or remove some tables once it had got broken. So only the one
backend went nuts. This leads to the conclusion that either the
breakage was purely internal to that backend's relcache, or else there
was a very transient broken state in the system catalogs that no other
backend was unlucky enough to see.

The triggers test runs in parallel with the vacuum test, which as of
a few weeks ago executes a "VACUUM FULL pg_class". My gut tells me
that that's related somehow, but on the other hand maybe this was just
random cosmic rays or something. We are talking about a Vista machine
after all.

I don't recall having seen anything like this elsewhere. Does anyone
else?

BTW, this is a good example of why letting the buildfarm sit broken
for any length of time is a bad idea: I very nearly didn't see this
failure at all, because I assumed it was just another instance of the
fsync-related issues. Maybe we should have a stricter policy about
backing out known-broken patches rather than letting them pollute
buildfarm results for a while.

regards, tom lane
