Intermittent regression test failures from index-only plan changes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Intermittent regression test failures from index-only plan changes
Date: 2012-01-06 22:25:15
Message-ID: 24785.1325888715@sss.pgh.pa.us
Lists: pgsql-hackers

Another regression test failure that I've been seeing lately is a change
from index-only scan to seqscan in create_index, as for instance here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2012-01-02%2023%3A05%3A02
I've managed to duplicate and debug this one too. What I find is
that the planner switches from preferring index-only scan to preferring
seqscan because normally it sees the table as being fully all-visible
after the immediately preceding VACUUM, but in the failure cases the
table is seen as having no all-visible pages. That inflates the
estimated cost of an index-only scan by enough to make it look like a
loser. The create_index test is usually running by itself at this point
(since the concurrently started create_view test is much shorter), but
I had autovacuum logging on and found that there was a concurrent
auto-ANALYZE on another table when the manual VACUUM was running.
So failure to set the all-visible flags is expected given that context.
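To illustrate why losing the all-visible flags can flip the plan choice, here is a deliberately simplified toy cost model. It is not the planner's actual costing code: the function names, the table and index sizes, and the formula shapes are all assumptions made for illustration. Only the general idea comes from the message above, namely that heap pages not marked all-visible must still be visited by an index-only scan. The cost constants are set to the default values of the same-named PostgreSQL settings.

```python
import math

# Defaults of the same-named PostgreSQL cost settings.
SEQ_PAGE_COST = 1.0
RANDOM_PAGE_COST = 4.0
CPU_TUPLE_COST = 0.01
CPU_INDEX_TUPLE_COST = 0.005


def seqscan_cost(rel_pages, rel_tuples):
    """Toy cost of reading every heap page and processing every tuple."""
    return rel_pages * SEQ_PAGE_COST + rel_tuples * CPU_TUPLE_COST


def index_only_scan_cost(index_pages, heap_pages_touched, index_tuples,
                         all_visible_fraction):
    """Toy cost of an index-only scan.

    Heap pages that are not marked all-visible still have to be fetched
    to check tuple visibility, so only the all-visible share is skipped.
    """
    heap_fetches = math.ceil(heap_pages_touched * (1.0 - all_visible_fraction))
    return (index_pages * RANDOM_PAGE_COST
            + heap_fetches * RANDOM_PAGE_COST
            + index_tuples * CPU_INDEX_TUPLE_COST)


def cheaper_plan(all_visible_fraction):
    """Pick the cheaper plan for a hypothetical table (sizes are made up)."""
    seq = seqscan_cost(rel_pages=1000, rel_tuples=100_000)
    ios = index_only_scan_cost(index_pages=50, heap_pages_touched=500,
                               index_tuples=10_000,
                               all_visible_fraction=all_visible_fraction)
    return "index-only scan" if ios < seq else "seqscan"


# Compare the case where VACUUM set every flag with the case where it set none.
for fraction in (1.0, 0.0):
    print(fraction, cheaper_plan(fraction))
```

With these made-up sizes the index-only scan wins comfortably when every page is all-visible and loses once none are, which is the kind of flip the failing EXPLAIN output shows.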

This is a bit troublesome because, AFAICS, this means that every single
one of the fifty-odd regression test cases that expect to see an Index
Only Scan plan might transiently fail, if there happens to be a
background auto-ANALYZE running at just the moment that the previous
vacuum would've otherwise set the all-visible bits. It might be that
all the other ones are safe in practice for various reasons, but even
if they are there's no guarantee that new regression tests added in
future will be reliable.

Background auto-VACUUMs shouldn't cause this problem because they don't
take snapshots, and ideally it'd be nice if auto-ANALYZE couldn't create
the issue either, but ANALYZE does need a snapshot so it's hard to see
how to avoid having it trip the all-visible logic. Anybody have any
ideas?
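As a rough sketch of the interaction described above, the following toy model shows how one open snapshot can keep VACUUM from marking a page all-visible. It assumes the usual rule that a page qualifies only when every tuple on it was inserted by a transaction older than the oldest XID that any open snapshot might still treat as running. The function names and XID values are hypothetical, and XID wraparound is ignored.

```python
def oldest_xmin(next_xid, open_snapshot_xmins):
    """Oldest XID that some open snapshot may still see as in progress.

    With no open snapshots, everything before next_xid is settled.
    """
    return min(open_snapshot_xmins, default=next_xid)


def page_all_visible(tuple_xmins, horizon):
    """A page can be flagged all-visible only if every tuple's inserting
    XID is older than the horizon (toy rule, no wraparound handling)."""
    return all(xmin < horizon for xmin in tuple_xmins)


# Hypothetical scenario: the test table was loaded by XID 100.
page = [100, 100, 100]
next_xid = 105

# VACUUM running alone: nothing holds the horizon back.
quiet = page_all_visible(page, oldest_xmin(next_xid, []))

# VACUUM running while an ANALYZE on another table holds a snapshot
# whose xmin is 100: the horizon stays at 100 and the flag is not set.
busy = page_all_visible(page, oldest_xmin(next_xid, [100]))

print(quiet, busy)
```

In this model the table being analyzed does not matter; holding any snapshot that is old enough is sufficient to hold back the horizon for the table being vacuumed.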

If nothing else comes to mind I'll probably just remove the
sometimes-failing EXPLAIN test case, but I'm worried about the prospects
of future failures of the same ilk.

regards, tom lane
