Broken HOT chains in system catalogs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Broken HOT chains in system catalogs
Date: 2011-04-15 19:01:04
Message-ID: 28664.1302894064@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been able to reproduce the syndrome reported here and here:
http://archives.postgresql.org/pgsql-performance/2011-02/msg00497.php
http://archives.postgresql.org/pgsql-general/2011-04/msg00487.php

It's a bit tricky to do it reliably, but once you get into the right
state it's reproducible. You need an open transaction sitting in
another session (I used "begin isolation level serializable;
select * from pg_enum;"), and then mix around the following commands

vacuum full pg_class;
reindex table pg_class;
reindex table pg_index;
vacuum full pg_index;

until you get a failure. I added some debugging elog's to the code
while I was chasing this, so my psql terminal window looks like this:

vft=# vacuum full pg_class;
VACUUM
vft=# reindex table pg_class;
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/30
WARNING: setting indcheckxmin for 2662 in table 2610 fn 65231 tuple 1/52
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/30
WARNING: broken HOT chain at DELETE_IN_PROGRESS tuple 6/31
WARNING: setting indcheckxmin for 2663 in table 2610 fn 65231 tuple 2/1
REINDEX
vft=# reindex table pg_index;
WARNING: broken HOT chain at RECENTLY_DEAD tuple 2/1
WARNING: setting indcheckxmin for 2678 in table 2610 fn 65231 tuple 1/47
WARNING: broken HOT chain at RECENTLY_DEAD tuple 2/1
WARNING: setting indcheckxmin for 2679 in table 2610 fn 65231 tuple 1/43
REINDEX
vft=# vacuum full pg_index;
ERROR: duplicate key value violates unique constraint "pg_index_indexrelid_index"
DETAIL: Key (indexrelid)=(2678) already exists.
vft=#

The failure is here:

#0 errfinish (dummy=0) at elog.c:369
#1 0x1f2b94 in _bt_check_unique (rel=0x400feb00, itup=0x4016e270,
heapRel=0x40101bb0, buf=205, offset=43, itup_scankey=0x40171040,
checkUnique=UNIQUE_CHECK_YES, is_unique=0x7b03c2b0 "\001")
at nbtinsert.c:404
#2 0x1f2698 in _bt_doinsert (rel=0x400feb00, itup=0x4016e270,
checkUnique=UNIQUE_CHECK_YES, heapRel=0x40101bb0) at nbtinsert.c:161
#3 0x1fa224 in btinsert (fcinfo=0x4005254c) at nbtree.c:229
#4 0x4fd6dc in FunctionCall6 (flinfo=0x0, arg1=0, arg2=0, arg3=0,
arg4=1075232940, arg5=1074797488, arg6=1) at fmgr.c:1417
#5 0x1f0454 in index_insert (indexRelation=0x400feb00, values=0x7b03bdf0,
isnull=0x7b03be70 "", heap_t_ctid=0x4016c0ac, heapRelation=0x40101bb0,
checkUnique=UNIQUE_CHECK_YES) at indexam.c:198
#6 0x250fcc in CatalogIndexInsert (indstate=0x0, heapTuple=0x4016c0a8)
at indexing.c:135
#7 0x25105c in CatalogUpdateIndexes (heapRel=0x0, heapTuple=0x4016c0a8)
at indexing.c:161
#8 0x2506e4 in reindex_index (indexId=2678, skip_constraint_checks=1 '\001')
at index.c:2516
#9 0x250898 in reindex_relation (relid=0, toast_too=0 '\000', flags=2)
at index.c:2630
#10 0x2b0ac8 in finish_heap_swap (OIDOldHeap=2610, OIDNewHeap=65187,
is_system_catalog=0 '\000', swap_toast_by_content=0 '\000',
check_constraints=0, frozenXid=654) at cluster.c:1367
#11 0x2afa04 in rebuild_relation (OldHeap=0xfea3, indexOid=0,
freeze_min_age=-1, freeze_table_age=-1) at cluster.c:609
#12 0x2af370 in cluster_rel (tableOid=2610, indexOid=0, recheck=0 '\000',
verbose=0 '\000', freeze_min_age=-1, freeze_table_age=-1) at cluster.c:394
#13 0x3060fc in vacuum_rel (relid=2610, vacstmt=0x400647b8, do_toast=0 '\000',
for_wraparound=0 '\000', scanned_all=0x7b03b880 "") at vacuum.c:970
#14 0x30500c in vacuum (vacstmt=0x400647b8, relid=2, do_toast=1 '\001',
bstrategy=0x40025458, for_wraparound=0 '\000', isTopLevel=1)
at vacuum.c:230

What is happening is that the preceding "reindex table pg_index" set the
indcheckxmin flag for pg_index's indexes. The new table built by vacuum
full contains no HOT chains at all, let alone broken ones, so at the end
of reindex_index we decide we ought to clear the indcheckxmin flag.
But we are not done swapping the old and new indexes, so the uniqueness
check doesn't work right --- I believe it's looking into the old index
and finding a TID that means something totally different in the new
table.

I had originally been working on the theory that the problem was coming
from the code that sets indcheckxmin in index_build(), but at least in
this manifestation it's actually the code that resets the flag in
reindex_index() that is failing. But IMO they are both buggy as can be,
because they are totally failing to consider the possibility that the
table/index being hacked on is pg_index itself. There are all sorts of
conceivable failure modes there, such as inserting the updated tuple
into the old copy of pg_index instead of the new one, and any of them
might come to pass anytime someone rearranges any of the code within
hailing distance of vacuum/reindex/etc. I suspect that related failures
are possible if other system catalogs need to get consulted during the
syscache fetch, too --- if one of them is the target table, we have the
same sort of issue of executing code that isn't expecting the catalogs
to be moving underneath it.

I thought for awhile about restructuring the reindexing code so that
these indcheckxmin updates don't happen until after we're fully out of
the reindexing and heap-swapping operation. This would be vaguely
similar to the pushups that reindex_relation has to go through to
reindex pg_class successfully. However, it would be pretty messy to
pass back the relevant state to someplace where it'd be safer to fix
the flag.

On further reflection, though, none of this should be necessary because
in reality there should never be any broken HOT chains in pg_index in
the first place. On that theory, the real bug is IndexBuildHeapScan's
laziness about determining whether a dead HOT tuple actually represents
a problem. We could try to make it disassemble the tuple and check the
index fields, but that seems a bit complicated and error-prone for
something that needs to be back-patched.

So, my proposal for a fix is this: change IndexBuildHeapScan so it
never sets ii_BrokenHotChain at all if is_system_catalog is true.
This amounts to assuming that no new indexes get added to system
catalogs after initdb, or at least not during concurrent operations
wherein indcheckxmin would be important.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-04-15 19:13:04 Re: Single client performance on trivial SELECTs
Previous Message Heikki Linnakangas 2011-04-15 17:56:19 pgsql: setlocale() on Windows doesn't work correctly if the locale name