bug at build_dummy_tuple

Lists: pgsql-bugs
From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: bug at build_dummy_tuple
Date: 2004-12-08 06:40:51
Message-ID: 20041208064050.GA16896@surnet.cl

People,

This is a weird bug. In a freshly initialized database, or just after
deleting the pg_internal.init relcache file,

SELECT 16854::regclass;

crashes the backend. (Apparently any Oid that does not belong to an
existing relation does the trick.) The following assertion fails:

TRAP: FailedAssertion("!(((ntp)->t_data)->t_infomask & 0x0010)", File: "/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/cache/catcache.c", Line: 1729)

The problem is that build_dummy_tuple() is calling HeapTupleSetOid(),
which complains, apparently because the cache believes the pg_class
relation does not have Oids. That belief is wrong, but the cache has no
way of knowing. In fact, next time through, once the relcache has been
built, all is well. I can attest that the cache has wrong info, because
gdb shows:

(gdb) print *cache->cc_tupdesc
$15 = {natts = 25, attrs = 0x839d5d4, constr = 0x839cb3c, tdtypeid = 2249,
tdtypmod = -1, tdhasoid = 0 '\0'}
(gdb) print cache->cc_relname
$16 = 0x8266f7a "pg_class"

I don't know what the fix for this should look like ...

This doesn't seem to happen on 7.4.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The wise man speaks because he has something to say;
the fool, because he has to say something" (Plato).


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: bug at build_dummy_tuple
Date: 2004-12-12 04:13:18
Message-ID: 22602.1102824798@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> This is a weird bug. In a freshly initialized database, or just after
> deleting the pg_internal.init relcache file,
> SELECT 16854::regclass;
> crashes the backend.

Ah, I see it. Looks like I was a bit too cute here:
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/cache/relcache.c.diff?r1=1.200;r2=1.201;f=h
in particular the change within formrdesc at about line 1316.

I was thinking that formrdesc needn't get the relcache's tuple
descriptor (rel->rd_att) completely right, since it would get fixed up
during RelationCacheInitializePhase2. However, that routine uses
SearchSysCache(RELOID), which means catcache.c will have to initialize
that catalog cache on first call, and when it does so, it copies the
not-yet-fully-valid tupdesc for pg_class from the relcache into the
catcache entry. Any subsequent code path that looks at the tdtypeid or
tdhasoid fields of the RELOID catcache's tupdesc will see wrong data.
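
The ordering problem described above can be sketched as a small toy model
in C. All names here (ToyTupleDesc, toy_formrdesc, toy_phase2_fixup and so
on) are hypothetical stand-ins, not the real relcache or catcache APIs. The
value 2249 is the tdtypeid seen in the gdb output; 83 is used as an
illustrative "correct" row type for pg_class and should be treated as an
assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PLACEHOLDER_ROWTYPE 2249  /* tdtypeid seen in the gdb output */
#define TOY_PG_CLASS_ROWTYPE    83    /* illustrative "correct" value (assumption) */

/* Only the two descriptor fields discussed in this thread. */
typedef struct ToyTupleDesc {
    unsigned tdtypeid;
    bool     tdhasoid;
} ToyTupleDesc;

typedef struct ToyRelcacheEntry {
    ToyTupleDesc rd_att;
} ToyRelcacheEntry;

typedef struct ToyCatCache {
    bool         initialized;
    ToyTupleDesc cc_tupdesc;   /* private copy taken at first use */
} ToyCatCache;

/* Bootstrap step: fills the descriptor with placeholder values. */
static void toy_formrdesc(ToyRelcacheEntry *rel)
{
    rel->rd_att.tdtypeid = TOY_PLACEHOLDER_ROWTYPE;
    rel->rd_att.tdhasoid = false;
}

/* First lookup: the catalog cache copies whatever the relcache holds. */
static void toy_catcache_init(ToyCatCache *cache, const ToyRelcacheEntry *rel)
{
    assert(cache != NULL && rel != NULL);
    if (!cache->initialized) {
        cache->cc_tupdesc = rel->rd_att;
        cache->initialized = true;
    }
}

/* Later fix-up: corrects the relcache entry, but not existing copies. */
static void toy_phase2_fixup(ToyRelcacheEntry *rel)
{
    rel->rd_att.tdtypeid = TOY_PG_CLASS_ROWTYPE;
    rel->rd_att.tdhasoid = true;
}

/* True if the cache's copy no longer matches the relcache entry. */
static bool toy_catcache_is_stale(const ToyCatCache *cache,
                                  const ToyRelcacheEntry *rel)
{
    return cache->cc_tupdesc.tdtypeid != rel->rd_att.tdtypeid ||
           cache->cc_tupdesc.tdhasoid != rel->rd_att.tdhasoid;
}
```

Calling toy_formrdesc(), then toy_catcache_init(), then toy_phase2_fixup()
leaves the relcache entry corrected while the cache's copy still carries
the placeholder values, which is the state the gdb session captured.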

The reason it didn't crash in 7.4 was that the 7.4 coding forces the
hasoids bit true rather than false, which is no more "correct" than CVS
tip, but it happens to be right for pg_class which is the only case that
presently will be examined before RelationCacheInitializePhase2 fixes
everything. I saw that the code was not setting the bit correctly and
wrongly assumed that it was therefore a don't-care :-(

The proper solution is to make sure that formrdesc can fill the tupdesc
completely correctly; that's just a matter of adding a couple more
parameters to it, since it's only used for a small set of nailed
relations. Will fix.
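
One possible shape for such a change, sketched as self-contained C rather
than as the actual patch: the caller passes the row type and the has-OIDs
flag explicitly, so the descriptor is right from the start. The struct and
function names and the parameter order are hypothetical; 25 is the natts
value from the gdb output, and 83 is an illustrative row type for pg_class.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified descriptor and relation entry. */
typedef struct SketchTupleDesc {
    int      natts;
    unsigned tdtypeid;
    bool     tdhasoid;
} SketchTupleDesc;

typedef struct SketchRelation {
    char            relname[64];
    SketchTupleDesc rd_att;
} SketchRelation;

/* Bootstrap a nailed relation's descriptor from caller-supplied facts
 * instead of placeholders that a later phase must correct. */
static void sketch_formrdesc(SketchRelation *rel, const char *relname,
                             unsigned rowtype_oid, bool hasoids, int natts)
{
    assert(rel != NULL && relname != NULL);

    /* Copy the name with guaranteed NUL termination. */
    strncpy(rel->relname, relname, sizeof(rel->relname) - 1);
    rel->relname[sizeof(rel->relname) - 1] = '\0';

    rel->rd_att.natts = natts;
    rel->rd_att.tdtypeid = rowtype_oid;  /* no longer a placeholder */
    rel->rd_att.tdhasoid = hasoids;      /* no longer forced true or false */
}
```

A call such as sketch_formrdesc(&rel, "pg_class", 83, true, 25) would then
give any early copy of the descriptor the correct tdtypeid and tdhasoid.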

regards, tom lane