Garbage pad bytes within datums are bad news

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Garbage pad bytes within datums are bad news
Date: 2008-04-04 19:57:24
Message-ID: 7408.1207339044@sss.pgh.pa.us
Lists: pgsql-hackers

I tracked down the problem reported here:
http://archives.postgresql.org/pgsql-admin/2008-04/msg00038.php
What it boils down to is that equal() doesn't see these two Consts
as equal:

{CONST
:consttype 1009
:consttypmod -1
:constlen -1
:constbyval false
:constisnull false
:constvalue 48 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0
0 1 0 0 0 5 49 127 127 127 0 0 0 5 50 127 127 127 0 0 0 5 51
127 127 127 ]
}

{CONST
:consttype 1009
:consttypmod -1
:constlen -1
:constbyval false
:constisnull false
:constvalue 48 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0
0 1 0 0 0 5 49 0 0 0 0 0 0 5 50 0 0 0 0 0 0 5 51 0 0 0 ]
}

The datums are arrays of text, and the bytes that are different are
garbage pad bytes between array entries. Since equal() uses simple
bytewise equality (cf. datumIsEqual()), it sees the constants as unequal.
The reason the behavior is a bit erratic is that the array constructor
isn't bothering to initialize these bytes, so you might or might not
get a failure depending on what happened to be there before.
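
To make the failure concrete, here is a minimal standalone sketch in C. It is
not PostgreSQL code. It assumes the element layout visible in the dumps above
(a 4-byte length word holding 5, one data byte, then three pad bytes up to
4-byte alignment) and compares two such buffers with memcmp(), which is
roughly what a bytewise comparison amounts to. The names fill_elements(),
bytewise_equal() and compare_with_prior_contents() are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ELEM_SIZE 8   /* 4-byte length word + 1 data byte + 3 pad bytes */
#define NELEMS    3

/*
 * Write three one-character text elements ('1', '2', '3') into buf without
 * touching the pad bytes, mimicking a constructor that leaves them
 * uninitialized.  The caller decides what the buffer held beforehand.
 */
void
fill_elements(unsigned char *buf)
{
    assert(buf != NULL);
    for (int i = 0; i < NELEMS; i++)
    {
        unsigned char *elem = buf + i * ELEM_SIZE;

        /* length word as shown in the dump: 4 header bytes + 1 data byte */
        elem[0] = 0;
        elem[1] = 0;
        elem[2] = 0;
        elem[3] = 5;
        elem[4] = (unsigned char) ('1' + i);
        /* elem[5..7] are alignment padding and are deliberately not written */
    }
}

/* Plain bytewise comparison over the whole buffer, padding included */
int
bytewise_equal(const unsigned char *a, const unsigned char *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/*
 * Build two logically identical element areas on top of different prior
 * memory contents and report whether they compare equal bytewise.
 */
int
compare_with_prior_contents(int fill_a, int fill_b)
{
    unsigned char a[NELEMS * ELEM_SIZE];
    unsigned char b[NELEMS * ELEM_SIZE];

    memset(a, fill_a, sizeof(a));   /* stand-in for "what was there before" */
    memset(b, fill_b, sizeof(b));
    fill_elements(a);
    fill_elements(b);
    return bytewise_equal(a, b, sizeof(a));
}
```

Calling compare_with_prior_contents(0x7F, 0x00) reproduces the 127-versus-0
difference in the two dumps and returns 0, while
compare_with_prior_contents(0, 0) returns 1. The data bytes are the same in
both cases; only the leftover memory differs.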

Now, in large chunks of the system, a false not-equal result doesn't
cause anything worse than inefficiency, but in the particular case here
you actually get an error :-(. I'm surprised that we've not seen
something like this reported before, because this has been busted since
forever.

From a semantic point of view it would be nicer if equal() used a
type-specific equality operator to compare Datums, but that idea runs up
against the same problem we saw in connection with HOT comparison of
index-column values: how do you know which equality operator to use,
if a data type has more than one? Not to mention it'd be slow.

The alternative seems to be to forbid uninitialized pad bytes within
Datums. That's not very pleasant to contemplate either, since it'll
forever be vulnerable to sins of omission.
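
As a rough illustration of what "no uninitialized pad bytes" would demand of
a constructor, here is a standalone C sketch, again not PostgreSQL code. It
assumes elements made of a 4-byte length word plus one data byte, padded to
4-byte alignment, and it clears the padding explicitly after each element.
The names align4(), construct_zero_padded() and
equal_despite_prior_contents() are hypothetical. In the backend, allocating
the result with a zeroing allocator such as palloc0() would presumably have
the same effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HDR_SIZE  4   /* length word preceding each element */
#define MAX_ELEMS 8

/* Round len up to the next multiple of 4 */
size_t
align4(size_t len)
{
    return (len + 3) & ~(size_t) 3;
}

/*
 * Hypothetical constructor: packs n one-byte elements into buf and returns
 * the number of bytes used.  It zeroes the alignment padding after every
 * element, so the result does not depend on the buffer's prior contents.
 */
size_t
construct_zero_padded(unsigned char *buf, const char *vals, size_t n)
{
    unsigned char *p = buf;

    assert(buf != NULL && vals != NULL && n <= MAX_ELEMS);
    for (size_t i = 0; i < n; i++)
    {
        size_t used = HDR_SIZE + 1;      /* header plus one data byte */
        size_t padded = align4(used);

        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = (unsigned char) used;
        p[4] = (unsigned char) vals[i];
        memset(p + used, 0, padded - used);   /* clear the pad bytes */
        p += padded;
    }
    return (size_t) (p - buf);
}

/* Build the same value over different leftover memory and compare bytewise */
int
equal_despite_prior_contents(int fill_a, int fill_b)
{
    unsigned char a[MAX_ELEMS * 8];
    unsigned char b[MAX_ELEMS * 8];
    size_t len_a;
    size_t len_b;

    memset(a, fill_a, sizeof(a));
    memset(b, fill_b, sizeof(b));
    len_a = construct_zero_padded(a, "123", 3);
    len_b = construct_zero_padded(b, "123", 3);
    return len_a == len_b && memcmp(a, b, len_a) == 0;
}
```

With the padding cleared, equal_despite_prior_contents(0x7F, 0x00) returns 1.
The sin-of-omission risk is visible here too: delete the single memset() in
the loop and the bytewise comparison goes back to depending on leftover
memory.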

Thoughts?

regards, tom lane
