Re: Variable length varlena headers redux

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Gregory Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variable length varlena headers redux
Date: 2007-02-13 15:15:21
Message-ID: 29715.1171379721@sss.pgh.pa.us
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I don't really see a way around it though. Places that fill in VARDATA before
> the size (formatting.c seems to be the worst case) will just have to be
> changed and it'll be a fairly fragile point.

No, we're not going there: it'd break too much code now and it'd be a
continuing source of bugs for the foreseeable future. The sane way to
design this is that

(1) code written to existing practice will always generate 4-byte
headers. (Hence, VARDATA() acts the same as now.) That's the format
that generally gets passed around in memory.

(2) creation of a short header is handled by the TOAST code just before
the tuple goes to disk.

(3) replacement of a short header with a 4-byte header is considered
part of de-TOASTing.
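
A minimal sketch of how points (2) and (3) could look, for illustration only. The encoding is an assumption made for this sketch and is not specified in this message: a 4-byte header holds the total size in big-endian order with the top bit clear, and a 1-byte header is 0x80 OR'ed with the total size. All sk_* names are hypothetical and are not PostgreSQL functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical encoding, assumed only for this sketch:
 *   4-byte header: big-endian total size (header + data), top bit clear
 *   1-byte header: 0x80 | total size (header + data), so total <= 127
 */
#define SK_HDR4      4u
#define SK_HDR1      1u
#define SK_SHORT_BIT 0x80u
#define SK_SHORT_MAX 0x7Fu

/* Read the total size from a 4-byte header, byte by byte (no alignment needed). */
static uint32_t sk_size4(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Write a 4-byte header; this is the only form ordinary code would ever build. */
static void sk_set_size4(unsigned char *p, uint32_t total)
{
    assert(total >= SK_HDR4 && total < 0x80000000u);
    p[0] = (unsigned char) (total >> 24);
    p[1] = (unsigned char) (total >> 16);
    p[2] = (unsigned char) (total >> 8);
    p[3] = (unsigned char) total;
}

/* Point (2): just before storage, shrink the header if the datum is small
 * enough. Returns the number of bytes written to dst. */
static size_t sk_pack_for_disk(const unsigned char *src, unsigned char *dst)
{
    uint32_t total = sk_size4(src);
    uint32_t datalen = total - SK_HDR4;

    if (datalen + SK_HDR1 <= SK_SHORT_MAX)
    {
        dst[0] = (unsigned char) (SK_SHORT_BIT | (datalen + SK_HDR1));
        memcpy(dst + SK_HDR1, src + SK_HDR4, datalen);
        return datalen + SK_HDR1;
    }
    memcpy(dst, src, total);    /* too long: keep the 4-byte form */
    return total;
}

/* Point (3): when fetching, always hand back the 4-byte form.
 * Returns the number of bytes written to dst. */
static size_t sk_unpack_from_disk(const unsigned char *src, unsigned char *dst)
{
    uint32_t total;

    if (src[0] & SK_SHORT_BIT)
    {
        uint32_t datalen = (src[0] & SK_SHORT_MAX) - SK_HDR1;

        sk_set_size4(dst, datalen + SK_HDR4);
        memcpy(dst + SK_HDR4, src + SK_HDR1, datalen);
        return datalen + SK_HDR4;
    }
    total = sk_size4(src);
    memcpy(dst, src, total);
    return total;
}
```

In this arrangement, code that fills in the data area before setting the size keeps working, because it only ever sees the 4-byte form.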

After we have that working, we can work on offering alternative macros
that let specific functions avoid the overhead of conversion between
4-byte headers and short ones, in much the same way that there are TOAST
macros now that let specific functions get down-and-dirty with the
out-of-line TOAST representation. But first we have to get to the point
where 4-byte-header datums can be distinguished from short-header datums
by inspection; and that requires either network byte order in the 4-byte
length word or some other change in its representation.
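
To illustrate why byte order matters for telling the two forms apart by inspection, here is a small sketch. It assumes, hypothetically, that a set high bit in the first byte marks a short header. With the length stored most significant byte first, the first byte of a 4-byte header never has that bit set for lengths below 2^31; stored least significant byte first, the first byte carries low-order length bits and can be mistaken for the flag. The sk_* names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define SK_SHORT_BIT 0x80u   /* hypothetical "short header" flag in byte 0 */

/* Store a length in network byte order: most significant byte first. */
static void sk_store_be32(unsigned char *p, uint32_t len)
{
    assert(len < 0x80000000u);   /* top bit must stay clear */
    p[0] = (unsigned char) (len >> 24);
    p[1] = (unsigned char) (len >> 16);
    p[2] = (unsigned char) (len >> 8);
    p[3] = (unsigned char) len;
}

/* Store a length least significant byte first, as a little-endian machine
 * would lay out a native 32-bit word. Shown only for comparison. */
static void sk_store_le32(unsigned char *p, uint32_t len)
{
    p[0] = (unsigned char) len;
    p[1] = (unsigned char) (len >> 8);
    p[2] = (unsigned char) (len >> 16);
    p[3] = (unsigned char) (len >> 24);
}

/* The inspection step: look at the first byte only. */
static int sk_first_byte_says_short(const unsigned char *p)
{
    return (p[0] & SK_SHORT_BIT) != 0;
}
```

For a length of 200 (0xC8), the big-endian layout starts with 0x00 and is read correctly as a 4-byte header, while the little-endian layout starts with 0xC8 and would be misread as a short header.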

> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment.

And your point is what? The 4-byte form can continue to require
alignment, and *will* require it in any case, since many of the affected
datatypes expect alignment of the data within the varlena. The trick is
that when we are examining a non-aligned address within a tuple, we have
to be able to tell whether we are looking at the first byte of a
short-header datum (not aligned) or a pad byte. This is easily done,
for instance by decreeing that pad bytes must be zeroes.
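
A sketch of that check while walking the fields of a tuple, under stated assumptions: the tuple buffer starts on a 4-byte boundary, pad bytes are zero, a short header is a single byte with the high bit set and the total size in the low 7 bits, and a 4-byte header is a big-endian total size. The names and the encoding are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SK_SHORT_BIT 0x80u   /* hypothetical flag: first byte of a short header */

/*
 * Starting at offset off, find where the next datum's header begins.
 * Sets *is_short to 1 for a short header, 0 for a 4-byte header.
 */
static size_t sk_find_header(const unsigned char *tuple, size_t off, int *is_short)
{
    if (tuple[off] & SK_SHORT_BIT)
    {
        /* A flagged byte is never a pad byte: short header, possibly unaligned. */
        *is_short = 1;
        return off;
    }

    /* Not flagged: skip zero pad bytes up to the next 4-byte boundary. */
    while (off % 4 != 0)
    {
        assert(tuple[off] == 0);   /* pad bytes are required to be zero */
        off++;
    }
    *is_short = 0;
    return off;
}

/* Total size of the datum (header included), read byte by byte. */
static size_t sk_datum_size(const unsigned char *hdr, int is_short)
{
    if (is_short)
        return hdr[0] & 0x7Fu;
    return ((size_t) hdr[0] << 24) | ((size_t) hdr[1] << 16) |
           ((size_t) hdr[2] << 8) | (size_t) hdr[3];
}
```

Because a short header byte is always nonzero in this encoding, a zero byte at an unaligned address can only be padding.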

I think we should probably consider making use of different alignment
codes for different varlena datatypes. For instance the geometry types
probably will still need align 'd' since they contain doubles; this may
mean that we should just punt on any short-header optimization for them.
But text and friends could have align 'c' showing that they need no
padding and would be perfectly happy with a nonaligned VARDATA pointer.
(Actually, maybe we should only do this whole thing for 'c'-alignable
data types? But NUMERIC is a bit of a problem: it'd like
's'-alignment. OTOH we could just switch NUMERIC to an all-two-byte
format that's independent of TOAST per se.)
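
One way the per-type alignment idea could be expressed, as a rough sketch. The codes 'c', 's', 'i' and 'd' are the alignment codes named above; the byte counts are the typical ones and are an assumption here, as are the function names and the 127-byte limit for a 1-byte header.

```c
#include <assert.h>
#include <stddef.h>

/* Typical alignment in bytes for each alignment code (platform-dependent
 * in reality; fixed values are assumed for this sketch). */
static size_t sk_align_bytes(char typalign)
{
    switch (typalign)
    {
        case 'c': return 1;   /* char: no padding needed */
        case 's': return 2;   /* short */
        case 'i': return 4;   /* int */
        case 'd': return 8;   /* double */
    }
    assert(!"unknown alignment code");
    return 0;
}

/* Only types that tolerate an unaligned data pointer get short headers. */
static int sk_short_header_ok(char typalign)
{
    return typalign == 'c';
}

/* Header size to use when storing datalen bytes of a type with this code:
 * 1 byte if the type allows it and header plus data fit in 127 bytes,
 * otherwise the ordinary 4 bytes. */
static size_t sk_header_size(char typalign, size_t datalen)
{
    if (sk_short_header_ok(typalign) && datalen + 1 <= 127)
        return 1;
    return 4;
}
```

Under this rule a 5-byte text value would get a 1-byte header, while a geometry value with align 'd' would always keep the 4-byte header.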

regards, tom lane
