Quick Links

Re: ERROR: failed to build any 8-way joins

Lists:	pgsql-hackers

From:	Gregory Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Variable length varlena headers redux
Date:	2007-02-09 03:31:01
Message-ID:	87sldghuru.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've been looking at this again and had a few conversations about it. This may
be easier than I had originally thought but there's one major issue that's
bugging me. Do you see any way to avoid having every user function everywhere
use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?

The two approaches I see are either

a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP are
for constructing new tuples and behaves exactly as it does now. So you always
construct a four-byte header datum. Then in heap_form*tuple we check if you
can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
existing datums and would interpret the header bits.

This seems very fragile since one stray call site using VARATT_DATA to find
the data in an existing datum would cause random bugs that only occur rarely
in certain circumstances. It would even work as long as the size is filled in
with VARATT_SIZEP first which it usually is, but fail if someone changes the
order of the statements.

b) throw away VARATT_DATA and VARATT_SIZEP and make all user function
everywhere change over to a new macro api. That seems like a pretty big
burden. It's safer but means every contrib module would have to be updated and
so on.

I'm hoping I'm missing something and there's a way to do this without breaking
the api for every user function.

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 03:58:07
Message-ID:	200702090358.l193w7v02893@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Uh, I thought the approach was to create type-specific in/out functions,
and add casting so every time there were referenced, they would expand
to a varlena structure in memory.

---------------------------------------------------------------------------

Gregory Stark wrote:
>
> I've been looking at this again and had a few conversations about it. This may
> be easier than I had originally thought but there's one major issue that's
> bugging me. Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
>
> The two approaches I see are either
>
> a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP are
> for constructing new tuples and behaves exactly as it does now. So you always
> construct a four-byte header datum. Then in heap_form*tuple we check if you
> can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> existing datums and would interpret the header bits.
>
> This seems very fragile since one stray call site using VARATT_DATA to find
> the data in an existing datum would cause random bugs that only occur rarely
> in certain circumstances. It would even work as long as the size is filled in
> with VARATT_SIZEP first which it usually is, but fail if someone changes the
> order of the statements.
>
> or
>
> b) throw away VARATT_DATA and VARATT_SIZEP and make all user function
> everywhere change over to a new macro api. That seems like a pretty big
> burden. It's safer but means every contrib module would have to be updated and
> so on.
>
> I'm hoping I'm missing something and there's a way to do this without breaking
> the api for every user function.
>
>

-- Start of included mail From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> To: Gregory Stark <stark(at)enterprisedb(dot)com>
> cc: Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>,
> Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org,
> Martijn van Oosterhout <kleptog(at)svana(dot)org>
> Subject: Re: [HACKERS] Fixed length data types issue
> Date: Mon, 11 Sep 2006 13:15:43 -0400
> Lines: 64
> Xref: stark.xeocode.com work.enterprisedb:683

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > In any case it seems a bit backwards to me. Wouldn't it be better to
> > preserve bits in the case of short length words where they're precious
> > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
>
> Well, I don't find that real persuasive: you're saying that it's
> important to have a 1-byte not 2-byte header for datums between 64 and
> 127 bytes long. Which is by definition less than a 2% savings for those
> values. I think its's more important to pick bitpatterns that reduce
> the number of cases heap_deform_tuple has to think about while decoding
> the length of a field --- every "if" in that inner loop is expensive.
>
> I realized this morning that if we are going to preserve the rule that
> 4-byte-header and compressed-header cases can be distinguished from the
> data alone, there is no reason to be very worried about whether the
> 2-byte cases can represent the maximal length of an in-line datum.
> If you want to do 16K inline (and your page is big enough for that)
> you can just fall back to the 4-byte-header case. So there's no real
> disadvantage if the 2-byte headers can only go up to 4K or so. This
> gives us some more flexibility in the bitpattern choices.
>
> Another thought that occurred to me is that if we preserve the
> convention that a length word's value includes itself, then for a
> 1-byte header the bit pattern 10000000 is meaningless --- the count
> has to be at least 1. So one trick we could play is to take over
> this value as the signal for "toast pointer follows", with the
> assumption that the tuple-decoder code knows a-priori how big a
> toast pointer is. I am not real enamored of this, because it certainly
> adds one case to the inner heap_deform_tuple loop and it'll give us
> problems if we ever want more than one kind of toast pointer. But
> it's a possibility.
>
> Anyway, a couple of encodings that I'm thinking about now involve
> limiting uncompressed data to 1G (same as now), so that we can play
> with the first 2 bits instead of just 1:
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
> 100xxxxx 1-byte length word, unaligned, TOAST pointer
> 1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
> 1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
> 11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)
>
> or
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
> 011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
> 10000000 1-byte length word, unaligned, TOAST pointer
> 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
> (xxxxxxx not all zero)
>
> This second choice allows longer datums in both the 1-byte and 2-byte
> header formats, but it hardwires the length of a TOAST pointer and
> requires four cases to be distinguished in the inner loop; the first
> choice only requires three cases, because TOAST pointer and 1-byte
> header can be handled by the same rule "length is low 6 bits of byte".
> The second choice also loses the ability to store in-line compressed
> data above 8K, but that's probably an insignificant loss.
>
> There's more than one way to do it ...
>
> regards, tom lane
>
-- End of included mail.

>
>
> --
> Gregory Stark
> EnterpriseDB http://www.enterprisedb.com

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 05:35:08
Message-ID:	200702090535.l195Z8C07463@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian wrote:
>
> Uh, I thought the approach was to create type-specific in/out functions,
> and add casting so every time there were referenced, they would expand
> to a varlena structure in memory.

Oh, one more thing. You are going to need to teach the code that walks
through a tuple attributes about the short header types. I think you
should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
code there too. (As an example, see the macro att_addlength().)

I know it is kind of odd to have a data type that is only used on disk,
and not in memory, but I see this as a baby varlena type, used only to
store and get varlena values using less disk space.

---------------------------------------------------------------------------
>
> Gregory Stark wrote:
> >
> > I've been looking at this again and had a few conversations about it. This may
> > be easier than I had originally thought but there's one major issue that's
> > bugging me. Do you see any way to avoid having every user function everywhere
> > use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
> >
> > The two approaches I see are either
> >
> > a) To have two sets of macros, one of which, VARATT_DATA and VARATT_SIZEP are
> > for constructing new tuples and behaves exactly as it does now. So you always
> > construct a four-byte header datum. Then in heap_form*tuple we check if you
> > can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> > existing datums and would interpret the header bits.
> >
> > This seems very fragile since one stray call site using VARATT_DATA to find
> > the data in an existing datum would cause random bugs that only occur rarely
> > in certain circumstances. It would even work as long as the size is filled in
> > with VARATT_SIZEP first which it usually is, but fail if someone changes the
> > order of the statements.
> >
> > or
> >
> > b) throw away VARATT_DATA and VARATT_SIZEP and make all user function
> > everywhere change over to a new macro api. That seems like a pretty big
> > burden. It's safer but means every contrib module would have to be updated and
> > so on.
> >
> > I'm hoping I'm missing something and there's a way to do this without breaking
> > the api for every user function.
> >
> >
>
> -- Start of included mail From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>
> > To: Gregory Stark <stark(at)enterprisedb(dot)com>
> > cc: Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>,
> > Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org,
> > Martijn van Oosterhout <kleptog(at)svana(dot)org>
> > Subject: Re: [HACKERS] Fixed length data types issue
> > Date: Mon, 11 Sep 2006 13:15:43 -0400
> > Lines: 64
> > Xref: stark.xeocode.com work.enterprisedb:683
>
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > In any case it seems a bit backwards to me. Wouldn't it be better to
> > > preserve bits in the case of short length words where they're precious
> > > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
> >
> > Well, I don't find that real persuasive: you're saying that it's
> > important to have a 1-byte not 2-byte header for datums between 64 and
> > 127 bytes long. Which is by definition less than a 2% savings for those
> > values. I think its's more important to pick bitpatterns that reduce
> > the number of cases heap_deform_tuple has to think about while decoding
> > the length of a field --- every "if" in that inner loop is expensive.
> >
> > I realized this morning that if we are going to preserve the rule that
> > 4-byte-header and compressed-header cases can be distinguished from the
> > data alone, there is no reason to be very worried about whether the
> > 2-byte cases can represent the maximal length of an in-line datum.
> > If you want to do 16K inline (and your page is big enough for that)
> > you can just fall back to the 4-byte-header case. So there's no real
> > disadvantage if the 2-byte headers can only go up to 4K or so. This
> > gives us some more flexibility in the bitpattern choices.
> >
> > Another thought that occurred to me is that if we preserve the
> > convention that a length word's value includes itself, then for a
> > 1-byte header the bit pattern 10000000 is meaningless --- the count
> > has to be at least 1. So one trick we could play is to take over
> > this value as the signal for "toast pointer follows", with the
> > assumption that the tuple-decoder code knows a-priori how big a
> > toast pointer is. I am not real enamored of this, because it certainly
> > adds one case to the inner heap_deform_tuple loop and it'll give us
> > problems if we ever want more than one kind of toast pointer. But
> > it's a possibility.
> >
> > Anyway, a couple of encodings that I'm thinking about now involve
> > limiting uncompressed data to 1G (same as now), so that we can play
> > with the first 2 bits instead of just 1:
> >
> > 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> > 01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
> > 100xxxxx 1-byte length word, unaligned, TOAST pointer
> > 1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
> > 1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
> > 11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)
> >
> > or
> >
> > 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> > 010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
> > 011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
> > 10000000 1-byte length word, unaligned, TOAST pointer
> > 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
> > (xxxxxxx not all zero)
> >
> > This second choice allows longer datums in both the 1-byte and 2-byte
> > header formats, but it hardwires the length of a TOAST pointer and
> > requires four cases to be distinguished in the inner loop; the first
> > choice only requires three cases, because TOAST pointer and 1-byte
> > header can be handled by the same rule "length is low 6 bits of byte".
> > The second choice also loses the ability to store in-line compressed
> > data above 8K, but that's probably an insignificant loss.
> >
> > There's more than one way to do it ...
> >
> > regards, tom lane
> >
> -- End of included mail.
>
> >
> >
> > --
> > Gregory Stark
> > EnterpriseDB http://www.enterprisedb.com
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://www.enterprisedb.com
>
> + If your life is a hard drive, Christ can be your backup. +
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 11:33:04
Message-ID:	87mz3nin0v.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:

> Bruce Momjian wrote:
> >
> > Uh, I thought the approach was to create type-specific in/out functions,
> > and add casting so every time there were referenced, they would expand
> > to a varlena structure in memory.

Are you talking about actual casts? Because that would lead to all kinds of
strange places with indexes and function lookups and so on. Or are you just
talking about code in the macro api to datum?

> Oh, one more thing. You are going to need to teach the code that walks
> through a tuple attributes about the short header types. I think you
> should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
> code there too. (As an example, see the macro att_addlength().)

I thought of doing this. It would let us, for example, treat text/varchar,
bpchar, and numeric but leave other data types unchanged.

That does help somewhat but unfortunately text is the problem case. There's
tons of code that generates text without using textin. All of pgcrypto for
example.

> I know it is kind of odd to have a data type that is only used on disk,
> and not in memory, but I see this as a baby varlena type, used only to
> store and get varlena values using less disk space.

I was leaning toward generating the short varlena headers primarily in
heap_form*tuple and just having the datatype specific code generate 4-byte
headers much as you describe.

However that doesn't get us away from having VARDATA/VARSIZE aware of the new
headers. Since heap_deform*tuple and the other entry points which extract
individual attributes return pointers to the datum in the tuple. They can't
expand the header to a 4-byte header on the fly.

I thought of doing it in DETOAST_DATUM on the theory that everyone's going to
be calling it on their arguments. However there are other cases than just
arguments. Other functions might call, say, text_concat() and then call
VARDATA() on the result.

Even if we only ever generate short headers on heap_form*tuple and always
expand them on DETOAST we could have code that passes around tuples that it
"knows" are entirely in memory and therefore not toasted. I'm thinking of
plpgsql here primarily. Perhaps it would be enough to outlaw this behaviour
but it still seems sort of fragile to me.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 14:20:45
Message-ID:	200702091420.l19EKjP26468@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
> > Bruce Momjian wrote:
> > >
> > > Uh, I thought the approach was to create type-specific in/out functions,
> > > and add casting so every time there were referenced, they would expand
> > > to a varlena structure in memory.
>
> Are you talking about actual casts? Because that would lead to all kinds of
> strange places with indexes and function lookups and so on. Or are you just
> talking about code in the macro api to datum?

I am thinking of auto-casts, sort of like how varchar, char, and text
are all internally treated as interchangable.

> > Oh, one more thing. You are going to need to teach the code that walks
> > through a tuple attributes about the short header types. I think you
> > should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
> > code there too. (As an example, see the macro att_addlength().)
>
> I thought of doing this. It would let us, for example, treat text/varchar,
> bpchar, and numeric but leave other data types unchanged.
>
> That does help somewhat but unfortunately text is the problem case. There's
> tons of code that generates text without using textin. All of pgcrypto for
> example.

Well, TEXT can't use short headers.

> > I know it is kind of odd to have a data type that is only used on disk,
> > and not in memory, but I see this as a baby varlena type, used only to
> > store and get varlena values using less disk space.
>
> I was leaning toward generating the short varlena headers primarily in
> heap_form*tuple and just having the datatype specific code generate 4-byte
> headers much as you describe.

Yep.

> However that doesn't get us away from having VARDATA/VARSIZE aware of the new
> headers. Since heap_deform*tuple and the other entry points which extract
> individual attributes return pointers to the datum in the tuple. They can't
> expand the header to a 4-byte header on the fly.

Yep, you are going to have to special-case those call points to test
typlen and use your short macros there if == -3.

> I thought of doing it in DETOAST_DATUM on the theory that everyone's going to
> be calling it on their arguments. However there are other cases than just
> arguments. Other functions might call, say, text_concat() and then call
> VARDATA() on the result.

Right, I think all the in-memory stuff has to be varlena.

> Even if we only ever generate short headers on heap_form*tuple and always
> expand them on DETOAST we could have code that passes around tuples that it
> "knows" are entirely in memory and therefore not toasted. I'm thinking of
> plpgsql here primarily. Perhaps it would be enough to outlaw this behaviour
> but it still seems sort of fragile to me.

Yea, we might need some cleanup, but the cleanup is just to do things
properly. I am unclear on the case you are describing.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 15:11:18
Message-ID:	24763.1171033878@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> I know it is kind of odd to have a data type that is only used on disk,
>> and not in memory, but I see this as a baby varlena type, used only to
>> store and get varlena values using less disk space.

> I was leaning toward generating the short varlena headers primarily in
> heap_form*tuple and just having the datatype specific code generate 4-byte
> headers much as you describe.

I thought we had a solution for all this, namely to make the short-form
headers be essentially a TOAST-compressed representation. The format
with 4-byte headers is still legal but just not compressed. Anyone who
fails to detoast an input argument is already broken, so there's no code
compatibility hit taken.

regards, tom lane

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 16:50:05
Message-ID:	87zm7ncm2q.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>>> I know it is kind of odd to have a data type that is only used on disk,
>>> and not in memory, but I see this as a baby varlena type, used only to
>>> store and get varlena values using less disk space.
>
>> I was leaning toward generating the short varlena headers primarily in
>> heap_form*tuple and just having the datatype specific code generate 4-byte
>> headers much as you describe.
>
> I thought we had a solution for all this, namely to make the short-form
> headers be essentially a TOAST-compressed representation. The format
> with 4-byte headers is still legal but just not compressed. Anyone who
> fails to detoast an input argument is already broken, so there's no code
> compatibility hit taken.

It's not just input arguments though. A function could call
DirectFunctionCall* and rightfully expect the return value not to need
detoasting.

I suppose this leads me to *only* generate short headers at heap_form*tuple
time. Then DirectFunctionCall isn't relevant and most of the user code is
perfectly safe.

There could still be cases where a heaptuple is passed around in pl_exec.c or
somewhere but if it's subsequently deformed whoever looks at it hopefully
wouldn't be too surprised for it to be mandatory that they go through
pg_detoast_datum. It'll happen as long as they use the DatumGetFoo macros
anyways.

It does mean that anyone going through a heap_form*tuple/heap_deform*tuple
cycle may generate more copies and memory allocations than they expected.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 18:36:23
Message-ID:	878xf7i3fc.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> I know it is kind of odd to have a data type that is only used on disk,
> >> and not in memory, but I see this as a baby varlena type, used only to
> >> store and get varlena values using less disk space.
>
> > I was leaning toward generating the short varlena headers primarily in
> > heap_form*tuple and just having the datatype specific code generate 4-byte
> > headers much as you describe.
>
> I thought we had a solution for all this, namely to make the short-form
> headers be essentially a TOAST-compressed representation. The format
> with 4-byte headers is still legal but just not compressed. Anyone who
> fails to detoast an input argument is already broken, so there's no code
> compatibility hit taken.

Uh. So I don't see how to make this work on a little-endian machine. If the
leading its are 0 we don't know if they're toast flags or bits on the least
significant byte of a longer length.

If we store all lengths in network byte order that problem goes away but then
user code that does "VARATT_SIZEP(datum) = len" is incorrect.

If we declare in-memory format to be host byte order and on-disk format to be
network byte order then every single varlena datum needs to be copied when
heap_deform*tuple runs.

If we only do this for a new kind of varlena then only text/varchar/
char/numeric datums would need to be copied but that's still a lot.

--
greg

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 22:48:16
Message-ID:	200702092248.l19MmG317708@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Greg Stark <gsstark(at)mit(dot)edu> writes:
> > > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > >> I know it is kind of odd to have a data type that is only used on disk,
> > >> and not in memory, but I see this as a baby varlena type, used only to
> > >> store and get varlena values using less disk space.
> >
> > > I was leaning toward generating the short varlena headers primarily in
> > > heap_form*tuple and just having the datatype specific code generate 4-byte
> > > headers much as you describe.
> >
> > I thought we had a solution for all this, namely to make the short-form
> > headers be essentially a TOAST-compressed representation. The format
> > with 4-byte headers is still legal but just not compressed. Anyone who
> > fails to detoast an input argument is already broken, so there's no code
> > compatibility hit taken.
>
> Uh. So I don't see how to make this work on a little-endian machine. If the
> leading its are 0 we don't know if they're toast flags or bits on the least
> significant byte of a longer length.
>
> If we store all lengths in network byte order that problem goes away but then
> user code that does "VARATT_SIZEP(datum) = len" is incorrect.
>
> If we declare in-memory format to be host byte order and on-disk format to be
> network byte order then every single varlena datum needs to be copied when
> heap_deform*tuple runs.
>
> If we only do this for a new kind of varlena then only text/varchar/
> char/numeric datums would need to be copied but that's still a lot.

I wonder if we need to reorder the TOAST structure to have the bits we
need at the start of the structure so we can be sure they are first.

For example, what if we split varattrib.va_header, which is int32 now,
into for 'char' fields, and just reassemble it in the toast code. That
would be pretty localized.

I had forgotten about hooking into the TOAST system, but since we are
going to be "expanding" the headers of these types when they get into
memory, it does make sense.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Bruce Momjian" <bruce(at)momjian(dot)us>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-09 23:23:55
Message-ID:	87mz3mdies.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

>> Uh. So I don't see how to make this work on a little-endian machine. If the
>> leading its are 0 we don't know if they're toast flags or bits on the least
>> significant byte of a longer length.
>> ...
> I had forgotten about hooking into the TOAST system, but since we are
> going to be "expanding" the headers of these types when they get into
> memory, it does make sense.

Ok, I guess this can work if we guarantee that in-memory datums always have
4-byte headers. That means that heap_deform*tuple always copies the datum if
it's this type of datum.

That means we never have pointers to shortvarlena datums inside tuples. I'm
not sure if there are parts of the system that assume that the datums they get
out of heap_deform*tuple are pointers into the tuple or not. I haven't come
across any in my travels thus far.

That seems like an awful lot of copying and pallocs that aren't there
currently though. And it'll make us reluctant to change over frequently used
data types like text -- which are precisely the ones that would gain us the
most.

It seems to me that it might be better to change to storing varlena lengths in
network byte order instead. That way we can dedicate the leading bits to toast
flags and read more bytes as necessary.

I think the way to do this would be to throw out the VARATT_SIZEP macro and
replace it with VARATT_SET_SIZE(datum,size). VARSIZE would just call ntohl (or
ntohs if the leading bits on the first byte indicated...)

That does mean touching every piece of data type code. And invalidating every
piece of user code. :( At least it's fairly mechanical. And it has the
advantage of not being at all fragile -- unfixed code won't even compile.

While we're at it I would suggest taking out the VARHDRSZ offset. Just store
the size of the data payload. The constant VARHDRSZ offset no longer makes
sense since it won't actually be the size of the varlena header size anyways.
And predicting the actual size of the varlena header will be annoying and
bug-prone since it depends on the resulting value you calculate.

(Incidentally, this would actually make EnterpriseDB somewhat sad since we
want pg_migrator to work for 8.3. But it wouldn't be out of the realm of
possibility to go through the database and switch varlena headers to network
byte order. There's no need to compress them, just leave the 4-byte format in
place with the bytes swapped around.)

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Bruce Momjian" <bruce(at)momjian(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-10 00:31:56
Message-ID:	6320.1171067516@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> That seems like an awful lot of copying and pallocs that aren't there
> currently though. And it'll make us reluctant to change over frequently used
> data types like text -- which are precisely the ones that would gain us the
> most.

> It seems to me that it might be better to change to storing varlena lengths in
> network byte order instead. That way we can dedicate the leading bits to toast
> flags and read more bytes as necessary.

This'll add its own overhead ... but probably less than pallocs and
data-copying would. And I agree we can find (pretty much) all the
places that need changing by the expedient of deliberately renaming
the macros and struct fields.

One problem I foresee is that I think you are about to propose that
VARDATA depend on the length already having been inserted, which it
does not now; and simple renamings won't detect ordering errors for
that. Also I believe there are places that over-allocate memory,
fill in the data, and only then set the length; something you will
not easily be able to change. It might work if we assume that
*creation* of a varlena value always produces the 4-byte-header form
and only reading of a value that might be on disk needs to cope with
the short-header forms. However this seems to require two different
forms of VARDATA depending on whether one is preparing or reading a
value. Ugh.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-10 02:44:36
Message-ID:	200702100244.l1A2iaI10632@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > That seems like an awful lot of copying and pallocs that aren't there
> > currently though. And it'll make us reluctant to change over frequently used
> > data types like text -- which are precisely the ones that would gain us the
> > most.
>
> > It seems to me that it might be better to change to storing varlena lengths in
> > network byte order instead. That way we can dedicate the leading bits to toast
> > flags and read more bytes as necessary.
>
> This'll add its own overhead ... but probably less than pallocs and
> data-copying would. And I agree we can find (pretty much) all the
> places that need changing by the expedient of deliberately renaming
> the macros and struct fields.

I think we should go with the pallocs and see how it performs. That is
certainly going to be easier to do, and we can test it pretty easily.

One palloc optimization idea would be to split out the representation so
the length is stored seprately from the data in memory, and we could use
an int32 for the length, and point to the shared buffer for the data.
However I don't think our macros can handle that so it might be a
non-starter.

However, I think we should find out of the palloc is a problem before
avoiding it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-10 03:45:38
Message-ID:	200702100345.l1A3jcT16281@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian wrote:
> Tom Lane wrote:
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > That seems like an awful lot of copying and pallocs that aren't there
> > > currently though. And it'll make us reluctant to change over frequently used
> > > data types like text -- which are precisely the ones that would gain us the
> > > most.
> >
> > > It seems to me that it might be better to change to storing varlena lengths in
> > > network byte order instead. That way we can dedicate the leading bits to toast
> > > flags and read more bytes as necessary.
> >
> > This'll add its own overhead ... but probably less than pallocs and
> > data-copying would. And I agree we can find (pretty much) all the
> > places that need changing by the expedient of deliberately renaming
> > the macros and struct fields.
>
> I think we should go with the pallocs and see how it performs. That is
> certainly going to be easier to do, and we can test it pretty easily.
>
> One palloc optimization idea would be to split out the representation so
> the length is stored seprately from the data in memory, and we could use
> an int32 for the length, and point to the shared buffer for the data.
> However I don't think our macros can handle that so it might be a
> non-starter.
>
> However, I think we should find out of the palloc is a problem before
> avoiding it.

Another idea about reducing palloc is that we know every short column is
at most 128 + 4 = 132 bytes, so we could allocate a 132-byte buffer for
every short column in the scan, and just re-use the buffer for every
row.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 04:19:14
Message-ID:	23221.1171340354@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> Do you see any way to avoid having every user function everywhere
> use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?

Bruce and I had a long phone conversation about that today. We
concluded that there doesn't seem to be any strong reason to change the
macros for *reading* a varlena value's length. The problem is with the
code that *writes* a length word. The mother example is textin():

result = (text *) palloc(len + VARHDRSZ);
VARATT_SIZEP(result) = len + VARHDRSZ;
memcpy(VARDATA(result), inputText, len);

There is not a lot we can do with that second line: it's going to assign
the value of "len + VARHDRSZ" to something that has to be a native C
variable, struct field, or at best bit field.

If we replaced that line with something like

SET_VARLENA_LEN(result, len + VARHDRSZ);

then we'd have a *whole* lot more control. For example it'd be easy to
implement the previously-discussed design involving storing uncompressed
length words in network byte order: SET_VARLENA_LEN does htonl() and
VARSIZE does ntohl() and nothing else in the per-datatype functions
needs to change. Another idea that we were kicking around is to make
an explicit distinction between little-endian and big-endian hardware:
on big-endian hardware, store the two TOAST flag bits in the MSBs as
now, but on little-endian, store them in the LSBs, shifting the length
value up two bits. This would probably be marginally faster than
htonl/ntohl depending on hardware and compiler intelligence, but either
way you get to guarantee that the flag bits are in the physically first
byte, which is the critical thing needed to be able to tell the
difference between compressed and uncompressed length values.

By my count there are only 170 uses of VARATT_SIZEP in the entire
backend (including contrib) so this is not an especially daunting
change. It would break existing user-written functions that return
varlena values, but the fix wouldn't be painful for them either.
We could guarantee that every problem spot is found by removing the
current definition of the macro (and renaming the underlying struct
fields to get the attention of the truly unreconstructed...). The
important point here is that VARSIZE() still works, so only code that
creates a new varlena value need be affected, not code that examines
one.

Too bad we didn't get this right in the original TOAST rewrite; but
hindsight is always 20/20. Right now I think we gotta absorb some
pain if we want to improve the header-overhead situation, and this
seems about the minimum possible amount of pain.

regards, tom lane

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 08:33:52
Message-ID:	20070213083352.GA608@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 12, 2007 at 11:19:14PM -0500, Tom Lane wrote:
> Gregory Stark <gsstark(at)mit(dot)edu> writes:

> By my count there are only 170 uses of VARATT_SIZEP in the entire
> backend (including contrib) so this is not an especially daunting
> change. It would break existing user-written functions that return
> varlena values, but the fix wouldn't be painful for them either.

Could the "new style" macros be back-ported to previous releases in case
we do this? That way module maintainers wouldn't need to maintain
two different sets of code for it - they could use the new style and
just compile it against an older version of pg?

//Magnus

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gregory Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 13:49:00
Message-ID:	87wt2m88xf.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> If we replaced that line with something like
>
> SET_VARLENA_LEN(result, len + VARHDRSZ);
>
> then we'd have a *whole* lot more control.

I think that part was already clear. The problem was in VARDATA.

I don't really see a way around it though. Places that fill in VARDATA before
the size (formatting.c seems to be the worst case) will just have to be
changed and it'll be a fairly fragile point.

> For example it'd be easy to implement the previously-discussed design
> involving storing uncompressed length words in network byte order:
> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in
> the per-datatype functions needs to change. Another idea that we were
> kicking around is to make an explicit distinction between little-endian and
> big-endian hardware: on big-endian hardware, store the two TOAST flag bits
> in the MSBs as now, but on little-endian, store them in the LSBs, shifting
> the length value up two bits. This would probably be marginally faster than
> htonl/ntohl depending on hardware and compiler intelligence, but either way
> you get to guarantee that the flag bits are in the physically first byte,
> which is the critical thing needed to be able to tell the difference between
> compressed and uncompressed length values.

Actually I think neither htonl nor bitshifting the entire 4-byte word is going
to really work here. Both will require 4-byte alignment. Instead I think we
have to access the length byte by byte as a (char*) and do arithmetic. Since
it's the pointer being passed to VARSIZE that isn't too hard, but it might
perform poorly.

> The important point here is that VARSIZE() still works, so only code that
> creates a new varlena value need be affected, not code that examines one.

So what would VARSIZE() return, the size of the payload plus VARHDRSZ
regardless of what actual size the header was? That seems like it would break
the least existing code though removing all the VARHDRSZ offsets seems like it
would be cleaner.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 14:22:08
Message-ID:	200702131422.l1DEM8x06569@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Magnus Hagander wrote:
> On Mon, Feb 12, 2007 at 11:19:14PM -0500, Tom Lane wrote:
> > Gregory Stark <gsstark(at)mit(dot)edu> writes:
>
> > By my count there are only 170 uses of VARATT_SIZEP in the entire
> > backend (including contrib) so this is not an especially daunting
> > change. It would break existing user-written functions that return
> > varlena values, but the fix wouldn't be painful for them either.
>
> Could the "new style" macros be back-ported to previous releases in case
> we do this? That way module maintainers wouldn't need to maintain
> two different sets of code for it - they could use the new style and
> just compile it against an older version of pg?

Yes, Tom and I talked about this. It could appear in the next minor
release of all branches.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 14:25:32
Message-ID:	45D1CA5C.3010506@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> For example it'd be easy to implement the previously-discussed design
>> involving storing uncompressed length words in network byte order:
>> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in
>> the per-datatype functions needs to change. Another idea that we were
>> kicking around is to make an explicit distinction between little-endian and
>> big-endian hardware: on big-endian hardware, store the two TOAST flag bits
>> in the MSBs as now, but on little-endian, store them in the LSBs, shifting
>> the length value up two bits. This would probably be marginally faster than
>> htonl/ntohl depending on hardware and compiler intelligence, but either way
>> you get to guarantee that the flag bits are in the physically first byte,
>> which is the critical thing needed to be able to tell the difference between
>> compressed and uncompressed length values.
>
> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment. Instead I think we
> have to access the length byte by byte as a (char*) and do arithmetic. Since
> it's the pointer being passed to VARSIZE that isn't too hard, but it might
> perform poorly.

We would still require all datums with a 4-byte header to be 4-byte
aligned, right? When reading, you would first check if it's a compressed
or uncompressed header. If compressed, read the 1 byte header, if
uncompressed, read the 4-byte header and do htonl or bitshifting. No
need to do htonl or bitshifting on unaligned datums.

>> The important point here is that VARSIZE() still works, so only code that
>> creates a new varlena value need be affected, not code that examines one.
>
> So what would VARSIZE() return, the size of the payload plus VARHDRSZ
> regardless of what actual size the header was? That seems like it would break
> the least existing code though removing all the VARHDRSZ offsets seems like it
> would be cleaner.

My vote would be to change every caller. Though there's a lot of
callers, it's a very simple change.

To make it posible to compile an external module against 8.2 and 8.3,
you could have a simple ifdef block to map the new macro to old
behavior. Or we could backport the macro definitions as Magnus suggested.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gregory Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 14:41:48
Message-ID:	87fy9a86hf.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

>> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
>> to really work here. Both will require 4-byte alignment. Instead I think we
>> have to access the length byte by byte as a (char*) and do arithmetic. Since
>> it's the pointer being passed to VARSIZE that isn't too hard, but it might
>> perform poorly.
>
> We would still require all datums with a 4-byte header to be 4-byte aligned,
> right? When reading, you would first check if it's a compressed or uncompressed
> header. If compressed, read the 1 byte header, if uncompressed, read the 4-byte
> header and do htonl or bitshifting. No need to do htonl or bitshifting on
> unaligned datums.

It's not easy to require datums with 4-byte headers to be 4-byte aligned. How
do you know where to look for the bits to show it's an uncompressed header if
you don't know where it's aligned yet?

It could be done if you rule that if you're on an unaligned byte and see a \0
then scan forward until the aligned byte. But that seems just as cpu expensive
as just doing the arithmetic. And wastes space to boot.

I'm thinking VARSIZE would look something like:

#define VARSIZE((datum)) \
((((uint8*)(datum))[0] & 0x80) ?
(((uint8*)(datum))[0] & 0x7F) : \
(((uint8*)(datum))[0]<< 24 | ((uint8*)(datum))[1]<<16 | ((uint8*)(datum))[2]<<8 | ((uint8*)(datum))[0]))

Which is effectively the same as doing ntohl except that it only works for
left hand sides -- luckily VARSIZE always has a lhs. It also works for
unaligned accesses. It's going to be fairly slow but no slower than doing an
unaligned access looking at nul padding bytes.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 14:44:03
Message-ID:	29357.1171377843@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Magnus Hagander wrote:
>> Could the "new style" macros be back-ported to previous releases in case
>> we do this?

> Yes, Tom and I talked about this. It could appear in the next minor
> release of all branches.

I don't really see the point of that. Third-party authors who want
their code to be backwards-compatible would do something like

#ifndef SET_VARLENA_LEN
#define SET_VARLENA_LEN(var,len) (VARATT_SIZEP(var) = (len))
#endif

While we could provide this same macro in later updates of the current
release branches, those authors are still going to want to include the
above in their code so as to be able to compile against existing
releases. Therefore there's not really much point in us doing it too.

regards, tom lane

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 14:46:40
Message-ID:	200702131546.40789.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
> a) To have two sets of macros, one of which, VARATT_DATA and
> VARATT_SIZEP are for constructing new tuples and behaves exactly as
> it does now. So you always construct a four-byte header datum. Then
> in heap_form*tuple we check if you can use a shorter header and
> convert. VARDATA/VARSIZE would be for looking at existing datums and
> would interpret the header bits.

Has any thought been given to headers *longer* than four bytes? I don't
exactly recall a flood of field reports that one gigabyte per datum is
too little, but as long as the encoding of variable length data is
changed, one might as well prepare a little for the future.

Of course, that would put a dent into any plan that wants to normalize
the header to four bytes somewhere along the way.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:03:23
Message-ID:	20070213150323.GA11315@svr2.hagander.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 13, 2007 at 09:44:03AM -0500, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Magnus Hagander wrote:
> >> Could the "new style" macros be back-ported to previous releases in case
> >> we do this?
>
> > Yes, Tom and I talked about this. It could appear in the next minor
> > release of all branches.
>
> I don't really see the point of that. Third-party authors who want
> their code to be backwards-compatible would do something like
>
> #ifndef SET_VARLENA_LEN
> #define SET_VARLENA_LEN(var,len) (VARATT_SIZEP(var) = (len))
> #endif
>
> While we could provide this same macro in later updates of the current
> release branches, those authors are still going to want to include the
> above in their code so as to be able to compile against existing
> releases. Therefore there's not really much point in us doing it too.

It'd be a help to those who wouldn't be building against releases with
known security issues in them for one ;-)

Sure, it's not important or a dealbreaker or so, but it would be
convenient.

//Magnus

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Gregory Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:15:21
Message-ID:	29715.1171379721@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> I don't really see a way around it though. Places that fill in VARDATA before
> the size (formatting.c seems to be the worst case) will just have to be
> changed and it'll be a fairly fragile point.

No, we're not going there: it'd break too much code now and it'd be a
continuing source of bugs for the foreseeable future. The sane way to
design this is that

(1) code written to existing practice will always generate 4-byte
headers. (Hence, VARDATA() acts the same as now.) That's the format
that generally gets passed around in memory.

(2) creation of a short header is handled by the TOAST code just before
the tuple goes to disk.

(3) replacement of a short header with a 4-byte header is considered
part of de-TOASTing.

After we have that working, we can work on offering alternative macros
that let specific functions avoid the overhead of conversion between
4-byte headers and short ones, in much the same way that there are TOAST
macros now that let specific functions get down-and-dirty with the
out-of-line TOAST representation. But first we have to get to the point
where 4-byte-header datums can be distinguished from short-header datums
by inspection; and that requires either network byte order in the 4-byte
length word or some other change in its representation.

> Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> to really work here. Both will require 4-byte alignment.

And your point is what? The 4-byte form can continue to require
alignment, and *will* require it in any case, since many of the affected
datatypes expect alignment of the data within the varlena. The trick is
that when we are examining a non-aligned address within a tuple, we have
to be able to tell whether we are looking at the first byte of a
short-header datum (not aligned) or a pad byte. This is easily done,
for instance by decreeing that pad bytes must be zeroes.

I think we should probably consider making use of different alignment
codes for different varlena datatypes. For instance the geometry types
probably will still need align 'd' since they contain doubles; this may
mean that we should just punt on any short-header optimization for them.
But text and friends could have align 'c' showing that they need no
padding and would be perfectly happy with a nonaligned VARDATA pointer.
(Actually, maybe we should only do this whole thing for 'c'-alignable
data types? But NUMERIC is a bit of a problem, it'd like
's'-alignment. OTOH we could just switch NUMERIC to an all-two-byte
format that's independent of TOAST per se.)

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:24:16
Message-ID:	200702131524.l1DFOGT21647@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Gregory Stark wrote:
> > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> For example it'd be easy to implement the previously-discussed design
> >> involving storing uncompressed length words in network byte order:
> >> SET_VARLENA_LEN does htonl() and VARSIZE does ntohl() and nothing else in
> >> the per-datatype functions needs to change. Another idea that we were
> >> kicking around is to make an explicit distinction between little-endian and
> >> big-endian hardware: on big-endian hardware, store the two TOAST flag bits
> >> in the MSBs as now, but on little-endian, store them in the LSBs, shifting
> >> the length value up two bits. This would probably be marginally faster than
> >> htonl/ntohl depending on hardware and compiler intelligence, but either way
> >> you get to guarantee that the flag bits are in the physically first byte,
> >> which is the critical thing needed to be able to tell the difference between
> >> compressed and uncompressed length values.
> >
> > Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> > to really work here. Both will require 4-byte alignment. Instead I think we
> > have to access the length byte by byte as a (char*) and do arithmetic. Since
> > it's the pointer being passed to VARSIZE that isn't too hard, but it might
> > perform poorly.
>
> We would still require all datums with a 4-byte header to be 4-byte
> aligned, right? When reading, you would first check if it's a compressed
> or uncompressed header. If compressed, read the 1 byte header, if
> uncompressed, read the 4-byte header and do htonl or bitshifting. No
> need to do htonl or bitshifting on unaligned datums.

I am not sure how to handle the alignment issue. If we require 1-byte
headers to be 4-byte aligned, we lose a lot of the benefits of the
1-byte header.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:30:08
Message-ID:	45D1D980.4050109@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian wrote:
> Heikki Linnakangas wrote:
>> We would still require all datums with a 4-byte header to be 4-byte
>> aligned, right? When reading, you would first check if it's a compressed
>> or uncompressed header. If compressed, read the 1 byte header, if
>> uncompressed, read the 4-byte header and do htonl or bitshifting. No
>> need to do htonl or bitshifting on unaligned datums.
>
> I am not sure how to handle the alignment issue. If we require 1-byte
> headers to be 4-byte aligned, we lose a lot of the benefits of the
> 1-byte header.

Why would we require that?

I don't see a problem with having 4-byte header 4-byte aligned, and
1-byte headers not aligned. The first access to the header is to check
if it's a 4 or 1 byte header. That's a 1 byte wide access, requiring no
alignment. After that you know which one it is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:43:30
Message-ID:	200702131543.l1DFhUJ24710@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Bruce Momjian wrote:
> > Heikki Linnakangas wrote:
> >> We would still require all datums with a 4-byte header to be 4-byte
> >> aligned, right? When reading, you would first check if it's a compressed
> >> or uncompressed header. If compressed, read the 1 byte header, if
> >> uncompressed, read the 4-byte header and do htonl or bitshifting. No
> >> need to do htonl or bitshifting on unaligned datums.
> >
> > I am not sure how to handle the alignment issue. If we require 1-byte
> > headers to be 4-byte aligned, we lose a lot of the benefits of the
> > 1-byte header.
>
> Why would we require that?
>
> I don't see a problem with having 4-byte header 4-byte aligned, and
> 1-byte headers not aligned. The first access to the header is to check
> if it's a 4 or 1 byte header. That's a 1 byte wide access, requiring no
> alignment. After that you know which one it is.

But if you are walking through attributes, how do you know to look at
the next byte or the next aligned byte? We have to force zeros in
there?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:48:41
Message-ID:	305.1171381721@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> But if you are walking through attributes, how do you know to look at
> the next byte or the next aligned byte? We have to force zeros in
> there?

Yup: pad bytes must be zeroes (they are already) and a 1-byte-header
can't be a zero (easily done if its length includes itself). So the
tuple-walking code would do something like

if (looking-at-a-zero && not-at-4-byte-boundary)
advance to next 4-byte boundary;
check current byte to determine if 1-byte or 4-byte header;

regards, tom lane

From:	Gregory Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Gregory Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 15:49:13
Message-ID:	87fy9agirq.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > I don't really see a way around it though. Places that fill in VARDATA before
> > the size (formatting.c seems to be the worst case) will just have to be
> > changed and it'll be a fairly fragile point.
>
> No, we're not going there: it'd break too much code now and it'd be a
> continuing source of bugs for the foreseeable future. The sane way to
> design this is that
>
> (1) code written to existing practice will always generate 4-byte
> headers. (Hence, VARDATA() acts the same as now.) That's the format
> that generally gets passed around in memory.

So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.

> (2) creation of a short header is handled by the TOAST code just before
> the tuple goes to disk.
>
> (3) replacement of a short header with a 4-byte header is considered
> part of de-TOASTing.

So (nigh) every tuple will get deformed and reformed once before it goes to
disk? Currently the toast code doesn't even look at a tuple if it's small
enough, but in this case we would want it to fire even on very narrow rows.

One design possibility I considered was doing this in heap_deform_tuple and
heap_form_tuple. Basically skipping the extra deform/form_tuple cycle in the
toast code. I had considered having heap_deform_tuple palloc copies of these
data before returning them. But that has the same problems.

The other problem is that there may be places in the code that receive a datum
from someplace where they have every right to expect it not to be toasted. For
example, plpgsql deforming a tuple they just formed, or even as the return
value from a function. They might be quite surprised to receive a toasted
tuple.

Note also that that's going to force us to palloc and memcpy these data. Are
there going to be circumstances where existing code where this changes the
memory context lifetime of some data? If, for example, soemthing like the inet
code knows its arguments can never be large enough to get toasted and doesn't
do a FREE_IF_COPY on its btree operator arguments.

> After we have that working, we can work on offering alternative macros
> that let specific functions avoid the overhead of conversion between
> 4-byte headers and short ones, in much the same way that there are TOAST
> macros now that let specific functions get down-and-dirty with the
> out-of-line TOAST representation. But first we have to get to the point
> where 4-byte-header datums can be distinguished from short-header datums
> by inspection; and that requires either network byte order in the 4-byte
> length word or some other change in its representation.
>
> > Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> > to really work here. Both will require 4-byte alignment.
>
> And your point is what? The 4-byte form can continue to require
> alignment, and *will* require it in any case, since many of the affected
> datatypes expect alignment of the data within the varlena. The trick is
> that when we are examining a non-aligned address within a tuple, we have
> to be able to tell whether we are looking at the first byte of a
> short-header datum (not aligned) or a pad byte. This is easily done,
> for instance by decreeing that pad bytes must be zeroes.

Well if we're doing it in toast then the alignment of the payload really
doesn't matter at all. It'll be realigned after detoasting anyways.

What I had had in mind was to prohibit using smaller headers than the
alignment of the data type. But that was on the assumption we would continue
to use the compressed header in memory and not copy it.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	"Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:06:00
Message-ID:	548.1171382760@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> (1) code written to existing practice will always generate 4-byte
>> headers. (Hence, VARDATA() acts the same as now.) That's the format
>> that generally gets passed around in memory.

> So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.

Yes, we do, because we have to alter the representation of 4-byte headers.
Otherwise we can't tell which header format a datum is using.

> So (nigh) every tuple will get deformed and reformed once before it goes to
> disk? Currently the toast code doesn't even look at a tuple if it's small
> enough, but in this case we would want it to fire even on very narrow rows.

I'd be inclined to put the intelligence into heap_form_tuple and thereby
avoid getting the TOAST code involved unless there are wide fields to
deal with.

> What I had had in mind was to prohibit using smaller headers than the
> alignment of the data type. But that was on the assumption we would continue
> to use the compressed header in memory and not copy it.

Well, it wouldn't be too unreasonable to limit this whole mechanism to
datatypes that have no alignment requirements on the *content* of their
datums; which in practice is probably just text/varchar/char and perhaps
inet.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Gregory Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:14:17
Message-ID:	200702131614.l1DGEHA29575@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > I don't really see a way around it though. Places that fill in VARDATA before
> > > the size (formatting.c seems to be the worst case) will just have to be
> > > changed and it'll be a fairly fragile point.
> >
> > No, we're not going there: it'd break too much code now and it'd be a
> > continuing source of bugs for the foreseeable future. The sane way to
> > design this is that
> >
> > (1) code written to existing practice will always generate 4-byte
> > headers. (Hence, VARDATA() acts the same as now.) That's the format
> > that generally gets passed around in memory.
>
> So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.
>
> > (2) creation of a short header is handled by the TOAST code just before
> > the tuple goes to disk.
> >
> > (3) replacement of a short header with a 4-byte header is considered
> > part of de-TOASTing.
>
> So (nigh) every tuple will get deformed and reformed once before it goes to
> disk? Currently the toast code doesn't even look at a tuple if it's small
> enough, but in this case we would want it to fire even on very narrow rows.

One weird idea I had was that the macros can read 1 and 4-byte headers,
but can only create 4-byte headers. The code that writes to the shared
buffer pages would to compression from 1 to 4 bytes as needed.

This might avoid changing any macros. It also allows us to carry around
4-byte headers in memory, which I think might be more efficient. I am
not sure if I have heard this idea proposed already or not.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:16:26
Message-ID:	87abzighid.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> > So (nigh) every tuple will get deformed and reformed once before it goes to
> > disk? Currently the toast code doesn't even look at a tuple if it's small
> > enough, but in this case we would want it to fire even on very narrow rows.
>
> I'd be inclined to put the intelligence into heap_form_tuple and thereby
> avoid getting the TOAST code involved unless there are wide fields to
> deal with.

And have heap_deform_tuple / heap_getattr palloc and memcpy the the datum on
the way out? Or wait until detoast time and then do it?

If we do it on the way out of the heaptuple then we could have a rule that
headers are always compressed in a tuple and always uncompressed out of a
tuple.

But I'm surprised, I had the impression that you were reluctant to consider
memcpying all the data all the time.

--
greg

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:19:18
Message-ID:	200702131619.l1DGJIA00442@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > > So (nigh) every tuple will get deformed and reformed once before it goes to
> > > disk? Currently the toast code doesn't even look at a tuple if it's small
> > > enough, but in this case we would want it to fire even on very narrow rows.
> >
> > I'd be inclined to put the intelligence into heap_form_tuple and thereby
> > avoid getting the TOAST code involved unless there are wide fields to
> > deal with.
>
> And have heap_deform_tuple / heap_getattr palloc and memcpy the the datum on
> the way out? Or wait until detoast time and then do it?
>
> If we do it on the way out of the heaptuple then we could have a rule that
> headers are always compressed in a tuple and always uncompressed out of a
> tuple.
>
> But I'm surprised, I had the impression that you were reluctant to consider
> memcpying all the data all the time.

Uh, if the macros can read 1 and 4-byte headers, why do we need to
allocate memory for them?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	"Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:23:52
Message-ID:	879.1171383832@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> I'd be inclined to put the intelligence into heap_form_tuple and thereby
>> avoid getting the TOAST code involved unless there are wide fields to
>> deal with.

> And have heap_deform_tuple / heap_getattr palloc and memcpy the the datum on
> the way out? Or wait until detoast time and then do it?

No, heap_deform_tuple / heap_getattr are not responsible for palloc'ing
anything, only for computing appropriate pointers into the tuple.
Existing functions that use PG_DETOAST_DATUM would incur a palloc to
produce a 4-byte-header version of a short-header datum. We could then
work on modifying one function at a time to use some alternative macro
that doesn't force a useless palloc, but the system wouldn't be broken
meanwhile; and only the high-traffic functions would be worth fixing
at all.

The point I'm trying to get across here is to do things one small step
at a time; if you insist on a "big bang" patch then it'll never get
done. You might want to go back and review the CVS history for some
other big changes like TOAST and the version-1 function-call protocol
to see our previous uses of this approach.

regards, tom lane

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Bruce Momjian" <bruce(at)momjian(dot)us>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 16:46:21
Message-ID:	8764a680pu.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

>> But I'm surprised, I had the impression that you were reluctant to consider
>> memcpying all the data all the time.
>
> Uh, if the macros can read 1 and 4-byte headers, why do we need to
> allocate memory for them?

Because the macros can't read 1 and 4-byte headers. If they could we would
have the problem with VARDATA for code sites that write to the data before
they write the size.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:01:06
Message-ID:	200702131701.l1DH16912560@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
>
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>
> >> But I'm surprised, I had the impression that you were reluctant to consider
> >> memcpying all the data all the time.
> >
> > Uh, if the macros can read 1 and 4-byte headers, why do we need to
> > allocate memory for them?
>
> Because the macros can't read 1 and 4-byte headers. If they could we would
> have the problem with VARDATA for code sites that write to the data before
> they write the size.

OK, so what if we always create 4-byte headers, and read both.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Bruce Momjian" <bruce(at)momjian(dot)us>, "Greg Stark" <gsstark(at)mit(dot)edu>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:02:53
Message-ID:	1470.1171386173@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>> Uh, if the macros can read 1 and 4-byte headers, why do we need to
>> allocate memory for them?

> Because the macros can't read 1 and 4-byte headers. If they could we would
> have the problem with VARDATA for code sites that write to the data before
> they write the size.

The way I see this working is that VARDATA keeps its current behavior
and hence can only be used with datums that are known to be in
4-byte-header form; hence, to avoid breaking code that uses it,
PG_DETOAST_DATUM has to produce a 4-byte-header datum always.

After we have the infrastructure in place, we'd make a pass over
high-traffic functions to replace uses of PG_DETOAST_DATUM with
something that doesn't forcibly expand 1-byte-header datums, and replace
uses of VARDATA on the result with something that handles both header
formats (and would be unsuitable for generating result datums, since
it'd have to assume that the length is already filled in).

I don't see any good reason why datatype-specific functions would ever
need to generate the short-header format directly. The only point where
it's worth trimming the header size is during heap_form_tuple, and we
can do it there at no significant efficiency cost. So uses of VARDATA
in connection with building a new datum need not be touched.

I'm inclined also to suggest that VARSIZE() need only support 4-byte
format: we could have a second macro that understands both formats and
gets used in the same high-traffic functions in which we are replacing
uses of VARDATA(). There's no benefit in making other sites support
1-byte format for VARSIZE() if they aren't going to support it for
VARDATA().

regards, tom lane

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:09:13
Message-ID:	871wku7znq.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> The point I'm trying to get across here is to do things one small step
> at a time; if you insist on a "big bang" patch then it'll never get
> done. You might want to go back and review the CVS history for some
> other big changes like TOAST and the version-1 function-call protocol
> to see our previous uses of this approach.

Believe me I'm not looking for a "big bang" approach. It's just that I've
found problems with any of the incremental approaches. I suppose it's
inevitable that "big bang" approaches always seem like the most perfect since
they automatically involve fixing anything that could be a problem.

You keep suggesting things that I've previously considered and rejected --
perhaps prematurely. Off the top of my head I recall the following four
options from our discussions. It looks like we're circling around option 4.

As long as we're willing to live with the palloc/memcpy overhead at least for
now given that we can reduce it by whittling away at the sites that use the
old macros, that seems like a good compromise and the shortest development
path except perhaps option 1.

And I take it you're not worried about sites that might not detoast a datum or
detoast one in the wrong memory context where previously they were guaranteed
it wouldn't generate a copy? In particular I'm worried about btree code and
plpgsql row/record handling.

I'm not sure what to do about the alignment issue. We could just never align
1-byte headers. That would work just fine as long a the data types that need
alignment don't get ported to the new macros. It seems silly to waste space on
disk in order to save a cpu memcpy that we aren't even going to be saving for
now anyways.

Option 1)

We detect cases where the typmod guarantees either a fixed size or a maximum
size < 256 bytes. In which case instead of copying the typlen from pg_type
into tupledesc and the table's attlen we use the implied attlen or store -3
indicating single byte headers everywhere.

Cons: This implies pallocing and memcpying the datum in heap_deform_tuple.

It doesn't help text or varchar(xxxx) at all even if we mostly store
small data in them.

Pros: it buys us 0-byte headers for things like numeric(1,0) and even char(n)
on single-byte encodings. It buys us 1-byte headers for most numerics
and char(n) or varchar(n).

Option 2)

We have heap_form_tuple/heap_deform_tuple compress and uncompress the headers.
The rule is that headers are always compressed in a tuple and never compressed
when at the end of a datum.

Cons: This implies pallocing and memcpying the datum in heap_deform_tuple

It requires restricting the range of 1-byte headers to 127 or 63 bytes
and always uses 1 byte even for fixed size data. (We could get 0-byte
headers for a small range (ascii characters and numeric integers up to
127) but then 1-byte headers would be down to 31 byte data.)

It implies a different varlena format for on-disk and in-memory

Pros: it works for text/varchar as long as we store small data.

It lets us use network byte order or some other format for the on-disk
headers without changing the macros to access in-memory headers.

Option 3)

We have the toaster (or heap_form_tuple as a shortcut) compress the headers
but delay decompression until pg_detoast_datum. tuples only contain compressed
headers but Datums sometimes point to compressed headers and sometimes
uncompressed headers just as they sometimes point to toasted data and
sometimes detoasted data.

Cons: This still implies pallocing and memcpying the datum at least for now

There could be cases where code expects to deform_tuple and be
guaranteed a non-toasted pointer into the tuple.

requires replacing VARATT_SIZEP with SET_VARLENA_LEN()

Pros: It allows for future macros to examine the compressed datum without
decompressing it.

Option 4)

We have compressed data constructed on the fly everywhere possible.

Cons: requires replacing VARATT_SIZEP and also requires hacking VARDATA to
find the data in the appropriate place. Might need an additional pair of
macros for backwards compatibility in code that really needs to
construct a 4-byte headered varlena.

fragility with risk of VARSIZE / VARDATA being filled in out of order

Requires changing header to be in network byte order all the time.

Pros: one consistent representation for varlena everywhere.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	mark(at)mark(dot)mielke(dot)cc
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Gregory Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:18:51
Message-ID:	20070213171851.GA16223@mark.mielke.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hey all:

This is a randomly inserted distraction to tell you that I really like to
read about these ideas. No input from myself at this point. I'm happy with
the direction you are taking.

Thanks,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:43:19
Message-ID:	1884.1171388599@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> You keep suggesting things that I've previously considered and rejected --
> perhaps prematurely. Off the top of my head I recall the following four
> options from our discussions. It looks like we're circling around option 4.

No, I think I'm arguing for option 3.

> And I take it you're not worried about sites that might not detoast a datum or
> detoast one in the wrong memory context where previously they were guaranteed
> it wouldn't generate a copy? In particular I'm worried about btree code and
> plpgsql row/record handling.

Any such site is already broken, because with the proposed design, code
is only exposed to short-header datums if it is also exposed to toasted
datums. It's possible that we would find a few places that hadn't
gotten converted, but any such place is a must-fix anyway. In practice
the TOAST support has been in there long enough that I don't think we'll
find any ... at least not in the core code. It's entirely possible that
there is user-defined code out there that uses varlena format and isn't
careful about detoasting. This is an argument for allowing opt-out
(see below).

> I'm not sure what to do about the alignment issue. We could just never align
> 1-byte headers. That would work just fine as long a the data types that need
> alignment don't get ported to the new macros. It seems silly to waste space on
> disk in order to save a cpu memcpy that we aren't even going to be saving for
> now anyways.

No, a datum that is in 1-byte-header format wouldn't have any special
alignment inside a tuple. There are two paths to get at it: detoast it
(producing an aligned, 4-byte-header, palloc'd datum) or use the
still-to-be-named macros that let you access the unaligned content
directly.

I'm inclined to think that we might want to set things up so that
varlena datatypes can individually opt-in or opt-out of this treatment;
a datatype that needs alignment of its content might well wish to
opt-out to avoid copying overhead. We could do that either with a
different typlen code, or still typlen -1 but pay attention to whether
typalign is 'c'.

> Option 1)

> We detect cases where the typmod guarantees either a fixed size or a maximum
> size < 256 bytes.

After last week I would veto this option anyway: it fails unless we
always know typmod exactly, and I'm here to tell you we don't.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 17:52:22
Message-ID:	200702131752.l1DHqMa26770@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> > We detect cases where the typmod guarantees either a fixed size or a maximum
> > size < 256 bytes.
>
> After last week I would veto this option anyway: it fails unless we
> always know typmod exactly, and I'm here to tell you we don't.

If we can pull this off, it handles short values stored in TEXT fields
too, which is a big win over the typmod idea I had.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 18:28:43
Message-ID:	87ps8d7vz8.fsf@stark.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> You keep suggesting things that I've previously considered and rejected --
>> perhaps prematurely. Off the top of my head I recall the following four
>> options from our discussions. It looks like we're circling around option 4.
>
> No, I think I'm arguing for option 3.

oops, sorry, got my own numbering mixed up.

>> And I take it you're not worried about sites that might not detoast a datum or
>> detoast one in the wrong memory context where previously they were guaranteed
>> it wouldn't generate a copy? In particular I'm worried about btree code and
>> plpgsql row/record handling.
>
> Any such site is already broken, because with the proposed design, code
> is only exposed to short-header datums if it is also exposed to toasted
> datums.

Well the toaster doesn't run until we were about to put the tuple on disk. I
was afraid there might be circumstances where data from in-memory tuples are
returned and we use them without fearing them being toasted. In particular I
was fearing the record/row handling in plpgsql.

I was also worried about the tuples that indexes stores. But it seems they
don't go through heap_form_tuple at all. It does seem that the
INDEX_TOAST_HACK would need some work -- but its existence means there can't
be anyone else trusting index tuples not to contain toasted data which is
convenient.

> I'm inclined to think that we might want to set things up so that
> varlena datatypes can individually opt-in or opt-out of this treatment;
> a datatype that needs alignment of its content might well wish to
> opt-out to avoid copying overhead. We could do that either with a
> different typlen code, or still typlen -1 but pay attention to whether
> typalign is 'c'.

Any reason to go with typalign? You can always use typalign='i' to get the
regular headers.

Alternatively, what does the trailing "a" in varlena signify? Would this be
varlenb?

>> Option 1)
>
>> We detect cases where the typmod guarantees either a fixed size or a maximum
>> size < 256 bytes.
>
> After last week I would veto this option anyway: it fails unless we
> always know typmod exactly, and I'm here to tell you we don't.

Oh, heh, timing is everything I guess.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 18:32:11
Message-ID:	200702131832.l1DIWBR02878@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark wrote:
> Alternatively, what does the trailing "a" in varlena signify? Would this be
> varlenb?

"attribute"

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	"Greg Stark" <gsstark(at)mit(dot)edu>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-13 18:40:48
Message-ID:	2715.1171392048@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Any such site is already broken, because with the proposed design, code
>> is only exposed to short-header datums if it is also exposed to toasted
>> datums.

> Well the toaster doesn't run until we were about to put the tuple on disk. I
> was afraid there might be circumstances where data from in-memory tuples are
> returned and we use them without fearing them being toasted. In particular I
> was fearing the record/row handling in plpgsql.

True, which is another argument why datatype-specific functions should
never return short-header datums: there are places that call such
functions directly and probably expect an untoasted result. I'm not
very concerned about stuff that's gone through a tuple though: in most
cases that could be expected to contain toasted values anyway.

regards, tom lane

From:	elein <elein(at)varlena(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-15 01:31:31
Message-ID:	20070215013131.GM8879@varlena.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 13, 2007 at 01:32:11PM -0500, Bruce Momjian wrote:
> Gregory Stark wrote:
> > Alternatively, what does the trailing "a" in varlena signify? Would this be
> > varlenb?
>
> "attribute"
>
> --

Actually varlena stands for "variable length array."

elein
elein(at)varlena(dot)com

From:	elein <elein(at)varlena(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Gregory Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-15 01:36:44
Message-ID:	20070215013644.GN8879@varlena.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I want to register, just in principle, that I object to changing
the structure of a varlena.

The idea behind the data type is simple, clean and fast. And it is
easily understood by developers and by people developing applications
and functions in PostgreSQL.

Of course you will do what you will. Be careful out there.

elein
elein(at)varlena(dot)com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	elein <elein(at)varlena(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Variable length varlena headers redux
Date:	2007-02-15 02:02:33
Message-ID:	10255.1171504953@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

elein <elein(at)varlena(dot)com> writes:
> I want to register, just in principle, that I object to changing
> the structure of a varlena.

Indeed, I'm doing my best to restrain Greg from changing what a
datatype-specific function will see by default. The stuff with
variable-length headers should be confined to a relatively small
number of functions in which we're willing to pay a code-ugliness
penalty for speed.

regards, tom lane

From:	Mario Weilguni <mweilguni(at)sime(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	ERROR: failed to build any 8-way joins
Date:	2007-02-15 14:28:23
Message-ID:	200702151528.23668.mweilguni@sime.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Interesting problem, seems to be planer related:
select 1
from beitraege bei,
b_zuordnungen bz,
(select bei_id
from b_zuordnungen bz,
ben_zuordnungen z,
strukturelemente se
where se.id = z.str_id
and se.sty_id = (select id from strukturtypen where code='GEO')
and z.str_id = bz.str_id
and z.ben_id = 100
union
select id from beitraege where kz_edit <> 'N' and useraend = 100
) as foo
where bz.bei_id = bei.id
and foo.bei_id = bei.id
and bei.red_id in (select gba.grp_id
from grp_ben_applikationen gba,
grp_gruppen grp
where grp.id = gba.grp_id
and grp.kz_aktiv='J'
and gba.app_id in (select id from grp_applikationen where code in ('app1', 'app2'))
and gba.ben_id = 100)
and (bei.bei_id_frei is null or bei.kz_edit='N')
and (bei.bt_id, bz.str_id) in ((96,1259036), (96,2688382) )
and bei.red_id=1777712
and bei.id in (
select bzu.bei_id
from b_zuordnungen bzu,
strukturelemente se
where bzu.str_id = se.id
and se.id = 1773715
)
and bei.id=10157309;
ERROR: failed to build any 8-way joins

Interesting: remove any of the above where conditions solves the problem go away, e.g. removing "and bei.id=10157309".

Testet with 8.2.1 and 8.1.4, same effect on both systems. Because of the large number of tables involved it's difficult to find a self contained patch, but if necessary I'll give it a try.
I could give 8.2.3 a try, but I doubt this will help.

Any ideas?

Regards
Mario Weilguni

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Mario Weilguni <mweilguni(at)sime(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ERROR: failed to build any 8-way joins
Date:	2007-02-15 15:25:25
Message-ID:	20070215152525.GI4682@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mario Weilguni wrote:

> Interesting: remove any of the above where conditions solves the problem go away, e.g. removing "and bei.id=10157309".
>
> Testet with 8.2.1 and 8.1.4, same effect on both systems. Because of the large number of tables involved it's difficult to find a self contained patch, but if necessary I'll give it a try.
> I could give 8.2.3 a try, but I doubt this will help.

I think a similar problem was fixed after 8.2.3 was released, so you may
want to check out the REL8_2_STABLE branch.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Mario Weilguni <mweilguni(at)sime(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ERROR: failed to build any 8-way joins
Date:	2007-02-15 15:38:54
Message-ID:	200702151638.54224.mweilguni@sime.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks for the info. Is there a fix for 8.1 branch, the production system is
still 8.1.

Regards
Mario Weilguni

Am Donnerstag, 15. Februar 2007 16:25 schrieb Alvaro Herrera:
> Mario Weilguni wrote:
> > Interesting: remove any of the above where conditions solves the problem
> > go away, e.g. removing "and bei.id=10157309".
> >
> > Testet with 8.2.1 and 8.1.4, same effect on both systems. Because of the
> > large number of tables involved it's difficult to find a self contained
> > patch, but if necessary I'll give it a try. I could give 8.2.3 a try, but
> > I doubt this will help.
>
> I think a similar problem was fixed after 8.2.3 was released, so you may
> want to check out the REL8_2_STABLE branch.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Mario Weilguni <mweilguni(at)sime(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: ERROR: failed to build any 8-way joins
Date:	2007-02-15 16:28:17
Message-ID:	26533.1171556897@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mario Weilguni <mweilguni(at)sime(dot)com> writes:
> ERROR: failed to build any 8-way joins

Could you provide a self-contained test case for this? You probably
don't need any data, just the table schemas.

I fixed a problem with a similar symptom a couple days ago, but that was
in logic that was new in 8.2 ...

regards, tom lane