Virtual tuple slots versus TOAST: big problem

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Alexey Beschiokov <proforg(at)maloletka(dot)ru>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Virtual tuple slots versus TOAST: big problem
Date: 2005-11-19 17:22:05
Message-ID: 20874.1132420925@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I looked into this 8.1 bug reported by Alexey Beschiokov:
http://archives.postgresql.org/pgsql-bugs/2005-11/msg00192.php
The executive summary is: it looks like a kluge solution isn't hard,
but solving it in a more reasonable fashion is going to require some
significant API changes inside the backend :-(

The problem is that ExecInsert() does ExecMaterializeSlot() to get a
physical tuple from the virtual tuple slot it's initially handed.
After this step, the TupleTableSlot contains that same physical tuple.
ExecInsert then passes the bare tuple to some subroutines and the Slot
to other subroutines. In the particular case shown by Alexey, the
sequence is:
ExecConstraints is called with the slot
heap_insert (and thence tuptoaster.c) is called with the bare tuple
ExecInsertIndexTuples is called with the slot

The problem is that ExecConstraints accesses some fields of the slot,
causing partial extraction of tuple fields and setup of the tts_values[]
array within the slot. After that, the tuptoaster proceeds to smash
down the tuple to a size it likes, which it does by scribbling on the
HeapTuple structure it's handed. That makes the TupleTableSlot
structure inconsistent --- it has tts_values fields pointing at memory
that's no longer part of the tuple actually stored in the slot. When
ExecInsertIndexTuples then tries to extract more fields from the slot,
a crash is not unlikely.

(It's annoying that we didn't find this during beta, but a failure
requires ExecInsertIndexTuples to try to access fields to the right of
the last one fetched by ExecConstraints, and it only matters if the
tuple actually got toasted in between, so it is a bit of a corner case.)

ExecUpdate has the same bug. I don't think there are any other places,
because ExecMaterializeSlot isn't used elsewhere ATM. The problem did
not exist before 8.1 because TupleTableSlots didn't contain extra info
beyond the bare tuple, so tuptoaster could hack that without rendering
the Slot inconsistent.

I think that a kluge fix is possible by setting tts_nvalid to zero after
invoking heap_insert or heap_update, so that ExecInsertIndexTuples will
be forced to recompute tts_values instead of reusing the previous data.
This is probably the right thing to do in the 8.1 branch, but it's
incredibly ugly and we need to fix it better going forward.

"A better fix" seems to require passing the TupleTableSlot, not just the
bare tuple, down to the toaster --- else there is no way for the toaster
to update the data structure that it's accidentally invalidating. This
seems like it might be a good idea anyway on performance grounds: we
could save one cycle of heap_deform_tuple and heap_formtuple in the case
where toasting is needed, if the toaster is invoked on the tuple while
it's still in virtual-slot format. The problem is that given the
current structure, that means changing the APIs of heap_insert and
heap_update, or else making near-duplicate versions that take a
TupleTableSlot instead of a bare tuple. Neither of these things seem
real attractive. If we wanted to avoid forming a physical tuple until
the last moment we'd also need to change the APIs associated with
triggers, ie make them work on Slots not tuples. This'd be even more
invasive. It would likely be cleaner and more efficient in the long
run, but there's a lot of code to touch, and breaking user-defined
triggers doesn't seem palatable at all.

Any thoughts on the best way to proceed?

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Alexey Beschiokov <proforg(at)maloletka(dot)ru>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Virtual tuple slots versus TOAST: big problem
Date: 2005-11-20 11:04:59
Message-ID: 1132484699.4959.363.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote:
> "A better fix" seems to require passing the TupleTableSlot, not just the
> bare tuple, down to the toaster --- else there is no way for the toaster
> to update the data structure that it's accidentally invalidating. This
> seems like it might be a good idea anyway on performance grounds: we
> could save one cycle of heap_deform_tuple and heap_formtuple in the case
> where toasting is needed, if the toaster is invoked on the tuple while
> it's still in virtual-slot format. The problem is that given the
> current structure, that means changing the APIs of heap_insert and
> heap_update, or else making near-duplicate versions that take a
> TupleTableSlot instead of a bare tuple. Neither of these things seem
> real attractive.

We changed the heap_insert API for 8,1 so would it be a problem to
change it again for 8.2

Other API changes sound likely, so seems less of a problem.

> If we wanted to avoid forming a physical tuple until
> the last moment we'd also need to change the APIs associated with
> triggers, ie make them work on Slots not tuples. This'd be even more
> invasive. It would likely be cleaner and more efficient in the long
> run, but there's a lot of code to touch, and breaking user-defined
> triggers doesn't seem palatable at all.

But if there are no triggers, then no problem.

Not sure I like this idea, but we could support both old and new trigger
APIs. If that was marked in the catalog, then we'd know when to change
slots into tuples.

What is the performance loss/gain by waiting?

On a different note, I'd like to consider how we can reduce the overhead
for a HeapTuple for when we do HashAgg and HashJoin. Currently the
overhead is substantial, with at least 12 bytes unused in most cases,
out of a total current overhead of > 20 bytes per tuple. Reducing that
overhead would increase the limit at which we start to do sorts.

Best Regards, Simon Riggs


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Alexey Beschiokov <proforg(at)maloletka(dot)ru>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: Virtual tuple slots versus TOAST: big problem
Date: 2005-11-20 16:23:40
Message-ID: 28667.1132503820@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote:
>> ... The problem is that given the
>> current structure, that means changing the APIs of heap_insert and
>> heap_update, or else making near-duplicate versions that take a
>> TupleTableSlot instead of a bare tuple. Neither of these things seem
>> real attractive.

> We changed the heap_insert API for 8,1 so would it be a problem to
> change it again for 8.2

It's not so much heap_insert/update, it's simple_heap_insert/update,
which are called in a *lot* of places. Nonetheless, we have made such
changes before (the simple_xxx routines didn't exist at all a few years
ago), so it's not out of the question to do it again.

A more constrained change would just alter the tuptoaster API so that
it hands back an entirely new tuple instead of scribbling on the header
it's handed. This doesn't save any overhead but it fixes the problem
in a reasonably robust way, instead of expecting callers to compensate.
(I'm unconvinced that my quick hack of yesterday plugged all the holes.)
We'd need to add a couple lines to the heapam.c routines to free the
separately allocated tuples, but that's no big deal.

> What is the performance loss/gain by waiting?

It's really hard to tell, but in any case it's nil if the tuple isn't
long enough to need toasting. So I'm not sure we should go through
major pushups to change it.

regards, tom lane


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, Alexey Beschiokov <proforg(at)maloletka(dot)ru>
Subject: Re: Virtual tuple slots versus TOAST: big problem
Date: 2005-11-20 16:34:25
Message-ID: 4380A591.10208@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/20/2005 11:23 AM, Tom Lane wrote:

> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
>> On Sat, 2005-11-19 at 12:22 -0500, Tom Lane wrote:
>>> ... The problem is that given the
>>> current structure, that means changing the APIs of heap_insert and
>>> heap_update, or else making near-duplicate versions that take a
>>> TupleTableSlot instead of a bare tuple. Neither of these things seem
>>> real attractive.
>
>> We changed the heap_insert API for 8,1 so would it be a problem to
>> change it again for 8.2
>
> It's not so much heap_insert/update, it's simple_heap_insert/update,
> which are called in a *lot* of places. Nonetheless, we have made such
> changes before (the simple_xxx routines didn't exist at all a few years
> ago), so it's not out of the question to do it again.
>
> A more constrained change would just alter the tuptoaster API so that
> it hands back an entirely new tuple instead of scribbling on the header
> it's handed. This doesn't save any overhead but it fixes the problem
> in a reasonably robust way, instead of expecting callers to compensate.
> (I'm unconvinced that my quick hack of yesterday plugged all the holes.)
> We'd need to add a couple lines to the heapam.c routines to free the
> separately allocated tuples, but that's no big deal.
>
>> What is the performance loss/gain by waiting?
>
> It's really hard to tell, but in any case it's nil if the tuple isn't
> long enough to need toasting. So I'm not sure we should go through
> major pushups to change it.

Assuming that the saved header values don't need to be recomputed if the
tuple doesn't need to be toasted at all, I think that toasting is
expensive enough so that recomputing those values is hardly noticed.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgreSQL(dot)org, Alexey Beschiokov <proforg(at)maloletka(dot)ru>
Subject: Re: Virtual tuple slots versus TOAST: big problem
Date: 2005-11-20 16:40:47
Message-ID: 28803.1132504847@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> On 11/20/2005 11:23 AM, Tom Lane wrote:
> Assuming that the saved header values don't need to be recomputed if the
> tuple doesn't need to be toasted at all, I think that toasting is
> expensive enough so that recomputing those values is hardly noticed.

Yeah, probably so. I'll just make the localized API change then.

One side effect of changing it as I suggest is that when control comes
back out of heap_insert/update, the caller will still have the
originally passed tuple and not the toasted version that actually went
to disk. AFAICS this is OK (of course we have to make sure t_self and
other header fields are updated in both copies). The main impact of
the change is that FormIndexDatum will receive the pre-toasting version
of the tuple, but that is actually a *good* thing --- right now, it just
has to fetch back untoasted fields anyway, if there's anything it needs
to do with them.

regards, tom lane