WIP patch: reducing overhead for repeat de-TOASTing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-06-29 20:57:10
Message-ID: 5184.1214773030@sss.pgh.pa.us
Lists: pgsql-hackers

Attached is a worked-out patch for the approach proposed here:
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
namely, that cache management for de-TOASTed datums is handled
by TupleTableSlots.

To avoid premature detoasting of values that we might never need, the
patch introduces a concept of an "indirect TOAST pointer", which has
the same 0x80 or 0x01 header as an external TOAST pointer, but can
be told apart by having a different length byte. Within that we have
* pointer to original toasted field within the Slot's tuple
* pointer to the owning Slot
* pointer to decompressed copy, or NULL if not decompressed yet
Some fairly straightforward extensions to the TupleTableSlot code,
heaptuple.c, and tuptoaster.c make it all go.

My original thoughts had included turning FREE_IF_COPY() into a no-op,
but on investigation that seems impractical. One case that still
depends on that pfree is where we have palloc'd a 4-byte-header copy
of a short-header datum to support code that needs properly aligned
datum content. The solution adopted in the patch is to arrange for
pfree() applied to a cacheable detoasted object to be a no-op, whereas
it still works normally for non-cached detoasted objects. We do this
by inserting a dummy chunk header that points to a dummy memory context
whose pfree support method does nothing. I think this part of the patch
would be required for any toast caching method, not just this one.

What I like about this patch is that it's a fairly small-footprint
change, it doesn't add much overhead, and it covers caching of
decompression for in-line-compressed datums as well as the out-of-line
case.

One thing I really *don't* like about it is that it requires everyplace
that copies Datums to know about indirect pointers: in general, the copy
must be a copy of the original toasted Datum, not of the indirect
pointer, else we have indirect pointers that can outlive their owning
TupleTableSlot (or at least outlive its current tuple cycle). There
only seem to be about half a dozen such places in the current code,
but still it seems a rather fragile coding rule.

After playing with it for a little bit, I'm not convinced that it buys
enough performance win to be worth applying --- the restriction of cache
lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
(For example, sorts that involve toasted sort keys continue to suck,
because the tuples being sorted aren't in Slots.) It would probably
fix the specific case that the PostGIS hackers were complaining of,
but I think we need something more.

Still, I wanted to get it into the archives because the idea of indirect
toast pointers might be useful for something else.

regards, tom lane

Attachment Content-Type Size
toast-indirect-1.patch.gz application/octet-stream 11.3 KB

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-06-30 18:56:38
Message-ID: 87iqvq4uqh.fsf@oxford.xeocode.com

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.) It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.

I do like that it handles even inline-compressed cases. What I didn't like
about the managed cache was that it couldn't handle such cases. I could easily
imagine the PostGIS case arising for inline compressed data structures. I
wonder if it isn't worthwhile just for that case even if there's a further
cache behind it for repeated fetches of out-of-line data.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-01 11:15:17
Message-ID: 486A11C5.1040809@siriusit.co.uk

Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled
> by TupleTableSlots.

(cut)

Hi Tom,

Thanks very much for supplying the WIP patch. In the interest of testing
and feedback, I've just downloaded PostgreSQL CVS head and applied your
patch, compiled up a slightly modified version of PostGIS (without
RECHECKs on the GiST opclass) and loaded in the same dataset.

Unfortunately I have to report back that with your WIP patch applied,
timings seem to have become several orders of magnitude *worse*:

pg84(at)zeno:~$ psql -d postgis
psql (8.4devel)
Type "help" for help.

postgis=# explain analyze select count(*) from geography where centroid
&& (select the_geom from geography where id=69495);

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=7100.28..7100.29 rows=1 width=0) (actual time=18238.932..18238.934 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=27.472..69.223 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Index Scan using geography_centroid_idx on geography  (cost=0.00..8.27 rows=1 width=0) (actual time=118.371..18196.041 rows=32880 loops=1)
         Index Cond: (centroid && $0)
 Total runtime: 18239.918 ms
(7 rows)

In fact, even sequential scans seem to have gone up by several orders of
magnitude:

postgis=# set enable_indexscan = 'f';
SET
postgis=# set enable_bitmapscan = 'f';
SET
postgis=# explain analyze select count(*) from geography where centroid
&& (select the_geom from geography where id=69495);
                                                      QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=14184.01..14184.01 rows=1 width=0) (actual time=9117.022..9117.024 rows=1 loops=1)
   InitPlan
     ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=4387) (actual time=23.780..54.362 rows=1 loops=1)
           Filter: (id = 69495::numeric)
   ->  Seq Scan on geography  (cost=0.00..7092.00 rows=1 width=0) (actual time=55.016..9073.084 rows=32880 loops=1)
         Filter: (centroid && $0)
 Total runtime: 9117.174 ms
(7 rows)

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Mark Cave-Ayland" <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-01 12:39:52
Message-ID: 1d4e0c10807010539k5950a60cwe9865ae3518c0342@mail.gmail.com

Mark,

On Tue, Jul 1, 2008 at 1:15 PM, Mark Cave-Ayland
<mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk> wrote:
> Thanks very much for supplying the WIP patch. In the interest of testing and
> feedback, I've just downloaded PostgreSQL CVS head and applied your patch,
> compiled up a slightly modified version of PostGIS (without RECHECKs on the
> GiST opclass) and loaded in the same dataset.

From the discussion we had a few months ago, I don't think the no
RECHECK trick works with CVS tip anymore.

See my post on the "Remove lossy-operator RECHECK flag?" thread:
http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
and follow-ups.

That said, perhaps it's not the only problem you have but I thought it
was worth mentioning.

--
Guillaume


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-01 13:11:28
Message-ID: 486A2D00.5090409@siriusit.co.uk

Guillaume Smet wrote:

> From the discussion we had a few months ago, I don't think the no
> RECHECK trick works with CVS tip anymore.
>
> See my post on the "Remove lossy-operator RECHECK flag?" thread:
> http://archives.postgresql.org/pgsql-hackers/2008-04/msg00847.php
> and follow-ups.
>
> That said, perhaps it's not the only problem you have but I thought it
> was worth mentioning.

Yeah sorry, that might not have been as clear as I hoped. What I meant
was that I removed the explicit RECHECK clause from the GiST operator
class definition - since as you correctly mention, CVS tip throws an
error if this is present.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-01 13:53:05
Message-ID: 2910.1214920385@sss.pgh.pa.us

Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk> writes:
> Unfortunately I have to report back that with your WIP patch applied,
> timings seem to have become several orders of magnitude *worse*:

Ugh. Could I pester you to run the case under gprof or oprofile?
Or, if you can give me step-by-step directions for setting up the
test scenario, I could do that here.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-01 18:43:50
Message-ID: 17434.1214937830@sss.pgh.pa.us

[ back on-list ]

Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk> writes:
> Thanks very much for supplying the WIP patch. In the interest of testing
> and feedback, I've just downloaded PostgreSQL CVS head and applied your
> patch, compiled up a slightly modified version of PostGIS (without
> RECHECKs on the GiST opclass) and loaded in the same dataset.
> Unfortunately I have to report back that with your WIP patch applied,
> timings seem to have become several orders of magnitude *worse*:

OK, I've reproduced the test case locally. I believe that when you
say "worse", you mean "worse than 8.3", right? And you did tell me
offlist that you were testing with --enable-cassert. CVS HEAD has
very substantially greater cassert overhead because of the
randomize_memory addition --- oprofile output for this test looks like

samples % image name symbol name
1239580 78.7721 postgres randomize_mem
143544 9.1218 libc-2.7.so memcpy
48039 3.0528 libc-2.7.so memset
13838 0.8794 postgres LWLockAcquire
12176 0.7738 postgres index_getnext
11697 0.7433 postgres LWLockRelease
10406 0.6613 postgres hash_search_with_hash_value
4739 0.3012 postgres toast_fetch_datum
4099 0.2605 postgres _bt_checkkeys
3905 0.2482 postgres AllocSetAlloc
3751 0.2384 postgres PinBuffer
3545 0.2253 postgres UnpinBuffer

I'm inclined to think that we'd better turn that off by default,
since it's not looking like it's catching anything new.

However, even with cassert turned off, the de-TOAST patch isn't
helping your test case; though it does help if I turn the query
into a join. Here's what I get for

                            8.3    HEAD   HEAD + patch

  original query [1]        4575   4669   4613
  original + force_2d [2]    196    195    201
  converted to join [3]     4603   4667    209
  original, indexscans off  2335   2391   2426
  join, indexscans off      2350   2373    124

[1] select count(*) from geography where centroid && (select the_geom from geography where id=69495);

[2] select count(*) from geography where centroid && (select force_2d(the_geom) from geography where id=69495);

[3] select count(*) from geography g1, geography g2 where g1.centroid && g2.the_geom and g2.id=69495;

All times in msec, median of 3 trials using psql \timing (times seem
to be reproducible within about 1%, if data is already cached).
Default parameters all around.

The join form of the query produces the results I expected, with
g2.the_geom coming from the outer side of a nestloop join and getting
detoasted only once. After some digging I found the reason why the
original query isn't getting any benefit: it's copying the_geom up
from an InitPlan, and nodeSubplan.c does that like this:

    /*
     * We need to copy the subplan's tuple into our own context, in case
     * any of the params are pass-by-ref type --- the pointers stored in
     * the param structs will point at this copied tuple! node->curTuple
     * keeps track of the copied tuple for eventual freeing.
     */
    if (node->curTuple)
        heap_freetuple(node->curTuple);
    node->curTuple = ExecCopySlotTuple(slot);

    /*
     * Now set all the setParam params from the columns of the tuple
     */
    foreach(l, subplan->setParam)
    {
        int paramid = lfirst_int(l);
        ParamExecData *prm = &(econtext->ecxt_param_exec_vals[paramid]);

        prm->execPlan = NULL;
        prm->value = heap_getattr(node->curTuple, i, tdesc,
                                  &(prm->isnull));
        i++;
    }

So the Param is pointing at a bare toasted Datum, not an indirect
pointer in a Slot, and there's no way to avoid detoasting each time
the Param is referenced.

It would be simple enough to fix nodeSubplan.c to copy the data into
an upper-level Slot rather than a bare tuple. But this makes me wonder
how many other places are like this. In the past there wasn't that much
benefit to pulling data from a Slot instead of a bare tuple, so I'm
afraid we might have a number of similar gotchas we'd have to track
down.

The other thing that worries me is that the force_2d query measurements
suggest a runtime penalty of two or three percent for the patch, which
is higher than I was hoping for. But that isn't that much more than
the noise level in this test.

On the whole I'm still feeling pretty discouraged about this patch ...

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-02 15:05:44
Message-ID: 20080702150544.GS18252@alvh.no-ip.org

Tom Lane wrote:

> It would be simple enough to fix nodeSubplan.c to copy the data into
> an upper-level Slot rather than a bare tuple. But this makes me wonder
> how many other places are like this. In the past there wasn't that much
> benefit to pulling data from a Slot instead of a bare tuple, so I'm
> afraid we might have a number of similar gotchas we'd have to track
> down.

I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
every usage of bare tuples is a not-very-serious bug, and we can fix
them one by one as we come across them.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-02 16:52:47
Message-ID: 13038.1215017567@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> It would be simple enough to fix nodeSubplan.c to copy the data into
>> an upper-level Slot rather than a bare tuple. But this makes me wonder
>> how many other places are like this. In the past there wasn't that much
>> benefit to pulling data from a Slot instead of a bare tuple, so I'm
>> afraid we might have a number of similar gotchas we'd have to track
>> down.

> I compare this to adding CHECK_FOR_INTERRUPTS(): let's declare that
> every usage of bare tuples is a not-very-serious bug, and we can fix
> them one by one as we come across them.

Unfortunately we can't usefully have such a rule --- consider sorting
for example. We're not going to change over to using TupleTableSlots
as the items being sorted. What I foresee if we go down this path
is that there will be some places where we can fix toasting performance
problems by inserting a Slot, and some where we can't.

regards, tom lane


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-03 09:01:23
Message-ID: 486C9563.3010006@siriusit.co.uk

Tom Lane wrote:

> OK, I've reproduced the test case locally. I believe that when you
> say "worse", you mean "worse than 8.3", right? And you did tell me
> offlist that you were testing with --enable-cassert. CVS HEAD has
> very substantially greater cassert overhead because of the
> randomize_memory addition --- oprofile output for this test looks like
>
> samples % image name symbol name
> 1239580 78.7721 postgres randomize_mem
> 143544 9.1218 libc-2.7.so memcpy
> 48039 3.0528 libc-2.7.so memset
> 13838 0.8794 postgres LWLockAcquire
> 12176 0.7738 postgres index_getnext
> 11697 0.7433 postgres LWLockRelease
> 10406 0.6613 postgres hash_search_with_hash_value
> 4739 0.3012 postgres toast_fetch_datum
> 4099 0.2605 postgres _bt_checkkeys
> 3905 0.2482 postgres AllocSetAlloc
> 3751 0.2384 postgres PinBuffer
> 3545 0.2253 postgres UnpinBuffer
>
> I'm inclined to think that we'd better turn that off by default,
> since it's not looking like it's catching anything new.

Yes, I suspect that's probably it. I applied the patch straight to CVS
tip as I wasn't aware of any changes that would affect the unpatched
result, but I was obviously wrong ;)

(cut)

> On the whole I'm still feeling pretty discouraged about this patch ...

At the very least we have some more information on how an eventual
solution should work, and a test case to help analyse the effectiveness
of any potential solution.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Mark Cave-Ayland" <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-03 09:29:00
Message-ID: 87prpvb9k3.fsf@oxford.xeocode.com


>> I'm inclined to think that we'd better turn that off by default,
>> since it's not looking like it's catching anything new.

Well at least it caught the bug that Mark was performance testing with a
--enable-cassert build :/

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!


From: Mark Cave-Ayland <mark(dot)cave-ayland(at)siriusit(dot)co(dot)uk>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-07-03 09:38:48
Message-ID: 486C9E28.9000604@siriusit.co.uk

Gregory Stark wrote:

> Well at least it caught the bug that Mark was performance testing with a
> --enable-cassert build :/

True ;) I appreciated that there would be some overhead, but I didn't
think it would be that much, mainly because I seem to remember there
was talk a while back of enabling some assertions in production
builds.

ATB,

Mark.

--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-08-21 20:39:16
Message-ID: 200808212039.m7LKdGo12488@momjian.us


Added to TODO:

Eliminate de-TOASTing of values when not needed

---------------------------------------------------------------------------

Tom Lane wrote:
> Attached is a worked-out patch for the approach proposed here:
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> namely, that cache management for de-TOASTed datums is handled
> by TupleTableSlots.

(cut)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-08-22 00:42:20
Message-ID: 27162.1219365740@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Added to TODO:
> Eliminate de-TOASTing of values when not needed

That's a fairly bad description of what the patch was about. I changed
it to

Reduce costs of repeat de-TOASTing of values

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-08-25 17:42:30
Message-ID: 1219686150.6213.195.camel@dell.linuxdev.us.dell.com

On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
> After playing with it for a little bit, I'm not convinced that it buys
> enough performance win to be worth applying --- the restriction of cache
> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.
> (For example, sorts that involve toasted sort keys continue to suck,
> because the tuples being sorted aren't in Slots.) It would probably
> fix the specific case that the PostGIS hackers were complaining of,
> but I think we need something more.
>
> Still, I wanted to get it into the archives because the idea of indirect
> toast pointers might be useful for something else.
>

Thank you for posting this to the list; it does help us at Truviso
(sometimes). In some real cases, we're seeing a 15-20x improvement
in the overall query, going from about 9 seconds to under 500 ms. In
other cases that could theoretically benefit from TOAST caching, we see
no improvement at all.

As you say, the cases where it helps are fairly narrow. It's
also very susceptible to changes in the plan. For instance, if the
toasted value is on one side of a nested loop, the patch helps a
lot; but it goes back to the performance of unpatched PostgreSQL for the
same plan if the nested loop is inverted, as shown below.

Regards,
Jeff Davis

[ plan estimates are omitted for readability ]

----------- UNPATCHED -----------

Nested Loop (actual time=3.021..1020.923 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.016..0.017 rows=1 loops=1)
  -> Seq Scan on a (actual time=0.016..0.735 rows=1000 loops=1)
Total runtime: 1021.397 ms

Nested Loop (actual time=1.352..1037.207 rows=1000 loops=1)
  -> Seq Scan on a (actual time=0.007..0.364 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.001..0.002 rows=1 loops=1000)
Total runtime: 1037.653 ms

------------ PATCHED ------------

-- patch helps
Nested Loop (actual time=1.266..2.247 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.003..0.072 rows=1 loops=1)
  -> Seq Scan on a (actual time=0.006..0.328 rows=1000 loops=1)
Total runtime: 2.519 ms

-- patch has no effect
Nested Loop (actual time=1.283..1012.371 rows=1000 loops=1)
  -> Seq Scan on a (actual time=0.010..0.719 rows=1000 loops=1)
  -> Seq Scan on toasted (actual time=0.001..0.059 rows=1 loops=1000)
Total runtime: 1012.973 ms


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: reducing overhead for repeat de-TOASTing
Date: 2008-08-26 00:54:28
Message-ID: 10595.1219712068@sss.pgh.pa.us

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Sun, 2008-06-29 at 16:57 -0400, Tom Lane wrote:
>> After playing with it for a little bit, I'm not convinced that it buys
>> enough performance win to be worth applying --- the restriction of cache
>> lifespan to one tuple cycle of a TupleTableSlot is awfully restrictive.

> Thank you for posting this to the list, this does help us at Truviso
> (sometimes). In some real cases, we're seeing about 15-20x improvement
> of the overall query; going from about 9 seconds to under 500 ms. In
> other cases that could theoretically benefit from TOAST caching, we see
> no improvement at all.

> As you say, the cases where it helps are fairly narrow.

Thanks for giving it a workout. Looks like we do indeed need to work on
the other approach with a more persistent toasted-object cache. But the
numbers you got are good evidence that this will be worth doing if we
can get it right.

I might try to look at this after the September commit fest (but if
anyone else wants to tackle it, feel free!)

regards, tom lane