Reducing overhead for repeat de-TOASTing

From: Tom Lane <tgl@sss.pgh.pa.us>
To: pgsql-hackers@postgresql.org
Subject: Reducing overhead for repeat de-TOASTing
Date: 2008-06-16 19:35:26
Message-ID: 6861.1213644926@sss.pgh.pa.us

Recent discussions with the PostGIS hackers led me to think about ways
to reduce overhead when the same TOAST value is repeatedly detoasted.
In the example shown here
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
90% of the runtime is being consumed by repeated detoastings of a single
datum. That is certainly an outlier case, but we've heard complaints of
TOAST being slow before. The solution I'm about to propose should fix
this, and as a bonus it will reduce the problem of memory leakage when
a detoasted value doesn't get pfree'd.

What I am imagining is that the tuple toaster will maintain a cache of
recently-detoasted values, and that pg_detoast_datum() returns a pointer
to a cache entry rather than a freshly palloc'd value. The cache lookup
key is the toast table OID plus value OID. Now pg_detoast_datum() has no
idea where the pointer it returns will go, so the big problem with this
scheme is that it's hard to tell how long a cache entry needs to live.
But we can resolve that by ruling that the required lifetime is the same
as the value would have had if it'd really been palloc'd --- IOW, until
the memory context that was current at the time gets deleted or reset.
This can be implemented by allowing the toaster cache to hook into the
memory context delete/reset calls. (We have no such capability now, but
there have been some previous cases where it would've come in handy, so
I think this would be a useful extension anyhow. It should be an actual
hook that can be used by anyone, not something where mcxt.c is
specifically passing control to tuptoaster.) Once a cache entry is no
longer required anywhere, it can be removed, and will be if needed to
shrink the cache to some target size (work_mem perhaps, or is it worth
inventing a separate GUC for this?). But we can hang onto the value
longer if it's not taking up needed space. This is important since
the use pattern exhibited in the PostGIS problem involves repeat
detoastings that are separated by MemoryContextResets --- if we free
the cache entry as soon as possible, we'll not gain anything.
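
To make that concrete, here is a rough sketch of the sort of cache entry
and lookup machinery I have in mind (everything here is illustrative,
not a settled API):

    #include "postgres.h"

    /* Illustrative sketch only --- names and layout are not final. */
    typedef struct toast_cache_key
    {
        Oid     toastrelid;     /* OID of the TOAST table */
        Oid     valueid;        /* va_valueid identifying the value */
    } toast_cache_key;

    typedef struct toast_cache_entry
    {
        toast_cache_key key;        /* dynahash lookup key */
        struct varlena *data;       /* the fully detoasted value */
        int         refcount;       /* memory contexts still holding it */
        Size        size;           /* for enforcing the cache size limit */
    } toast_cache_entry;

    /*
     * pg_detoast_datum() would look the value up by (toastrelid, valueid),
     * fetch and insert it on a miss, bump the refcount, and register the
     * entry against CurrentMemoryContext through the proposed reset/delete
     * hook --- something like
     *
     *     MemoryContextRegisterResetHook(CurrentMemoryContext,
     *                                    toast_cache_release, entry);
     *
     * (hypothetical names) --- so that resetting or deleting that context
     * just decrements the refcount rather than freeing anything.
     */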

The other cache management problem that would have to be solved is
to invalidate entries when the underlying TOAST table or individual
TOAST value is deleted. This is exactly like the problem for tuples
in the syscaches, and can be solved via sinval messaging. (The
reason we need to worry about this is to guard against the possibility
that the same identifier is re-used for a new TOAST value; otherwise
we could just let dead values age out of the cache.)
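
For the per-table part of that, the existing inval hooks might be enough;
a sketch of how the cache could latch onto relcache inval traffic (the
per-value case would need a new sinval message type, which isn't shown):

    #include "postgres.h"
    #include "utils/inval.h"

    /*
     * Illustrative only: discard cached entries when the underlying
     * TOAST table is invalidated.
     */
    static void
    toast_cache_relcache_callback(Datum arg, Oid toastrelid)
    {
        /*
         * Here we'd walk the cache's hash table and drop (or mark dead)
         * every entry whose key.toastrelid matches, or everything when
         * toastrelid is InvalidOid.
         */
    }

    void
    toast_cache_init(void)
    {
        CacheRegisterRelcacheCallback(toast_cache_relcache_callback,
                                      (Datum) 0);
    }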

A change of this sort has the potential to break a lot of code, but
I think the only thing we'd really have to do is turn PG_FREE_IF_COPY()
into a no-op. In general, functions are not supposed to scribble on
pass-by-reference input datums, and since most functions don't actually
distinguish whether their inputs got detoasted (except perhaps by using
PG_FREE_IF_COPY()), that means that they won't be scribbling on the
toast cache entry either.
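
For illustration, this is the usual shape of such a function; if
pg_detoast_datum() hands back a shared cache entry and PG_FREE_IF_COPY()
compiles to nothing, code written like this keeps working unchanged,
because it only reads the detoasted value:

    #include "postgres.h"
    #include "fmgr.h"

    /* Written in the style of texteq; illustrative, not the real thing. */
    PG_FUNCTION_INFO_V1(example_texteq);

    Datum
    example_texteq(PG_FUNCTION_ARGS)
    {
        text   *t1 = PG_GETARG_TEXT_P(0);   /* may detoast */
        text   *t2 = PG_GETARG_TEXT_P(1);
        bool    result;

        result = (VARSIZE(t1) == VARSIZE(t2) &&
                  memcmp(VARDATA(t1), VARDATA(t2),
                         VARSIZE(t1) - VARHDRSZ) == 0);

        PG_FREE_IF_COPY(t1, 0);     /* would become a no-op */
        PG_FREE_IF_COPY(t2, 1);
        PG_RETURN_BOOL(result);
    }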

It would be possible to extend this concept to caching toast slice
fetches (pg_detoast_datum_slice() calls) if we add the offset/length
arguments to the cache lookup key. I'm not certain if that's worth the
trouble though --- any feelings about that? Also, that case is much
more likely to have callers that think they can scribble on the result,
since they "know" it must be a palloc'd value.

One unsolved problem is that this scheme doesn't provide any way to cache
the result of decompressing an inline-compressed datum, because those have
no unique ID that could be used for a lookup key. This puts a bit of a
damper on the idea of making PG_FREE_IF_COPY() a no-op --- if it is, then
detoasting a datum of that type would indeed cause a memory leak. The
best idea I have at the moment is to continue to have inline decompression
produce a palloc'd value, leave PG_FREE_IF_COPY() as-is, and arrange
for pfree() on toast cache entries to be a no-op. (Which we can do by
setting up a special memory context methods pointer for them.) It'd
be cool if there were a way to cache decompression though. Ideas?
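
The no-op-pfree trick would look roughly like this --- give cache entries
a chunk header that points at a dummy context whose free method does
nothing (illustrative; the other MemoryContextMethods entries are omitted,
and the exact signatures would follow whatever mcxt.c requires):

    #include "postgres.h"
    #include "utils/memutils.h"

    /*
     * pfree() locates the owning context through the chunk header and
     * calls its free method; if toast cache entries carry a header that
     * points at a context using these methods, pfree() on them silently
     * does nothing and the cache keeps ownership of the memory.
     */
    static void
    toast_cache_noop_free(MemoryContext context, void *pointer)
    {
        /* intentionally empty: the toast cache manages this memory */
    }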

Comments, better ideas? Anyone think this is too much trouble to take
for the problem?

regards, tom lane
