Re: avoiding tuple copying in btree index builds

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoiding tuple copying in btree index builds
Date: 2014-06-17 13:57:37
Message-ID: CA+TgmoZE-qGLioTijmBH2KUVn4pwHAm3Likty9BpRdyTzJfn0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jun 16, 2014 at 8:10 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On further review, this is definitely the way to go: it's a
>> straight-up win. The isnull array is never more than one element in
>> length, so testing the single element is quite trivial. The
>> attached, revised patch provides a modest but useful speedup for both
>> hash and btree index builds.
>
>> Anyone see any reason NOT to do this? So far it looks like a
>> slam-dunk from here.
>
> If index_form_tuple leaks any memory in the sort context, this would be
> not so much a slam-dunk win as a complete disaster. I'm not sure that
> no-leak is a safe assumption, especially in cases where we do toast
> compression of the index datums. (In fact, I'm pretty sure that the
> reason it was coded like this originally was exactly to avoid that
> assumption.)

Yes, that would be bad. The reason I think it's probably OK is that
index_form_tuple seems to have had, from ancient times (cf.
f67e79045), code of non-trivial to free any memory that's allocated as
a result of de-TOASTing, which we presumably would not bother to do if
we were counting on memory context resets to clean things up. It
calls toast_compress_datum(), which is similarly careful. The most
complicated code path appears to be the VARATT_IS_EXTERNAL() case,
where we call heap_tuple_fetch_attr(), which calls
toast_fetch_datum(). I'm kind of wondering whether that can really
happen, though, because I didn't think indexes could use TOAST
storage. Even if it does happen I don't see any obvious reason why it
should leak, but it'd be harder to be certain that it doesn't.

> Assuming that you inspect the code and determine it's safe, the patch
> had better decorate index_form_tuple with dire warnings about not leaking
> memory; because even if it's safe today, somebody might break it tomorrow.
>
> On a micro-optimization level, it might be worth passing the TID as
> ItemPointer not ItemPointerData (ie, pass a pointer until we get to
> the point of actually inserting the TID into the index tuple).
> I'm not sure that copying odd-size structs should be assumed to be
> efficient.

Yeah, true. Checking existing precedent, it looks like we usually
pass ItemPointer rather than ItemPointerData, so it's probably a good
idea to do this that way too for reasons of style if nothing else. I
kind of wonder whether it's really more efficient to pass an 8-byte
pointer to a 6-byte structure than to just pass the structure itself,
but it might be.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-17 14:02:51 Re: UPDATE SET (a,b,c) = (SELECT ...) versus rules
Previous Message Andres Freund 2014-06-17 13:48:05 Re: UPDATE SET (a,b,c) = (SELECT ...) versus rules