Quick Links

Re: avoiding tuple copying in btree index builds

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: avoiding tuple copying in btree index builds
Date:	2014-06-16 19:49:57
Message-ID:	CA+Tgmoao0WNzZ+bc7nAKy9yLpsHknWZaXSPkkw2KcwXZzWGVKw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Jun 3, 2014 at 4:38 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Tue, May 6, 2014 at 12:04 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, May 5, 2014 at 2:13 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
>>> wrote:
>>> > On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
>>> >> Today, I discovered that when building a btree index, the btree code
>>> >> uses index_form_tuple() to create an index tuple from the heap tuple,
>>> >> calls tuplesort_putindextuple() to copy that tuple into the sort's
>>> >> memory context, and then frees the original one it built. This seemed
>>> >> inefficient, so I wrote a patch to eliminate the tuple copying. It
>>> >> works by adding a function tuplesort_putindextuplevalues(), which
>>> >> builds the tuple in the sort's memory context and thus avoids the need
>>> >> for a separate copy. I'm not sure if that's the best approach, but
>>> >> the optimization seems wortwhile.
>>> >
>>> > Hm. It looks like we could quite easily just get rid of
>>> > tuplesort_putindextuple(). The hash usage doesn't look hard to convert.
>>>
>>> I glanced at that, but it wasn't obvious to me how to convert the hash
>>> usage. If you have an idea, I'm all ears.
>>
>> I also think it's possible to have similar optimization for hash index
>> incase it has to spool the tuple for sorting.
>>
>> In function hashbuildCallback(), when buildstate->spool is true, we
>> can avoid to form index tuple. To check for nulls before calling
>>
>> _h_spool(), we can traverse the isnull array.
>
> Hmm, that might work. Arguably it's less efficient, but on the other
> hand if it avoids forming the tuple sometimes it might be MORE
> efficient. And anyway the difference might not be enough to matter.

On further review, this is definitely the way to go: it's a
straight-up win. The isnull array is never more than one element in
length, so testing the single element is quite trivial. The
attached, revised patch provides a modest but useful speedup for both
hash and btree index builds.

Anyone see any reason NOT to do this? So far it looks like a
slam-dunk from here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
avoid-index-build-tuple-copy-v2.patch	text/x-patch	7.3 KB

In response to

Re: avoiding tuple copying in btree index builds at 2014-06-03 20:38:11 from Robert Haas

Responses

Re: avoiding tuple copying in btree index builds at 2014-06-17 00:10:49 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2014-06-16 19:58:09	Re: review: Non-recursive processing of AND/OR lists
Previous Message	Ronan Dunklau	2014-06-16 19:36:05	Re: IMPORT FOREIGN SCHEMA statement