Re: avoiding tuple copying in btree index builds

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoiding tuple copying in btree index builds
Date: 2014-06-03 20:38:11
Message-ID: CA+TgmoajZ3FZrYSx_0u14ymNBYQHMoJ332AFScTEpwJtCO5tYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 1, 2014 at 3:26 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, May 6, 2014 at 12:04 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, May 5, 2014 at 2:13 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
>> wrote:
>> > On 2014-05-05 13:52:39 -0400, Robert Haas wrote:
>> >> Today, I discovered that when building a btree index, the btree code
>> >> uses index_form_tuple() to create an index tuple from the heap tuple,
>> >> calls tuplesort_putindextuple() to copy that tuple into the sort's
>> >> memory context, and then frees the original one it built. This seemed
>> >> inefficient, so I wrote a patch to eliminate the tuple copying. It
>> >> works by adding a function tuplesort_putindextuplevalues(), which
>> >> builds the tuple in the sort's memory context and thus avoids the need
>> >> for a separate copy. I'm not sure if that's the best approach, but
>> >> the optimization seems wortwhile.
>> >
>> > Hm. It looks like we could quite easily just get rid of
>> > tuplesort_putindextuple(). The hash usage doesn't look hard to convert.
>>
>> I glanced at that, but it wasn't obvious to me how to convert the hash
>> usage. If you have an idea, I'm all ears.
>
> I also think it's possible to have similar optimization for hash index
> incase it has to spool the tuple for sorting.
>
> In function hashbuildCallback(), when buildstate->spool is true, we
> can avoid to form index tuple. To check for nulls before calling
>
> _h_spool(), we can traverse the isnull array.

Hmm, that might work. Arguably it's less efficient, but on the other
hand if it avoids forming the tuple sometimes it might be MORE
efficient. And anyway the difference might not be enough to matter.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2014-06-03 20:39:34 Re: json casts
Previous Message Robert Haas 2014-06-03 20:31:55 Re: Re-create dependent views on ALTER TABLE ALTER COLUMN ... TYPE?