Re: B-Tree index builds, CLUSTER, and sortsupport

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: B-Tree index builds, CLUSTER, and sortsupport
Date: 2014-11-06 00:33:29
Message-ID: 545AC1D9.1040009@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/11/2014 02:26 AM, Peter Geoghegan wrote:> Both Robert and Heikki
expressed dissatisfaction with the fact that
> B-Tree index builds don't use sortsupport. Because B-Tree index builds
> cannot really use the "onlyKey" optimization, the historic oversight
> of not supporting B-Tree builds (and CLUSTER) might have been at least
> a little understandable [1]. But with the recent work on sortsupport
> for text, it's likely that B-Tree index builds will be missing out on
> rather a lot by not taking advantage of sortsupport.
>
> Attached patch modifies tuplesort to correct this oversight. What's
> really nice about it is that there is actually a net negative code
> footprint:
>
> src/backend/access/nbtree/nbtsort.c | 63 +++---
> src/backend/utils/sort/sortsupport.c | 77 ++++++--
> src/backend/utils/sort/tuplesort.c | 274 +++++++++++----------------
> src/include/utils/sortsupport.h | 3 +
> 4 files changed, 203 insertions(+), 214 deletions(-)

The code compiles and passes the test suite.

I looked at the changes to the code. The new code is clean and there is
more code re-use and improved readability. On possible further
improvement would be to move the preparation of SortSupport to a common
function since this is done three time in the code.

I did some simple benchmarks by adding indexes to temporary tables and
could see improvements of around 10% in index build time. So it gives a
nice, but not amazing, performance improvement.

Is there any case where we should expect any greater performance
improvement?

Either way I find this a nice patch which improves code quality and
performance.

= Minor code style issues I found

- There is a double space in "strategy = (scanKey->sk_flags [...]".
- I think there should be a newline in tuplesort_begin_index_btree()
before "/* Prepare SortSupport data for each column */".
- Remove the extra newline after reversedirection().
- End sentences in comments with period. That seems to be the common
practice in the project.

--
Andreas Karlsson

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-11-06 00:34:42 Re: numeric_normalize() is a few bricks shy of a load
Previous Message Josh Berkus 2014-11-06 00:32:24 recovery_target_time and standby_mode