Quick Links

Re: B-Tree index builds, CLUSTER, and sortsupport

Lists:	pgsql-hackers

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-10-11 00:26:48
Message-ID:	CAM3SWZTfKZHTUiWDdHg+6tcYuMsdHoC=bMuAiVgMP9AThj42Gw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Both Robert and Heikki expressed dissatisfaction with the fact that
B-Tree index builds don't use sortsupport. Because B-Tree index builds
cannot really use the "onlyKey" optimization, the historic oversight
of not supporting B-Tree builds (and CLUSTER) might have been at least
a little understandable [1]. But with the recent work on sortsupport
for text, it's likely that B-Tree index builds will be missing out on
rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's
really nice about it is that there is actually a net negative code
footprint:

src/backend/access/nbtree/nbtsort.c | 63 +++---
src/backend/utils/sort/sortsupport.c | 77 ++++++--
src/backend/utils/sort/tuplesort.c | 274 +++++++++++----------------
src/include/utils/sortsupport.h | 3 +
4 files changed, 203 insertions(+), 214 deletions(-)

I've been able to throw out a lot of code, including two virtually
identical inlinable comparators (that have logic for NULL-wise
comparisons). This patch pretty much justifies itself as a refactoring
exercise, without performance improvements entering into it, which is
nice. I haven't benchmarked it yet, but it seems reasonable to assume
that much the same benefits will be seen as were seen for the
MinimalTuple case (for multiple-attribute sorts, that similarly cannot
use the "onlyKey" optimization).

With this patch, all callers of _bt_mkscankey_nodata() now use
sortsupport. This raises the question: Why not just have
_bt_mkscankey_nodata() do it all for us? I think that continuing to
have B-Tree provide a scanKey, and working off of that is a better
abstraction. It would save a few cycles to have the
index_getprocinfo() call currently within _bt_mkscankey_nodata() look
for BTSORTSUPPORT_PROC rather than BTORDER_PROC and be done with it,
but I'd call that a modularity violation. Tuplesort decides a strategy
(BTGreaterStrategyNumber or BTLessStrategyNumber) based on the scanKey
B-Tree private flag SK_BT_DESC, and it's appropriate for that to live
in tuplesort's head if possible, because tuplesort/sortsupport tracks
sort "direction" in a fairly intimate way. Besides, I think that
sortsupport already has enough clients for what it is.

I imagine it makes sense to directly access a sortsupport-installed
comparator through a scanKey, for actual index scans (think
abbreviated keys in internal B-Tree pages), but I still want
uniformity with the other cases within tuplesort (i.e. the
MinimalTuple and Datum cases), so I'm not about to change scanKeys to
have a comparator that doesn't need a fmgr trampoline for the benefit
of this patch. Note that I don't even store a scanKey within
Tuplesortstate anymore. That uniformity is what allowed to to throw
out the per-tuplesort-case reversedirection() logic, and use generic
reversedirection() logic instead (as anticipated by current comments
within Tuplesortstate).

Thoughts?

[1] http://www.postgresql.org/message-id/CAM3SWZQLg8nx2YEb+67xx0giG+-fOLfbtQJKg+jL15_zhi1V7w@mail.gmail.com
--
Peter Geoghegan

Attachment	Content-Type	Size
sortsupport-btree.2014_10_10.patch	text/x-patch	29.2 KB

From:	Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-05 16:36:02
Message-ID:	545A51F2.1070301@BlueTreble.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/10/14, 7:26 PM, Peter Geoghegan wrote:
> Both Robert and Heikki expressed dissatisfaction with the fact that
> B-Tree index builds don't use sortsupport. Because B-Tree index builds
> cannot really use the "onlyKey" optimization, the historic oversight
> of not supporting B-Tree builds (and CLUSTER) might have been at least
> a little understandable [1]. But with the recent work on sortsupport
> for text, it's likely that B-Tree index builds will be missing out on
> rather a lot by not taking advantage of sortsupport.
>
> Attached patch modifies tuplesort to correct this oversight. What's
> really nice about it is that there is actually a net negative code
> footprint:

Did anything ever happen with this?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-05 17:47:45
Message-ID:	CAM3SWZRxmwB_cFmZ-QzKMhq7fc1-VS6Qme1VqKM-4JZ=bopJjQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 5, 2014 at 8:36 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> Did anything ever happen with this?

I consider this to be related to the abbreviated keys stuff, although
it is clearly independently valuable (so it could be reviewed
independently). Since Robert ran out of time to work on abbreviated
keys (hopefully temporarily), it's not all that surprising that no one
has paid attention to this smaller piece of work. I see that Andreas
Karlsson is signed up to review this, so that's something.

--
Peter Geoghegan

From:	Andreas Karlsson <andreas(at)proxel(dot)se>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-06 00:33:29
Message-ID:	545AC1D9.1040009@proxel.se
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 10/11/2014 02:26 AM, Peter Geoghegan wrote:> Both Robert and Heikki
expressed dissatisfaction with the fact that
> B-Tree index builds don't use sortsupport. Because B-Tree index builds
> cannot really use the "onlyKey" optimization, the historic oversight
> of not supporting B-Tree builds (and CLUSTER) might have been at least
> a little understandable [1]. But with the recent work on sortsupport
> for text, it's likely that B-Tree index builds will be missing out on
> rather a lot by not taking advantage of sortsupport.
>
> Attached patch modifies tuplesort to correct this oversight. What's
> really nice about it is that there is actually a net negative code
> footprint:
>
> src/backend/access/nbtree/nbtsort.c | 63 +++---
> src/backend/utils/sort/sortsupport.c | 77 ++++++--
> src/backend/utils/sort/tuplesort.c | 274 +++++++++++----------------
> src/include/utils/sortsupport.h | 3 +
> 4 files changed, 203 insertions(+), 214 deletions(-)

The code compiles and passes the test suite.

I looked at the changes to the code. The new code is clean and there is
more code re-use and improved readability. On possible further
improvement would be to move the preparation of SortSupport to a common
function since this is done three time in the code.

I did some simple benchmarks by adding indexes to temporary tables and
could see improvements of around 10% in index build time. So it gives a
nice, but not amazing, performance improvement.

Is there any case where we should expect any greater performance
improvement?

Either way I find this a nice patch which improves code quality and
performance.

= Minor code style issues I found

- There is a double space in "strategy = (scanKey->sk_flags [...]".
- I think there should be a newline in tuplesort_begin_index_btree()
before "/* Prepare SortSupport data for each column */".
- Remove the extra newline after reversedirection().
- End sentences in comments with period. That seems to be the common
practice in the project.

--
Andreas Karlsson

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Andreas Karlsson <andreas(at)proxel(dot)se>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-06 01:30:43
Message-ID:	CAM3SWZReiNJ1P-GXSLzqnLvxmaoCw48G8wBHc84qWaOe_-_KCQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> I looked at the changes to the code. The new code is clean and there is more
> code re-use and improved readability. On possible further improvement would
> be to move the preparation of SortSupport to a common function since this is
> done three time in the code.

The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.

> I did some simple benchmarks by adding indexes to temporary tables and could
> see improvements of around 10% in index build time. So it gives a nice, but
> not amazing, performance improvement.

Cool.

> Is there any case where we should expect any greater performance
> improvement?

The really compelling case is abbreviated keys - as you probably know,
there is a patch that builds on this patch, and the abbreviated keys
patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).

--
Peter Geoghegan

From:	Andreas Karlsson <andreas(at)proxel(dot)se>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-06 10:51:45
Message-ID:	545B52C1.6010400@proxel.se
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/06/2014 02:30 AM, Peter Geoghegan wrote:
> Thanks for the review.
>
> On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
>> I looked at the changes to the code. The new code is clean and there is more
>> code re-use and improved readability. On possible further improvement would
>> be to move the preparation of SortSupport to a common function since this is
>> done three time in the code.
>
> The idea there is to have more direct control of sortsupport. With the
> abbreviated keys patch, abbreviation occurs based on a decision made
> by tuplesort.c. I can see why you'd say that, but I prefer to keep
> initialization of sortsupport structs largely concentrated in
> tuplesort.c, and more or less uniform regardless of the tuple-type
> being sorted.

Ok, I am fine with either.

>> Is there any case where we should expect any greater performance
>> improvement?
>
> The really compelling case is abbreviated keys - as you probably know,
> there is a patch that builds on this patch, and the abbreviated keys
> patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
> That doesn't really have much to do with this patch, though. The
> important point is that heap tuple sorting (with a query that has no
> client overhead, and involves one big sort) and B-Tree index creation
> both have their tuplesort as a totally dominant cost. The improvements
> for each case should be quite comparable, which is good (except, as
> noted in my opening e-mail, when heap/datum tuplesorting can use the
> onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).

Ah, I see.

--
Andreas Karlsson

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Andreas Karlsson <andreas(at)proxel(dot)se>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-07 20:52:04
Message-ID:	CA+TgmoacZYeH5ZV1sLPjqs6+tfFw1sJsRVt+R3js69KK0sPyew@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Nov 5, 2014 at 8:30 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Thanks for the review.

Committed with some copy-editing based on Andreas's comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andreas Karlsson <andreas(at)proxel(dot)se>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject:	Re: B-Tree index builds, CLUSTER, and sortsupport
Date:	2014-11-07 21:26:39
Message-ID:	CAM3SWZRCapVM5nvYCg6xP9cq5or-6LjTTPDCPEdWenASgwnReQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Nov 7, 2014 at 12:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Committed with some copy-editing based on Andreas's comments.

Thank you both.

--
Peter Geoghegan