Re: B-Tree index builds, CLUSTER, and sortsupport

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: B-Tree index builds, CLUSTER, and sortsupport
Date: 2014-11-06 01:30:43
Message-ID: CAM3SWZReiNJ1P-GXSLzqnLvxmaoCw48G8wBHc84qWaOe_-_KCQ@mail.gmail.com
Lists: pgsql-hackers

Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> I looked at the changes to the code. The new code is clean and there is more
> code re-use and improved readability. One possible further improvement would
> be to move the preparation of SortSupport to a common function since this is
> done three times in the code.

The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.

> I did some simple benchmarks by adding indexes to temporary tables and could
> see improvements of around 10% in index build time. So it gives a nice, but
> not amazing, performance improvement.

Cool.

> Is there any case where we should expect any greater performance
> improvement?

The really compelling case is abbreviated keys. As you probably know,
there is a further patch that builds on both this patch and the abbreviated
keys patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).
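
To make the abbreviated-keys idea concrete, here is a minimal sketch in
plain C. It is not code from the patch or from tuplesort.c: SortItem,
make_abbrev() and compare_items() are hypothetical names made up for
illustration, and the 8-byte big-endian prefix is just one assumed way of
abbreviating a NUL-terminated string key. The point it shows is that most
comparisons can be resolved with a cheap integer comparison, falling back
to the full comparator only on a tie.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sort item: the full key plus a fixed-size abbreviation. */
typedef struct
{
    const char *key;    /* full NUL-terminated key */
    uint64_t    abbrev; /* first 8 bytes of key, packed big-endian */
} SortItem;

/*
 * Pack up to the first 8 bytes of s into an integer, most significant
 * byte first, so that unsigned integer comparison agrees with a bytewise
 * comparison of the prefix. Shorter strings are zero-padded.
 */
static uint64_t
make_abbrev(const char *s)
{
    uint64_t v = 0;
    size_t   len = strlen(s);

    for (size_t i = 0; i < 8; i++)
    {
        unsigned char c = (i < len) ? (unsigned char) s[i] : 0;

        v = (v << 8) | c;
    }
    return v;
}

/*
 * Compare two items: resolve on the abbreviation when it differs, and
 * fall back to the full (more expensive) comparison only on a tie.
 */
static int
compare_items(const SortItem *a, const SortItem *b)
{
    if (a->abbrev != b->abbrev)
        return (a->abbrev < b->abbrev) ? -1 : 1;
    return strcmp(a->key, b->key);
}
```

In this sketch the caller would compute abbrev once per item before
sorting, so the cost of make_abbrev() is paid N times while the saving
applies to each of the roughly N log N comparisons.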

--
Peter Geoghegan
