Quick Links

Re: Minor performance improvement in transition to external sort

From:	Jeremy Harris <jgh(at)wizmail(dot)org>
To:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Minor performance improvement in transition to external sort
Date:	2014-02-06 22:12:05
Message-ID:	52F408B5.7050102@wizmail.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 06/02/14 18:21, Jeff Janes wrote:
> How big of
> sets were you sorting in each case?

Big enough to go external. The timings and compare-counts given are
purely for the heapify stage not the total for the sort, so are
constrained by the work_mem not by the sort size per se.

I'm limited to working on a small machine, so the work_mem value
of 1e+6 is approaching my top limit of sort-time pain unless I rip the
heapify out into a test harness. What range of work_mem is it useful
to test over?

> Was it always just slightly bigger
> than would fit in work_mem?

I didn't check.

> Did you try sorting already-sorted, reverse
> sorted, or pipe-organ shaped data sets?

Not yet, but I can. What variety of pipe-organ shape is of
interest (I can think of several that might be called that)?

> We will also need to test it on
> strings. I usually use md5(random()::text) to generate strings for such
> purposes, at least for a first pass.

OK.

>
> The name of the attached patch is a bit unfortunate.

Is there a project standard for this?

> And I'm not sure what
> you are doing with tupindex, the change there doesn't seem to be necessary
> to the rest of your work, so some explanation on that would be helpful.

I'll add commentary.

> The project coding style usually has comments in blocks before loops on
> which they comment, rather than interspersed throughout the clauses of the
> "for" statement.

I'll adjust that.

> Another optimization that is possible is to do only one comparison at each
> level (between the two children) all the way down the heap, and then
> compare the parent to the chain of smallest children in reverse order.
> This can substantially reduce the number of comparisons on average,
> because most tuples sift most of the way down the heap (because most of the
> tuples are at the bottom of the heap).

Sounds interesting; I'll see if I can get that going.

> (Also, I think you should make your siftdown code look more like the
> existing siftdown code.)

A function called by the outer loop? Can do; the existing does that
because the function is called from multiple places but this will only
be used the once.

Thanks for the review.
--
Jeremy

In response to

Re: Minor performance improvement in transition to external sort at 2014-02-06 18:21:16 from Jeff Janes

Responses

Re: Minor performance improvement in transition to external sort at 2014-02-07 21:28:00 from Jeremy Harris

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Johnston	2014-02-06 22:22:47	Re: 'dml' value for log_statement
Previous Message	Greg Stark	2014-02-06 22:00:17	Re: Release schedule for 9.3.3?