Re: Minor performance improvement in transition to external sort

Lists: pgsql-hackers
From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Minor performance improvement in transition to external sort
Date: 2014-02-04 22:22:59
Message-ID: 52F16843.8080001@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The attached patch replaces the existing siftup method for heapify with
a siftdown method. Tested with random integers it does 18% fewer
compares and takes 10% less time for the heapify, over the work_mem
range 1024 to 1048576.

Both algorithms appear to be O(n) (contradicting Wikipedia's claim
that a siftup heapify is O(n log n)).

--
Cheers,
Jeremy

Attachment Content-Type Size
d text/plain 2.5 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-05 01:11:54
Message-ID: CAB7nPqSAFzHju9FYzw1sUcuAn8xMVhnv5VW_iM6OOL-0xMX+cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 5, 2014 at 7:22 AM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> The attached patch replaces the existing siftup method for heapify with
> a siftdown method. Tested with random integers it does 18% fewer
> compares and takes 10% less time for the heapify, over the work_mem
> range 1024 to 1048576.
>
> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
> that a siftup heapify is O(n log n)).
It looks interesting but it is too late to have that in 9.4... You
should definitely add this patch to the next commit fest so as it is
not lost in the void:
https://commitfest.postgresql.org/action/commitfest_view?id=22
Regards,
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-05 21:56:41
Message-ID: CA+TgmoY45KMBwV5ZtSzX+K1+gTXSDgrUDZMDhpp14AjdOQcDmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> The attached patch replaces the existing siftup method for heapify with
> a siftdown method. Tested with random integers it does 18% fewer
> compares and takes 10% less time for the heapify, over the work_mem
> range 1024 to 1048576.
>
> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
> that a siftup heapify is O(n log n)).

I think Wikipedia is right. Inserting an a tuple into a heap is O(lg
n); doing that n times is therefore O(n lg n). Your patch likewise
implements an outer loop which runs O(n) times and an inner loop which
runs at most O(lg n) times, so a naive analysis of that algorithm also
comes out to O(n lg n). Wikipedia's article on
http://en.wikipedia.org/wiki/Heapsort explains why a tighter bound is
possible for the siftdown case.

I think I tested something like this at some point and concluded that
it didn't really help much, because building the initial heap was a
pretty small part of the runtime of a large sort. It may still be
worth doing, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-06 09:22:42
Message-ID: 52F35462.3030306@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/02/14 21:56, Robert Haas wrote:
> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> The attached patch replaces the existing siftup method for heapify with
>> a siftdown method. Tested with random integers it does 18% fewer
>> compares and takes 10% less time for the heapify, over the work_mem
>> range 1024 to 1048576.
>>
>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>> that a siftup heapify is O(n log n)).
>
> I think Wikipedia is right. Inserting an a tuple into a heap is O(lg
> n); doing that n times is therefore O(n lg n). Your patch likewise
> implements an outer loop which runs O(n) times and an inner loop which
> runs at most O(lg n) times, so a naive analysis of that algorithm also
> comes out to O(n lg n).

The patch implements a siftdown. Measurements attached.
--
Cheers,
Jeremy

Attachment Content-Type Size
results.csv text/csv 951 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-06 14:22:32
Message-ID: CA+TgmobkRTSZ3CzXN3aAO9Pcg7sNJGro83aBioFB2JD=Uom2qg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> On 05/02/14 21:56, Robert Haas wrote:
>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>> The attached patch replaces the existing siftup method for heapify with
>>> a siftdown method. Tested with random integers it does 18% fewer
>>> compares and takes 10% less time for the heapify, over the work_mem
>>> range 1024 to 1048576.
>>>
>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>>> that a siftup heapify is O(n log n)).
>>
>>
>> I think Wikipedia is right. Inserting an a tuple into a heap is O(lg
>> n); doing that n times is therefore O(n lg n). Your patch likewise
>> implements an outer loop which runs O(n) times and an inner loop which
>> runs at most O(lg n) times, so a naive analysis of that algorithm also
>> comes out to O(n lg n).
>
> The patch implements a siftdown. Measurements attached.

Uh, this spreadsheet is useless. You haven't explained how these
numbers were generated, or what the columns in the spreadsheet
actually mean. Ideally, you should include enough information that
someone else can try to reproduce your results.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-06 18:21:16
Message-ID: CAMkU=1wc7_mtc_2QvGxoLZEGBkd=D0dnwNKZ4C+ifSKu9e+imQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:

> The attached patch replaces the existing siftup method for heapify with
> a siftdown method. Tested with random integers it does 18% fewer
> compares and takes 10% less time for the heapify, over the work_mem
> range 1024 to 1048576.
>

Thanks for working on this. Several times I've wondered why we didn't use
siftdown for heapifying, as it is a well known optimization. How big of
sets were you sorting in each case? Was it always just slightly bigger
than would fit in work_mem? Did you try sorting already-sorted, reverse
sorted, or pipe-organ shaped data sets? We will also need to test it on
strings. I usually use md5(random()::text) to generate strings for such
purposes, at least for a first pass.

The name of the attached patch is a bit unfortunate. And I'm not sure what
you are doing with tupindex, the change there doesn't seem to be necessary
to the rest of your work, so some explanation on that would be helpful.

The project coding style usually has comments in blocks before loops on
which they comment, rather than interspersed throughout the clauses of the
"for" statement.

>
> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
> that a siftup heapify is O(n log n)).

Siftdown is linear in all cases. Siftup may be linear in the typical case,
but is n log n in the worst case.

Another optimization that is possible is to do only one comparison at each
level (between the two children) all the way down the heap, and then
compare the parent to the chain of smallest children in reverse order.
This can substantially reduce the number of comparisons on average,
because most tuples sift most of the way down the heap (because most of the
tuples are at the bottom of the heap). Robert submitted a patch to do this
in the main siftdown routine (which for some reason is
named tuplesort_heap_siftup, rather than siftdown), and I found it a
substantial improvement but it never got pushed forward. I think Robert
was concerned it might make some obscure cases worse rather than better,
and no one put it through enough serious testing to overcome that concern.

(Also, I think you should make your siftdown code look more like the
existing siftdown code.)

Cheers,

Jeff


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-06 22:12:05
Message-ID: 52F408B5.7050102@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/02/14 18:21, Jeff Janes wrote:
> How big of
> sets were you sorting in each case?

Big enough to go external. The timings and compare-counts given are
purely for the heapify stage not the total for the sort, so are
constrained by the work_mem not by the sort size per se.

I'm limited to working on a small machine, so the work_mem value
of 1e+6 is approaching my top limit of sort-time pain unless I rip the
heapify out into a test harness. What range of work_mem is it useful
to test over?

> Was it always just slightly bigger
> than would fit in work_mem?

I didn't check.

> Did you try sorting already-sorted, reverse
> sorted, or pipe-organ shaped data sets?

Not yet, but I can. What variety of pipe-organ shape is of
interest (I can think of several that might be called that)?

> We will also need to test it on
> strings. I usually use md5(random()::text) to generate strings for such
> purposes, at least for a first pass.

OK.

>
> The name of the attached patch is a bit unfortunate.

Is there a project standard for this?

> And I'm not sure what
> you are doing with tupindex, the change there doesn't seem to be necessary
> to the rest of your work, so some explanation on that would be helpful.

I'll add commentary.

> The project coding style usually has comments in blocks before loops on
> which they comment, rather than interspersed throughout the clauses of the
> "for" statement.

I'll adjust that.

> Another optimization that is possible is to do only one comparison at each
> level (between the two children) all the way down the heap, and then
> compare the parent to the chain of smallest children in reverse order.
> This can substantially reduce the number of comparisons on average,
> because most tuples sift most of the way down the heap (because most of the
> tuples are at the bottom of the heap).

Sounds interesting; I'll see if I can get that going.

> (Also, I think you should make your siftdown code look more like the
> existing siftdown code.)

A function called by the outer loop? Can do; the existing does that
because the function is called from multiple places but this will only
be used the once.

Thanks for the review.
--
Jeremy


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-06 22:30:01
Message-ID: 52F40CE9.1070509@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/02/14 14:22, Robert Haas wrote:
> On Thu, Feb 6, 2014 at 4:22 AM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> On 05/02/14 21:56, Robert Haas wrote:
>>> On Tue, Feb 4, 2014 at 5:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>>> The attached patch replaces the existing siftup method for heapify with
>>>> a siftdown method. Tested with random integers it does 18% fewer
>>>> compares and takes 10% less time for the heapify, over the work_mem
>>>> range 1024 to 1048576.
>>>>
>>>> Both algorithms appear to be O(n) (contradicting Wikipedia's claim
>>>> that a siftup heapify is O(n log n)).
>>>
>>>
>>> I think Wikipedia is right. Inserting an a tuple into a heap is O(lg
>>> n); doing that n times is therefore O(n lg n). Your patch likewise
>>> implements an outer loop which runs O(n) times and an inner loop which
>>> runs at most O(lg n) times, so a naive analysis of that algorithm also
>>> comes out to O(n lg n).
>>
>> The patch implements a siftdown. Measurements attached.
>
> Uh, this spreadsheet is useless. You haven't explained how these
> numbers were generated, or what the columns in the spreadsheet
> actually mean. Ideally, you should include enough information that
> someone else can try to reproduce your results.
>

I'm sorry I wasn't clear.

Test code was groups of the form:

set work_mem to 1024;
CREATE TABLE jgh (i integer);
INSERT INTO jgh (i) SELECT 20000 * random() FROM generate_series(1, 20000);
VACUUM ANALYZE jgh;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to off;
explain analyze SELECT * FROM jgh ORDER BY i;
set enable_siftdown_heapify to on;
DROP TABLE jgh;

.. for the tabulated work_mem, and scaled values for the INSERT.

Compares were counted for the heapify stage (only), and timings
taken using pg_rusage_init() before and pg_rusage_show() after
it. Times are in seconds.

Baseline value for compares and time used 858ec11858.
Sift-down compares and time are for the patch.

"Baseline K cmps" is compares divided by work_mem (which should
be a scaled version of compares-per-tuple being heapified) pre-patch.
"Siftdown K cmps" is likewise with the patch.

"K ratio cmps" is the ratio (patched compares)/(unpatched compares).

Repeat for time measurements.

--
Cheers,
Jeremy


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-07 21:28:00
Message-ID: 52F54FE0.8040408@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/02/14 22:12, Jeremy Harris wrote:
>> Did you try sorting already-sorted, reverse
>> sorted, or pipe-organ shaped data sets?

Summary (low numbers better):

Random ints: 83% compares, level on time.
Sorted ints: level compares, 70% time.
Reverse-sorted ints: 10% compares, 15% time (!)
Constant ints: 200% compares, 360% time (ouch, and not O(n))
Pipe-organ ints: 80% compares, 107% time
Random text: 83% compares, 106% time

--
Cheers,
Jeremy

Attachment Content-Type Size
siftdown_performance.ods application/vnd.oasis.opendocument.spreadsheet 24.3 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-09 16:56:47
Message-ID: CA+TgmoZh+fXQWWyjr1NVaGxguHa+9XCx06B6kNTkA4xgf0kLeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 7, 2014 at 4:28 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> On 06/02/14 22:12, Jeremy Harris wrote:
>>> Did you try sorting already-sorted, reverse
>>> sorted, or pipe-organ shaped data sets?
>
> Summary (low numbers better):
>
> Random ints: 83% compares, level on time.
> Sorted ints: level compares, 70% time.
> Reverse-sorted ints: 10% compares, 15% time (!)
> Constant ints: 200% compares, 360% time (ouch, and not O(n))
> Pipe-organ ints: 80% compares, 107% time
> Random text: 83% compares, 106% time

This is kind of what I expected to happen. The problem is that it's
hard to make some cases better without making other cases worse. I
suspect (hope?) there's some simple fix for the constant-int case.
But the last two cases are trickier. It seems intuitively that
reducing comparisons ought to reduce runtime, but if I'm reading the
results, the runtime actually went up even though the number of
comparisons went down. This is hard to account for, but we probably
need to at least understand why that's happening. I feel like this
algorithm ought to be a win, but these data don't provide a compelling
case for change.

I wonder if it would be worth trying this test with text data as well.
Text comparisons are much more expensive than integer comparisons, so
the effect of saving comparisons ought to be more visible there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-09 17:11:30
Message-ID: 52F7B6C2.2000705@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/02/14 18:21, Jeff Janes wrote:
> Did you try sorting already-sorted, reverse
> sorted, or pipe-organ shaped data sets? We will also need to test it on
> strings. I usually use md5(random()::text) to generate strings for such
> purposes, at least for a first pass.

Attached is version 2 of the patch, which fixes the performance on
constant-input.

Summary (low numbers better, vs 858ec11):

Random ints: 83% compares, level on time.
Sorted ints: level compares, 70% time.
Reverse-sorted ints: 10% compares, 15% time (!)
Constant ints: level compares, 75% time
Pipe-organ ints: 80% compares, 107% time
Random text: 83% compares, 106% time

> Another optimization that is possible is to do only one comparison at each
> level (between the two children) all the way down the heap, and then
> compare the parent to the chain of smallest children in reverse order.
> This can substantially reduce the number of comparisons on average,
> because most tuples sift most of the way down the heap (because most of the
> tuples are at the bottom of the heap). Robert submitted a patch to do this
> in the main siftdown routine (which for some reason is
> named tuplesort_heap_siftup, rather than siftdown), and I found it a
> substantial improvement but it never got pushed forward. I think Robert
> was concerned it might make some obscure cases worse rather than better,
> and no one put it through enough serious testing to overcome that concern.

I've done an implementation of this (also attached: siftdown_bounce).
Summary:

Random ints: 72% compares, 104% time.
Sorted ints: 200% compares, 270% time. (ouch)
Reverse-sorted ints: 7% compares, 12% time
Constant ints: 150% compares, 275% time
Pipe-organ ints: 93% compares, 115% time
Random text: 72% compares, 91% time

- I don't like the performance on already-sorted input. Thinking
into this, a sorted array is already a well-formed heap so optimal
behaviour is a single compare per item (lacking the global knowledge).
This "bounce" method will always do a full walk down the tree and
back, hence the poor behaviour. Unless the planner can tell
the sort when sorted input is possible I don't think we dare use
this despite the benefit on random input.

The tests were run with an instrumented variant of each patch, counting
compares and time for just the heapify portion of the inittapes()
routine. Test script (jgh.sql and results spreadsheet) also attached.
--
Cheers,
Jeremy

Attachment Content-Type Size
heapify_siftdown_v2.patch text/x-patch 4.1 KB
siftdown_bounce.patch text/x-patch 4.4 KB
jgh.sql application/sql 21.5 KB
siftdown_performance.ods application/vnd.oasis.opendocument.spreadsheet 32.1 KB

From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-21 00:27:28
Message-ID: 53069D70.9080403@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/02/14 17:11, Jeremy Harris wrote:
> On 06/02/14 18:21, Jeff Janes wrote:
>> Did you try sorting already-sorted, reverse
>> sorted, or pipe-organ shaped data sets? We will also need to test it on
>> strings. I usually use md5(random()::text) to generate strings for such
>> purposes, at least for a first pass.
>
> Attached is version 2 of the patch, which fixes the performance on
> constant-input.

Having beaten on this some more I'm prepared to abandon it.

The wallclock time, for random input, drifts up at larger N
(compared to the existing code) despite the number of comparisons
being consistently less.

Run under cachegrind, it takes about N/10 last-level cache misses,
all for the new item being introduced to the heap. The existing
code takes none at all.

It might be worthwhile for a seriously expensive comparison function;
say, more than 50 clocks. For integers and md5-strings it isn't.
--
Cheers,
Jeremy


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-24 17:38:15
Message-ID: CA+TgmoaC_Lsy=Lituc7qXN4=BLzY+CpWwG85+QOSo_FXR3n0kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> On 09/02/14 17:11, Jeremy Harris wrote:
>> On 06/02/14 18:21, Jeff Janes wrote:
>>> Did you try sorting already-sorted, reverse
>>> sorted, or pipe-organ shaped data sets? We will also need to test it on
>>> strings. I usually use md5(random()::text) to generate strings for such
>>> purposes, at least for a first pass.
>>
>>
>> Attached is version 2 of the patch, which fixes the performance on
>> constant-input.
>
> Having beaten on this some more I'm prepared to abandon it.
>
> The wallclock time, for random input, drifts up at larger N
> (compared to the existing code) despite the number of comparisons
> being consistently less.
>
> Run under cachegrind, it takes about N/10 last-level cache misses,
> all for the new item being introduced to the heap. The existing
> code takes none at all.

Can you explain this further? This seems like an important clue that
could be useful when trying to optimize this code, but I'm a little
unclear which part of the operation has more cache misses with your
changes and why.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-25 19:55:08
Message-ID: 530CF51C.1030208@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24/02/14 17:38, Robert Haas wrote:
> On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> Run under cachegrind, it takes about N/10 last-level cache misses,
>> all for the new item being introduced to the heap. The existing
>> code takes none at all.
>
> Can you explain this further? This seems like an important clue that
> could be useful when trying to optimize this code, but I'm a little
> unclear which part of the operation has more cache misses with your
> changes and why.

In the patched version, for the heapify operation the outer loop
starts at the last heap-parent tuple and works backward to the
start of the tuples array. A copy is taken of the SortTuple being
operated on for the inner loop to use. This copy suffers cache misses.

(The inner loop operates on elements between the copy source and the
end of the array).

In the original, the outer loop runs the array in increasing index
order. Again a copy is taken of the SortTuple for the inner loop
to use. This copy does not appear to take significant cache misses.

(The inner loop operates on elements between the copy source and
the start of the array).

--
Cheers,
Jeremy


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeremy Harris <jgh(at)wizmail(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-26 03:08:21
Message-ID: CA+TgmoYTbwNE1WD2sqtzFJ7yxH-1UBUR-0HcF_2ZpQzeKzfMQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> On 24/02/14 17:38, Robert Haas wrote:
>>
>> On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>>
>>> Run under cachegrind, it takes about N/10 last-level cache misses,
>>> all for the new item being introduced to the heap. The existing
>>> code takes none at all.
>>
>>
>> Can you explain this further? This seems like an important clue that
>> could be useful when trying to optimize this code, but I'm a little
>> unclear which part of the operation has more cache misses with your
>> changes and why.
>
>
> In the patched version, for the heapify operation the outer loop
> starts at the last heap-parent tuple and works backward to the
> start of the tuples array. A copy is taken of the SortTuple being
> operated on for the inner loop to use. This copy suffers cache misses.
>
> (The inner loop operates on elements between the copy source and the
> end of the array).
>
>
> In the original, the outer loop runs the array in increasing index
> order. Again a copy is taken of the SortTuple for the inner loop
> to use. This copy does not appear to take significant cache misses.
>
> (The inner loop operates on elements between the copy source and
> the start of the array).

Do you have any theory as to why this happens in one case but not the other?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeremy Harris <jgh(at)wizmail(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-02-26 19:53:38
Message-ID: 530E4642.1060405@wizmail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26/02/14 03:08, Robert Haas wrote:
> On Tue, Feb 25, 2014 at 2:55 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> On 24/02/14 17:38, Robert Haas wrote:
>>>
>>> On Thu, Feb 20, 2014 at 7:27 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>>>
>>>> Run under cachegrind, it takes about N/10 last-level cache misses,
>>>> all for the new item being introduced to the heap. The existing
>>>> code takes none at all.
>>>
>>>
>>> Can you explain this further? This seems like an important clue that
>>> could be useful when trying to optimize this code, but I'm a little
>>> unclear which part of the operation has more cache misses with your
>>> changes and why.
>>
>>
>> In the patched version, for the heapify operation the outer loop
>> starts at the last heap-parent tuple and works backward to the
>> start of the tuples array. A copy is taken of the SortTuple being
>> operated on for the inner loop to use. This copy suffers cache misses.
>>
>> (The inner loop operates on elements between the copy source and the
>> end of the array).
>>
>>
>> In the original, the outer loop runs the array in increasing index
>> order. Again a copy is taken of the SortTuple for the inner loop
>> to use. This copy does not appear to take significant cache misses.
>>
>> (The inner loop operates on elements between the copy source and
>> the start of the array).
>
> Do you have any theory as to why this happens in one case but not the other?

Nope. Only really wild stuff requiring the cpu memory channel
recognising the forward scan (despite the inner loop) and not
the backward one - and ignoring prefetch instructions, which I
experimented with.
--
Cheers,
Jeremy


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Jeremy Harris <jgh(at)wizmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-04-10 17:03:15
Message-ID: CA+U5nMJdnFmKh8++bRmq4P9jzsfzzQGPebctpsgUk6t02x3Atg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6 February 2014 18:21, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>>
>> The attached patch replaces the existing siftup method for heapify with
>> a siftdown method. Tested with random integers it does 18% fewer
>> compares and takes 10% less time for the heapify, over the work_mem
>> range 1024 to 1048576.
>
>
> Thanks for working on this.

+1

Your patch isn't linked properly from the CF manager though.

If you like patches like this then there's a long(er) list of
optimizations already proposed previously around sorting. It would be
good to have someone work through them for external sorts. I believe
Noah is working on parallel internal sort (as an aside).

There's also an optimization possible for merge joins where we use the
output of the first sort as an additional filter on the second sort.
That can help when we're going to join two disjoint tables.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Jeremy Harris <jgh(at)wizmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-04-16 23:38:53
Message-ID: 20140416233853.GS7443@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 10, 2014 at 06:03:15PM +0100, Simon Riggs wrote:
> On 6 February 2014 18:21, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
> >>
> >> The attached patch replaces the existing siftup method for heapify with
> >> a siftdown method. Tested with random integers it does 18% fewer
> >> compares and takes 10% less time for the heapify, over the work_mem
> >> range 1024 to 1048576.
> >
> >
> > Thanks for working on this.
>
> +1
>
> Your patch isn't linked properly from the CF manager though.
>
> If you like patches like this then there's a long(er) list of
> optimizations already proposed previously around sorting. It would be
> good to have someone work through them for external sorts. I believe
> Noah is working on parallel internal sort (as an aside).
>
> There's also an optimization possible for merge joins where we use the
> output of the first sort as an additional filter on the second sort.
> That can help when we're going to join two disjoint tables.

Where should this be recorded? TODO? Commitfest manager?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Jeremy Harris <jgh(at)wizmail(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minor performance improvement in transition to external sort
Date: 2014-04-17 13:23:45
Message-ID: CA+TgmoadZPOB4ERYWQCH614_Xn=LCMJ2dJkb-Zeiz=qo==TDDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Apr 16, 2014 at 7:38 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Thu, Apr 10, 2014 at 06:03:15PM +0100, Simon Riggs wrote:
>> On 6 February 2014 18:21, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> > On Tue, Feb 4, 2014 at 2:22 PM, Jeremy Harris <jgh(at)wizmail(dot)org> wrote:
>> >>
>> >> The attached patch replaces the existing siftup method for heapify with
>> >> a siftdown method. Tested with random integers it does 18% fewer
>> >> compares and takes 10% less time for the heapify, over the work_mem
>> >> range 1024 to 1048576.
>> >
>> >
>> > Thanks for working on this.
>>
>> +1
>>
>> Your patch isn't linked properly from the CF manager though.
>>
>> If you like patches like this then there's a long(er) list of
>> optimizations already proposed previously around sorting. It would be
>> good to have someone work through them for external sorts. I believe
>> Noah is working on parallel internal sort (as an aside).
>>
>> There's also an optimization possible for merge joins where we use the
>> output of the first sort as an additional filter on the second sort.
>> That can help when we're going to join two disjoint tables.
>
> Where should this be recorded? TODO? Commitfest manager?

IIUC, the original patch was withdrawn; any remaining action items
should probably go to TODO. I'm not sure which specific idea you're
referring to, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company