Re: Memory usage during sorting

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-03-20 17:04:29
Message-ID: CA+TgmoZQPff2iSoYUSuzyi_ZzHE83Sbft1FvUHm_ov71ga6DNQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 20, 2012 at 12:33 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Mar 20, 2012 at 7:44 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>>> Offhand I wonder if this is all because we don't have the O(n) heapify
>>> implemented.
>
>> I'm pretty sure that's not the problem.  Even though our heapify is
>> not as efficient as it could be, it's plenty fast enough.  I thought
>> about writing a patch to implement the better algorithm, but it seems
>> like a distraction at this point because the heapify step is such a
>> small contributor to overall sort time.  What's taking all the time is
>> the repeated siftup operations as we pop things out of the heap.
>
> Right, but wouldn't getting rid of the run-number comparisons provide
> some marginal improvement in the speed of tuplesort_heap_siftup?

No. It does the opposite: it slows it down. This is a highly
surprising result, but it's quite repeatable: removing comparisons
makes it slower. As previously pontificated, I think this is probably
because the heap can fill up with next-run tuples that are cheap to
compare against: a comparison between a current-run tuple and a
next-run tuple is settled by the run numbers alone, which spares us
having to do "real" comparisons involving the actual datatype
comparators.

> BTW, there's a link at the bottom of the wikipedia page to a very
> interesting ACM Queue article, which argues that the binary-tree
> data structure isn't terribly well suited to virtual memory because
> it touches random locations in succession.  I'm not sure I believe
> his particular solution, but I'm wondering about B+ trees, i.e., more
> than 2 children per node.

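I don't think virtual memory locality is the problem. I read
somewhere that a ternary heap is supposed to be about one-eighth
faster than a binary heap, but that's because of comparison counts,
not locality: in a binary heap, picking the smallest of three tuples
(the tuple being sifted plus two children) takes two comparisons per
level, whereas in a ternary heap picking the smallest of four tuples
takes three comparisons per level. That works out better because the
ternary heap has only log3(n) levels instead of log2(n).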

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
