Re: Memory usage during sorting

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory usage during sorting
Date: 2012-03-20 01:57:10
Message-ID: 6206.1332208630@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Mon, Mar 19, 2012 at 7:23 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> There's no real reason why the tuples destined for the next run need
>> to be maintained in heap order; we could just store them unordered and
>> heapify the whole lot of them when it's time to start the next run.

> This sounded familiar....
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=cf627ab41ab9f6038a29ddd04dd0ff0ccdca714e

Yeah, see also the pgsql-hackers thread starting here:
http://archives.postgresql.org/pgsql-hackers/1999-10/msg00384.php

That was a long time ago, of course, but I have some vague recollection
that keeping next-run tuples in the current heap achieves a net savings
in the total number of comparisons needed to heapify both runs.
Robert's point about integer comparisons being faster than data
comparisons may or may not be relevant. Certainly they are faster, but
there are never very many run numbers in the heap at once (possibly no
more than 2, I forget; and in any case often only 1). So I'd expect
most tuple comparisons to end up having to do a data comparison anyway.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joachim Wieland 2012-03-20 03:04:25 Re: patch for parallel pg_dump
Previous Message Tom Lane 2012-03-20 01:39:56 Re: Command Triggers, patch v11