Re: Releasing memory during External sorting?

From: "Dann Corbit" <DCorbit(at)connx(dot)com>
To: "Meir Maor" <meirmaor(at)gmail(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Releasing memory during External sorting?
Date: 2005-09-24 06:33:35
Message-ID: D425483C2C5C9F49B5B7A41F8944154757D106@postal.corporate.connx.com
Lists: pgsql-hackers

Generally, when you read from a set of subfiles, the OS will cache the
reads to some degree, so the disk-seek jitter is not always that bad. On
a highly fragmented disk drive, you might also jump all over the place
while reading serially from a single subfile. Of course, every situation
is different. At any rate, I would recommend benchmarking many different
approaches. How much memory is available for sorts and reads also
matters a great deal, but with memory becoming larger and cheaper, that
is less and less of a worry.

________________________________

From: pgsql-hackers-owner(at)postgresql(dot)org
[mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Meir Maor
Sent: Friday, September 23, 2005 10:24 PM
To: Simon Riggs
Cc: pgsql-hackers(at)postgresql(dot)org; pgsql-performance(at)postgresql(dot)org
Subject: Re: [HACKERS] Releasing memory during External sorting?

Calculating the optimal memory for a disk-based sort is based only on
minimizing I/O.

A previous post stated that we can merge as many subfiles as we want in
a single pass. This is not accurate: we also want to eliminate disk
seeks in the merge phase, so the merging should be done by reading
blocks of data from each subfile. If we have data of size N and memory
M, then after sorting each run we have K = N/M subfiles to merge.

In the merge operation, if we want to merge all subfiles in one pass, we
read M/K of data from each subfile into memory and begin merging,
reading another M/K block whenever a subfile's buffer empties. We would
like disk seek time to be negligible compared to sequential I/O time.

Notice that we are performing I/O in blocks of M/K = N/K^2 = M^2/N. Let
us assume that sequential I/O runs at 100 MB/s and that a random seek
takes ~15 ms. If we want seek time to be smaller than transfer time by
one order of magnitude, then in the time of one random seek we can read
1.5 MB of data, and we get optimal performance if we perform I/O in
blocks of 15 MB. Since the merge algorithm above performs I/O in blocks
of M/K, we would like M > K * 15 MB, which results in a very large
memory requirement:

  M^2 > N * 15 MB
  M > sqrt(N * 15 MB)

For example, to sort 10 GB of data we would like
M > sqrt(10 GB * 15 MB) ~= 390 MB for optimal performance.
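
For concreteness, here is a small standalone C sketch of that
calculation; the 100 MB/s, 15 ms and 10x figures are just the
assumptions above, not measured values:

/* Sketch of the one-pass merge memory requirement derived above.
 * Assumptions (from the text, not measurements): 100 MB/s sequential
 * I/O, 15 ms per random seek, seek time amortized to 1/10 of transfer
 * time. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double bandwidth = 100.0 * 1024 * 1024;        /* bytes per second */
    double seek      = 0.015;                      /* seconds per random seek */
    double factor    = 10.0;                       /* "one order of magnitude" */
    double n         = 10.0 * 1024 * 1024 * 1024;  /* data to sort, bytes */

    /* Block size whose transfer time is 10x one seek: ~15 MB. */
    double block = factor * seek * bandwidth;

    /* One-pass merge reads blocks of M/K = M^2/N, so require M^2 >= N*block. */
    double m = sqrt(n * block);

    printf("block: %.1f MB, required memory: %.0f MB\n",
           block / (1024 * 1024), m / (1024 * 1024));
    return 0;
}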

Alternatively, we can choose a different algorithm in which we merge
only a constant number of subfiles together at a time, but then we
require multiple passes to merge the entire file: log_l(K) passes over
the entire data when merging l subfiles at a time. This approach
obviously improves as memory increases.

The first approach requires 2 passes over the entire data and K^2 + K
random seeks. The second approach (merging l subfiles at a time)
requires log_l(K) passes over the data and roughly K*l random seeks per
pass, K*l*log_l(K) + K in total.
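
A rough way to compare the two strategies numerically (C again; the
10 GB / 128 MB / 8-way figures are made-up example values, and the seek
counts just apply the formulas above):

/* Compare the single-pass merge against an l-way multi-pass merge. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double n = 10.0 * 1024;   /* data size, MB (example value) */
    double m = 128.0;         /* sort memory, MB (example value) */
    double l = 8.0;           /* fan-in of the multi-pass merge */
    double k = n / m;         /* number of initial sorted runs */

    /* Single-pass merge: run creation + one merge pass, K^2 + K seeks. */
    double one_pass_seeks = k * k + k;

    /* l-way merge: log_l(K) merge passes, ~K*l seeks per pass, plus the
     * K seeks from run creation. */
    double merge_passes = ceil(log(k) / log(l));
    double multi_pass_seeks = k * l * merge_passes + k;

    printf("single pass: 2 passes, %.0f seeks\n", one_pass_seeks);
    printf("%.0f-way merge: %.0f merge passes, %.0f seeks\n",
           l, merge_passes, multi_pass_seeks);
    return 0;
}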

On 9/23/05, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

I have concerns about whether we are overallocating memory for use in
external sorts. (All code relating to this is in tuplesort.c)

When we begin a sort we allocate (work_mem | maintenance_work_mem) and
attempt to do the sort in memory. If the sort set is too big to fit in
memory we then write to disk and begin an external sort. The same memory
allocation is used for both types of sort, AFAICS.

The external sort algorithm benefits from some memory but not much.
Knuth says that the amount of memory required is very low, with a value
typically less than 1 kB. I/O overheads mean that there is benefit from
having longer sequential writes, so the optimum is much larger than
that. I've not seen any data that indicates that a setting higher than
16 MB adds any value at all to a large external sort. I have some
indications from private tests that very high memory settings may
actually hinder performance of the sorts, though I cannot explain that
and wonder whether it is the performance tests themselves that have
issues.
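
To make the "not much memory" point concrete, here is a toy K-way merge
in C: the merge state itself is one current element plus a cursor per
run, and anything beyond that (and the I/O buffers) is what extra
work_mem buys. This is only an illustration, not tuplesort.c, and it
uses a linear scan where a real implementation would use a heap.

/* Toy K-way merge of K sorted runs held in memory as stand-ins for
 * on-disk runs.  The per-run state is just a cursor. */
#include <stdio.h>

#define K 4

int main(void)
{
    int runs[K][4] = {
        { 1,  5,  9, 13 },
        { 2,  6, 10, 14 },
        { 3,  7, 11, 15 },
        { 4,  8, 12, 16 },
    };
    int len[K] = { 4, 4, 4, 4 };
    int pos[K] = { 0 };          /* per-run cursor: all the merge state */

    for (;;)
    {
        int best = -1;

        /* Pick the run whose current element is smallest. */
        for (int i = 0; i < K; i++)
            if (pos[i] < len[i] &&
                (best < 0 || runs[i][pos[i]] < runs[best][pos[best]]))
                best = i;

        if (best < 0)
            break;               /* all runs exhausted */

        printf("%d ", runs[best][pos[best]]);
        pos[best]++;
    }
    printf("\n");
    return 0;
}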

Does anyone have any clear data that shows the value of large settings
of work_mem when the data to be sorted is much larger than memory? (I am
well aware of the value of setting work_mem higher for smaller sorts, so
any performance data needs to reflect only very large sorts.)

If not, I would propose that when we move from qsort to tapesort mode we
free the larger work_mem setting (if one exists) and allocate only a
lower, though still optimal setting for the tapesort. That way the
memory can be freed for use by other users or the OS while the tapesort
proceeds (which is usually quite a while...).
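
In outline, the proposal could look something like the following
hypothetical C sketch. The names, the struct, and the 16 MB figure are
placeholders, not PostgreSQL's actual tuplesort.c code:

/* Hypothetical sketch only: release the large in-memory sort buffer
 * when spilling to tape, keeping a smaller allocation sized for merge
 * buffering. */
#include <stdlib.h>

#define EXTERNAL_SORT_MEM (16 * 1024 * 1024)  /* placeholder "still optimal" size */

struct sort_state
{
    void   *buffer;              /* the work_mem-sized allocation */
    size_t  buffer_size;
};

void switch_to_tapesort(struct sort_state *state)
{
    /* ... dump the tuples currently in memory as the first run ... */

    /* Return the large allocation to the allocator/OS and continue the
     * external sort with a modest merge buffer. */
    free(state->buffer);
    state->buffer = malloc(EXTERNAL_SORT_MEM);
    state->buffer_size = EXTERNAL_SORT_MEM;
}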

Feedback, please.

Best Regards, Simon Riggs

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
