Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search for
  Advanced Search

Re: [PERFORM] A Better External Sort?


  • From: Ron Peacetree <rjpeace(at)earthlink(dot)net>
  • To: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
  • Subject: Re: [PERFORM] A Better External Sort?
  • Date: Thu, 29 Sep 2005 22:57:19 -0400 (EDT)
  • Message-id: <6812451(dot)1128049039096(dot)JavaMail(dot)root(at)elwamui-polski(dot)atl(dot)sa(dot)earthlink(dot)net>

>From: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>
>Sent: Sep 29, 2005 9:28 AM
>Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
>>In my original example, a sequential scan of the 1TB of 2KB 
>>or 4KB records, => 250M or 500M records of data, being sorted 
>>on a binary value key will take ~1000x more time than reading 
>>in the ~1GB Btree I described that used a Key+RID (plus node 
>>pointers) representation of the data.
>
>Imho you seem to ignore the final step your algorithm needs of
>collecting the data rows. After you sorted the keys the collect
>step will effectively access the tuples in random order (given a 
>sufficiently large key range).
>
"Collecting" the data rows can be done for each RAM buffer full of
of data in one pass through RAM after we've built the Btree.  Then
if desired those data rows can be read out to HD in sorted order
in essentially one streaming burst.  This combination of index build
+ RAM buffer rearrangement + write results to HD can be repeat
as often as needed until we end up with an overall Btree index and
a set of sorted sublists on HD.  Overall HD IO for the process is only
two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be
done at full HD streaming speed and whatever we've decided to
save to HD can be reused later if we desire.

Hope this helps,
Ron



Home | Main Index | Thread Index

Privacy Policy | PostgreSQL Archives hosted by Command Prompt, Inc. | Designed by tinysofa
Copyright © 1996 – 2008 PostgreSQL Global Development Group