Skip site navigation (1) Skip section navigation (2)

Peripheral Links

Header And Logo

PostgreSQL
| The world's most advanced open source database.

Site Navigation

Search for
  Advanced Search

Re: [PERFORM] A Better External Sort?


  • From: Ron Peacetree <rjpeace(at)earthlink(dot)net>
  • To: "Jeffrey W. Baker" <jwbaker(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
  • Subject: Re: [PERFORM] A Better External Sort?
  • Date: Wed, 28 Sep 2005 12:03:34 -0400 (EDT)
  • Message-id: <21402654(dot)1127923414088(dot)JavaMail(dot)root(at)elwamui-polski(dot)atl(dot)sa(dot)earthlink(dot)net>

>From: "Jeffrey W. Baker" <jwbaker(at)acm(dot)org>
>Sent: Sep 27, 2005 1:26 PM
>To: Ron Peacetree <rjpeace(at)earthlink(dot)net>
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
>On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:
>
>>That Btree can be used to generate a physical reordering of the data
>>in one pass, but that's the weakest use for it.  The more powerful
>>uses involve allowing the Btree to persist and using it for more
>>efficient re-searches or combining it with other such Btrees (either as
>>a step in task distribution across multiple CPUs or as a more efficient
>>way to do things like joins by manipulating these Btrees rather than
>>the actual records.)
>
>Maybe you could describe some concrete use cases.  I can see what
>you are getting at, and I can imagine some advantageous uses, but
>I'd like to know what you are thinking.
>
1= In a 4P box, we split the data in RAM into 4 regions and create
a CPU cache friendly Btree using the method I described for each CPU.
The 4 Btrees can be merged in a more time and space efficient manner
than the original records to form a Btree that represents the sorted order
of the entire data set.  Any of these Btrees can be allowed to persist to
lower the cost of doing similar operations in the future (Updating the
Btrees during inserts and deletes is cheaper than updating the original
data files and then redoing the same sort from scratch in the future.)
Both the original sort and future such sorts are made more efficient
than current methods.

2= We use my method to sort two different tables.  We now have these
very efficient representations of a specific ordering on these tables.  A
join operation can now be done using these Btrees rather than the
original data tables that involves less overhead than many current
methods.

3=  We have multiple such Btrees for the same data set representing
sorts done using different fields (and therefore different Keys).
Calculating a sorted order for the data based on a composition of
those Keys is now cheaper than doing the sort based on the composite
Key from scratch.  When some of the Btrees exist and some of them
do not, there is a tradeoff calculation to be made.  Sometimes it will be
cheaper to do the sort from scratch using the composite Key.   


>Specifically I'd like to see some cases where this would beat sequential
>scan.  I'm thinking that in your example of a terabyte table with a
>column having only two values, all the queries I can think of would be
>better served with a sequential scan.
>
In my original example, a sequential scan of the 1TB of 2KB or 4KB
records, => 250M or 500M records of data, being sorted on a binary
value key will take ~1000x more time than reading in the ~1GB Btree
I described that used a Key+RID (plus node pointers) representation
of the data.
 
Just to clarify the point further,
1TB of 1B records => 2^40 records of at most 256 distinct values.
1TB of 2B records => 2^39 records of at most 2^16 distinct values.
1TB of 4B records => 2^38 records of at most 2^32 distinct values.
1TB of 5B records => 200B records of at most 200B distinct values.


Home | Main Index | Thread Index

Privacy Policy | PostgreSQL Archives hosted by Command Prompt, Inc. | Designed by tinysofa
Copyright © 1996 – 2008 PostgreSQL Global Development Group