Re: Randomisation for ensuring nlogn complexity in quicksort

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Randomisation for ensuring nlogn complexity in quicksort
Date: 2013-07-02 16:02:01
Message-ID: CAGTBQpbEX=0TJdQ-P5bdE2rTRa=jPQ+vE3Cgs2SMmsb2XJq3ug@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 2, 2013 at 12:36 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Tue, Jul 2, 2013 at 5:04 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:
>>> I think if you'll try it you'll find that we perform quite well on
>>> data sets of this kind - and if you read the code you'll see why.
>>
>> Right, let me read the code again from that viewpoint.
>
> In my opinion, it would be worthwhile reading the original Bentley and
> McIlroy paper [1] and using what you learned to write a patch that
> adds comments throughout the canonical qsort_arg, and perhaps the
> other variants.
>
> [1] http://www.enseignement.polytechnique.fr/informatique/profs/Luc.Maranget/421/09/bentley93engineering.pdf

That's weird, it doesn't seem as sophisticated as even libc's introsort.

Perhaps an introsort[1] approach wouldn't hurt: do the quick and dirty
median selection pg is already doing (or a better one if a better one
is found), but check recursion depth/input size ratios.

When the input set hasn't been halved in size across K recursive
calls, switch to median of medians to guard against quadratic
complexity.
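
A rough sketch of that idea in C, to make the proposal concrete. This is
not PostgreSQL's qsort_arg and not libc code: it sorts plain int arrays,
uses median-of-three as the cheap pivot, and falls back to a
median-of-medians pivot once a path has gone K partitioning steps without
the range halving. The names introsort_sketch and K_BUDGET, the value
K = 4, and the small-range cutoff of 16 are all made up for illustration.
The three-way partition is an added assumption so that runs of equal keys
cannot defeat the guard.

```c
#include <assert.h>
#include <stddef.h>

#define K_BUDGET 4          /* hypothetical K: cheap-pivot steps allowed per halving */
#define SMALL_CUTOFF 16     /* hypothetical cutoff below which insertion sort is used */

static int select_kth(int *a, size_t n, size_t k);

static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

static void insertion_sort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int v = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > v) { a[j] = a[j - 1]; j--; }
        a[j] = v;
    }
}

/* Cheap pivot: index of the median of the first, middle and last element. */
static size_t median3(const int *a, size_t n)
{
    size_t lo = 0, mid = n / 2, hi = n - 1;
    if (a[lo] < a[mid]) {
        if (a[mid] < a[hi]) return mid;
        return a[lo] < a[hi] ? hi : lo;
    }
    if (a[lo] < a[hi]) return lo;
    return a[mid] < a[hi] ? hi : mid;
}

/* Afterwards: a[0..lt) < pv, a[lt..gt) == pv, a[gt..n) > pv. */
static void partition3(int *a, size_t n, int pv, size_t *lt, size_t *gt)
{
    size_t lo = 0, i = 0, hi = n;
    while (i < hi) {
        if (a[i] < pv) swap_int(&a[lo++], &a[i++]);
        else if (a[i] > pv) swap_int(&a[i], &a[--hi]);
        else i++;
    }
    *lt = lo;
    *gt = hi;
}

/* Median of medians: collect the median of each group of 5 at the front
 * of the array, then return the median of those medians. */
static int mom_pivot(int *a, size_t n)
{
    size_t m = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i += 5) {
        size_t len = (n - i < 5) ? n - i : 5;
        insertion_sort(a + i, len);
        swap_int(&a[m++], &a[i + len / 2]);
    }
    return select_kth(a, m, m / 2);
}

/* k-th smallest value (0-based) of a[0..n); permutes the array. */
static int select_kth(int *a, size_t n, size_t k)
{
    while (n > 5) {
        size_t lt, gt;
        int pv = mom_pivot(a, n);
        partition3(a, n, pv, &lt, &gt);
        if (k < lt) n = lt;
        else if (k >= gt) { a += gt; k -= gt; n -= gt; }
        else return pv;
    }
    insertion_sort(a, n);
    return a[k];
}

/* checkpoint: range size when the budget was last reset.
 * budget: cheap-pivot steps left before the guarded pivot takes over. */
static void sort_rec(int *a, size_t n, size_t checkpoint, int budget)
{
    while (n > SMALL_CUTOFF) {
        size_t lt, gt;
        int pv;
        if (n <= checkpoint / 2) {      /* halved since the checkpoint: reset */
            checkpoint = n;
            budget = K_BUDGET;
        }
        if (budget > 0) { pv = a[median3(a, n)]; budget--; }
        else pv = mom_pivot(a, n);      /* no progress in K steps: guarded pivot */
        partition3(a, n, pv, &lt, &gt);
        /* Recurse into the smaller side, keep looping on the larger one. */
        if (lt < n - gt) { sort_rec(a, lt, checkpoint, budget); a += gt; n -= gt; }
        else { sort_rec(a + gt, n - gt, checkpoint, budget); n = lt; }
    }
    insertion_sort(a, n);
}

void introsort_sketch(int *a, size_t n)
{
    sort_rec(a, n, n, K_BUDGET);
}
```

The median-of-medians pivot is only paid for on paths where the cheap
pivot has stopped making progress, so well-behaved inputs should see
little more than a counter decrement and a size comparison per step.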

[1] http://en.wikipedia.org/wiki/Selection_algorithm#Introselect
