Re: Randomisation for ensuring nlogn complexity in quicksort

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Randomisation for ensuring nlogn complexity in quicksort
Date: 2013-07-01 20:31:59
Message-ID: CAGTBQpbWh9FZcv0wOWuKSeFDoCdzieG7d7M_spQTA5JRU6nhbQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 1, 2013 at 5:12 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
>> On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without affecting existing normal data sets that are present in every day transactions. I even believe that those data sets will also benefit from the above optimisation.
>>>
>>> The only method of selecting a pivot for quicksort that obtain O(n lg
>>> n) run time with 100% certainty is have a magical oracle inside the
>>> computer that tells you in fixed time and with perfect accuracy which
>>> pivot you should select.
>>
>> Doesn't a linear median algorithm (say median of medians) get you O(n lg n)?
>>
>> Granted, with a huge constant (I think 4)... but it should still be O(n lg n).
>
> No. Thinking about this a little more, I believe the way it works
> out is that any algorithm for picking the median that guarantees that
> a certain *percentage* of the tuples will be in the smaller partition
> will have O(n lg n) complexity, but any algorithm that only guarantees
> that a fixed *number* of tuples in the smaller partition is still
> quadratic in complexity. In the case of a median algorithm, you're
> only guaranteed to have 2 elements in the smaller partition, which is
> a constant. If you take a median of medians, you're guaranteed to
> have 8 elements in the smaller partition, which is bigger, but still a
> constant.

What?

A median of medians algorithm will guarantee floor(N/2) elements on
the smaller. That's the definition of median.

Note that I'm referring to picking the actual median of all tuples,
not just a sample. That's slow, but it guarantees O(n log n).

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-07-01 20:39:14 Re: Bugfix and new feature for PGXS
Previous Message Peter Eisentraut 2013-07-01 20:30:40 Re: plpython implementation