Re: A worst case for qsort

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 18:33:33
Message-ID: CA+Tgmoacp2dz7J6PH83qs3KY7FYtmHC4Mb6XAQ2=tYbMf1VogA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> So here. You may not agree that the mitigation strategies for which
>> others are asking for are worthwhile, but you can't expect everyone
>> else to agree with your assessment of which cases are likely to occur
>> in practice. The case of a cohort of strings to be sorted which share
>> a long fixed prefix and have different stuff at the end does not seem
>> particularly pathological to me. It doesn't, in other words, require
>> an adversary: some real-world data sets will look like that. I will
>> forebear repeating examples I've given before, but I've seen that kind
>> of thing more than once in real data sets that people (well, me,
>> anyway) actually wanted to put into a PostgreSQL database. So I'm
>> personally of the opinion that the time you've put into trying to
>> protect against those cases is worthwhile. I understand that you may
>> disagree with that, and that's fine: we're not all going to agree on
>> everything.
>
> I actually agree with you here.

/me pops cork.

> Sorting text is important, so we
> should spend a lot of time avoiding regressions. Your example is
> reasonable - I believe that people do that to a non-trivial degree.
> The fact is, I probably can greatly ameliorate that case. However, to
> give an example of a case I have less sympathy for, I'll name the case
> you mention, *plus* the strings are already in logical order, and so
> our qsort() can (dubiously) take advantage of that, so that there is a
> 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
> There might be other cases like that that crop up.

I think that's actually not a very unrealistic case at all. In
general, I think that if a particular data distribution is a
reasonable scenario, that data distribution plus "it's already sorted"
is also reasonable. Data gets presorted in all kinds of ways:
sometimes it gets loaded in sorted (or nearly-sorted) order from
another data source; sometimes we do an index scan that produces data
in order by column A and then later sort by A, B; sometimes somebody
clusters the table. Respecting the right of other people to have
different opinions, I don't think I'll ever be prepared to concede the
presorted case as either uncommon or unimportant.

That's not to prejudge anything that may or may not be in your patch,
which I have not studied in enormous detail. It's just what I think
about the subject in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-07 18:57:08 Re: Pg_upgrade and toast tables bug discovered
Previous Message Petr Jelinek 2014-08-07 18:23:45 Re: Minmax indexes