A worst case for qsort

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: A worst case for qsort
Date: 2014-08-06 00:15:57
Message-ID: CAM3SWZS3obANDaPqDy4rPSA962O7Tb1x-HDqE+AMkD8FwaFPrQ@mail.gmail.com
Lists: pgsql-hackers

There was some informal discussion of worst case performance of my
text sort support patch at the most recent developer's meeting in
Ottawa. There is no need to repeat the details here, which aren't
terribly interesting - I made a point about putting worst case
performance in context. I also made an argument around quicksort,
which despite being a widely used sorting algorithm is known to go
quadratic in the worst case, a really bad outcome. I've heard stories
about GUIs that hit the worst case and froze when a user clicked on a
column header to sort its contents. This kind of thing occurred until
surprisingly recently, and was one reason why Postgres eventually
adopted its own qsort() rather than using the OS-supplied
implementation. A lot of the measures proposed by Sedgewick in
particular made these outcomes occur very infrequently in practice.

Our quicksort routine is almost a verbatim copy of the implementation
that appeared in NetBSD, which is itself almost a verbatim copy of the
implementation that appears in the 1993 paper by J. L. Bentley and M.
D. McIlroy, ‘Engineering a sort function’. McIlroy
(with some input from Bentley) subsequently described a way of making
many high quality implementations perform very badly, including his
own:

http://www.cs.dartmouth.edu/~doug/mdmspe.pdf

I don't want to belabor the point here. I do not mean to imply that
everything I've proposed to date wrt sort support for text is
self-evidently reasonable, in the same way that our use of Bentley and
McIlroy's qsort() seems reasonable on balance. I expect some
interesting discussion there.

Anyway, the paper says:

"The adversarial method works for almost any polymorphic program
recognizable as quicksort. The subject quicksort may copy values at
will, or work with lists rather than arrays. It may even pick the
pivot at random. The quicksort will be vulnerable provided only that
it satisfies some mild assumptions that are met by every
implementation I have seen".
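
To give a flavor of how that works: the adversary is just a comparison
function that lies consistently. Below is a rough reconstruction of the
technique from the paper - from memory, so the details may differ from
McIlroy's actual code. Every item starts out as an indeterminate "gas"
value; the comparator only "freezes" an item to a concrete value when it
is forced to decide, and it always freezes things so that the current
pivot candidate ends up comparing low against the items not yet seen.
Afterwards, val[] holds concrete values that tend to drive the same
qsort quadratic even with an honest comparator:

#include <stdlib.h>

static int *val;        /* val[i]: frozen value of item i, or gas */
static int  gas;        /* sentinel meaning "not yet frozen" */
static int  nsolid;     /* number of frozen items so far */
static int  candidate;  /* current pivot candidate */
static long ncmp;       /* comparisons performed */

static int
adversarial_cmp(const void *px, const void *py)
{
    int     x = *(const int *) px;
    int     y = *(const int *) py;

    ncmp++;
    if (val[x] == gas && val[y] == gas)
    {
        /* forced to decide: freeze the pivot candidate as a small value */
        if (x == candidate)
            val[x] = nsolid++;
        else
            val[y] = nsolid++;
    }
    if (val[x] == gas)
        candidate = x;
    else if (val[y] == gas)
        candidate = y;
    return val[x] - val[y];
}

static long
build_adversarial_input(int n, int *items, int *values)
{
    val = values;
    gas = n - 1;
    nsolid = candidate = 0;
    ncmp = 0;
    for (int i = 0; i < n; i++)
    {
        items[i] = i;           /* sort item indexes, not values */
        val[i] = gas;
    }
    qsort(items, n, sizeof(int), adversarial_cmp);
    return ncmp;                /* comparison count observed by the adversary */
}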

IMHO, while worst case performance is a very useful tool for analyzing
algorithms (particularly their worst case time complexity), a worst
case should be put in its practical context. For example, if we had
reason to be concerned about *adversarial* inputs, I think that there
is a good chance that our qsort() actually would be problematic to the
point of driving us to prefer some generally slower alternative.

--
Peter Geoghegan


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-06 06:49:18
Message-ID: alpine.DEB.2.10.1408060838090.24380@sto
Lists: pgsql-hackers


> For example, if we had reason to be concerned about *adversarial*
> inputs, I think that there is a good chance that our qsort() actually
> would be problematic to the point of driving us to prefer some generally
> slower alternative.

That is an interesting point.

Indeed, a database in general often stores user-supplied data, which may
happen to be sorted for presentation purposes in an interface. The same
thing occurred with hashtable algorithms and was/is a way to do DoS attacks
on web applications. I'm not sure whether the qsort version discussed here
would apply to user-supplied data, though. If so, adding some randomness
in the decision process would suffice to counter the adversarial input
argument you raised.

--
Fabien.


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-06 06:57:07
Message-ID: CAM3SWZS3AMdRL+8B0DVrgJCFTVQWxrw0qmz00432vdTCgnAiOA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 5, 2014 at 11:49 PM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> If so, adding some randomness in the decision process would suffice to
> counter the adversarial input argument you raised.

This is specifically addressed by the paper. Indeed, randomly choosing
a pivot is a common strategy. It won't fix the problem.

--
Peter Geoghegan


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-06 13:56:24
Message-ID: alpine.DEB.2.10.1408061548010.28413@sto
Lists: pgsql-hackers


>> If so, adding some randomness in the decision process would suffice to
>> counter the adversarial input argument you raised.
>
> This is specifically addressed by the paper. Indeed, randomly choosing
> a pivot is a common strategy. It won't fix the problem.

Too bad. I must admit that I do not see how to build a test case which
would trigger worst-case behavior against a qsort which chooses the
pivot randomly, but I have not read the paper, and possibly there is an
element of context which is eluding me.

--
Fabien.


From: John Cochran <j69cochran(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A worst case for qsort
Date: 2014-08-06 16:18:52
Message-ID: CAGQU3n7ZPPLN9C2pcXrp9UoOOFMm-Y4g0jFAVufCHEX8-L3-Zg@mail.gmail.com
Lists: pgsql-hackers

I just browsed the paper linked by Peter and it looks like the attack has
to be active against a currently executing qsort. In the paper, what
happens is the comparison function is supplied by the attacker and
effectively lies about the result of a comparison. It keeps the lies
consistent in a very specific manner so that eventually qsort returns its
input in a properly sorted fashion.

So it seems to me that the vulnerability only exists if an attacker-supplied
comparison function is permitted. For all other cases, assuming that only
vetted comparison functions are permitted, the selection of a random pivot
would make an attack on qsort using specially tailored input data
improbable.


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: John Cochran <j69cochran(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-06 17:52:36
Message-ID: CAM3SWZTujgbu17eGOWSdub=t9uneb6t00QG_C9roJDQ=HqQNSg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 6, 2014 at 9:18 AM, John Cochran <j69cochran(at)gmail(dot)com> wrote:
> So it seems to me that the vulnerability only exists if an attacker-supplied
> comparison function is permitted. For all other cases, assuming that only
> vetted comparison functions are permitted, the selection of a random pivot
> would make an attack on qsort using specially tailored input data
> improbable.

Whether or not random pivot selection would make it more difficult for
a malicious party to generate pre-processed data that will cause very
bad performance is not all that relevant IMV, since we don't do that.
There are good practical reasons to prefer median of medians pivot
selection, which is what we actually do, and which is clearly affected
to the extent that pre-processing malicious data that causes (at
least) near worst-case performance is possible. It's possible to
predict pivot selection. As the paper says, "An adversary can make
such a quicksort go quadratic by arranging for the pivot to compare
low against almost all items not seen during pivot selection". Random
pivot selection will certainly result in more frequent lopsided
partitions without any malicious intent.

--
Peter Geoghegan


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: John Cochran <j69cochran(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-06 19:13:52
Message-ID: alpine.DEB.2.10.1408062105240.30492@sto
Lists: pgsql-hackers


> Random pivot selection will certainly result in more frequent lopsided
> partitions without any malicious intent.

Yep. It makes "adversarial input" attacks more or less impossible, at the
price of a higher average cost. Maybe a less randomized version would do,
i.e. randomly selecting one of the "3" medians found, while keeping some
of the nicer performance properties. It would need proof, though.

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 07:17:00
Message-ID: alpine.DEB.2.10.1408070905020.4194@sto
Lists: pgsql-hackers


> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.

ISTM that you raise two distinct questions wrt PostgreSQL, namely whether
worst-case performance is really an issue:
(1) in general
(2) wrt adversarial inputs

The answer could be (1) "mostly no" and (2) "maybe yes".

It suggests that where qsort is used, the administrator wary of (2) could
be allowed to use an alternate implementation, maybe some merge sort, say
by tweaking a configuration option in "postgresql.conf".

--
Fabien.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: John Cochran <j69cochran(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 07:18:16
Message-ID: alpine.DEB.2.10.1408070846420.4194@sto
Lists: pgsql-hackers


Hello John,

> [...]
> In fact, the mentioned paper says this about the subject "Moreover, if
> worst-case performance is important, Quicksort is the wrong algorithm."

I fully agree with this conclusion.

--
Fabien


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 15:07:23
Message-ID: CA+TgmoaFYNL0N3QJK=Tj9nbe3Usg=jYAvrziJZy298R=tC7gFA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> "The adversarial method works for almost any polymorphic program
> recognizable as quicksort. The subject quicksort may copy values at
> will, or work with lists rather than arrays. It may even pick the
> pivot at random. The quicksort will be vulnerable provided only that
> it satisfies some mild assumptions that are met by every
> implementation I have seen".
>
> IMHO, while worst case performance is a very useful tool for analyzing
> algorithms (particularly their worst case time complexity), a worst
> case should be put in its practical context. For example, if we had
> reason to be concerned about *adversarial* inputs, I think that there
> is a good chance that our qsort() actually would be problematic to the
> point of driving us to prefer some generally slower alternative.

I completely agree with this, and I think everyone else does, too.
Where we don't all necessarily agree is which worst cases are
realistic and which worst cases are simply pathological cases with
which we need not be concerned in practice. For example, when I
proposed the patch to use MVCC catalog snapshots, Andres invented a
test case which I thought was far more brutal than anything likely to
be encountered in the real world. But Andres didn't agree; he thought
the test case was realistic. So, I worked on the patch some more and
came up with a solution that performs acceptably even if those brutal
conditions are encountered in the real world. As a result, the patch as
committed is better than the one originally submitted. I could
certainly have spent more time debating whether that effort was
worthwhile, and maybe I would have won the argument, but it was a
better use of that time to instead try to improve the patch.

So here. You may not agree that the mitigation strategies for which
others are asking are worthwhile, but you can't expect everyone
else to agree with your assessment of which cases are likely to occur
in practice. The case of a cohort of strings to be sorted which share
a long fixed prefix and have different stuff at the end does not seem
particularly pathological to me. It doesn't, in other words, require
an adversary: some real-world data sets will look like that. I will
forbear repeating examples I've given before, but I've seen that kind
of thing more than once in real data sets that people (well, me,
anyway) actually wanted to put into a PostgreSQL database. So I'm
personally of the opinion that the time you've put into trying to
protect against those cases is worthwhile. I understand that you may
disagree with that, and that's fine: we're not all going to agree on
everything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: John Cochran <j69cochran(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A worst case for qsort
Date: 2014-08-07 17:00:19
Message-ID: CAGQU3n5YjS-iayG6Ah5_sBV5nhtw+zxptDNrdq8YZgP=7f5fiw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 11:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Tue, Aug 5, 2014 at 8:15 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> > "The adversarial method works for almost any polymorphic program
> > recognizable as quicksort. The subject quicksort may copy values at
> > will, or work with lists rather than arrays. It may even pick the
> > pivot at random. The quicksort will be vulnerable provided only that
> > it satisfies some mild assumptions that are met by every
> > implementation I have seen".
> >
> > IMHO, while worst case performance is a very useful tool for analyzing
> > algorithms (particularly their worst case time complexity), a worst
> > case should be put in its practical context. For example, if we had
> > reason to be concerned about *adversarial* inputs, I think that there
> > is a good chance that our qsort() actually would be problematic to the
> > point of driving us to prefer some generally slower alternative.
>

If one is concerned about adversarial inputs, then as mentioned earlier,
two main steps need to be taken.
1. Don't permit potential adversaries to define the comparison function used
by qsort().
2. Perform random selection of pivot to prevent precomputed lists of data
that invoke quadratic behavior in qsort.

And before arguing that a random pivot will be less efficient than the
median of medians that's currently being used, you need to look at what
that selection actually involves.
1. If a "small" partition is being sorted, the element at n/2 is selected
as the pivot with n being the size of the partition.
2. If a "medium" partition is being sorted, then pivot selected is the
median of the elements at 0, n/2, and n.
3. And finally, if a "large" partition is being sorted, potential pivots
are selected from 0, n/8, n/4, 3n/8, n/2, 5n/8, 3n/4, 7n/8, and n. Of
those 9 candidates, the median of medians is selected as the pivot.
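
In code, that selection logic looks roughly like the following - a
simplified sketch of the Bentley & McIlroy scheme operating on an int
array (the real implementation works on opaque elements through the
comparator, and the exact offsets and thresholds may differ slightly):

#include <stddef.h>

static size_t
med3(const int *a, size_t i, size_t j, size_t k)
{
    return a[i] < a[j]
        ? (a[j] < a[k] ? j : (a[i] < a[k] ? k : i))
        : (a[j] > a[k] ? j : (a[i] > a[k] ? k : i));
}

static size_t
choose_pivot(const int *a, size_t n)
{
    size_t  mid = n / 2;            /* "small": just the middle element */

    if (n > 7)
    {
        size_t  lo = 0;
        size_t  hi = n - 1;

        if (n > 40)
        {
            /* "large": median of the medians of three groups of three */
            size_t  d = n / 8;

            lo = med3(a, lo, lo + d, lo + 2 * d);
            mid = med3(a, mid - d, mid, mid + d);
            hi = med3(a, hi - 2 * d, hi - d, hi);
        }
        /* "medium": median of first, middle and last */
        mid = med3(a, lo, mid, hi);
    }
    return mid;
}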

There is nothing in the world that would prevent the following.
1. Short partition - select a pivot at random.
2. Medium partition - select the median of 3 randomly selected candidates.
3. Large partition - select the median of medians of 9 randomly selected
candidates.

Using the above random pivot selection methods, the performance of a
hardened qsort should be close to that of an unhardened qsort. The only
real difference is the cost of generating random numbers to select the
candidates for the median tests. However, if a malicious compare function
is permitted to be defined, it would still be vulnerable to that attack,
but it would be pretty much immune to a precomputed data attack.

Or perhaps it just might be simpler to detect an attack and fall back to a
sort algorithm that doesn't have quadratic behavior.

What I mean by that is that the qsort in "Engineering a Sort Function"
performs roughly 1.1 n ln n comparisons on typical input. Suppose that,
upon starting, qsort computes an upper bound on the number of comparisons
it's willing to perform - say 3 n ln n or so (1.386 n log n is the
estimate for a randomly selected pivot). Then, if that upper bound is
reached, the qsort function backs out, leaving the array in a partially
sorted order, and calls a slower sort that doesn't exhibit quadratic
behavior to actually finish the sort.
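
To sketch that shape concretely (plain int arrays and made-up names, not
the actual Postgres qsort; the fallback here is a simple merge sort,
though heapsort would do just as well):

#include <math.h>
#include <stdlib.h>
#include <string.h>

static void
swap_ints(int *a, size_t i, size_t j)
{
    int     t = a[i];

    a[i] = a[j];
    a[j] = t;
}

/* Worst-case O(n log n) fallback: a plain merge sort. */
static void
merge_sort(int *a, int *tmp, size_t n)
{
    if (n < 2)
        return;

    size_t  h = n / 2, i = 0, j = h, k = 0;

    merge_sort(a, tmp, h);
    merge_sort(a + h, tmp, n - h);
    while (i < h && j < n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < h)
        tmp[k++] = a[i++];
    while (j < n)
        tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof(int));
}

/* Quicksort that gives up (returns 0) once its comparison budget is spent. */
static int
quicksort_budgeted(int *a, size_t n, long *budget)
{
    while (n > 1)
    {
        if (*budget <= 0)
            return 0;

        swap_ints(a, n / 2, n - 1);     /* middle element as pivot */

        int     pivot = a[n - 1];
        size_t  store = 0;

        for (size_t i = 0; i + 1 < n; i++)
        {
            (*budget)--;
            if (a[i] < pivot)
                swap_ints(a, i, store++);
        }
        swap_ints(a, store, n - 1);     /* pivot into its final position */

        if (!quicksort_budgeted(a, store, budget))
            return 0;                   /* left part: recurse */
        a += store + 1;                 /* right part: iterate */
        n -= store + 1;
    }
    return 1;
}

void
hardened_sort(int *a, size_t n)
{
    /* allow ~3 n ln n comparisons before assuming something has gone wrong */
    long    budget = (long) (3.0 * (double) n * log((double) n + 2.0));

    if (!quicksort_budgeted(a, n, &budget))
    {
        /* bad luck or an adversary: finish with the fallback sort */
        int    *tmp = malloc(n * sizeof(int));  /* error handling omitted */

        merge_sort(a, tmp, n);
        free(tmp);
    }
}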

The result of doing that would be most, if not all, sorts being handled by
qsort in a timely fashion. But if bad luck or an adversary strikes, the
qsort bails out after things look bad and passes the task to a different
sort. I'll be the first to admit that the hybrid approach would be slower
in those bad cases than it would be if the alternate sort was selected at
the beginning, but it would be faster than letting qsort continue with
quadratic behavior.

--

There are two ways of constructing a software design: One way is to make it
so simple that there are obviously no deficiencies and the other way is to
make it so complicated that there are no obvious deficiencies. — C.A.R.
Hoare


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 17:16:15
Message-ID: CAM3SWZRkSUYe9bpY9ZVxCH7ogEwHfR6OCH2E1KEMjFemSciQQw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> So here. You may not agree that the mitigation strategies for which
> others are asking are worthwhile, but you can't expect everyone
> else to agree with your assessment of which cases are likely to occur
> in practice. The case of a cohort of strings to be sorted which share
> a long fixed prefix and have different stuff at the end does not seem
> particularly pathological to me. It doesn't, in other words, require
> an adversary: some real-world data sets will look like that. I will
> forbear repeating examples I've given before, but I've seen that kind
> of thing more than once in real data sets that people (well, me,
> anyway) actually wanted to put into a PostgreSQL database. So I'm
> personally of the opinion that the time you've put into trying to
> protect against those cases is worthwhile. I understand that you may
> disagree with that, and that's fine: we're not all going to agree on
> everything.

I actually agree with you here. Sorting text is important, so we
should spend a lot of time avoiding regressions. Your example is
reasonable - I believe that people do that to a non-trivial degree.
The fact is, I probably can greatly ameliorate that case. However, to
give an example of a case I have less sympathy for, I'll name the case
you mention, *plus* the strings are already in logical order, and so
our qsort() can (dubiously) take advantage of that, so that there is a
1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
There might be other cases like that that crop up.
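
To spell out why a long shared prefix matters here: the approach being
discussed compares a small fixed-size prefix of each string's strxfrm()
blob first, and only pays for the authoritative comparison on a tie, so
strings sharing a sufficiently long common prefix tie every time. A
hand-wavy sketch of that shape (made-up names, not the patch itself):

#include <string.h>

typedef struct
{
    char        abbrev[8];      /* leading bytes of strxfrm(str), zero-padded */
    const char *str;            /* original string */
} sort_key;

static int
cmp_abbreviated(const sort_key *a, const sort_key *b)
{
    int     r = memcmp(a->abbrev, b->abbrev, sizeof(a->abbrev));

    if (r != 0)
        return r;                       /* the cheap comparison resolved it */
    return strcoll(a->str, b->str);     /* tie: authoritative comparison */
}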

I also thought the paper was pretty cool, and thought it might be
interesting because of the fact that it was authored by McIlroy, and
he refers to weaknesses in his and our very own implementation
specifically.
--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 18:33:33
Message-ID: CA+Tgmoacp2dz7J6PH83qs3KY7FYtmHC4Mb6XAQ2=tYbMf1VogA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 1:16 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Aug 7, 2014 at 8:07 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> So here. You may not agree that the mitigation strategies for which
>> others are asking are worthwhile, but you can't expect everyone
>> else to agree with your assessment of which cases are likely to occur
>> in practice. The case of a cohort of strings to be sorted which share
>> a long fixed prefix and have different stuff at the end does not seem
>> particularly pathological to me. It doesn't, in other words, require
>> an adversary: some real-world data sets will look like that. I will
>> forbear repeating examples I've given before, but I've seen that kind
>> of thing more than once in real data sets that people (well, me,
>> anyway) actually wanted to put into a PostgreSQL database. So I'm
>> personally of the opinion that the time you've put into trying to
>> protect against those cases is worthwhile. I understand that you may
>> disagree with that, and that's fine: we're not all going to agree on
>> everything.
>
> I actually agree with you here.

/me pops cork.

> Sorting text is important, so we
> should spend a lot of time avoiding regressions. Your example is
> reasonable - I believe that people do that to a non-trivial degree.
> The fact is, I probably can greatly ameliorate that case. However, to
> give an example of a case I have less sympathy for, I'll name the case
> you mention, *plus* the strings are already in logical order, and so
> our qsort() can (dubiously) take advantage of that, so that there is a
> 1:1 ratio of wasted strxfrm() calls and strcoll() tie-breaker calls.
> There might be other cases like that that crop up.

I think that's actually not a very unrealistic case at all. In
general, I think that if a particular data distribution is a
reasonable scenario, that data distribution plus "it's already sorted"
is also reasonable. Data gets presorted in all kinds of ways:
sometimes it gets loaded in sorted (or nearly-sorted) order from
another data source; sometimes we do an index scan that produces data
in order by column A and then later sort by A, B; sometimes somebody
clusters the table. Respecting the right of other people to have
different opinions, I don't think I'll ever be prepared to concede the
presorted case as either uncommon or unimportant.

That's not to prejudge anything that may or may not be in your patch,
which I have not studied in enormous detail. It's just what I think
about the subject in general.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 19:06:11
Message-ID: CAM3SWZT1=PZRp721edUcNayWBkfeiaq9pFjLk6eZX_9TmvPnWQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 11:33 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that's actually not a very unrealistic case at all. In
> general, I think that if a particular data distribution is a
> reasonable scenario, that data distribution plus "it's already sorted"
> is also reasonable. Data gets presorted in all kinds of ways:
> sometimes it gets loaded in sorted (or nearly-sorted) order from
> another data source; sometimes we do an index scan that produces data
> in order by column A and then later sort by A, B; sometimes somebody
> clusters the table. Respecting the right of other people to have
> different opinions, I don't think I'll ever be prepared to concede the
> presorted case as either uncommon or unimportant.

I think that pre-sorted, all-unique text datums, that have all
differences beyond the first 8 bytes, that the user happens to
actually want to sort are fairly rare. All I'm saying is: if that's a
case that has to lose out more than your more limited case in order to
get a large number of representative cases 3 - 7 times faster, and
marginal cases a good bit faster, that's probably worth it. I am
willing to fix any case that I can - as you say, it's often easier to
write the code than to argue - but let's have a sense of proportion
about these things. Even your quite reasonable case is going to lose
out a bit, because what I've done is fundamentally about deciding at
intervals during copying tuples if it's worth it. If it isn't, then
clearly that whole process went to waste, and we've merely cut our
losses. We could add a GUC to control it as suggested by Simon, or
even put something in attoptions per attribute to indicate that the
optimization should be avoided, but that's something that we've
historically resisted doing.

> That's not to prejudge anything that may or may not be in your patch,
> which I have not studied in enormous detail. It's just what I think
> about the subject in general.

Fair enough. At the risk of sounding trite, I'll add: everything is a
trade-off.

--
Peter Geoghegan


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 21:06:51
Message-ID: CAM3SWZRY5EwLM8oGWhYJ8ypvM5DAz2aOSEquYn+YR6y_ZLsKAg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 12:06 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.

Actually, you could use that case to justify not doing strxfrm()
transformation of very large lists in the more general case, where
strxfrm() blobs aren't truncated/abbreviated, but our qsort() is used,
which goes against the strong recommendation of, just to pick an
example at random, the glibc documentation. It states: "Here is an
example of how you can use strxfrm when you plan to do many
comparisons. It does the same thing as the previous example, but much
faster, because it has to transform each string only once, no matter
how many times it is compared with other strings. Even the time needed
to allocate and free storage is much less than the time we save, when
there are many strings." [1]
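
The pattern the manual is describing looks roughly like this (a sketch
with made-up names, not the manual's own example): transform each string
once with strxfrm(), then sort by comparing the transformed keys, which
costs only a strcmp() per comparison:

#include <stdlib.h>
#include <string.h>

struct item
{
    char   *str;                /* original string */
    char   *key;                /* strxfrm()-transformed key */
};

static int
cmp_keys(const void *a, const void *b)
{
    /* strcmp() on strxfrm() output orders like strcoll() on the originals */
    return strcmp(((const struct item *) a)->key,
                  ((const struct item *) b)->key);
}

void
sort_collated(struct item *items, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        size_t  len = strxfrm(NULL, items[i].str, 0) + 1;

        items[i].key = malloc(len);     /* error handling omitted */
        strxfrm(items[i].key, items[i].str, len);
    }
    qsort(items, n, sizeof(struct item), cmp_keys);
    for (size_t i = 0; i < n; i++)
        free(items[i].key);
}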

Sure, that would still be better than the equivalent abbreviated keys
case, since no strcoll() tie-breakers are required, but it would still
be bad, and all because of our pre-check for sorted input.

[1] https://www.gnu.org/software/libc/manual/html_node/Collation-Functions.html
--
Peter Geoghegan


From: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 21:23:23
Message-ID: CAKddOFDunnmZThbw0DGRvrs3hEyB7TULmKdwr4cd4D=K=+7EOQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> I think that pre-sorted, all-unique text datums, that have all
> differences beyond the first 8 bytes, that the user happens to
> actually want to sort are fairly rare.

While I'm sure it's not common, I've seen a couple of ten-million tuple
tables having a URL column as primary key where 98% of the entries begin
with 'http://www.'

So, that exact scenario is out there.


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 21:33:10
Message-ID: CAM3SWZTTSUUa0SWhDPqi9QR5TUqF5mhBcDsLAUdrXR6_4k9vzQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 2:23 PM, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
> While I'm sure it's not common, I've seen a couple of ten-million tuple
> tables having a URL column as primary key where 98% of the entries begin
> with 'http://www.'
>
> So, that exact scenario is out there.

Sure, that scenario is relatively common. My point was that that
scenario, alongside a perfect logical/physical heap correlation, and
alongside a frequent need to sort, is uncommon. If you're able to get
away with a "bubble sort best case" there, then the sort is going to
be extremely fast relative to any other database system anyway, and
you don't have too much to complain about. It's exactly the same
wasted cost as in the general case (where the tuples aren't in order),
except that it looks more expensive next to your unrealistically fast
sort that you were very lucky to be able to get away with.

I think the special pre-check for already sorted tuples within qsort()
is at best only justified by scenarios like reindexing a serial primary
key index. That's about it.

You can totally alter the outcome of this case by inserting a single
tuple, so the top-level pre-check fails.
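
To be concrete about the pre-check being discussed, it amounts to a
single linear scan over the input before any real sorting happens, along
these lines (a simplified sketch on plain ints, not the actual
implementation, which works on sort tuples through the comparator):

#include <stddef.h>

static int
input_is_presorted(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (a[i - 1] > a[i])
            return 0;           /* out of order: do the real sort */
    }
    return 1;                   /* already sorted: nothing left to do */
}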

--
Peter Geoghegan


From: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 21:34:30
Message-ID: CAKddOFAfkp_3M7k+-==qcGO8FRPX=6QP9FNtsSwCMFM+pZdpiA@mail.gmail.com
Lists: pgsql-hackers

Sigh. Found another example.

A table with 15 million entries and a unique key on filesystem location for
things users created via a web interface.

Entries all begin with /usr/home/ ...

This one is frequently sorted as batch operations against the files are
performed in alphabetical order to reduce conflict issues that a random
ordering may cause between jobs.

regards,

Rod

On Thu, Aug 7, 2014 at 5:23 PM, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:

>
> On Thu, Aug 7, 2014 at 3:06 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>
>> I think that pre-sorted, all-unique text datums, that have all
>> differences beyond the first 8 bytes, that the user happens to
>> actually want to sort are fairly rare.
>
>
> While I'm sure it's not common, I've seen a couple of ten-million tuple
> tables having a URL column as primary key where 98% of the entries begin
> with 'http://www.'
>
> So, that exact scenario is out there.
>


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-07 21:52:17
Message-ID: CAM3SWZR2R-ypWzJ=YpdKPKmchFe4xxgFmCYqPi86L=D9D0=x2g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
> This one is frequently sorted as batch operations against the files are
> performed in alphabetical order to reduce conflict issues that a random
> ordering may cause between jobs.

Sure. There are cases out there. But, again, I have a hard time
imagining why you'd expect those to be pre-sorted in practice, and
particularly why you'd feel justified in expecting that to sort much
faster than equivalent though slightly imperfectly correlated data.
Without that, the fmgr-elision aspect of sort support appears to offer
enough for us to still win on balance [1], assuming 9.4 is our basis
of comparison.

[1] http://www.postgresql.org/message-id/CAM3SWZQHjxiyrsqBs5w3u-vTJ_jT2hp8o02big5wYb4m9Lp1jg@mail.gmail.com
--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-08 12:54:31
Message-ID: CA+TgmobewrKKHG6wPovAmS3EZH+3kvD7Wv2uBWyiyyVbuzhidw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
>> This one is frequently sorted as batch operations against the files are
>> performed in alphabetical order to reduce conflict issues that a random
>> ordering may cause between jobs.
>
> Sure. There are cases out there. But, again, I have a hard time
> imagining why you'd expect those to be pre-sorted in practice, ...

Well, I'm not sure why you're having a hard time imagining it.
Presorted input is a common case in general; that's why we have a
check for it. That check adds overhead in the non-pre-sorted case to
improve the pre-sorted case, and nobody's ever argued for removing it
that I can recall.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-08 13:29:54
Message-ID: 20140808132954.GL16422@tamriel.snowman.net
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> On Thu, Aug 7, 2014 at 5:52 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> > On Thu, Aug 7, 2014 at 2:34 PM, Rod Taylor <rod(dot)taylor(at)gmail(dot)com> wrote:
> >> This one is frequently sorted as batch operations against the files are
> >> performed in alphabetical order to reduce conflict issues that a random
> >> ordering may cause between jobs.
> >
> > Sure. There are cases out there. But, again, I have a hard time
> > imagining why you'd expect those to be pre-sorted in practice, ...
>
> Well, I'm not sure why you're having a hard time imagining it.
> Presorted input is a common case in general; that's why we have a
> check for it. That check adds overhead in the non-pre-sorted case to
> improve the pre-sorted case, and nobody's ever argued for removing it
> that I can recall.

Agreed. This is not all that uncommon in practice.

That said, this is perhaps another good case where it'd be
extremely handy to have anonymized statistics information from real
systems which would allow us to more easily go *look* at this
specific question.

There are organizations out there running many thousands of PG instances
who have expressed interest in supporting exactly that kind of
statistics gathering (indeed, I'm pretty sure Peter is familiar with
one... ;) - what we need is an architecture, design, and implementation
to make it happen.

I'm guessing you (Robert and Peter, especially) have already been
thinking about what we could do to make the above wish/dream a reality.
Perhaps if we could get the initial requirements down, someone would be
able to have time to work on making it happen; it would be really great
to see progress on this front for 9.5. Or, if existing products
implement such metrics collection already, perhaps some numbers could
be shared with the community to help address this (and other)
questions.

Thanks,

Stephen


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Rod Taylor <rod(dot)taylor(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A worst case for qsort
Date: 2014-08-08 16:50:14
Message-ID: CAM3SWZSvXeZW_6_-d_0v6JgPoGKJMYph1oyZAszCprx71ierMw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 8, 2014 at 5:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Well, I'm not sure why you're having a hard time imagining it.
> Presorted input is a common case in general; that's why we have a
> check for it. That check adds overhead in the non-pre-sorted case to
> improve the pre-sorted case, and nobody's ever argued for removing it
> that I can recall.

I think that there has been a fair amount of skepticism - e.g., [1]

But that's beside the point. What I mean is that I think that the
intersection of those three things - pre-sorted input, with all
differences after the first 8 bytes, and with a user requirement to
sort using the column - is fairly rare in practice. It's not
impossible, but it is fairly rare. If that was the only case that was
actually regressed, even taking into account fmgr elision, I think
that would be quite reasonable. It wouldn't be massively regressed
either, and it's a case that's already very fast relative to other
systems anyway, if you're lucky enough to get it.

You'd better have exactly sorted tuples, though. It's been my
observation that one slight difference can drastically alter the
outcome [2].

[1] http://www.postgresql.org/message-id/18033.1361789032@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/CAEYLb_W++UhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ@mail.gmail.com
--
Peter Geoghegan