Re: B-Tree support function number 3 (strxfrm() optimization)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-08-07 05:36:28
Message-ID: CAM3SWZRvK3jCpnewxM9OMKW_JvyheaC45u6sFcQQ9E1vt=9sqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 5, 2014 at 9:32 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I knew that I'd heard that at least once. Apparently some other
> database systems have external sorts that tend to be faster than
> equivalent internal sorts. I'd guess that that is an artifact of their
> having a substandard internal sort, though.

This *almost* applies to patched Postgres if you pick a benchmark that
is very sympathetic to my patch. To my surprise, work_mem = '10MB'
(which results in an external tape sort) is sometimes snapping at the
heels of a work_mem = '5GB' setting (which results in an in-memory
quicksort).

I have a 338 MB table, that consists or a single text column of 8 byte
strings strings, with high cardinality. I ran VACUUM FREEZE, and took
all the usual precautions of that kind. On the test table n_distinct =
-1, and there is no discernible physical/logical correlation.

The external sort case stabilized as follows:

LOG: duration: 9731.776 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 9742.948 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 9733.918 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;

The in-memory case stabilized as follows:

LOG: duration: 0.059 ms statement: set work_mem = '5GB';
LOG: duration: 9665.731 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 9602.841 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 9609.107 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;

FWIW, master performed as follows with work_mem = '10MB':

LOG: duration: 60456.943 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 60368.987 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 61223.942 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;

And master did quite a lot better with work_mem = '5GB', in a way that
fits with my prejudices about how quicksort is supposed to perform
relative to tape sort:

LOG: duration: 0.060 ms statement: set work_mem = '5GB';
LOG: duration: 41697.659 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 41755.496 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;
LOG: duration: 41883.888 ms statement: select * from (select * from
besttest order by tt offset 10000000) i;

work_mem = '10MB' continues to be the external sort sweet spot for
this hardware, for whatever reason - I can add a few seconds to
execution time by increasing or decreasing the setting a bit. I'm
using an Intel Core i7-3520M CPU @ 2.90GHz, with a /proc/cpuinfo
reported L3 cache size of 4096 KB. I have been very careful to take
into account power saving features throughout all
experimentation/benchmarking of this patch and previous abbreviated
key patches - failing to do so is a good way to end up with complete
garbage when investigating this kind of thing.

Anyway, I'm not sure what this tells us about quicksort and tape sort,
but I think there might be an interesting and more general insight to
be gained here. I'd have thought that tape sort wastes memory
bandwidth by copying to operating system buffers to the extent that
things are slowed down considerably (this is after all a test
performed with lots of memory available, even when work_mem = '10
MB'). And even if that wasn't a significant factor, I'd expect
quicksort to win decisively anyway. Why does this happen?

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-08-07 06:24:36 Re: Fixed redundant i18n strings in json
Previous Message Fujii Masao 2014-08-07 05:10:30 Re: psql: show only failed queries