Re: B-Tree support function number 3 (strxfrm() optimization)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-09-11 20:13:26
Message-ID: CAM3SWZQY95Sow00b+zJycrGMR-uF1mz8rYv4_Ou2ENcvsTnxYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 10, 2014 at 11:36 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> No, not really. All you have to do is right a little test program to
> gather the information.

I don't think a little test program is useful - IMV it's too much of a
simplification to suppose that a strcoll() has a fixed cost, and a
memcmp() has a fixed cost, and that we can determine algebraically
that we should (say) proceed or not proceed with the additional
opportunistic "memcmp() == 0" optimization based solely on that. I'm
not sure if that's what you meant, but it might have been. Temporal
locality is surely a huge factor here, for example. Are we talking
about a memcmp() that always immediately precedes a similar strcoll()
call on the same memory? Are we factoring in the cost of
NUL-termination in order to make each strcoll() possible? And that's
just for starters.

However, I think it's perfectly fair to consider a case where the
opportunistic "memcmp() == 0" thing never works out (as opposed to
mostly not helping, which is what I considered earlier [1]), as long
as we're sorting real tuples. You mentioned Heikki's test case; it
seems fair to consider that, but for the non-abbreviated case where
the additional, *totally* opportunistic "memcmp == 0" optimization
only applies (so no abbreviated keys), while still having the
additional optimization be 100% useless. Clearly that test case is
also just about perfectly pessimal for this case too. (Recall that
Heikki's test case shows performance on per-sorted input, so there are
far fewer comparisons than would typically be required anyway - n
comparisons, or a "bubble sort best case". If I wanted to cheat, I
could reduce work_mem so that an external tape sort is used, since as
it happens tapesort doesn't opportunistically check for pre-sorted
input, but I won't do that. Heikki's case both emphasizes the
amortized cost of a strxfrm() where we abbreviate, and in this
instance de-emphasizes the importance of memory latency by having
access be sequential/predictable.)

The only variation I'm adding here to Heikki's original test case is
to have a leading int4 attribute that always has a value of 1 -- that
conveniently removes abbreviation (including strxfrm() overhead) as a
factor that can influence the outcome, since right now that isn't
under consideration. So:

create table sorttest (dummy int4, t text);
insert into sorttest select 1, 'foobarfo' || (g) || repeat('a', 75)
from generate_series(10000, 30000) g;

Benchmark:

pg(at)hamster:~/tests$ cat heikki-sort.sql
select * from (select * from sorttest order by dummy, t offset 1000000) f;

pgbench -f heikki-sort.sql -T 100 -n

With optimization enabled
====================
tps = 77.861353 (including connections establishing)
tps = 77.862260 (excluding connections establishing)

tps = 78.211053 (including connections establishing)
tps = 78.212016 (excluding connections establishing)

tps = 77.996117 (including connections establishing)
tps = 77.997069 (excluding connections establishing)

With optimization disabled (len1 == len2 thing is commented out)
=================================================

tps = 78.719389 (including connections establishing)
tps = 78.720366 (excluding connections establishing)

tps = 78.764819 (including connections establishing)
tps = 78.765712 (excluding connections establishing)

tps = 78.472902 (including connections establishing)
tps = 78.473844 (excluding connections establishing)

So, yes, it looks like I might have just about regressed this case -
it's hard to be completely sure. However, this is still a very
unrealistic case, since invariably "len1 == len2" without the
optimization ever working out, whereas the case that benefits [2] is
quite representative. As I'm sure you were expecting, I still favor
pursuing this additional optimization.

If you think I've been unfair or not thorough, I'm happy to look at
other cases. Also, I'm not sure that you accept that I'm justified in
considering this a separate question to the more important question of
what to do in the tie-breaker abbreviation case (where we may be
almost certain that equality will be indicated by a memcmp()). If you
don't accept that I'm right about that more important case, I guess
that means that you don't have confidence in my ad-hoc cost model (the
HyperLogLog/cardinality stuff).

[1] http://www.postgresql.org/message-id/CAM3SWZR9dtGO+zX4VEn7GTW2=+umSNq=c57SJGxG8OqHjarL7g@mail.gmail.com
[2] http://www.postgresql.org/message-id/CAM3SWZQTYv3KP+CakZJZV3RwB1OJjaHwPCZ9cOYJXPkhbtcBVg@mail.gmail.com
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-09-11 20:22:20 Re: RLS Design
Previous Message Robert Haas 2014-09-11 20:07:45 Re: [v9.5] Custom Plan API