Re: B-Tree support function number 3 (strxfrm() optimization)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-09-15 17:17:28
Message-ID: CA+Tgmoacg2AXTJBvFnqJxk3Q-UEV_687jNWUBe4kAVR2_p9xPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Sep 14, 2014 at 10:37 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 09/13/2014 11:28 PM, Peter Geoghegan wrote:
>> Anyway, attached rough test program implements what you outline. This
>> is for 30,000 32 byte strings (where just the final two bytes differ).
>> On my laptop, output looks like this (edited to only show median
>> duration in each case):
>
> Got to be careful to not let the compiler optimize away microbenchmarks like
> this. At least with my version of gcc, the strcoll calls get optimized away,
> as do the memcmp calls, if you don't use the result for anything. Clang was
> even more aggressive; it ran both comparisons in 0.0 seconds. Apparently it
> optimizes away the loops altogether.
>
> Also, there should be a setlocale(LC_ALL, "") call somewhere. Otherwise it
> runs in C locale, and we don't use strcoll() at all for C locale.
>
> After fixing those, it runs much slower, so I had to reduce the number of
> strings. Here's a fixed program. I'm now getting numbers like this:
>
> (baseline) duration of comparisons without useless memcmp()s: 6.007368
> seconds
>
> duration of comparisons with useless memcmp()s: 6.079826 seconds
>
> Both values vary in range 5.9 - 6.1 s, so it's fair to say that the useless
> memcmp() is free with these parameters.
>
> Is this the worst case scenario?

I can't see a worse one, and I replicated your results here on my
MacBook Pro. I also tried with 1MB strings and, surprisingly, it was
basically free there, too.

It strikes me that perhaps we should make this change (rearranging
things so that the memcmp tiebreak is run before strcoll) first,
before dealing with the rest of the abbreviated keys infrastructure.
It appears to be a separate improvement which is worthwhile
independently of what we do about that patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2014-09-15 17:23:54 Re: jsonb format is pessimal for toast compression
Previous Message Josh Berkus 2014-09-15 17:12:18 Re: jsonb format is pessimal for toast compression