Re: B-Tree support function number 3 (strxfrm() optimization)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Thom Brown <thom(at)linux(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-04-07 18:17:42
Message-ID: 20140407181742.GX4582@tamriel.snowman.net
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> To throw out one more point that I think is problematic, Peter's
> original email on this thread gives a bunch of examples of strxfrm()
> normalization that all differ in the first few bytes - but so do
> the underlying strings. I *think* (but don't have time to check right
> now) that on my MacOS X box, strxfrm() spits out 3 bytes of header
> junk and then 8 bytes per character in the input string - so comparing
> the first 8 bytes of the strxfrm()'d representation would amount to
> comparing part of the first character. If for any reason that character is
> the same (or similar enough) on many of the input strings, then this
> will probably work out to be slower rather than faster. Even if other
> platforms are more space-efficient (and I think at least some of them
> are), I think it's unlikely that this optimization will ever pay off
> for strings that don't differ in the first 8 bytes. And there are
> many cases where that could be true a large percentage of the time
> throughout the input, e.g. YYYY-MM-DD HH:MM:SS timestamps stored as
> text. It seems likely that the patch pessimizes those cases, though of
> course there's no way to know without testing.
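
As a rough illustration of the concern quoted above, the sketch below
measures how often an 8-byte prefix comparison settles the order of two
keys. It is not code from the patch under discussion. The inputs are
hypothetical, the raw bytes stand in for already-transformed keys (real
strxfrm() output depends on platform and locale), and the 3-byte header
and 8 bytes per character are the unverified Mac OS X figures recalled
in the quoted text.

```python
from itertools import combinations

PREFIX_LEN = 8  # bytes of the transformed key compared first (figure from the thread)


def prefix_decides(a: bytes, b: bytes, prefix_len: int = PREFIX_LEN) -> bool:
    """True if comparing only the first prefix_len bytes already orders a and b."""
    return a[:prefix_len] != b[:prefix_len]


def fraction_decided(keys, prefix_len=PREFIX_LEN):
    """Fraction of distinct pairs whose order is settled by the prefix alone."""
    pairs = list(combinations(keys, 2))
    if not pairs:
        return 0.0
    return sum(prefix_decides(a, b, prefix_len) for a, b in pairs) / len(pairs)


def whole_chars_in_prefix(header_bytes, bytes_per_char, prefix_len=PREFIX_LEN):
    """Complete input characters covered by the prefix for a fixed-width layout."""
    return max(prefix_len - header_bytes, 0) // bytes_per_char


# Hypothetical inputs: timestamps within one month vs. unrelated words.
# The bytes are treated as if they were the transformed keys themselves.
timestamps = [b"2014-04-07 18:17:42", b"2014-04-07 18:19:35", b"2014-04-01 09:00:00"]
words = [b"alpha", b"bravo", b"charlie"]

print(fraction_decided(timestamps))  # 0.0: every pair shares "2014-04-"
print(fraction_decided(words))       # 1.0: every pair differs within 8 bytes
print(whole_chars_in_prefix(3, 8))   # 0: not even one full character is covered
```

When the fraction is near zero, nearly every comparison falls through
to the full comparison, so the prefix check is pure overhead for that
input. Whether that overhead is measurable is a question only
benchmarking on the actual platforms can answer.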

Portability and performance concerns were exactly what worried me as
well. It was my hope/understanding that this was a clear win which was
vetted by other large projects across multiple platforms. If that's
actually in doubt and it isn't a clear win then I agree that we can't be
trying to squeeze it in at this late date.

> Now it *may well be* that after doing some research and performance
> testing we will conclude that either no commonly-used platforms show
> any regressions or that the regressions that do occur are discountable
> in view of the benefits to more common cases. I just
> don't think mid-April is the right time to start those discussions
> with the goal of a 9.4 commit; and I also don't think committing
> without having those discussions is very prudent.

I agree with this in concept, but I'd be willing to spend a bit of time
researching it, given that it's from a well known and respected author
who I trust has done much of this research already.

Thanks,

Stephen
