
Re: B-Tree support function number 3 (strxfrm() optimization)

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Thom Brown <thom(at)linux(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: B-Tree support function number 3 (strxfrm() optimization)
Date:	2014-04-07 18:17:42
Message-ID:	20140407181742.GX4582@tamriel.snowman.net
Lists:	pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> To throw out one more point that I think is problematic, Peter's
> original email on this thread gives a bunch of examples of strxfrm()
> normalization that all differ in the first few bytes - but so do
> the underlying strings. I *think* (but don't have time to check right
> now) that on my MacOS X box, strxfrm() spits out 3 bytes of header
> junk and then 8 bytes per character in the input string - so comparing
> the first 8 bytes of the strxfrm()'d representation would amount to
> comparing part of the first byte. If for any reason the first byte is
> the same (or similar enough) on many of the input strings, then this
> will probably work out to be slower rather than faster. Even if other
> platforms are more space-efficient (and I think at least some of them
> are), I think it's unlikely that this optimization will ever pay off
> for strings that don't differ in the first 8 bytes. And there are
> many cases where that could be true a large percentage of the time
> throughout the input, e.g. YYYY-MM-DD HH:MM:SS timestamps stored as
> text. It seems likely that the patch pessimizes those cases, though of
> course there's no way to know without testing.
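To make the concern above concrete, here is a minimal C sketch of the general "abbreviated key" idea under discussion. It is not the patch itself. It assumes the key is simply the first 8 bytes of the strxfrm() image packed into an integer, with strcoll() as the authoritative tie-breaker. `abbrev_key()` and `abbrev_compare()` are hypothetical names. In the C locale, glibc and musl strxfrm() just copy the string, so two timestamps that share the prefix "2014-04-" tie on their keys, and every such comparison pays for key construction *and* a full strcoll(). On a platform whose strxfrm() image is less dense (as Robert suspects of MacOS X), even more inputs would tie.

```c
#include <assert.h>
#include <locale.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: pack the first 8 bytes of the strxfrm() image of s
 * into a big-endian uint64_t, zero-padding short images.  Integer order of
 * two keys then agrees with memcmp() order of the padded prefixes, so it
 * never contradicts strcoll() order (it can only tie).
 */
static uint64_t abbrev_key(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);      /* image length, excluding NUL */
    char *buf = malloc(need + 1);
    if (buf == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    size_t got = strxfrm(buf, s, need + 1);
    assert(got == need);
    (void) got;                             /* unused when NDEBUG is set */

    uint64_t key = 0;
    for (size_t i = 0; i < 8; i++) {
        unsigned char byte = i < need ? (unsigned char) buf[i] : 0;
        key = (key << 8) | byte;
    }
    free(buf);
    return key;
}

/*
 * Compare two strings, trying the abbreviated keys first.  Only when the
 * keys tie does it pay for the authoritative strcoll(); *fell_back reports
 * whether that happened.  (A real sort would build each key once per input
 * string, not once per comparison as this sketch does.)
 */
static int abbrev_compare(const char *a, const char *b, int *fell_back)
{
    uint64_t ka = abbrev_key(a);
    uint64_t kb = abbrev_key(b);

    if (ka != kb) {
        *fell_back = 0;
        return ka < kb ? -1 : 1;
    }
    *fell_back = 1;
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);               /* normalize to -1, 0, 1 */
}
```

Under these assumptions, comparing "apple" with "banana" is settled by the keys alone, while "2014-04-07 18:17:42" against "2014-04-07 18:16:40" always falls through to strcoll(). That fallback is the pessimal case Robert describes.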

Portability and performance concerns were exactly what worried me as
well. It was my hope/understanding that this was a clear win which was
vetted by other large projects across multiple platforms. If that's
actually in doubt and it isn't a clear win then I agree that we can't be
trying to squeeze it in at this late date.

> Now it *may well be* that after doing some research and performance
> testing we will conclude that either no commonly-used platforms show
> any regressions or that the regressions that do occur are discountable
> in view of the benefits to more common cases. I just
> don't think mid-April is the right time to start those discussions
> with the goal of a 9.4 commit; and I also don't think committing
> without having those discussions is very prudent.

I agree with this in concept, but I'd be willing to spend a bit of time
researching it, given that it's from a well-known and respected author
who I trust has done much of this research already.

Thanks,

Stephen

In response to

Re: B-Tree support function number 3 (strxfrm() optimization) at 2014-04-07 17:47:25 from Robert Haas
