Re: sortsupport for text

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sortsupport for text
Date: 2012-06-21 11:15:31
Message-ID: CAEYLb_UU=h=oFs+b96uNPJTbiQvsnHJPs5kV7pbPCxxKoTNPzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21 June 2012 11:40, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> At this point, my theory is that your choice of "random" strings prevents
> strxfrm() from ever winning over strcoll(). The reason being that you pick
> each letter uniformly distributed from a-z, resulting in a probability of
> two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for
> extremely complex collation rules, strcoll() probably only needs to compare
> a few characters to determine the order. Whereas strxfrm() has to compute
> the whole sort key, no matter what.

Good point.

> The question is thus, how good a model are your "random" strings for the
> input of a typical sorting step in postgres? My guess is, a quite good one
> actually, since people probably don't deal with a lot of very similar strings
> very often. Which makes we wonder if using strxfrm() during sorting wouldn't
> be a net loss, all things considered…

Well, I think the answer to that has to be no. I posted a simple
postgres text -> strxfrm blob bytea SQL-callable wrapper function a
few days ago that clearly wins by quite a bit on some sample queries
against the dellstore sample database, which has a representative
sample set. Completely uniformly-distributed data actually isn't
representative of the real world at all. Normal distributions abound.

This C++ program was never intended to justify the general utility of
using strxfrm() for sorting, which I don't believe is in question. I
just wanted to get some idea about the performance characteristics of
using strxfrm() to traverse an index.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2012-06-21 11:20:56 Re: not null validation option in contrib/file_fdw
Previous Message Florian Pflug 2012-06-21 10:40:51 Re: sortsupport for text