Re: B-Tree support function number 3 (strxfrm() optimization)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2015-01-19 20:46:12
Message-ID: CA+TgmoaTiYy9aaMRe7m71Z=mrNZ_aPqepspQHtSNHq8Wiafjow@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 19, 2015 at 3:33 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> All right, it seems Tom is with you on that point, so after some
> study, I've committed this with very minor modifications. Sorry for
> the long delay. I have not committed the 0002 patch, though, because
> I haven't studied that enough yet to know whether I think it's a good
> idea. Perhaps that could get its own CommitFest entry and thread,
> though, to separate it from this exceedingly long discussion and make
> it clear exactly what we're hoping to gain by that patch specifically.

By the way, for those following along at home, here's an example of
how this patch can help:

rhaas=# create table stuff as select random()::text as a,
'filler filler filler'::text as b, g as c from generate_series(1, 1000000) g;
SELECT 1000000
rhaas=# create index on stuff (a);
CREATE INDEX

On the PPC64 machine I normally use for performance testing, it takes
about 6.3 seconds to build the index with the commit just before this
one. With this commit, it drops to 1.9 seconds. That's more than a
3x speedup!

Now, if I change the query that creates the table to this:

rhaas=# create table stuff as select 'aaaaaaaa' || random()::text as
a, 'filler filler filler'::text as b, g as c from generate_series(1,
1000000) g;

...then it takes 10.8 seconds with or without this patch. In general,
any case where the first few characters of every string are exactly
identical (or only quite rarely different) will not benefit, but many
practical cases will benefit significantly. Also, Peter's gone to a
fair amount of work to make sure that even when the patch does not
help, it doesn't hurt, either.
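
To make the prefix effect concrete, here is a rough Python sketch of the
general idea: compare a short fixed-size abbreviated key first, and fall back
to the full comparison only when the abbreviated keys tie. This is not the
committed C code. The 8-byte key width, the use of raw UTF-8 bytes in place of
a strxfrm() blob, and the names abbreviate() and sort_with_abbreviation() are
all assumptions made for illustration.

```python
import functools
import random

# Assumption: the abbreviated key is 8 bytes wide (one pointer-sized word).
ABBREV_LEN = 8


def abbreviate(s):
    """Return the first ABBREV_LEN bytes of s as a stand-in abbreviated key.

    Assumption: raw UTF-8 bytes play the role of the strxfrm() blob here,
    so the ordering is plain byte order rather than a locale's collation.
    """
    return s.encode("utf-8")[:ABBREV_LEN]


def sort_with_abbreviation(values):
    """Sort strings and count which comparisons the abbreviated key decided."""
    stats = {"abbrev": 0, "full": 0}

    def compare(x, y):
        key_x, full_x = x
        key_y, full_y = y
        if key_x != key_y:
            # Cheap path: the short keys differ, so they settle the order.
            stats["abbrev"] += 1
            return -1 if key_x < key_y else 1
        # Tie on the short key: fall back to comparing the whole strings.
        stats["full"] += 1
        bytes_x = full_x.encode("utf-8")
        bytes_y = full_y.encode("utf-8")
        return (bytes_x > bytes_y) - (bytes_x < bytes_y)

    keyed = [(abbreviate(v), v) for v in values]
    keyed.sort(key=functools.cmp_to_key(compare))
    return [v for _, v in keyed], stats


# Roughly mirrors the two test tables: random text, then the same text
# behind a shared 'aaaaaaaa' prefix.
rng = random.Random(0)
distinct = [str(rng.random()) for _ in range(2000)]
shared = ["aaaaaaaa" + s for s in distinct]

sorted_distinct, stats_distinct = sort_with_abbreviation(distinct)
sorted_shared, stats_shared = sort_with_abbreviation(shared)

print("random text:   ", stats_distinct)
print("shared prefix: ", stats_shared)
```

With the shared prefix, every abbreviated key in this sketch is identical, so
every comparison falls through to the full path, which is the "no benefit"
case described above. With plain random text, nearly all comparisons are
settled by the short key alone.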

So that's pretty cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
