Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Subject: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-07-02 03:01:46
Message-ID: 29388.1309575706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I couldn't resist looking into this, and came up with the attached
> patch. I tested this with:

> CREATE TABLE words (word text);
> COPY words FROM '/usr/share/dict/words';
> CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);

> And then ran "REINDEX INDEX i_words" a few times with and without the
> patch. Without the patch, reindex takes about 4.7 seconds. With the
> patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes
> even more important with Alexander's fast GiST build patch, which calls
> the penalty function more.

I tested this on HPPA (big-endian, picky about alignment) to verify that
you had that code path right. It's right, but on that platform the
speedup in the "reindex dict/words" test is only about 6%. I'm afraid
that the benefit is a lot more machine-specific than we could wish.

I tweaked the patch a bit further (don't pessimize the boundary case
where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
improve the comment) and attach that version below. This is a little
bit faster but I still wonder if it's worth the price of adding an
obscure dependency on the header size.

> I used the attached showsign-debug.patch to verify that the patched
> makesign function produces the same results as the existing code. I
> haven't tested the big-endian code, however.

That didn't look terribly convincing. I added a direct validation that
old and new code give the same results, a la

if (ISARRKEY(newval))
{
BITVEC sign;
+ BITVEC osign;

makesign(sign, newval);
+ origmakesign(osign, newval);
+ Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0);

if (ISALLTRUE(origval))
*penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / (float) (SIGLENBIT + 1);

and ran the regression tests and the dict/words example with that.

regards, tom lane

Attachment Content-Type Size
fast_makesign_v2.patch text/x-patch 3.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2011-07-02 03:10:43 Re: pg_upgrade defaulting to port 25432
Previous Message Tom Lane 2011-07-01 23:06:08 Re: add support for logging current role (what to review?)