From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
Subject: | Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build) |
Date: | 2011-07-02 03:01:46 |
Message-ID: | 29388.1309575706@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> I couldn't resist looking into this, and came up with the attached
> patch. I tested this with:
> CREATE TABLE words (word text);
> COPY words FROM '/usr/share/dict/words';
> CREATE INDEX i_words ON words USING gist (word gist_trgm_ops);
> And then ran "REINDEX INDEX i_words" a few times with and without the
> patch. Without the patch, reindex takes about 4.7 seconds. With the
> patch, 3.7 seconds. That's a worthwhile gain on its own, but becomes
> even more important with Alexander's fast GiST build patch, which calls
> the penalty function more.
I tested this on HPPA (big-endian, picky about alignment) to verify that
you had that code path right. It's right, but on that platform the
speedup in the "reindex dict/words" test is only about 6%. I'm afraid
that the benefit is a lot more machine-specific than we could wish.
I tweaked the patch a bit further (don't pessimize the boundary case
where there's exactly 4n+1 trigrams, avoid forcing trg1 into memory,
improve the comment) and attach that version below. This is a little
bit faster but I still wonder if it's worth the price of adding an
obscure dependency on the header size.
> I used the attached showsign-debug.patch to verify that the patched
> makesign function produces the same results as the existing code. I
> haven't tested the big-endian code, however.
That didn't look terribly convincing. I added a direct validation that
old and new code give the same results, a la
if (ISARRKEY(newval))
{
BITVEC sign;
+ BITVEC osign;
makesign(sign, newval);
+ origmakesign(osign, newval);
+ Assert(memcmp(sign, osign, sizeof(BITVEC)) == 0);
if (ISALLTRUE(origval))
*penalty = ((float) (SIGLENBIT - sizebitvec(sign))) / (float) (SIGLENBIT + 1);
and ran the regression tests and the dict/words example with that.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
fast_makesign_v2.patch | text/x-patch | 3.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2011-07-02 03:10:43 | Re: pg_upgrade defaulting to port 25432 |
Previous Message | Tom Lane | 2011-07-01 23:06:08 | Re: add support for logging current role (what to review?) |