Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-26 02:32:27
Message-ID: 4E7F9DEB020000250004174F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Tom Lane wrote:
> "Kevin Grittner" writes:
>> This is a review of the patch at this CF location:
>> https://commitfest.postgresql.org/action/patch_view?id=598
>> as posted here:
>>
http://archives.postgresql.org/message-id/4E04C099.3020604@enterprisedb.com
>
> Hmm, why is that patch the one posted for review, when several
> better versions were already discussed? See thread starting here:
> http://archives.postgresql.org/pgsql-hackers/2011-07/msg00028.php

The patch I reviewed was added to the CF app before the post you cite
was written. I didn't see it in following the links (probably
because it crossed a month boundary). Thanks for pointing that out;
I'll update the CF app and review the later versions.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <heikki(dot)linnakangas(at)enterprisedb(dot)com>,<tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 18:27:05
Message-ID: 4E8472290200002500041918@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Tom Lane wrote:

>> Hmm, why is that patch the one posted for review, when several
>> better versions were already discussed? See thread starting here:
>> http://archives.postgresql.org/pgsql-hackers/2011-07/msg00028.php
>
> The patch I reviewed was added to the CF app before the post you
> cite was written. I didn't see it in following the links
> (probably because it crossed a month boundary). Thanks for
> pointing that out; I'll update the CF app and review the later
> versions.

The first patch Tom posted was a clear improvement on Heikki's
original patch, and performed slightly better.

The second patch Tom posted makes the patch more robust in the face
of changes to the structure, but reduces the performance benefit on
the dictionary, and performs very close to the unpatched version for
samples of English text. If anything, it seems to be slower with
real text, but the difference is so small it might be noise. (The
dictionary tests are skewed toward longer average word length than
typically found in actual readable text.) I wonder whether the code
for the leading, unaligned portion of the data could be written such
that it was effectively unrolled and the length resolved at compile
time rather than figured out at run time for every call to the
function. That would solve the robustness issue without hurting
performance. If we don't do something like that, this patch doesn't
seem worth applying.

Heikki's second version, a more radical revision optimized for 64
bit systems, blows up on a 32 bit compile, writing off the end of
the structure. Personally, I'd be OK with sacrificing some
performance for 32 bit systems to get better performance on 64 bit
systems, since people who care about performance generally seem to
be on 64 bit builds these days -- but it has to run. Given Tom's
reservations about this approach, I don't know whether Heikki is
interested in fixing the crash so it can be benchmarked. Heikki?

I will do a set of more carefully controlled performance tests on
whatever versions are still in the running after we sort out the
issues above.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 21:08:23
Message-ID: 4E84DE47.8070603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 29.09.2011 20:27, Kevin Grittner wrote:
> Heikki's second version, a more radical revision optimized for 64
> bit systems, blows up on a 32 bit compile, writing off the end of
> the structure. Personally, I'd be OK with sacrificing some
> performance for 32 bit systems to get better performance on 64 bit
> systems, since people who care about performance generally seem to
> be on 64 bit builds these days -- but it has to run. Given Tom's
> reservations about this approach, I don't know whether Heikki is
> interested in fixing the crash so it can be benchmarked. Heikki?

No, I'm not going to work on that 64-bit patch.

Looking at the big picture, however, the real problem with all those
makesign() calls is that they happen in the first place. They happen
when gist needs to choose which child page to place a new tuple on. It
calls the penalty for every item on the internal page, always passing
the new key as the 2nd argument, along the lines of:

for (all items on internal page)
penalty(item[i], newitem);

At every call, gtrgm_penalty() has to calculate the signature for
newitem, using makesign(). That's an enormous waste of effort, but
there's currently no way gtrgm_penalty() to avoid that. If we could call
makesign() only on the first call in the loop, and remember it for the
subsequent calls, that would eliminate the need for any
micro-optimization in makesign() and make inserting into a trigram index
much faster (including building the index from scratch).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 21:16:23
Message-ID: 15195.1317330983@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Looking at the big picture, however, the real problem with all those
> makesign() calls is that they happen in the first place. They happen
> when gist needs to choose which child page to place a new tuple on. It
> calls the penalty for every item on the internal page, always passing
> the new key as the 2nd argument, along the lines of:

> for (all items on internal page)
> penalty(item[i], newitem);

> At every call, gtrgm_penalty() has to calculate the signature for
> newitem, using makesign(). That's an enormous waste of effort, but
> there's currently no way gtrgm_penalty() to avoid that.

Hmm. Are there any other datatypes for which the penalty function has
to duplicate effort? I'm disinclined to fool with this if pg_trgm is
the only example ... but if it's not, maybe we should do something
about that instead of micro-optimizing makesign.

regards, tom lane


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 21:20:47
Message-ID: CAPpHfdsc0k5wH6NjnfcN1ZRu5TXYxEhu4PSXOQmdrRjO2eds2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 30, 2011 at 1:08 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> At every call, gtrgm_penalty() has to calculate the signature for newitem,
> using makesign(). That's an enormous waste of effort, but there's currently
> no way gtrgm_penalty() to avoid that. If we could call makesign() only on
> the first call in the loop, and remember it for the subsequent calls, that
> would eliminate the need for any micro-optimization in makesign() and make
> inserting into a trigram index much faster (including building the index
> from scratch)
>
Isn't it possible to cache signature of newitem in gtrgm_penalty
like gtrgm_consistent do this for query?

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 21:29:06
Message-ID: CAPpHfdt86EaiV0EtYwKqGSvgBGbXfOWgFeCgUwzo00W5vnW-KA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 30, 2011 at 1:16 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Hmm. Are there any other datatypes for which the penalty function has
> to duplicate effort? I'm disinclined to fool with this if pg_trgm is
> the only example ... but if it's not, maybe we should do something
> about that instead of micro-optimizing makesign.
>
GiST for tsearch works similarly.

------
With best regards,
Alexander Korotkov.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-29 23:03:01
Message-ID: 23146.1317337381@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Fri, Sep 30, 2011 at 1:08 AM, Heikki Linnakangas <
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> At every call, gtrgm_penalty() has to calculate the signature for newitem,
>> using makesign(). That's an enormous waste of effort, but there's currently
>> no way gtrgm_penalty() to avoid that. If we could call makesign() only on
>> the first call in the loop, and remember it for the subsequent calls, that
>> would eliminate the need for any micro-optimization in makesign() and make
>> inserting into a trigram index much faster (including building the index
>> from scratch)

> Isn't it possible to cache signature of newitem in gtrgm_penalty
> like gtrgm_consistent do this for query?

[ studies that code for awhile ... ] Ick, what a kluge.

The main problem with that code is that the cache data gets leaked at
the conclusion of a scan. Having just seen the consequences of leaking
the "giststate", I think this is something we need to fix not emulate.

I wonder whether it's worth having the GIST code create a special
scan-lifespan (or insert-lifespan) memory context that could be used
for cached data such as this? It's already creating a couple of
contexts for its own purposes, so one more might not be a big problem.
We'd have to figure out a way to make that context available to GIST
support functions, though, as well as something cleaner than fn_extra
for them to keep pointers in.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-09-30 17:08:02
Message-ID: 9144.1317402482@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
>> Isn't it possible to cache signature of newitem in gtrgm_penalty
>> like gtrgm_consistent do this for query?

> [ studies that code for awhile ... ] Ick, what a kluge.

> The main problem with that code is that the cache data gets leaked at
> the conclusion of a scan. Having just seen the consequences of leaking
> the "giststate", I think this is something we need to fix not emulate.

> I wonder whether it's worth having the GIST code create a special
> scan-lifespan (or insert-lifespan) memory context that could be used
> for cached data such as this? It's already creating a couple of
> contexts for its own purposes, so one more might not be a big problem.
> We'd have to figure out a way to make that context available to GIST
> support functions, though, as well as something cleaner than fn_extra
> for them to keep pointers in.

I've been chewing on this for awhile and am now thinking that maybe what
gtrgm_consistent is doing isn't that unreasonable. It's certainly
legitimate for it to use fn_extra to maintain state between calls.
The problem fundamentally is that when a function uses fn_extra to
reference data it keeps in the fn_mcxt context, there's an implicit
contract that fn_extra will survive for the same length of time that the
fn_mcxt context does. (Otherwise there's no way for the function to
avoid leaking that data after it's been called for the last time using
that FmgrInfo.) And GIST is violating that assumption: it resets
fn_extra when it does initGISTstate, but fn_mcxt gets set to
CurrentMemoryContext, which in the problematic cases is a query-lifespan
context that will still be around after the GIST indexscan is concluded.

So what I'm thinking we ought to do is redefine things so that
initGISTstate sets fn_mcxt to a context that has the same lifespan as
the GISTSTATE itself does. We could possibly eliminate a few retail
pfree's in the process, eg by keeping the GISTSTATE itself in that same
context.

Having done that, what gtrgm_consistent is doing would be an officially
supported (dare I suggest even documented?) thing instead of a kluge,
and we could then adopt the same methodology in gtrgm_penalty.

Comments?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: Optimizing pg_trgm makesign() (was Re: WIP: Fast GiST index build)
Date: 2011-10-01 04:03:31
Message-ID: 12919.1317441811@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> So what I'm thinking we ought to do is redefine things so that
> initGISTstate sets fn_mcxt to a context that has the same lifespan as
> the GISTSTATE itself does. We could possibly eliminate a few retail
> pfree's in the process, eg by keeping the GISTSTATE itself in that same
> context.
> Having done that, what gtrgm_consistent is doing would be an officially
> supported (dare I suggest even documented?) thing instead of a kluge,
> and we could then adopt the same methodology in gtrgm_penalty.

I've committed patches along this line. On my machine, the time needed
to do the test case Heikki proposed (build a gist_trgm_ops index on the
contents of /usr/share/dict/words) drops from around 29.9 seconds to
about 17.3. makesign is now down in the noise according to oprofile,
so I see no further reason to pursue the patch that started this thread.

What's not in the noise is the memcmp() required to check if the cached
makesign result is still good --- as best I can tell, that's near 50%
of the remaining runtime. I don't see any non-invasive way to avoid
that. If we were willing to redo the GiST support function API, we
could fix it, but I'm not sure it's worth that.

regards, tom lane