Re: tsearch parser inefficiency if text includes urls or emails - new version

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org, oleg(at)sai(dot)msu(dot)su, teodor(at)sigaev(dot)ru
Subject: Re: tsearch parser inefficiency if text includes urls or emails - new version
Date: 2009-12-08 15:26:11
Message-ID: 200912081626.11709.andres@anarazel.de
Lists: pgsql-hackers

On Tuesday 08 December 2009 16:23:11 Kevin Grittner wrote:
> I wrote:
> > Frankly, I'd be amazed if there was a performance regression,
>
> OK, I'm amazed. While it apparently helps some cases dramatically
> (Andres had a case where run time was reduced by 93.2%), I found a
> pretty routine case where run time was increased by 3.1%. I tweaked
> the code and got that down to a 2.5% run time increase. I'm having
> troubles getting it any lower than that. And yes, this is real, not
> noise -- the slowest unpatched time for this test is faster than the
> fastest time with any version of the patch. :-(
>
> Andres, could you provide more information on the test which showed
> the dramatic improvement? In particular, info on OS, CPU, character
> set, encoding scheme, and what kind of data was used for the test.
>
> I'll do some more testing and try to figure out how the patch is
> slowing things down and post with details.

Could you show your test case? I don't see why it would get slower.

I tested with various data; the case that benefited most was a changelog in
which each entry was signed with an email address.

OS: Debian Sid; CPU: Core2 Duo; encoding: UTF-8. I tried both the C and de_DE.UTF8 locales.

Andres
