Re: tsearch parser inefficiency if text includes urls or emails - new version

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, oleg(at)sai(dot)msu(dot)su, teodor(at)sigaev(dot)ru
Subject: Re: tsearch parser inefficiency if text includes urls or emails - new version
Date: 2009-11-25 13:18:46
Message-ID: 200911251418.47137.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Saturday 14 November 2009 15:33:00 Kevin Grittner wrote:
> Andres Freund wrote:
>
> On Saturday 14 November 2009 01:03:33 Kevin Grittner wrote:
> >> It is in context format, applies cleanly, and passes "make check".
> >
> > Unfortunately the latter is not saying much - I had a bug there and
> > it was not found by the regression tests. Perhaps I should take a
> > stab and add at least some more...
>
> Sounds like a good idea. The one thing to avoid is anything with a
> long enough run time to annoy those that run it many times a day.
Hm. There actually are tests excercising the part where I had a bug...
Strange.
It was a bug involving uninitialized data so probably the regression tests
where just "lucky".

> >> It is in context format, applies cleanly, and passes "make check".
> >> Next I read through the code, and have the same question that
> >> Andres posed 12 days ago. His patch massively reduces the cost of
> >> the parser recursively calling itself for some cases, and it seems
> >> like the least invasive way to modify the parser to solve this
> >> performance problem; but it does beg the question of why a state
> >> machine like this should recursively call itself when it hits
> >> certain states.
> > I was wondering about that as well. I am not completely sure but to
> > me it looks like its just done to reduce the amount of rules and
> > states.
> I'm assuming that's the reason, but didn't dig deep enough to be sure.
> I suspect to be really sure, I'd have to set it up without the
> recursion and see what breaks. I can't imagine it would be anything
> which couldn't be fixed by adding enough states; but perhaps they ran
> into something where these types would require so many new states that
> the recursion seemed like the lesser of evils.
This is similar to my understanding...

> > I have to say that that code is not exactly clear and well
> > documented...
> Yeah. I was happy with the level of documentation that you added with
> your new code, but what was there before is mighty thin. If you
> gleaned enough information while working on it to feel comfortable
> adding documentation anywhere else, that would be a good thing.
It definitely would be a good thing. But that would definitely be seperate
patch. But I fear my current leel of knowledge is sufficient and also I am not
sure if I can make myself interested enough in that part.

> So far the only vote is to proceed with the mitigation, which was my
> preference, and apparently yours -- so I guess we're at 3 to 0 in
> favor of that. I'll mark the patch as "Waiting on Author" so you can
> add any comments and regression tests you feel are appropriate.
>
> By the way, I found one typo in the comments -- it should by useful,
> not usefull.
Ok, will update.

> I liked what I saw so far, but need to spend more time desk-checking
> for correctness, testing to confirm that it doesn't change results,
> and confirming the performance improvement.
Thanks again for your reviewing!

Andres

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2009-11-25 13:22:43 Re: Application name patch - v3
Previous Message Peter Eisentraut 2009-11-25 13:00:02 Re: garbage in psql -l