From: | Sushant Sinha <sushant354(at)gmail(dot)com> |
---|---|
To: | Florian Pflug <fgp(at)phlo(dot)org> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: lexemes in prefix search going through dictionary modifications |
Date: | 2011-10-25 18:15:17 |
Message-ID: | 1319566517.2023.24.camel@dragflick |
Lists: | pgsql-hackers |
On Tue, 2011-10-25 at 19:27 +0200, Florian Pflug wrote:
> Assume, for example, that the postgres mailing list archive search used
> tsearch (which I think it does, but I'm not sure). It'd then probably make
> sense to add "postgres" to the list of stopwords, because it's bound to
> appear in nearly every mail. But wouldn't you want searches which include
> 'postgres*' to turn up empty? Quite certainly not.
That improves recall for the "postgres:*" query, but it certainly doesn't help
other queries like "post:*". More importantly, it hurts precision for all
queries like "a:*", "an:*", "and:*", "s:*", "t:*", "the:*", etc. (when such a
prefix is the only search term, it also hurts recall, because no row matches
an empty tsquery). Since stopwords tend to be short, prefix search on just a
few characters becomes meaningless. And I would argue that this is exactly
when prefix search matters most: when you only know a few characters.
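A simplified Python model can make the two effects concrete. It is not PostgreSQL's tsearch code: the stopword list, the three-document corpus, and the helper names (`normalize_query`, `matches`, `search`) are hypothetical. It assumes AND semantics between query terms, skips stemming, and treats an empty query as matching nothing, the way an empty tsquery does.

```python
# Hypothetical stopword list; real tsearch configurations load theirs
# from a dictionary stopword file.
STOPWORDS = {"a", "an", "and", "s", "t", "the"}

# Toy corpus: document id -> words (no stemming in this model).
DOCS = {
    1: ["theory", "postgres"],
    2: ["postgres", "planner"],
    3: ["thermal", "sensor"],
}


def normalize_query(prefixes, stopwords=STOPWORDS):
    """Model the behaviour under discussion: each prefix term goes through
    the stopword filter first, so stopword prefixes are silently dropped."""
    return [p for p in prefixes if p not in stopwords]


def matches(doc_words, prefixes):
    """AND semantics: every remaining prefix must match the start of some word.
    An empty query matches nothing, mirroring an empty tsquery."""
    if not prefixes:
        return False
    return all(any(w.startswith(p) for w in doc_words) for p in prefixes)


def search(prefixes, filter_stopwords=True):
    """Return the sorted ids of matching documents, with or without
    stopword filtering of the prefix terms."""
    q = normalize_query(prefixes) if filter_stopwords else list(prefixes)
    return sorted(d for d, words in DOCS.items() if matches(words, q))


print(search(["the", "post"], filter_stopwords=False))  # [1]
print(search(["the", "post"]))                          # [1, 2]
print(search(["the"], filter_stopwords=False))          # [1, 3]
print(search(["the"]))                                  # []
```

Dropping "the" from "the:* & post:*" lets document 2 through even though none of its words start with "the" (lost precision). Searching for "the:*" alone reduces the query to nothing and misses documents 1 and 3 (lost recall).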
-Sushant.