Re: Question regarding custom parser

From: Arjen Nienhuis <a(dot)g(dot)nienhuis(at)gmail(dot)com>
To: Arthur van der Wal <arthurvanderwal(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Question regarding custom parser
Date: 2010-10-05 07:26:54
Message-ID: AANLkTikZFH6mwudStHrzuH2GDA4D8e3P8kPDRa-M-x0e@mail.gmail.com
Lists: pgsql-general

You can create an index on to_tsvector(replace(foo, '-', ' ')) and then
search using ...match..(replace(foo, ...), ...)
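Spelled out, the suggestion might look like the sketch below. The table "items", the column "foo" and the choice of the english configuration and a GIN index are assumptions for illustration. Only the two-argument form of to_tsvector, which names the configuration, can be used in an index expression. The WHERE clause has to repeat the indexed expression exactly for the planner to consider the index.

```sql
-- Hypothetical table; "items" and "foo" are placeholder names.
CREATE TABLE items (id serial PRIMARY KEY, foo text);
INSERT INTO items (foo) VALUES ('NL83-V-74-001-001');

-- Index the text with every '-' replaced by a space, so the parser
-- never sees hyphenated words or signed integers such as '-74'.
-- The configuration is named explicitly, as index expressions require.
CREATE INDEX items_foo_fts ON items
    USING gin (to_tsvector('english', replace(foo, '-', ' ')));

-- Apply the same replace() to the search string, and repeat the
-- indexed expression verbatim so the planner can match it to the index.
SELECT id, foo
FROM items
WHERE to_tsvector('english', replace(foo, '-', ' '))
      @@ plainto_tsquery('english', replace('v-74', '-', ' '));
```

The trade-off is that every hyphen becomes a word break, including those in ordinary hyphenated words.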

On Mon, Oct 4, 2010 at 11:41 AM, Arthur van der Wal <arthurvanderwal(at)gmail(dot)com> wrote:

> Hi,
>
> I want to change the way PostgreSQL splits text into tokens, for example:
>
> plainto_tsquery('v-74') should split it up as 'v' & '74' instead of 'v' &
> '-74'.
>
> Another example:
>
> select to_tsvector('NL83-V-74-001-001');
> '-001':5,6 '74':4 'nl83':2 'nl83-v':1 'v':3
>
> Searching for 'v-74' does not find the database entry, as the '-' in 'v-74'
> is not indexed. It's hard to determine when PostgreSQL splits things up by
> '-' and when it doesn't.
>
>
> I tried writing my own parser (based on the test_parser example), which
> does nothing more than split at '-'; however, it seems to me that the logic
> for finding 'base' words and derivatives that postgres does so nicely
> doesn't work anymore.
>
> Another way would be to disable the (signed) int tokeniser and have the
> unsigned int tokeniser accept leading zeros.
>
> Can anybody point me in the right direction as to how to tackle this
> problem?
>
> Thanks very much in advance,
>
> Arthur van der Wal
>
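To see where the default parser splits on '-' and where it keeps it, ts_debug lists each token together with the token type the parser assigned and the lexemes the configuration produced. A sketch, assuming the english configuration:

```sql
-- One row per token: "alias" is the parser's token type (for example a
-- hyphenated-word part versus a signed integer such as int), "token"
-- is the raw text, and "lexemes" is what ends up in the tsvector.
SELECT alias, token, lexemes
FROM ts_debug('english', 'NL83-V-74-001-001');
```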

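On the custom-parser route, one plausible reason the 'base' word logic disappears is that stemming is not done by the parser at all. The dictionaries that a text search configuration maps each token type to do the stemming, so a configuration built on a new parser needs its token types mapped to a stemmer again. The sketch below follows the test_parser module's own example and assumes a custom parser that, like test_parser, reports a token type called word. CREATE EXTENSION needs PostgreSQL 9.1 or later. test_parser was later moved out of contrib into the source tree's test modules, so it may not be installed.

```sql
CREATE EXTENSION test_parser;

-- A configuration that tokenizes with the custom parser...
CREATE TEXT SEARCH CONFIGURATION testcfg (PARSER = testparser);

-- ...but still sends the parser's "word" tokens through the English
-- Snowball stemmer, which is what reduces words to their base forms.
ALTER TEXT SEARCH CONFIGURATION testcfg
    ADD MAPPING FOR word WITH english_stem;

SELECT to_tsvector('testcfg', 'parsers parsing parsed');
```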