Re: tsearch2 dictionary for statute cites

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: tsearch2 dictionary for statute cites
Date: 2009-03-11 14:01:22
Message-ID: 49B77DE2.EE98.0025.0@wicourts.gov
Lists: pgsql-general

>>> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> On Tue, 10 Mar 2009, Tom Lane wrote:
>> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>>> People are likely to search for statute cites, which tend to have a
>>> hierarchical form. I'm not sure the prefix approach will work for
>>> this. For example, there is a section 939.64 in the state statutes
>>> dealing with commission of a crime while wearing a bulletproof
>>> garment. If someone searches for that, they should find subsections
>>> like 939.64(1) or 939.64(2) but not different sections which start
>>> with the same characters like 939.641 (the section on concealing
>>> identity) or 939.645 (the section on hate crimes). A search for
>>> chapter 939 should return any of the above.
>>
>> Perhaps you could pass the texts and the queries through a regexp
>> substitution that converts digit-dot-digit to digit-dash-digit?
>
> perhaps, for 8.4 it's better to utilize prefix search, like
> to_tsquery('939.645:*') will find what Kevin need. The problem is with
> parser, so I'd preprocess text before indexing to convert all
> digit.digit(digit) to digit.digit.digit, which is what parser recognizes as
> a single lexem 'version'. Here is just an illustration
>
> qq=# select * from ts_parse('default',translate('939.64(1)','()','. '));
> tokid | token
> -------+----------
> 8 | 939.64.1
> 12 |
>
> btw, having 'version' it's possible to use dict_regex for 8.3.

Tom, Oleg: Thanks for the suggestions. Looks promising.

-Kevin
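
A minimal sketch of the preprocessing step Oleg describes, using a regexp substitution along the lines Tom suggested. The function name normalize_cite is hypothetical, not something from this thread, and the pattern is an assumption that handles only a single parenthesized number after a digit.digit cite. The E'' strings with doubled backslashes are meant to behave the same regardless of the standard_conforming_strings setting.

```sql
-- Hypothetical helper (name and pattern are assumptions, not from the thread):
-- rewrites cites such as 939.64(1) into the dotted form 939.64.1 so the
-- default parser can treat the whole cite as one 'version' token.
CREATE OR REPLACE FUNCTION normalize_cite(txt text) RETURNS text AS $$
    SELECT regexp_replace(
        $1,
        E'(\\d+\\.\\d+)\\((\\d+)\\)',  -- digit.digit(digit)
        E'\\1.\\2',                     -- digit.digit.digit
        'g'                             -- replace every occurrence
    );
$$ LANGUAGE sql IMMUTABLE;

-- Cites without a parenthesized subsection are left untouched.
SELECT normalize_cite('See 939.64(1) and 939.645.');
-- returns: See 939.64.1 and 939.645.
```

The same function would presumably need to be applied to both the text passed to to_tsvector and the search terms passed to to_tsquery, so that documents and queries are normalized identically.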
