ts_count

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: ts_count
Date: 2011-06-04 12:47:02
Message-ID: 4DEA2946.3040109@dunslane.net


One of our PostgreSQL Experts Inc customers wanted a function to count
all the occurrences of terms in a tsquery in a tsvector. This has been
written as a loadable module function, and initial testing shows it is
working well. With the client's permission we are releasing the code -
it's available at <https://github.com/pgexperts/ts_count>. The actual
new code involved here is tiny; some of the code is copied and pasted from
tsrank.c, and much of the rest is boilerplate.

A snippet from the regression test:

select ts_count(to_tsvector('managing managers manage peons managerially'),
                to_tsquery('managers | peon'));
 ts_count
----------
        4

We'd like to add something like this for 9.2, so I'd like to get the API agreed and then I'll prepare a patch and submit it for the next CF.

Comments?

cheers

andrew


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ts_count
Date: 2011-06-04 20:51:19
Message-ID: Pine.LNX.4.64.1106050039330.9772@sn.sai.msu.ru

Well, there are several functions available around tsearch2, so I suggest
that somebody collect all of them and create one extension, ts_addon.
For example, these are the ones I remember:
1. tsvector2array
2. noccurences(tsvector, tsquery) - like your ts_count
3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
Of course, we need to think about better names for the functions, since
ts_count is a bit ambiguous.

Oleg
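
[Editorial sketch, not from the thread] The difference between items 2 and 3 above can be illustrated in plain C. This is only one reading of the two descriptions: "noccurences" is taken to sum the positions of every matched lexeme, while "nmatches" is taken to count how many query lexemes were found at all. The types LexemeEntry and MatchStats and the function match_stats are hypothetical names invented for this illustration; they are not PostgreSQL structures, and query operators are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one tsvector entry: a distinct lexeme and
 * the number of positions at which it occurs in the document. */
typedef struct
{
    const char *lexeme;
    int         npos;
} LexemeEntry;

/* Both statistics, computed in one pass. */
typedef struct
{
    int noccurrences;   /* total positions of all matched lexemes */
    int nmatches;       /* number of query lexemes found in the vector */
} MatchStats;

/* Assumes the vector lexemes are distinct and the query terms are
 * distinct and already normalized the same way as the vector. */
MatchStats
match_stats(const LexemeEntry *vec, size_t nvec,
            const char *const *terms, size_t nterms)
{
    MatchStats stats = {0, 0};

    assert(vec != NULL || nvec == 0);
    assert(terms != NULL || nterms == 0);

    for (size_t j = 0; j < nterms; j++)
    {
        for (size_t i = 0; i < nvec; i++)
        {
            if (strcmp(vec[i].lexeme, terms[j]) == 0)
            {
                stats.nmatches++;
                stats.noccurrences += vec[i].npos;
                break;          /* lexemes are distinct, stop searching */
            }
        }
    }
    return stats;
}
```

Under this reading, a query lexeme that appears three times in the document adds 3 to noccurrences but only 1 to nmatches.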


Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ts_count
Date: 2011-06-04 23:45:39
Message-ID: 4DEAC3A3.1080309@dunslane.net

On 06/04/2011 04:51 PM, Oleg Bartunov wrote:
> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension, ts_addon.
> For example, these are the ones I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
> Of course, we need to think about better names for the functions, since
> ts_count is a bit ambiguous.
>
>

Getting agreed names was one reason for posting. I don't know why these
need to be an extension. I think they are of sufficiently general
interest (and sufficiently lightweight) that we could just build them in.

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ts_count
Date: 2011-06-05 00:59:51
Message-ID: 1307235537-sup-1699@alvh.no-ip.org

Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:

> A snippet from the regression test:
>
>
> select ts_count(to_tsvector('managing managers manage peons managerially'),
> to_tsquery('managers | peon'));
> ts_count
> ----------
> 4

Err, shouldn't this return 5?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ts_count
Date: 2011-06-05 02:09:25
Message-ID: 4DEAE555.7040203@dunslane.net

On 06/04/2011 08:59 PM, Alvaro Herrera wrote:
> Excerpts from Andrew Dunstan's message of Sat Jun 04 08:47:02 -0400 2011:
>
>> A snippet from the regression test:
>>
>>
>> select ts_count(to_tsvector('managing managers manage peons managerially'),
>> to_tsquery('managers | peon'));
>> ts_count
>> ----------
>> 4
> Err, shouldn't this return 5?

No. 'managerially' doesn't get the same stemming.

cheers

andrew
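
[Editorial sketch, not from the thread] The arithmetic behind the answer of 4 can be worked through in plain C. This is a minimal illustration, not the code of the ts_count module: the type VecEntry, the function count_occurrences, and the example data are hypothetical. The lexemes "manag" and "manageri" are assumed stems chosen for the illustration; the actual stems depend on the text search configuration in use. The only fact taken from the thread is that 'managerially' stems differently from the other three words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one tsvector entry: a distinct lexeme and
 * the number of positions it occupies. Not a PostgreSQL struct. */
typedef struct
{
    const char *lexeme;
    int         npos;
} VecEntry;

/* Sum the position counts of every vector entry whose lexeme is one of
 * the query terms. Terms are assumed distinct and already stemmed. */
int
count_occurrences(const VecEntry *vec, size_t nvec,
                  const char *const *terms, size_t nterms)
{
    int total = 0;

    assert(vec != NULL || nvec == 0);
    assert(terms != NULL || nterms == 0);

    for (size_t i = 0; i < nvec; i++)
    {
        for (size_t j = 0; j < nterms; j++)
        {
            if (strcmp(vec[i].lexeme, terms[j]) == 0)
            {
                total += vec[i].npos;
                break;          /* count each vector entry only once */
            }
        }
    }
    return total;
}

/* 'managing managers manage peons managerially' after stemming
 * (assumed lexemes, see the note above). */
const VecEntry example_vec[] = {
    {"manag", 3},       /* managing, managers, manage */
    {"manageri", 1},    /* managerially: a different lexeme */
    {"peon", 1},        /* peons */
};

/* 'managers | peon' after stemming (assumed lexemes). */
const char *const example_terms[] = {"manag", "peon"};
```

With this data, count_occurrences(example_vec, 3, example_terms, 2) adds 3 for "manag" and 1 for "peon" and skips "manageri", giving 4 rather than 5.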


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ts_count
Date: 2011-11-03 20:44:50
Message-ID: 4EB2FD42.5060704@dunslane.net

On 06/04/2011 04:51 PM, Oleg Bartunov wrote:
> Well, there are several functions available around tsearch2, so I suggest
> that somebody collect all of them and create one extension, ts_addon.
> For example, these are the ones I remember:
> 1. tsvector2array
> 2. noccurences(tsvector, tsquery) - like your ts_count
> 3. nmatches(tsvector, tsquery) - number of matched lexemes in the query
> Of course, we need to think about better names for the functions, since
> ts_count is a bit ambiguous.
>
>
>

Oleg, are you doing this? I'd rather this stuff didn't get dropped on
the floor.

cheers

andrew