Why is there a tsquery data type?

Lists: pgsql-hackers
From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Why is there a tsquery data type?
Date: 2007-08-29 20:22:23
Message-ID: 200708292022.l7TKMNm19553@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-29 22:35:57
Message-ID: 12179.1188426957@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Why does text search need a tsquery data type? I realize it needs
> tsvector so it can create indexes and updated trigger columns, but it
> seems tsquery could instead just be a simple text string.

By that logic, we don't need any data types other than text.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-29 23:10:10
Message-ID: 200708292310.l7TNAAn12960@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Why does text search need a tsquery data type? I realize it needs
> > tsvector so it can create indexes and updated trigger columns, but it
> > seems tsquery could instead just be a simple text string.
>
> By that logic, we don't need any data types other than text.

What is tsquery giving us that text would not?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-29 23:26:09
Message-ID: 12966.1188429969@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> By that logic, we don't need any data types other than text.

> What is tsquery giving us that text would not?

A preprocessed representation that can be compared to tsvector
efficiently.

Now, if you'd asked whether we need *both* tsvector and tsquery,
that'd be a fair question. I'm not 100% clear on what the differences
are, but they seem pretty subtle. Do you think that having only
one datatype would be clearer?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 01:03:47
Message-ID: 200708300103.l7U13lr15346@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Tom Lane wrote:
> >> By that logic, we don't need any data types other than text.
>
> > What is tsquery giving us that text would not?
>
> A preprocessed representation that can be compared to tsvector
> efficiently.
>
> Now, if you'd asked whether we need *both* tsvector and tsquery,
> that'd be a fair question. I'm not 100% clear on what the differences
> are, but they seem pretty subtle. Do you think that having only
> one datatype would be clearer?

There is no question things would be clearer with only one text search
data type. The only value I can see to having a tsquery data type is
that you can store a tsquery value in a column, but why would that be
much better than just storing it in a TEXT field?

Internally I assume you would have to generate a tsquery structure from
a TEXT string, so the idea of a query representation wouldn't go away;
it would just be internal.

The one thing we would lose is the ability to process the query string
with a named configuration. If we always cast to TEXT, I assume we
would always be using "default_text_search_config", and I am a little
worried about queries in triggers that have to wire-down the
configuration name. As I understand it the tsquery goes through the
configuration just like the tsvector.

Right now you can already do:

'query' @@ 'boy girl'::tsvector

and the system casts your text string to tsquery automatically. Perhaps
we just need to minimize tsquery in the documentation and mention its
special purpose.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 06:32:28
Message-ID: 87642xbkhv.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> There is no question things would be clearer with only one text search
> data type. The only value I can see to having a tsquery data type is
> that you can store a tsquery value in a column, but why would that be
> much better than just storing it in a TEXT field?

When you try storing a tsquery in a column does it alert you if you have an
invalid syntax at that point? Storing it as text would mean not finding out
until you try to use the query.

Is converting a text query into the internal format faster or less memory
intensive than converting text into the internal representation? When you run
a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
'...' would have to be parsed over and over again for each row.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 15:40:34
Message-ID: 200708301540.l7UFeYC11905@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark wrote:
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>
> > There is no question things would be clearer with only one text search
> > data type. The only value I can see to having a tsquery data type is
> > that you can store a tsquery value in a column, but why would that be
> > much better than just storing it in a TEXT field?
>
> When you try storing a tsquery in a column does it alert you if you have an
> invalid syntax at that point? Storing it as text would mean not finding out
> until you try to use the query.

Yes it does check syntax:

test=> select 'lkjadsf kjfdsa'::tsquery;
ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

A larger question is how many people store queries in the database to
make it worth the complexity.

> Is converting a text query into the internal format faster or less memory
> intensive than converting text into the internal representation? When you run
> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
> '...' would have to be parsed over and over again for each row.

No, internally the TEXT string would be converted to something the
system could deal with for that query, which is probably what 99% of all
queries are going to do anyway by calling to_tsquery().

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 16:01:48
Message-ID: 877ind57v7.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> Gregory Stark wrote:
>> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>>
>> > There is no question things would be clearer with only one text search
>> > data type. The only value I can see to having a tsquery data type is
>> > that you can store a tsquery value in a column, but why would that be
>> > much better than just storing it in a TEXT field?
>>
>> When you try storing a tsquery in a column does it alert you if you have an
>> invalid syntax at that point? Storing it as text would mean not finding out
>> until you try to use the query.
>
> Yes it does check syntax:
>
> test=> select 'lkjadsf kjfdsa'::tsquery;
> ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"
>
> A larger question is how many people store queries in the database to
> make it worth the complexity.

So would this still happen if you didn't have a tsquery type? Or would it
throw the error only when the actual matching operator executed and tried to
parse the text?

>> Is converting a text query into the internal format faster or less memory
>> intensive than converting text into the internal representation? When you run
>> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
>> '...' would have to be parsed over and over again for each row.
>
> No, internally the TEXT string would be converted to something the
> system could deal with for that query, which is probably what 99% of all
> queries are going to do anyway by calling to_tsquery().

How would that happen if there wasn't a tsquery type?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 16:17:09
Message-ID: 200708301617.l7UGH9M13386@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark wrote:
> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>
> > Gregory Stark wrote:
> >> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
> >>
> >> > There is no question things would be clearer with only one text search
> >> > data type. The only value I can see to having a tsquery data type is
> >> > that you can store a tsquery value in a column, but why would that be
> >> > much better than just storing it in a TEXT field?
> >>
> >> When you try storing a tsquery in a column does it alert you if you have an
> >> invalid syntax at that point? Storing it as text would mean not finding out
> >> until you try to use the query.
> >
> > Yes it does check syntax:
> >
> > test=> select 'lkjadsf kjfdsa'::tsquery;
> > ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"
> >
> > A larger question is how many people store queries in the database to
> > make it worth the complexity.
>
> So would this still happen if you didn't have a tsquery type? Or would it
> throw the error only when the actual matching operator executed and tried to
> parse the text?

Well, if you didn't have a tsquery data type then you would get the
error when the TEXT was cast to tsquery for the search itself.

> >> Is converting a text query into the internal format faster or less memory
> >> intensive than converting text into the internal representation? When you run
> >> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
> >> '...' would have to be parsed over and over again for each row.
> >
> > No, internally the TEXT string would be converted to something the
> > system could deal with for that query, which is probably what 99% of all
> > queries are going to do anyway by calling to_tsquery().
>
> How would that happen if there wasn't a tsquery type?

There is an internal C structure which holds the tsquery information.
My guess is that we would internally have something like tsquery but it
wouldn't be user-visible perhaps. Right now I am a little confused
about how to do this and keep the data-type-independent nature of the
backend. You are right that we might have to end up re-parsing the TEXT
field every time it hits the @@ operator, which is a pain.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Bruce Momjian" <bruce(at)momjian(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 16:18:00
Message-ID: 2265.1188490680@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> How would that happen if there wasn't a tsquery type?

I don't think Bruce is suggesting that the search operand can really
be plain text (or if he is, he's nuts). The question here is whether
there's really a need for a distinction between tsvector and tsquery
datatypes; could we have tsvector serve both purposes instead?

I can see that there are differences: tsquery can tell the difference
between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
A superset datatype that can do both is certainly possible, but whether
it's practical, or would be easier to use than the current design,
I dunno.

Perhaps a suitable analogy is regexp pattern matching. Traditionally
regexps are conceived of as strings, but if they'd originated in more
strongly typed languages than they did, they'd certainly be thought
of as a distinct data type. Had we implemented ~ as taking a right
operand of type 'regexp', we could win on a number of levels: entry-time
syntax checking for regexps, and a precompiled internal representation,
for instance. For regexps it seems clear to me that the target text
string and the pattern really are different datatypes, and fuzzing that
distinction is not an improvement.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 16:21:31
Message-ID: 200708301621.l7UGLVA18356@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > How would that happen if there wasn't a tsquery type?
>
> I don't think Bruce is suggesting that the search operand can really
> be plain text (or if he is, he's nuts). The question here is whether

Yes, that is what I was suggesting because as I mentioned TEXT already
functions fine as tsquery.

> there's really a need for a distinction between tsvector and tsquery
> datatypes; could we have tsvector serve both purposes instead?
>
> I can see that there are differences: tsquery can tell the difference
> between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
> A superset datatype that can do both is certainly possible, but whether
> it's practical, or would be easier to use than the current design,
> I dunno.

Because of the special behavior of & and |, I assume tsquery and
tsvector cannot be the same data type.

> Perhaps a suitable analogy is regexp pattern matching. Traditionally
> regexps are conceived of as strings, but if they'd originated in more
> strongly typed languages than they did, they'd certainly be thought
> of as a distinct data type. Had we implemented ~ as taking a right
> operand of type 'regexp', we could win on a number of levels: entry-time
> syntax checking for regexps, and a precompiled internal representation,
> for instance. For regexps it seems clear to me that the target text
> string and the pattern really are different datatypes, and fuzzing that
> distinction is not an improvement.

Yes, this is a good analogy.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 17:03:00
Message-ID: Pine.LNX.4.64.0708302102080.2767@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 30 Aug 2007, Bruce Momjian wrote:

> Gregory Stark wrote:
>> "Bruce Momjian" <bruce(at)momjian(dot)us> writes:
>>
>>> There is no question things would be clearer with only one text search
>>> data type. The only value I can see to having a tsquery data type is
>>> that you can store a tsquery value in a column, but why would that be
>>> much better than just storing it in a TEXT field?
>>
>> When you try storing a tsquery in a column does it alert you if you have an
>> invalid syntax at that point? Storing it as text would mean not finding out
>> until you try to use the query.
>
> Yes it does check syntax:
>
> test=> select 'lkjadsf kjfdsa'::tsquery;
> ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"
>
> A larger question is how many people store queries in the database to
> make it worth the complexity.

you forget about very powerfull query rewriting, which is table driven

>
>> Is converting a text query into the internal format faster or less memory
>> intensive than converting text into the internal representation? When you run
>> a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
>> '...' would have to be parsed over and over again for each row.
>
> No, internally the TEXT string would be converted to something the
> system could deal with for that query, which is probably what 99% of all
> queries are going to do anyway by calling to_tsquery().
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 17:26:45
Message-ID: 46D6FDD5.7080703@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Why does text search need a tsquery data type? I realize it needs
>> tsvector so it can create indexes and updated trigger columns, but it
>> seems tsquery could instead just be a simple text string.
>
> By that logic, we don't need any data types other than text.
>

Could similar logic argue that we'd want special types for regular
expressions too? That seems quite parallel to the tsquery type to me.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Why is there a tsquery data type?
Date: 2007-08-30 18:10:32
Message-ID: 3786.1188497432@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> Could similar logic argue that we'd want special types for regular
> expressions too? That seems quite parallel to the tsquery type to me.

Yeah, it certainly seems like something we might want to consider in
future --- we could get rid of that klugy cache for compiled regexps,
for one thing. An implicit cast from text to the regexp type would
cover backwards compatibility issues, I think.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why is there a tsquery data type?
Date: 2008-03-12 19:41:31
Message-ID: 200803121941.m2CJfVF06131@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to TODO:

* Consider a special data type for regular expressions

http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php

---------------------------------------------------------------------------

Tom Lane wrote:
> Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> > Could similar logic argue that we'd want special types for regular
> > expressions too? That seems quite parallel to the tsquery type to me.
>
> Yeah, it certainly seems like something we might want to consider in
> future --- we could get rid of that klugy cache for compiled regexps,
> for one thing. An implicit cast from text to the regexp type would
> cover backwards compatibility issues, I think.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +