Re: tsvector/tsearch equality and/or portability issue

Lists: pgsql-hackers
From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: tsvector/tsearch equality and/or portability issue issue ?
Date: 2006-08-24 16:34:58
Message-ID: 44EDD532.5040104@kaltenbrunner.cc

We just had a complaint on IRC that:

devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
?column?
----------
f
(1 row)

and that searches for certain values would not return all matches under
some circumstances.

A little bit of testing shows the following:

postgres=# create table foo (bla tsvector);
CREATE TABLE
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# insert into foo values ('bla bla');
INSERT 0 1
postgres=# select bla from foo group by bla;
bla
-------
'bla'
(1 row)

postgres=# create index foo_idx on foo(bla);
CREATE INDEX
postgres=# set enable_seqscan to off;
SET
postgres=# select bla from foo group by bla;
bla
-------
'bla'
'bla'
(2 rows)

postgres=# set enable_seqscan to on;
SET
postgres=# select bla from foo group by bla;
bla
-------
'bla'
(1 row)

ouch :-(

I can reproduce that at least on OpenBSD/i386 and Debian Etch/x86_64.

It is also noteworthy that the existing regression tests for tsearch2 do
not seem to do any equality testing ...

Stefan


From: "Andrew J(dot) Kopciuch" <akopciuch(at)bddf(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue issue ?
Date: 2006-08-24 16:58:48
Message-ID: 200608241058.48891.akopciuch@bddf.ca

On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
> We just had a complaint on IRC that:
>
> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
> ?column?
> ----------
> f
> (1 row)
>

This could be an endianness issue?

This was probably the same person who posted this on the OpenFTS list.

He's compiled from source :

<snip>
dew=# select version();
PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build
5250)
</snip>

I don't have any access to an OSX box to verify things ATM. I am trying to
get access to one though. :S Can someone else verify this right now?

Andy


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 17:40:13
Message-ID: 44EDE47D.8070805@sigaev.ru

> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
> ?column?
> ----------
> f
> (1 row)

Fixed in 8.1 and HEAD. Thank you

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: AgentM <agentm(at)themactionfaction(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue issue ?
Date: 2006-08-24 17:48:52
Message-ID: C083305F-ED80-4A2A-B6A4-C32096D72994@themactionfaction.com


On Aug 24, 2006, at 12:58 , Andrew J. Kopciuch wrote:

> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> We just had a complaint on IRC that:
>>
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>> ?column?
>> ----------
>> f
>> (1 row)
>>
>
>
> This could be an endianness issue?
>
> This was probably the same person who posted this on the OpenFTS list.
>
> He's compiled from source :
>
> <snip>
> dew=# select version();
> PostgreSQL 8.1.4 on powerpc-apple-darwin8.6.0, compiled by GCC
> powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
> build
> 5250)
> </snip>
>
> I don't have any access to an OSX box to verify things ATM. I am
> trying to
> get access to one though. :S Can someone else verify this right
> now?

Stefan said he reproduced it on OpenBSD/i386, so it is unlikely to be an
endianness issue. Anyway, here's the comparison code. I guess it
doesn't use strcmp to avoid encoding silliness. (?)

static int
silly_cmp_tsvector(const tsvector * a, const tsvector * b)
{
    if (a->len < b->len)
        return -1;
    else if (a->len > b->len)
        return 1;
    else if (a->size < b->size)
        return -1;
    else if (a->size > b->size)
        return 1;
    else
    {
        unsigned char *aptr = (unsigned char *) (a->data) + DATAHDRSIZE;
        unsigned char *bptr = (unsigned char *) (b->data) + DATAHDRSIZE;

        while (aptr - ((unsigned char *) (a->data)) < a->len)
        {
            if (*aptr != *bptr)
                return (*aptr < *bptr) ? -1 : 1;
            aptr++;
            bptr++;
        }
    }
    return 0;
}


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: akopciuch(at)bddf(dot)ca
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue issue ?
Date: 2006-08-24 17:50:05
Message-ID: 7579.1156441805@sss.pgh.pa.us

"Andrew J. Kopciuch" <akopciuch(at)bddf(dot)ca> writes:
> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>> ?column?
>> ----------
>> f
>> (1 row)

> This could be an endianness issue?

Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC
(ditto). I'm testing CVS HEAD though, not 8.1 branch.

However ... I also see that tsearch2's regression test is dumping
core on my OS X machine. I haven't cvs update'd for awhile on this
machine though --- will bring it to HEAD and report back.

Can some other people try this? We need to get a handle on which
machines show the problem.

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 17:59:48
Message-ID: 44EDE914.9090901@kaltenbrunner.cc

Teodor Sigaev wrote:
>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>> ?column?
>> ----------
>> f
>> (1 row)
>
> Fixed in 8.1 and HEAD. Thank you

Thanks for the fast response. Would it be worthwhile to add
regression tests for this kind of thing, though?

Stefan


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: AgentM <agentm(at)themactionfaction(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:20:55
Message-ID: 44EDEE07.8090907@sigaev.ru

> Stefan said he reproduced on OpenBSD/i386 so it is unlikely to be an
> endianness issue. Anyway, here's the comparison code- I guess it doesn't
> use strcmp to avoid encoding silliness. (?)

I suppose the ordering for the tsvector type is somewhat strange, and it doesn't
really matter. For me, it's a mystery why it's needed :)
The cause of the bug was that some internal parts of a tsvector must be
short-aligned, so there were unused padding bytes. The previous comparison
function compared those bytes too...

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
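
To illustrate the padding problem described in the message above, here is a
minimal C sketch. It is not taken from the tsearch2 source: the layout,
ENTRY_MAX, SHORTALIGN_UP, pack_entry, cmp_raw and cmp_fields are all
hypothetical names chosen for illustration. It only shows why a byte-by-byte
comparison can report a difference when the meaningful fields are identical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout (NOT the real tsvector format):
 * lexeme bytes, then 0 or 1 pad byte so that the following
 * uint16_t counter starts on a 2-byte boundary. */
#define ENTRY_MAX 32
#define SHORTALIGN_UP(n) (((n) + (size_t) 1) & ~(size_t) 1)

/* Fill buf with `junk` first to simulate stale memory, then store the
 * fields. Returns the number of bytes occupied by the entry. */
static size_t
pack_entry(unsigned char *buf, unsigned char junk,
           const char *lexeme, uint16_t npos)
{
    size_t len = strlen(lexeme);
    size_t off = SHORTALIGN_UP(len);

    assert(off + sizeof(npos) <= ENTRY_MAX);
    memset(buf, junk, ENTRY_MAX);
    memcpy(buf, lexeme, len);
    memcpy(buf + off, &npos, sizeof(npos));
    return off + sizeof(npos);
}

/* Naive comparison: every byte counts, including the pad byte. */
static int
cmp_raw(const unsigned char *a, const unsigned char *b, size_t size)
{
    return memcmp(a, b, size);
}

/* Padding-aware comparison: lexeme bytes first, then the counter value. */
static int
cmp_fields(const unsigned char *a, const unsigned char *b, size_t lexlen)
{
    uint16_t na, nb;
    int res = memcmp(a, b, lexlen);

    if (res != 0)
        return res;
    memcpy(&na, a + SHORTALIGN_UP(lexlen), sizeof(na));
    memcpy(&nb, b + SHORTALIGN_UP(lexlen), sizeof(nb));
    return (na > nb) - (na < nb);
}
```

Packing "bla" twice with different junk bytes gives two buffers that cmp_raw
treats as different (the pad byte after the 3-byte lexeme differs), while
cmp_fields treats them as equal.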


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: akopciuch(at)bddf(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:27:05
Message-ID: 44EDEF79.8050001@commandprompt.com

Tom Lane wrote:
> "Andrew J. Kopciuch" <akopciuch(at)bddf(dot)ca> writes:
>> On Thursday 24 August 2006 10:34, Stefan Kaltenbrunner wrote:
>>> devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
>>> ?column?
>>> ----------
>>> f
>>> (1 row)
>
>> This could be an endianness issue?
>
> Apparently not, it works for me on HPPA (big endian) and on Darwin/PPC
> (ditto). I'm testing CVS HEAD though, not 8.1 branch.
>
> However ... I also see that tsearch2's regression test is dumping
> core on my OS X machine. I haven't cvs update'd for awhile on this
> machine though --- will bring it to HEAD and report back.
>
> Can some other people try this? We need to get a handle on which
> machines show the problem.

I am trying on a current copy of HEAD. However:

jd(at)scratch:~/pgsqldev$ bin/psql -U postgres postgres <
share/contrib/tsearch2.sql
SET
BEGIN
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"pg_ts_dict_pkey" for table "pg_ts_dict"
CREATE TABLE
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434167 1
CREATE FUNCTION
CREATE FUNCTION
INSERT 57434170 1
ERROR: could not find function "snb_ru_init_koi8" in file
"/usr/local/pgsql/lib/tsearch2.so"
ERROR: current transaction is aborted, commands ignored until end of
transaction block
ERROR: current transaction is aborted, commands ignored until end of
transaction block

I will try on 8.1 in a moment.

Joshua D. Drake


--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: akopciuch(at)bddf(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:28:29
Message-ID: 44EDEFCD.50804@commandprompt.com


> Can some other people try this? We need to get a handle on which
> machines show the problem.

jd(at)scratch:~/pgsqldev$ /usr/local/pgsql/bin/psql -U postgres postgres
Welcome to psql 8.1.3, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

postgres=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
?column?
----------
t
(1 row)

postgres=#

AMD 64 X2, Ubuntu Dapper LTS.

Sincerely,

Joshua D. Drake



From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, akopciuch(at)bddf(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:30:56
Message-ID: 44EDF060.6060205@commandprompt.com


>> Can some other people try this? We need to get a handle on which
>> machines show the problem.
>
> I am trying on current copy of HEAD.. however:

Ignore the below... This is an error with my linker/ld.so.conf

Joshua D. Drake

>
> jd(at)scratch:~/pgsqldev$ bin/psql -U postgres postgres <
> share/contrib/tsearch2.sql
> SET
> BEGIN
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> "pg_ts_dict_pkey" for table "pg_ts_dict"
> CREATE TABLE
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434167 1
> CREATE FUNCTION
> CREATE FUNCTION
> INSERT 57434170 1
> ERROR: could not find function "snb_ru_init_koi8" in file
> "/usr/local/pgsql/lib/tsearch2.so"
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block
> ERROR: current transaction is aborted, commands ignored until end of
> transaction block
>
> I will try on 8.1 in a moment.
>
> Joshua D. Drake
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:34:44
Message-ID: 8212.1156444484@sss.pgh.pa.us

Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
> Fixed in 8.1 and HEAD. Thank you

This appears to have created a regression test failure:

*** ./expected/tsearch2.out Sun Jun 18 12:55:28 2006
--- ./results/tsearch2.out Thu Aug 24 14:30:02 2006
***************
*** 2496,2503 ****
f |
f | '345':1 'qwerti':2 'copyright':3
f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
- f | 'a':1A,2,3C 'b':5A,6B,7C,8B
f | 'a':1A,2,3B 'b':5A,6A,7C,8
f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
--- 2496,2503 ----
f |
f | '345':1 'qwerti':2 'copyright':3
f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
f | 'a':1A,2,3B 'b':5A,6A,7C,8
+ f | 'a':1A,2,3C 'b':5A,6B,7C,8B
f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'

======================================================================

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: akopciuch(at)bddf(dot)ca, pgsql-hackers(at)postgresql(dot)org
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-24 18:41:44
Message-ID: 8398.1156444904@sss.pgh.pa.us

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>>> Can some other people try this? We need to get a handle on which
>>> machines show the problem.
>>
>> I am trying on current copy of HEAD.. however:

Looks like Teodor already solved the problem, so no need for a fire
drill anymore.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-25 09:13:13
Message-ID: 44EEBF29.5020805@sigaev.ru

Oops. Fixed.

Tom Lane wrote:
> Teodor Sigaev <teodor(at)sigaev(dot)ru> writes:
>> Fixed in 8.1 and HEAD. Thank you
>
> This appears to have created a regression test failure:
>
> *** ./expected/tsearch2.out Sun Jun 18 12:55:28 2006
> --- ./results/tsearch2.out Thu Aug 24 14:30:02 2006
> ***************
> *** 2496,2503 ****
> f |
> f | '345':1 'qwerti':2 'copyright':3
> f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
> - f | 'a':1A,2,3C 'b':5A,6B,7C,8B
> f | 'a':1A,2,3B 'b':5A,6A,7C,8
> f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
> f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
> f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
> --- 2496,2503 ----
> f |
> f | '345':1 'qwerti':2 'copyright':3
> f | 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
> f | 'a':1A,2,3B 'b':5A,6A,7C,8
> + f | 'a':1A,2,3C 'b':5A,6B,7C,8B
> f | '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'
> f | 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'
> f | 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'
>
> ======================================================================
>
>
> regards, tom lane
>

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Phil Frost <indigo(at)bitglue(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-28 13:32:26
Message-ID: 20060828133226.GA9938@unununium.org

On Thu, Aug 24, 2006 at 09:40:13PM +0400, Teodor Sigaev wrote:
> >devel=# select 'blah foo bar'::tsvector = 'blah foo bar'::tsvector;
> > ?column?
> >----------
> > f
> >(1 row)
>
> Fixed in 8.1 and HEAD. Thank you

Things still seem to be broken for me. Among other things, the script at
<http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two
tests, comparing 1000 random vectors with positions and random weights, and
comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
pass, which is not consistent with what I am seeing in my database. However,
I'm yet unable to reproduce those problems.

It's worth noting that in running this script I have seen the number of
failures change, which seems to indicate that some uninitialized memory
is still being compared.

test=# \i testvectors.sql
BEGIN
CREATE FUNCTION
CREATE TABLE
total vectors in test set
---------------------------
1000
(1 row)

failing unstripped equality
-----------------------------
0
(1 row)

failing stripped equality
---------------------------
389
(1 row)

ROLLBACK
test=#
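
A failure count that changes from run to run is what one would expect if
unused bytes in a buffer hold whatever was in memory before. One general way
to make such bytes deterministic is to zero the whole allocation before
filling it in. The sketch below assumes a hypothetical entry layout and a
hypothetical helper make_entry_zeroed; it is not the tsearch2 code and not
necessarily what was committed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SHORTALIGN_UP(n) (((n) + (size_t) 1) & ~(size_t) 1)

/* Hypothetical entry: lexeme bytes, pad to 2-byte alignment, uint16_t
 * counter. Returns a calloc'ed buffer (caller frees) and stores its size
 * in *size. Returns NULL if the allocation fails. */
static unsigned char *
make_entry_zeroed(const char *lexeme, uint16_t npos, size_t *size)
{
    size_t len = strlen(lexeme);
    size_t off = SHORTALIGN_UP(len);
    unsigned char *buf;

    assert(size != NULL);
    *size = off + sizeof(npos);
    buf = calloc(1, *size);     /* pad bytes are guaranteed to be zero */
    if (buf == NULL)
        return NULL;
    memcpy(buf, lexeme, len);
    memcpy(buf + off, &npos, sizeof(npos));
    return buf;
}
```

With every byte initialized, two entries built from the same lexeme and
counter are byte-for-byte identical, so even a plain memcmp over the whole
buffer agrees that they are equal.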


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Phil Frost <indigo(at)bitglue(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-28 21:57:48
Message-ID: 18800.1156802268@sss.pgh.pa.us

Phil Frost <indigo(at)bitglue(dot)com> writes:
> Things still seem to be broken for me. Among other things, the script at
> <http://unununium.org/~indigo/testvectors.sql.bz2> fails. It performs two
> tests, comparing 1000 random vectors with positions and random weights, and
> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
> pass, which is not consistent with what I am seeing in my database. However,
> I'm yet unable to reproduce those problems.

It looks to me like tsvector comparison may be too strong. The strip()
function evidently thinks that it's OK to rearrange the string chunks
into the same order as the WordEntry items, which suggests to me that
the "pos" fields are not really semantically significant. But
silly_cmp_tsvector() considers that a difference in pos values is
important. I don't understand the data structure well enough to know
which one to believe, but something's not consistent here.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Phil Frost <indigo(at)bitglue(dot)com>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: tsvector/tsearch equality and/or portability issue
Date: 2006-08-29 13:59:32
Message-ID: 44F44844.6040801@sigaev.ru

>> comparing the same vectors, but stripped. Oddly, the unstripped comparisons all
>> pass, which is not consistent with what I am seeing in my database. However,
>> I'm yet unable to reproduce those problems.

Fixed: strncmp was called with the wrong length parameter.

>
> It looks to me like tsvector comparison may be too strong. The strip()
> function evidently thinks that it's OK to rearrange the string chunks
> into the same order as the WordEntry items, which suggests to me that
> the "pos" fields are not really semantically significant. But
> silly_cmp_tsvector() considers that a difference in pos values is
> important. I don't understand the data structure well enough to know
> which one to believe, but something's not consistent here.

You are right: pos really means the position of the lexeme itself in the tail
of the tsvector structure, so it has been removed from the comparison.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
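
The two fixes discussed in the last messages can be sketched in C as follows.
This is only an illustration under stated assumptions: Entry, Vec, cmp_lexeme
and cmp_vec are hypothetical and much simpler than the real tsvector
structures. The thread does not show which length strncmp should have been
given; comparing over the shorter length and then breaking ties on length is
one common pattern for strings that are not NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified structures (not the real tsvector layout). */
typedef struct
{
    uint16_t len;           /* lexeme length in bytes */
    uint16_t pos;           /* offset of the lexeme inside pool */
} Entry;

typedef struct
{
    size_t n;               /* number of entries */
    const Entry *entries;   /* entries, assumed sorted by lexeme */
    const char *pool;       /* lexeme bytes, not NUL-terminated */
} Vec;

/* Compare two length-delimited lexemes: shared prefix first, then length. */
static int
cmp_lexeme(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t minlen = (alen < blen) ? alen : blen;
    int res = strncmp(a, b, minlen);

    if (res != 0)
        return res;
    return (alen > blen) - (alen < blen);
}

/* Compare by lexeme content only; where a lexeme sits in the pool (pos)
 * is used to find it but never compared. */
static int
cmp_vec(const Vec *a, const Vec *b)
{
    size_t i;

    assert(a != NULL && b != NULL);
    if (a->n != b->n)
        return (a->n < b->n) ? -1 : 1;
    for (i = 0; i < a->n; i++)
    {
        int res = cmp_lexeme(a->pool + a->entries[i].pos, a->entries[i].len,
                             b->pool + b->entries[i].pos, b->entries[i].len);

        if (res != 0)
            return res;
    }
    return 0;
}
```

Two vectors holding 'bla' and 'foo' compare equal here even if one stores its
lexeme bytes in a different order in the pool, which matches the behaviour
expected after strip() rearranges the string chunks.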