Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

Lists: pgsql-hackers
From: Andreas Seltenreich <seltenreich(at)gmx(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-03 17:29:13
Message-ID: 87invhoj6e.fsf@credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

the following statement triggers an assertion in tsearch:

select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
-- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Seltenreich <seltenreich(at)gmx(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-03 21:39:36
Message-ID: 20000.1470260376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Seltenreich <seltenreich(at)gmx(dot)de> writes:
> the following statement triggers an assertion in tsearch:

> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)

Confirmed here. I notice that the output of array_to_tsvector() is
already fishy in this example:

regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
array_to_tsvector
-------------------------
'smith' 'smith' 'smith'
(1 row)

Shouldn't those have been merged together? You certainly don't get
results like that from other tsvector-producing operations:

regression=# select to_tsvector('smith smith smith');
to_tsvector
---------------
'smith':1,2,3
(1 row)
regression=# select 'smith smith smith'::tsvector;
tsvector
----------
'smith'
(1 row)

However, that does not seem to be the proximate cause of the crash
in ts_delete, because this non-duplicated case still crashes:

select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]);

It kinda looks like you need more than one deletion request for
the first entry in the sorted tsvector, because for example
{smith,foo,toolbox} works but not {smith,too,toolbox}.

I'm thinking there are two distinct bugs here.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Seltenreich <seltenreich(at)gmx(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-03 21:52:44
Message-ID: 20542.1470261164@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I'm thinking there are two distinct bugs here.

Actually, make that three bugs. I was so focused on the crashing
that I failed to notice that ts_delete wasn't producing sane answers
even when it didn't crash:

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,smith}'::text[]);
ts_delete
---------------------
'smith' 'foo' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{smith,foo}'::text[]);
ts_delete
---------------
'smith' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), '{bar,smith}'::text[]);
ts_delete
---------------------
'smith' 'foo' 'bar'
(1 row)

The non-array version is no better:

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'smith'::text);
ts_delete
---------------------
'smith' 'foo' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'foo'::text);
ts_delete
---------------
'smith' 'bar'
(1 row)

regression=# select ts_delete(array_to_tsvector('{smith,foo,bar}'::text[]), 'bar'::text);
ts_delete
---------------------
'smith' 'foo' 'bar'
(1 row)

I'm not sure if ts_delete takes its second argument as verbatim lexemes
or normalizes them first, but none of these words are changed by
to_tsvector, so either way it seems to fail to delete stuff it should.

regards, tom lane


From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-03 21:53:16
Message-ID: CAEepm=0Mcmv34u4NBThj1XDFAeohgxsOzFPn4nsQFVaPMEX7MA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 4, 2016 at 9:39 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andreas Seltenreich <seltenreich(at)gmx(dot)de> writes:
>> the following statement triggers an assertion in tsearch:
>
>> select ts_delete(array_to_tsvector('{smith,smith,smith}'::text[]), '{smith,smith}'::text[]);
>> -- TRAP: FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
>
> Confirmed here. I notice that the output of array_to_tsvector() is
> already fishy in this example:
>
> regression=# select array_to_tsvector('{smith,smith,smith}'::text[]);
> array_to_tsvector
> -------------------------
> 'smith' 'smith' 'smith'
> (1 row)
>
> Shouldn't those have been merged together? You certainly don't get
> results like that from other tsvector-producing operations:
>
> regression=# select to_tsvector('smith smith smith');
> to_tsvector
> ---------------
> 'smith':1,2,3
> (1 row)
> regression=# select 'smith smith smith'::tsvector;
> tsvector
> ----------
> 'smith'
> (1 row)
>
> However, that does not seem to be the proximate cause of the crash
> in ts_delete, because this non-duplicated case still crashes:
>
> select ts_delete(array_to_tsvector('{smith,smithx,smithy}'::text[]), '{smith,smith}'::text[]);
>
> It kinda looks like you need more than one deletion request for
> the first entry in the sorted tsvector, because for example
> {smith,foo,toolbox} works but not {smith,too,toolbox}.
>
> I'm thinking there are two distinct bugs here.

The assertion in tsvector_delete_by_indices fails because its counting
algorithm doesn't expect indices_to_delete to contain multiple
references to the same index. Maybe that could be fixed by
uniquifying in tsvector_delete_arr before calling it, but since
tsvector_delete_by_indices already qsorts its input, it should be able
to handle duplicates cheaply. I was thinking something like this:

for (i = j = k = 0; i < tsv->size; i++)
{
+ bool drop_lexeme = false;
+
/*
* Here we should check whether current i is present in
* indices_to_delete or not. Since indices_to_delete
is already sorted
- * we can advance it index only when we have match.
+ * we can advance it index only when we have match. We do this
+ * repeatedly, in case indices_to_delete contains
duplicate references
+ * to the same index.
*/
- if (k < indices_count && i == indices_to_delete[k])
+ while (k < indices_count && i == indices_to_delete[k])
{
+ drop_lexeme = true;
k++;
- continue;
}
+ if (drop_lexeme)
+ continue;

But that doesn't seem to be enough, there is something else wrong here
resulting in garbage output, maybe related to the failure to merge the
tsvector...

--
Thomas Munro
http://www.enterprisedb.com


From: Noah Misch <noah(at)leadboat(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, pgsql-hackers(at)postgresql(dot)org, Oleg Bartunov <obartunov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-04 04:14:20
Message-ID: 20160804041420.GA2402899@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote:
> I wrote:
> > I'm thinking there are two distinct bugs here.
>
> Actually, make that three bugs. I was so focused on the crashing
> that I failed to notice that ts_delete wasn't producing sane answers
> even when it didn't crash:

[Action required within 72 hours. This is a generic notification.]

The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
since you committed the patch believed to have created it, you own this open
item. If some other commit is more relevant or if this does not belong as a
9.6 open item, please let us know. Otherwise, please observe the policy on
open item ownership[1] and send a status update within 72 hours of this
message. Include a date for your subsequent status update. Testers may
discover new open items at any time, and I want to plan to get them all fixed
in advance of shipping 9.6rc1 next week. Consequently, I will appreciate your
efforts toward speedy resolution. Thanks.

[1] http://www.postgresql.org/message-id/20160527025039.GA447393@tornado.leadboat.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-05 16:10:21
Message-ID: CA+TgmoaT310Lsu9fTOxM+NthT0MWsjYeZPesuwobXrOWn8AjRw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 4, 2016 at 12:14 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Wed, Aug 03, 2016 at 05:52:44PM -0400, Tom Lane wrote:
>> I wrote:
>> > I'm thinking there are two distinct bugs here.
>>
>> Actually, make that three bugs. I was so focused on the crashing
>> that I failed to notice that ts_delete wasn't producing sane answers
>> even when it didn't crash:
>
> [Action required within 72 hours. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 9.6 open item. Teodor,
> since you committed the patch believed to have created it, you own this open
> item. If some other commit is more relevant or if this does not belong as a
> 9.6 open item, please let us know. Otherwise, please observe the policy on
> open item ownership[1] and send a status update within 72 hours of this
> message. Include a date for your subsequent status update. Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> in advance of shipping 9.6rc1 next week. Consequently, I will appreciate your
> efforts toward speedy resolution. Thanks.

Action within 72 hours now seems inadequate; we are scheduled to wrap
rc1 on Monday. We need someone to either fix these bugs very very
soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
to ship rc1 with these known defects, or postpone the planned release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Andreas Seltenreich <seltenreich(at)gmx(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-05 17:40:04
Message-ID: 8766.1470418804@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Action within 72 hours now seems inadequate; we are scheduled to wrap
> rc1 on Monday. We need someone to either fix these bugs very very
> soon, or decide to ship beta4 instead of rc1 (uggh), or decide it's OK
> to ship rc1 with these known defects, or postpone the planned release.

Given the time of year, I'd not be surprised if Oleg and Teodor are on
vacation. In view of the time pressure, I'll take a whack at fixing this.
I think that Thomas Munro's suggestion is good as far as fixing the Assert
failure is concerned. I do not know where the other problems are, but
maybe I can find them ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andreas Seltenreich <seltenreich(at)gmx(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [sqlsmith] FailedAssertion("!(k == indices_count)", File: "tsvector_op.c", Line: 511)
Date: 2016-08-05 18:40:58
Message-ID: 11562.1470422458@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> The assertion in tsvector_delete_by_indices fails because its counting
> algorithm doesn't expect indices_to_delete to contain multiple
> references to the same index. Maybe that could be fixed by
> uniquifying in tsvector_delete_arr before calling it, but since
> tsvector_delete_by_indices already qsorts its input, it should be able
> to handle duplicates cheaply.

I poked at this and realized that that's not sufficient. If there are
duplicates in indices_to_delete, then the initial estimate

tsout->size = tsv->size - indices_count;

is wrong because indices_count is an overestimate of how many lexemes
will be removed. And because the calculation "dataout = STRPTR(tsout)"
depends on tsout->size, we can't just wait till later to get it right.

We could possibly initialize tsout->size = tsv->size (the maximum
possible value), thereby ensuring that the WordEntry array doesn't
overlap the dataout area; compute the correct tsout->size in the loop;
and then memmove the data area into place to collapse out wasted space.
But I think it might be simpler and better-performant just to de-dup the
indices_to_delete array after qsort'ing it; that would certainly win
for the case of indices_count == 1.

The other problems I noted with failure to delete items seem to stem
from the fact that tsvector_delete_arr relies on tsvector_bsearch to
find items, but the input tsvector is not sorted (never mind de'duped)
by array_to_tsvector. This seems like simple brain fade in
array_to_tsvector, as AFAICS that's a required property of tsvectors.

regards, tom lane