Nasty tsvector can make dumps unrestorable

Lists: pgsql-bugspgsql-hackers
From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Nasty tsvector can make dumps unrestorable
Date: 2007-09-28 17:04:38
Message-ID: 46FD3426.9090300@stuartbishop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

To continue our streak of bad luck, here is the second tsearch2 bug we found
this week.

The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.

This script only tested with PG 8.2.5 under Ubuntu Feisty so far, although
we found the original problem under 8.2.4 on Ubuntu Dapper.

Also reported in the Ubuntu bug tracker at:

https://bugs.launchpad.net/ubuntu/+source/postgresql-8.2/+bug/146382

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/

Attachment Content-Type Size
unrestorabledb.sql text/x-sql 1.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stuart Bishop <stuart(at)stuartbishop(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: Nasty tsvector can make dumps unrestorable
Date: 2007-09-29 18:45:40
Message-ID: 8464.1191091540@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Stuart Bishop <stuart(at)stuartbishop(dot)net> writes:
> The attached script creates a tsvector with a value that can be dumped using
> pg_dump, but not loaded again using pg_restore. This causes restores of a
> dump containing this value to fail.

Hmm, sorta looks like tsvectorout should be doubling backslashes?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] Nasty tsvector can make dumps unrestorable
Date: 2007-11-10 03:19:44
Message-ID: 200711100319.lAA3Ji219846@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Stuart Bishop <stuart(at)stuartbishop(dot)net> writes:
> > The attached script creates a tsvector with a value that can be dumped using
> > pg_dump, but not loaded again using pg_restore. This causes restores of a
> > dump containing this value to fail.
>
> Hmm, sorta looks like tsvectorout should be doubling backslashes?

I think the larger question is why tsvectorin() requires
double-backslashes? It seems it is for marking of single-quotes in
phrases, from what I can tell from the code and regression test usage:

SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
'3' '1 ''2'
(1 row)

My guess is that the '' is used to start/stop phrases, and \\'' puts a
literal '' in the phrase.

I have developed the attached patch which doubles backslashes on output:

test=> INSERT INTO Foo(bar) VALUES (E'\\\\x');
INSERT 0 1
test=> select * from foo;
bar
-------
'\\x'
(1 row)

However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:

test=> CREATE TABLE Foo(bar tsvector);
CREATE
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)

and pg_dump outputs:

COPY foo (bar) FROM stdin;
'''x'
\.

While the COPY will load into the table, this doesn't:

test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"

I am confused.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

Attachment Content-Type Size
/pgpatches/ts_backslash text/x-diff 567 bytes

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] Nasty tsvector can make dumps unrestorable
Date: 2007-11-10 03:53:18
Message-ID: 47352B2E.7020507@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian wrote:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
>
> test=> CREATE TABLE Foo(bar tsvector);
> CREATE
> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
> INSERT 0 1
> test=> select * from foo;
> bar
> -------
> '''x'
> (1 row)
>
> and pg_dump outputs:
>
> COPY foo (bar) FROM stdin;
> '''x'
> \.
>
>
> While the COPY will load into the table, this doesn't:
>
> test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> ERROR: syntax error in tsvector: "''x"
>
> I am confused.
>
>

These two are not equivalent. What happens if you try this?

INSERT INTO Foo(bar) VALUES (E'''''''x''');

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [BUGS] Nasty tsvector can make dumps unrestorable
Date: 2007-11-10 03:55:15
Message-ID: 200711100355.lAA3tF416292@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andrew Dunstan wrote:
> > While the COPY will load into the table, this doesn't:
> >
> > test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> > ERROR: syntax error in tsvector: "''x"
> >
> > I am confused.
> >
> >
>
>
> These two are not equivalent. What happens if you try this?
>
> INSERT INTO Foo(bar) VALUES (E'''''''x''');

test=> INSERT INTO Foo(bar) VALUES (E'''''''x''');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] Nasty tsvector can make dumps unrestorable
Date: 2007-11-10 16:41:18
Message-ID: 16207.1194712878@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:

> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');

You're just confused. That produces a word whose contents are the
two characters 'x, so either '\'x' or '''x' would be legitimate
output.

However, I'd prefer to see Teodor fix this, because it needs to be
back-patched too, and I'm not entirely sure if there are other
consequences.

regards, tom lane