Lists: | pgsql-bugspgsql-hackers |
---|
From: | Stuart Bishop <stuart(at)stuartbishop(dot)net> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Nasty tsvector can make dumps unrestorable |
Date: | 2007-09-28 17:04:38 |
Message-ID: | 46FD3426.9090300@stuartbishop.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
To continue our streak of bad luck, here is the second tsearch2 bug we found
this week.
The attached script creates a tsvector with a value that can be dumped using
pg_dump, but not loaded again using pg_restore. This causes restores of a
dump containing this value to fail.
This script only tested with PG 8.2.5 under Ubuntu Feisty so far, although
we found the original problem under 8.2.4 on Ubuntu Dapper.
Also reported in the Ubuntu bug tracker at:
https://bugs.launchpad.net/ubuntu/+source/postgresql-8.2/+bug/146382
--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/
Attachment | Content-Type | Size |
---|---|---|
unrestorabledb.sql | text/x-sql | 1.5 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Stuart Bishop <stuart(at)stuartbishop(dot)net> |
Cc: | pgsql-bugs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Subject: | Re: Nasty tsvector can make dumps unrestorable |
Date: | 2007-09-29 18:45:40 |
Message-ID: | 8464.1191091540@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
Stuart Bishop <stuart(at)stuartbishop(dot)net> writes:
> The attached script creates a tsvector with a value that can be dumped using
> pg_dump, but not loaded again using pg_restore. This causes restores of a
> dump containing this value to fail.
Hmm, sorta looks like tsvectorout should be doubling backslashes?
regards, tom lane
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: [BUGS] Nasty tsvector can make dumps unrestorable |
Date: | 2007-11-10 03:19:44 |
Message-ID: | 200711100319.lAA3Ji219846@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
Tom Lane wrote:
> Stuart Bishop <stuart(at)stuartbishop(dot)net> writes:
> > The attached script creates a tsvector with a value that can be dumped using
> > pg_dump, but not loaded again using pg_restore. This causes restores of a
> > dump containing this value to fail.
>
> Hmm, sorta looks like tsvectorout should be doubling backslashes?
I think the larger question is why tsvectorin() requires
double-backslashes? It seems it is for marking of single-quotes in
phrases, from what I can tell from the code and regression test usage:
SELECT E'''1 \\''2'' 3'::tsvector;
tsvector
-------------
'3' '1 ''2'
(1 row)
My guess is that the '' is used to start/stop phrases, and \\'' puts a
literal '' in the phrase.
I have developed the attached patch which doubles backslashes on output:
test=> INSERT INTO Foo(bar) VALUES (E'\\\\x');
INSERT 0 1
test=> select * from foo;
bar
-------
'\\x'
(1 row)
However, I am still unclear if the dump code is correct because I don't
see the backslash preserved in \\'' cases, just \\\\ cases:
test=> CREATE TABLE Foo(bar tsvector);
CREATE
test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)
and pg_dump outputs:
COPY foo (bar) FROM stdin;
'''x'
\.
While the COPY will load into the table, this doesn't:
test=> INSERT INTO Foo(bar) VALUES (E'''''x');
ERROR: syntax error in tsvector: "''x"
I am confused.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
Attachment | Content-Type | Size |
---|---|---|
/pgpatches/ts_backslash | text/x-diff | 567 bytes |
From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: [BUGS] Nasty tsvector can make dumps unrestorable |
Date: | 2007-11-10 03:53:18 |
Message-ID: | 47352B2E.7020507@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
Bruce Momjian wrote:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
>
> test=> CREATE TABLE Foo(bar tsvector);
> CREATE
> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
> INSERT 0 1
> test=> select * from foo;
> bar
> -------
> '''x'
> (1 row)
>
> and pg_dump outputs:
>
> COPY foo (bar) FROM stdin;
> '''x'
> \.
>
>
> While the COPY will load into the table, this doesn't:
>
> test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> ERROR: syntax error in tsvector: "''x"
>
> I am confused.
>
>
These two are not equivalent. What happens if you try this?
INSERT INTO Foo(bar) VALUES (E'''''''x''');
cheers
andrew
From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: [BUGS] Nasty tsvector can make dumps unrestorable |
Date: | 2007-11-10 03:55:15 |
Message-ID: | 200711100355.lAA3tF416292@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
Andrew Dunstan wrote:
> > While the COPY will load into the table, this doesn't:
> >
> > test=> INSERT INTO Foo(bar) VALUES (E'''''x');
> > ERROR: syntax error in tsvector: "''x"
> >
> > I am confused.
> >
> >
>
>
> These two are not equivalent. What happens if you try this?
>
> INSERT INTO Foo(bar) VALUES (E'''''''x''');
test=> INSERT INTO Foo(bar) VALUES (E'''''''x''');
INSERT 0 1
test=> select * from foo;
bar
-------
'''x'
(1 row)
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Bruce Momjian <bruce(at)momjian(dot)us> |
Cc: | Stuart Bishop <stuart(at)stuartbishop(dot)net>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [BUGS] Nasty tsvector can make dumps unrestorable |
Date: | 2007-11-10 16:41:18 |
Message-ID: | 16207.1194712878@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs pgsql-hackers |
Bruce Momjian <bruce(at)momjian(dot)us> writes:
> However, I am still unclear if the dump code is correct because I don't
> see the backslash preserved in \\'' cases, just \\\\ cases:
> test=> INSERT INTO Foo(bar) VALUES (E'\\''x');
You're just confused. That produces a word whose contents are the
two characters 'x, so either '\'x' or '''x' would be legitimate
output.
However, I'd prefer to see Teodor fix this, because it needs to be
back-patched too, and I'm not entirely sure if there are other
consequences.
regards, tom lane