Re: pg_restore (libpq? parser?) bug in 8

Lists: pgsql-hackers
From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-11 16:09:16
Message-ID: 6.1.1.1.0.20040812020654.05ef1690@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> CREATE FUNCTION xxx() RETURNS integer
> AS $$ begin return 1;
>2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted
>string at or near "$$ begin return 1;" at character 115

Just realized the problem; pg_restore uses a trivial parser to work out
when statements start/end. It knows about quotes but not about dollar-quotes.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-11 17:33:18
Message-ID: 14447.1092245598@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> Just realized the problem; pg_restore uses a trivial parser to work out
> when statements start/end. It knows about quotes but not about dollar-quotes.

Blech. Do you have time to fix it?

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 00:41:29
Message-ID: 6.1.1.1.0.20040812103644.058b9f60@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 03:33 AM 12/08/2004, Tom Lane wrote:
>Do you have time to fix it?

Should do; I'll add the die-on-error option as well.

Con someone confirm how dollar quoting works:

'$[tag]$'

where tag is alpha chars? any chars? \n? \r?

and closing tag must match.

All dollar-quotes inside any kind of quotes can be ignored.

Is there any circumstance where an unquoted '$' is valid?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:15:20
Message-ID: 23064.1092276920@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> Con someone confirm how dollar quoting works:
> '$[tag]$'
> where tag is alpha chars? any chars? \n? \r?

IIRC the tag is either empty or anything that looks like a
(dollar-sign-less) identifier. But check the rules in scan.l to be sure.

> Is there any circumstance where an unquoted '$' is valid?

$n parameter identifiers (which is why the tag cannot start with
a digit). Also, I believe $ can be embedded in identifiers (ie,
it can be a non-first character). So a dollar quote can't be
adjacent to a preceding identifier.

IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
up as a bad job. You might need to bite the bullet and implement a flex
lexer.

Why exactly does pg_restore need to parse the SQL anyway?

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:24:04
Message-ID: 6.1.1.1.0.20040812121941.048b2db8@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 12:15 PM 12/08/2004, Tom Lane wrote:
>Why exactly does pg_restore need to parse the SQL anyway?

It just looks for complete statements. From memory it relates to the
possibility that TOC entries can have more than one statement, or it may
relate to handling COPY statements. I think it has to look for
PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it
needs to pass statements one at a time.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:28:08
Message-ID: 6.1.1.1.0.20040812122451.0403a3a8@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 12:15 PM 12/08/2004, Tom Lane wrote:
>IIRC we tried to do ad-hoc code for dollar quoting in psql, and gave it
>up as a bad job. You might need to bite the bullet and implement a flex
>lexer.

Looks like the psql side of things is not ideal (see other post).

Any idea how backslashes should handled in and out of the tag?

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:42:15
Message-ID: 23283.1092278535@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> At 12:15 PM 12/08/2004, Tom Lane wrote:
>> Why exactly does pg_restore need to parse the SQL anyway?

> It just looks for complete statements. From memory it relates to the
> possibility that TOC entries can have more than one statement, or it may
> relate to handling COPY statements. I think it has to look for
> PQresultStatus(...) == PGRES_COPY_IN for each statement it executes, so it
> needs to pass statements one at a time.

Hm. But we could assume that a COPY will be all by itself in a TOC
entry, couldn't we?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:43:30
Message-ID: 23304.1092278610@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> Any idea how backslashes should handled in and out of the tag?

Backslashes aren't legal in the tag, I'm quite sure. In the body of the
quoted string they are not special in any way.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 02:56:16
Message-ID: 6.1.1.1.0.20040812124759.0493f060@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 12:42 PM 12/08/2004, Tom Lane wrote:
>Hm. But we could assume that a COPY will be all by itself in a TOC
>entry, couldn't we?

Maybe. I know I hit a couple of nasty examples in the original code. Isn't
the COPY combined with the data? If so, we still have to scan for it's end.
The existing scanner is pretty trivial. The dollar-quoting will not make it
much worse (I hope). But I'll hold off on the changes if you want to
experiment -- I used to use my own DBs + the regression DB for testing.

Another possible issue - if I pass two statements in one string to libpq,
separated by semicolons, will it cope? If so, has that been true since 7.0?
If the answers are ('no',_), or ('yes', 'no') then that explains why
pg_restore has to parse statements - the TOC entry can have more than one
statement.

Sorry to be vague, it's a long time since I wrote the code.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 03:25:57
Message-ID: 23760.1092281157@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner <pjw(at)rhyme(dot)com(dot)au> writes:
> At 12:42 PM 12/08/2004, Tom Lane wrote:
>> Hm. But we could assume that a COPY will be all by itself in a TOC
>> entry, couldn't we?

> Maybe. I know I hit a couple of nasty examples in the original code. Isn't
> the COPY combined with the data? If so, we still have to scan for it's end.
> The existing scanner is pretty trivial.

Agreed. But we only emit dollar quoting in CREATE FUNCTION entries, and
I don't really see why you need to parse those with any accuracy. I
think we could do something here with making assumptions based on the
known TOC entry type about what might be in it.

> Another possible issue - if I pass two statements in one string to libpq,
> separated by semicolons, will it cope? If so, has that been true since 7.0?

Yes, and yes, except that if the first one gets an error the second will
not be executed. Per the other thread, that's probably a behavior change
we don't want.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 04:35:01
Message-ID: 6.1.1.1.0.20040812132807.05515138@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 12:15 PM 12/08/2004, Tom Lane wrote:
>You might need to bite the bullet and implement a flex
>lexer.

I'd like to avoid this if I can; AFAICT, for statement detection on
pg_restore, I can require white space before the $tag. Since I also skip
over all quoted text, the bodies of functions are ignored. The only issues
will be attribute names with ' $' in them, but they will be quoted as well
(so ignored).

So to recognize a tag, I look for a '$' after white space, and assume it's
a tag start. If I subsequently read an invalid tag char, I just go back
into scan mode on that character and assume the '$...' was some other valid
sql element.

From other threads, it sounds like removing the statement detection code
entirely is not an option.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pjw(at)rhyme(dot)com(dot)au>
Cc: <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 09:06:59
Message-ID: 4181.24.211.141.25.1092301619.squirrel@www.dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Philip Warner said:
> At 12:15 PM 12/08/2004, Tom Lane wrote:
>>You might need to bite the bullet and implement a flex
>>lexer.
>
> I'd like to avoid this if I can; AFAICT, for statement detection on
> pg_restore, I can require white space before the $tag. Since I also
> skip over all quoted text, the bodies of functions are ignored. The
> only issues will be attribute names with ' $' in them, but they will
> be quoted as well (so ignored).
>
> So to recognize a tag, I look for a '$' after white space, and assume
> it's a tag start. If I subsequently read an invalid tag char, I just
> go back into scan mode on that character and assume the '$...' was
> some other valid sql element.
>
> From other threads, it sounds like removing the statement detection
> code
> entirely is not an option.
>
>

See the discussions that culminated here before Tom bit the bullet and
implemented a flex scanner for psql:

http://archives.postgresql.org/pgsql-patches/2004-02/msg00182.php

It's not as easy as you might think.

I had a thought that might short-circuit this - do we even need pg_dump to
use dollar quoting for non-text output, such as is required by pg_restore?
IIRC, the idea was to have text dumps that didn't undo what the user had
done in using dollar quoting, but I am not sure that consideration applies
to non-text output. Turning it off should be very easy - we already have the
switch.

Thoughts?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: pjw(at)rhyme(dot)com(dot)au, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-12 13:42:05
Message-ID: 28195.1092318125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> I had a thought that might short-circuit this - do we even need pg_dump to
> use dollar quoting for non-text output, such as is required by pg_restore?

That's a possibility, but I'd rather work around it by finding a way for
pg_restore not to need to parse the dollar-quoted literal in the first
place. I haven't heard a reason why it needs to parse the contents of a
CREATE FUNCTION entry. If we guarantee not to put dollar-quoted
literals into the TOC types it *does* need to parse, ISTM we're good.

regards, tom lane


From: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-13 01:45:11
Message-ID: 6.1.1.1.0.20040813113553.063442d8@203.8.195.10
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 11:42 PM 12/08/2004, Tom Lane wrote:
>That's a possibility, but I'd rather work around it by finding a way for
>pg_restore not to need to parse the dollar-quoted literal in the first
>place.

Two possibilities:

1) Parse the tags (I have the code working): it's not that hard, the only
trick bit being recognizing the tags in the first place. I have assumed
that any bare unquoted string that is not preceded by valid identifier name
chars, and which starts with a '$' may be a dollar quote. This seems valid
to me.

2) We could avoid special coding for TOC entry types (eg. pg_restore
knowing 'FUNCTION' TOC entries should not be parsed), by changing the TOC
data to include a flag/counter (set by pg_dump) indicating that the entry
contains > 1 statements. Then we don't hard code knowledge of TOC entry
types, and function definitions will not be parsed. Old dump files would be
treated as multi-statement, and still be parsed.

If my assumption in (1) is valid, then I have a very mild preference for
it, but am happy with either.

----------------------------------------------------------------
Philip Warner | __---_____
Albatross Consulting Pty. Ltd. |----/ - \
(A.B.N. 75 008 659 498) | /(@) ______---_
Tel: (+61) 0500 83 82 81 | _________ \
Fax: (+61) 03 5330 3172 | ___________ |
Http://www.rhyme.com.au | / \|
| --________--
PGP key available upon request, | /
and from pgp.mit.edu:11371 |/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Philip Warner <pjw(at)rhyme(dot)com(dot)au>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_restore (libpq? parser?) bug in 8
Date: 2004-08-17 17:10:39
Message-ID: 200408171710.i7HHAdI22280@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Added to open items list:

* fix dollar quoting problems in pg_restore

---------------------------------------------------------------------------

Philip Warner wrote:
>
> > CREATE FUNCTION xxx() RETURNS integer
> > AS $$ begin return 1;
> >2004-08-12 01:38:48 EST<zzz,birds>: ERROR: unterminated dollar-quoted
> >string at or near "$$ begin return 1;" at character 115
>
>
> Just realized the problem; pg_restore uses a trivial parser to work out
> when statements start/end. It knows about quotes but not about dollar-quotes.
>
>
>
> ----------------------------------------------------------------
> Philip Warner | __---_____
> Albatross Consulting Pty. Ltd. |----/ - \
> (A.B.N. 75 008 659 498) | /(@) ______---_
> Tel: (+61) 0500 83 82 81 | _________ \
> Fax: (+61) 03 5330 3172 | ___________ |
> Http://www.rhyme.com.au | / \|
> | --________--
> PGP key available upon request, | /
> and from pgp.mit.edu:11371 |/
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073