A couple of issues with psql variable substitution

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: A couple of issues with psql variable substitution
Date: 2011-08-25 16:47:27
Message-ID: 17158.1314290847@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On my way to do something else entirely, I came across a couple of
things that are not very nice about psql's lexer:

1. Somebody broke the no-backtracking property back in 9.0 while adding
quoted variable substitution. According to the flex manual, use of
backtracking creates a performance penalty. We once measured the
backend's lexer as being about a third faster with backtrack avoidance,
and presumably it's about the same for psql's. This is not hard to fix,
but should I consider it a bug fix and back-patch? We've not had
complaints about psql getting slower as of 9.0.

2. The lexer rules associated with variable substitution think that
variable names can consist only of ASCII letters and digits (and
underscores). The psql manual is noncommittal about whether non-ASCII
characters are allowed, but a reasonable person would think that the
rules ought to be the same as the backend's idea of what an identifier
is. Does anybody have a problem with improving that? (I'm not
proposing this part as a bug fix, because it does look a little bit
more invasive to fix, because of the way psql deals with unsafe
multibyte encodings.)

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 16:51:16
Message-ID: CA+TgmoZ9i-pcEEgM8Vt6v66Ezkw0OiFZ-CU-5+6poytGmgavnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> On my way to do something else entirely, I came across a couple of
> things that are not very nice about psql's lexer:
>
> 1. Somebody broke the no-backtracking property back in 9.0 while adding
> quoted variable substitution.  According to the flex manual, use of
> backtracking creates a performance penalty.  We once measured the
> backend's lexer as being about a third faster with backtrack avoidance,
> and presumably it's about the same for psql's.  This is not hard to fix,
> but should I consider it a bug fix and back-patch?  We've not had
> complaints about psql getting slower as of 9.0.

That may well have been me. How would I have known that I broke it?

Also, how invasive is the fix?

> 2. The lexer rules associated with variable substitution think that
> variable names can consist only of ASCII letters and digits (and
> underscores).  The psql manual is noncommittal about whether non-ASCII
> characters are allowed, but a reasonable person would think that the
> rules ought to be the same as the backend's idea of what an identifier
> is.  Does anybody have a problem with improving that?

Nope. Or at least, I don't.

> (I'm not
> proposing this part as a bug fix, because it does look a little bit
> more invasive to fix, because of the way psql deals with unsafe
> multibyte encodings.)

+1 for not back-patching this part.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 17:00:57
Message-ID: 17405.1314291657@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 1. Somebody broke the no-backtracking property back in 9.0 while adding
>> quoted variable substitution. According to the flex manual, use of
>> backtracking creates a performance penalty. We once measured the
>> backend's lexer as being about a third faster with backtrack avoidance,
>> and presumably it's about the same for psql's. This is not hard to fix,
>> but should I consider it a bug fix and back-patch? We've not had
>> complaints about psql getting slower as of 9.0.

> That may well have been me.

[ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018.

> How would I have known that I broke it?

Per the header comments in the backend lexer, you should run flex with
"-b" switch and verify that the resulting lex.backup file says "no
backing up". I've occasionally thought about automating that, but I'm
not sure if the output is entirely locale- and flex-version-independent.

> Also, how invasive is the fix?

We need to add a couple more rules that will match an unterminated
quoted variable and do something reasonable (probably just throw back
everything but the colon with yyless). I've not coded it but I think
it can't be more than a dozen lines or so.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 17:16:06
Message-ID: 1314292430-sup-2544@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Tom Lane's message of jue ago 25 14:00:57 -0300 2011:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> 1. Somebody broke the no-backtracking property back in 9.0 while adding
> >> quoted variable substitution. According to the flex manual, use of
> >> backtracking creates a performance penalty. We once measured the
> >> backend's lexer as being about a third faster with backtrack avoidance,
> >> and presumably it's about the same for psql's. This is not hard to fix,
> >> but should I consider it a bug fix and back-patch? We've not had
> >> complaints about psql getting slower as of 9.0.
>
> > That may well have been me.
>
> [ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018.
>
> > How would I have known that I broke it?
>
> Per the header comments in the backend lexer, you should run flex with
> "-b" switch and verify that the resulting lex.backup file says "no
> backing up". I've occasionally thought about automating that, but I'm
> not sure if the output is entirely locale- and flex-version-independent.

It is locale dependent, though of course for the automated check you
could just run flex under LC_ALL=C.

$ /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l
$ cat lex.backup
Sin retroceso.
$ LC_ALL=C /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l
$ cat lex.backup
No backing up.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 17:40:05
Message-ID: 4E5688F5.4000001@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/25/2011 01:16 PM, Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of jue ago 25 14:00:57 -0300 2011:
>> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>>> On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> 1. Somebody broke the no-backtracking property back in 9.0 while adding
>>>> quoted variable substitution. According to the flex manual, use of
>>>> backtracking creates a performance penalty. We once measured the
>>>> backend's lexer as being about a third faster with backtrack avoidance,
>>>> and presumably it's about the same for psql's. This is not hard to fix,
>>>> but should I consider it a bug fix and back-patch? We've not had
>>>> complaints about psql getting slower as of 9.0.
>>> That may well have been me.
>> [ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018.
>>
>>> How would I have known that I broke it?
>> Per the header comments in the backend lexer, you should run flex with
>> "-b" switch and verify that the resulting lex.backup file says "no
>> backing up". I've occasionally thought about automating that, but I'm
>> not sure if the output is entirely locale- and flex-version-independent.
> It is locale dependent, though of course for the automated check you
> could just run flex under LC_ALL=C.
>
> $ /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l
> $ cat lex.backup
> Sin retroceso.
> $ LC_ALL=C /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l
> $ cat lex.backup
> No backing up.
>

We could just add -b unconditionally to the flex flags and then count
the number of lines in lex.backup. If it's greater that 1 whine loudly,
or even fail, otherwise remove lex.backup. Would that avoid locale
dependencies?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 17:50:40
Message-ID: 18627.1314294640@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> We could just add -b unconditionally to the flex flags and then count
> the number of lines in lex.backup. If it's greater that 1 whine loudly,
> or even fail, otherwise remove lex.backup. Would that avoid locale
> dependencies?

Hm, yeah, seems like that ought to work.

I'm tempted to add "-p -p" also, even though that only results in some
whinging on stderr. It would still probably get noticed by anyone who
was changing the lexer.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 18:45:42
Message-ID: 20535.1314297942@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> We could just add -b unconditionally to the flex flags and then count
>> the number of lines in lex.backup. If it's greater that 1 whine loudly,
>> or even fail, otherwise remove lex.backup. Would that avoid locale
>> dependencies?

> Hm, yeah, seems like that ought to work.

Done in HEAD, but only for the Makefile-based build mechanism. Anybody
want to add the comparable logic to the MSVC scripts?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 21:12:59
Message-ID: 23537.1314306779@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While I'm looking at this ... the current implementation has got a
number of very inconsistent behaviors with respect to when it will
expand a variable reference within a psql meta-command argument.
Observe:

regression=# \set foo 'value of foo'
regression=# \set bar 'value of bar'
regression=# \echo :foo
value of foo
regression=# \echo :foo(at)bar
value of foo @bar

(there shouldn't be a space before the @, IMO --- there is because this
gets treated as two separate arguments, which seems bizarre)

regression=# \echo :foo:bar
value of foo value of bar

(again, why is this two arguments not one?)

regression=# \echo :foo@:bar
value of foo @:bar

(why isn't :bar expanded here, when it is in the previous case?)

regression=# \echo foo:foo@:bar
foo:foo@:bar

(and now neither one gets expanded)

ISTM the general rule ought to be that we attempt to substitute for a
colon-construct regardless of where it appears within an argument, as
long as it's not within quotes.

Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 21:18:18
Message-ID: CA+TgmoY0_Oa2_UifDDn-kni_E+kVw57UOpXOOkFCZK84u3ZkJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 25, 2011 at 5:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> While I'm looking at this ... the current implementation has got a
> number of very inconsistent behaviors with respect to when it will
> expand a variable reference within a psql meta-command argument.
> Observe:
>
> regression=# \set foo 'value of foo'
> regression=# \set bar 'value of bar'
> regression=# \echo :foo
> value of foo
> regression=# \echo :foo(at)bar
> value of foo @bar
>
> (there shouldn't be a space before the @, IMO --- there is because this
> gets treated as two separate arguments, which seems bizarre)
>
> regression=# \echo :foo:bar
> value of foo value of bar
>
> (again, why is this two arguments not one?)
>
> regression=# \echo :foo@:bar
> value of foo @:bar
>
> (why isn't :bar expanded here, when it is in the previous case?)
>
> regression=# \echo foo:foo@:bar
> foo:foo@:bar
>
> (and now neither one gets expanded)
>
> ISTM the general rule ought to be that we attempt to substitute for a
> colon-construct regardless of where it appears within an argument, as
> long as it's not within quotes.
>
> Thoughts?

My main thought is that I remember this code being pretty awful -
especially with respect to error handling - when I looked at it. A
lot of dubious behaviors were more or less compelled by the
impossibility of bailing out at an arbitrary point a la ereport(). At
least, barring a drastic refactoring of the code, which might not be a
bad idea either.

No objection if you want to clean some of it up, but you may find it's
a larger sinkhole than you anticipate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-25 21:29:54
Message-ID: 4E56BED2.2030307@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/25/2011 02:45 PM, Tom Lane wrote:
> I wrote:
>> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>>> We could just add -b unconditionally to the flex flags and then count
>>> the number of lines in lex.backup. If it's greater that 1 whine loudly,
>>> or even fail, otherwise remove lex.backup. Would that avoid locale
>>> dependencies?
>> Hm, yeah, seems like that ought to work.
> Done in HEAD, but only for the Makefile-based build mechanism. Anybody
> want to add the comparable logic to the MSVC scripts?
>
>

Done.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A couple of issues with psql variable substitution
Date: 2011-08-26 03:13:48
Message-ID: 28227.1314328428@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Aug 25, 2011 at 5:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ISTM the general rule ought to be that we attempt to substitute for a
>> colon-construct regardless of where it appears within an argument, as
>> long as it's not within quotes.

> My main thought is that I remember this code being pretty awful -
> especially with respect to error handling - when I looked at it. A
> lot of dubious behaviors were more or less compelled by the
> impossibility of bailing out at an arbitrary point a la ereport(). At
> least, barring a drastic refactoring of the code, which might not be a
> bad idea either.

What I had in mind to do was just to rearrange the flex rules --- the
issues that I called out have to do with dubious choices about when to
transition between different lexer states.

I agree that the error handling isn't terribly friendly in unexpected
cases like there not being a connection available to determine the
literal-quoting rules, but that's not what I'm on about here. I'm
just after consistent variable-expansion behavior in normal operation.

regards, tom lane