Re: Dollar in identifiers

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jan Wieck <JanWieck(at)yahoo(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Thomas Lockhart <lockhart(at)fourpalms(dot)org>, PostgreSQL HACKERS <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Dollar in identifiers
Date: 2001-08-17 15:00:18
Message-ID: 29572.998060418@sss.pgh.pa.us
Lists: pgsql-hackers

I've been thinking some more about this dollar-sign business. There
are a couple of points that haven't been made yet. If you'll allow
me to recap:

It seems like there are two reasonable paths we could take:

1. Keep $ as an operator character. If we go this way, I think we
should allow a single $ as an operator name too (by removing $ from
the set of "self" characters in scan.l, so that it lexes as an Op).

2. Make $ an identifier character. Remove it from the set of allowed
operator characters, and instead allow it as second-or-later character
in identifiers. (It cannot be allowed as first character, else it's
totally ambiguous whether $12 is meant to be a parameter or identifier.)

Option 2 improves Oracle compatibility, at the price of breaking
backwards compatibility for applications that presently use $ as part
of multi-character operator names. (But does anyone know of any?)

An important thing to think about here is the effects on lexing of
parameter symbols ($digits). Option 1 does not complicate parameter
lexing; $digits will still be read as a parameter since it's a longer
token than could be formed by taking the $ as an Op. However, this
option doesn't make things any better either: in particular, we still
have the lexing ambiguity of multi-character operator vs. parameter.
"x+$12" will be read as x +$ 12, though x + $12 was more likely meant.
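
To make the longest-match argument concrete, here is a toy tokenizer
sketch in Python. The token patterns are simplified assumptions chosen
for illustration; they are not the actual rules in scan.l. It treats $
as an operator character (Option 1) and picks the longest token at each
position.

```python
import re

# Toy token rules for Option 1: "$" is an operator character.
# These patterns are illustrative assumptions, not the rules in scan.l.
RULES = [
    ("param", re.compile(r"\$\d+")),
    ("ident", re.compile(r"[A-Za-z_][A-Za-z0-9_]*")),
    ("number", re.compile(r"\d+")),
    ("op", re.compile(r"[-+*/<>=~!@#%^&|?$]+")),
]


def tokenize(text):
    """Longest-match tokenizer; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_lexeme = None, ""
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_lexeme):
                best_kind, best_lexeme = kind, m.group()
        if best_kind is None:
            raise ValueError(f"cannot lex {text[pos:]!r}")
        tokens.append((best_kind, best_lexeme))
        pos += len(best_lexeme)
    return tokens


print(tokenize("$12"))      # the parameter beats a lone "$" operator
print(tokenize("x+$12"))    # "+$" is swallowed as one operator
print(tokenize("x + $12"))  # whitespace restores the intended reading
```

In this sketch "$12" comes out as one param token, while "x+$12"
comes out as ident x, op +$, number 12, which is the ambiguity
described above.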

With $-as-identifier, it'd no longer be possible for adjacent operators
and parameters to be confused. Instead we have a new ambiguity with
adjacent parameters and identifiers/keywords. Presently "select$1from"
is read as SELECT param FROM, but with $-as-identifier it'd be read as
a single identifier. But the interesting point is that this'd make
parameters work a lot more like identifiers. People don't expect to
be able to write identifiers adjacent to other identifiers with no
whitespace. They do expect to be able to write them adjacent to
operators.

In fact, with $-as-identifier we'd have this useful property: given a
lexically-recognizable identifier, substitution of a parameter token
for the identifier does not require insertion of any whitespace to
keep the parameter lexically recognizable. Some of you will recall
plpgsql bugs associated with the fact that the current lexer behavior
does not have this property. (The other direction doesn't work 100%.
For example, "select $1from" lexes as intended, but substituting foo
for $1 gives "select foofrom", where foofrom reads as one identifier.
But that direction is much less interesting in practice.)
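
The substitution property can be checked with a small Python sketch
that builds a toy longest-match lexer for either proposal. The helper
names (make_rules, lex) and the token patterns are assumptions made up
for this illustration, not PostgreSQL code.

```python
import re


def make_rules(dollar_in_identifiers):
    """Build toy token rules for either proposal (illustrative only)."""
    if dollar_in_identifiers:
        # Option 2: "$" allowed as second-or-later identifier character.
        ident, op = r"[A-Za-z_][A-Za-z0-9_$]*", r"[-+*/<>=~!@#%^&|?]+"
    else:
        # Option 1: "$" stays in the operator character set.
        ident, op = r"[A-Za-z_][A-Za-z0-9_]*", r"[-+*/<>=~!@#%^&|?$]+"
    return [
        ("param", re.compile(r"\$\d+")),
        ("ident", re.compile(ident)),
        ("number", re.compile(r"\d+")),
        ("op", re.compile(op)),
    ]


def lex(text, rules):
    """Longest-match tokenizer; on a tie the earlier rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best_kind, best_lexeme = None, ""
        for kind, pattern in rules:
            m = pattern.match(text, pos)
            if m and len(m.group()) > len(best_lexeme):
                best_kind, best_lexeme = kind, m.group()
        if best_kind is None:
            raise ValueError(f"cannot lex {text[pos:]!r}")
        tokens.append((best_kind, best_lexeme))
        pos += len(best_lexeme)
    return tokens


option1 = make_rules(dollar_in_identifiers=False)
option2 = make_rules(dollar_in_identifiers=True)

# Substitute the parameter $1 for the identifier b in "a+b".
print(lex("a+$1", option1))  # the "$" is absorbed into the operator
print(lex("a+$1", option2))  # the parameter survives without whitespace

# The new ambiguity: a parameter jammed between keywords.
print(lex("select$1from", option1))
print(lex("select$1from", option2))
```

Under the Option 2 rules, "a+$1" yields ident a, op +, param $1, so
no whitespace has to be inserted. Under the Option 1 rules the same
text yields op +$ followed by number 1. The price shows up in
"select$1from", which the Option 2 rules read as a single identifier.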

In short, $-as-identifier makes the lexer behavior noticeably cleaner
than it is now.

I started out firmly in the "keep $ an operator character" camp. But
after thinking this through I'm sitting on the fence: both options seem
about equally attractive to me.

Comments?

regards, tom lane
