From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Replacing plpgsql's lexer |
Date: | 2009-04-14 20:56:56 |
Message-ID: | 603c8f070904141356o7522e8fbu7f45e6d10e3dc139@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Apr 14, 2009 at 4:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Whichever way the current discussion about Unicode literals turns out,
> it's clear that plpgsql is not up to speed on matching the core lexer's
> behavior --- it's wrong anyway with respect to
> standard_conforming_strings.
>
> I had earlier speculated semi-facetiously about ripping out the plpgsql
> lexer altogether, but the more I think about it the less silly the idea
> looks. Suppose that we change the core lexer so that the keyword lookup
> table it's supposed to use is passed to scanner_init() rather than being
> hard-wired in. Then make plpgsql call the core lexer using its own
> keyword table. Everything else would match core lexical behavior
> automatically. The special behavior that we do want, such as being
> able to construct a string representing a desired subrange of the input,
> could all be handled in plpgsql-specific wrapper code.
>
> I've just spent a few minutes looking for trouble spots in this theory,
> and so far the only real ugliness I can see is that plpgsql treats
> ":=" and ".." as single tokens whereas the core would parse them as two
> tokens. We could hack the core lexer to have an additional switch that
> controls that. Or maybe just make it always return them as single
> tokens --- AFAICS, neither combination is legal in core SQL anyway,
> so this would only result in a small change in the exact syntax error
> you get if you write such a thing in core SQL.
>
> Another trouble spot is the #option syntax, but that could be handled
> by a special-purpose prescan, or just dropped altogether; it's not like
> we've ever used that for anything but debugging.
>
> It looks like this might take about a day's worth of work (IOW two
> or three days real time) to get done.
>
> Normally I'd only consider doing such a thing during development phase,
> but since we're staring at at least one and maybe two bugs that are
> going to be hard to fix in any materially-less-intrusive way, I'm
> thinking about doing it now. Theoretically this change shouldn't break
> any working code, so letting it hit the streets in 8.4beta2 doesn't seem
> totally unreasonable.
>
> Comments, objections, better ideas?
All this sounds good. As for how to handle := and .., I think making
them lex the same way in PL/pgSQL and core SQL would be a good thing.
...Robert
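
To make the proposal above concrete, here is a minimal sketch in C of a scanner whose keyword table is supplied at init time and which always returns ":=" and ".." as single tokens. It is an illustration only, not PostgreSQL's actual code: the names sketch_scanner_init, sketch_scanner_next, Keyword, Scanner, Token and the TOK_* codes are all hypothetical, and the real core lexer is flex-generated and handles far more (string literals, comments, decimal numbers, operators).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; these are not PostgreSQL's real token values. */
typedef enum {
    TOK_EOF, TOK_IDENT, TOK_KEYWORD, TOK_NUMBER,
    TOK_ASSIGN,   /* ":=" returned as one token */
    TOK_DOTDOT,   /* ".." returned as one token */
    TOK_OTHER     /* any other single character */
} TokenKind;

/* One keyword table entry; names are assumed to be stored in lower case. */
typedef struct { const char *name; int value; } Keyword;

/* Scanner state: the keyword table is passed in by the caller, not hard-wired. */
typedef struct {
    const char    *buf;
    size_t         pos;
    const Keyword *keywords;
    size_t         nkeywords;
} Scanner;

/* A token records its byte range, so a wrapper can rebuild any input subrange. */
typedef struct {
    TokenKind kind;
    size_t    start;
    size_t    len;
    int       keyword_value;   /* -1 unless kind == TOK_KEYWORD */
} Token;

void sketch_scanner_init(Scanner *s, const char *input,
                         const Keyword *keywords, size_t nkeywords)
{
    assert(s != NULL && input != NULL);
    s->buf = input;
    s->pos = 0;
    s->keywords = keywords;
    s->nkeywords = nkeywords;
}

/* Case-insensitive comparison of text[0..len) against a lower-case name. */
static int keyword_matches(const char *name, const char *text, size_t len)
{
    size_t i;

    if (strlen(name) != len)
        return 0;
    for (i = 0; i < len; i++)
        if (tolower((unsigned char) text[i]) != name[i])
            return 0;
    return 1;
}

Token sketch_scanner_next(Scanner *s)
{
    Token       t = { TOK_EOF, 0, 0, -1 };
    const char *p = s->buf;
    size_t      i;

    while (isspace((unsigned char) p[s->pos]))
        s->pos++;
    t.start = s->pos;
    if (p[s->pos] == '\0')
        return t;

    if (isalpha((unsigned char) p[s->pos]) || p[s->pos] == '_') {
        while (isalnum((unsigned char) p[s->pos]) || p[s->pos] == '_')
            s->pos++;
        t.kind = TOK_IDENT;
        /* Look the word up in whichever table the caller supplied. */
        for (i = 0; i < s->nkeywords; i++) {
            if (keyword_matches(s->keywords[i].name, p + t.start,
                                s->pos - t.start)) {
                t.kind = TOK_KEYWORD;
                t.keyword_value = s->keywords[i].value;
                break;
            }
        }
    } else if (isdigit((unsigned char) p[s->pos])) {
        /* Integers only, so "1..10" splits as 1, "..", 10. */
        while (isdigit((unsigned char) p[s->pos]))
            s->pos++;
        t.kind = TOK_NUMBER;
    } else if (p[s->pos] == ':' && p[s->pos + 1] == '=') {
        s->pos += 2;
        t.kind = TOK_ASSIGN;
    } else if (p[s->pos] == '.' && p[s->pos + 1] == '.') {
        s->pos += 2;
        t.kind = TOK_DOTDOT;
    } else {
        s->pos++;
        t.kind = TOK_OTHER;
    }
    t.len = s->pos - t.start;
    return t;
}
```

With a table containing "loop", the input "x := 1..10 LOOP" would come back as identifier, ":=", number, "..", number, keyword. The same input scanned with an empty table yields LOOP as a plain identifier, which is the point of letting each caller bring its own keyword list.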