Re: Replacing plpgsql's lexer

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Replacing plpgsql's lexer
Date: 2009-04-14 20:56:56
Message-ID: 603c8f070904141356o7522e8fbu7f45e6d10e3dc139@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 14, 2009 at 4:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Whichever way the current discussion about Unicode literals turns out,
> it's clear that plpgsql is not up to speed on matching the core lexer's
> behavior --- it's wrong anyway with respect to
> standard_conforming_strings.
>
> I had earlier speculated semi-facetiously about ripping out the plpgsql
> lexer altogether, but the more I think about it the less silly the idea
> looks.  Suppose that we change the core lexer so that the keyword lookup
> table it's supposed to use is passed to scanner_init() rather than being
> hard-wired in.  Then make plpgsql call the core lexer using its own
> keyword table.  Everything else would match core lexical behavior
> automatically.  The special behavior that we do want, such as being
> able to construct a string representing a desired subrange of the input,
> could all be handled in plpgsql-specific wrapper code.
>
> I've just spent a few minutes looking for trouble spots in this theory,
> and so far the only real ugliness I can see is that plpgsql treats
> ":=" and ".." as single tokens whereas the core would parse them as two
> tokens.  We could hack the core lexer to have an additional switch that
> controls that.  Or maybe just make it always return them as single
> tokens --- AFAICS, neither combination is legal in core SQL anyway,
> so this would only result in a small change in the exact syntax error
> you get if you write such a thing in core SQL.
>
> Another trouble spot is the #option syntax, but that could be handled
> by a special-purpose prescan, or just dropped altogether; it's not like
> we've ever used that for anything but debugging.
>
> It looks like this might take about a day's worth of work (IOW two
> or three days real time) to get done.
>
> Normally I'd only consider doing such a thing during development phase,
> but since we're staring at at least one and maybe two bugs that are
> going to be hard to fix in any materially-less-intrusive way, I'm
> thinking about doing it now.  Theoretically this change shouldn't break
> any working code, so letting it hit the streets in 8.4beta2 doesn't seem
> totally unreasonable.
>
> Comments, objections, better ideas?

All this sounds good. As for how to handle := and .., I think making
them lex the same way in PL/pgsql and core SQL would be a good thing.

...Robert
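A minimal C sketch of the two ideas in this exchange. The first is a scanner whose keyword table is supplied by the caller instead of being compiled in. The second is a switch that makes ":=" and ".." lex as single tokens. Everything here is hypothetical and illustrative, and it is not PostgreSQL's actual scanner_init() API or keyword list: sketch_scanner_init, sketch_next_token, the Keyword struct, the token codes, and the two sample keyword tables are all invented for this example. It also ignores strings, comments, floats, and quoted identifiers.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Token codes (hypothetical). Single-character tokens are returned as
 * their own character value, so named codes start above 255. */
enum {
    TOK_EOF = 0,
    TOK_IDENT = 256,
    TOK_INTEGER,
    TOK_ASSIGN,   /* ":=" when combine_ops is set */
    TOK_DOT_DOT,  /* ".." when combine_ops is set */
    KW_FOR, KW_IN, KW_LOOP, KW_SELECT
};

/* Hypothetical keyword entry: lowercase spelling and its token code. */
typedef struct { const char *name; int token; } Keyword;

/* Illustrative subsets only, NOT PostgreSQL's real keyword lists. */
static const Keyword core_keywords[] = {
    {"for", KW_FOR}, {"in", KW_IN}, {"select", KW_SELECT}};
static const Keyword plpgsql_keywords[] = {
    {"for", KW_FOR}, {"in", KW_IN}, {"loop", KW_LOOP}};

typedef struct {
    const char *input;
    size_t pos;
    const Keyword *keywords;  /* supplied by the caller, not hard-wired */
    size_t num_keywords;
    bool combine_ops;         /* lex ":=" and ".." as single tokens */
} Scanner;

/* Hypothetical stand-in for handing the keyword table to scanner_init(). */
void sketch_scanner_init(Scanner *s, const char *input, const Keyword *keywords,
                         size_t num_keywords, bool combine_ops)
{
    assert(s != NULL && input != NULL);
    assert(keywords != NULL || num_keywords == 0);
    s->input = input;
    s->pos = 0;
    s->keywords = keywords;
    s->num_keywords = num_keywords;
    s->combine_ops = combine_ops;
}

/* Case-insensitive linear search of whichever table the caller supplied. */
static int lookup_keyword(const Scanner *s, const char *word, size_t len)
{
    for (size_t i = 0; i < s->num_keywords; i++) {
        const char *kw = s->keywords[i].name;
        size_t j = 0;
        if (strlen(kw) != len)
            continue;
        while (j < len && tolower((unsigned char) word[j]) == kw[j])
            j++;
        if (j == len)
            return s->keywords[i].token;
    }
    return TOK_IDENT;
}

/* Return the next token; strings, comments, and floats are not handled. */
int sketch_next_token(Scanner *s)
{
    const char *p = s->input + s->pos;
    const char *start;
    int tok;

    while (isspace((unsigned char) *p))
        p++;
    start = p;
    if (*p == '\0') {
        tok = TOK_EOF;
    } else if (isalpha((unsigned char) *p) || *p == '_') {
        while (isalnum((unsigned char) *p) || *p == '_')
            p++;
        tok = lookup_keyword(s, start, (size_t) (p - start));
    } else if (isdigit((unsigned char) *p)) {
        while (isdigit((unsigned char) *p))
            p++;
        tok = TOK_INTEGER;
    } else if (s->combine_ops && p[0] == ':' && p[1] == '=') {
        p += 2;
        tok = TOK_ASSIGN;
    } else if (s->combine_ops && p[0] == '.' && p[1] == '.') {
        p += 2;
        tok = TOK_DOT_DOT;
    } else {
        tok = (unsigned char) *p++;  /* any other single character */
    }
    s->pos = (size_t) (p - s->input);
    return tok;
}
```

With the PL/pgSQL-style table and combine_ops set, `FOR i IN 1..10 LOOP x := i` lexes as FOR, identifier, IN, integer, `..`, integer, LOOP, identifier, `:=`, identifier. With the core-style table and the switch off, `x := 1` yields an identifier, then ':' and '=' as two separate tokens, then an integer. In that configuration `loop` is just an identifier. Only the table and the flag differ between the two callers; the scanning code is shared.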
