Re: Replacing plpgsql's lexer

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Replacing plpgsql's lexer
Date:	2009-04-14 20:56:56
Message-ID:	603c8f070904141356o7522e8fbu7f45e6d10e3dc139@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Apr 14, 2009 at 4:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Whichever way the current discussion about Unicode literals turns out,
> it's clear that plpgsql is not up to speed on matching the core lexer's
> behavior --- it's wrong anyway with respect to
> standard_conforming_strings.
>
> I had earlier speculated semi-facetiously about ripping out the plpgsql
> lexer altogether, but the more I think about it the less silly the idea
> looks. Suppose that we change the core lexer so that the keyword lookup
> table it's supposed to use is passed to scanner_init() rather than being
> hard-wired in. Then make plpgsql call the core lexer using its own
> keyword table. Everything else would match core lexical behavior
> automatically. The special behavior that we do want, such as being
> able to construct a string representing a desired subrange of the input,
> could all be handled in plpgsql-specific wrapper code.
>
> I've just spent a few minutes looking for trouble spots in this theory,
> and so far the only real ugliness I can see is that plpgsql treats
> ":=" and ".." as single tokens whereas the core would parse them as two
> tokens. We could hack the core lexer to have an additional switch that
> controls that. Or maybe just make it always return them as single
> tokens --- AFAICS, neither combination is legal in core SQL anyway,
> so this would only result in a small change in the exact syntax error
> you get if you write such a thing in core SQL.
>
> Another trouble spot is the #option syntax, but that could be handled
> by a special-purpose prescan, or just dropped altogether; it's not like
> we've ever used that for anything but debugging.
>
> It looks like this might take about a day's worth of work (IOW two
> or three days real time) to get done.
>
> Normally I'd only consider doing such a thing during development phase,
> but since we're staring at at least one and maybe two bugs that are
> going to be hard to fix in any materially-less-intrusive way, I'm
> thinking about doing it now. Theoretically this change shouldn't break
> any working code, so letting it hit the streets in 8.4beta2 doesn't seem
> totally unreasonable.
>
> Comments, objections, better ideas?

All this sounds good. As for how to handle ":=" and "..", I think making
them lex the same way in PL/pgSQL and core SQL would be a good thing.
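
To make the idea concrete, here is a toy sketch in Python of a scanner that takes its keyword table as an argument and always returns ":=" and ".." as single tokens. It is not PostgreSQL code: the names scan, source_text, CORE_KEYWORDS and PLPGSQL_KEYWORDS are made up for illustration, the keyword lists are tiny placeholders, and the token rules ignore strings, comments and non-integer numbers.

```python
import re

# Hypothetical, heavily abbreviated keyword tables (the real ones are far longer).
CORE_KEYWORDS = {"select", "from", "where"}
PLPGSQL_KEYWORDS = {"begin", "end", "loop", "for", "in", "declare"}

# One shared set of lexical rules. ":=" and ".." are listed before the
# single-character fallback, so they always come back as one token.
TOKEN_RE = re.compile(r"""
    (?P<space>\s+)
  | (?P<number>\d+)
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<op>:=|\.\.|.)
""", re.VERBOSE | re.DOTALL)


def scan(text, keywords):
    """Return (kind, value, start, end) tuples for text.

    The caller supplies the keyword table, so the same rules can serve
    both the core grammar and a PL with its own keywords.
    """
    tokens = []
    for m in TOKEN_RE.finditer(text):
        kind = m.lastgroup
        if kind == "space":
            continue
        value = m.group()
        if kind == "word":
            kind = "keyword" if value.lower() in keywords else "ident"
        tokens.append((kind, value, m.start(), m.end()))
    return tokens


def source_text(text, first, last):
    """Wrapper-level helper: the input substring spanning two tokens."""
    return text[first[2]:last[3]]


if __name__ == "__main__":
    body = "FOR i IN 1..10 LOOP x := i; END LOOP"
    toks = scan(body, PLPGSQL_KEYWORDS)
    print([value for _, value, _, _ in toks])
    print(source_text(body, toks[7], toks[9]))
```

The only thing that differs between callers is the table passed to scan; the subrange extraction lives in wrapper code outside the scanner, roughly along the lines Tom describes.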

...Robert
