Re: Unicode string literals versus the world

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unicode string literals versus the world
Date: 2009-04-14 19:51:54
Message-ID: e51f66da0904141251i52fb42d3t6a7f4bed43807ac@mail.gmail.com
Lists: pgsql-hackers

On 4/14/09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > On Tuesday 14 April 2009 18:54:33 Tom Lane wrote:
> >> The other proposal that seemed
> >> attractive to me was a decode-like function:
> >>
> >> uescape('foo\00e9bar')
> >> uescape('foo\00e9bar', '\')
>
> > This was discussed previously, but rejected with the following argument:
>
> > There are some other disadvantages for making a function call. You
> > couldn't use that kind of literal in any other place where the parser
> > calls for a string constant: role names, tablespace locations,
> > passwords, copy delimiters, enum values, function body, file names.
>
>
> I'm less than convinced that those are really plausible use-cases for
> characters that one is unable to type directly. However, I'll grant the
> point. So that narrows us down to considering the \u extension to E''
> strings as a saner and safer alternative to the spec's syntax.

My vote goes to \u. U& may be in the SQL standard, but it differs from
every escape syntax that is established in practice.

An alternative would be to make U& follow the standard_conforming_strings
(stdstr) setting:

stdstr=on -> you get fully standard-conforming syntax:

U&'\xxx' UESCAPE '\'

stdstr=off -> you need to follow old quoting rules:

U&'\\xxx' UESCAPE '\\'

This would give a syntax that is safe and, when stdstr=on, fully
standard-compliant. The only downside is that in practice, that is with
stdstr=off, it would be unusable.
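
To make the two cases concrete, here is a rough model in Python. It is not
PostgreSQL code, only a sketch of the idea: decode_uescape and
unquote_old_style are names made up for illustration, and the old-style
unquoting is simplified to halving doubled backslashes. The escape forms
handled (\XXXX, \+XXXXXX, and a doubled escape character) are the ones the
U& syntax defines.

```python
import re

HEX4 = re.compile(r"[0-9A-Fa-f]{4}")
HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def decode_uescape(body, esc="\\"):
    # Hypothetical helper: decode U&-style escapes in a literal body.
    #   esc + XXXX          -> code point given by 4 hex digits
    #   esc + '+' + XXXXXX  -> code point given by 6 hex digits
    #   esc + esc           -> one literal escape character
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != esc:
            out.append(ch)
            i += 1
        elif body.startswith(esc, i + 1):
            out.append(esc)
            i += 2
        elif body.startswith("+", i + 1) and HEX6.fullmatch(body[i + 2:i + 8]):
            out.append(chr(int(body[i + 2:i + 8], 16)))
            i += 8
        elif HEX4.fullmatch(body[i + 1:i + 5]):
            out.append(chr(int(body[i + 1:i + 5], 16)))
            i += 5
        else:
            raise ValueError(f"invalid Unicode escape at offset {i}")
    return "".join(out)


def unquote_old_style(body):
    # Simplified stand-in for stdstr=off (an assumption, not the real lexer):
    # each doubled backslash in the source collapses to a single one.
    return body.replace("\\\\", "\\")


# stdstr=on: U&'foo\00e9bar' UESCAPE '\'
# The literal body reaches the decoder untouched.
on = decode_uescape(r"foo\00e9bar", "\\")

# stdstr=off: U&'foo\\00e9bar' UESCAPE '\\'
# Old quoting rules halve the backslashes first, then the escapes are decoded.
off = decode_uescape(unquote_old_style(r"foo\\00e9bar"), unquote_old_style(r"\\"))

print(on == off)
```

Both spellings end up as the same string in this model. The point is that
under stdstr=off every backslash has to be written twice, which is what
makes the form awkward to use.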

A third alternative would be to do both: \u as the usable method, and the
safe U& to tick the checkbox for SQL-standard compliance.
If we do want U&, I would prefer that to a U&-only syntax.

--
marko
