Re: Unicode string literals versus the world

From: Marko Kreen <markokr(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Unicode string literals versus the world
Date: 2009-04-14 14:13:00
Message-ID: e51f66da0904140713r4144a5d9i1382af935de77c4@mail.gmail.com
Lists: pgsql-hackers

On 4/14/09, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
> > I think the problem is that they should not act like E'' strings, but they
> > should act like plain '' strings - they should follow stdstr setting.
> >
> > That way existing tools that may (or may not..) understand E'' and stdstr
> > settings, but definitely have not heard about U&'' strings can still
> > parse the SQL without new surprises.
>
>
> Can you be more specific in what "surprises" you expect? What algorithms do
> you suppose those "existing tools" use and what expectations do they have?

If the parsing does not happen in two passes and does not take the
stdstr setting into account, then the default breakage would be:

stdstr=off, U&' \' UESCAPE '!'.

And anything whose security or functionality depends on parsing SQL
can be broken that way.
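
A minimal sketch of that breakage, assuming a hypothetical tool-side
scanner (scan_literal and its backslash_escapes flag are illustrative
names, not taken from any real tool). With backslash_escapes=True it
behaves like a tool that applies stdstr=off rules to every quoted string
because it has never heard of U&''. With False it ends the literal at
the first unescaped quote, which is how the example above is meant to be
read, since UESCAPE '!' makes the backslash ordinary data.

```python
def scan_literal(sql, start, backslash_escapes):
    """Return the index just past the closing quote of the literal
    whose opening quote is at sql[start], or -1 if it never closes."""
    assert sql[start] == "'"
    i = start + 1
    while i < len(sql):
        ch = sql[i]
        if backslash_escapes and ch == "\\":
            i += 2              # skip the backslash and the escaped char
            continue
        if ch == "'":
            if sql[i + 1:i + 2] == "'":
                i += 2          # doubled quote stays inside the literal
                continue
            return i + 1
        i += 1
    return -1


sql = "U&' \\' UESCAPE '!'"     # the text: U&' \' UESCAPE '!'
start = sql.index("'")

end_plain = scan_literal(sql, start, backslash_escapes=False)
end_legacy = scan_literal(sql, start, backslash_escapes=True)

print(repr(sql[start:end_plain]))   # literal as intended
print(repr(sql[start:end_legacy]))  # literal as the unaware tool sees it
print(repr(sql[end_legacy:]))       # leftover with a dangling quote
```

The two readings disagree on where the literal ends, so everything after
it is tokenized differently by the tool and by the reading the example
intends.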

One example of broken functionality would be Slony (or another
replication solution) distributing developer-written SQL code to a bunch
of nodes. It needs to parse a text file into SQL statements and execute
them separately.
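
To illustrate the splitting problem, here is a sketch of a naive
statement splitter. It is a hypothetical stand-in, not Slony's actual
code; split_statements and backslash_escapes are made-up names. It
splits on semicolons outside quoted strings and, when backslash_escapes
is True, treats backslash as an escape inside every quoted string, the
way a tool unaware of U&'' would under stdstr=off.

```python
def split_statements(script, backslash_escapes):
    """Split a SQL script on semicolons that are outside '...' strings."""
    statements, current, in_string = [], [], False
    i = 0
    while i < len(script):
        ch = script[i]
        if in_string and backslash_escapes and ch == "\\":
            current.append(script[i:i + 2])   # keep escape pair together
            i += 2
            continue
        if ch == "'":
            in_string = not in_string         # '' closes and reopens
        elif ch == ";" and not in_string:
            statements.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)               # unterminated remainder
    return statements


script = "SELECT U&' \\' UESCAPE '!'; SELECT 2;"

intended = split_statements(script, backslash_escapes=False)
unaware = split_statements(script, backslash_escapes=True)

print(len(intended))   # number of statements with the intended reading
print(len(unaware))    # number of chunks the unaware tool produces
```

With the backslash treated as an escape, the splitter believes a string
is still open at the first semicolon, so it never finds a statement
boundary and hands the whole script over as one chunk.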

There are probably other tools that need to understand SQL at least
at the token level to function correctly (pgpool, probably something
in the Java world, etc.).

> > I still stand on my proposal, how about extending E'' strings with
> > unicode escapes (eg. \uXXXX)? The E'' strings are already more
> > clearly defined than '' and they are our "own", we don't need to
> > consider random standards, but can consider our sanity.
>
>
> This doesn't excite me. I think the tendency should be to get rid of E''
> usage, because its definition of escape sequences is single-byte and ASCII
> centric and thus overall a legacy construct.

Why are you concentrating only on \0xx escapes? The \\, \n, etc.
seem standard and forward-looking enough. Yes, Unicode escapes are
missing, but we can add them without breaking anything.
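
As a rough sketch of what that extension could look like, assuming a
hypothetical decoder for the body of an E'' string (decode_escapes is an
illustrative name, and this is not PostgreSQL's lexer): the existing
single-character escapes keep their meaning, and \uXXXX with exactly
four hex digits is added alongside them. Octal and hex byte escapes are
left out for brevity.

```python
import re

# Existing single-character escapes; any other escaped character
# simply stands for itself.
_SIMPLE = {"\\": "\\", "n": "\n", "t": "\t", "r": "\r", "'": "'"}

_ESCAPE = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|(.))", re.S)


def decode_escapes(body):
    """Decode backslash escapes in the body of an E''-style string."""
    def repl(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))   # \uXXXX code point
        ch = match.group(2)
        return _SIMPLE.get(ch, ch)
    return _ESCAPE.sub(repl, body)


print(decode_escapes(r"caf\u00e9"))      # Unicode escape decoded
print(decode_escapes(r"a\\u00e9"))       # escaped backslash wins
print(repr(decode_escapes(r"x\ny")))     # old escapes unchanged
```

Strings that use only the old escapes decode exactly as before, and an
escaped backslash followed by u00e9 stays literal text.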

> Certainly, we will want to keep
> around E'' for a long time or forever, but it is a legitimate goal for
> application writers to not use it, which is after all the reason behind this
> whole standards-conforming strings project. I wouldn't want to have a
> forward-looking feature such as the Unicode escapes be burdened with that kind
> of legacy behavior.
>
> Also note that Unicode escapes are also available for identifiers, for which
> there is no existing E"" that you can add it to.

Well, I was not rejecting the standard quoting, but suggesting we
postpone it until the stdstr mess is sorted out. We can use \uXXXX
in the meantime, and I think most Postgres users would prefer to keep
using it...

--
marko
