Re: [rfc] unicode escapes for extended strings

From:	Marko Kreen <markokr(at)gmail(dot)com>
To:	Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [rfc] unicode escapes for extended strings
Date:	2009-04-16 19:32:16
Message-ID:	e51f66da0904161232k7f287f9ey751ec1c09188af8d@mail.gmail.com
Lists:	pgsql-hackers

On 4/16/09, Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:
> On Thu, Apr 16, 2009 at 08:48:58PM +0300, Marko Kreen wrote:
> > Seems I'm bad at communicating in English,
>
>
> I hope you're not saying this because of my misunderstandings!
>
>
> > so here is C variant of
> > my proposal to bring \u escaping into extended strings. Reasons:
> >
> > - More people are familiar with \u escaping, as it's standard
> > in Java/C#/Python, probably more..
> > - U& strings will not work when stdstr=off.
> >
> > Syntax:
> >
> > \uXXXX - 16-bit value
> > \UXXXXXXXX - 32-bit value
> >
> > Additionally, both \u and \U can be used to specify UTF-16 surrogate
> > pairs to encode characters with value > 0xFFFF. This is the exact behaviour
> > used by Java/C#/Python (except that Java does not have \U).
>
>
> Are you sure that this handling of surrogates is correct? The best
> answer I've managed to find on the Unicode consortium's site is:
>
> http://unicode.org/faq/utf_bom.html#utf16-7
>
> it says:
>
> They are invalid in interchange, but may be freely used internal to an
> implementation.
>
> I think this means they consider the handling of them you noted above,
> in other languages, to be an error.

It's up to the UTF-8 validator whether to treat non-characters as an error.
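
To make the surrogate question concrete, here is a rough Python sketch of the escape handling described in the quoted proposal. It is not the code from the patch: the name decode_unicode_escapes is made up for illustration, other backslash escapes (including an escaped backslash) are ignored, and rejecting unpaired surrogates is only one possible policy, assumed here so the choice is visible.

```python
import re

# Hypothetical helper for illustration; not taken from the proposed patch.
# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_unicode_escapes(text):
    """Expand \\uXXXX and \\UXXXXXXXX, joining UTF-16 surrogate pairs."""
    out = []
    pending_high = None  # high surrogate still waiting for its low half
    pos = 0
    for m in _ESCAPE.finditer(text):
        literal = text[pos:m.start()]
        if literal and pending_high is not None:
            raise ValueError("unpaired high surrogate")
        out.append(literal)
        pos = m.end()
        code = int(m.group(1) or m.group(2), 16)
        if pending_high is not None:
            # The previous escape was a high surrogate: a low one must follow.
            if not 0xDC00 <= code <= 0xDFFF:
                raise ValueError("high surrogate not followed by low surrogate")
            code = 0x10000 + ((pending_high - 0xD800) << 10) + (code - 0xDC00)
            pending_high = None
        elif 0xD800 <= code <= 0xDBFF:
            pending_high = code  # remember it and wait for the low half
            continue
        elif 0xDC00 <= code <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        if code > 0x10FFFF:
            raise ValueError("code point out of Unicode range")
        out.append(chr(code))
    if pending_high is not None:
        raise ValueError("unpaired high surrogate")
    out.append(text[pos:])
    return "".join(out)
```

With this policy, r"\uD83D\uDE00" and r"\U0001F600" decode to the same single character, while a lone r"\uD83D" raises ValueError instead of being passed on for a later validator to accept or reject.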

--
marko

In response to

Re: [rfc] unicode escapes for extended strings at 2009-04-16 18:43:09 from Sam Mason

Responses

Re: [rfc] unicode escapes for extended strings at 2009-04-17 16:07:31 from Marko Kreen
