Re: invalidly encoded strings

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: invalidly encoded strings
Date:	2007-09-10 14:04:30
Message-ID:	3702.1189433070@sss.pgh.pa.us
Lists:	pgsql-hackers pgsql-patches

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> The reason we are prepared to make an exception for Unicode is precisely
> because the code point maps to an encoding pattern independently of
> architecture, ISTM.

Right --- there is a well-defined standard for the numerical value of
each character in Unicode. And it's also clear what to do in
single-byte encodings. It's not at all clear what the representation
ought to be for other multibyte encodings. A direct transliteration
of the byte sequence not only has endianness issues, but will have
a weird non-dense set of valid values because of the restrictions on
valid multibyte characters.
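
To illustrate the point above, here is a small Python sketch (not PostgreSQL code). It assumes EUC-JP as the example multibyte encoding and U+3042 as the example character; the helper names are hypothetical. It shows that the Unicode code point is a single well-defined number, while reading the raw byte sequence as an integer gives a different answer depending on byte order, and that many integers in the same range are not valid characters at all.

```python
# Illustration only: EUC-JP and U+3042 are arbitrary example choices.

def bytes_as_int(ch: str, encoding: str, byteorder: str) -> int:
    """Read a character's encoded bytes as one integer in the given byte order."""
    return int.from_bytes(ch.encode(encoding), byteorder)

def is_valid_two_byte(value: int, encoding: str) -> bool:
    """Return True if the big-endian 2-byte form of value decodes in the encoding."""
    try:
        value.to_bytes(2, "big").decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

ch = "\u3042"                                    # HIRAGANA LETTER A
code_point = ord(ch)                             # defined by the Unicode standard
as_big = bytes_as_int(ch, "euc_jp", "big")       # bytes A4 A2 read as 0xA4A2
as_little = bytes_as_int(ch, "euc_jp", "little") # same bytes read as 0xA2A4

# Non-dense value set: 0xA4A2 is a character, 0xA441 is not,
# because 0x41 is not a legal trail byte after 0xA4 in EUC-JP.
valid = is_valid_two_byte(0xA4A2, "euc_jp")
invalid = is_valid_two_byte(0xA441, "euc_jp")
```

The two integer readings of the same byte sequence disagree, which is the endianness problem; the gap at 0xA441 is one instance of the non-dense value set.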

Given that chr() has never before behaved sanely for multibyte values at
all, extending it to accept Unicode code points is reasonable, and
throwing an error for other multibyte encodings is reasonable too. If we
ever do come across code-point standards for other encodings we can adopt
'em at that time.
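
The proposed behavior might be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual PostgreSQL implementation: the function name `chr_sketch`, the `SINGLE_BYTE` set, and the encoding names are all hypothetical stand-ins.

```python
# Hypothetical sketch of the rule described above; not PostgreSQL source.

# Assumed sample of single-byte encodings, named as Python codecs.
SINGLE_BYTE = {"latin-1", "iso8859-15", "koi8-r"}

def chr_sketch(value: int, encoding: str) -> bytes:
    """Return the encoded bytes for value, or raise ValueError."""
    if encoding == "utf-8":
        # Treat value as a Unicode code point. chr() rejects out-of-range
        # values and encode() rejects surrogates; both raise ValueError
        # (UnicodeEncodeError is a subclass of ValueError).
        return chr(value).encode("utf-8")
    if encoding in SINGLE_BYTE:
        # Single-byte encodings: the value is simply the byte.
        if not 0 <= value <= 255:
            raise ValueError("value out of range for a single-byte encoding")
        return bytes([value])
    # Other multibyte encodings: no agreed code-point standard, so refuse.
    raise ValueError(f"chr() is not defined for encoding {encoding!r}")

utf8_bytes = chr_sketch(0x3042, "utf-8")      # three-byte UTF-8 sequence
latin1_bytes = chr_sketch(0xE9, "latin-1")    # single byte 0xE9
```

Calling `chr_sketch(0xA4A2, "euc_jp")` raises `ValueError` in this sketch, matching the "throw an error" choice for encodings without a code-point standard.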

regards, tom lane

In response to

Re: invalidly encoded strings at 2007-09-10 13:51:09 from Andrew Dunstan

Responses

Re: invalidly encoded strings at 2007-09-10 15:30:51 from Tatsuo Ishii
