Re: Bug with UTF-8 character


  • From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
  • To: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
  • Cc: pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at
  • Subject: Re: Bug with UTF-8 character
  • Date: Fri, 26 May 2006 10:33:59 -0400
  • Message-id: <25791(dot)1148654039(at)sss(dot)pgh(dot)pa(dot)us>

Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> writes:
> But the code does a check where the second character should not be 
> greater than 0x9F, when first character is 0xED. This is not according 
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says

   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                 ------------
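
As an illustration of the rule quoted above, here is a minimal sketch of a checker for the UTF8-3 production. It is not the PostgreSQL code under discussion; the function name `is_valid_utf8_3` is hypothetical, and the UTF8-tail range %x80-BF is taken from the same RFC.

```python
def is_valid_utf8_3(seq: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 rule of RFC 3629.

    Hypothetical helper for illustration only.
    """
    if len(seq) != 3:
        return False
    b0, b1, b2 = seq

    # The last byte is always a plain UTF8-tail (%x80-BF).
    if not 0x80 <= b2 <= 0xBF:
        return False

    if b0 == 0xE0:
        # %xE0 %xA0-BF UTF8-tail (excludes overlong encodings)
        return 0xA0 <= b1 <= 0xBF
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        # %xE1-EC 2( UTF8-tail ) / %xEE-EF 2( UTF8-tail )
        return 0x80 <= b1 <= 0xBF
    if b0 == 0xED:
        # %xED %x80-9F UTF8-tail (the underlined alternative)
        return 0x80 <= b1 <= 0x9F
    return False
```

With this sketch, `is_valid_utf8_3(b"\xed\x9f\xbf")` is accepted while `is_valid_utf8_3(b"\xed\xa0\x80")` is rejected, which matches the check the original poster objected to.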

The reason for the prohibition is explained as

  The definition of UTF-8 prohibits encoding character numbers between
  U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding
  form (as surrogate pairs) and do not directly represent characters.
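
The link between the 0x9F limit and the U+D800..U+DFFF range can be checked with a little arithmetic. The sketch below assumes the standard 3-byte layout (1110xxxx 10xxxxxx 10xxxxxx); the helper name `decode_3byte` is hypothetical and the function does no validation of its own.

```python
def decode_3byte(b0: int, b1: int, b2: int) -> int:
    """Combine the payload bits of a 3-byte UTF-8 sequence into a code point.

    Hypothetical helper: 4 bits from the lead byte, 6 bits from each
    continuation byte. No range checking is performed here.
    """
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


# Highest sequence allowed after 0xED: second byte 0x9F.
last_allowed = decode_3byte(0xED, 0x9F, 0xBF)    # 0xD7FF

# First sequence forbidden after 0xED: second byte 0xA0.
first_forbidden = decode_3byte(0xED, 0xA0, 0x80)  # 0xD800

# Highest sequence with lead byte 0xED at all.
last_forbidden = decode_3byte(0xED, 0xBF, 0xBF)   # 0xDFFF
```

So a second byte in 0xA0..0xBF after 0xED would encode exactly the reserved range U+D800..U+DFFF, and capping it at 0x9F is what keeps that range out.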

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do.  If they
say it's invalid, it's invalid.

			regards, tom lane


