Bug with UTF-8 character


  • From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
  • To: pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at
  • Subject: Bug with UTF-8 character
  • Date: Fri, 26 May 2006 08:21:56 +0200
  • Message-id: <44769E84(dot)7000006(at)cybertec(dot)at>

Good morning,

I received a bug report about the following Unicode character in PostgreSQL 8.1.4: 0xedaeb8

ERROR:  invalid byte sequence for encoding "UTF8": 0xedaeb8

This one seemed to work properly in PostgreSQL 8.0.3.

I think the following code in PostgreSQL 8.1.4 has a bug in it.

File: postgresql-8.1.4/src/backend/utils/mb/wchar.c


The input values to the function are:

source = ed ae b8 20 20 20 20 20 20 20 20 20 20 20 20

length = 3 (the length in bytes of the current UTF-8 character)

But the code checks that the second byte is not greater than 0x9F when the first byte is 0xED. As far as I can tell, this does not follow the UTF-8 standard in RFC 3629, so I believe the test is not valid.

This test fails on our string when it should not.
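To make the check concrete, here is a minimal sketch of the arithmetic involved. It is not the code from wchar.c; the helper names decode_utf8_3 and is_surrogate are hypothetical and exist only for illustration. It decodes a 3-byte sequence of the form 1110xxxx 10xxxxxx 10xxxxxx into its code point and tests whether that code point lies in U+D800..U+DFFF, the surrogate range, which may be what the 0x9F bound on the second byte is meant to exclude.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from wchar.c): decode a 3-byte UTF-8 sequence
 * 1110xxxx 10xxxxxx 10xxxxxx into its code point. No validity checks are
 * performed; the caller is assumed to pass a well-formed 3-byte pattern.
 */
static uint32_t decode_utf8_3(uint8_t b0, uint8_t b1, uint8_t b2)
{
    return ((uint32_t)(b0 & 0x0F) << 12) |  /* low 4 bits of the lead byte */
           ((uint32_t)(b1 & 0x3F) << 6)  |  /* 6 payload bits of byte 2    */
            (uint32_t)(b2 & 0x3F);          /* 6 payload bits of byte 3    */
}

/* Hypothetical helper: 1 if cp lies in the surrogate range U+D800..U+DFFF. */
static int is_surrogate(uint32_t cp)
{
    return cp >= 0xD800 && cp <= 0xDFFF;
}
```

With these helpers, decode_utf8_3(0xED, 0xAE, 0xB8) gives 0xDBB8, which is_surrogate reports as inside the range, while ed 9f bf, the largest sequence that passes the 0x9F check, gives 0xD7FF.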

I believe this is a bug. Could you please confirm, or let me know what I am doing wrong?


	Many thanks,

		Hans


--
Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at


