More message encoding woes


  • From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
  • To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
  • Subject: More message encoding woes
  • Date: Mon, 30 Mar 2009 15:52:37 +0300
  • Message-id: <49D0C095.8000304@enterprisedb.com>

latin1db=# SELECT version();
version
-----------------------------------------------------------------------------------
PostgreSQL 8.3.7 on i686-pc-linux-gnu, compiled by GCC gcc (Debian 4.3.3-5) 4.3.3
(1 row)

latin1db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | utf8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | LATIN1
(8 rows)

latin1db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

The accented characters are garbled. When I try the same with a database that's in UTF8 in the same cluster, it works:

utf8db=# SELECT name, setting FROM pg_settings where name like 'lc%' OR name like '%encoding';
      name       | setting
-----------------+---------
 client_encoding | UTF8
 lc_collate      | C
 lc_ctype        | C
 lc_messages     | es_ES
 lc_monetary     | C
 lc_numeric      | C
 lc_time         | C
 server_encoding | UTF8
(8 rows)

utf8db=# SELECT * FROM foo;
ERROR:  no existe la relación «foo»

What is happening is that gettext() returns the message in the encoding implied by LC_CTYPE, whereas we expect it in the database encoding. Starting with PG 8.3 we enforce that the encoding implied by LC_CTYPE matches the database encoding, except for the C locale, which is what both databases above use.
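To check which codeset a given LC_CTYPE setting implies, something like the following could be used. This is only a minimal sketch: ctype_codeset() is a hypothetical helper name, not anything in the PostgreSQL source, and it relies on the POSIX calls setlocale() and nl_langinfo(CODESET).

```c
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): switch LC_CTYPE to the
 * given locale and return the codeset name that locale implies, or NULL
 * if the locale is not available on this system.
 *
 * The returned string is owned by the C library and may be overwritten
 * by later setlocale() or nl_langinfo() calls.
 */
const char *
ctype_codeset(const char *locale_name)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}
```

Calling ctype_codeset("C") and ctype_codeset("es_ES") and comparing the results shows what gettext() will default to when no codeset has been bound explicitly. The exact strings returned differ between platforms.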

In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding(), which fixes that, but only on Windows. In earlier versions we called it on all platforms, but only for UTF-8. It seems that we should call bind_textdomain_codeset() on all platforms and for all encodings. However, there seems to be a reason why we only do it on Windows in CVS HEAD: we need a mapping from our encoding ID to the OS codeset name, and codeset names vary between operating systems.
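The mapping problem could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in CVS HEAD: the table, lookup_codeset() and bind_message_codeset() are hypothetical names, the table is keyed by encoding name rather than by our internal encoding ID, and the right-hand codeset spellings are the ones glibc and GNU libiconv accept. Other platforms may need different spellings, which is exactly the portability issue.

```c
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping: PostgreSQL encoding name -> OS codeset name. */
struct codeset_map
{
    const char *pg_name;
    const char *os_codeset;     /* spelling assumed to suit glibc/libiconv */
};

static const struct codeset_map codeset_table[] = {
    {"UTF8", "UTF-8"},
    {"LATIN1", "ISO-8859-1"},
    {"LATIN2", "ISO-8859-2"},
    {"LATIN9", "ISO-8859-15"},
    {"EUC_JP", "EUC-JP"},
    {"WIN1252", "CP1252"},
};

/* Return the OS codeset name for an encoding, or NULL if it is unmapped. */
const char *
lookup_codeset(const char *pg_encoding_name)
{
    size_t      i;

    if (pg_encoding_name == NULL)
        return NULL;
    for (i = 0; i < sizeof(codeset_table) / sizeof(codeset_table[0]); i++)
    {
        if (strcmp(codeset_table[i].pg_name, pg_encoding_name) == 0)
            return codeset_table[i].os_codeset;
    }
    return NULL;
}

/*
 * Ask gettext to return messages of the given text domain in the codeset
 * matching the database encoding. Returns 1 on success, 0 if the encoding
 * has no mapping or bind_textdomain_codeset() fails.
 */
int
bind_message_codeset(const char *domain, const char *pg_encoding_name)
{
    const char *codeset = lookup_codeset(pg_encoding_name);

    if (codeset == NULL)
        return 0;
    return bind_textdomain_codeset(domain, codeset) != NULL;
}
```

With something like this, the equivalent of bind_message_codeset("postgres", "LATIN1") would run once the database encoding is known; the domain name "postgres" is an assumption here. On systems where gettext is not part of the C library, the program has to be linked with -lintl. An unmapped encoding returns 0 and leaves the existing behaviour untouched.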

How can we make this more robust?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


