Re: Database object names and libpq in UTF-8 locale on Windows

From: Sebastien FLAESCH <sf(at)4js(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Database object names and libpq in UTF-8 locale on Windows
Date: 2012-10-22 16:53:54
Message-ID: 50857A22.4010405@4js.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,
I would appreciate some help or comments on this topic...
Is this the wrong mailing list to ask such question?
Seb

On 10/18/2012 10:15 AM, Sebastien FLAESCH wrote:
> Hello,
>
> Using PostgreSQL 9.2.1 on Windows, I am facing a strange character set
> issue
> with a UTF-8 database.
>
> Maybe this is expected, but want to be sure that I am not missing
> something.
>
> On Windows, I have created a database with:
>
> ENCODING = 'UTF-8'
> LC_COLLATE = 'English_United States.1252'
> LC_CTYPE = 'English_United States.1252'
>
> I am using libpq with a simple C program.
> The program handles UTF-8 strings.
>
> While it's not a traditional "UNICODE" Windows application using WCHAR, it
> should be possible to send and get back UTF-8 data with PostgreSQL.
>
> Interacting with the Windows system in the system locale is another
> story, but
> from a pure C / SQL / libpq point of view, as long as the PostgreSQL client
> encoding is properly defined to UTF-8, it should work, and it does mainly:
>
> - I can use UTF-8 string constants in my queries.
> - I can pass UTF-8 data to the database with parameterized queries.
> - I can fetch UTF-8 data from the database.
> - I can create db object names with UTF-8 characters.
>
> But the db object names must be specified with double quotes:
> When I do not use quoted db object names, I get a strange problem.
> The table is created, I can use it in my program, I can even use it in the
> pgAdmin query tool, but in the pgAdmin db browser, there is no table name
> displayed in the treeview...
>
> Further, when selecting schema information from pg_class, I can see that
> the
> table exists, but there is nothing displayed in the relname attribute...
>
> It appears that the problem disappears when using a "C" collation are
> char type:
>
> ENCODING = 'UTF-8'
> LC_COLLATE = 'C'
> LC_CTYPE = 'C'
>
> I suspect this has something to do with the fact that non-quoted
> identifiers
> are converted to lowercase, and because my LC_CTYPE is English_United
> States.1252,
> the conversion to lowercase fails...
>
> But:
>
> - why does PostgreSQL accept to create the table with invalid UTF-8
> characters?
>
> - why can I make queries in the query tool with the UTF-8 table name?
> (note that show client_encoding in the query tool gives me "UNICODE" -
> is this
> the same as "UTF-8"?)
>
>
> So is there a bug, or do I have to use a C collation and char type for
> UTF-8
> databases on Windows?
>
> I know that UTF-8 is not supported by the Windows C/POSIX library
> (setlocale,
> and co).
>
> Does it mean that it's not realistic to use UTF-8 encoding for PostgreSQL
> databases on Windows...?
>
> Thanks for reading.
> Sebastien FLAESCH
> Four Js Development Tools
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2012-10-22 17:25:47 Re: Successor of MD5 authentication, let's use SCRAM
Previous Message Andrew Dunstan 2012-10-22 16:25:42 Re: [PATCH] Support for Array ELEMENT Foreign Keys