Re: Encoding issues in console and eventlog on win32

From: Josh Williams <joshwilliams(at)ij(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Encoding issues in console and eventlog on win32
Date: 2009-09-21 04:00:29
Message-ID: 1253505629.7374.23.camel@lapdragon
Lists: pgsql-hackers

On Tue, 2009-09-15 at 12:49 +0900, Itagaki Takahiro wrote:
> Here is an updated version of the patch.

This is a review of the Eventlog encoding on Windows patch:
http://archives.postgresql.org/message-id/20090915123243.9C59.52131E4D@oss.ntt.co.jp

Purpose & Format
================
This patch is designed to coerce log messages to a specific encoding.
It's currently only targeted at the win32 port, where the logs are
written in UTF-16.

The patch applies cleanly. It doesn't include any documentation updates
or additional regression tests. A comment in the documentation that
logs on Windows will go through an encoding conversion if appropriate
might be nice, though.

Initial Run
===========
To (hopefully) properly test I initdb'd a couple directories under
different locales. I then ran a few statements designed to generate
event log messages showing characters in a different encoding:
SELECT E'\xF0'::int;

The unpatched backend generated an event log message showing only the
byte value, interpreted as the same character each time in the system
default encoding.

With the patch in place the event log message showed the character
correctly for each of the different encodings.
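
To illustrate why the same byte has to come out as a different character
per encoding, here is a small Python sketch. It is not the patch's code,
and the three encodings are just examples I picked, not the locales used
in the test above. The helper names (as_eventlog_text, show) are made up
for this illustration.

```python
# Illustration only: not the patch's code. The encodings below are
# examples chosen for this sketch, not the locales used in the test.
RAW = b"\xf0"  # the byte from the test statement


def as_eventlog_text(raw: bytes, database_encoding: str) -> bytes:
    """Decode with the database encoding, then re-encode as UTF-16LE."""
    return raw.decode(database_encoding).encode("utf-16-le")


def show(database_encoding: str) -> str:
    """Return the character a UTF-16 reader would see for RAW."""
    return as_eventlog_text(RAW, database_encoding).decode("utf-16-le")


if __name__ == "__main__":
    for enc in ("latin-1", "cp1251", "iso8859_7"):
        # Print the code point so the result is readable on any console.
        print(enc, "U+%04X" % ord(show(enc)))
```

Decoding with the database encoding first is what makes the output
differ per database; decoding everything with one fixed encoding would
give the same character every time, which matches what the unpatched
backend showed.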

I haven't tried any performance testing against it.

Concurrent Development Issues
=============================
On a hunch, I tried applying the "syslogger infrastructure changes" patch
at the same time. The two conflict in elog.c. Not sure if we're supposed
to check for that, but thought I'd point it out. :)

Editorial
=========
The problem seems to stem from PG and Windows each having a few
encodings the other won't understand, or at least doesn't immediately
support. So log messages sent back to the system contain, from its
perspective, incorrect or broken characters. I'm not sure this is as
much of a problem on other platforms, though, where the database
encoding typically doesn't have any trouble matching the system's; would
it be worth pursuing beyond the win32 port?

I'm not too familiar with alternate character sets... I would assume if
there's a code page supported on win32 it'll naturally support
conversion to UTF-16 on the platform, but is there any time this could
fail? What about the few encodings that it doesn't directly support,
which need a conversion to UTF-8 first?
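
To make the "could this fail?" question concrete, here is a rough Python
sketch of a conversion that reports failure instead of emitting broken
characters. It uses Python's codec tables, not the win32 conversion
routines, so which bytes actually fail on Windows may well differ. The
function name to_utf16 and the fall-back-to-None behaviour are my own
assumptions, not something taken from the patch.

```python
from typing import Optional


def to_utf16(raw: bytes, encoding: str) -> Optional[bytes]:
    """Convert raw bytes in `encoding` to UTF-16LE.

    Returns None when the bytes are not valid in that encoding, so the
    caller can fall back to logging the raw bytes. Hypothetical helper
    for illustration only.
    """
    try:
        return raw.decode(encoding, errors="strict").encode("utf-16-le")
    except UnicodeDecodeError:
        return None


if __name__ == "__main__":
    # A lone 0xF0 is a complete character in Latin-1 ...
    print(to_utf16(b"\xf0", "latin-1"))
    # ... but only the start of a multi-byte sequence in UTF-8.
    print(to_utf16(b"\xf0", "utf-8"))
```

The point is only that a byte sequence valid in one encoding can be
invalid in another, so any conversion step needs some fallback path.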

Maybe someone with more familiarity with encoding conversion issues
could comment on that? Otherwise I think this is ready to be bumped up
for committer review.

- Josh Williams
