Re: [GENERAL] invalid byte sequence ?

Lists: pgsql-general pgsql-hackers
From: Andreas <maps(dot)on(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: invalid byte sequence ?
Date: 2006-08-23 21:02:42
Message-ID: 44ECC272.9040606@gmx.net

Hi,

I've got pg 8.1.4 from the binary Windows installer.
Windows 2000 / German
Now I entered "\d" into psql on the text-console and got this:

db_test=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

What's up ?
db_test was created UTF8 encoded


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-23 21:05:06
Message-ID: 200608232105.k7NL56F17816@momjian.us

Andreas wrote:
> Hi,
>
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>
> What's up ?
> db_test was created UTF8 encoded

What does your client_encoding show? It should be UTF8 too.
--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Andreas <maps(dot)on(at)gmx(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-23 21:53:22
Message-ID: 44ECCE52.8060500@gmx.net

Bruce Momjian wrote:
> Andreas wrote:
>
>> I've got pg 8.1.4 from the binary Windows installer.
>> Windows 2000 / German
>> Now I entered "\d" into psql on the text-console and got this:
>>
>> db_test=# \d
>> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>>
>> What's up ?
>> db_test was created UTF8 encoded
>>
>
> What does your client_encoding show? It should be UTF8 too.
>

it is.

db_test=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
db_test=# show client_encoding;
client_encoding
-----------------
UTF8
(1 Zeile)

psql complains about the code page, too, now. (850 vs. 1252)
I'm sure I checked it the other day with a cmd that used 1252 and still
got the error for the \d command.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas <maps(dot)on(at)gmx(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-23 22:45:03
Message-ID: 25307.1156373103@sss.pgh.pa.us

Andreas <maps(dot)on(at)gmx(dot)net> writes:
> I've got pg 8.1.4 from the binary Windows installer.
> Windows 2000 / German
> Now I entered "\d" into psql on the text-console and got this:
>
> db_test=# \d
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a

I can replicate this by using a UTF8 database and running the client
in a non-UTF8 locale. For example

$ LANG=de_DE.iso88591 psql postgres
Dies ist psql 8.2devel, das interaktive PostgreSQL-Terminal.

Geben Sie ein: \copyright für Urheberrechtsinformationen
\h für Hilfe über SQL-Anweisungen
\? für Hilfe über interne Anweisungen
\g oder Semikolon, um eine Anfrage auszuführen
\q um zu beenden

postgres=# \l
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572222c
TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \d
ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
postgres=# \encoding
UTF8
postgres=#

The problem here is that psql is using gettext() to convert column
headings for its display to German, and gettext() sees its locale
as specifying ISO8859-1, so that's the encoding it produces. When
that data is sent over to the server --- which thinks that the
client is using UTF8 encoding, because it hasn't been told any
different --- the server quite naturally barfs.
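
As an illustration that is not part of the original message, the bytes in
the error can be checked directly. The Python sketch below decodes the hex
string from the \d error as ISO8859-1 and then tests it as UTF-8. The result
looks like the tail of the German column heading "Eigentümer" followed by a
closing double quote and a newline, which fits the explanation above. The
helper name is_valid_utf8 is made up for this example. The six-byte length of
the report is presumably because older UTF-8 definitions treated 0xfc as the
lead byte of a six-byte sequence.

```python
# Bytes reported by the server in the \d error message above.
raw = bytes.fromhex("fc6d6572220a")

# Read as ISO8859-1 (Latin-1), every byte maps to exactly one character.
as_latin1 = raw.decode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same text is acceptable once it has really been converted to UTF-8.
as_utf8 = as_latin1.encode("utf-8")

raw_is_utf8 = is_valid_utf8(raw)          # False: 0xfc cannot start valid UTF-8 here
converted_is_utf8 = is_valid_utf8(as_utf8)  # True
```

The \l error differs only in its last byte (0x2c, a comma, instead of 0x0a),
which is consistent with the same heading appearing in a different query.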

We've known about this and related issues with gettext for some time,
but a bulletproof solution isn't clear. For the moment all you can
do is be real careful about making your locale settings match up.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas <maps(dot)on(at)gmx(dot)net>, pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-23 22:47:54
Message-ID: 200608232247.k7NMls528659@momjian.us


Is this a TODO?

---------------------------------------------------------------------------

Tom Lane wrote:
> Andreas <maps(dot)on(at)gmx(dot)net> writes:
> > I've got pg 8.1.4 from the binary Windows installer.
> > Windows 2000 / German
> > Now I entered "\d" into psql on the text-console and got this:
> >
> > db_test=# \d
> > ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
>
> I can replicate this by using a UTF8 database and running the client
> in a non-UTF8 locale. For example
>
> $ LANG=de_DE.iso88591 psql postgres
> Dies ist psql 8.2devel, das interaktive PostgreSQL-Terminal.
>
> Geben Sie ein: \copyright für Urheberrechtsinformationen
> \h für Hilfe über SQL-Anweisungen
> \? für Hilfe über interne Anweisungen
> \g oder Semikolon, um eine Anfrage auszuführen
> \q um zu beenden
>
> postgres=# \l
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572222c
> TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> postgres=# \d
> ERROR: invalid byte sequence for encoding "UTF8": 0xfc6d6572220a
> TIP: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
> postgres=# \encoding
> UTF8
> postgres=#
>
> The problem here is that psql is using gettext() to convert column
> headings for its display to German, and gettext() sees its locale
> as specifying ISO8859-1, so that's the encoding it produces. When
> that data is sent over to the server --- which thinks that the
> client is using UTF8 encoding, because it hasn't been told any
> different --- the server quite naturally barfs.
>
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear. For the moment all you can
> do is be real careful about making your locale settings match up.
>
> regards, tom lane

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-23 22:52:00
Message-ID: 25374.1156373520@sss.pgh.pa.us

I wrote:
> We've known about this and related issues with gettext for some time,
> but a bulletproof solution isn't clear. For the moment all you can
> do is be real careful about making your locale settings match up.

I forgot to mention that it works fine if the server is told the client
encoding actually being used:

postgres=# \encoding iso8859-1
postgres=# \l
        Liste der Datenbanken
    Name    | Eigentümer | Kodierung
------------+------------+-----------
 postgres   | tgl        | UTF8
 regression | tgl        | SQL_ASCII
 template0  | tgl        | UTF8
 template1  | tgl        | UTF8
(4 Zeilen)

postgres=# \d
Keine Relationen gefunden
postgres=#

A possible solution therefore is to have psql or libpq drive the
client_encoding off the client's locale environment instead of letting
it default to equal the server_encoding. But I'm not sure what
downsides that would have, and in any case it's not entirely clear that
we can always derive the correct Postgres encoding name from the
system's locale info.
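
To make the idea concrete, here is a minimal Python sketch (an illustration
only, not code from this thread or from libpq) of deriving a Postgres
encoding name from the locale's codeset. The lookup table is deliberately
tiny and hypothetical, and both function names are invented for the example.
It returns None when no match is found, which is exactly the "not entirely
clear" case mentioned above.

```python
import locale
from typing import Optional

# Hypothetical, deliberately small table: normalised OS codeset name ->
# PostgreSQL encoding name. A real table would need many more entries.
CODESET_TO_PG = {
    "UTF8": "UTF8",
    "ISO88591": "LATIN1",
    "ISO885915": "LATIN9",
    "CP1252": "WIN1252",
    "EUCJP": "EUC_JP",
}


def pg_encoding_from_codeset(codeset: str) -> Optional[str]:
    """Map an OS codeset name to a PostgreSQL encoding name, or None."""
    # Normalise spellings such as "iso-8859-1", "ISO_8859-1" and "iso88591".
    key = "".join(ch for ch in codeset.upper() if ch.isalnum())
    return CODESET_TO_PG.get(key)


def pg_encoding_from_environment() -> Optional[str]:
    """Look up the codeset Python reports for the current locale."""
    return pg_encoding_from_codeset(locale.getpreferredencoding(False))
```

For example, pg_encoding_from_codeset("iso-8859-1") yields "LATIN1", while
the codeset glibc reports for the C locale ("ANSI_X3.4-1968") has no entry
and yields None.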

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 09:57:15
Message-ID: 20060824095715.GB24070@svana.org

On Wed, Aug 23, 2006 at 06:52:00PM -0400, Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of letting
> it default to equal the server_encoding. But I'm not sure what
> downsides that would have, and in any case it's not entirely clear that
> we can always derive the correct Postgres encoding name from the
> system's locale info.

For glibc systems we can get 100% reliable results. Even for other
systems there's standard code out there for determining the charset.
But this has been discussed before:

http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php

It seems to me that setting the client encoding based on the
client-locale is the *only* sensible way of doing it. The locale is
going to affect the results of programs like sort and any scripts used
to process the data anyway.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 13:45:23
Message-ID: 20060824134523.GC18349@alvh.no-ip.org

Martijn van Oosterhout wrote:

> For glibc systems we can get 100% reliable results. Even for other
> systems there's standard code out there for determining the charset.
> But this has been discussed before:
>
> http://archives.postgresql.org/pgsql-hackers/2003-05/msg00744.php
> http://archives.postgresql.org/pgsql-general/2004-04/msg00470.php
> http://archives.postgresql.org/pgsql-hackers/2006-06/msg01027.php
>
> It seems to me that setting the client encoding based on the
> client-locale is the *only* sensible way of doing it. The locale is
> going to affect the results of programs like sort and any scripts used
> to process the data anyway.

Yes please. This would make the pgsql-es-ayuda list lose a small but
measurable amount of its traffic (which I won't miss). Non-matching
\encoding settings are just too frequent.

FWIW I'm not sure whether it really belongs in libpq, or whether it should
rather be in psql (and thus in every client).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 14:01:05
Message-ID: 3942.1156428065@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Martijn van Oosterhout wrote:
>> It seems to me that setting the client encoding based on the
>> client-locale is the *only* sensible way of doing it.

> Yes please.

> FWIW I'm not sure whether it really belongs in libpq, or whether it should
> rather be in psql (and thus in every client).

libpq is what implements PGCLIENTENCODING, so I'd say that's where any
change in the default has to be handled. Presumably we'd still allow
PGCLIENTENCODING to override the locale?

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 15:55:56
Message-ID: 200608241755.57574.peter_e@gmx.net

Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of
> letting it default to equal the server_encoding.

I have been proposing that for years, but just about now the Japanese
would speak up and protest ... I say, rush this in before anyone
notices.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org, Andreas <maps(dot)on(at)gmx(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 17:17:49
Message-ID: 6998.1156439869@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane wrote:
>> A possible solution therefore is to have psql or libpq drive the
>> client_encoding off the client's locale environment instead of
>> letting it default to equal the server_encoding.

> I have been proposing that for years, but just about now the Japanese
> would speak up and protest ... I say, rush this in before anyone
> notices.

I guess the key point might be "what do we do if the client locale
is C?" Perhaps if it's C, we continue to use the server encoding
as we have in the past. This would be a reasonable fallback in
other cases where we fail to deduce an encoding from the locale, too.
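
The fallback rule proposed here can be written down in a few lines. The
Python sketch below is only an illustration of that rule. The function name
and its parameters are assumptions made for the example, and "POSIX" is
treated the same as "C" on the assumption that the two name the same locale.

```python
from typing import Optional


def choose_client_encoding(locale_name: str,
                           deduced: Optional[str],
                           server_encoding: str) -> str:
    """Pick a default client encoding (hypothetical helper).

    locale_name     -- the client's LC_CTYPE locale name, e.g. "de_DE.iso88591"
    deduced         -- encoding deduced from that locale, or None on failure
    server_encoding -- what the server would otherwise default to
    """
    # In the C locale, or when nothing could be deduced, keep the old
    # behaviour and follow the server encoding.
    if locale_name in ("C", "POSIX") or deduced is None:
        return server_encoding
    return deduced
```

With this rule a client in de_DE.iso88591 talking to a UTF8 server would
default to LATIN1, while a client in the C locale would still get UTF8.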

regards, tom lane


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-24 21:22:26
Message-ID: 20060824212226.GF6550@merkur.hilbert.loc

On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:

> I guess the key point might be "what do we do if the client locale
> is C?" Perhaps if it's C, we continue to use the server encoding
> as we have in the past. This would be a reasonable fallback in
> other cases where we fail to deduce an encoding from the locale, too.

In that case I would suggest to also emit a suitable warning
(with a postgresql.conf option to switch that off which
defaults to ON).

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Subject: Re: invalid byte sequence ?
Date: 2006-08-25 11:53:30
Message-ID: 200608251353.31466.peter_e@gmx.net

On Thursday, 24 August 2006 23:22, Karsten Hilbert wrote:
> In that case I would suggest to also emit a suitable warning
> (with a postgresql.conf option to switch that off which
> defaults to ON).

libpq can neither read postgresql.conf nor does it have the liberty to write
messages anywhere.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-08-25 12:11:13
Message-ID: 20060825121112.GM6550@merkur.hilbert.loc

On Fri, Aug 25, 2006 at 01:53:30PM +0200, Peter Eisentraut wrote:

> > In that case I would suggest to also emit a suitable warning
> > (with a postgresql.conf option to switch that off which
> > defaults to ON).
>
> libpq can neither read postgresql.conf nor does it have the liberty to write
> messages anywhere.
LOL, duh, of course. Don't know how I got that idea.

Karsten
--
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 15:07:03
Message-ID: 200608251707.04128.peter_e@gmx.net

On Thursday, 24 August 2006 00:52, Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of letting
> it default to equal the server_encoding.

I got started on this and just wanted to post an intermediate patch. I have
taken the logic from initdb and placed it into libpq and refined the API a
bit. At this point, there should be no behavioral change. It remains to
make libpq use this stuff if PGCLIENTENCODING is not set. Unless someone
beats me to it, I'll figure that out later.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Attachment: codeset-refactor.patch.gz (application/x-gzip, 3.0 KB)

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 15:30:40
Message-ID: 20060825153040.GJ16535@svana.org

On Fri, Aug 25, 2006 at 05:07:03PM +0200, Peter Eisentraut wrote:
> I got started on this and just wanted to post an intermediate patch. I have
> taken the logic from initdb and placed it into libpq and refined the API a
> bit. At this point, there should be no behavioral change. It remains to
> make libpq use this stuff if PGCLIENTENCODING is not set. Unless someone
> beats me to it, I'll figure that out later.

Umm, why export all these functions? For starters, does this even need
to be in libpq? I wouldn't have thought so the first time round,
especially not three functions. The only thing you need is to take a
locale name and return the charset you can pass to PQsetClientEncoding.

In fact, the only thing you need is PQsetClientEncodingFromLocale(),
anything else is just sugar. Why would the user care about what the OS
calls it? We have a "pg_enc" enum, so let's use it.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 15:38:20
Message-ID: 200608251738.21300.peter_e@gmx.net

On Friday, 25 August 2006 17:30, Martijn van Oosterhout wrote:
> Umm, why export all these functions? For starters, does this even need
> to be in libpq?

Where else would you put it?

> In fact, the only thing you need is PQsetClientEncodingFromLocale(),
> anything else is just sugar. Why would the user care about what the OS
> calls it? We have a "pg_enc" enum, so let's use it.

initdb has different requirements. Let me know if you have a different way to
refactor it that satisfies initdb.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 15:50:00
Message-ID: 20060825155000.GK16535@svana.org

On Fri, Aug 25, 2006 at 05:38:20PM +0200, Peter Eisentraut wrote:
> > In fact, the only thing you need is PQsetClientEncodingFromLocale(),
> > anything else is just sugar. Why would the user care about what the OS
> > calls it? We have a "pg_enc" enum, so let's use it.
>
> initdb has different requirements. Let me know if you have a different way to
> refactor it that satisfies initdb.

Well, check_encodings_match(pg_enc,ctype) is simply a short way of
saying: if(find_matching_encoding(ctype) != pg_enc ) { error }.
And get_encoding_from_locale() is not used outside of those functions.

So the only thing initdb actually needs is an implementation of
find_matching_encoding(ctype), which returns a value of "enum pg_enc".
check_encodings_match() stays in initdb, and get_encoding_from_locale()
becomes internal to libpq.
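
As a rough illustration of that split (a Python rendition for readability,
not the actual C code from initdb or the patch), the three functions could
relate as follows. The lookup table and the way the codeset is pulled out of
the locale name are toy assumptions. The real code would ask the C library
for the codeset instead of parsing the name.

```python
from typing import Optional

# Toy stand-in for the real codeset table; entries are illustrative only.
_CODESET_TO_PG = {"UTF8": "UTF8", "ISO88591": "LATIN1"}


def _get_encoding_from_locale(ctype: str) -> Optional[str]:
    """Internal helper: normalised codeset of a locale name, or None.

    Toy version: it only looks at the part after the dot in names such as
    "de_DE.iso88591" and ignores any "@modifier" suffix.
    """
    _, dot, codeset = ctype.partition("@")[0].partition(".")
    if not dot:
        return None
    return "".join(ch for ch in codeset.upper() if ch.isalnum())


def find_matching_encoding(ctype: str) -> Optional[str]:
    """The one function initdb needs: locale name -> encoding, or None."""
    codeset = _get_encoding_from_locale(ctype)
    return _CODESET_TO_PG.get(codeset) if codeset is not None else None


def check_encodings_match(pg_enc: str, ctype: str) -> bool:
    """Stays in initdb: a thin wrapper around find_matching_encoding()."""
    return find_matching_encoding(ctype) == pg_enc
```

Here check_encodings_match("UTF8", "de_DE.iso88591") is false, which is the
point at which initdb would report an error.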

How does that sound?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 16:10:52
Message-ID: 2051.1156522252@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Friday, 25 August 2006 17:30, Martijn van Oosterhout wrote:
>> Umm, why export all these functions? For starters, does this even need
>> to be in libpq?

> Where else would you put it?
> ...
> initdb has different requirements. Let me know if you have a different way to
> refactor it that satisfies initdb.

Um, but initdb doesn't use libpq, so it's going to need its own copy
anyway. I agree with Martijn that putting these into libpq's API
seems like useless clutter.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 18:13:39
Message-ID: 200608252013.40132.peter_e@gmx.net

Tom Lane wrote:
> Um, but initdb doesn't use libpq, so it's going to need its own copy
> anyway.

initdb certainly links against libpq.

> I agree with Martijn that putting these into libpq's API
> seems like useless clutter.

Where else to put it? We need it in libpq anyway if we want this
behavior in all client applications (by default).

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 18:30:19
Message-ID: 4047.1156530619@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Tom Lane wrote:
>> I agree with Martijn that putting these into libpq's API
>> seems like useless clutter.

> Where else to put it? We need it in libpq anyway if we want this
> behavior in all client applications (by default).

Having the code in libpq doesn't necessarily mean exposing it to the
outside world. I can't see a reason for these to be in the API at all.

Possibly we could avoid the duplication-of-source-code issue by putting
the code in libpgport, or someplace, whence both initdb and libpq could
get at it?

regards, tom lane


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 18:37:11
Message-ID: 20060825183711.GL16535@svana.org

On Fri, Aug 25, 2006 at 08:13:39PM +0200, Peter Eisentraut wrote:
> > I agree with Martijn that putting these into libpq's API
> > seems like useless clutter.
>
> Where else to put it? We need it in libpq anyway if we want this
> behavior in all client applications (by default).

Is that so? I thought we were only talking about psql. Even then, I'm
wondering if we should alter the current behaviour at all if stdout is
not a tty (i.e. run as a pipe).

And as a counter-example: pg_dump should absolutely not use the client
locale, it should always dump as the same encoding as the server...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 18:43:34
Message-ID: 4361.1156531414@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> And as a counter-example: pg_dump should absolutely not use the client
> locale, it should always dump as the same encoding as the server...

Sure, but pg_dump should set that explicitly. I'm prepared to believe
that looking at the locale is sane for all normal clients.

It might be worth providing a way to set the client_encoding through a
PQconnectdb connection-string keyword, just in case the override-via-
PGCLIENTENCODING dodge doesn't suit someone. The priority order
would presumably be connection string, then PGCLIENTENCODING, then
locale.
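
The proposed priority order is easy to sketch. The Python below is an
illustration only, not libpq code. The function name is invented, and the
connection-string keyword is called "client_encoding" here purely as an
assumption, since the message above only proposes that such a keyword could
exist.

```python
from typing import Mapping, Optional


def resolve_client_encoding(conn_params: Mapping[str, str],
                            environ: Mapping[str, str],
                            locale_encoding: Optional[str]) -> Optional[str]:
    """Return the first encoding that is set, in the proposed priority order.

    A result of None means nothing was specified or deduced, so the caller
    would keep following the server encoding.
    """
    candidates = (
        conn_params.get("client_encoding"),  # 1. connection string (assumed keyword)
        environ.get("PGCLIENTENCODING"),     # 2. environment variable
        locale_encoding,                     # 3. deduced from the locale
    )
    for value in candidates:
        if value:
            return value
    return None
```

A tool like pg_dump would then pass its choice explicitly as the first
argument and never reach the locale-based default.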

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] invalid byte sequence ?
Date: 2006-08-25 18:53:59
Message-ID: 20060825185359.GN14622@alvh.no-ip.org

Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > And as a counter-example: pg_dump should absolutely not use the client
> > locale, it should always dump as the same encoding as the server...
>
> Sure, but pg_dump should set that explicitly. I'm prepared to believe
> that looking at the locale is sane for all normal clients.

What are "normal clients"? I would think that programs in PHP or Perl
have their own idea of the correct encoding (JDBC already has one).

> It might be worth providing a way to set the client_encoding through a
> PQconnectdb connection-string keyword, just in case the override-via-
> PGCLIENTENCODING dodge doesn't suit someone. The priority order
> would presumably be connection string, then PGCLIENTENCODING, then
> locale.

This sounds like a good idea anyway...

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence ?
Date: 2006-09-02 22:55:24
Message-ID: 200609022255.k82MtOX20998@momjian.us


Is this being done?

---------------------------------------------------------------------------

Karsten Hilbert wrote:
> On Thu, Aug 24, 2006 at 01:17:49PM -0400, Tom Lane wrote:
>
> > I guess the key point might be "what do we do if the client locale
> > is C?" Perhaps if it's C, we continue to use the server encoding
> > as we have in the past. This would be a reasonable fallback in
> > other cases where we fail to deduce an encoding from the locale, too.
>
> In that case I would suggest to also emit a suitable warning
> (with a postgresql.conf option to switch that off which
> defaults to ON).
>
> Karsten
> --
> GPG key ID E4071346 @ wwwkeys.pgp.net
> E167 67FD A291 2BEA 73BD 4537 78B9 A9F9 E407 1346

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +