Composite Type with Domain

Lists: pgsql-bugs
From: 维 姜 <jw(dot)pgsql(at)sduept(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Composite Type with Domain
Date: 2006-04-04 05:36:43
Message-ID: 1144129003.6769.8.camel@dell.sduept.com

# pg8.1.3

=> CREATE DOMAIN d_1 integer CHECK (VALUE < 10);
=> CREATE TYPE t_1 AS (m d_1);
=> SELECT '(100)':: t_1;
t_1
-------
(100)
(1 row)

=> SELECT row(100):: t_1;
错误: 域 d_1 的值违反了检查约束 "d_1_check"
(in English: ERROR: value for domain d_1 violates check constraint "d_1_check")

=> \encoding ISO_8859_1
=> SELECT row(100):: t_1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: 维 姜 <jw(dot)pgsql(at)sduept(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Composite Type with Domain
Date: 2006-04-04 05:46:53
Message-ID: 13353.1144129613@sss.pgh.pa.us

维 姜 <jw(dot)pgsql(at)sduept(dot)com> writes:
> => \encoding ISO_8859_1
> => SELECT row(100):: t_1;
> server closed the connection unexpectedly

Works for me:

regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"
regression=# \encoding ISO_8859_1
regression=# SELECT row(100):: t_1;
ERROR: value for domain d_1 violates check constraint "d_1_check"

Please provide more details, like your locale and encoding settings.

regards, tom lane


From: JiangWei <jw(dot)pgsql(at)sduept(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Composite Type with Domain
Date: 2006-04-04 13:52:08
Message-ID: 1144158728.3577.13.camel@fedora.sduept.com

* BUG #1:

=> SELECT '(100)':: t_1;
t_1
-------
(100)
(1 row)

-------------------------------------------------------------------

* BUG #2:



=> \encoding
UTF8
=> show server_encoding ;
server_encoding
-----------------
UTF8




=> \encoding ISO_8859_1
=> SELECT row(100):: t_1;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed

[jw(at)dell ~]$ locale
LANG=zh_CN.UTF-8
LC_CTYPE="zh_CN.UTF-8"
LC_NUMERIC="zh_CN.UTF-8"
LC_TIME="zh_CN.UTF-8"
LC_COLLATE="zh_CN.UTF-8"
LC_MONETARY="zh_CN.UTF-8"
LC_MESSAGES="zh_CN.UTF-8"
LC_PAPER="zh_CN.UTF-8"
LC_NAME="zh_CN.UTF-8"
LC_ADDRESS="zh_CN.UTF-8"
LC_TELEPHONE="zh_CN.UTF-8"
LC_MEASUREMENT="zh_CN.UTF-8"
LC_IDENTIFICATION="zh_CN.UTF-8"
LC_ALL=

On Tue, 2006-04-04 at 01:46 -0400, Tom Lane wrote:
> 维 姜 <jw(dot)pgsql(at)sduept(dot)com> writes:
> > => \encoding ISO_8859_1
> > => SELECT row(100):: t_1;
> > server closed the connection unexpectedly
>
> Works for me:
>
> regression=# SELECT row(100):: t_1;
> ERROR: value for domain d_1 violates check constraint "d_1_check"
> regression=# \encoding ISO_8859_1
> regression=# SELECT row(100):: t_1;
> ERROR: value for domain d_1 violates check constraint "d_1_check"
>
> Please provide more details, like your locale and encoding settings.
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: JiangWei <jw(dot)pgsql(at)sduept(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: NLS vs error processing, again (was Re: Composite Type with Domain)
Date: 2006-04-04 14:41:13
Message-ID: 18913.1144161673@sss.pgh.pa.us

JiangWei <jw(dot)pgsql(at)sduept(dot)com> writes:
> LANG=zh_CN.UTF-8
> [ set client_encoding to LATIN1 and provoke an error ]

OK, I can reproduce the crash after initdb'ing with that LANG setting
(in an nls-enabled build). The postmaster log fills with a whole lot
of occurrences of

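[zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e9
[zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
[zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
[zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
[zh_CN text]: ERRORDATA_STACK_SIZE exceeded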

Tracing through the dump shows that the error-handling code is
recursively producing this warning while trying to translate the word
WARNING to LATIN1. The zh_CN.po file shows the translation as

#: utils/error/elog.c:1909
msgid "WARNING"
msgstr "警告"

(which apparently is GB2312?) and what's actually getting passed to
utf8_to_iso8859_1() is

(gdb) x/6o str
0x8b89d8: 0350 0255 0246 0345 0221 0212

I have no idea if this is a correct UTF8 transliteration of the GB2312
phrase --- can anyone confirm? But anyway, if this is Chinese then it's
hardly surprising that there would be no LATIN1 equivalent. And then
trying to report the problem gets us into a new instance of the same
problem. Even the code that's supposed to stop error recursion doesn't
get us out of it.
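
The six octal bytes in the gdb dump above can be checked outside the
server. The following is a minimal Python sketch, not PostgreSQL code:
it decodes the bytes as UTF-8 and then attempts the same kind of
conversion to LATIN1 that fails in the backend. The variable names are
invented for illustration, and Python's "gb2312" codec is assumed to
stand in for EUC-CN.

```python
# Bytes from the gdb dump: 0350 0255 0246 0345 0221 0212 (octal).
raw = bytes([0o350, 0o255, 0o246, 0o345, 0o221, 0o212])

# They form valid UTF-8: two three-byte sequences.
text = raw.decode("utf-8")
codepoints = [hex(ord(ch)) for ch in text]   # U+8B66 and U+544A

# Both characters exist in GB2312 (two bytes each in EUC-CN).
euc_cn = text.encode("gb2312")

# Neither character exists in ISO 8859-1, so this conversion must fail.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

print(text, codepoints, len(euc_cn), latin1_ok)
```

The decoded string is the two-character Chinese word for "warning", so
the UTF-8 bytes look plausible, and the LATIN1 conversion has nothing
to map them to.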

It seems to me that there basically is no graceful solution to this sort
of mismatch. It might be possible to kluge things so that we disable
NLS once we've recursed too many times in error processing, but that's
surely pretty ugly. What would be a lot more user-friendly would be to
refuse the attempt to set client_encoding to something that can't handle
our error message encoding, but I don't know what a reasonable set of
restrictions would be.
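
The "disable NLS after too many recursions" idea can be modelled in a
few lines. This is a hypothetical Python sketch, not the logic in
elog.c: translate(), report(), MAX_DEPTH and the one-entry catalog are
all invented names, and the real backend's recursion handling differs
in detail.

```python
CATALOG = {"WARNING": "\u8b66\u544a"}   # hypothetical zh_CN translation
MAX_DEPTH = 3                            # hypothetical recursion limit


def translate(msgid, nls_enabled):
    """Return the translated string, or the ASCII msgid when NLS is off."""
    return CATALOG.get(msgid, msgid) if nls_enabled else msgid


def report(severity, detail, client_encoding, depth=0):
    """Encode 'severity: detail' for the client.

    If the translated text cannot be converted, report that failure as a
    new warning. Once depth exceeds MAX_DEPTH, stop translating so the
    severity word stays plain ASCII and can be encoded.
    """
    nls_enabled = depth <= MAX_DEPTH
    line = f"{translate(severity, nls_enabled)}: {detail}"
    try:
        return line.encode(client_encoding)
    except UnicodeEncodeError:
        return report("WARNING",
                      "character has no equivalent in " + client_encoding,
                      client_encoding, depth + 1)


result = report("WARNING", "test", "latin-1")
print(result)
```

In this model every attempt to report the conversion failure hits the
same failure again until translation is switched off. The message that
finally gets through is the untranslated conversion warning, and the
original detail is lost along the way.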

Comments?

regards, tom lane


From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: JiangWei <jw(dot)pgsql(at)sduept(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again (was Re: Composite Type
Date: 2006-04-05 02:10:35
Message-ID: 4433271B.5090204@timbira.com

Tom Lane wrote:

>It seems to me that there basically is no graceful solution to this sort
>of mismatch. It might be possible to kluge things so that we disable
>NLS once we've recursed too many times in error processing, but that's
>surely pretty ugly. What would be a lot more user-friendly would be to
>refuse the attempt to set client_encoding to something that can't handle
>our error message encoding, but I don't know what a reasonable set of
>restrictions would be.
>
>
>
Maybe it's time to convert all PO files to UTF-8. I'm in the process of
converting the pt_BR ones.

--
Euler Taveira de Oliveira


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: JiangWei <jw(dot)pgsql(at)sduept(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again (was Re: Composite Type with Domain)
Date: 2006-04-05 02:44:23
Message-ID: 23106.1144205063@sss.pgh.pa.us

Euler Taveira de Oliveira <euler(at)timbira(dot)com> writes:
> Tom Lane wrote:
>> It seems to me that there basically is no graceful solution to this sort
>> of mismatch. It might be possible to kluge things so that we disable
>> NLS once we've recursed too many times in error processing, but that's
>> surely pretty ugly. What would be a lot more user-friendly would be to
>> refuse the attempt to set client_encoding to something that can't handle
>> our error message encoding, but I don't know what a reasonable set of
>> restrictions would be.

> Maybe it's time to convert all PO files to UTF-8. I'm in the process of
> converting the pt_BR ones.

What does that have to do with it?

regards, tom lane


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: jw(dot)pgsql(at)sduept(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again
Date: 2006-04-05 03:20:47
Message-ID: 20060405.122047.98856262.t-ishii@sraoss.co.jp

> JiangWei <jw(dot)pgsql(at)sduept(dot)com> writes:
> > LANG=zh_CN.UTF-8
> > [ set client_encoding to LATIN1 and provoke an error ]
>
> OK, I can reproduce the crash after initdb'ing with that LANG setting
> (in an nls-enabled build). The postmaster log fills with a whole lot
> of occurrences of
>
> [zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e9
> [zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
> [zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
> [zh_CN text]: [zh_CN text] UTF-8 [zh_CN text] 0x00e8
> [zh_CN text]: ERRORDATA_STACK_SIZE exceeded
>
> Tracing through the dump shows that the error-handling code is
> recursively producing this warning while trying to translate the word
> WARNING to LATIN1. The zh_CN.po file shows the translation as
>
> #: utils/error/elog.c:1909
> msgid "WARNING"
> msgstr "警告"
>
> (which apparently is GB2312?)

It seems so. zh_CN.po has the line:

"Content-Type: text/plain; charset=GB2312\n"

This means that whoever wrote the file at least intended it to be
GB2312. However, please note that GB2312 is a character set, not an
encoding. In reality the file seems to be encoded in EUC-CN. Note that
I have confirmed this only by checking that the bytes above (警告) are
valid EUC-CN byte sequences. It is possible that the file is not
written in EUC-CN, but I think that is unlikely.

> and what's actually getting passed to
> utf8_to_iso8859_1() is
>
> (gdb) x/6o str
> 0x8b89d8: 0350 0255 0246 0345 0221 0212
>
> I have no idea if this is a correct UTF8 transliteration of the GB2312
> phrase --- can anyone confirm?

As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
translation above seems to be correct. BTW, who does the translation
from EUC-CN to UTF-8? Maybe gettext()?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> But anyway, if this is Chinese then it's
> hardly surprising that there would be no LATIN1 equivalent. And then
> trying to report the problem gets us into a new instance of the same
> problem. Even the code that's supposed to stop error recursion doesn't
> get us out of it.
>
> It seems to me that there basically is no graceful solution to this sort
> of mismatch. It might be possible to kluge things so that we disable
> NLS once we've recursed too many times in error processing, but that's
> surely pretty ugly. What would be a lot more user-friendly would be to
> refuse the attempt to set client_encoding to something that can't handle
> our error message encoding, but I don't know what a reasonable set of
> restrictions would be.
>
> Comments?
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: jw(dot)pgsql(at)sduept(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again
Date: 2006-04-05 03:57:03
Message-ID: 23732.1144209423@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> As far as I can tell from utils/mb/Unicode/euc_cn_to_utf8.map, the
> translation above seems to be correct. BTW, who does the translation
> from EUC-CN to UTF-8? Maybe gettext()?

I'm far from an expert on this, but the gettext documentation indicates
that it tries to translate the .po file contents into whatever encoding
is implied by LC_CTYPE. The fact that the string passed to
utf8_to_iso8859_1 is not identical to the .po file contents indicates
that gettext is doing *something*. I'm a bit worried that this
translation could be out of step with what we will expect the
server_encoding to be --- but there's not any immediate evidence of
that.

Anyway, the real problem seems to be what to do if translation of an
error message to the client_encoding fails. That's clearly a risk even
if gettext has behaved perfectly.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, JiangWei <jw(dot)pgsql(at)sduept(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: NLS vs error processing, again (was Re: Composite Type
Date: 2006-04-05 12:13:20
Message-ID: 20060405121320.GA6720@surnet.cl

Euler Taveira de Oliveira wrote:
> Tom Lane wrote:
>
> >It seems to me that there basically is no graceful solution to this sort
> >of mismatch. It might be possible to kluge things so that we disable
> >NLS once we've recursed too many times in error processing, but that's
> >surely pretty ugly. What would be a lot more user-friendly would be to
> >refuse the attempt to set client_encoding to something that can't handle
> >our error message encoding, but I don't know what a reasonable set of
> >restrictions would be.
>
> Maybe it's time to convert all PO files to UTF-8. I'm in the process of
> converting the pt_BR ones.

I don't understand what you think would be gained by doing that. If
the message has Chinese characters, a recode from UTF8 to Latin1 is as
bad as one from GB2312 to Latin1.

What needs to be done for this to work is to refuse the attempt to
recode, as Tom proposes above. We would need to determine which recodes
are "safe"; for example, (I think) the valid source encodings for a
conversion to Latin1 (ISO 8859-1) are Latin9 (ISO 8859-15?), Unicode,
Win1252 and ASCII. If the server encoding or the encoding of the message
files is a Chinese encoding, setting client_encoding to Latin1 would
raise an error.

The problem, I think, would be in determining what recodings are sane.
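
One rough way to judge whether a recoding is sane for a given message
catalog is to try it on every translated string ahead of time. A minimal
Python sketch under that assumption follows. can_represent() and the
sample string lists are hypothetical; a real check would have to read
the actual message files and use the server's own conversion tables.

```python
def can_represent(messages, client_encoding):
    """Return True if every message string converts to client_encoding."""
    try:
        for msg in messages:
            msg.encode(client_encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical sample strings, not taken from the real .po files.
zh_cn = ["\u8b66\u544a", "\u9519\u8bef"]          # Chinese severity words
pt_br = ["AVISO", "ERRO", "n\u00e3o encontrado"]  # Portuguese samples

print(can_represent(zh_cn, "latin-1"))   # Chinese text vs. LATIN1
print(can_represent(pt_br, "latin-1"))   # Portuguese text vs. LATIN1
print(can_represent(zh_cn, "gb2312"))    # Chinese text vs. GB2312
```

A per-string test like this depends on the message contents rather than
on the encoding names alone, which is part of why a fixed list of
"safe" encoding pairs is hard to write down.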

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, jw(dot)pgsql(at)sduept(dot)com
Subject: Re: NLS vs error processing, again
Date: 2006-04-05 13:03:54
Message-ID: 200604051503.56309.peter_e@gmx.net

Tom Lane wrote:
> I'm far from an expert on this, but the gettext documentation
> indicates that it tries to translate the .po file contents into
> whatever encoding is implied by LC_CTYPE.

Correct. That is just one more reason to have server encoding,
LC_COLLATE, and LC_CTYPE matching. In practice, there is hardly a
reason to have LC_COLLATE and LC_CTYPE be different, so the problem
should not be that big.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, jw(dot)pgsql(at)sduept(dot)com
Subject: Re: NLS vs error processing, again
Date: 2006-04-20 10:55:48
Message-ID: 200604201055.k3KAtmW06560@candle.pha.pa.us

Peter Eisentraut wrote:
> Tom Lane wrote:
> > I'm far from an expert on this, but the gettext documentation
> > indicates that it tries to translate the .po file contents into
> > whatever encoding is implied by LC_CTYPE.
>
> Correct. That is just one more reason to have server encoding,
> LC_COLLATE, and LC_CTYPE matching. In practice, there is hardly a
> reason to have LC_COLLATE and LC_CTYPE be different, so the problem
> should not be that big.

Is there any TODO here?

--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com
