Re: prevent encoding conversion recursive error

From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-patches(at)postgresql(dot)org
Subject: prevent encoding conversion recursive error
Date: 2005-08-04 08:53:51
Message-ID: dcsl7c$284i$1@news.hub.org
Lists: pgsql-hackers pgsql-patches

As this thread reports (some server messages are in Chinese):

http://archives.postgresql.org/pgsql-bugs/2005-07/msg00247.php

a SQL grammar error could crash the error stack.

My explanation is that "select;" incurs a parse error, and this error
message is supposed to be translated into your encoding. Unfortunately, not
every UTF8 character can necessarily be encoded in GB18030, which will cause
infinitely recursive elog calls, like this:

1: elog(parse_error)                 // contains unencodable characters
2:   elog(report_not_translatable)   // contains unencodable characters again
3:     elog(report_report_not_translatable)
4:       elog(report_report_report_not_translatable)
5:         ...

and corrupt the elog stack.

To fix this, we just change errmsg() to errmsg_internal() to avoid
translation, which stops the recursion at step 2.
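
To make the recursion concrete, here is a minimal sketch of the mechanism described above. It is a toy model, not PostgreSQL code: `toy_translate`, `toy_convertible`, `toy_report`, and `MAX_DEPTH` are hypothetical stand-ins for gettext(), the client-encoding conversion, ereport(), and the error stack limit, and it assumes every translated message contains at least one unconvertible byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 8             /* arbitrary cutoff so the demo terminates */

static int deepest = 0;         /* deepest nesting level reached */

/* Stand-in for gettext(): the "translation" contains non-ASCII bytes. */
static const char *toy_translate(const char *msg)
{
    (void) msg;
    return "\xe5\xbf\xbd unconvertible character";
}

/* Stand-in for client-encoding conversion: only ASCII is convertible. */
static bool toy_convertible(const char *s)
{
    for (; *s; s++)
        if ((unsigned char) *s >= 0x80)
            return false;
    return true;
}

/*
 * Stand-in for ereport(). The top-level message (depth 1) is always
 * translated, like errmsg(). The conversion warning is translated only
 * if translate_warning is true (errmsg() vs. errmsg_internal()).
 */
static void toy_report(const char *msg, bool translate_warning, int depth)
{
    const char *text;

    assert(msg != NULL);
    text = (depth == 1 || translate_warning) ? toy_translate(msg) : msg;

    if (depth > deepest)
        deepest = depth;
    if (depth >= MAX_DEPTH)
        return;                 /* a real server would overflow its stack */
    if (!toy_convertible(text))
        toy_report("ignoring unconvertible character",
                   translate_warning, depth + 1);
}

/* Returns the deepest nesting level reached for one top-level error. */
int toy_max_depth(bool translate_warning)
{
    deepest = 0;
    toy_report("syntax error", translate_warning, 1);
    return deepest;
}
```

In this model, `toy_max_depth(true)` runs until the artificial cutoff, while `toy_max_depth(false)` stops at level 2, which corresponds to step 2 in the trace above.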

Regards,
Qingqing

---

Index: backend/utils/mb/conv.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/utils/mb/conv.c,v
retrieving revision 1.53
diff -c -r1.53 conv.c
*** backend/utils/mb/conv.c 15 Jun 2005 00:15:08 -0000 1.53
--- backend/utils/mb/conv.c 4 Aug 2005 08:33:57 -0000
***************
*** 380,386 ****
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                          errmsg("ignoring unconvertible UTF8 character 0x%04x",
                                  iutf)));
                  continue;
              }
--- 380,386 ----
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                          errmsg_internal("ignoring unconvertible UTF8 character 0x%04x",
                                  iutf)));
                  continue;
              }
***************
*** 449,455 ****
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                          errmsg("ignoring unconvertible %s character 0x%04x",
                                  (&pg_enc2name_tbl[encoding])->name, iiso)));
                  continue;
              }
--- 449,455 ----
              {
                  ereport(WARNING,
                          (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                          errmsg_internal("ignoring unconvertible %s character 0x%04x",
                                  (&pg_enc2name_tbl[encoding])->name, iiso)));
                  continue;
              }


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-04 14:46:24
Message-ID: 15716.1123166784@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> As this thread reports (some server messages are in Chinese):
> http://archives.postgresql.org/pgsql-bugs/2005-07/msg00247.php
> a SQL grammar error could crash the error stack.

Hmm, I thought we had fixed that years ago.

> To fix this, we just change errmsg() to errmsg_internal() to avoid
> translation, which stops the recursion at step 2.

This is a really ugly solution ... and I don't think it solves the
general problem anyway, since this isn't the only possible error message.

I don't seem to have gotten the original problem report, and the archive
page is pretty useless because all the non-ASCII characters have gotten
changed to "?". Could you post a self-contained example case? Might be
best to wrap it as a compressed attachment so it doesn't get munged in
transmission.

regards, tom lane


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-05 01:57:23
Message-ID: dcuh6e$18ea$1@news.hub.org
Lists: pgsql-hackers pgsql-patches


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes
>
> This is a really ugly solution ... and I don't think it solves the
> general problem anyway, since this isn't the only possible error message.
>

Yeah, it is not a very clean solution. Do you mean the general problem is
"prevent recursive error reporting caused by an error in translating the
error message"?

I put the image of the reporting email here:
http://www.cs.toronto.edu/~zhouqq/encode.jpg

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-patches(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-09 02:21:28
Message-ID: 11386.1123554088@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

"Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> writes:
> Yeah, it is not a very clean solution. Do you mean the general problem is
> "prevent recursive error reporting caused by an error in translating the
> error message"?

> I put the image of the reporting email here:
> http://www.cs.toronto.edu/~zhouqq/encode.jpg

Actually, I believe the general problem is that the gettext software
is doing the wrong internal character-set conversion for translated
message texts.

I can get this same crash on a Linux machine if I have server encoding
= utf8 and client encoding = gb18030 and I set lc_messages = zh_TW
... but if I instead make lc_messages = zh_CN, no problem. The backend
zh_TW.po file contains

msgid "ignoring unconvertible UTF-8 character 0x%04x"
msgstr "UTF-80x%04x"

and if I read the header correctly, this is claimed to be in UTF8
encoding. So it ought to be delivered as-is when in a UTF8 database.
But tracing through the failure with gdb, I see that what is actually
delivered back from gettext() is

(gdb) p str
$1 = 0x82e8a74 "???UTF-80xd4da"
(gdb) x/32cx str
0x82e8a74: 0xba 0xf6 0xc2 0xd4 0x3f 0xb7 0xa8 0x3f
0x82e8a7c: 0x3f 0xb5 0xc4 0x55 0x54 0x46 0x2d 0x38
0x82e8a84: 0xd7 0xd6 0xd4 0xaa 0x30 0x78 0x64 0x34
0x82e8a8c: 0x64 0x61 0x00 0x7e 0x7f 0x7f 0x7f 0x7f
(gdb)

so some sort of conversion has taken place. I had initially initialized
the database with initdb --locale=zh_CN, which was interpreted by
Postgres as requesting EUC_CN encoding. I suspect the above is the
EUC_CN equivalent of the message text from the .po file, and that the
real problem is that gettext() has not been told the correct character
set to convert messages to.
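
As a rough cross-check of that suspicion, the following sketch tests the dumped bytes structurally. It assumes nothing beyond the gdb output above. The function names are made up for this illustration, and the checks are deliberately loose: they look only at byte patterns, so passing the EUC-style test is consistent with EUC_CN but does not prove it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bytes copied from the gdb dump above, up to (not including) the NUL. */
static const unsigned char gettext_result[] = {
    0xba, 0xf6, 0xc2, 0xd4, 0x3f, 0xb7, 0xa8, 0x3f,
    0x3f, 0xb5, 0xc4, 0x55, 0x54, 0x46, 0x2d, 0x38,
    0xd7, 0xd6, 0xd4, 0xaa, 0x30, 0x78, 0x64, 0x34,
    0x64, 0x61
};

/*
 * Structural UTF-8 check: each lead byte must be followed by the right
 * number of 10xxxxxx continuation bytes. Overlong forms and surrogates
 * are not rejected; this is only a plausibility test.
 */
bool looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len)
    {
        unsigned char c = s[i];
        size_t follow;

        if (c < 0x80)
            follow = 0;
        else if ((c & 0xe0) == 0xc0)
            follow = 1;
        else if ((c & 0xf0) == 0xe0)
            follow = 2;
        else if ((c & 0xf8) == 0xf0)
            follow = 3;
        else
            return false;       /* stray continuation or invalid lead byte */

        if (follow > len - i - 1)
            return false;       /* sequence runs past the end */
        for (size_t k = 1; k <= follow; k++)
            if ((s[i + k] & 0xc0) != 0x80)
                return false;
        i += follow + 1;
    }
    return true;
}

/* EUC-style check: every high byte starts a pair in the range 0xA1..0xFE. */
bool looks_like_euc_pairs(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL);
    while (i < len)
    {
        if (s[i] < 0x80)
        {
            i++;
            continue;
        }
        if (i + 1 >= len)
            return false;
        if (s[i] < 0xa1 || s[i] > 0xfe || s[i + 1] < 0xa1 || s[i + 1] > 0xfe)
            return false;
        i += 2;
    }
    return true;
}
```

The dump fails the UTF-8 test at its very first byte (0xba is a continuation byte, not a lead byte) and passes the two-byte pair test, which fits the idea that the string was converted away from UTF8.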

ISTM we've seen this issue before and Peter had an idea how to fix it,
but I forget the details. Peter?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-09 02:34:14
Message-ID: 11458.1123554854@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

I wrote:
> ...real problem is that gettext() has not been told the correct character
> set to convert messages to.

> ISTM we've seen this issue before and Peter had an idea how to fix it,
> but I forget the details. Peter?

A little bit of digging in the list archives located
http://archives.postgresql.org/pgsql-hackers/2003-11/msg01299.php
in which Peter opines

: - lc_collate and lc_ctype need to be held fixed in the entire cluster.
:
: - Gettext relies on iconv character set conversion, which relies on
: lc_ctype, which leads to a complete screw-up in the server because of
: the previous item.

which seems to fit with my observation: the message texts are being
converted to the cluster's original encoding rather than the encoding
that's active in the current database.

This does not look real easy to fix. Who's up for reimplementing
gettext and a few other pieces from scratch?

There is a separate line of thought here, which is that we are unlikely
ever to get this completely perfect, and so it'd be good if errors
during error processing didn't lead to recursion and PANIC. I don't
have an idea how to solve that one either.
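
For illustration only, one common shape for such protection is a re-entrancy counter that switches nested reports to a plain, untranslated path. The sketch below is a toy under that assumption; `toy_ereport`, `toy_report_once`, and the counters are hypothetical names, and nothing here is taken from elog.c or proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

static int report_depth = 0;    /* how many toy_ereport() calls are active */
static int raw_emits = 0;       /* messages sent via the plain fallback path */

/*
 * Hypothetical reporting routine. The outermost call takes the normal
 * path, simulated here as always failing and reporting again. Any
 * nested call skips translation and conversion entirely.
 */
void toy_ereport(const char *msg)
{
    assert(msg != NULL);
    report_depth++;
    if (report_depth > 1)
        raw_emits++;            /* would write the untranslated text as-is */
    else
        toy_ereport("conversion failed while reporting an error");
    report_depth--;
}

/* Report one error and return how many fallback messages it produced. */
int toy_report_once(const char *msg)
{
    raw_emits = 0;
    toy_ereport(msg);
    return raw_emits;
}
```

With this guard, one failing report produces exactly one fallback message and the nesting never goes past two levels. The cost is that the nested message reaches the client untranslated.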

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-09 02:51:27
Message-ID: 11566.1123555887@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

I wrote:
> This does not look real easy to fix. Who's up for reimplementing
> gettext and a few other pieces from scratch?

However, I did find

http://gnu.miscellaneousmirror.org/software/libc/manual/html_node/Charset-conversion-in-gettext.html#Charset-conversion-in-gettext

which leads to the question "why aren't we using
bind_textdomain_codeset() to tell gettext what character set it should
produce"?
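
For reference, a call of the kind being asked about might look like the sketch below. `bind_textdomain_codeset()` is the real libintl function; the wrapper `request_message_codeset` is a hypothetical helper, and the domain and codeset names passed to it are up to the caller. Whether a given C library honors the request varies by platform, so the return value is checked rather than assumed.

```c
#include <assert.h>
#include <libintl.h>
#include <stddef.h>
#include <string.h>

/*
 * Ask gettext to return messages of `domain` in `codeset`.
 * Returns 1 if the library reports that codeset as selected, else 0.
 */
int request_message_codeset(const char *domain, const char *codeset)
{
    const char *selected;

    assert(domain != NULL && codeset != NULL);
    selected = bind_textdomain_codeset(domain, codeset);
    return selected != NULL && strcmp(selected, codeset) == 0;
}
```

A caller would invoke something like `request_message_codeset("postgres", "UTF-8")` once the database encoding is known. Both the domain name and the codeset name in that call are assumptions made for this example.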

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: [PATCHES] prevent encoding conversion recursive error
Date: 2005-08-13 02:37:36
Message-ID: 200508130237.j7D2baF02009@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


Any comments on this idea?

---------------------------------------------------------------------------

Tom Lane wrote:
> I wrote:
> > This does not look real easy to fix. Who's up for reimplementing
> > gettext and a few other pieces from scratch?
>
> However, I did find
>
> http://gnu.miscellaneousmirror.org/software/libc/manual/html_node/Charset-conversion-in-gettext.html#Charset-conversion-in-gettext
>
> which leads to the question "why aren't we using
> bind_textdomain_codeset() to tell gettext what character set it should
> produce"?
>
> regards, tom lane
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-14 21:03:56
Message-ID: 200508142303.57215.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

On Tuesday, 9 August 2005 at 04:51, Tom Lane wrote:
> which leads to the question "why aren't we using
> bind_textdomain_codeset() to tell gettext what character set it should
> produce"?

That would probably require us to solve the question on how to translate
PostgreSQL encoding names to OS encoding names.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-14 21:48:14
Message-ID: 3944.1124056094@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On Tuesday, 9 August 2005 at 04:51, Tom Lane wrote:
>> which leads to the question "why aren't we using
>> bind_textdomain_codeset() to tell gettext what character set it should
>> produce"?

> That would probably require us to solve the question on how to translate
> PostgreSQL encoding names to OS encoding names.

Yeah, but don't we already have some code for that (or, actually, the
reverse direction) in initdb? It's probably not perfect, but it'd be
a lot better than crashing.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-14 21:55:20
Message-ID: 200508142355.21440.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

On Sunday, 14 August 2005 at 23:48, Tom Lane wrote:
> Yeah, but don't we already have some code for that (or, actually, the
> reverse direction) in initdb? It's probably not perfect, but it'd be
> a lot better than crashing.

The reverse direction is a lot simpler because we know the set of possible
output values. I'm not sure how to do the mapping in the direction of the
OS.
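
To show what a mapping toward the OS could look like in its simplest form, here is a hand-maintained table sketch. The table contents and the names `enc_table` and `os_codeset_for` are assumptions made for this example, not existing PostgreSQL code. Whether a given platform's iconv accepts the right-hand names is exactly the open question, which is why unknown encodings return NULL instead of a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per encoding: PostgreSQL name and an assumed OS/iconv name. */
struct enc_map
{
    const char *pg_name;
    const char *os_name;
};

/* Illustrative entries only; a real table would need per-platform review. */
static const struct enc_map enc_table[] = {
    {"UTF8", "UTF-8"},
    {"EUC_CN", "EUC-CN"},
    {"EUC_JP", "EUC-JP"},
    {"LATIN1", "ISO-8859-1"},
    {"GB18030", "GB18030"},
};

/* Returns the assumed OS codeset name, or NULL if the encoding is unknown. */
const char *os_codeset_for(const char *pg_name)
{
    assert(pg_name != NULL);
    for (size_t i = 0; i < sizeof enc_table / sizeof enc_table[0]; i++)
        if (strcmp(enc_table[i].pg_name, pg_name) == 0)
            return enc_table[i].os_name;
    return NULL;                /* caller should leave gettext's codeset alone */
}
```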


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-20 23:09:20
Message-ID: 200508202309.j7KN9KX09278@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut wrote:
> On Sunday, 14 August 2005 at 23:48, Tom Lane wrote:
> > Yeah, but don't we already have some code for that (or, actually, the
> > reverse direction) in initdb? It's probably not perfect, but it'd be
> > a lot better than crashing.
>
> The reverse direction is a lot simpler because we know the set of possible
> output values. I'm not sure how to do the mapping in the direction of the
> OS.

Is there a TODO here?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-08-20 23:48:49
Message-ID: 17347.1124581729@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Is there a TODO here?

Yeah:

* Fix problems with wrong runtime encoding conversion for NLS message files

One thing that occurred to me is that we might be able to simplify the
problem by adopting a project standard that all NLS message files shall
be in UTF8, period. Then we only have one encoding name to figure out
rather than N. Maybe this doesn't help much ...

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-09-01 15:48:55
Message-ID: 200509011748.56565.peter_e@gmx.net
Lists: pgsql-hackers pgsql-patches

On Sunday, 21 August 2005 at 01:48, Tom Lane wrote:
> One thing that occurred to me is that we might be able to simplify the
> problem by adopting a project standard that all NLS message files shall
> be in UTF8, period. Then we only have one encoding name to figure out
> rather than N. Maybe this doesn't help much ...

I suppose this would then break NLS on Windows, no?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-patches(at)postgresql(dot)org
Subject: Re: prevent encoding conversion recursive error
Date: 2005-09-01 16:46:16
Message-ID: 20050901164616.GA30081@surnet.cl
Lists: pgsql-hackers pgsql-patches

On Thu, Sep 01, 2005 at 05:48:55PM +0200, Peter Eisentraut wrote:
> On Sunday, 21 August 2005 at 01:48, Tom Lane wrote:
> > One thing that occurred to me is that we might be able to simplify the
> > problem by adopting a project standard that all NLS message files shall
> > be in UTF8, period. Then we only have one encoding name to figure out
> > rather than N. Maybe this doesn't help much ...
>
> I suppose this would then break NLS on Windows, no?

We now have a patch to handle UTF-8 on Windows, via recoding to UTF-16
and back, so I guess not (not sure though). It would manage to annoy me
as a translator, but nothing too serious really.

--
Alvaro Herrera -- Valdivia, Chile Architect, www.EnterpriseDB.com
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)