Re: BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding

From:	"Sagi Bashari" <sagi(at)adamnet(dot)co(dot)il>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date:	2005-12-21 12:18:51
Message-ID:	20051221121851.313B5F0A7F@svr2.postgresql.org
Lists:	pgsql-bugs pgsql-patches

The following bug has been logged online:

Bug reference: 2120
Logged by: Sagi Bashari
Email address: sagi(at)adamnet(dot)co(dot)il
PostgreSQL version: 8.1.1
Operating system: Debian Sarge
Description: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Details:

Postgresql crashes when a client with ISO_8859_8 encoding tries to select
data from a utf8 database.

I have compiled postgresql 8.1.1 from scratch with the following commands:
./configure --prefix=/home/sagi/temp/pgtest --enable-debug
make
make install
mkdir /home/sagi/temp/pgtest/data
/home/sagi/temp/pgtest/bin/initdb -D /home/sagi/temp/pgtest/data/
/home/sagi/temp/pgtest/bin/postmaster -D /home/sagi/temp/pgtest/data

Created a utf8 database:
./createdb test -E utf8

And ran 'SET client_encoding = 'ISO_8859_8'; SELECT '';' inside
`psql test`.

That's ISO_8859_8 Hebrew text inside the SELECT. Here's a file containing
the query (in case the mail breaks it):
http://future.adamnet.co.il/~sagi/temp/enc.sql

Psql returned the following message:
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

The database log:
LOG: server process (PID 1290) was terminated by signal 11

Backtrace:
Core was generated by `postgres: sagi test [local] idle'.
Program terminated with signal 11, Segmentation fault.

#0 0x0823a1a6 in compare2 ()
#1 0x40100b52 in bsearch () from /lib/libc.so.6
#2 0x0823a41c in LocalToUtf ()
#3 0x40df8909 in iso8859_to_utf8 () from /home/sagi/temp/pgtest/lib/postgresql/utf8_and_iso8859.so
#4 0x08233112 in FunctionCall5 ()
#5 0x0823adc8 in perform_default_encoding_conversion ()
#6 0x0823acd3 in pg_client_to_server ()
#7 0x081508d4 in pq_getmsgstring ()
#8 0x081b924e in PostgresMain ()
#9 0x08191b1c in BackendRun ()
#10 0x08191565 in BackendStartup ()
#11 0x0818f7c1 in ServerLoop ()
#12 0x0818eb5a in PostmasterMain ()
#13 0x08150e0e in main ()

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Sagi Bashari" <sagi(at)adamnet(dot)co(dot)il>
Cc:	pgsql-bugs(at)postgresql(dot)org, Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Subject:	Re: BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date:	2005-12-21 23:58:50
Message-ID:	17284.1135209530@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-patches

"Sagi Bashari" <sagi(at)adamnet(dot)co(dot)il> writes:
> Postgresql crashes when a client with ISO_8859_8 encoding tries to select
> data from a utf8 database.

It looks like somebody rearranged the pg_enc enum without bothering to
fix the tables that are affected by this.

utf8_and_iso8859.c is certainly broken, and I'm wondering what else
might be. Tatsuo, can you think of any other places to look?

regards, tom lane
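
To make the failure mode described above concrete, here is a minimal C sketch. It assumes a table that is indexed directly by the encoding enum value, as the pre-patch maps[encoding] lookup in utf8_and_iso8859.c does. Every name in it (demo_enc, demo_conv_map, demo_maps, lookup_by_index, slot_matches) is hypothetical, and the enum is a cut-down stand-in, not the real pg_enc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down enum in its CURRENT order (not the real pg_enc). */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_UTF8,
    DEMO_LATIN1,
    DEMO_ISO_8859_8,
    DEMO_ENC_COUNT
} demo_enc;

typedef struct
{
    demo_enc    encoding;   /* which encoding this slot was written for */
    const char *map_name;   /* stand-in for the conversion map pointer */
} demo_conv_map;

/*
 * Table written for an OLDER enum order, in which ISO_8859_8 came before
 * LATIN1. It was not updated when the enum was rearranged.
 */
static const demo_conv_map demo_maps[] = {
    {DEMO_SQL_ASCII, NULL},
    {DEMO_UTF8, NULL},
    {DEMO_ISO_8859_8, "iso8859_8 map"},
    {DEMO_LATIN1, NULL},
};

/* Direct indexing: returns whatever sits at position enc. */
static const demo_conv_map *
lookup_by_index(demo_enc enc)
{
    assert(enc < DEMO_ENC_COUNT);
    return &demo_maps[enc];
}

/* Returns 1 if the slot found by direct indexing really belongs to enc. */
static int
slot_matches(demo_enc enc)
{
    return lookup_by_index(enc)->encoding == enc;
}
```

In this sketch, lookup_by_index(DEMO_ISO_8859_8) lands on the LATIN1 slot, whose map pointer is NULL. Searching through a NULL map would be consistent with the segfault under bsearch() in the backtrace, although the thread does not say which slot was actually hit.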

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	sagi(at)adamnet(dot)co(dot)il, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Date:	2005-12-22 00:34:51
Message-ID:	20051222.093451.31238525.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

> "Sagi Bashari" <sagi(at)adamnet(dot)co(dot)il> writes:
> > Postgresql crashes when a client with ISO_8859_8 encoding tries to select
> > data from a utf8 database.
>
> It looks like somebody rearranged the pg_enc enum without bothering to
> fix the tables that are affected by this.
>
> utf8_and_iso8859.c is certainly broken, and I'm wondering what else
> might be. Tatsuo, can you think of any other places to look?

I will look into this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc:	sagi(at)adamnet(dot)co(dot)il, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date:	2005-12-22 03:09:01
Message-ID:	18342.1135220941@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-patches

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
>> It looks like somebody rearranged the pg_enc enum without bothering to
>> fix the tables that are affected by this.

> I will look into this.

Thank you. It might be worth adding a comment to pg_wchar.h listing all
the places that need to be fixed when enum pg_enc changes.

regards, tom lane

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, sagi(at)adamnet(dot)co(dot)il, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date:	2005-12-22 03:17:30
Message-ID:	200512220317.jBM3HUR12214@candle.pha.pa.us
Lists:	pgsql-bugs pgsql-patches

Tom Lane wrote:
> Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> >> It looks like somebody rearranged the pg_enc enum without bothering to
> >> fix the tables that are affected by this.
>
> > I will look into this.
>
> Thank you. It might be worth adding a comment to pg_wchar.h listing all
> the places that need to be fixed when enum pg_enc changes.
>

I have developed the following patch against CVS. Tatsuo, you can use
it as a starting point. It adds a comment to encnames.c and reorders
utf8_and_iso8859.c to match the existing order. I also added the
missing entries at the bottom. I checked for pg_conv_map in the source
code and only utf8_and_iso8859.c has that structure, so I assume it is
the only one that also depends on the encnames.c ordering.

Looking at 8.0.X, it has the matching order, so we are OK there, but it
doesn't have the trailing entries. Tatsuo, are those needed?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Attachment	Content-Type	Size
unknown_filename	text/plain	2.1 KB

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	ishii(at)sraoss(dot)co(dot)jp
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Date:	2005-12-22 05:11:45
Message-ID:	20051222.141145.132981037.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

> > "Sagi Bashari" <sagi(at)adamnet(dot)co(dot)il> writes:
> > > Postgresql crashes when a client with ISO_8859_8 encoding tries to select
> > > data from a utf8 database.
> >
> > It looks like somebody rearranged the pg_enc enum without bothering to
> > fix the tables that are affected by this.
> >
> > utf8_and_iso8859.c is certainly broken, and I'm wondering what else
> > might be. Tatsuo, can you think of any other places to look?
>
> I will look into this.

A quick check reveals that ISO-8859-5 through ISO-8859-8 are broken.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	pgman(at)candle(dot)pha(dot)pa(dot)us
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ishii(at)sraoss(dot)co(dot)jp, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Date:	2005-12-23 01:42:07
Message-ID:	20051223.104207.129789350.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

> Tom Lane wrote:
> > Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > >> It looks like somebody rearranged the pg_enc enum without bothering to
> > >> fix the tables that are affected by this.
> >
> > > I will look into this.
> >
> > Thank you. It might be worth adding a comment to pg_wchar.h listing all
> > the places that need to be fixed when enum pg_enc changes.
> >
>
> I have developed the following patch against CVS. Tatsuo, you can use
> it as a starting point. It adds a comment to encnames.c and reorders
> utf8_and_iso8859.c to match the existing order. I also added the
> missing entries at the bottom. I checked for pg_conv_map in the source
> code and only utf8_and_iso8859.c has that structure, so I assume it is
> the only one that also depends on the encnames.c ordering.

I think the current implementation in utf8_and_iso8859.c is fast but
too fragile against rearrangement of the encoding ids. I modified those
functions in utf8_and_iso8859.c to do a linear search by encoding id. With
this change developers can feel free to rearrange the encoding ids, and this
kind of problem will be gone for good. The only penalty is the time spent
searching the 13 entries in the encoding map. We could do a quicker, sorted
search, but it would need the entries sorted by encoding id and could cause
a similar problem in the future. So I'm not sure it's worth doing.

Proposed patch attached.

> Looking at 8.0.X, it has the matching order, so we are OK there, but it
> doesn't have the trailing entries. Tatsuo, are those needed?

I think it's OK, since the last missing entry will never be visited.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Attachment	Content-Type	Size
unknown_filename	text/plain	3.0 KB
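
The trade-off described in this message can be sketched as follows. This is a minimal illustration, not the attached patch: the encoding id numbers are made up, and demo_map, find_linear, find_bsearch and the two tables are hypothetical names. A linear search works whatever order the entries are in, while a faster lookup with the standard bsearch() is only valid if the table is kept sorted by encoding id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct
{
    int         encoding;   /* made-up encoding id */
    const char *name;
} demo_map;

/* Entries in arbitrary order: fine for a linear search. */
static const demo_map unsorted_maps[] = {
    {9, "LATIN2"}, {21, "ISO_8859_5"}, {10, "LATIN3"},
};

/* Same entries sorted by id: required for bsearch(). */
static const demo_map sorted_maps[] = {
    {9, "LATIN2"}, {10, "LATIN3"}, {21, "ISO_8859_5"},
};

/* Linear search: O(n), independent of entry order. */
static const demo_map *
find_linear(const demo_map *maps, size_t n, int encoding)
{
    for (size_t i = 0; i < n; i++)
        if (maps[i].encoding == encoding)
            return &maps[i];
    return NULL;
}

/* Returns 1 if ids are strictly increasing. */
static int
is_sorted_by_encoding(const demo_map *maps, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (maps[i - 1].encoding >= maps[i].encoding)
            return 0;
    return 1;
}

/* bsearch() comparator: key is an int id, elem is a table entry. */
static int
cmp_encoding(const void *key, const void *elem)
{
    int k = *(const int *) key;
    int e = ((const demo_map *) elem)->encoding;

    return (k > e) - (k < e);
}

/* Binary search: O(log n), but only valid on a sorted table. */
static const demo_map *
find_bsearch(const demo_map *sorted, size_t n, int encoding)
{
    assert(is_sorted_by_encoding(sorted, n));
    return bsearch(&encoding, sorted, n, sizeof(demo_map), cmp_encoding);
}
```

With only 13 entries the difference between the two is a handful of comparisons per call, which is the cost the message above judges acceptable.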

From:	Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding conversion
Date:	2005-12-23 01:53:09
Message-ID:	200512230153.jBN1r9810536@candle.pha.pa.us
Lists:	pgsql-bugs pgsql-patches

That is a nice solution --- instead of listing all the encodings, you
listed just the ones that need to be used. The list is shorter and
clearer. It seems like the right approach. Thanks.

---------------------------------------------------------------------------

Tatsuo Ishii wrote:
> > Tom Lane wrote:
> > > Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > > >> It looks like somebody rearranged the pg_enc enum without bothering to
> > > >> fix the tables that are affected by this.
> > >
> > > > I will look into this.
> > >
> > > Thank you. It might be worth adding a comment to pg_wchar.h listing all
> > > the places that need to be fixed when enum pg_enc changes.
> > >
> >
> > I have developed the following patch against CVS. Tatsuo, you can use
> > it as a starting point. It adds a comment to encnames.c and reorders
> > utf8_and_iso8859.c to match the existing order. I also added the
> > missing entries at the bottom. I checked for pg_conv_map in the source
> > code and only utf8_and_iso8859.c has that structure, so I assume it is
> > the only one that also depends on the encnames.c ordering.
>
> I think the current implementation in utf8_and_iso8859.c is fast but
> too fragile against rearrangement of the encoding ids. I modified those
> functions in utf8_and_iso8859.c to do a linear search by encoding id. With
> this change developers can feel free to rearrange the encoding ids, and this
> kind of problem will be gone for good. The only penalty is the time spent
> searching the 13 entries in the encoding map. We could do a quicker, sorted
> search, but it would need the entries sorted by encoding id and could cause
> a similar problem in the future. So I'm not sure it's worth doing.
>
> Proposed patch attached.
>
> > Looking at 8.0.X, it has the matching order, so we are OK there, but it
> > doesn't have the trailing entries. Tatsuo, are those needed?
>
> I think it's OK, since the last missing entry will never be visited.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan

> Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c,v
> retrieving revision 1.16
> diff -u -r1.16 utf8_and_iso8859.c
> --- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 22 Nov 2005 18:17:26 -0000 1.16
> +++ src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c 23 Dec 2005 01:43:38 -0000
> @@ -68,15 +68,6 @@
>  } pg_conv_map;
>
>  static pg_conv_map maps[] = {
> -    {PG_SQL_ASCII},         /* SQL/ASCII */
> -    {PG_EUC_JP},            /* EUC for Japanese */
> -    {PG_EUC_CN},            /* EUC for Chinese */
> -    {PG_EUC_KR},            /* EUC for Korean */
> -    {PG_EUC_TW},            /* EUC for Taiwan */
> -    {PG_JOHAB},             /* EUC for Korean JOHAB */
> -    {PG_UTF8},              /* Unicode UTF8 */
> -    {PG_MULE_INTERNAL},     /* Mule internal code */
> -    {PG_LATIN1},            /* ISO-8859-1 Latin 1 */
>      {PG_LATIN2, LUmapISO8859_2, ULmapISO8859_2,
>          sizeof(LUmapISO8859_2) / sizeof(pg_local_to_utf),
>      sizeof(ULmapISO8859_2) / sizeof(pg_utf_to_local)},  /* ISO-8859-2 Latin 2 */
> @@ -104,12 +95,6 @@
>      {PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
>          sizeof(LUmapISO8859_16) / sizeof(pg_local_to_utf),
>      sizeof(ULmapISO8859_16) / sizeof(pg_utf_to_local)}, /* ISO-8859-16 Latin 10 */
> -    {PG_WIN1256},           /* windows-1256 */
> -    {PG_WIN1258},           /* Windows-1258 */
> -    {PG_WIN874},            /* windows-874 */
> -    {PG_KOI8R},             /* KOI8-R */
> -    {PG_WIN1251},           /* windows-1251 */
> -    {PG_WIN866},            /* (MS-DOS CP866) */
>      {PG_ISO_8859_5, LUmapISO8859_5, ULmapISO8859_5,
>          sizeof(LUmapISO8859_5) / sizeof(pg_local_to_utf),
>      sizeof(ULmapISO8859_5) / sizeof(pg_utf_to_local)},  /* ISO-8859-5 */
> @@ -131,11 +116,23 @@
>      unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
>      unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
>      int len = PG_GETARG_INT32(4);
> +    int i;
>
>      Assert(PG_GETARG_INT32(1) == PG_UTF8);
>      Assert(len >= 0);
>
> -    LocalToUtf(src, dest, maps[encoding].map1, maps[encoding].size1, encoding, len);
> +    for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> +    {
> +        if (encoding == maps[i].encoding)
> +        {
> +            LocalToUtf(src, dest, maps[i].map1, maps[i].size1, encoding, len);
> +            PG_RETURN_VOID();
> +        }
> +    }
> +
> +    ereport(ERROR,
> +            (errcode(ERRCODE_INTERNAL_ERROR),
> +             errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
>
>      PG_RETURN_VOID();
>  }
> @@ -147,11 +144,23 @@
>      unsigned char *src = (unsigned char *) PG_GETARG_CSTRING(2);
>      unsigned char *dest = (unsigned char *) PG_GETARG_CSTRING(3);
>      int len = PG_GETARG_INT32(4);
> +    int i;
>
>      Assert(PG_GETARG_INT32(0) == PG_UTF8);
>      Assert(len >= 0);
>
> -    UtfToLocal(src, dest, maps[encoding].map2, maps[encoding].size2, len);
> +    for (i=0;i<sizeof(maps)/sizeof(pg_conv_map);i++)
> +    {
> +        if (encoding == maps[i].encoding)
> +        {
> +            UtfToLocal(src, dest, maps[i].map2, maps[i].size2, len);
> +            PG_RETURN_VOID();
> +        }
> +    }
> +
> +    ereport(ERROR,
> +            (errcode(ERRCODE_INTERNAL_ERROR),
> +             errmsg("unexpected encoding id %d for ISO-8859 charsets", encoding)));
>
>      PG_RETURN_VOID();
>  }

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	pgman(at)candle(dot)pha(dot)pa(dot)us
Cc:	ishii(at)sraoss(dot)co(dot)jp, tgl(at)sss(dot)pgh(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Date:	2005-12-23 01:59:06
Message-ID:	20051223.105906.127176963.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

Ok, I will commit the patches.

BTW, the example byte sequence (ISO-8859-8) that Sagi posted seems to be
wrong.

select '';
WARNING: ignoring unconvertible ISO_8859_8 character 0x00d7
:
:

0x00d7 (\327) is not listed in our ISO-8859-8/UTF-8 conversion map. Is
this OK, or do we need to add a conversion for that code?
What do you think, Sagi?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc:	pgman(at)candle(dot)pha(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8 encoding
Date:	2005-12-23 04:38:26
Message-ID:	15856.1135312706@sss.pgh.pa.us
Lists:	pgsql-bugs pgsql-patches

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> I think the current implementation in utf8_and_iso8859.c is fast but
> too fragile against rearrangement of the encoding ids. I modified those
> functions in utf8_and_iso8859.c to do a linear search by encoding id.

That's not unreasonable, but I was wondering whether we could add some
Assert() tests that would catch problems without imposing any extra cost
in normal non-Assert builds.

regards, tom lane
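
The Assert() idea raised in this message could look roughly like the sketch below. It keeps the fast direct-index lookup and adds a check that the slot really belongs to the requested encoding. The sketch uses the standard assert() from <assert.h>, which compiles to nothing when NDEBUG is defined, as a stand-in for PostgreSQL's Assert() macro. The names demo_enc, demo_conv_map, demo_maps and demo_lookup are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-down enum (not the real pg_enc). */
typedef enum
{
    DEMO_SQL_ASCII,
    DEMO_UTF8,
    DEMO_LATIN2,
    DEMO_ENC_COUNT
} demo_enc;

typedef struct
{
    demo_enc    encoding;   /* encoding this slot is meant for */
    const char *map_name;   /* stand-in for the conversion map pointer */
} demo_conv_map;

/* One slot per enum value, in enum order. */
static const demo_conv_map demo_maps[] = {
    {DEMO_SQL_ASCII, NULL},
    {DEMO_UTF8, NULL},
    {DEMO_LATIN2, "LATIN2 map"},
};

/*
 * Direct-index lookup, O(1). The assertions cost nothing in builds with
 * NDEBUG defined, but in debug builds they fire as soon as the enum and
 * the table drift out of step.
 */
static const demo_conv_map *
demo_lookup(demo_enc enc)
{
    assert((size_t) enc < sizeof(demo_maps) / sizeof(demo_maps[0]));
    assert(demo_maps[enc].encoding == enc);
    return &demo_maps[enc];
}
```

The limitation of this approach is that a mismatch is only caught when an assert-enabled build actually exercises the affected encoding.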

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc:	ishii(at)sraoss(dot)co(dot)jp, pgman(at)candle(dot)pha(dot)pa(dot)us, sagi(at)adamnet(dot)co(dot)il, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Date:	2005-12-24 00:27:23
Message-ID:	20051224.092723.123398431.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

> Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> > I think the current implementation in utf8_and_iso8859.c is fast but
> > too fragile against rearrangement of the encoding ids. I modified those
> > functions in utf8_and_iso8859.c to do a linear search by encoding id.
>
> That's not unreasonable, but I was wondering whether we could add some
> Assert() tests that would catch problems without imposing any extra cost
> in normal non-Assert builds.

I thought about that too. But my conclusion was that the current code is too
hard to maintain, even if appropriate comments are written in the related
files.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

From:	Sagi Bashari <sagi(at)adamnet(dot)co(dot)il>
To:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc:	pgman(at)candle(dot)pha(dot)pa(dot)us, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Date:	2005-12-25 09:34:42
Message-ID:	43AE67B2.9060208@adamnet.co.il
Lists:	pgsql-bugs pgsql-patches

On 23/12/2005 03:59, Tatsuo Ishii wrote:
> BTW, the example byte sequence (ISO-8859-8) that Sagi posted seems to be
> wrong.
>
> select '��';
> WARNING: ignoring unconvertible ISO_8859_8 character 0x00d7
> :
> :
>
> 0x00d7 (\327) is not listed in our ISO-8859-8/UTF-8 conversion map. Is
> this OK, or do we need to add a conversion for that code?
> What do you think, Sagi?
>

I'm not sure what 0x00d7 (\327) is. The example I sent is the word "shalom"
(hello/peace) in Hebrew, four letters. Here are the decimal byte values of
each character in the query SELECT 'שלום'; as sent in ISO-8859-8:
S=83
E=69
L=76
E=69
C=67
T=84
(space)=32
'=39
ש=249
ל=236
ו=229
ם=237
'=39
;=59

Sagi
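
The byte values listed above can be checked against the ISO-8859-8 layout, in which the Hebrew letters occupy 0xE0 through 0xFA and correspond to U+05D0 through U+05EA. The C sketch below, with hypothetical helper names, maps those bytes to code points and computes their two-byte UTF-8 form. It also shows that every Hebrew letter encodes to UTF-8 with the lead byte 0xD7. That suggests one possible source for the 0x00d7 in the warning, namely UTF-8 bytes being read as ISO-8859-8 somewhere along the way, but this is an assumption and nothing in the thread confirms it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ISO-8859-8 byte -> Unicode code point, Hebrew letter range only.
 * Returns 0 for any byte outside 0xE0..0xFA.
 */
static unsigned int
iso8859_8_hebrew_to_ucs(unsigned char b)
{
    if (b >= 0xE0 && b <= 0xFA)
        return 0x05D0u + (unsigned int) (b - 0xE0);
    return 0;
}

/* Lead byte of the two-byte UTF-8 form (valid for U+0080..U+07FF). */
static unsigned char
utf8_2byte_lead(unsigned int cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF);
    return (unsigned char) (0xC0 | (cp >> 6));
}

/* Trailing byte of the two-byte UTF-8 form (valid for U+0080..U+07FF). */
static unsigned char
utf8_2byte_trail(unsigned int cp)
{
    assert(cp >= 0x80 && cp <= 0x7FF);
    return (unsigned char) (0x80 | (cp & 0x3F));
}
```

For example, byte 249 (0xF9) maps to U+05E9, whose UTF-8 form is the byte pair 0xD7 0xA9.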

From:	Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To:	sagi(at)adamnet(dot)co(dot)il
Cc:	ishii(at)sraoss(dot)co(dot)jp, pgman(at)candle(dot)pha(dot)pa(dot)us, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-patches(at)postgresql(dot)org
Subject:	Re: [BUGS] BUG #2120: Crash when doing UTF8<->ISO_8859_8
Date:	2005-12-25 09:50:01
Message-ID:	20051225.185001.105527470.t-ishii@sraoss.co.jp
Lists:	pgsql-bugs pgsql-patches

> On 23/12/2005 03:59, Tatsuo Ishii wrote:
> > BTW, the example byte sequence (ISO-8859-8) that Sagi posted seems to be
> > wrong.
> >
> > select '$,3u=u=u=u=u=u=u=u=';
> > WARNING: ignoring unconvertible ISO_8859_8 character 0x00d7
> > :
> > :
> >
> > 0x00d7 (\327) is not listed in our ISO-8859-8/UTF-8 conversion map. Is
> > this OK, or do we need to add a conversion for that code?
> > What do you think, Sagi?
> >
>
> I'm not sure what 0x00d7 (\327) is. The example I sent is the word "shalom"
> (hello/peace) in Hebrew, four letters. Here are the decimal byte values:

Oh, OK. Maybe it was noise added by my MUA.
--
Tatsuo Ishii
SRA OSS, Inc. Japan