Re: encoding names

Lists: pgsql-patches
From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: pgsql-patches <pgsql-patches(at)postgreSQL(dot)org>
Subject: encoding names
Date: 2001-08-29 14:43:51
Message-ID: 20010829164351.A14528@zf.jcu.cz


Hi,

this is the final version (I hope) of the multibyte cleanup.

All routines accept "more standard" encoding names as input, but all
names on output remain backward compatible.

The new names can be obtained only through:

database_character_set()
- returns database encoding name

character_set(int)
- convert encoding 'id' to encoding name

character_set(name)
- convert encoding 'name' to 'id'

The configure.in is not changed.

All encoding map files are renamed to standard and lower case names.

... and other changes described in earlier versions of this patch.


Don't forget for the CVS commit:

* following files are renamed:

src/utils/mb/Unicode/KOI8_to_utf8.map -->
src/utils/mb/Unicode/koi8r_to_utf8.map

src/utils/mb/Unicode/WIN_to_utf8.map -->
src/utils/mb/Unicode/win1251_to_utf8.map

src/utils/mb/Unicode/utf8_to_KOI8.map -->
src/utils/mb/Unicode/utf8_to_koi8r.map

src/utils/mb/Unicode/utf8_to_WIN.map -->
src/utils/mb/Unicode/utf8_to_win1251.map

* new file:

src/utils/mb/encname.c

* removed file:

src/utils/mb/common.c

Examples:

l2=# select getdatabaseencoding(), database_character_set();
 getdatabaseencoding | database_character_set
---------------------+------------------------
 LATIN2              | ISO-8859-2
(1 row)

l2=# select pg_encoding_to_char(5), character_set(5);
 pg_encoding_to_char | character_set
---------------------+---------------
 UNICODE             | UTF-8
(1 row)

l2=# select pg_char_to_encoding('Latin2'), character_set('Latin2');
 pg_char_to_encoding | character_set
---------------------+---------------
                   8 |             8
(1 row)

test=# select pg_char_to_encoding('ISO-8859-3'), character_set('Latin3');
 pg_char_to_encoding | character_set
---------------------+---------------
                   9 |             9
(1 row)

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Attachment            Content-Type        Size
mb-08292001.patch.gz  application/x-gzip  21.4 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: encoding names
Date: 2001-08-29 23:30:40
Message-ID: Pine.LNX.4.30.0108300116210.677-100000@peter.localdomain

Karel Zak writes:

> The new names can be obtained only through:
>
> database_character_set()
> - returns database encoding name
>
> character_set(int)
> - convert encoding 'id' to encoding name
>
> character_set(name)
> - convert encoding 'name' to 'id'

I thought we decided not to add functions returning "new" names until we
know exactly what the new names should be, and pending schema
implementation. These three functions just implement an interface that is
equivalent to an existing one but no more standard than the existing one.

> l2=# select getdatabaseencoding(), database_character_set();
>  getdatabaseencoding | database_character_set
> ---------------------+------------------------
>  LATIN2              | ISO-8859-2
> (1 row)

For instance, from an SQL point of view, the left side is more official
than the right side, and it's easier to handle as an identifier.

> l2=# select pg_encoding_to_char(5), character_set(5);
>  pg_encoding_to_char | character_set
> ---------------------+---------------
>  UNICODE             | UTF-8
> (1 row)

Spelled UTF8 in SQL. This is a boring debate, but it needs to be done
first, so people can rely on the names. Accepting flexible input is good,
but the output needs to be reliable.

Also:

pg_char_to_encname_struct(): too much long encoding name

better

...(): encoding name too long

The rest looks okay superficially, but someone else should probably check
it.

--
Peter Eisentraut peter_e(at)gmx(dot)net http://funkturm.homeip.net/~peter


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: encoding names
Date: 2001-08-31 15:24:38
Message-ID: 20010831172438.A27823@zf.jcu.cz

On Thu, Aug 30, 2001 at 01:30:40AM +0200, Peter Eisentraut wrote:
> > - convert encoding 'name' to 'id'
>
> I thought we decided not to add functions returning "new" names until we
> know exactly what the new names should be, and pending schema

OK, the patch no longer adds these functions.

> better
>
> ...(): encoding name too long

Fixed.

I found a new bug in command/variable.c in parse_client_encoding();
probably nobody ever sees this error:

    if (pg_set_client_encoding(encoding))
    {
        elog(ERROR, "Conversion between %s and %s is not supported",
             value, GetDatabaseEncodingName());
    }

because pg_set_client_encoding() returns -1 for error and 0 as true.
It's fixed too.

IMHO it can be applied.

Karel
PS:

* following files are renamed:

src/utils/mb/Unicode/KOI8_to_utf8.map -->
src/utils/mb/Unicode/koi8r_to_utf8.map

src/utils/mb/Unicode/WIN_to_utf8.map -->
src/utils/mb/Unicode/win1251_to_utf8.map

src/utils/mb/Unicode/utf8_to_KOI8.map -->
src/utils/mb/Unicode/utf8_to_koi8r.map

src/utils/mb/Unicode/utf8_to_WIN.map -->
src/utils/mb/Unicode/utf8_to_win1251.map

* new file:

src/utils/mb/encname.c

* removed file:

src/utils/mb/common.c

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Attachment            Content-Type        Size
mb-08312001.patch.gz  application/x-gzip  21.2 KB

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: zakkr(at)zf(dot)jcu(dot)cz
Cc: peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-03 01:02:44
Message-ID: 20010903100244C.t-ishii@sra.co.jp

Thanks for the patches. I will check them as soon as possible. Also,
I would like to ask Hiroshi and others who are working for the ODBC
driver to check if everything is ok.

> I found a new bug in command/variable.c in parse_client_encoding();
> probably nobody ever sees this error:
>
>     if (pg_set_client_encoding(encoding))
>     {
>         elog(ERROR, "Conversion between %s and %s is not supported",
>              value, GetDatabaseEncodingName());
>     }
>
> because pg_set_client_encoding() returns -1 for error and 0 as true.
> It's fixed too.

??? In C, anything other than 0 evaluates to true. So the original
code would work as expected.
--
Tatsuo Ishii


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-03 07:43:29
Message-ID: 20010903094329.A3122@zf.jcu.cz

On Mon, Sep 03, 2001 at 10:02:44AM +0900, Tatsuo Ishii wrote:
> Thanks for the patches. I will check them as soon as possible. Also,
> I would like to ask Hiroshi and others who are working for the ODBC
> driver to check if everything is ok.

Thanks.

>
> > I found a new bug in command/variable.c in parse_client_encoding();
> > probably nobody ever sees this error:
> >
> >     if (pg_set_client_encoding(encoding))
> >     {
> >         elog(ERROR, "Conversion between %s and %s is not supported",
> >              value, GetDatabaseEncodingName());
> >     }
> >
> > because pg_set_client_encoding() returns -1 for error and 0 as true.
> > It's fixed too.
>
> ??? In C, anything other than 0 evaluates to true. So the original
> code would work as expected.

Grrrr, I really do leave my brain at home sometimes... (But with "< 0"
it's more readable, right? :-)

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: zakkr(at)zf(dot)jcu(dot)cz
Cc: peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-06 05:04:34
Message-ID: 20010906140434K.t-ishii@sra.co.jp

Karel,

> Thanks for the patches. I will check them as soon as possible. Also,
> I would like to ask Hiroshi and others who are working for the ODBC
> driver to check if everything is ok.

I have committed your patches with some fixes to
interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
--
Tatsuo Ishii


From: Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-06 06:45:18
Message-ID: 20010906084518.A1047@zf.jcu.cz

On Thu, Sep 06, 2001 at 02:04:34PM +0900, Tatsuo Ishii wrote:
> Karel,
>
> > Thanks for the patches. I will check them as soon as possible. Also,
> > I would like to ask Hiroshi and others who are working for the ODBC
> > driver to check if everything is ok.
>
> I have committed your patches with some fixes to
> interfaces/odbc/multibyte.c suggested by Tokuya Eiji.

Thanks, and thanks for all the suggestions from you, Peter,
and the others!

Karel

--
Karel Zak <zakkr(at)zf(dot)jcu(dot)cz>
http://home.zf.jcu.cz/~zakkr/

C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: zakkr(at)zf(dot)jcu(dot)cz, peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-06 14:18:38
Message-ID: 200109061418.f86EId820207@candle.pha.pa.us

> Karel,
>
> > Thanks for the patches. I will check them as soon as possible. Also,
> > I would like to ask Hiroshi and others who are working for the ODBC
> > driver to check if everything is ok.
>
> I have committed your patches with some fixes to
> interfaces/odbc/multibyte.c suggested by Tokuya Eiji.

Tatsuo, I think you forgot to commit the new encname.c file and remove
the common.c file. Can you check that?

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: pgman(at)candle(dot)pha(dot)pa(dot)us
Cc: zakkr(at)zf(dot)jcu(dot)cz, peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-07 01:39:01
Message-ID: 20010907103901G.t-ishii@sra.co.jp

> >
> > > Thanks for the patches. I will check them as soon as possible. Also,
> > > I would like to ask Hiroshi and others who are working for the ODBC
> > > driver to check if everything is ok.
> >
> > I have committed your patches with some fixes to
> > interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
>
> Tatsuo, I think you forgot to commit the new encname.c file and remove
> the common.c file. Can you check that?

Oops. I will commit encname.c
--
Tatsuo Ishii


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: zakkr(at)zf(dot)jcu(dot)cz, peter_e(at)gmx(dot)net, pgsql-patches(at)postgresql(dot)org
Subject: Re: encoding names
Date: 2001-09-07 01:40:51
Message-ID: 200109070140.f871epe17174@candle.pha.pa.us


Thanks.

> > >
> > > > Thanks for the patches. I will check them as soon as possible. Also,
> > > > I would like to ask Hiroshi and others who are working for the ODBC
> > > > driver to check if everything is ok.
> > >
> > > I have committed your patches with some fixes to
> > > interfaces/odbc/multibyte.c suggested by Tokuya Eiji.
> >
> > Tatsuo, I think you forgot to commit the new encname.c file and remove
> > the common.c file. Can you check that?
>
> Oops. I will commit encname.c
> --
> Tatsuo Ishii
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026