Re: Patch: add conversion from pg_wchar to multibyte

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Patch: add conversion from pg_wchar to multibyte
Date: 2012-05-22 10:48:11
Message-ID: CAPpHfduQEZUV89CnDJcjnPrdDmB810O4_xLc71GbEA42Yi=40Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 22, 2012 at 11:50 AM, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
> I think it's possible. The first characters are defined like this:
>
> #define IS_LCPRV1(c) ((unsigned char)(c) == 0x9a || (unsigned char)(c) == 0x9b)
> #define IS_LCPRV2(c) ((unsigned char)(c) == 0x9c || (unsigned char)(c) == 0x9d)
>
> It seems IS_LCPRV1 is not used in any of the encodings PostgreSQL
> supports at this point, which means there is no chance that existing
> databases contain LCPRV1. So you could safely ignore it.
>
> For IS_LCPRV2, it is only used for Chinese encodings (EUC_TW and BIG5)
> in backend/utils/mb/conversion_procs/euc_tw_and_big5/euc_tw_and_big5.c
> and it is fixed to 0x9d. So you can always restore the value to 0x9d.
>
> > Also, in this part of the code we're shifting the first byte by 16 bits:
> >
> > if (IS_LC1(*from) && len >= 2)
> > {
> >     *to = *from++ << 16;
> >     *to |= *from++;
> >     len -= 2;
> > }
> > else if (IS_LCPRV1(*from) && len >= 3)
> > {
> >     from++;
> >     *to = *from++ << 16;
> >     *to |= *from++;
> >     len -= 3;
> > }
> >
> > Why don't we shift it by 8 bits?
>
> Because we want the first byte of the LC1 case to be placed in the second
> byte of the wchar, i.e.
>
> byte 0: always 0
> byte 1: leading byte (the first byte of the multibyte sequence)
> byte 2: always 0
> byte 3: the second byte of the multibyte sequence
>
> Note that we always assume that byte 1 (called the "leading byte",
> LB for short) represents the id of the character set (from 0x81 to
> 0xff) in MULE INTERNAL encoding. For the mapping between LB and
> charsets, see pg_wchar.h.
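
To check my understanding of that layout, here is a minimal sketch of the
LC1 case in both directions. The function names and the uint32_t stand-in
for pg_wchar are my own assumptions for illustration, not code from the
tree; only the shift by 16 comes from the snippet quoted above.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not PostgreSQL functions. uint32_t stands in
 * for pg_wchar here.
 */

/* Pack an LC1 pair: leading byte into bits 16-23, second byte into bits 0-7. */
static uint32_t
lc1_pack(unsigned char lb, unsigned char second)
{
    uint32_t    to = (uint32_t) lb << 16;

    to |= second;
    return to;
}

/* Reverse direction: recover the leading byte (charset id) from the wchar. */
static unsigned char
lc1_leading_byte(uint32_t w)
{
    return (unsigned char) ((w >> 16) & 0xff);
}

/* Reverse direction: recover the second byte of the multibyte sequence. */
static unsigned char
lc1_second_byte(uint32_t w)
{
    return (unsigned char) (w & 0xff);
}
```

For example, lc1_pack(0x81, 0xe9) gives 0x008100e9, and the two accessor
functions return 0x81 and 0xe9 from it again, so the LC1 case looks
reversible as long as the leading byte is kept in the wchar.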

Thanks for your comments. They clarify a lot.
But I still don't see how we can distinguish the IS_LCPRV2 and IS_LC2 cases.
Isn't it possible for them to produce the same pg_wchar?
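
To make the concern concrete, here is a sketch built only from the two
branches quoted above (LC1 and LCPRV1), which I use as a stand-in because
the 3- and 4-byte branches are not shown in this thread. The function
names are hypothetical. The prefix byte (0x9a or 0x9b) is skipped and never
stored, so the result depends only on the two bytes that follow it:

```c
#include <stdint.h>

/*
 * Hypothetical model of the two quoted branches; uint32_t stands in
 * for pg_wchar. Not code from the PostgreSQL tree.
 */

/* Mirrors the IS_LC1 branch: both bytes go into the wchar. */
static uint32_t
lc1_branch(unsigned char b0, unsigned char b1)
{
    uint32_t    to = (uint32_t) b0 << 16;

    to |= b1;
    return to;
}

/* Mirrors the IS_LCPRV1 branch: the prefix is skipped, as in "from++". */
static uint32_t
lcprv1_branch(unsigned char prefix, unsigned char b0, unsigned char b1)
{
    uint32_t    to;

    (void) prefix;              /* 0x9a or 0x9b, dropped by the conversion */
    to = (uint32_t) b0 << 16;
    to |= b1;
    return to;
}
```

In this model the two branches return the same value whenever the byte
placed in bits 16-23 is the same, and differ otherwise. So the results can
only be told apart if the ids used after a private prefix never coincide
with the ids used without one, and I assume the same reasoning applies to
IS_LC2 versus IS_LCPRV2.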

------
With best regards,
Alexander Korotkov.
