Re: pl/perl and utf-8 in sql_ascii databases

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: alvherre(at)commandprompt(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, badalex(at)gmail(dot)com, cb(at)df7cb(dot)de, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pl/perl and utf-8 in sql_ascii databases
Date: 2012-07-12 04:12:24
Message-ID: 20120712.131224.120940995.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Very sorry for the broken subject line. I have resent the message with the correct subject.
# Our mail server insisted that the message was spam. Sigh...
====
Hmm... Sorry for the immature patch.

> ... and this story hasn't ended yet, because one of the new tests is
> failing. See here:
>
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=magpie&dt=2012-07-11%2010%3A00%3A04
>
> The interesting part of the diff is:
...
> SELECT encode(perl_utf_inout(E'ab\xe5\xb1\xb1cd')::bytea, 'escape')
> ! ERROR: character with byte sequence 0xe5 0xb7 0x9d in encoding "UTF8" has no equivalent in encoding "LATIN1"
> ! CONTEXT: PL/Perl function "perl_utf_inout"
>
>
> I am not sure what can we do here other than remove this function and
> query from the test.

I ran the regression tests only in environments that can handle
the character U+5DDD (a Japanese character meaning "river")...

Which byte sequences can be decoded, and which byte sequence
results from encoding a given Unicode character, both vary among
encodings.
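
To illustrate the point outside PostgreSQL, here is a minimal Python
sketch (not part of the patch or the regression test). It uses Python's
own codecs, which I assume behave like the server's conversions for
this character; the variable names and the choice of EUC_JP as a third
encoding are only for illustration.

```python
# Illustration only: Python codecs stand in for the server-side
# conversions; this is not the PL/Perl test itself.
ch = "\u5ddd"  # U+5DDD, the character named in the error message

# UTF-8 encodes it as the three bytes reported in the failing test.
utf8_bytes = ch.encode("utf-8")

# LATIN1 has no code point for this character, so encoding fails.
try:
    ch.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Another encoding that does contain the character yields different bytes.
eucjp_bytes = ch.encode("euc_jp")

# The same three bytes read as LATIN1 decode to three unrelated characters.
as_latin1 = utf8_bytes.decode("latin-1")

print(utf8_bytes, latin1_ok, eucjp_bytes, len(as_latin1))
```

So a test whose expected output embeds this character can only pass
where the database encoding can represent it.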

The problem that this thread is actually about can be covered
without the additional test. What needs to be confirmed is whether
encoding/decoding is done as expected when the language handler is
called. I suppose that testing the two cases (sql_ascii and utf8)
plus one additional case that runs pg_do_encoding_conversion(),
say latin1, would be enough to confirm that encoding/decoding is
done properly, since the concrete conversion scheme is not
significant in this case.

So I recommend that we add a test for latin1 and omit the test
for encodings other than sql_ascii, utf8 and latin1. This might
be achieved by creating empty plperl_lc.sql and plperl_lc.out
files for those encodings.

What do you think about that?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.
