Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-03 10:20:30
Message-ID: CACoZds1SE7=+8iuoS2BntHwwX+nhcq57pH1qF5sN+-eqn5Omhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On 12 February 2011 14:48, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Force strings passed to and from plperl to be in UTF8 encoding.
>>
>> String are converted to UTF8 on the way into perl and to the
>> database encoding on the way back. This avoids a number of
>> observed anomalies, and ensures Perl a consistent view of the
>> world.
>
> So I noticed a problem while playing with this in my discussion with
> David Wheeler. pg_do_encoding() does nothing when the src encoding ==
> the dest encoding. That means on a UTF-8 database we fail make sure
> our strings are valid utf8.
>
> An easy way to see this is to embed a null in the middle of a string:
> => create or replace function zerob() returns text as $$ return
> "abcd\0efg"; $$ language plperl;
> => SELECT zerob();
> abcd
>
> Also It seems bogus to bogus to do any encoding conversion when we are
> SQL_ASCII, and its really trivial to fix.
>
> With the attached:
> - when we are on a utf8 database make sure to verify our output string
> in sv2cstr (we assume database strings coming in are already valid)
>
> - Do no string conversion when we are SQL_ASCII in or out
>
> - add plperl_helpers.h as a dep to plperl.o in our makefile
>
> - remove some redundant calls to pg_verify_mbstr()
>
> - as utf_e2u only as one caller dont pstrdup() instead have the caller
> check (saves some cycles and memory)
>

Is there a plan to commit this issue? I am still seeing this issue on
PG 9.1 STABLE branch. Attached is a small patch that targets only the
specific issue in the described testcase :

create or replace function zerob() returns text as $$ return
"abcd\0efg"; $$ language plperl;
SELECT zerob();

The patch does the perl data validation in the function utf_u2e() itself.

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Attachment Content-Type Size
verify_in_utf_u2e.patch text/x-diff 490 bytes

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2011-10-03 16:18:48 pgsql: ProcedureCreate neglected to record dependencies on default expr
Previous Message User Itagaki 2011-10-03 07:31:22 textsearch-ja - eudc: Support 9.1 and 9.2dev.

Browse pgsql-hackers by date

  From Date Subject
Next Message Christian Ullrich 2011-10-03 11:05:53 Unexpected collation error in 9.1.1
Previous Message Fujii Masao 2011-10-03 09:56:53 Re: [REVIEW] pg_last_xact_insert_timestamp