Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.

Lists: pgsql-committerspgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-02-06 22:31:13
Message-ID: E1PmD8T-0003YF-Fv@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Force strings passed to and from plperl to be in UTF8 encoding.

String are converted to UTF8 on the way into perl and to the
database encoding on the way back. This avoids a number of
observed anomalies, and ensures Perl a consistent view of the
world.

Some minor code cleanups are also accomplished.

Alex Hunsaker, reviewed by Andy Colson.

Branch
------
master

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=50d89d422f9c68a52a6964e5468e8eb4f90b1d95

Modified Files
--------------
doc/src/sgml/plperl.sgml | 8 ++
src/pl/plperl/SPI.xs | 52 ++++++---
src/pl/plperl/Util.xs | 66 +++++-----
src/pl/plperl/plperl.c | 260 +++++++++++++++++++++++-----------------
src/pl/plperl/plperl_helpers.h | 69 +++++++++++
5 files changed, 295 insertions(+), 160 deletions(-)


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Alex Hunsaker <badalex(at)gmail(dot)com>
Subject: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-02-07 01:02:05
Message-ID: 4D4F448D.5000902@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
> Force strings passed to and from plperl to be in UTF8 encoding.
>
> String are converted to UTF8 on the way into perl and to the
> database encoding on the way back. This avoids a number of
> observed anomalies, and ensures Perl a consistent view of the
> world.
>
> Some minor code cleanups are also accomplished.
>
> Alex Hunsaker, reviewed by Andy Colson.

This has broken the buildfarm :-(

perl ppport.h reports:

*** WARNING: Uses HeUTF8, which may not be portable below perl
5.11.0, even with 'ppport.h'

Experimentation on a CentOS machine suggests we can cure it with this:

#ifndef HeUTF8
#define HeUTF8(he) ((HeKLEN(he) == HEf_SVKEY) ? \
SvUTF8(HeKEY_sv(he))
: \
(U32)HeKUTF8(he))
#endif

cheers

andrew


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-02-07 02:13:22
Message-ID: AANLkTik2jYJ6SJ0zPUT_p8gmQ_RCjWxjvswSQ7ntrt=9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, Feb 6, 2011 at 18:02, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
>
>
>
> On 02/06/2011 05:31 PM, Andrew Dunstan wrote:
>>
>> Force strings passed to and from plperl to be in UTF8 encoding.
>>
>> String are converted to UTF8 on the way into perl and to the
>> database encoding on the way back. This avoids a number of
>> observed anomalies, and ensures Perl a consistent view of the
>> world.
>>
>> Some minor code cleanups are also accomplished.
>>
>> Alex Hunsaker, reviewed by Andy Colson.
>
>
> This has broken the buildfarm :-(

Drat.

> perl ppport.h reports:
>
>   *** WARNING: Uses HeUTF8, which may not be portable below perl
>   5.11.0, even with 'ppport.h'
> Experimentation on a CentOS machine suggests we can cure it with this:
>
>   #ifndef HeUTF8
>   #define HeUTF8(he)             ((HeKLEN(he) == HEf_SVKEY) ?            \
>                                    SvUTF8(HeKEY_sv(he))
>   :                 \
>                                    (U32)HeKUTF8(he))
>   #endif

Yeah, that should work. BTW I would have loved to add some regression
tests for some of this (like the example hek2cstr states). Is there
any way to do that?


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-02-07 02:52:48
Message-ID: 4D4F5E80.8090809@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 02/06/2011 09:13 PM, Alex Hunsaker wrote:
> I would have loved to add some regression
> tests for some of this (like the example hek2cstr states). Is there
> any way to do that?
>

I can't think of an obvious way. Anyone else?

cheers

ndrew


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-02-12 09:18:36
Message-ID: AANLkTinzzzJCJE=Ac_kZOOv4Hirogd9M2Wjjcecx_Si1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Force strings passed to and from plperl to be in UTF8 encoding.
>
> String are converted to UTF8 on the way into perl and to the
> database encoding on the way back. This avoids a number of
> observed anomalies, and ensures Perl a consistent view of the
> world.

So I noticed a problem while playing with this in my discussion with
David Wheeler. pg_do_encoding() does nothing when the src encoding ==
the dest encoding. That means on a UTF-8 database we fail make sure
our strings are valid utf8.

An easy way to see this is to embed a null in the middle of a string:
=> create or replace function zerob() returns text as $$ return
"abcd\0efg"; $$ language plperl;
=> SELECT zerob();
abcd

Also It seems bogus to bogus to do any encoding conversion when we are
SQL_ASCII, and its really trivial to fix.

With the attached:
- when we are on a utf8 database make sure to verify our output string
in sv2cstr (we assume database strings coming in are already valid)

- Do no string conversion when we are SQL_ASCII in or out

- add plperl_helpers.h as a dep to plperl.o in our makefile

- remove some redundant calls to pg_verify_mbstr()

- as utf_e2u only as one caller dont pstrdup() instead have the caller
check (saves some cycles and memory)

Attachment Content-Type Size
plperl_utf8_mbverify.patch text/x-patch 4.2 KB

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-03 10:20:30
Message-ID: CACoZds1SE7=+8iuoS2BntHwwX+nhcq57pH1qF5sN+-eqn5Omhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 12 February 2011 14:48, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Sun, Feb 6, 2011 at 15:31, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Force strings passed to and from plperl to be in UTF8 encoding.
>>
>> String are converted to UTF8 on the way into perl and to the
>> database encoding on the way back. This avoids a number of
>> observed anomalies, and ensures Perl a consistent view of the
>> world.
>
> So I noticed a problem while playing with this in my discussion with
> David Wheeler. pg_do_encoding() does nothing when the src encoding ==
> the dest encoding. That means on a UTF-8 database we fail make sure
> our strings are valid utf8.
>
> An easy way to see this is to embed a null in the middle of a string:
> => create or replace function zerob() returns text as $$ return
> "abcd\0efg"; $$ language plperl;
> => SELECT zerob();
> abcd
>
> Also It seems bogus to bogus to do any encoding conversion when we are
> SQL_ASCII, and its really trivial to fix.
>
> With the attached:
> - when we are on a utf8 database make sure to verify our output string
> in sv2cstr (we assume database strings coming in are already valid)
>
> - Do no string conversion when we are SQL_ASCII in or out
>
> - add plperl_helpers.h as a dep to plperl.o in our makefile
>
> - remove some redundant calls to pg_verify_mbstr()
>
> - as utf_e2u only as one caller dont pstrdup() instead have the caller
> check (saves some cycles and memory)
>

Is there a plan to commit this issue? I am still seeing this issue on
PG 9.1 STABLE branch. Attached is a small patch that targets only the
specific issue in the described testcase :

create or replace function zerob() returns text as $$ return
"abcd\0efg"; $$ language plperl;
SELECT zerob();

The patch does the perl data validation in the function utf_u2e() itself.

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Attachment Content-Type Size
verify_in_utf_u2e.patch text/x-diff 490 bytes

From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-03 17:07:01
Message-ID: CAFaPBrTRG2+4rA+QowWWx0dHrhXNBG0eoP+dhEKS19-yTJ9i2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:

> Is there a plan to commit this issue? I am still seeing this issue on
> PG 9.1 STABLE branch. Attached is a small patch that targets only the
> specific issue in the described testcase :
>
> create or replace function zerob() returns text as $$ return
> "abcd\0efg"; $$ language plperl;
> SELECT zerob();
>
> The patch does the perl data validation in the function utf_u2e() itself.

I think thats fine, but as coded it will verify the string twice in
the GetDatabaseEncoding() != PG_UTF8 case (once for
pg_do_encoding_conversion() and again with the added
pg_verify_mbstr_len), which seems a bit wasteful.

It might be worth adding a regression test also...


From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-04 05:35:21
Message-ID: CACoZds1fs8kpQvdLoMnvC7+KPLDRJ+-MHDf74jBahEiaJofUdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 3 October 2011 22:37, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Mon, Oct 3, 2011 at 04:20, Amit Khandekar
> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>
>> Is there a plan to commit this issue? I am still seeing this issue on
>> PG 9.1 STABLE branch. Attached is a small patch that targets only the
>> specific issue in the described testcase :
>>
>> create or replace function zerob() returns text as $$ return
>> "abcd\0efg"; $$ language plperl;
>> SELECT zerob();
>>
>> The patch does the perl data validation in the function utf_u2e() itself.
>
> I think thats fine, but as coded it will verify the string twice in
> the GetDatabaseEncoding() != PG_UTF8 case (once for
> pg_do_encoding_conversion() and again with the added
> pg_verify_mbstr_len), which seems a bit wasteful.

WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
utf8_str, so pg_verify_mbstr_len() will not get called. That's the
reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
condition.

>
> It might be worth adding a regression test also...

I could not find any basic pl/perl tests in the regression
serial_schedule. I am not sure if we want to add just this scenario
without any basic tests for pl/perl ?

>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-04 06:43:53
Message-ID: 4E8AAB29.4090808@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 04.10.2011 08:35, Amit Khandekar wrote:
> On 3 October 2011 22:37, Alex Hunsaker<badalex(at)gmail(dot)com> wrote:
>> It might be worth adding a regression test also...
>
> I could not find any basic pl/perl tests in the regression
> serial_schedule. I am not sure if we want to add just this scenario
> without any basic tests for pl/perl ?

See src/pl/plperl/[sql|expected]

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-04 08:34:14
Message-ID: CAFaPBrT8Na6d-dTV+hmRzZ==6uwdWnQBMpbc3yCMHdiP9emL3w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:

> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
> utf8_str, so pg_verify_mbstr_len() will not get called. That's the
> reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
> condition.

Consider a latin1 database where utf8_str was a string of ascii
characters. Then no conversion would take place and ret == utf8_str
but the string would be verified by pg_do_encdoing_conversion() and
verified again by your added check :-).

>> It might be worth adding a regression test also...
>
> I could not find any basic pl/perl tests in the regression
> serial_schedule. I am not sure if we want to add just this scenario
> without any basic tests for pl/perl ?

I went ahead and added one in the attached based upon your example.

Look ok to you?

BTW thanks for the patch!

[ side note ]
I still think we should not be doing any conversion in the SQL_ASCII
case but this slimmed down patch should be less controversial.

Attachment Content-Type Size
plperl_verify_utf_u2e_v2.patch text/x-patch 2.6 KB

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-04 09:09:44
Message-ID: CACoZds1kqNZrr_6SWGfNCbHUpMDC0S_NP4zYu84MpMXw03tnfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 4 October 2011 14:04, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>
>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>> utf8_str, so pg_verify_mbstr_len() will not get called. That's the
>> reason, pg_verify_mbstr_len() is under the ( ret == utf8_str )
>> condition.
>
> Consider a latin1 database where utf8_str was a string of ascii
> characters. Then no conversion would take place and ret == utf8_str
> but the string would be verified by pg_do_encdoing_conversion() and
> verified again by your added check :-).
>
>>> It might be worth adding a regression test also...
>>
>> I could not find any basic pl/perl tests in the regression
>> serial_schedule. I am not sure if we want to add just this scenario
>> without any basic tests for pl/perl ?
>
> I went ahead and added one in the attached based upon your example.
>
> Look ok to you?
>

+ if(GetDatabaseEncoding() == PG_UTF8)
+ pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);

In your patch, the above will again skip mb-validation if the database
encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
the un-converted string even if *one* of the src and dest encodings is
SQL_ASCII.

I think :
if (ret == utf8_str)
+ {
+ pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
ret = pstrdup(ret);
+ }

This (ret == utf8_str) condition would be a reliable way for knowing
whether pg_do_encoding_conversion() has done the conversion at all.

I am ok with the new test. Thanks for doing it yourself !

> BTW thanks for the patch!
>
> [ side note ]
> I still think we should not be doing any conversion in the SQL_ASCII
> case but this slimmed down patch should be less controversial.
>


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-04 17:27:32
Message-ID: CAFaPBrT9EpWWcHYrvUv5OQQz6Vgg+xQX0mE1ZaQguVUuXo6-Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
> On 4 October 2011 14:04, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>
>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>
>> Consider a latin1 database where utf8_str was a string of ascii
>> characters. [...]

>> [Patch] Look ok to you?
>>
>
> +       if(GetDatabaseEncoding() == PG_UTF8)
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>
> In your patch, the above will again skip mb-validation if the database
> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
> the un-converted string even if *one* of the src and dest encodings is
> SQL_ASCII.

*scratches head* I thought the point of SQL_ASCII was no encoding
conversion was done and so there would be nothing to verify.

Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
NULL bytes in the string when we are a single byte encoding.

> I think :
>        if (ret == utf8_str)
> +       {
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>                ret = pstrdup(ret);
> +       }
>
> This (ret == utf8_str) condition would be a reliable way for knowing
> whether pg_do_encoding_conversion() has done the conversion at all.

Yes. However (and maybe im nitpicking here), I dont see any reason to
verify certain strings twice if we can avoid it.

What do you think about:
+ /*
+ * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
+ * will not do any conversion or verification. we need to do it
manually instead.
+ */
+ if( GetDatabaseEncoding() == PG_UTF8 ||
GetDatabaseEncoding() == SQL_ASCII)
+ pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);


From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 05:46:02
Message-ID: CACoZds04PaVcuV_GWdqBuBoaaUO52w-wiPROeRp9ZN+f6BvVtQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 4 October 2011 22:57, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>> On 4 October 2011 14:04, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>>
>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>
>>> Consider a latin1 database where utf8_str was a string of ascii
>>> characters. [...]
>
>>> [Patch] Look ok to you?
>>>
>>
>> +       if(GetDatabaseEncoding() == PG_UTF8)
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>> In your patch, the above will again skip mb-validation if the database
>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
>> the un-converted string even if *one* of the src and dest encodings is
>> SQL_ASCII.
>
> *scratches head* I thought the point of SQL_ASCII was no encoding
> conversion was done and so there would be nothing to verify.
>
> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
> NULL bytes in the string when we are a single byte encoding.
>
>> I think :
>>        if (ret == utf8_str)
>> +       {
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>                ret = pstrdup(ret);
>> +       }
>>
>> This (ret == utf8_str) condition would be a reliable way for knowing
>> whether pg_do_encoding_conversion() has done the conversion at all.
>
> Yes. However (and maybe im nitpicking here), I dont see any reason to
> verify certain strings twice if we can avoid it.
>
> What do you think about:
> +    /*
> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
> +    * will not do any conversion or verification. we need to do it
> manually instead.
> +    */
> +       if( GetDatabaseEncoding() == PG_UTF8 ||
>              GetDatabaseEncoding() == SQL_ASCII)
> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>

You mean the final changes in plperl_helpers.h would look like
something like this right? :

static inline char *
utf_u2e(const char *utf8_str, size_t len)
{
char *ret = (char *) pg_do_encoding_conversion((unsigned
char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());

if (ret == utf8_str)
+ {
+ if (GetDatabaseEncoding() == PG_UTF8 ||
+ GetDatabaseEncoding() == PG_SQL_ASCII)
+ {
+ pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
+ }
+
ret = pstrdup(ret);
+ }
return ret;
}

Yeah I am ok with that. It's just an additional check besides (ret ==
utf8_str) to know if we really require validation.


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 06:30:01
Message-ID: CAFaPBrSrsKFL7tJ2HM1Z6UvsjMGv19Q2vkDQi=rXSnYfE=Mv5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
> On 4 October 2011 22:57, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Tue, Oct 4, 2011 at 03:09, Amit Khandekar
>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>> On 4 October 2011 14:04, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>>>> On Mon, Oct 3, 2011 at 23:35, Amit Khandekar
>>>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>>>
>>>>> WHen GetDatabaseEncoding() != PG_UTF8 case, ret will not be equal to
>>>>> utf8_str, so pg_verify_mbstr_len() will not get called. [...]
>>>>
>>>> Consider a latin1 database where utf8_str was a string of ascii
>>>> characters. [...]
>>
>>>> [Patch] Look ok to you?
>>>>
>>>
>>> +       if(GetDatabaseEncoding() == PG_UTF8)
>>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>>
>>> In your patch, the above will again skip mb-validation if the database
>>> encoding is SQL_ASCII. Note that in pg_do_encoding_conversion returns
>>> the un-converted string even if *one* of the src and dest encodings is
>>> SQL_ASCII.
>>
>> *scratches head* I thought the point of SQL_ASCII was no encoding
>> conversion was done and so there would be nothing to verify.
>>
>> Ahh I see looks like pg_verify_mbstr_len() will make sure there are no
>> NULL bytes in the string when we are a single byte encoding.
>>
>>> I think :
>>>        if (ret == utf8_str)
>>> +       {
>>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>>                ret = pstrdup(ret);
>>> +       }
>>>
>>> This (ret == utf8_str) condition would be a reliable way for knowing
>>> whether pg_do_encoding_conversion() has done the conversion at all.
>>
>> Yes. However (and maybe im nitpicking here), I dont see any reason to
>> verify certain strings twice if we can avoid it.
>>
>> What do you think about:
>> +    /*
>> +    * when we are a PG_UTF8 or SQL_ASCII database pg_do_encoding_conversion()
>> +    * will not do any conversion or verification. we need to do it
>> manually instead.
>> +    */
>> +       if( GetDatabaseEncoding() == PG_UTF8 ||
>>              GetDatabaseEncoding() == SQL_ASCII)
>> +               pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>
>
> You mean the final changes in plperl_helpers.h would look like
> something like this right? :
>
>  static inline char *
>  utf_u2e(const char *utf8_str, size_t len)
>  {
>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>
>        if (ret == utf8_str)
> +       {
> +               if (GetDatabaseEncoding() == PG_UTF8 ||
> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
> +               {
> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
> +               }
> +
>                ret = pstrdup(ret);
> +       }
>        return ret;
>  }

Yes.

> Yeah I am ok with that. It's just an additional check besides (ret ==
> utf8_str) to know if we really require validation.
>


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 06:59:47
Message-ID: CAFaPBrToG+yseU7vdMZ3uMmGgjeoYYEtec1MUWcN1kkuuCJ5Zw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:

>> You mean the final changes in plperl_helpers.h would look like
>> something like this right? :
>>
>>  static inline char *
>>  utf_u2e(const char *utf8_str, size_t len)
>>  {
>>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
>> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>>
>>        if (ret == utf8_str)
>> +       {
>> +               if (GetDatabaseEncoding() == PG_UTF8 ||
>> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
>> +               {
>> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>> +               }
>> +
>>                ret = pstrdup(ret);
>> +       }
>>        return ret;
>>  }
>

>> Yeah I am ok with that. It's just an additional check besides (ret ==
>> utf8_str) to know if we really require validation.

Find it attached. [ Note I didn't put the check inside the if (ret ==
utf8_str) as it seemed a bit cleaner (indentation wise) to have it
outside ]

Attachment Content-Type Size
plperl_verify_utf_u2e_v3.patch text/x-patch 2.9 KB

From: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 07:58:51
Message-ID: CACoZds0c_Ky9=9YmuhEiB+y8TBS-mxaPnT65wzstHqy4JPg53A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 5 October 2011 12:29, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 00:30, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Tue, Oct 4, 2011 at 23:46, Amit Khandekar
>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>
>>> You mean the final changes in plperl_helpers.h would look like
>>> something like this right? :
>>>
>>>  static inline char *
>>>  utf_u2e(const char *utf8_str, size_t len)
>>>  {
>>>        char       *ret = (char *) pg_do_encoding_conversion((unsigned
>>> char *) utf8_str, len, PG_UTF8, GetDatabaseEncoding());
>>>
>>>        if (ret == utf8_str)
>>> +       {
>>> +               if (GetDatabaseEncoding() == PG_UTF8 ||
>>> +                       GetDatabaseEncoding() == PG_SQL_ASCII)
>>> +               {
>>> +                       pg_verify_mbstr_len(PG_UTF8, utf8_str, len, false);
>>> +               }
>>> +
>>>                ret = pstrdup(ret);
>>> +       }
>>>        return ret;
>>>  }
>>
>
>>> Yeah I am ok with that. It's just an additional check besides (ret ==
>>> utf8_str) to know if we really require validation.
>
> Find it attached. [ Note I didn't put the check inside the if (ret ==
> utf8_str) as it seemed a bit cleaner (indentation wise) to have it
> outside ]

I have no more issues with the patch.
Thanks!
-Amit

>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 14:18:16
Message-ID: CA+TgmoZdot8jaW=tphRi0RE5yO-RszqT75RP-117Q-gUYgqdsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
<amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
> I have no more issues with the patch.
> Thanks!

I think this patch needs to be added to the open CommitFest, with
links to the reviews, and marked Ready for Committer.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-05 21:03:41
Message-ID: CAFaPBrS+DXyXiyG80pk2D0qO7auumMjacU02qO61q5gczt=F1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>> I have no more issues with the patch.
>> Thanks!
>
> I think this patch needs to be added to the open CommitFest, with
> links to the reviews, and marked Ready for Committer.

The open commitfest? Even if its an "important" bug fix that should be
backpatched?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-06 02:36:27
Message-ID: CA+TgmoYUVsud0yQPF4M=7OX3SWyJCV9gsW47KK60qfBTaAzVLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>> I have no more issues with the patch.
>>> Thanks!
>>
>> I think this patch needs to be added to the open CommitFest, with
>> links to the reviews, and marked Ready for Committer.
>
> The open commitfest? Even if its an "important" bug fix that should be
> backpatched?

Considering that the issue appears to have been ignored from
mid-February until early October, I don't see why it should now get to
jump to the head of the queue. Other people may have different
opinions, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-06 16:19:23
Message-ID: 7E85AF2F-7B0D-4BF9-A2CE-8532628E2B0A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Oct 5, 2011, at 7:36 PM, Robert Haas wrote:

>> The open commitfest? Even if its an "important" bug fix that should be
>> backpatched?
>
> Considering that the issue appears to have been ignored from
> mid-February until early October, I don't see why it should now get to
> jump to the head of the queue. Other people may have different
> opinions, of course.

Proper encoding of text data to and from Perl is more important every day, as it's used for more and more uses beyond ASCII. I think it's important to backport such issues modulo behavioral changes.

Best,

David


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-10-07 16:51:46
Message-ID: CAFaPBrRRF0pmLr9=riCz30o8EKTB6x1A-HkHmxv40H=GO_5ypQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Oct 5, 2011 at 20:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
>> On Wed, Oct 5, 2011 at 08:18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>>> I have no more issues with the patch.
>>>> Thanks!
>>>
>>> I think this patch needs to be added to the open CommitFest, with
>>> links to the reviews, and marked Ready for Committer.
>>
>> The open commitfest? Even if its an "important" bug fix that should be
>> backpatched?
>
> Considering that the issue appears to have been ignored from
> mid-February until early October, I don't see why it should now get to
> jump to the head of the queue.  Other people may have different
> opinions, of course.

Added. :-)


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-11-02 23:12:24
Message-ID: 4EB1CE58.2010006@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 10/07/2011 12:51 PM, Alex Hunsaker wrote:
> On Wed, Oct 5, 2011 at 20:36, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Oct 5, 2011 at 5:03 PM, Alex Hunsaker<badalex(at)gmail(dot)com> wrote:
>>> On Wed, Oct 5, 2011 at 08:18, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>>> On Wed, Oct 5, 2011 at 3:58 AM, Amit Khandekar
>>>> <amit(dot)khandekar(at)enterprisedb(dot)com> wrote:
>>>>> I have no more issues with the patch.
>>>>> Thanks!
>>>> I think this patch needs to be added to the open CommitFest, with
>>>> links to the reviews, and marked Ready for Committer.
>>> The open commitfest? Even if its an "important" bug fix that should be
>>> backpatched?
>> Considering that the issue appears to have been ignored from
>> mid-February until early October, I don't see why it should now get to
>> jump to the head of the queue. Other people may have different
>> opinions, of course.
> Added. :-)
>

I'm just starting to look at this, by way of a break in staring at
pg_dump code ;-). This just needs to be backpatched to 9.1, is that right?

cheers

andrew


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-11-02 23:39:19
Message-ID: CAFaPBrSDadqtUFLyXztoGsaJ4_BaKLynQFOACLML3h+bgsuEFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Nov 2, 2011 at 17:12, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Considering that the issue appears to have been ignored from
>>> mid-February until early October, I don't see why it should now get to
>>> jump to the head of the queue.  Other people may have different
>>> opinions, of course.
>>
>> Added. :-)
>>
>
> I'm just starting to look at this, by way of a break in staring at pg_dump
> code ;-). This just needs to be backpatched to 9.1, is that right?

Yes please.

9.0 did not have this problem (or at least if it did it was a separate issue).


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Amit Khandekar <amit(dot)khandekar(at)enterprisedb(dot)com>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Force strings passed to and from plperl to be in UTF8 encoding.
Date: 2011-11-26 17:23:30
Message-ID: 4ED12092.4040100@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 10/05/2011 03:58 AM, Amit Khandekar wrote:
> On 5 October 2011 12:29, Alex Hunsaker<badalex(at)gmail(dot)com> wrote:
>>
>> Find it attached. [ Note I didn't put the check inside the if (ret ==
>> utf8_str) as it seemed a bit cleaner (indentation wise) to have it
>> outside ]
> I have no more issues with the patch.
>

Committed.

cheers

andrew