Re: Careful PL/Perl Release Not Required

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Careful PL/Perl Release Not Required
Date: 2011-02-11 04:53:29
Message-ID: D08A9F21-8162-4891-975D-C8F51737181A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb 10, 2011, at 5:28 PM, Alex Hunsaker wrote:

> Hrm? For UTF-8 databases, in practice, nothing should have changed--
> we already passed strings in as utf8. What I fixed was some corner
> cases where some strings did not always have character semantics. See
> The "Unicode Bug" and "Forcing Unicode in Perl" in perldoc perlunicode
> for the problem and more or less how I fixed it.

Uh…

try=# create function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift)';
CREATE FUNCTION

try=# select is_utf8('whatever');
is_utf8
─────────
t
(1 row)

try=# select is_utf8(U&'\0441\043B\043E\043D');
is_utf8
─────────
t
(1 row)

Damn, I guess you're right. How did I miss that?

> The other thing that changed is non UTF-8 databases now also get
> character semantics. That is we convert from the database encoding
> into utf8 and visa versa on output. That probably should be noted
> somewhere...

Oh. I see. And Oleg's database wasn't utf-8 then, I guess. I'll have to re-read the JSON docs, I guess. Erm…feh. Okay. I have to pass the false value to utf8() *now*. Okay, at least that's more consistent.

> If you do have to change your semantics/functions, could you post an
> example? I'd like to make sure its because you were hitting one of
> those nasty corner cases and not something new is broken.

I think that people who have non-utf-8 databases might be surprised.

>> This probably won't be that common, but Oleg, for example, will need to convert his fixed function from:
>> ...
>
> Well assuming he fixed his bug by encoding uri_unescape's output he
> should not have to do anything. IIRC the problem was basically double
> encoded utf8, not a postgres bug.

No, the problem was that the string was passed to his pl/perl function encoded in utf-8. He added a line to decode it to Perl's internal form. Once he goes to 9.1, unless the database is SQL_ASCII, he can dump the decode() line. I think.

> [ he had %3A%4A or something, uri_decode() decodes that to _two_
> characters because _it_ knows nothing about utf8. so you would need to
> call utf8::decode() on the result to turn those two bytes into a
> character ]

No, he had to add the decode line, IIRC:

CREATE OR REPLACE FUNCTION url_decode(Vkw varchar) RETURNS varchar AS $$
use strict;
use URI::Escape;
utf8::decode($_[0]);
return uri_unescape($_[0]); $$ LANGUAGE plperlu;

Because uri_unescape() needs its argument to be decoded to Perl's internal form. On 9.1, it will be, so he won't need to call utf8::decode(). That is, in a latin-1 database:

latin=# create or replace function is_utf8(text) returns boolean language plperl AS 'utf8::is_utf8(shift) ? 1 : 0';
CREATE FUNCTION
Time: 1.934 ms
latin=# select is_utf8('whatever'); is_utf8
─────────
f
(1 row)

That will change, if I understand correctly.

>> So this needs to be highlighted in the release notes: If a PL/Perl function is currently relying on a parameter passed in bytes, it will >need to be modified to deal with utf8 strings, instead.
>
> FYI Andrew did add some docs.

Yeah, I was thinking of the release notes. Those who have non-uft-8 databases might be surprised if their PL/Perl functions expect strings to be passed as bytes.

> Thanks for keeping a sharp eye out.
>
> [ P.S. This stuff is confusing as hell, im just glad I got a sucker to
> commit it *waves* at Andrew :-) ]

Heh, well done. Frankly, though, this stuff isn't *that* hard. It's Perl's terminology that's really bad.

Best,

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2011-02-11 05:04:41 Re: ALTER EXTENSION UPGRADE, v3
Previous Message Robert Haas 2011-02-11 04:01:33 Re: Spread checkpoint sync