Initial refactoring of plperl.c - updated

Lists: pgsql-hackers
From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-02 15:04:01
Message-ID: 20091202150401.GA97781@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've attached an update of my previous refactoring of plperl.c.
It's been rebased over the current (git) HEAD and has a few
very minor additions.

Background:

I've started work on the enhancements to plperl I outlined on pg-general
(in the "Wishlist of PL/Perl Enhancements for 8.5" thread).
I have a working implementation of those changes, plus some performance
enhancements, that I'm now re-working into a clean set of tested and
polished patches.

This patch is a first step that doesn't add any extra functionality.
It refactors the internals to make adding the extra functionality
easier (and more clearly visible).

Changes in this patch:

- Changed MULTIPLICITY check from runtime to compiletime.
No loads the large Config module.
- Changed plperl_init_interp() to return new interp
and not alter the global interp_state
- Moved plperl_safe_init() call into check_interp().
- Removed plperl_safe_init_done state variable
as interp_state now covers that role.
- Changed plperl_create_sub() to take a plperl_proc_desc argument.
- Simplified return value handling in plperl_create_sub.
- Added a test for the effect of the utf8fix function.
- Changed perl.com link in the docs to perl.org and tweaked
wording to clarify that require, not use, is what's blocked.
- Moved perl code in large multi-line C string literal macros
out to plc_*.pl files.
- Added a test2macro.pl utility to convert the plc_*.pl files to
macros in a perlchunks.h file which is #included
Additions since previous verion:
- Replaced calls to SvPV(val, PL_na) with SvPV_nolen(val)
- Simplifed plperl_safe_init() slightly
- Removed trailing whitespace from new plc_*.pl files.

I'd appreciate any feedback on the patch.

Tim.

Attachment Content-Type Size
master-plperl-refactor3.patch text/x-patch 30.3 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-03 23:47:21
Message-ID: 4B184E09.20208@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim,

Since there's a commitfest on right now, meaningful feedback on your
patch could be delayed. Just so you know.

--Josh Berkus


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-04 19:19:28
Message-ID: 20091204191928.GB89699@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 03, 2009 at 03:47:21PM -0800, Josh Berkus wrote:
> Tim,
>
> Since there's a commitfest on right now, meaningful feedback on your
> patch could be delayed. Just so you know.

Understood. Thanks Josh.

Tim.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-25 17:54:13
Message-ID: 4B34FC45.3000208@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim Bunce wrote:
> I've attached an update of my previous refactoring of plperl.c.
> It's been rebased over the current (git) HEAD and has a few
> very minor additions.
>
>
[snip]
> + -- Test compilation of unicode regex
> + --
> + CREATE OR REPLACE FUNCTION perl_unicode_regex(text) RETURNS INTEGER AS $$
> + # see http://rt.perl.org/rt3/Ticket/Display.html?id=47576
> + return ($_[0] =~ /\x{263A}|happy/i) ? 1 : 0; # unicode smiley
> + $$ LANGUAGE plperl;
>

This test is failing on my setup at least when the target db is not UTF8
encoded.

Maybe that's a bug we need to fix?

cheers

andrew


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-25 20:16:02
Message-ID: 20091225201602.GC69056@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 25, 2009 at 12:54:13PM -0500, Andrew Dunstan wrote:
> Tim Bunce wrote:
>> I've attached an update of my previous refactoring of plperl.c.
>> It's been rebased over the current (git) HEAD and has a few
>> very minor additions.
>>
> [snip]
>> + -- Test compilation of unicode regex
>> + --
>> + CREATE OR REPLACE FUNCTION perl_unicode_regex(text) RETURNS INTEGER AS $$
>> + # see http://rt.perl.org/rt3/Ticket/Display.html?id=47576
>> + return ($_[0] =~ /\x{263A}|happy/i) ? 1 : 0; # unicode smiley
>> + $$ LANGUAGE plperl;
>
> This test is failing on my setup at least when the target db is not UTF8
> encoded.
>
> Maybe that's a bug we need to fix?

Yes. I believe the test is highlighting an existing problem: that plperl
function in non-PG_UTF8 databases can't use regular expressions that
require unicode character meta-data.

Either the (GetDatabaseEncoding() == PG_UTF8) test in plperl_safe_init()
should be removed, so the utf8fix function is always called, or the
test should be removed (or hacked to only apply to PG_UTF8 databases).

Tim.

p.s. There may be other problems using unicode in non-PG_UTF8 databases,
but I believe this patch doesn't change the behaviour for better or worse.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2009-12-25 20:59:05
Message-ID: 4B352799.1070203@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim Bunce wrote:
> On Fri, Dec 25, 2009 at 12:54:13PM -0500, Andrew Dunstan wrote:
>
>> Tim Bunce wrote:
>>
>>> I've attached an update of my previous refactoring of plperl.c.
>>> It's been rebased over the current (git) HEAD and has a few
>>> very minor additions.
>>>
>>>
>> [snip]
>>
>>> + -- Test compilation of unicode regex
>>> + --
>>> + CREATE OR REPLACE FUNCTION perl_unicode_regex(text) RETURNS INTEGER AS $$
>>> + # see http://rt.perl.org/rt3/Ticket/Display.html?id=47576
>>> + return ($_[0] =~ /\x{263A}|happy/i) ? 1 : 0; # unicode smiley
>>> + $$ LANGUAGE plperl;
>>>
>> This test is failing on my setup at least when the target db is not UTF8
>> encoded.
>>
>> Maybe that's a bug we need to fix?
>>
>
> Yes. I believe the test is highlighting an existing problem: that plperl
> function in non-PG_UTF8 databases can't use regular expressions that
> require unicode character meta-data.
>
> Either the (GetDatabaseEncoding() == PG_UTF8) test in plperl_safe_init()
> should be removed, so the utf8fix function is always called, or the
> test should be removed (or hacked to only apply to PG_UTF8 databases).
>

I tried forcing the test, but it doesn't seem to work, possibly because
in the case that the db is not utf8 we aren't forcing argument strings
to UTF8 :-(

I think we might need to remove the test from the patch.

>
> p.s. There may be other problems using unicode in non-PG_UTF8 databases,
> but I believe this patch doesn't change the behaviour for better or worse.
>
>

Right.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - rebased [PATCH]
Date: 2010-01-04 23:38:03
Message-ID: 4B427BDB.8010404@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>>
>> Yes. I believe the test is highlighting an existing problem: that plperl
>> function in non-PG_UTF8 databases can't use regular expressions that
>> require unicode character meta-data.
>>
>> Either the (GetDatabaseEncoding() == PG_UTF8) test in plperl_safe_init()
>> should be removed, so the utf8fix function is always called, or the
>> test should be removed (or hacked to only apply to PG_UTF8 databases).
>>
>
>
> I tried forcing the test, but it doesn't seem to work, possibly
> because in the case that the db is not utf8 we aren't forcing argument
> strings to UTF8 :-(
>
> I think we might need to remove the test from the patch.
>
>

I have not been able to come up with a fix for this - the whole thing
seems very fragile. I'm going to commit what remains of this patch, but
not add the extra regression test. I'll add a TODO to allow plperl to do
utf8 operations in non-utf8 databases.

cheers

andrew


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Initial refactoring of plperl.c - updated
Date: 2010-01-08 12:46:13
Message-ID: 20100108124613.GL2505@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 04, 2010 at 06:38:03PM -0500, Andrew Dunstan wrote:
> Andrew Dunstan wrote:
> >>
> >>Yes. I believe the test is highlighting an existing problem: that plperl
> >>function in non-PG_UTF8 databases can't use regular expressions that
> >>require unicode character meta-data.
> >>
> >>Either the (GetDatabaseEncoding() == PG_UTF8) test in plperl_safe_init()
> >>should be removed, so the utf8fix function is always called, or the
> >>test should be removed (or hacked to only apply to PG_UTF8 databases).
> >
> >I tried forcing the test, but it doesn't seem to work, possibly
> >because in the case that the db is not utf8 we aren't forcing
> >argument strings to UTF8 :-(
> >
> >I think we might need to remove the test from the patch.
>
> I have not been able to come up with a fix for this - the whole
> thing seems very fragile. I'm going to commit what remains of this
> patch, but not add the extra regression test. I'll add a TODO to
> allow plperl to do utf8 operations in non-utf8 databases.

I see you've not commited it yet, so to help out I've attached
a new diff, over the current CVS head, with two minor changes:

- Removed the test, as noted above.
- Optimized pg_verifymbstr calls to avoid unneeded strlen()s.

This should apply cleanly to cvs, saving you the need to resolve the
conflicts caused by the recent pg_verifymbstr patch.
I'll add it to the commitfest once it reaches the archives.

Tim.

Attachment Content-Type Size
master-a3-plperl-refactor2.patch text/x-patch 29.5 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-09 02:47:01
Message-ID: 4B47EE25.5080400@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim Bunce wrote:
>
> I see you've not commited it yet, so to help out I've attached
> a new diff, over the current CVS head, with two minor changes:
>
> - Removed the test, as noted above.
> - Optimized pg_verifymbstr calls to avoid unneeded strlen()s.
>
>

I have committed this with minor edits. That should give us a clean base
for the features patch(es).

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-09 21:16:18
Message-ID: 1263071778.1339.22.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2010-01-08 at 12:46 +0000, Tim Bunce wrote:
> *** 45,50 ****
> --- 45,55 ----
>
> include $(top_srcdir)/src/Makefile.shlib
>
> + plperl.o: perlchunks.h
> +
> + perlchunks.h: plc_*.pl
> + $(PERL) text2macro.pl --strip='^(\#.*|\s*)$$' plc_*.pl >
> perlchunks.htmp
> + mv perlchunks.htmp perlchunks.h
>
> all: all-lib
>

What's the reason for the temp file here?


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-09 21:27:03
Message-ID: 20100109212703.GB2481@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 08, 2010 at 09:47:01PM -0500, Andrew Dunstan wrote:
> Tim Bunce wrote:
> >
> >I see you've not commited it yet, so to help out I've attached
> >a new diff, over the current CVS head, with two minor changes:
> >
> >- Removed the test, as noted above.
> >- Optimized pg_verifymbstr calls to avoid unneeded strlen()s.
>
> I have committed this with minor edits. That should give us a clean
> base for the features patch(es).

Wonderful. Many thanks Andrew.

Tim.


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-09 23:49:22
Message-ID: 20100109234922.GE2481@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 09, 2010 at 11:16:18PM +0200, Peter Eisentraut wrote:
> On fre, 2010-01-08 at 12:46 +0000, Tim Bunce wrote:
> > *** 45,50 ****
> > --- 45,55 ----
> >
> > include $(top_srcdir)/src/Makefile.shlib
> >
> > + plperl.o: perlchunks.h
> > +
> > + perlchunks.h: plc_*.pl
> > + $(PERL) text2macro.pl --strip='^(\#.*|\s*)$$' plc_*.pl >
> > perlchunks.htmp
> > + mv perlchunks.htmp perlchunks.h
> >
> > all: all-lib
>
> What's the reason for the temp file here?

Defensive. If the text2macro.pl program fails/dies then you'd be left
with a broken output file with a newer timestamp, so the next make
wouldn't rerun text2macro.pl.

Tim.

p.s. In the makefile for perl we use a little utility called mv_if_diff
instead of a plain mv so that any downstream dependencies only get
rebuilt if the contents have changed.


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-10 00:03:33
Message-ID: 20100110000333.GF2481@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 09, 2010 at 11:49:22PM +0000, Tim Bunce wrote:
> On Sat, Jan 09, 2010 at 11:16:18PM +0200, Peter Eisentraut wrote:
> > On fre, 2010-01-08 at 12:46 +0000, Tim Bunce wrote:
> > > *** 45,50 ****
> > > --- 45,55 ----
> > >
> > > include $(top_srcdir)/src/Makefile.shlib
> > >
> > > + plperl.o: perlchunks.h
> > > +
> > > + perlchunks.h: plc_*.pl
> > > + $(PERL) text2macro.pl --strip='^(\#.*|\s*)$$' plc_*.pl >
> > > perlchunks.htmp
> > > + mv perlchunks.htmp perlchunks.h
> > >
> > > all: all-lib
> >
> > What's the reason for the temp file here?
>
> Defensive. If the text2macro.pl program fails/dies then you'd be left
> with a broken output file with a newer timestamp, so the next make
> wouldn't rerun text2macro.pl.

An alternative would be for text2macro.pl to write to a temp file and
rename at the end.

Tim.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-10 00:19:36
Message-ID: 1263082776.1339.35.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On sön, 2010-01-10 at 00:03 +0000, Tim Bunce wrote:
> On Sat, Jan 09, 2010 at 11:49:22PM +0000, Tim Bunce wrote:
> > On Sat, Jan 09, 2010 at 11:16:18PM +0200, Peter Eisentraut wrote:
> > > On fre, 2010-01-08 at 12:46 +0000, Tim Bunce wrote:
> > > > *** 45,50 ****
> > > > --- 45,55 ----
> > > >
> > > > include $(top_srcdir)/src/Makefile.shlib
> > > >
> > > > + plperl.o: perlchunks.h
> > > > +
> > > > + perlchunks.h: plc_*.pl
> > > > + $(PERL) text2macro.pl --strip='^(\#.*|\s*)$$' plc_*.pl >
> > > > perlchunks.htmp
> > > > + mv perlchunks.htmp perlchunks.h
> > > >
> > > > all: all-lib
> > >
> > > What's the reason for the temp file here?
> >
> > Defensive. If the text2macro.pl program fails/dies then you'd be left
> > with a broken output file with a newer timestamp, so the next make
> > wouldn't rerun text2macro.pl.
>
> An alternative would be for text2macro.pl to write to a temp file and
> rename at the end.

Sounds better. I think any program should be written such that it
doesn't produce an output file at all if it cannot produce a correct
output file. So use a temp file or a trap or something like that. The
makefile should not have to clean up after everyone.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-10 06:16:13
Message-ID: 25644.1263104173@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com> writes:
> On Sat, Jan 09, 2010 at 11:16:18PM +0200, Peter Eisentraut wrote:
>> What's the reason for the temp file here?

> Defensive. If the text2macro.pl program fails/dies then you'd be left
> with a broken output file with a newer timestamp, so the next make
> wouldn't rerun text2macro.pl.

Doesn't make forcibly remove the target file if the command fails?

[ rummages in manual... ] It does if you specify .DELETE_ON_ERROR,
which we do (in Makefile.global); so this bit of complication is
a waste.

regards, tom lane


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-10 12:29:32
Message-ID: 20100110122932.GG2481@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 10, 2010 at 01:16:13AM -0500, Tom Lane wrote:
> Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com> writes:
> > On Sat, Jan 09, 2010 at 11:16:18PM +0200, Peter Eisentraut wrote:
> >> What's the reason for the temp file here?
>
> > Defensive. If the text2macro.pl program fails/dies then you'd be left
> > with a broken output file with a newer timestamp, so the next make
> > wouldn't rerun text2macro.pl.
>
> Doesn't make forcibly remove the target file if the command fails?
>
> [ rummages in manual... ] It does if you specify .DELETE_ON_ERROR,
> which we do (in Makefile.global); so this bit of complication is
> a waste.

Okay.

Andrew, perhaps you could apply the attached to fix that. (Or I could
bundle it into one of the split out plperl feature patches.)

Tim.

Attachment Content-Type Size
master-fixtext2macro.patch text/x-patch 1.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Initial refactoring of plperl.c - updated
Date: 2010-01-10 18:11:11
Message-ID: 20753.1263147071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com> writes:
> On Sun, Jan 10, 2010 at 01:16:13AM -0500, Tom Lane wrote:
>> Doesn't make forcibly remove the target file if the command fails?

> Andrew, perhaps you could apply the attached to fix that. (Or I could
> bundle it into one of the split out plperl feature patches.)

Applied.

regards, tom lane