Re: First feature patch for plperl - draft [PATCH]

Lists: pgsql-hackers
From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Subject: First feature patch for plperl - draft [PATCH]
Date: 2009-12-03 23:30:29
Message-ID: 20091203233029.GA86442@timac.local

Building on my earlier plperl refactoring patch, here's a draft of my
first plperl feature patch.

Significant changes in this patch:

- New GUC plperl.on_perl_init='...perl...' for admin use.
- New GUC plperl.on_trusted_init='...perl...' for plperl user use.
- New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.
- END blocks now run at backend exit (fixes bug #5066).
- Stored procedure subs are now given names ($name__$oid).
- More error checking and reporting.
- Warnings no longer have an extra newline in the NOTICE text.
- Various minor optimizations like pre-growing data structures.

I'm working on adding tests and documentation now, meanwhile I'd very
much appreciate any feedback on the patch.

Tim.

p.s. Once this patch is complete I plan to work on patches that:
- add quote_literal and quote_identifier functions in C.
- generalize the Safe setup code to enable more control.
- formalize namespace usage, moving things out of main::
- add a way to perform inter-sub calling (at least for simple cases).
- possibly rewrite _plperl_to_pg_array in C.

Attachment Content-Type Size
master-plperl-feature1.patch text/x-patch 24.6 KB

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 00:53:47
Message-ID: FA0B1B61-15FE-4A47-BC8A-4E9AC6391C3A@kineticode.com
Lists: pgsql-hackers

On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:

> - New GUC plperl.on_perl_init='...perl...' for admin use.
> - New GUC plperl.on_trusted_init='...perl...' for plperl user use.
> - New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.

Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

> - END blocks now run at backend exit (fixes bug #5066).
> - Stored procedure subs are now given names ($name__$oid).
> - More error checking and reporting.
> - Warnings no longer have an extra newline in the NOTICE text.
> - Various minor optimizations like pre-growing data structures.

Nice.

> I'm working on adding tests and documentation now, meanwhile I'd very
> much appreciate any feedback on the patch.
>
> Tim.
>
> p.s. Once this patch is complete I plan to work on patches that:
> - add quote_literal and quote_identifier functions in C.

I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h, along with quote_nullable(), which might also be useful to add.

> - generalize the Safe setup code to enable more control.
> - formalize namespace usage, moving things out of main::

Nice.

> - add a way to perform inter-sub calling (at least for simple cases).
> - possibly rewrite _plperl_to_pg_array in C.

Sounds great, Tim. I'm not really qualified to say anything about the C code, but I'd be happy to try it out once there are docs.

Best,

David


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 11:18:24
Message-ID: 20091204111824.GA86763@timac.local
Lists: pgsql-hackers

On Thu, Dec 03, 2009 at 04:53:47PM -0800, David E. Wheeler wrote:
> On Dec 3, 2009, at 3:30 PM, Tim Bunce wrote:
>
> > - New GUC plperl.on_perl_init='...perl...' for admin use.
> > - New GUC plperl.on_trusted_init='...perl...' for plperl user use.
> > - New GUC plperl.on_untrusted_init='...perl...' for plperlu user use.
>
> Since there is no documentation yet, how do these work, exactly? Or should I just wait for the docs?

The perl code in plperl.on_perl_init gets eval'd as soon as an
interpreter is created. That could be at server startup if
shared_preload_libraries is used. plperl.on_perl_init can only be set by
an admin (PGC_SUSET).

The perl code in plperl.on_trusted_init gets eval'd when an interpreter
is initialized into trusted mode, e.g., used for the plperl language.
The perl code is eval'd inside the Safe compartment.
plperl.on_trusted_init can be set by users but it's only useful if set
before the plperl interpreter is first used.

plperl.on_untrusted_init acts like plperl.on_trusted_init but for
plperlu code.

So, if all three were set then, before any perl stored procedure or DO
block is executed, the interpreter would have executed either
on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
then on_untrusted_init (for plperlu).

> > - END blocks now run at backend exit (fixes bug #5066).
> > - Stored procedure subs are now given names ($name__$oid).
> > - More error checking and reporting.
> > - Warnings no longer have an extra newline in the NOTICE text.
> > - Various minor optimizations like pre-growing data structures.
>
> Nice.

Thanks.

> > I'm working on adding tests and documentation now, meanwhile I'd very
> > much appreciate any feedback on the patch.
> >
> > Tim.
> >
> > p.s. Once this patch is complete I plan to work on patches that:
> > - add quote_literal and quote_identifier functions in C.
>
> I expect you can just use the C versions in PostgreSQL. They're in utils/builtins.h,

That's my plan. (I've been discussing this and other issues with Andrew
Dunstan via IM.)

> along with quote_nullable(), which might also be useful to add.

I was planning to build that behaviour into quote_literal since it fits
naturally into perl's idea of undef and mirrors DBI's quote() method.
So:
quote_literal(undef) => "NULL"
quote_literal('foo') => "'foo'"
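
A minimal sketch of that behaviour in plain Perl, not the planned C implementation. It assumes standard_conforming_strings is on, so only embedded single quotes are doubled; the function body is purely illustrative.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in for the planned C function.
# Assumes standard_conforming_strings is on, so only single quotes
# need doubling; backslashes are not treated specially here.
sub quote_literal {
    my ($value) = @_;
    return 'NULL' unless defined $value;   # undef maps to SQL NULL
    (my $escaped = $value) =~ s/'/''/g;    # double embedded single quotes
    return "'$escaped'";
}

print quote_literal(undef), "\n";        # NULL
print quote_literal('foo'), "\n";        # 'foo'
print quote_literal("O'Reilly"), "\n";   # 'O''Reilly'
```

Folding the NULL case into one function mirrors DBI's quote(), at the cost of differing from SQL's quote_literal(), which returns NULL rather than the string 'NULL' for a NULL input.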

> > - generalize the Safe setup code to enable more control.

Specifically control what gets loaded into the Compartment, what gets
shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
and what class to use for Safe (to enable deeper changes if desired via
subclassing). Naturally all this is only possible for admin (via
plperl.on_perl_init).

> > - formalize namespace usage, moving things out of main::
>
> Nice.
>
> > - add a way to perform inter-sub calling (at least for simple cases).

My current plan here is to use an SP::AUTOLOAD to handle loading and
dispatching. So calling SP::some_random_procedure(...) will trigger
SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
stored procedure. There are three tricky parts: handling polymorphism (at
least "well enough"), making autoloading of stored procedures work
inside Safe, and making it fast. I think I have reasonable approaches for
those but I won't know for sure till I work on it.
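
The dispatch idea can be sketched in plain Perl, outside PostgreSQL. Everything here is an assumption for illustration: the %REGISTRY hash stands in for a catalog lookup, and polymorphism and the Safe compartment are not addressed at all.

```perl
use strict;
use warnings;

package SP;

# Hypothetical registry standing in for a catalog lookup; a real
# implementation would have to resolve names to stored procedures.
our %REGISTRY = (
    add_one => sub { return $_[0] + 1 },
);

our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                 # strip the package prefix
    return if $name eq 'DESTROY';      # never dispatch destructor calls
    my $code = $REGISTRY{$name}
        or die "no stored procedure named $name\n";
    no strict 'refs';
    *{"SP::$name"} = $code;            # cache so later calls skip AUTOLOAD
    goto &$code;                       # call with the original @_
}

package main;

print SP::add_one(41), "\n";           # 42
```

Installing the resolved sub into the symbol table on first use is one way to address the "making it fast" concern, since only the first call pays for the lookup.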

> > - possibly rewrite _plperl_to_pg_array in C.
>
> Sounds great, Tim. I'm not really qualified to say anything about the
> C code, but I'd be happy to try it out once there are docs.

Great. Thanks David.

Tim.


From: Jeff <threshar(at)torgo(dot)978(dot)org>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 15:19:53
Message-ID: E5B68FAC-54A7-4359-B026-FBD37DBFCA6F@torgo.978.org
Lists: pgsql-hackers


On Dec 4, 2009, at 6:18 AM, Tim Bunce wrote:
>
>>> - generalize the Safe setup code to enable more control.
>

Is there any possible way to enable "use strict;" for plperl (trusted)
modules?
I would love to have that feature. Sure does help cut down on bugs and
makes things nicer.

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 16:01:42
Message-ID: 25459.1259942502@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff <threshar(at)threshar(dot)is-a-geek(dot)com> writes:
> Is there any possible way to enable "use strict;" for plperl (trusted)
> modules?

The plperl manual shows a way to do it using some weird syntax or
other. It'd sure be nice to be able to use the regular syntax though.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 17:40:08
Message-ID: 004BC628-506A-489A-B0D9-820C15D40852@kineticode.com
Lists: pgsql-hackers

On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:

> The perl code in plperl.on_perl_init gets eval'd as soon as an
> interpreter is created. That could be at server startup if
> shared_preload_libraries is used. plperl.on_perl_init can only be set by
> an admin (PGC_SUSET).

Are multiline GUCs allowed in the postgresql.conf file?

> The perl code in plperl.on_trusted_init gets eval'd when an interpreter
> is initialized into trusted mode, e.g., used for the plperl language.
> The perl code is eval'd inside the Safe compartment.
> plperl.on_trusted_init can be set by users but it's only useful if set
> before the plperl interpreter is first used.

So immediately after connecting would be the place to make sure you do it, IOW.

> plperl.on_untrusted_init acts like plperl.on_trusted_init but for
> plperlu code.
>
> So, if all three were set then, before any perl stored procedure or DO
> block is executed, the interpreter would have executed either
> on_perl_init and then on_trusted_init (for plperl), or on_perl_init and
> then on_untrusted_init (for plperlu).

Awesome, thanks! This is really a great feature.

>> along with quote_nullable(), which might also be useful to add.
>
> I was planning to build that behaviour into quote_literal since it fits
> naturally into perl's idea of undef and mirrors DBI's quote() method.
> So:
> quote_literal(undef) => "NULL"
> quote_literal('foo') => "'foo'"

Is there an existing `quote_literal()` in PL/Perl? If so, you might not want to change its behavior.

>>> - generalize the Safe setup code to enable more control.
>
> Specifically control what gets loaded into the Compartment, what gets
> shared with it (e.g. sharing *a & *b as a workaround for the sort bug),
> and what class to use for Safe (to enable deeper changes if desired via
> subclassing). Naturally all this is only possible for admin (via
> plperl.on_perl_init).

Sounds good.

>>> - formalize namespace usage, moving things out of main::
>>
>> Nice.
>>
>>> - add a way to perform inter-sub calling (at least for simple cases).
>
> My current plan here is to use an SP::AUTOLOAD to handle loading and
> dispatching. So calling SP::some_random_procedure(...) will trigger
> SP::AUTOLOAD to try to resolve "some_random_procedure" to a particular
> stored procedure. There are three tricky parts: handling polymorphism (at
> least "well enough"), making autoloading of stored procedures work
> inside Safe, making it fast. I think I have reasonable approaches for
> those but I won't know for sure till I work on it.

I'm wondering if there might be some way to use some sort of attributes to identify data types passed to a PL/Perl function called from another PL/Perl function. Maybe some other functions that identify types, in the case of ambiguities?

foo(int(1), text('bar'));

? Kind of ugly, but perhaps only to be used if there are ambiguities? Not sure it's a great idea, mind. Just thinking out loud (so to speak).

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:36:52
Message-ID: 28202.1259951812@sss.pgh.pa.us
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Dec 4, 2009, at 3:18 AM, Tim Bunce wrote:
>> The perl code in plperl.on_perl_init gets eval'd as soon as an
>> interpreter is created. That could be at server startup if
>> shared_preload_libraries is used. plperl.on_perl_init can only be set by
>> an admin (PGC_SUSET).

> Are multiline GUCs allowed in the postgresql.conf file?

I don't think so. In any case this seems like an extreme abuse of the
concept of a GUC, as well as being a solution in search of a problem,
as well as being something that should absolutely not ever happen inside
the postmaster process for both reliability and security reasons.
I vote a big no on this.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:42:24
Message-ID: D52625E7-BD7A-4D93-8FC3-EAED7E822F47@kineticode.com
Lists: pgsql-hackers

On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:

>> Are multiline GUCs allowed in the postgresql.conf file?
>
> I don't think so. In any case this seems like an extreme abuse of the
> concept of a GUC, as well as being a solution in search of a problem,
> as well as being something that should absolutely not ever happen inside
> the postmaster process for both reliability and security reasons.
> I vote a big no on this.

That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:

plperl.on_perl_init = 'use PGInit;'
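
Such a module might look like the following sketch. PGInit is only the hypothetical name used above, and the two preloaded modules are arbitrary examples; whether trusted plperl code could then reach them is a separate question that this does not answer.

```perl
# Hypothetical PGInit.pm, assumed to live somewhere on the server's @INC.
package PGInit;

use strict;
use warnings;

# Preload modules that stored procedures are expected to need, so the
# compile cost is paid once at interpreter start-up, not on first use.
use List::Util ();
use POSIX ();

1;
```

Keeping the Perl in a file also sidesteps the multiline question, since the GUC value stays a single short statement.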

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:44:57
Message-ID: 4B1958A9.2050402@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> Jeff <threshar(at)threshar(dot)is-a-geek(dot)com> writes:
>
>> Is there any possible way to enable "use strict;" for plperl (trusted)
>> modules?
>>
>
> The plperl manual shows a way to do it using some weird syntax or
> other. It'd sure be nice to be able to use the regular syntax though.
>
>
>

As is documented, all you have to do is have:

custom_variable_classes = 'plperl'
plperl.use_strict = 'true'

in your config. You only need to put the documented BEGIN block in your
function body if you want to use strict mode on a case-by-case basis.

We can't allow an unrestricted "use strict;" in plperl functions because
it invokes an operation (require) that Safe.pm rightly regards as unsafe.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:51:00
Message-ID: 28618.1259952660@sss.pgh.pa.us
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:
>> I vote a big no on this.

> That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:

> plperl.on_perl_init = 'use PGInit;'

No, you missed the point: I'm objecting to having any such thing as
plperl.on_perl_init, full stop.

Aside from the points I already made, it's not even well defined.
What is to happen if the admin changes the value when the system
is already up?

regards, tom lane


From: Jeff <threshar(at)torgo(dot)978(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:51:53
Message-ID: D5304934-DE14-4171-80BD-4CEFF7B35FCB@torgo.978.org
Lists: pgsql-hackers


On Dec 4, 2009, at 1:44 PM, Andrew Dunstan wrote:

>
> As is documented, all you have to do is have:
>
> custom_variable_classes = 'plperl'
> plperl.use_strict = 'true'
>
> in your config. You only need to put the documented BEGIN block in
> your function body if you want to do use strict mode on a case by
> case basis.
>
> We can't allow an unrestricted "use strict;" in plperl functions
> because it invokes an operation (require) that Safe.pm rightly
> regards as unsafe.
>

Yeah, saw that in the manual in the plperl functions & arguments page
(at the bottom).
I think my confusion came up because I'd read the trust/untrusted
thing which removes the ability to use use/require.

Maybe a blurb or moving that chunk of doc to the trusted/untrusted
page might make that tidbit easier to find?

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 18:56:14
Message-ID: 603c8f070912041056k36518268lf6fb3ac56e4b1327@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 4, 2009 at 1:51 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
>> On Dec 4, 2009, at 10:36 AM, Tom Lane wrote:
>>> I vote a big no on this.
>
>> That's fine. It's relatively simple for an admin to create a Perl module that does everything she wants, call it PGInit or something, and then just make the GUC:
>
>>     plperl.on_perl_init = 'use PGInit;'
>
> No, you missed the point: I'm objecting to having any such thing as
> plperl.on_perl_init, full stop.
>
> Aside from the points I already made, it's not even well defined.
> What is to happen if the admin changes the value when the system
> is already up?

So, do we look for another way to provide the functionality besides
having a GUC, or is the functionality itself bad?

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:05:28
Message-ID: 28909.1259953528@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> So, do we look for another way to provide the functionality besides
> having a GUC, or is the functionality itself bad?

I don't think we want random Perl code running inside the postmaster,
no matter what the API to cause it is. I might hold my nose for "on
load" code if it can only run in backends, though I still say that
it's a badly designed concept because of the uncertainty about who
will run what when. Shlib load time is not an event that ought to be
user-visible.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:09:36
Message-ID: F9C9B06B-B62A-41BC-A696-B6C147978B38@kineticode.com
Lists: pgsql-hackers

On Dec 4, 2009, at 10:51 AM, Tom Lane wrote:

>> plperl.on_perl_init = 'use PGInit;'
>
> No, you missed the point: I'm objecting to having any such thing as
> plperl.on_perl_init, full stop.
>
> Aside from the points I already made, it's not even well defined.
> What is to happen if the admin changes the value when the system
> is already up?

Nothing. Hence the "init".

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:11:12
Message-ID: 91840DBA-3D85-4A61-BE4D-A3B72C6F122B@kineticode.com
Lists: pgsql-hackers

On Dec 4, 2009, at 11:05 AM, Tom Lane wrote:

>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is. I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when. Shlib load time is not an event that ought to be
> user-visible.

So only the child processes would be allowed to load the code? That could make connections even slower if there's a lot of Perl code to be added, though that's also the issue we have today. I guess I could live with that, though I'd rather have such code shared across processes.

If it's a badly designed concept, do you have any ideas that are less bad?

Best,

David


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:13:28
Message-ID: 603c8f070912041113x6dbca08aq1ed69f0206ea6625@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 4, 2009 at 2:05 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is.  I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.  Shlib load time is not an event that ought to be
> user-visible.

I agree that the uncertainty is not a wonderful thing, but e.g. Apache
has the same problem with mod_perl, and you just deal with it. I
choose to deal with it by doing "apachectl graceful" every time I
change the source code; or you can install Perl modules that check
whether the mod-times on the other modules you've loaded have changed
and reload them if so. In practice, being able to pre-load the Perl
libraries you're going to want to execute is absolutely essential if
you don't want performance to be in the toilet. My code base is so
large now that it takes 3 or 4 seconds for Apache to pull it all in on
my crappy dev box, but it's blazingly fast once it's up and running.
Having that be something that happens on the production server only
once a week or once a month when I roll out a new release rather than
any more frequently is really important.

...Robert


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff <threshar(at)threshar(dot)is-a-geek(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:16:36
Message-ID: 20091204191635.GA89699@timac.local
Lists: pgsql-hackers

On Fri, Dec 04, 2009 at 11:01:42AM -0500, Tom Lane wrote:
> Jeff <threshar(at)threshar(dot)is-a-geek(dot)com> writes:
> > Is there any possible way to enable "use strict;" for plperl (trusted)
> > modules?
>
> The plperl manual shows a way to do it using some weird syntax or
> other. It'd sure be nice to be able to use the regular syntax though.

Finding a solution is definitely on my list. I've spent a little time
exploring this already but haven't found a simple solution yet.

The neatest would have been overriding &CORE::GLOBAL::require but sadly
the Safe/Opcode mechanism takes priority over that and forbids compiling
code that does a use/require.

I may end up re-enabling the require opcode but redirecting it to run
some C code in plperl.c (the same 'opcode redirection' technique used by
my NYTProf profiler). That C code would only need to throw an exception
if the module hasn't been loaded already.
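
The check itself is simple, as this plain-Perl sketch suggests. It is not the opcode redirection described above, which would be C code in plperl.c; require_if_loaded is a hypothetical name, and the sketch only shows the "succeed if already loaded, otherwise throw" rule using %INC.

```perl
use strict;
use warnings;
use List::Util ();    # stands in for a module preloaded by the admin

# Hypothetical guard: succeed only if the module is already in %INC,
# and never load anything from disk.
sub require_if_loaded {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    die "Unable to load $module: not preloaded\n"
        unless exists $INC{$file};
    return 1;
}

print require_if_loaded('List::Util'), "\n";
print eval { require_if_loaded('No::Such::Module') } ? "loaded\n" : "refused\n";
```

Under this rule a plain "use strict;" would compile as long as strict.pm had been loaded before the compartment was set up.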

Tim.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:29:06
Message-ID: 20091204192906.GG4705@alvh.no-ip.org
Lists: pgsql-hackers

David E. Wheeler wrote:

> If it's a badly designed concept, do you have any ideas that are less bad?

I'm not sure that we want to duplicate this idea today, but in pltcl
there's a pltcl_modules table that is scanned on interpreter init and
loads user-defined code.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 19:40:18
Message-ID: 20091204194018.GC89699@timac.local
Lists: pgsql-hackers

On Fri, Dec 04, 2009 at 02:05:28PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > So, do we look for another way to provide the functionality besides
> > having a GUC, or is the functionality itself bad?
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is. I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when.

Robert's comparison with mod_perl is very apt. Preloading code gives
dramatic performance gains in production situations where there's a
significant codebase and connections are frequent.

The docs for plperl.on_perl_init could include a section relating to
its use with shared_preload_libraries. That could document any issues
and caveats you feel are important.

Tim.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 20:51:17
Message-ID: C45BA781-F3EC-4A9B-9C8C-3F712332FF15@hi-media.com
Lists: pgsql-hackers

On Dec 4, 2009, at 20:40, Tim Bunce wrote:
> Robert's comparison with mod_perl is very apt. Preloading code gives
> dramatic performance gains in production situations where there's a
> significant codebase and connections are frequent.

How far do you go with using a connection pooler such as pgbouncer?

--
dim


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 23:25:56
Message-ID: 97F5E1C9-5161-46D1-BC59-A54A9E614D71@kineticode.com
Lists: pgsql-hackers

On Dec 4, 2009, at 11:40 AM, Tim Bunce wrote:

> Robert's comparison with mod_perl is very apt. Preloading code gives
> dramatic performance gains in production situations where there's a
> significant codebase and connections are frequent.
>
> The docs for plperl.on_perl_init could include a section relating to
> its use with shared_preload_libraries. That could document any issues
> and caveats you feel are important.

+1

Tom, what's your objection to Shlib load time being user-visible?

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-04 23:28:35
Message-ID: 4B199B23.7080006@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>> So, do we look for another way to provide the functionality besides
>> having a GUC, or is the functionality itself bad?
>>
>
> I don't think we want random Perl code running inside the postmaster,
> no matter what the API to cause it is. I might hold my nose for "on
> load" code if it can only run in backends, though I still say that
> it's a badly designed concept because of the uncertainty about who
> will run what when. Shlib load time is not an event that ought to be
> user-visible.
>
>

But you can load an arbitrary shared lib inside the postmaster and it
can do what it likes, so I'm not clear that your caution is actually
saving us from much.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 06:21:22
Message-ID: 8950.1259994082@sss.pgh.pa.us
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> Tom, what's your objection to Shlib load time being user-visible?

It's not really designed to be user-visible. Let me give you just
two examples:

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. Later the transaction fails and is rolled
back. If loading plperl.so caused some user-visible things to happen,
should those be rolled back? If so, how do we get perl to play along?
If not, how do we get postgres to play along?

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. This happens in the context of a superuser
calling a non-superuser security definer function, or perhaps vice
versa. Whose permissions apply to whatever the on_load code tries
to do? (Hint: every answer is wrong.)

That doesn't even begin to cover the problems with allowing any of
this to happen inside the postmaster. Recall that the postmaster
does not have any database access. Furthermore, it is a very long
established reliability principle around here that the postmaster
process should do as little as possible, because everything that it
does creates another opportunity to have a nonrecoverable failure.
The postmaster can recover if a child crashes, but the other way
round, not so much.

regards, tom lane


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 13:56:00
Message-ID: 20091205135600.GA96338@timac.local
Lists: pgsql-hackers

On Sat, Dec 05, 2009 at 01:21:22AM -0500, Tom Lane wrote:
> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
> > Tom, what's your objection to Shlib load time being user-visible?
>
> It's not really designed to be user-visible. Let me give you just
> two examples:
>
> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded. Later the transaction fails and is rolled
> back. If loading plperl.so caused some user-visible things to happen,
> should those be rolled back?

No. Establishing initial state, no matter how that's triggered, is not
part of a transaction.

> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded. This happens in the context of a superuser
> calling a non-superuser security definer function, or perhaps vice
> versa. Whose permissions apply to whatever the on_load code tries
> to do? (Hint: every answer is wrong.)

I'll modify the patch to disable the SPI functions during
initialization (both on_perl_init and on_(un)trusted_init).

Would that address your concerns?

> That doesn't even begin to cover the problems with allowing any of
> this to happen inside the postmaster. Recall that the postmaster
> does not have any database access. Furthermore, it is a very long
> established reliability principle around here that the postmaster
> process should do as little as possible, because everything that it
> does creates another opportunity to have a nonrecoverable failure.
> The postmaster can recover if a child crashes, but the other way
> round, not so much.

I hope the combination of disabling the SPI functions during
initialization, and documenting the risks of combining on_perl_init and
shared_preload_libraries, is sufficient.

Tim.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 16:41:36
Message-ID: 17793.1260031296@sss.pgh.pa.us
Lists: pgsql-hackers

Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com> writes:
> I'll modify the patch to disable the SPI functions during
> initialization (both on_perl_init and on_(un)trusted_init).

Yeah, in the shower this morning I was thinking that not loading
SPI till after the on_init code runs would alleviate the concerns
about transactionality and permissions --- that would ensure that
whatever on_init does affects only the Perl world and not the database
world.
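
A minimal sketch of that ordering, written as a self-contained C toy model
rather than the actual plperl code: a flag keeps the SPI entry point
disabled while the init hook runs and enables it only afterwards. All
names here (spi_available, sketch_spi_exec, sketch_init_interp,
sketch_hook_that_queries) are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag: true only once the on_init code has finished. */
static bool spi_available = false;

/*
 * Hypothetical stand-in for an SPI entry point exposed to Perl.
 * Returns 0 on success, -1 if called while SPI is still disabled.
 */
static int
sketch_spi_exec(const char *query)
{
    assert(query != NULL);
    if (!spi_available)
    {
        fprintf(stderr, "SPI functions are not available during initialization\n");
        return -1;
    }
    printf("would execute: %s\n", query);
    return 0;
}

/* Hypothetical hook type standing in for the Perl code in on_*_init. */
typedef int (*init_hook) (void);

/* Run the init hook with SPI disabled, then enable SPI for normal use. */
static int
sketch_init_interp(init_hook hook)
{
    int rc;

    spi_available = false;
    rc = hook ? hook() : 0;
    spi_available = true;
    return rc;
}

/* Example hook that (wrongly) tries to touch the database during init. */
static int
sketch_hook_that_queries(void)
{
    return sketch_spi_exec("SELECT 1");
}
```

In this model a hook that attempts a query during initialization gets an
error back, while the same call made after initialization succeeds.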

However, we're not out of the woods yet. In a trusted interpreter
(plperl not plperlu), is the on_init code executed before we lock down
the interpreter with Safe? I would think it has to be since the main
point AFAICS is to let you preload code via "use". But then what is
left of the security guarantees of plperl? I can hardly imagine DBAs
wanting to vet a few thousand lines of random Perl code to see if it
contains anything that could be subverted. For example, the ability
to scribble on database files (like say pg_hba.conf) would almost surely
be easy to come by.

If you're willing to also confine the feature to plperlu, then maybe
the risk level could be decreased from insane to merely unreasonable.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 17:03:36
Message-ID: 4B1A9268.2000805@dunslane.net
Lists: pgsql-hackers

Tim Bunce wrote:
>> That doesn't even begin to cover the problems with allowing any of
>> this to happen inside the postmaster. Recall that the postmaster
>> does not have any database access. Furthermore, it is a very long
>> established reliability principle around here that the postmaster
>> process should do as little as possible, because everything that it
>> does creates another opportunity to have a nonrecoverable failure.
>> The postmaster can recover if a child crashes, but the other way
>> round, not so much.
>>
>
> I hope the combination of disabling the SPI functions during
> initialization, and documenting the risks of combining on_perl_init and
> shared_preload_libraries, is sufficient.

We already do a lot during library load - plperl's _PG_init() calls
plperl_init_interp() which sets up an interpreter, runs the boot code,
loads the DynaLoader and bootstraps the SPI module.

Pre-loading perl libraries in forking servers has well known benefits,
as Robert Haas noted.
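
The effect can be illustrated with a small C sketch, assuming a POSIX
system: state set up in the parent before fork() is already present in
each child, so the child does not pay the load cost again. The names
preload and child_sees_preloaded are hypothetical, and the string merely
stands in for a loaded Perl module.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stands in for expensive state such as a compiled Perl module. */
static char *preloaded = NULL;

/* Done once in the parent, before any child is forked. */
static void
preload(void)
{
    if (preloaded == NULL)
        preloaded = strdup("expensive module state");
}

/*
 * Fork a child and report whether it already has the preloaded state.
 * Returns 1 if the child sees it, 0 if not, -1 on fork/wait failure.
 */
static int
child_sees_preloaded(void)
{
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(preloaded != NULL ? 0 : 1);   /* child: report via exit code */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

A child forked before preload() has to do the work itself; one forked
afterwards inherits it from the parent.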

We're not talking about touching the database at all.

If we turn Tim's proposal down, I suspect someone will create a fork of
plperl that allows it anyway - it's not like it needs anything changed
elsewhere in the backend - it would be a drop-in replacement, pretty much.

Here's a concrete example of something I was working on just yesterday,
where it would be useful. One of my clients has a Postgres-based
application that needs to talk to a number of foreign databases, mostly
SQLServer. In some cases it pulls data from them; in this new case we
are pushing lots of data at arbitrary times into SQLServer, using
plperlu with DBI/DBD::Sybase. We would probably get a significant
performance gain if we could have DBI and DBD::Sybase preloaded. The
application does use connection pooling, but every so often a function
call will take significantly longer because it occurs in a new backend
that is having to reload the libraries.
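
A sketch of the configuration this would need, assuming the GUC name
proposed in Tim's draft patch (the final name may differ):

```conf
# postgresql.conf (sketch only; plperl.on_perl_init is the name from the draft patch)
shared_preload_libraries = 'plperl'                 # load plperl.so once, at postmaster start
plperl.on_perl_init = 'use DBI; use DBD::Sybase;'   # preload the modules the functions use
```

Note that with shared_preload_libraries the Perl code would run in the
postmaster itself, which is exactly the case under discussion here. A
custom_variable_classes entry for plperl would presumably also be needed
on current servers.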

I think if we do this the on_perl_init setting should probably be
PGC_POSTMASTER, which would remove any issue about it changing
underneath us.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 17:17:27
Message-ID: 18338.1260033447@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> If we turn Tim's proposal down, I suspect someone will create a fork of
> plperl that allows it anyway - it's not like it needs anything changed
> elsewhere in the backend - it would be a drop-in replacement, pretty much.

The question is not about whether we think it's useful; the question
is about whether it's safe.

> I think if we do this the on_perl_init setting should probably be
> PGC_POSTMASTER, which would remove any issue about it changing
> underneath us.

Yes, if the main intended usage is in combination with preloading perl
at postmaster start, it would be pointless to imagine that PGC_SIGHUP
is useful anyway.

regards, tom lane


From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 17:55:18
Message-ID: 20091205175518.GA98088@timac.local
Lists: pgsql-hackers

On Sat, Dec 05, 2009 at 11:41:36AM -0500, Tom Lane wrote:
> Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com> writes:
> > I'll modify the patch to disable the SPI functions during
> > initialization (both on_perl_init and on_(un)trusted_init).
>
> Yeah, in the shower this morning I was thinking that not loading
> SPI till after the on_init code runs would alleviate the concerns
> about transactionality and permissions --- that would ensure that
> whatever on_init does affects only the Perl world and not the database
> world.
>
> However, we're not out of the woods yet. In a trusted interpreter
> (plperl not plperlu), is the on_init code executed before we lock down
> the interpreter with Safe?

The on_perl_init code (PGC_SUSET) is run before Safe is loaded.

The on_trusted_init code (PGC_USERSET) is run inside Safe.

> I would think it has to be since the main point AFAICS is to let you
> preload code via "use".

The main use case being targeted at the moment for on_trusted_init
is setting values in %_SHARED, perhaps to enable debugging.
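
For instance, assuming the GUC name from the draft patch and a
hypothetical 'debug' key, the setting might look like this:

```conf
# postgresql.conf or a per-session SET (sketch only; GUC name as in the draft patch)
plperl.on_trusted_init = '$_SHARED{debug} = 1;'   # 'debug' is a hypothetical key
```

A plperl function could then check $_SHARED{debug} to decide whether to
emit extra diagnostics.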

Inside Safe you'll only be able to 'use' modules that have already been
loaded inside Safe. In my draft patch that's currently just strict and
warnings.

(I am also adding an interface to enable DBAs to configure what gets
loaded into the Safe compartment and what gets shared with it.
That'll be the way extra modules can be used by plperl.
It'll be used via on_perl_init, so it will be controlled by the DBA.)

> I can hardly imagine DBAs wanting to vet a few thousand lines of
> random Perl code to see if it contains anything that could be
> subverted. For example, the ability to scribble on database files
> (like say pg_hba.conf) would almost surely be easy to come by.

It's surely better to give the DBA that option than to remove the choice
entirely.

> If you're willing to also confine the feature to plperlu, then maybe
> the risk level could be decreased from insane to merely unreasonable.

I believe I can arrange for the SPI functions to be disabled during
on_*_init for both plperl and plperlu. Hopefully then the default risk
level will be better than unreasonable :)

Tim.


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: First feature patch for plperl - draft [PATCH]
Date: 2009-12-05 18:28:04
Message-ID: 20091205182804.GB9381@alvh.no-ip.org
Lists: pgsql-hackers

Tom Lane wrote:
> "David E. Wheeler" <david(at)kineticode(dot)com> writes:
> > Tom, what's your objection to Shlib load time being user-visible?
>
> It's not really designed to be user-visible. Let me give you just
> two examples:
>
> * We call a plperl function for the first time in a session, causing
> plperl.so to be loaded. Later the transaction fails and is rolled
> back.

I don't think there's any way for this to work sanely unless the library
has been loaded previously. What about allowing those settings only if
plperl is specified in shared_preload_libraries?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support