Re: Add on_perl_init and proper destruction to plperl [PATCH]

From: Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tim Bunce <Tim(dot)Bunce(at)pobox(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Add on_perl_init and proper destruction to plperl [PATCH]
Date: 2010-01-27 14:33:18
Message-ID: 20100127143318.GE713@timac.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 27, 2010 at 01:14:16AM -0500, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > Tim Bunce wrote:
> >> - Added plperl.on_perl_init GUC for DBA use (PGC_SIGHUP)
> >> SPI functions are not available when the code is run.
> >>
> >> - Added normal interpreter destruction behaviour
> >> END blocks, if any, are run then objects are
> >> destroyed, calling their DESTROY methods, if any.
> >> SPI functions will die if called at this time.
>
> > So, are there still objections to applying this patch?
>
> Yes.

To focus the discussion I've looked back through all the messages from
you that relate to this issue so I can summarize and try to address your
objections.

Some I've split or presented out of order, most relate to earlier (less
restricted) versions of the patch before it was split out, and naturally
they are lacking some context, so I've included archive URLs.

Please forgive and correct me if I misrepresent you or your intent here.

Regarding the utility of plperl.on_perl_init and END:

http://archives.postgresql.org/message-id/18338.1260033447@sss.pgh.pa.us
The question is not about whether we think it's useful; the question
is about whether it's safe.

I agree.

Regarding visibility of changes to plperl.on_perl_init:

http://archives.postgresql.org/message-id/28618.1259952660@sss.pgh.pa.us
What is to happen if the admin changes the value when the system is already up?

If a GUC could be defined as PGC_BACKEND and only settable by superuser,
perhaps that would be a good fit. [GucContext seems to conflate some things.]
Meanwhile the _init name is meant to convey the fact that it's a
before-first-use GUC, like temp_buffers.

I'm happy to accept whatever you'd recommend by way of PGC_* GUC selection.
Documentation can note any caveats associated with combining
plperl.on_perl_init with shared_preload_libraries.

http://archives.postgresql.org/message-id/4516.1263168347@sss.pgh.pa.us
However, I think PGC_SIGHUP would be enough to address my basic
worry, which is that people shouldn't be depending on the ability to set
these things within an individual session.

The patch uses PGC_SIGHUP for plperl.on_perl_init.

http://archives.postgresql.org/message-id/8950.1259994082@sss.pgh.pa.us
> Tom, what's your objection to Shlib load time being user-visible?
It's not really designed to be user-visible. Let me give you just
two examples:

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. Later the transaction fails and is rolled
back. If loading plperl.so caused some user-visible things to happen,
should those be rolled back? If so, how do we get perl to play along?
If not, how do we get postgres to play along?

I believe that's addressed by spi functions being disabled when init code runs.

* We call a plperl function for the first time in a session, causing
plperl.so to be loaded. This happens in the context of a superuser
calling a non-superuser security definer function, or perhaps vice
versa. Whose permissions apply to whatever the on_load code tries
to do? (Hint: every answer is wrong.)

I think that related to on_*trusted_init not plperl.on_perl_init, and
is also addressed by spi functions being disabled when init code runs.

That doesn't even begin to cover the problems with allowing any of
this to happen inside the postmaster. Recall that the postmaster
does not have any database access.

I believe that's addressed by spi functions being disabled when init code runs.

Furthermore, it is a very long
established reliability principle around here that the postmaster
process should do as little as possible, because every thing that it
does creates another opportunity to have a nonrecoverable failure.
The postmaster can recover if a child crashes, but the other way
round, not so much.

I understand that concern. Ultimately, though, that comes down to the
judgement of DBAs and the trust placed in them. They can already
load arbitrary code via shared_preload_libraries.

http://archives.postgresql.org/message-id/18338.1260033447@sss.pgh.pa.us
> I think if we do this the on_perl_init setting should probably be
> PGC_POSTMASTER, which would remove any issue about it changing
> underneath us.

Yes, if the main intended usage is in combination with preloading perl
at postmaster start, it would be pointless to imagine that PGC_SIGHUP
is useful anyway.

http://archives.postgresql.org/message-id/17793.1260031296@sss.pgh.pa.us
Yeah, in the shower this morning I was thinking that not loading
SPI till after the on_init code runs would alleviate the concerns
about transactionality and permissions --- that would ensure that
whatever on_init does affects only the Perl world and not the database
world.

That's included in the current patch (and also applies to END blocks).

However, we're not out of the woods yet. In a trusted interpreter
(plperl not plperlu), is the on_init code executed before we lock down
the interpreter with Safe? I would think it has to be since the main
point AFAICS is to let you preload code via "use". But then what is
left of the security guarantees of plperl? I can hardly imagine DBAs
wanting to vet a few thousand lines of random Perl code to see if it
contains anything that could be subverted.

plperl.on_perl_init code, set by the DBA, runs before the Safe
compartment is created. Without explicitextra steps the Safe
compartment has no access to code loaded by plperl.on_perl_init.

The Safe compartment (plperl) could get access to loaded code in one of
these ways:
1. by using SQL to call a plperlu function that accesses the code.
2. by the DBA 'sharing' a specific subroutine with the compartment.
3. by the DBA loading a module into the compartment.

There's no formal interface for 2. and 3. at the moment, so the only
official option is 1. (The final patch in the series includes some
building blocks towards an interface for 2 & 3, but doesn't add one.)

If you're willing to also confine the feature to plperlu, then maybe
the risk level could be decreased from insane to merely unreasonable.

I think you could reasonably describe plperl.on_perl_init as effectively
confined to plperlu (because plperl has no access to any new code).

http://archives.postgresql.org/message-id/26766.1263149361@sss.pgh.pa.us
For the record, I think it's a bad idea to run arbitrary
user-defined code in the postmaster, and I think it's a worse idea to
run arbitrary user-defined code at backend shutdown (the END-blocks bit).
I do not care in the least what applications you think this might
enable --- the negative consequences for overall system stability seem
to me to outweigh any possible arguments on that side.
- What happens when the supplied code has errors,

For on_perl_init it throws an exception that propagates to the user
statement that triggered the initialization of perl. It also ensures
that perl is left in a non-initialized state, so any further uses
also fail.

For END blocks an error triggers an exception that's caught by perl.

(As noted above, there's no access to postgres from init or END code.)

- takes an unreasonable amount of time to run,

Unreasonable is in the eye of the DBA, of course, and they
have the discretion to set on_perl_init to fit their needs.

For END blocks, I don't see how this issue is any different from
"users might do something dumb", like DO 'while(1){}' language plperl;
(or plpython , pltcl, or plpgsql for that matter).

- does something unsafe,

Such as? The code can't do anything more unsafe than is already possible.

- depends on the backend not being in an error state already,

The code has no access to postgress, whatever the state.

- etc. etc?

I'd welcome more concrete examples of potential issues.

Tim.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ivan Sergio Borgonovo 2010-01-27 14:39:19 Re: C function accepting/returning cstring vs. text
Previous Message Hitoshi Harada 2010-01-27 14:32:08 Re: Improving the accuracy of estimate_num_groups()