Re: Proposal: access control jails (and introduction as aspiring GSoC student)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date: 2010-03-26 16:07:46
Message-ID: 603c8f071003260907y23f273a8tce191427e24a97de@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 25, 2010 at 11:42 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> From what I can tell, a big problem with my jails idea (as well as the
> variables Robert described) is that there really isn't a way to store
> context in the backend specifically for the end client (e.g. a PHP
> script) due to connection pooling.  Also, I almost feel that storing
> such context would be a disadvantage, as it would harm some of the
> referential transparency that pooling and caching take advantage of,
> now and in the future.  However, I'm not going to give up :)
>
> Perhaps we could have some sort of LET statement that allows the
> client to pass data to the server, then have libpq automatically wrap
> queries with the LET statement (when necessary).  Here's what it would
> look like to the PHP scripter:
>
> // New libpq function
> pg_set('current_user', 'bob');
>
> $result = pg_query_params(
>        'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>        array('Birth place'));
>
>
> What this really does is something like:
>
> $result = pg_query_params(
>        'LET current_user=$1 DO $2 $3',
>        array(
>                'bob',
>                'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
>                'Birth place'
>        ));
>
>
> Here, the hypothetical LET statement executes a query string, binding
> current_user to our desired value.  The client library would wrap all
> future queries in this fashion.
>
> Granted, it would be silly to pass the value itself to the server over
> and over, so a serious implementation would probably pass a context
> ID, and these variable assignments would live in the backend instead.
> Moreover, LET is a terrible keyword choice here, considering most
> PostgreSQL users won't need to use it explicitly thanks to additional
> libpq support.
>
> Alternatively (this might require changing the client/server
> protocol), a context ID could be passed back and forth, thus providing
> a way to tell clients apart.
>
> Implementing this idea requires adding to the backend and to libpq.
> The backend would need at least two new statements.  One would set a
> variable of a session context, creating one if necessary and returning
> its ID.  Another would execute a query string passed as a parameter
> and bind both immediate arguments and the session context to it.  libpq would need a
> function to set a variable, and it would need to wrap queries it sends
> out with LET statements if necessary.
>
> Note that these variables can't be used in pre-defined functions
> unless they are somehow declared in advance.  One idea would be to
> first add global variable support, then make session-local contexts be
> able to temporarily reassign those variables.  Another would be to
> provide an explicit declaration statement.
>
> Would this make a good proposal for GSoC?:  Implement the backend part
> of my proposal, and create a proof-of-concept wrapper demonstrating
> it.  This way, I add the new statements, but don't mess around with
> existing functionality too much.

Hmm. I'm not sure exactly what problem you're trying to solve here.
I don't think this is a particularly good design for supporting
variables inside the server, since, well, it doesn't actually support
variables inside the server. If we just want a crude hack for
allowing the appearance of session-local server-side variables, that
could be implemented entirely in client code - in fact it could be
done as a thin wrapper around libpq that just does textual
substitution of the variables actually referenced by a particular
query. That wouldn't require any modifications to core PostgreSQL at
all, and it would probably perform better too since you'd not send all
the unnecessary variables with every query.
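Below is a minimal sketch of such a client-side wrapper, written in PHP to match the quoted example. It assumes a hypothetical ":name" marker for session variables and uses a hypothetical helper name, substitute_session_vars(). It deliberately departs from plain textual substitution: instead of splicing raw values into the SQL text, it rewrites each referenced marker into an extra positional parameter ($2, $3, ...), which avoids quoting and injection problems. Only variables the query actually mentions are sent. It skips markers inside single-quoted literals and PostgreSQL "::" casts, but it does not handle dollar-quoted strings or comments.

```php
<?php
// Hypothetical marker syntax: a session variable is referenced as :name.
// The (?<!:) lookbehind keeps PostgreSQL's ::type casts from matching.
const SESSION_VAR_PATTERN = '/(?<!:):([A-Za-z_][A-Za-z0-9_]*)/';

/**
 * Rewrite :name references into extra positional parameters ($n).
 *
 * Only variables the query actually references are appended to $params,
 * so unreferenced session variables are never sent to the server.
 * Names not present in $sessionVars are left untouched.
 */
function substitute_session_vars(string $query, array $params, array $sessionVars): array
{
    $params = array_values($params);
    $slots = [];  // variable name => its "$n" placeholder

    $replace = function (array $m) use (&$params, &$slots, $sessionVars): string {
        $name = $m[1];
        if (!array_key_exists($name, $sessionVars)) {
            return $m[0];  // not a session variable: leave it as written
        }
        if (!isset($slots[$name])) {
            // First reference: append the value and remember its position.
            $params[] = $sessionVars[$name];
            $slots[$name] = '$' . count($params);
        }
        return $slots[$name];
    };

    // Split around single-quoted literals ('' is an escaped quote) so that
    // markers inside string constants are not rewritten.
    $pieces = preg_split("/('(?:[^']|'')*')/", $query, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {  // even indexes lie outside string literals
            $pieces[$i] = preg_replace_callback(SESSION_VAR_PATTERN, $replace, $piece);
        }
    }
    return [implode('', $pieces), $params];
}
```

The returned pair could then be passed to PHP's pg_query_params($conn, $sql, $params). For a query such as "SELECT answer FROM secrets WHERE owner = :current_user AND question = $1" with parameters array('Birth place'), the marker becomes $2 and 'bob' is appended after 'Birth place'; a variable referenced twice is bound only once.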

Of course, you're 100% correct that connection pooling won't
necessarily play well with this feature, but that doesn't mean that we
shouldn't implement it. For one thing, not everybody uses connection
pooling; for two things, I think global variables (which would behave
sort of like a sequence, acting as a single-column, single-row
relation) would also be useful, and those WOULD work in a
connection-pooling environment.

But, I think that implementing any kind of variable support in the
backend is way too ambitious a project for a first-time hacker to get
done in a couple of months. I would guess that's a two-year project
for a first-time hacker or a one-year project for an experienced
hacker (or a three week project for Tom Lane). Here are some ideas
from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
to the right size for GSoC:

Allow administrators to cancel multi-statement idle transactions
Check for unreferenced table files created by transactions that were
  in progress when the server terminated abruptly
Add functions to check correctness of configuration files before they
  are loaded "live"
Add JSON (JavaScript Object Notation) data type [the tricky part will be
  getting community buy-in on which JSON library to use]
Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME
Allow ALTER TABLE to change constraint deferrability and actions
Add missing object types for ALTER ... SET SCHEMA
Add support for multiple pg_restore -t options, like pg_dump
Allow triggers to be disabled in only the current session [without the
  necessity of modifying system tables]
Allow single-batch hash joins to preserve outer pathkeys [definitely
  harder than some of the above]
Fix system views like pg_stat_all_tables to use set-returning
  functions, rather than views of per-column functions

Other ideas:

Allow per-tablespace effective_io_concurrency
Add a GiST opclass for inet/cidr that can support an exclusion
  constraint for "cidr blocks do not overlap"
ALTER VIEW ... DROP COLUMN (or alternatively/in addition RENAME COLUMN)

(I now wait for the chorus of people telling me that these ideas are
(a) too easy, (b) too hard, or (c) too biased toward my own
priorities. I readily admit to (c): I tried to list features where I
have some idea of what implementing them would require, so the list is
biased toward the parts of the system with which I'm familiar, which
in turn are the ones I care about. Feel free to add your own ideas or
critique these.)

...Robert
