Quick Links

Re: Proposal: access control jails (and introduction as aspiring GSoC student)

Lists:	pgsql-hackers

From:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 04:36:37
Message-ID:	e7e5fefd1003212136o799f3385h57d5d0b0b5e6d271@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, I'm Joey Adams, and I'm interested in applying for Google
Summer of Code to work on PostgreSQL. I'm a former GSoC student (I
worked on CCAN last year), and a strong C programmer, though I am
still new to working with large, established communities. I apologize
if this is the wrong place to send GSoC student introductions.

My proposal is bold, though I believe it can be trimmed down and
refined into something very useful, yet simple enough to implement in
3 months by a newcomer to the PostgreSQL code base.

I propose adding application-level access control to PostgreSQL via a
jails concept. In a nutshell, a jail is created as part of the
database definition (typically exposing a free variable for the
current user). When a jail is activated for a session, the only
accesses allowed are those indicated in the jail itself. A jail
cannot be exited without closing the session. If used properly, jails
make it possible to safely execute untrusted SQL code (though one may
not want to, citing the principle of least privilege).

For example, suppose we have the following database definition for a
trivial discussion board system:

CREATE TABLE users (
id SERIAL PRIMARY KEY,
name TEXT,
password TEXT,
email TEXT,
email_public BOOLEAN
);

CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title TEXT,
"user" INT REFERENCES users(id),
content TEXT,
published BOOLEAN
);

CREATE TABLE tags (
post INT REFERENCES posts(id),
tag TEXT
);

Suppose our discussion board is written in PHP. It authenticates the
user via cookies, after which it has an ID for the current user. One
way to arbitrate access is to code it into the PHP using queries like
this:

pg_query_params('SELECT users.email WHERE id=$1 AND (email_public=TRUE
OR id=$2)', $_GET['user'], $current_user);

Simple enough. However, this access control check has to be done for
every query, and it can get rather complex as more tables are
involved.

Views are a way to alleviate this complexity, but:

* Views only apply to SELECTs
* Queries still have to specify the current user (unless one uses a
global variable (which requires a bit of a hack, if I'm not mistaken))

My proposal would make it possible to have a statement in the database
definition much like this:

CREATE JAIL jail (
SELECT id, name FROM users;
SELECT email FROM users
WHERE id=current_user
OR email_public=TRUE;

SELECT * FROM posts
WHERE "user"=current_user
OR published=TRUE;
UPDATE posts
WHERE "user"=current_user
SET title, content, published;
INSERT INTO posts
WHERE "user"=current_user;
DELETE FROM posts
WHERE "user"=current_user;

SELECT * FROM tags, posts
WHERE tags.post=posts.id;
INSERT INTO tags
WHERE post IN (
SELECT id FROM posts
WHERE "user"=current_user);
DELETE FROM posts
WHERE post IN (
SELECT id FROM posts
WHERE "user"=current_user);
);

Inside of the jail definition is a series of pseudo-statements that
indicate the space of queries the user can perform. Simply creating a
jail does not make it go into effect. A jail is activated using
another query, and it remains in effect for the remainder of the
session. It cannot be deactivated through the protocol, as doing so
would constitute a privilege escalation.

Example of a PHP script invoking a jail:

<?php
pg_connect('user=foo password=spiderman') or die('Database error occurred');

/* authentication-fu */
// $current_user now contains the id of the logged-in user.

pg_query_params('USE JAIL jail WITH current_user=$1', array($current_user));

/* Remaining queries in this session will only be able to SELECT,
UPDATE, INSERT, and DELETE on users, posts, and tags, following the
rules set forth by the jail. */
?>

I came up with this idea while working on a small
software-as-a-service application using PostgreSQL. I needed to
implement access control in the PHP frontend. I ended up creating a
database abstraction class to protect the rest of the frontend code
from SQL messiness. Every database query I needed, I wrote/extended a
PHP function for it. My database access class for my "small"
application exploded to over 1000 lines (granted, a lot of it was
comments).

Although having accessor functions can make the code easier to read
and maintain, it can result in a lot of redundancy. Why not just use
the database abstraction library PostgreSQL gives you: SQL?

Jails seem to me like a revolutionary addition to SQL, and
implementing them "fully" could be quite involved. However, my guess
is that by restricting what may be done in a jail simply to SELECT,
UPDATE, INSERT and DELETE, an initial jail implementation capable of
what I posted above would be relatively easy. Work would mainly
consist of adding code to PostgreSQL's parser and the rewrite engine.

Comments?

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 08:50:59
Message-ID:	4BA72F73.3040105@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Joseph Adams wrote:
> Hello, I'm Joey Adams, and I'm interested in applying for Google
> Summer of Code to work on PostgreSQL. I'm a former GSoC student (I
> worked on CCAN last year), and a strong C programmer, though I am
> still new to working with large, established communities. I apologize
> if this is the wrong place to send GSoC student introductions.

Hi Joey, you're in the right place!

> * Views only apply to SELECTs

Views can be made updateable by defining suitable rules for them, that
is usually the way people solve this problem. See chapters
http://www.postgresql.org/docs/8.4/interactive/rules-views.html and
http://www.postgresql.org/docs/8.4/interactive/rules-update.html in the
user manual.

You should also take a look at Veil
(http://veil.projects.postgresql.org/), which uses views to implement
something that sounds much like what you're proposing.

You should also be aware of an issue with this approach:
http://archives.postgresql.org/message-id/4AE02DF0.40101@enterprisedb.com.
Some ideas on plugging that hole were discussed, but unfortunately there
was no clear consensus and no progress since.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 13:39:47
Message-ID:	20100322133947.GS21875@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Joseph Adams (joeyadams3(dot)14159(at)gmail(dot)com) wrote:
> I propose adding application-level access control to PostgreSQL via a
> jails concept. In a nutshell, a jail is created as part of the
> database definition (typically exposing a free variable for the
> current user). When a jail is activated for a session, the only
> accesses allowed are those indicated in the jail itself. A jail
> cannot be exited without closing the session. If used properly, jails
> make it possible to safely execute untrusted SQL code (though one may
> not want to, citing the principle of least privilege).

I guess my initial reaction to this is that you can use roles, views,
and pl/pgsql (security definer) functions to achieve this. This does
have an interesting intersection with row-level security concepts and
that's definitely a project that I'd like to see happen at some point in
PG. Not sure if you've considered this, but you can do a 'set role' at
the start of a session and then use CURRENT_ROLE in view definitions and
in other places. You can also make it so that the user who is logging
in (eg 'www-data') doesn't have any rights to anything, except the
ability to 'set role' to other roles.

Note that, with any of this, you need to consider pooled database
connections. Unfortunately, it's still pretty expensive to establish a
new database connection to PG.

Thanks,

Stephen

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 13:54:30
Message-ID:	603c8f071003220654u4de47527qe7ada197b6b38447@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 22, 2010 at 9:39 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Joseph Adams (joeyadams3(dot)14159(at)gmail(dot)com) wrote:
>> I propose adding application-level access control to PostgreSQL via a
>> jails concept. In a nutshell, a jail is created as part of the
>> database definition (typically exposing a free variable for the
>> current user). When a jail is activated for a session, the only
>> accesses allowed are those indicated in the jail itself. A jail
>> cannot be exited without closing the session. If used properly, jails
>> make it possible to safely execute untrusted SQL code (though one may
>> not want to, citing the principle of least privilege).
>
> I guess my initial reaction to this is that you can use roles, views,
> and pl/pgsql (security definer) functions to achieve this. This does
> have an interesting intersection with row-level security concepts and
> that's definitely a project that I'd like to see happen at some point in
> PG. Not sure if you've considered this, but you can do a 'set role' at
> the start of a session and then use CURRENT_ROLE in view definitions and
> in other places. You can also make it so that the user who is logging
> in (eg 'www-data') doesn't have any rights to anything, except the
> ability to 'set role' to other roles.

Sometimes it would be nice to conditionalize queries on a value other
than the authenticated role. I really wish we had some kind of SQL
variable support. Talking out of my rear end:

CREATE VARIABLE name AS type [GLOBAL | LOCAL]; -- local variables are
per-backend
SET VARIABLE name = value;
SELECT * FROM foo WHERE col = ${name};

We have had one previous request for the ability to limit the list of
queries that the user can issue to a fixed set, but it's not clear to
me that that's really all that useful. If that's what you want, you
might be better off putting an intermediate layer between the client
and the database - e.g. this kind of restriction could be implemented
in the connection pooler...

...Robert

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 14:03:57
Message-ID:	20100322140357.GU21875@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> Sometimes it would be nice to conditionalize queries on a value other
> than the authenticated role. I really wish we had some kind of SQL
> variable support. Talking out of my rear end:

I certainly agree- having variable support in the backend would
definitely be nice. I'd want it to be explicit and distinct from GUCs
though, unlike the situation we have w/ psql right now. All that said,
I'm not really a huge fan of write-your-own-authorization-system in
general. If the existing authorization system isn't sufficient for what
you want, then let's improve it. There may be specific cases where
what's needed is particularly complex, but that's what security definer
functions are for..

Thanks,

Stephen

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 15:02:13
Message-ID:	603c8f071003220802s4fae3ae9j672acecb77ddb06a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 22, 2010 at 10:03 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> Sometimes it would be nice to conditionalize queries on a value other
>> than the authenticated role. I really wish we had some kind of SQL
>> variable support. Talking out of my rear end:
>
> I certainly agree- having variable support in the backend would
> definitely be nice. I'd want it to be explicit and distinct from GUCs
> though, unlike the situation we have w/ psql right now.

Agreed.

> All that said,
> I'm not really a huge fan of write-your-own-authorization-system in
> general. If the existing authorization system isn't sufficient for what
> you want, then let's improve it. There may be specific cases where
> what's needed is particularly complex, but that's what security definer
> functions are for..

Fortunately this functionality also has other uses, so I don't know
that we really need to decide which of those uses we approve of more
or less.

Does the SQL standard specify anything in this area?

...Robert

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 15:36:54
Message-ID:	162867791003220836w6e859012w7ce07dccc2e1a99f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

2010/3/22 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Mar 22, 2010 at 9:39 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> * Joseph Adams (joeyadams3(dot)14159(at)gmail(dot)com) wrote:
>>> I propose adding application-level access control to PostgreSQL via a
>>> jails concept. In a nutshell, a jail is created as part of the
>>> database definition (typically exposing a free variable for the
>>> current user). When a jail is activated for a session, the only
>>> accesses allowed are those indicated in the jail itself. A jail
>>> cannot be exited without closing the session. If used properly, jails
>>> make it possible to safely execute untrusted SQL code (though one may
>>> not want to, citing the principle of least privilege).
>>
>> I guess my initial reaction to this is that you can use roles, views,
>> and pl/pgsql (security definer) functions to achieve this. This does
>> have an interesting intersection with row-level security concepts and
>> that's definitely a project that I'd like to see happen at some point in
>> PG. Not sure if you've considered this, but you can do a 'set role' at
>> the start of a session and then use CURRENT_ROLE in view definitions and
>> in other places. You can also make it so that the user who is logging
>> in (eg 'www-data') doesn't have any rights to anything, except the
>> ability to 'set role' to other roles.
>
> Sometimes it would be nice to conditionalize queries on a value other
> than the authenticated role. I really wish we had some kind of SQL
> variable support. Talking out of my rear end:
>
> CREATE VARIABLE name AS type [GLOBAL | LOCAL]; -- local variables are
> per-backend
> SET VARIABLE name = value;
> SELECT * FROM foo WHERE col = ${name};

I though about it two, three years ago as lighter solution without
packages. Now we have more almost all what we need. Variables are
supported by parser (with last changes related to integration of main
parser to plpgsql).

just you can write

select * from tab where col = var;

there are not necessary some special syntax. And it is available from
all environments. Implementation for scalar type cannot be dificult
(maybe).

Regards
Pavel Stehule

>
> We have had one previous request for the ability to limit the list of
> queries that the user can issue to a fixed set, but it's not clear to
> me that that's really all that useful. If that's what you want, you
> might be better off putting an intermediate layer between the client
> and the database - e.g. this kind of restriction could be implemented
> in the connection pooler...
>
> ...Robert
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>, "Stephen Frost" <sfrost(at)snowman(dot)net>
Cc:	"Joseph Adams" <joeyadams3(dot)14159(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 15:53:00
Message-ID:	4BA74C0C0200002500030052@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Does the SQL standard specify anything in this area?

The only thing that comes to mind for me is the SQL/PSM
<SQL variable declaration>.

-Kevin

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-22 22:42:40
Message-ID:	1269297760.14588.16.camel@vanquo.pezone.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On mån, 2010-03-22 at 09:54 -0400, Robert Haas wrote:
> We have had one previous request for the ability to limit the list of
> queries that the user can issue to a fixed set, but it's not clear to
> me that that's really all that useful.

Well, sudo is pretty useful, and this would be quite similar.

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-23 08:54:51
Message-ID:	87eijbfmec.fsf@hi-media-techno.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Well, sudo is pretty useful, and this would be quite similar.

+1.

I guess one of the big difficulties would be to be able to match a given
random query with the list of queries we have in any Jail, given that we
put in there "generic" queries and we want to allow "specific" queries.

But once we have that, it could turn out pretty useful for other
thoughts. I can't find it again in the archives, but the idea was to
collect statistics on views rather than plain table so that you can have
correlated stats on JOINs and some columns etc. The hard part here too
looks like being able to tell at runtime that a given query is a
specific form of an existing view.

Regards,
--
dim

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-23 17:28:34
Message-ID:	4BA8FA42.5070406@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/21/10 9:36 PM, Joseph Adams wrote:
> Inside of the jail definition is a series of pseudo-statements that
> indicate the space of queries the user can perform. Simply creating a
> jail does not make it go into effect. A jail is activated using
> another query, and it remains in effect for the remainder of the
> session. It cannot be deactivated through the protocol, as doing so
> would constitute a privilege escalation.

This is an interesting approach and I don't think that most of the
people commenting on this list have quite grasped it.

I see two major difficulties to solve with this approach: (1)
developing a way of phrasing the query stubs which would allow common
things like dynamic where clauses, order by, and limit, and (2) whether
it's practical for the author of any real application to define all of
those queries beforehand.

For (1), you might want to look at Meredith's libDejector, which takes a
similar approach for SQL-injection protection:
http://www.thesmartpolitenerd.com/code/dejector.html

I don't think that the idea of turning on the jail mode via a
session-level switch works, given the realities of connection pooling.
Also, I do not believe that we currently have any USERSET variable which
can be turned on but not off, so that would require adding a whole new mode.

BTW, if you wanted something less ambitious, we have a longstanding
request to implement "local superuser", that is, the ability to give one
role the ability to edit other roles in one database only.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-23 17:42:49
Message-ID:	603c8f071003231042y3da3536ufcf0c7c691489a55@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 23, 2010 at 1:28 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I don't think that the idea of turning on the jail mode via a
> session-level switch works, given the realities of connection pooling.
> Also, I do not believe that we currently have any USERSET variable which
> can be turned on but not off, so that would require adding a whole new mode.

I think this could be done with an assign hook.

> BTW, if you wanted something less ambitious, we have a longstanding
> request to implement "local superuser", that is, the ability to give one
> role the ability to edit other roles in one database only.

But roles aren't database-specific... they're globals.

...Robert

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-23 18:43:58
Message-ID:	20100323184358.GC3240@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:
> On Tue, Mar 23, 2010 at 1:28 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> > BTW, if you wanted something less ambitious, we have a longstanding
> > request to implement "local superuser", that is, the ability to give one
> > role the ability to edit other roles in one database only.
>
> But roles aren't database-specific... they're globals.

Well, that's another longstanding request ;-) (See the
db_user_namespace hack)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-23 18:58:49
Message-ID:	4567.1269370729@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Robert Haas escribi:
>> On Tue, Mar 23, 2010 at 1:28 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> BTW, if you wanted something less ambitious, we have a longstanding
>>> request to implement "local superuser", that is, the ability to give one
>>> role the ability to edit other roles in one database only.
>>
>> But roles aren't database-specific... they're globals.

> Well, that's another longstanding request ;-) (See the
> db_user_namespace hack)

Yeah, you'd have to fix that first. The "ambitious" part of that is
coming up with a spec that everybody will accept. Once you had that,
coding it might not be very hard ...

BTW, "local superuser" is an oxymoron. If you're superuser you'd have
no trouble whatsoever breaking into other databases. "Local CREATEROLE"
privilege could be a sane concept, though, if we had local roles.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 00:10:43
Message-ID:	20100324001043.GD3240@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane escribió:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Robert Haas escribi:

> >> But roles aren't database-specific... they're globals.
>
> > Well, that's another longstanding request ;-) (See the
> > db_user_namespace hack)
>
> Yeah, you'd have to fix that first. The "ambitious" part of that is
> coming up with a spec that everybody will accept. Once you had that,
> coding it might not be very hard ...

I wonder if this is simpler now that we got rid of the flat files stuff.
We could validate the user once we've connected to a database and thus
able to poke at the local user catalog, not just the global one. I
think that was a serious roadblock.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 00:16:07
Message-ID:	10784.1269389767@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> I wonder if this is simpler now that we got rid of the flat files stuff.
> We could validate the user once we've connected to a database and thus
> able to poke at the local user catalog, not just the global one. I
> think that was a serious roadblock.

I think it'd be a mistake to invent a separate catalog for local users;
what had been nice clean foreign key relationships (eg, relowner ->
pg_auth.oid) would suddenly become a swamp.

My first thought about a catalog representation would be to add a column
to pg_auth which is a DB OID for local users or zero for global users.
However, you'd probably want to prevent local users and global users
from having the same names, and it's not very clear how to do that
with this representation (though that'd be even worse with separate
catalogs). I guess we could fall back on a creation-time check (ick).

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 00:27:31
Message-ID:	603c8f071003231727k1642667dr75af056531b63da4@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 23, 2010 at 8:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> I wonder if this is simpler now that we got rid of the flat files stuff.
>> We could validate the user once we've connected to a database and thus
>> able to poke at the local user catalog, not just the global one. I
>> think that was a serious roadblock.
>
> I think it'd be a mistake to invent a separate catalog for local users;
> what had been nice clean foreign key relationships (eg, relowner ->
> pg_auth.oid) would suddenly become a swamp.
>
> My first thought about a catalog representation would be to add a column
> to pg_auth which is a DB OID for local users or zero for global users.
> However, you'd probably want to prevent local users and global users
> from having the same names, and it's not very clear how to do that
> with this representation (though that'd be even worse with separate
> catalogs). I guess we could fall back on a creation-time check (ick).

Could we use a suitably defined exclusion constraint?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 00:30:09
Message-ID:	11008.1269390609@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Mar 23, 2010 at 8:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> My first thought about a catalog representation would be to add a column
>> to pg_auth which is a DB OID for local users or zero for global users.
>> However, you'd probably want to prevent local users and global users
>> from having the same names, and it's not very clear how to do that
>> with this representation (though that'd be even worse with separate
>> catalogs). I guess we could fall back on a creation-time check (ick).

> Could we use a suitably defined exclusion constraint?

Not unless you'd like to solve the issues with triggers on system
catalogs first ...

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 01:51:32
Message-ID:	603c8f071003231851s1c50687eq107bc7b347a1bd8f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 23, 2010 at 8:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Mar 23, 2010 at 8:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> My first thought about a catalog representation would be to add a column
>>> to pg_auth which is a DB OID for local users or zero for global users.
>>> However, you'd probably want to prevent local users and global users
>>> from having the same names, and it's not very clear how to do that
>>> with this representation (though that'd be even worse with separate
>>> catalogs). I guess we could fall back on a creation-time check (ick).
>
>> Could we use a suitably defined exclusion constraint?
>
> Not unless you'd like to solve the issues with triggers on system
> catalogs first ...

Urp. Not really, though I don't know what they are exactly. I didn't
think exclusion constraints depended on triggers. UNIQUE constraints
work on system catalogs, right?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-24 02:17:31
Message-ID:	12784.1269397051@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Mar 23, 2010 at 8:30 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Not unless you'd like to solve the issues with triggers on system
>> catalogs first ...

> Urp. Not really, though I don't know what they are exactly. I didn't
> think exclusion constraints depended on triggers. UNIQUE constraints
> work on system catalogs, right?

UNIQUE constraints depend on internal support in the index access method
(see today's thread with Gokulakannan Somasundaram for some details of
how btree does it). Exclusion constraints have a totally different
implementation --- they don't require index AM support, but they do use
triggers.

Now having said that, my recollection is that the worst issues
surrounding triggers on catalogs had to do with BEFORE triggers.
Exclusion constraint triggers would be AFTER triggers, so maybe it could
be made to work. It'd still be significant work though, for not a lot
of value as far as this particular issue goes.

regards, tom lane

From:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-26 03:42:48
Message-ID:	e7e5fefd1003252042w80dc35era426f8b356211446@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I apologize for my silence, as I've been busy reading up more on the
internals of PostgreSQL.

From what I can tell, a big problem with my jails idea (as well as the
variables Robert described) is that there really isn't a way to store
context in the backend specifically for the end client (e.g. a PHP
script) due to connection pooling. Also, I almost feel that storing
such context would be a disadvantage, as it would harm some of the
referential transparency that pooling and caching take advantage of,
now and in the future. However, I'm not going to give up :)

Perhaps we could have some sort of LET statement that allows the
client to pass data to the server, then have libpq automatically wrap
queries with the LET statement (when necessary). Here's what it would
look like to the PHP scripter:

// New libpq function
pg_set('current_user', 'bob');

$result = pg_query_params(
'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
array('Birth place'));

What this really does is something like:

$result = pg_query_params(
'LET current_user=$1 DO $2 $3',
array(
'bob',
'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
'Birth place')
));

Here, the hypothetical LET statement executes a query string, binding
current_user to our desired value. The client library would wrap all
future queries in this fashion.

Granted, it would be silly to pass the value itself to the server over
and over, so a serious implementation would probably pass a context
ID, and these variable assignments would live in the backend instead.
Moreover, LET is a terrible keyword choice here, considering most
PostgreSQL users won't need to use it explicitly thanks to additional
libpq support.

Alternatively (this might require changing the client/server
protocol), a context ID could be passed back and forth, thus providing
a way to tell clients apart.

Implementing this idea requires adding to the backend and to libpq.
The backend would need at least two new statements. One would set a
variable of a session context, creating one if necessary and returning
its ID. Another would execute a string as a parameter and bind both
immediate arguments and session context to it. libpq would need a
function to set a variable, and it would need to wrap queries it sends
out with LET statements if necessary.

Note that these variables can't be used in pre-defined functions
unless they are somehow declared in advance. One idea would be to
first add global variable support, then make session-local contexts be
able to temporarily reassign those variables. Another would be to
provide an explicit declaration statement.

Would this make a good proposal for GSoC?: Implement the backend part
of my proposal, and create a proof-of-concept wrapper demonstrating
it. This way, I add the new statements, but don't mess around with
existing functionality too much.

From:	Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-26 09:44:05
Message-ID:	4BAC81E5.7030405@cs.helsinki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 3/26/10 5:42 AM +0200, Joseph Adams wrote:
> // New libpq function
> pg_set('current_user', 'bob');
>
> $result = pg_query_params(
> 'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
> array('Birth place'));
>
>
> What this really does is something like:
>
> $result = pg_query_params(
> 'LET current_user=$1 DO $2 $3',
> array(
> 'bob',
> 'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
> 'Birth place')
> ));

Looks to me like this is already achievable with custom GUCs and views.

Regards,
Marko Tiikkaja

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-26 10:36:06
Message-ID:	87ljdfbca1.fsf@hi-media-techno.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Disclaimer: the following is only my view on the matter and is not meant
as representative of the project views, which are reached through
discussion and consensus. IOW, my 2¢.

Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> writes:
> Perhaps we could have some sort of LET statement that allows the
> client to pass data to the server, then have libpq automatically wrap
> queries with the LET statement (when necessary). Here's what it would
> look like to the PHP scripter:

My first reaction reading this would be to keep the idea but extend the
WITH syntax instead, so you'd have

WITH BINDINGS bob AS (
current_user = 'bob'
)
SELECT answer FROM secrets WHERE user=current_user AND question=$1;

That said, you can already (as of 8.4) do the following:

WITH bob(name) AS (
SELECT 'bob'
)
SELECT answer FROM secrets, bob WHERE user=bob.name AND question=$1;

The syntax WITH bob(current_user) is not possible because of the
standard using current_user as a keyword (IIUC), but you get the idea.

> Granted, it would be silly to pass the value itself to the server over
> and over, so a serious implementation would probably pass a context
> ID, and these variable assignments would live in the backend instead.

I wonder if creating a temporary (I mean per-backend) new catalog where
to store the bindings (or whatever you name them) and then allow another
syntax like the following would help here:

WITH BINDINGS bob ()
SELECT ...

The real problem though is that when using a transaction level pooling
system you want to tie your bindings to a transaction, not to a
session. So I'm not sure if storing the bindings in a local backend
catalog is a must-have feature.

Regards,
--
dim

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-26 16:07:46
Message-ID:	603c8f071003260907y23f273a8tce191427e24a97de@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 25, 2010 at 11:42 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> From what I can tell, a big problem with my jails idea (as well as the
> variables Robert described) is that there really isn't a way to store
> context in the backend specifically for the end client (e.g. a PHP
> script) due to connection pooling. Also, I almost feel that storing
> such context would be a disadvantage, as it would harm some of the
> referential transparency that pooling and caching take advantage of,
> now and in the future. However, I'm not going to give up :)
>
> Perhaps we could have some sort of LET statement that allows the
> client to pass data to the server, then have libpq automatically wrap
> queries with the LET statement (when necessary). Here's what it would
> look like to the PHP scripter:
>
> // New libpq function
> pg_set('current_user', 'bob');
>
> $result = pg_query_params(
> 'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
> array('Birth place'));
>
>
> What this really does is something like:
>
> $result = pg_query_params(
> 'LET current_user=$1 DO $2 $3',
> array(
> 'bob',
> 'SELECT answer FROM secrets WHERE user=current_user AND question=$1',
> 'Birth place')
> ));
>
>
> Here, the hypothetical LET statement executes a query string, binding
> current_user to our desired value. The client library would wrap all
> future queries in this fashion.
>
> Granted, it would be silly to pass the value itself to the server over
> and over, so a serious implementation would probably pass a context
> ID, and these variable assignments would live in the backend instead.
> Moreover, LET is a terrible keyword choice here, considering most
> PostgreSQL users won't need to use it explicitly thanks to additional
> libpq support.
>
> Alternatively (this might require changing the client/server
> protocol), a context ID could be passed back and forth, thus providing
> a way to tell clients apart.
>
> Implementing this idea requires adding to the backend and to libpq.
> The backend would need at least two new statements. One would set a
> variable of a session context, creating one if necessary and returning
> its ID. Another would execute a string as a parameter and bind both
> immediate arguments and session context to it. libpq would need a
> function to set a variable, and it would need to wrap queries it sends
> out with LET statements if necessary.
>
> Note that these variables can't be used in pre-defined functions
> unless they are somehow declared in advance. One idea would be to
> first add global variable support, then make session-local contexts be
> able to temporarily reassign those variables. Another would be to
> provide an explicit declaration statement.
>
> Would this make a good proposal for GSoC?: Implement the backend part
> of my proposal, and create a proof-of-concept wrapper demonstrating
> it. This way, I add the new statements, but don't mess around with
> existing functionality too much.

Hmm. I'm not sure exactly what problem you're trying to solve here.
I don't think this is a particularly good design for supporting
variables inside the server, since, well, it doesn't actually support
variables inside the server. If we just want a crude hack for
allowing the appearance of session-local server-side variables, that
could be implemented entirely in client code - in fact it could be
done as a thin wrapper around libpq that just does textual
substitution of the variables actually referenced by a particular
query. That wouldn't require any modifications to core PostgreSQL at
all, and it would probably perform better too since you'd not send all
the unnecessary variables with every query.

Of course, you're 100% correct that connection pooling won't
necessarily play well with this feature, but that doesn't mean that we
shouldn't implement it. For one thing, not everybody uses connection
pooling; for two things, I think global variables (that would behave
sort of like a sequence - they'd act sort of like a single column
single row relation) would also be useful, and those WOULD work in a
connection-pooling environment.

But, I think that implementing any kind of variable support in the
backend is way too ambitious a project for a first-time hacker to get
done in a couple of months. I would guess that's a two-year project
for a first time hacker or a one-year project for an experienced
hacker (or a three week project for Tom Lane). Here are some ideas
from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
to the right size for GSOC:

Allow administrators to cancel multi-statement idle transactions
Check for unreferenced table files created by transactions that were
in-progress when the server terminated abruptly
Add functions to check correctness of configuration files before they
are loaded "live"
Add JSON (JavaScript Object Notation) data type [tricky part will be
getting community buy-in on which JSON library to use]
Allow ALTER TABLE ... ALTER CONSTRAINT ... RENAME
Allow ALTER TABLE to change constraint deferrability and actions
Add missing object types for ALTER ... SET SCHEMA
Add support for multiple pg_restore -t options, like pg_dump
Allow triggers to be disabled in only the current session [without the
necessity of modifying system tables]
Allow single batch hash joins to preserve outer pathkeys [definitely
harder than some of the above]
Fix system views like pg_stat_all_tables to use set-returning
functions, rather than views of per-column functions

Other ideas:

Allow per-tablespace effective_io_concurrency
Add a GIST opclass for inet/cidr that can support an exclusion
constraint for "cidr blocks do not overlap"
ALTER VIEW ... DROP COLUMN (or alternatively/in addition RENAME COLUMN)

(I now wait for the chorus of people telling me that these ideas are
(a) too easy, (b) too hard, or (c) too biased toward my own
priorities. I readily admit to (c) - I tried to list things here
where I have some idea of what would be required to implement the
feature, so it's therefore biased toward the parts of the system with
which I'm familiar, which in turn are the ones I care about. Feel
free to add your own ideas or critique these.)

...Robert

From:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-27 02:19:14
Message-ID:	e7e5fefd1003261919qb7ddd5ch5b4e6f22fa40a69@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 26, 2010 at 12:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Hmm. I'm not sure exactly what problem you're trying to solve here.
> I don't think this is a particularly good design for supporting
> variables inside the server, since, well, it doesn't actually support
> variables inside the server. If we just want a crude hack for
> allowing the appearance of session-local server-side variables, that
> could be implemented entirely in client code - in fact it could be
> done as a thin wrapper around libpq that just does textual
> substitution of the variables actually referenced by a particular
> query. That wouldn't require any modifications to core PostgreSQL at
> all, and it would probably perform better too since you'd not send all
> the unnecessary variables with every query.

One problem with a textual substitution is that implicit variable use
(e.g. selecting from a view) can't be substituted, at least not
trivially. As for "sending unnecessary variables with every query",
my idea was to store those variables in a global table keyed by
context ID, then just send that context ID with every query.

> But, I think that implementing any kind of variable support in the
> backend is way too ambitious a project for a first-time hacker to get
> done in a couple of months. I would guess that's a two-year project
> for a first time hacker or a one-year project for an experienced
> hacker (or a three week project for Tom Lane). Here are some ideas
> from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
> to the right size for GSOC:
> [...]
> Add JSON (JavaScript Object Notation) data type [tricky part will be
> getting community buy-in on which JSON library to use]

The JSON idea caught my eye. I guess the best approach here would be
not to use an external library, but to implement it manually using
flex/bison. Most of the work would probably revolve around converting
things to/from PostgreSQL types, writing test cases, and getting it
integrated; writing the parser itself should be a "piece of cake".

At first, I figured adding JSON support would be almost too trivial:
just parse it, then you're done. After seeing that
src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit
more to it :)

I skimmed through some JSON implementations in C, and I didn't find
any using bison/flex. From the looks of it, I do like JSON_parser (
http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears
to be written for speed.

I think one benefit of adding JSON support is that it would provide a
way to store EAV-type data with less overhead than XML (and no
dependency on an external library). If this were the only goal,
binary encoding would be even better. However, I suppose JSON is more
popular and easier to work with in practice.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal: access control jails (and introduction as aspiring GSoC student)
Date:	2010-03-27 19:30:51
Message-ID:	603c8f071003271230t5fc77d65j523a54d6c9599f23@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 26, 2010 at 10:19 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> One problem with a textual substitution is that implicit variable use
> (e.g. selecting from a view) can't be substituted, at least not
> trivially. As for "sending unnecessary variables with every query",
> my idea was to store those variables in a global table keyed by
> context ID, then just send that context ID with every query.

That could work, though I'm not totally sure I like the design.

>> But, I think that implementing any kind of variable support in the
>> backend is way too ambitious a project for a first-time hacker to get
>> done in a couple of months. I would guess that's a two-year project
>> for a first time hacker or a one-year project for an experienced
>> hacker (or a three week project for Tom Lane). Here are some ideas
>> from http://wiki.postgresql.org/wiki/Todo that I think MIGHT be closer
>> to the right size for GSOC:
>> [...]
>> Add JSON (JavaScript Object Notation) data type [tricky part will be
>> getting community buy-in on which JSON library to use]
>
> The JSON idea caught my eye. I guess the best approach here would be
> not to use an external library, but to implement it manually using
> flex/bison. Most of the work would probably revolve around converting
> things to/from PostgreSQL types, writing test cases, and getting it
> integrated; writing the parser itself should be a "piece of cake".

Well, that's one of the central questions: is that the best approach,
or should we integrate to an external library and if so which one?
That's part of why this is a research problem.

> At first, I figured adding JSON support would be almost too trivial:
> just parse it, then you're done. After seeing that
> src/backend/utils/adt/xml.c is 3497 lines, I learned there's a bit
> more to it :)
>
> I skimmed through some JSON implementations in C, and I didn't find
> any using bison/flex. From the looks of it, I do like JSON_parser (
> http://fara.cs.uni-potsdam.de/~jsg/json_parser/ ) because it appears
> to be written for speed.
>
> I think one benefit of adding JSON support is that it would provide a
> way to store EAV-type data with less overhead than XML (and no
> dependency on an external library). If this were the only goal,
> binary encoding would be even better. However, I suppose JSON is more
> popular and easier to work with in practice.

Yeah. And the row can always be toasted afterwards if appropriate.

...Robert