Re: [PATCH] Fix leaky VIEWs for RLS

Lists: pgsql-hackers
From: Marc Munro <marc(at)bloodnok(dot)com>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [pgsql-hackers] Daily digest v1.10705 (13 messages)
Date: 2010-06-03 17:23:12
Message-ID: 1275585792.2010.28.camel@bloodnok.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(at)postgresql(dot)org
wrote:
> [ . . . ]
>
> In my current idea, when a qual-node that contains FuncExpr tries to
> reference a part of relations within a security view, its referencing
> relations will be expanded to whole of the security view at
> distribute_qual_to_rels().
> [ . . . ]

I may be missing something here but this seems a bit too simplistic and,
I think, fails to deal with an important use case.

Security views, that do anything useful at all, tend to introduce
performance issues. I am concerned that placing a conceptual barrier
between the secured and unsecured parts of queries is going to
unnecessarily restrict what the optimiser can do.

For example consider that we have three secured views, each of the form:

create view s_x as select * from x where i_can_see(x.key);

and consider the query:

select stuff from s_x
inner join s_y on s_y.key = s_x.key
inner join s_z on s_z.key = s_x.key
where fn(s_x.a) = 3;

The optimiser ought to be able to spot the fact that i_can_see() need
only be called once for each joined result. By placing a barrier (if I
understand your proposal correctly) between the outermost joins and the
inner views, doesn't this optimisation become impossible?

I think a simpler solution may be possible here. If you can tag the
function i_can_see() as a security function, at least in the context of
its use in the security views, and then create the rule that security
functions are always considered to be lower cost than user-defined
non-security functions, don't we achieve the result of preventing the
insecure function from seeing rows that it shouldn't?

I guess my concern is that a query may be constructed a=out of secured
and unsecured parts and the optimiser should be free to group all of the
secured parts together before considering the unsecured parts.

Sorry for the imprecise language and terminolgy, and also if I have
completely misunderstood the implications.

Best Wishes

__
Marc (the veil guy)


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: marc(at)bloodnok(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 00:32:03
Message-ID: 4C084983.6080301@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I fixed up the subject.

(2010/06/04 2:23), Marc Munro wrote:
> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(at)postgresql(dot)org
> wrote:
>> [ . . . ]
>>
>> In my current idea, when a qual-node that contains FuncExpr tries to
>> reference a part of relations within a security view, its referencing
>> relations will be expanded to whole of the security view at
>> distribute_qual_to_rels().
>> [ . . . ]
>
> I may be missing something here but this seems a bit too simplistic and,
> I think, fails to deal with an important use case.
>
> Security views, that do anything useful at all, tend to introduce
> performance issues. I am concerned that placing a conceptual barrier
> between the secured and unsecured parts of queries is going to
> unnecessarily restrict what the optimiser can do.
>
> For example consider that we have three secured views, each of the form:
>
> create view s_x as select * from x where i_can_see(x.key);
>
> and consider the query:
>
> select stuff from s_x
> inner join s_y on s_y.key = s_x.key
> inner join s_z on s_z.key = s_x.key
> where fn(s_x.a) = 3;
>
> The optimiser ought to be able to spot the fact that i_can_see() need
> only be called once for each joined result. By placing a barrier (if I
> understand your proposal correctly) between the outermost joins and the
> inner views, doesn't this optimisation become impossible?
>
It seems to me incorrect.

If i_can_see() can filter a part of tuples within table: x, the optimizer
prefers to call i_can_see() on scanning each tables rather than result of
join, because it effectively reduce the size of scan.

In fact, the existing optimizer make the following plan:

postgres=# CREATE FUNCTION i_can_see(int) RETURNS bool IMMUTABLE
AS 'begin return $1 % 1 = 1; end;' LANGUAGE 'plpgsql';
CREATE FUNCTION
postgres=# CREATE VIEW v1 as select * from t1 where i_can_see(a);
CREATE VIEW
postgres=# CREATE VIEW v2 as select * from t2 where i_can_see(x);
CREATE VIEW
postgres=# CREATE VIEW v3 as select * from t3 where i_can_see(s);
CREATE VIEW

postgres=# EXPLAIN SELECT * FROM (v1 JOIN v2 ON v1.a=v2.x) JOIN v3 on v1.a=v3.s;
QUERY PLAN
-------------------------------------------------------------------------------
Nested Loop (cost=0.00..860.47 rows=410 width=108)
-> Nested Loop (cost=0.00..595.13 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: i_can_see(a)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (t2.x = t1.a)
Filter: i_can_see(t2.x)
-> Index Scan using t3_pkey on t3 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (t3.s = t1.a)
Filter: i_can_see(t3.s)
(10 rows)

Sorry, I may miss something your point.

> I think a simpler solution may be possible here. If you can tag the
> function i_can_see() as a security function, at least in the context of
> its use in the security views, and then create the rule that security
> functions are always considered to be lower cost than user-defined
> non-security functions, don't we achieve the result of preventing the
> insecure function from seeing rows that it shouldn't?
>
Sorry, it seems to me you mixed up different issues.

My patch tries to tackle the problem that optimizer distributes outermost
functions into inside of the view when the function takes arguments only
from a side of join, even if security views.
This issue is independent from cost of functions.

On the other hand, when a scan plan has multiple qualifiers to filter
fetched tuples, we have to pay attention on the order to be called.
If outermost functions are called prior to view's policy functions,
it may cause unexpected leaks.

In my opinion, we should consider the nestlevel of the function where
it is originally put. But it is not concluded yet, and the patch does
not tackle anything about this issue.

> I guess my concern is that a query may be constructed a=out of secured
> and unsecured parts and the optimiser should be free to group all of the
> secured parts together before considering the unsecured parts.
>
Yes, it is an opinion, but it may cause unnecessary performance regression
when user given condition is obviously harmless.

E.g, If table: t has primary key t.a, and a security view is defined as:

CREATE VIEW v AS SELECT * FROM t WHERE f_policy(t.b);

When user gives the following query:

SELECT * FROM v WHERE a = 100;

If we strictly separate secured and unsecure part prior to optimizer,
it will break a chance to scan the table: t with index.

I also think security is not free, so it may take a certain level of
performance degradation. But it should not bring down more than necessity.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: marc(at)bloodnok(dot)com
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [pgsql-hackers] Daily digest v1.10705 (13 messages)
Date: 2010-06-04 02:55:40
Message-ID: AANLkTilU1rwx-Co1LCWQ2murDUVaLVUCXtPEh8dtk3L8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 3, 2010 at 1:23 PM, Marc Munro <marc(at)bloodnok(dot)com> wrote:
> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(at)postgresql(dot)org
> wrote:
>> [ . . . ]
>>
>> In my current idea, when a qual-node that contains FuncExpr tries to
>> reference a part of relations within a security view, its referencing
>> relations will be expanded to whole of the security view at
>> distribute_qual_to_rels().
>> [ . . . ]
>
> I may be missing something here but this seems a bit too simplistic and,
> I think, fails to deal with an important use case.

If anything, you're putting it mildly. This is quite a bit too
simplistic and fails to deal with several important issues, at least
some of which have already been mentioned on this thread.

> The optimiser ought to be able to spot the fact that i_can_see() need
> only be called once for each joined result.  By placing a barrier (if I
> understand your proposal correctly) between the outermost joins and the
> inner views, doesn't this optimisation become impossible?
>
> I think a simpler solution may be possible here.  If you can tag the
> function i_can_see() as a security function, at least in the context of
> its use in the security views, and then create the rule that security
> functions are always considered to be lower cost than user-defined
> non-security functions, don't we achieve the result of preventing the
> insecure function from seeing rows that it shouldn't?

So, yes and no. You DO need a security barrier between the view and
the rest of the query, but if a function can be trusted not to do evil
things, then it should be allowed to be pushed down. What we need to
prevent is the pushdown of untrusted functions (or operators). A
(very) important part of this problem is determining which quals are
safe to push down.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 04:17:38
Message-ID: 4C087E62.1070500@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/04 11:55), Robert Haas wrote:
> On Thu, Jun 3, 2010 at 1:23 PM, Marc Munro<marc(at)bloodnok(dot)com> wrote:
>> On Thu, 2010-06-03 at 05:53 -0300, pgsql-hackers-owner(at)postgresql(dot)org
>> wrote:
>>> [ . . . ]
>>>
>>> In my current idea, when a qual-node that contains FuncExpr tries to
>>> reference a part of relations within a security view, its referencing
>>> relations will be expanded to whole of the security view at
>>> distribute_qual_to_rels().
>>> [ . . . ]
>>
>> I may be missing something here but this seems a bit too simplistic and,
>> I think, fails to deal with an important use case.
>
> If anything, you're putting it mildly. This is quite a bit too
> simplistic and fails to deal with several important issues, at least
> some of which have already been mentioned on this thread.
>
Hmm. Was it too simplified even if just a proof of concept?

>> The optimiser ought to be able to spot the fact that i_can_see() need
>> only be called once for each joined result. By placing a barrier (if I
>> understand your proposal correctly) between the outermost joins and the
>> inner views, doesn't this optimisation become impossible?
>>
>> I think a simpler solution may be possible here. If you can tag the
>> function i_can_see() as a security function, at least in the context of
>> its use in the security views, and then create the rule that security
>> functions are always considered to be lower cost than user-defined
>> non-security functions, don't we achieve the result of preventing the
>> insecure function from seeing rows that it shouldn't?
>
> So, yes and no. You DO need a security barrier between the view and
> the rest of the query, but if a function can be trusted not to do evil
> things, then it should be allowed to be pushed down. What we need to
> prevent is the pushdown of untrusted functions (or operators). A
> (very) important part of this problem is determining which quals are
> safe to push down.
>
At least, I don't have an idea to distinguish trusted functions from
others without any additional hints, because we support variable kind
of PL languages. :(

An idea is to add TRUSTED (or other) keyword for CREATE FUNCTION to mark
this function is harmless to execute, but only DBA can define functions
with the option.
(Perhaps, most of built-in functions should be marked as trusted?)

If we should identify a function as either trusted or untrusted, is
there any other ideas?

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 04:57:13
Message-ID: 25135.1275627433@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> (2010/06/04 11:55), Robert Haas wrote:
>> A (very) important part of this problem is determining which quals are
>> safe to push down.
>>
> At least, I don't have an idea to distinguish trusted functions from
> others without any additional hints, because we support variable kind
> of PL languages. :(

The proposal some time back in this thread was to trust all built-in
functions and no others. That's a bit simplistic, no doubt, but it
seems to me to largely solve the performance problem and to do so with
minimal effort. When and if you get to a solution that's committable
with respect to everything else, it might be time to think about
more flexible answers to that particular point.

regards, tom lane


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 05:24:54
Message-ID: 4C088E26.2010201@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/04 13:57), Tom Lane wrote:
> KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> (2010/06/04 11:55), Robert Haas wrote:
>>> A (very) important part of this problem is determining which quals are
>>> safe to push down.
>>>
>> At least, I don't have an idea to distinguish trusted functions from
>> others without any additional hints, because we support variable kind
>> of PL languages. :(
>
> The proposal some time back in this thread was to trust all built-in
> functions and no others. That's a bit simplistic, no doubt, but it
> seems to me to largely solve the performance problem and to do so with
> minimal effort. When and if you get to a solution that's committable
> with respect to everything else, it might be time to think about
> more flexible answers to that particular point.
>
Although I've not check all the functions within pg_proc.h, it seems to
me reasonable assumptions, except for a small number of exception which
have side-effects, such as lo_write().

Maybe, we have to shut out a small number of exceptions.
However, at least, it seems to me reasonable assumption in this stage.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 09:26:10
Message-ID: m28w6v87yl.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> The proposal some time back in this thread was to trust all built-in
> functions and no others. That's a bit simplistic, no doubt, but it
> seems to me to largely solve the performance problem and to do so with
> minimal effort. When and if you get to a solution that's committable
> with respect to everything else, it might be time to think about
> more flexible answers to that particular point.

What about trusting all "internal" and "C" language function instead? My
understanding is that "internal" covers built-in functions, and as you
need to be a superuser to CREATE a "C" language function, surely you're
able to accept that by doing so you get to trust it?

How useful would that be?
--
dim


From: KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 10:38:12
Message-ID: 4C08D794.4080903@kaigai.gr.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/04 18:26), Dimitri Fontaine wrote:
> Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> The proposal some time back in this thread was to trust all built-in
>> functions and no others. That's a bit simplistic, no doubt, but it
>> seems to me to largely solve the performance problem and to do so with
>> minimal effort. When and if you get to a solution that's committable
>> with respect to everything else, it might be time to think about
>> more flexible answers to that particular point.
>
> What about trusting all "internal" and "C" language function instead? My
> understanding is that "internal" covers built-in functions, and as you
> need to be a superuser to CREATE a "C" language function, surely you're
> able to accept that by doing so you get to trust it?
>
> How useful would that be?

If we trust all the "C" language functions, it also means DBA can never
install any binary functions having side-effect (e.g, pg_file_write() in
the contrib/adminpack ) without security risks.

If we need an intelligence to identify what functions are trusted and
what ones are untrusted, it will eventually need a hint to mark a certain
function as trusted, won't it?

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 12:26:28
Message-ID: 4C08F0F4.90104@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/06/10 07:57, Tom Lane wrote:
> KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> (2010/06/04 11:55), Robert Haas wrote:
>>> A (very) important part of this problem is determining which quals are
>>> safe to push down.
>>>
>> At least, I don't have an idea to distinguish trusted functions from
>> others without any additional hints, because we support variable kind
>> of PL languages. :(
>
> The proposal some time back in this thread was to trust all built-in
> functions and no others.

I thought I debunked that idea already
(http://archives.postgresql.org/pgsql-hackers/2009-10/msg01428.php). Not
all built-in functions are safe. Consider casting integer to text, for
example. Seems innocent at first glance, but it's not; if the input is
not a valid integer, it throws an error which contains the input string,
revealing it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 14:33:57
Message-ID: 2629.1275662037@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 04/06/10 07:57, Tom Lane wrote:
>> The proposal some time back in this thread was to trust all built-in
>> functions and no others.

> I thought I debunked that idea already
> (http://archives.postgresql.org/pgsql-hackers/2009-10/msg01428.php). Not
> all built-in functions are safe. Consider casting integer to text, for
> example. Seems innocent at first glance, but it's not; if the input is
> not a valid integer, it throws an error which contains the input string,
> revealing it.

Hmm ... that's a mighty interesting example, because it shows that any
well-meaning change in error handling might render seemingly-unrelated
functions "unsafe". And we're certainly not going to make error
messages stop showing relevant information just because of this.

Maybe the entire idea is unworkable. I certainly don't find any comfort
in your proposal in the above-referenced message to trust index
operators; where is it written that those don't throw errors?

regards, tom lane


From: Marc Munro <marc(at)bloodnok(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 16:49:22
Message-ID: 1275670162.26353.19.camel@bloodnok.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-06-04 at 10:33 -0400, Tom Lane wrote:
> Hmm ... that's a mighty interesting example, because it shows that any
> well-meaning change in error handling might render seemingly-unrelated
> functions "unsafe". And we're certainly not going to make error
> messages stop showing relevant information just because of this.

Although that looks like a show-stopper, I think it can be worked
around. Errors in operations on security views could simply be caught
and conditionally rewritten. The original error could still appear in
the logs but the full details need not be reported to unprivileged
users.

If that can be done, then we would still need to be able to identify
trusted functions and views used for security purposes, and ensure that
(please excuse my terminology if it is incorrect) untrusted quals do not
get pushed down inside secured views. If all of that can be done along
with the error trapping, then we may have a solution.

My big concern is still about performance, particularly when joining
between multiple security views and other objects. I don't expect to
get security for free but I don't want to see unnecessary barriers to
optimisation.

__
Marc


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 17:46:16
Message-ID: 4C093BE8.2040200@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/06/10 17:33, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 04/06/10 07:57, Tom Lane wrote:
>>> The proposal some time back in this thread was to trust all built-in
>>> functions and no others.
>
>> I thought I debunked that idea already
>> (http://archives.postgresql.org/pgsql-hackers/2009-10/msg01428.php). Not
>> all built-in functions are safe. Consider casting integer to text, for
>> example.

(I meant "text to integer", of course)

> Maybe the entire idea is unworkable. I certainly don't find any comfort
> in your proposal in the above-referenced message to trust index
> operators; where is it written that those don't throw errors?

Let's consider b-tree operators for an index on the secure table, for
starters. Surely a b-tree index comparison operator can't throw an error
on any value that's in the table already, you would've gotten an error
trying to insert that.

Now, is it safe to expand that thinking to b-tree operators in general,
even if there's no such index on the table? I'm not sure. But indexable
operations are what we care about the most; the order of executing those
determines if you can use an index scan or not.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 18:00:19
Message-ID: AANLkTimfhs6q5DCBP1IogCQdHX7mwoHrzLmT_u9So4JF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 4, 2010 at 1:46 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 04/06/10 17:33, Tom Lane wrote:
>>
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com>  writes:
>>>
>>> On 04/06/10 07:57, Tom Lane wrote:
>>>>
>>>> The proposal some time back in this thread was to trust all built-in
>>>> functions and no others.
>>
>>> I thought I debunked that idea already
>>> (http://archives.postgresql.org/pgsql-hackers/2009-10/msg01428.php). Not
>>> all built-in functions are safe. Consider casting integer to text, for
>>> example.
>
> (I meant "text to integer", of course)
>
>> Maybe the entire idea is unworkable.  I certainly don't find any comfort
>> in your proposal in the above-referenced message to trust index
>> operators; where is it written that those don't throw errors?
>
> Let's consider b-tree operators for an index on the secure table, for
> starters. Surely a b-tree index comparison operator can't throw an error on
> any value that's in the table already, you would've gotten an error trying
> to insert that.
>
> Now, is it safe to expand that thinking to b-tree operators in general, even
> if there's no such index on the table? I'm not sure. But indexable
> operations are what we care about the most; the order of executing those
> determines if you can use an index scan or not.

Another idea I had was... would it be safe to trust functions defined
by the same user who owns the view? If he's granted access to the
view and the function to some other user, presumably he doesn't mind
them being used together? Or is that too optimistic?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 19:33:29
Message-ID: 22280.1275680009@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 04/06/10 17:33, Tom Lane wrote:
>> Maybe the entire idea is unworkable. I certainly don't find any comfort
>> in your proposal in the above-referenced message to trust index
>> operators; where is it written that those don't throw errors?

> Let's consider b-tree operators for an index on the secure table, for
> starters. Surely a b-tree index comparison operator can't throw an error
> on any value that's in the table already, you would've gotten an error
> trying to insert that.

Man, are *you* trusting.

A counterexample: suppose we had a form of type "text" that carried a
collation specifier internally, and the comparison routine threw an
error if asked to compare values with incompatible specifiers. An index
built on a column of all the same collation would work fine. A query
that tried to compare against a constant of a different collation would
throw an error.

> I'm not sure. But indexable
> operations are what we care about the most; the order of executing those
> determines if you can use an index scan or not.

Personally, I care just as much about hash and merge join operators...

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 19:58:25
Message-ID: 4C095AE1.7000309@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/06/10 22:33, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 04/06/10 17:33, Tom Lane wrote:
>>> Maybe the entire idea is unworkable. I certainly don't find any comfort
>>> in your proposal in the above-referenced message to trust index
>>> operators; where is it written that those don't throw errors?
>
>> Let's consider b-tree operators for an index on the secure table, for
>> starters. Surely a b-tree index comparison operator can't throw an error
>> on any value that's in the table already, you would've gotten an error
>> trying to insert that.
>
> Man, are *you* trusting.
>
> A counterexample: suppose we had a form of type "text" that carried a
> collation specifier internally, and the comparison routine threw an
> error if asked to compare values with incompatible specifiers. An index
> built on a column of all the same collation would work fine. A query
> that tried to compare against a constant of a different collation would
> throw an error.

I can't take that example seriously. First of all, tacking a collation
specifier to text values would be an awful hack. Secondly, it would be a
bad idea to define the b-tree comparison operators to throw an error; it
would be a lot more useful to impose an arbitrary order on the
collations, so that all values with collation A are considered smaller
than values with collation B. We do that for types like box; smaller or
greater than don't make much sense for boxes, but we implement them in a
pretty arbitrary way anyway to make it possible to build a b-tree index
on them, and for the planner to use merge joins on them, and implement
DISTINCT using sort etc.

>> I'm not sure. But indexable
>> operations are what we care about the most; the order of executing those
>> determines if you can use an index scan or not.
>
> Personally, I care just as much about hash and merge join operators...

Hash seems safe too. Don't merge joins just use the default b-tree operator?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-04 20:12:19
Message-ID: 22765.1275682339@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 04/06/10 22:33, Tom Lane wrote:
>> A counterexample: suppose we had a form of type "text" that carried a
>> collation specifier internally, and the comparison routine threw an
>> error if asked to compare values with incompatible specifiers. An index
>> built on a column of all the same collation would work fine. A query
>> that tried to compare against a constant of a different collation would
>> throw an error.

> I can't take that example seriously. First of all, tacking a collation
> specifier to text values would be an awful hack.

Really? I thought that was under serious discussion. But whether it
applies to text or not is insignificant; I believe there are cases just
like this in existence today for some datatypes (think postgis).

The real point is that the comparison constant is under the control of
the attacker, and it's not part of the index. Therefore "it didn't
throw an error during index construction" proves nothing whatever.

> ... Secondly, it would be a
> bad idea to define the b-tree comparison operators to throw an error;

You're still being far too trusting, by imagining that only *designed*
error conditions matter here. Think about overflows, out-of-memory,
(perhaps intentionally) corrupted data, etc etc.

I think the only real fix would be something like what Marc suggested:
if there's a security view involved in the query, we simply don't give
the client the real error message. Of course, now our "security
feature" is utterly disastrous on usability as well as performance
counts ...

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 01:38:00
Message-ID: AANLkTin2NTnDjPQWKCpWiOPJhM3xLg-J7lKDSoOYntOj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jun 4, 2010 at 4:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 04/06/10 22:33, Tom Lane wrote:
>>> A counterexample: suppose we had a form of type "text" that carried a
>>> collation specifier internally, and the comparison routine threw an
>>> error if asked to compare values with incompatible specifiers.  An index
>>> built on a column of all the same collation would work fine.  A query
>>> that tried to compare against a constant of a different collation would
>>> throw an error.
>
>> I can't take that example seriously. First of all, tacking a collation
>> specifier to text values would be an awful hack.
>
> Really?  I thought that was under serious discussion.  But whether it
> applies to text or not is insignificant; I believe there are cases just
> like this in existence today for some datatypes (think postgis).
>
> The real point is that the comparison constant is under the control of
> the attacker, and it's not part of the index.  Therefore "it didn't
> throw an error during index construction" proves nothing whatever.
>
>> ... Secondly, it would be a
>> bad idea to define the b-tree comparison operators to throw an error;
>
> You're still being far too trusting, by imagining that only *designed*
> error conditions matter here.  Think about overflows, out-of-memory,
> (perhaps intentionally) corrupted data, etc etc.

btree comparison operators should handle overflow and corrupted data
without blowing up. Maybe out-of-memory is worth worrying about, but
I think that is a mighty thin excuse for abandoning this feature
altogether. You'd have to contrive a situation where the system was
just about out of memory and something about the value being compared
against resulted in the comparison blowing up or not. I think that's
likely to be rather hard in practice, and in any case it's a covert
channel attack, which I think everyone already agrees is beyond the
scope of what we can protect against. You can probably learn more
information more quickly about the unseen data by fidding with
EXPLAIN, analyzing query execution times, etc. As long as we are
preventing the actual contents of the unseen tuples from being
revealed, I feel reasonably good about it.

> I think the only real fix would be something like what Marc suggested:
> if there's a security view involved in the query, we simply don't give
> the client the real error message.  Of course, now our "security
> feature" is utterly disastrous on usability as well as performance
> counts ...

Not pushing anything into the view is an equally real fix, although
I'll be pretty depressed if that's where we end up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 03:06:47
Message-ID: 20100607030647.GX21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> Another idea I had was... would it be safe to trust functions defined
> by the same user who owns the view? If he's granted access to the
> view and the function to some other user, presumably he doesn't mind
> them being used together? Or is that too optimistic?

This was more-or-less what I'd been kind of kicking around in my head.
Forget about functions that are defined in the view itself. Any other
functions, etc, which are attached to the view by the calling user would
be suspect, etc. Perhaps with the exception of some built-ins that
we've marked as "safe" in some way.

My first thought was to track the "run this as X" information on every
RTE (more-or-less, relations, function calls, etc) and then at least be
able to, hopefully, *detect* situations that might be a problem- eg:
running a function which has "run as Q" against a relation that was
accessed as "run as R" when a filter "run as R" happens later. This is
all far too hand-wavey, I'm sure, but at least if we could detect it
then we might be able to find a way to deal with it.

Also, perhaps I'm not being paranoid enough, but all this concern over
error cases really doesn't really worry me that much. The amount of
data one could acquire that way is pretty limited. It'd be great if we
could deal with that case too, but maybe we could worry about the bigger
issue (at least, as I see it) first.

Just my 2c.

Thanks,

Stephen


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 04:59:02
Message-ID: 4C0C7C96.10109@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/07 10:38), Robert Haas wrote:
> On Fri, Jun 4, 2010 at 4:12 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> On 04/06/10 22:33, Tom Lane wrote:
>>>> A counterexample: suppose we had a form of type "text" that carried a
>>>> collation specifier internally, and the comparison routine threw an
>>>> error if asked to compare values with incompatible specifiers. An index
>>>> built on a column of all the same collation would work fine. A query
>>>> that tried to compare against a constant of a different collation would
>>>> throw an error.
>>
>>> I can't take that example seriously. First of all, tacking a collation
>>> specifier to text values would be an awful hack.
>>
>> Really? I thought that was under serious discussion. But whether it
>> applies to text or not is insignificant; I believe there are cases just
>> like this in existence today for some datatypes (think postgis).
>>
>> The real point is that the comparison constant is under the control of
>> the attacker, and it's not part of the index. Therefore "it didn't
>> throw an error during index construction" proves nothing whatever.
>>
>>> ... Secondly, it would be a
>>> bad idea to define the b-tree comparison operators to throw an error;
>>
>> You're still being far too trusting, by imagining that only *designed*
>> error conditions matter here. Think about overflows, out-of-memory,
>> (perhaps intentionally) corrupted data, etc etc.
>
> btree comparison operators should handle overflow and corrupted data
> without blowing up. Maybe out-of-memory is worth worrying about, but
> I think that is a mighty thin excuse for abandoning this feature
> altogether. You'd have to contrive a situation where the system was
> just about out of memory and something about the value being compared
> against resulted in the comparison blowing up or not. I think that's
> likely to be rather hard in practice, and in any case it's a covert
> channel attack, which I think everyone already agrees is beyond the
> scope of what we can protect against. You can probably learn more
> information more quickly about the unseen data by fidding with
> EXPLAIN, analyzing query execution times, etc. As long as we are
> preventing the actual contents of the unseen tuples from being
> revealed, I feel reasonably good about it.
>
It seems to me a good point. In general, any software have
possibility to leak information via cover-channels, and it is
nearly impossible to fix all the channels.
However, from a security perspective, covert channels with low
bandwidths represent a lower threat than those with high bandwidths.
So, evaluation criteria stands on the viewpoint that it does not
necessary to eliminate all the covert channels more than necessity.

Even if we can estimate the contents of invisible tuples from error
messages or EXPLAIN output, its bandwidths are quite limited.

But, lowrite() might write out the given argument somewhere, if it
is invoked prior to security policy functions.
In this case, the contents of invisible tuples will be moved to
another large object unexpectedly with unignorable speed.
Such a direct data flow channel should be fixed at first, because of
its large bandwidth.

>> I think the only real fix would be something like what Marc suggested:
>> if there's a security view involved in the query, we simply don't give
>> the client the real error message. Of course, now our "security
>> feature" is utterly disastrous on usability as well as performance
>> counts ...
>
> Not pushing anything into the view is an equally real fix, although
> I'll be pretty depressed if that's where we end up.
>
I could not find out any certified commercial RDBMS with functionality
to tackle covert channel. It includes Oracle's row-level security stuff.
So, I don't think it is our goal to prevent anything into views.

An idea is that we focus on the direct data flow which allows to move
information to be invisible into another tables, files and so on.
In this stage, I don't think the priority of error messages are high.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 05:29:21
Message-ID: 4C0C83B1.5060902@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/07 12:06), Stephen Frost wrote:
> * Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
>> Another idea I had was... would it be safe to trust functions defined
>> by the same user who owns the view? If he's granted access to the
>> view and the function to some other user, presumably he doesn't mind
>> them being used together? Or is that too optimistic?
>
> This was more-or-less what I'd been kind of kicking around in my head.
> Forget about functions that are defined in the view itself. Any other
> functions, etc, which are attached to the view by the calling user would
> be suspect, etc. Perhaps with the exception of some built-ins that
> we've marked as "safe" in some way.
>
> My first thought was to track the "run this as X" information on every
> RTE (more-or-less, relations, function calls, etc) and then at least be
> able to, hopefully, *detect* situations that might be a problem- eg:
> running a function which has "run as Q" against a relation that was
> accessed as "run as R" when a filter "run as R" happens later. This is
> all far too hand-wavey, I'm sure, but at least if we could detect it
> then we might be able to find a way to deal with it.
>
It is a similar idea to what I tried to implement at the conceptual
patch. It checks privileges of calling user on the underlaying tables
to be referenced using views. If violated, it prevents to push down
the user given FuncExpr into join loops of views. Otherwise, it does
not prevent anything, because the user originally has privileges to
reference them.
Are you suggesting the idea (with adjustments such as "safe" functions)?

> Also, perhaps I'm not being paranoid enough, but all this concern over
> error cases really doesn't really worry me that much. The amount of
> data one could acquire that way is pretty limited. It'd be great if we
> could deal with that case too, but maybe we could worry about the bigger
> issue (at least, as I see it) first.
>
As I also mentioned at previous message. It seems to me a good
point to focus on bandwidth of the covert channel.

The error message indeed can deliver information to be invisible,
but I don't think its priority is higher than functions that can
be abused to direct data flow channel, such as lowrite().

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 06:48:59
Message-ID: 4C0C965B.2040902@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/06/10 06:06, Stephen Frost wrote:
> Also, perhaps I'm not being paranoid enough, but all this concern over
> error cases really doesn't really worry me that much. The amount of
> data one could acquire that way is pretty limited.

It's not limited. It allows you to read all contents of the underlying
table or tables. I don't see much point doing anything at all if we
don't plug that.

There's many side channels like exposing row counts in EXPLAIN and
statistics and timing attacks, that are not as critical, because they
don't let expose all data, and the attacker can't accurately choose what
data is exposed. Those are not as important.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 07:30:30
Message-ID: 4C0CA016.5070204@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/07 15:48), Heikki Linnakangas wrote:
> On 07/06/10 06:06, Stephen Frost wrote:
>> Also, perhaps I'm not being paranoid enough, but all this concern over
>> error cases really doesn't really worry me that much. The amount of
>> data one could acquire that way is pretty limited.
>
> It's not limited. It allows you to read all contents of the underlying
> table or tables. I don't see much point doing anything at all if we
> don't plug that.
>
IIRC, Oracle pays attention not to expose function's arguments via
error messages due to the security reasons, although it focuses on
that does not provide a hint to attacker who tries SQL injection.

It seems to me it is a matter of degree.

If we allows to execute lowrite() inside of the join loop, it can
be abused to move massive invisible information into visible large
objects within a little time. So, its priority is relatively higher
than error messages.

However, we cannot move massive information via error messages in
a little time, although it may expose whole of the arguments.
So, its threat is relatively smaller.

> There's many side channels like exposing row counts in EXPLAIN and
> statistics and timing attacks, that are not as critical, because they
> don't let expose all data, and the attacker can't accurately choose what
> data is exposed. Those are not as important.
>
It also means; because they can provide much smaller bandwidth to leak
invisible information than error messages, these are not as important.
Is it right?

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 09:07:53
Message-ID: 4C0CB6E9.4090903@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/06/10 10:30, KaiGai Kohei wrote:
> (2010/06/07 15:48), Heikki Linnakangas wrote:
>> There's many side channels like exposing row counts in EXPLAIN and
>> statistics and timing attacks, that are not as critical, because they
>> don't let expose all data, and the attacker can't accurately choose what
>> data is exposed. Those are not as important.
>>
> It also means; because they can provide much smaller bandwidth to leak
> invisible information than error messages, these are not as important.
> Is it right?

The big difference is what information can be obtained, not how fast it
can be obtained.

Imagine a table that holds username/passwords for users. Each user is
allowed to see his own row, including password, but not anyone else's.
EXPLAIN side-channel might give pretty accurate information of how many
rows there is in the table, and via clever EXPLAIN+statistics probing
you might be able to find out what the top-10 passwords are, for
example. But if you wanted to know what your neighbor's password is, the
side-channels would not help you much, but an error message would reveal
it easily.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 11:06:35
Message-ID: 20100607110635.GB21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki,

* Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
> The big difference is what information can be obtained, not how fast it
> can be obtained.

Actually, I disagree. Time required to acquire the data does matter.

> Imagine a table that holds username/passwords for users. Each user is
> allowed to see his own row, including password, but not anyone else's.
> EXPLAIN side-channel might give pretty accurate information of how many
> rows there is in the table, and via clever EXPLAIN+statistics probing
> you might be able to find out what the top-10 passwords are, for
> example. But if you wanted to know what your neighbor's password is, the
> side-channels would not help you much, but an error message would reveal
> it easily.

Using only built-ins, could you elaborate on how one could pick exactly
what row was revealed using an error case? That strikes me as
difficult, but perhaps I'm not thinking creatively enough.

Thanks,

Stephen


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 11:53:47
Message-ID: 4C0CDDCB.5050602@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07/06/10 14:06, Stephen Frost wrote:
> * Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
>> The big difference is what information can be obtained, not how fast it
>> can be obtained.
>
> Actually, I disagree. Time required to acquire the data does matter.

Depends on the magnitude, of course. If it takes 1 year per row, that's
probably acceptable. If it takes 1 second, that's extremely slow
compared to normal queries, but most likely still disastreous from a
security point of view.

>> Imagine a table that holds username/passwords for users. Each user is
>> allowed to see his own row, including password, but not anyone else's.
>> EXPLAIN side-channel might give pretty accurate information of how many
>> rows there is in the table, and via clever EXPLAIN+statistics probing
>> you might be able to find out what the top-10 passwords are, for
>> example. But if you wanted to know what your neighbor's password is, the
>> side-channels would not help you much, but an error message would reveal
>> it easily.
>
> Using only built-ins, could you elaborate on how one could pick exactly
> what row was revealed using an error case? That strikes me as
> difficult, but perhaps I'm not thinking creatively enough.

WHERE should do it:

SELECT * FROM secrets_view WHERE username = 'neighbor' AND
password::integer = 1234;
ERROR: invalid input syntax for integer: "neighborssecretpassword"

Assuming that username = 'neighbor' is evaluated before the cast.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-07 12:56:21
Message-ID: 20100607125621.GC21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
> WHERE should do it:
>
> SELECT * FROM secrets_view WHERE username = 'neighbor' AND
> password::integer = 1234;
> ERROR: invalid input syntax for integer: "neighborssecretpassword"
>
> Assuming that username = 'neighbor' is evaluated before the cast.

Fair enough, so we can't allow built-ins either, except perhaps in very
specific/limited situations. Still, if we track that the above WHERE
and password::integer calls *should* be run as "role X", while the view
should run as "role Y", maybe we can at least identify the case where
we've ended up in a situation where we are going to expose unintended
data. I don't know enough about the optimizer or the planner to have
any clue how we might teach them to actually avoid doing such, though I
certainly believe it could end up being a disaster on performance based
on comments from others who know better. :)

Thanks,

Stephen


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 00:10:30
Message-ID: 4C0D8A76.4000206@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/07 20:53), Heikki Linnakangas wrote:
> On 07/06/10 14:06, Stephen Frost wrote:
>> * Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
>>> The big difference is what information can be obtained, not how fast it
>>> can be obtained.
>>
>> Actually, I disagree. Time required to acquire the data does matter.
>
> Depends on the magnitude, of course. If it takes 1 year per row, that's
> probably acceptable. If it takes 1 second, that's extremely slow
> compared to normal queries, but most likely still disastreous from a
> security point of view.
>
FYI, the classic also mentioned about bandwidth of covert channel,
although it was already obsoleted.

See the page.80 of:
http://csrc.nist.gov/publications/history/dod85.pdf

It said 1bit/sec are acceptable on DoD in 25 years ago.

>>> Imagine a table that holds username/passwords for users. Each user is
>>> allowed to see his own row, including password, but not anyone else's.
>>> EXPLAIN side-channel might give pretty accurate information of how many
>>> rows there is in the table, and via clever EXPLAIN+statistics probing
>>> you might be able to find out what the top-10 passwords are, for
>>> example. But if you wanted to know what your neighbor's password is, the
>>> side-channels would not help you much, but an error message would reveal
>>> it easily.
>>
>> Using only built-ins, could you elaborate on how one could pick exactly
>> what row was revealed using an error case? That strikes me as
>> difficult, but perhaps I'm not thinking creatively enough.
>
> WHERE should do it:
>
> SELECT * FROM secrets_view WHERE username = 'neighbor' AND
> password::integer = 1234;
> ERROR: invalid input syntax for integer: "neighborssecretpassword"
>
> Assuming that username = 'neighbor' is evaluated before the cast.
>

In this case, is it unnecessary to expose the given argument in
the error message (from security perspective), isn't it?
Because it is basically matter of the integer input handler,
it seems to me what we should fix up is int4in(), not optimizer.

Perhaps, we should categorize the issued functionalities base on
the level of its threat when abused.

* High-threat
Functions have side-effect that allows to move the given arguments
into another tables or other high-bandwidth chennel.
E.g) lowrite(), pg_write_file()

-> It should be fixed soon.

* Middle-threat
Functions have side-effect that allows to move the given arguments
using error messages or other low-bandwidth channel.
E.g) int4in()

-> It should be fixed in long term.

* Row-threat
Functions can imply existence of invisible tuples, but it does not
expose the value itself.
E.g) EXPLAIN statement, PK/FK constraints

-> It should not be fixed in PostgreSQL.

Now we allow all of them.
But isn't it valuable to fix the high-threat first?
Then, we can revise error messages in built-in functions, I think.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 00:23:34
Message-ID: 4C0D8D86.4080702@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/07 21:56), Stephen Frost wrote:
> * Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
>> WHERE should do it:
>>
>> SELECT * FROM secrets_view WHERE username = 'neighbor' AND
>> password::integer = 1234;
>> ERROR: invalid input syntax for integer: "neighborssecretpassword"
>>
>> Assuming that username = 'neighbor' is evaluated before the cast.
>
> Fair enough, so we can't allow built-ins either, except perhaps in very
> specific/limited situations. Still, if we track that the above WHERE
> and password::integer calls *should* be run as "role X", while the view
> should run as "role Y", maybe we can at least identify the case where
> we've ended up in a situation where we are going to expose unintended
> data. I don't know enough about the optimizer or the planner to have
> any clue how we might teach them to actually avoid doing such, though I
> certainly believe it could end up being a disaster on performance based
> on comments from others who know better. :)
>

My opinion is that it is a matter in individual functions, not optimizer.
Basically, built-in functions *should* be trusted, because our security
mechanism is not designed to prevent anything from malicious internal
binary modules.

Historically, we have not known the risk to leak invisible information
using error messages for a long time, so most of internal functions have
not been designed not to return users unnecessary information.
If so, it needs to revise error messages, but it will not complete with
a single commit.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 00:46:27
Message-ID: 4C0D92E3.70301@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 9:23), KaiGai Kohei wrote:
> (2010/06/07 21:56), Stephen Frost wrote:
>> * Heikki Linnakangas (heikki(dot)linnakangas(at)enterprisedb(dot)com) wrote:
>>> WHERE should do it:
>>>
>>> SELECT * FROM secrets_view WHERE username = 'neighbor' AND
>>> password::integer = 1234;
>>> ERROR: invalid input syntax for integer: "neighborssecretpassword"
>>>
>>> Assuming that username = 'neighbor' is evaluated before the cast.
>>
>> Fair enough, so we can't allow built-ins either, except perhaps in very
>> specific/limited situations. Still, if we track that the above WHERE
>> and password::integer calls *should* be run as "role X", while the view
>> should run as "role Y", maybe we can at least identify the case where
>> we've ended up in a situation where we are going to expose unintended
>> data. I don't know enough about the optimizer or the planner to have
>> any clue how we might teach them to actually avoid doing such, though I
>> certainly believe it could end up being a disaster on performance based
>> on comments from others who know better. :)
>>
>
> My opinion is that it is a matter in individual functions, not optimizer.
> Basically, built-in functions *should* be trusted, because our security
> mechanism is not designed to prevent anything from malicious internal
> binary modules.
>
Sorry, it does not mean *all* the built-in functions could be trusted.
Some of built-in ones cannot be "trusted" from definitions, such as
lowrite().

Perhaps, it eventually needs a flag in the pg_proc to mark a function
being either trusted or untrusted. Then, planner may be able to check
the flag to decide whether is can be pushed down, or not.

If so, we can mark int4in() as trusted, when we revise the issue of
error message. I think the idea makes this problem more simple.

Thanks,

> Historically, we have not known the risk to leak invisible information
> using error messages for a long time, so most of internal functions have
> not been designed not to return users unnecessary information.
> If so, it needs to revise error messages, but it will not complete with
> a single commit.
>
> Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 00:46:38
Message-ID: 24663.1275957998@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
> In this case, is it unnecessary to expose the given argument in
> the error message (from security perspective), isn't it?

Yes, if all you care about is security and not usability, that looks
like a great solution. We're *not* doing it.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 01:02:48
Message-ID: AANLkTikgjzjhLcDp-m1GBGUId6y_46xfpbaJq0pfU0Nm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 8, 2010 at 1:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> In this case, is it unnecessary to expose the given argument in
>> the error message (from security perspective), isn't it?
>
> Yes, if all you care about is security and not usability, that looks
> like a great solution.  We're *not* doing it.

It's possible to take a more nuanced angle on this approach. You could
imagine a flag indicating whether the call is within a SECURE VIEW
which if enabled caused error messages to elide data taken from
arguments. In order for this not to destroy the code legibility I
think you would need to design some new %-escapes for error messages
to indicate such arguments or some similar trick. You wouldn't want to
have to put extra conditionals everywhere. And I'm not sure where to
conveniently put such a flag so it would have the right semantics.

It would still be a lot of work and a big patch, but mostly mechanical
and it wouldn't impact usability for any case where it wasn't
necessary. It would still have the problem described earlier that we
would keep finding new omissions for years. I can't see us
implementing variable taint tracking in C to do it automatically
though. Perhaps we could get it from one of the static analysis tools.

--
greg


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 01:09:53
Message-ID: 4C0D9861.9050201@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 9:46), Tom Lane wrote:
> KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> In this case, is it unnecessary to expose the given argument in
>> the error message (from security perspective), isn't it?
>
> Yes, if all you care about is security and not usability, that looks
> like a great solution. We're *not* doing it.
>
Sorry, are you saying we should not revise error messages because
of usability??

If so, and if we decide the middle-threat also should be fixed,
it is necessary to distinguish functions trusted and untrusted,
even if a function is built-in.
Perhaps, pg_proc takes a new flag to represent it.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 01:17:32
Message-ID: 20100608011732.GG21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

KaiGai,

* KaiGai Kohei (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> Perhaps, pg_proc takes a new flag to represent it.

Without an actual well-formed approach for dealing with this which
requires it, it's far too soon to be asking for changes in the catalog.
Please stop suggesting individual maybe-this-would-help changes. We
need an actual solution which addresses all, or at least most, of the
concerns described, followed by a patch which implements the mechanics
of the solution. If the solution calls for an additional pg_proc field,
then the patch should include it.

Not sure if this would translate well, but asking for new flags to be
added to pg_proc right now is putting the cart before the horse. We
don't even know which functions we might mark as trusted or not yet, nor
is it even clear that adding such a flag would actually help. Adding a
flag to pg_proc is certainly nothing like a solution to this problem by
itself.

Thanks,

Stephen


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:10:13
Message-ID: 4C0DA685.3000300@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 10:17), Stephen Frost wrote:
> KaiGai,
>
> * KaiGai Kohei (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
>> Perhaps, pg_proc takes a new flag to represent it.
>
> Without an actual well-formed approach for dealing with this which
> requires it, it's far too soon to be asking for changes in the catalog.
> Please stop suggesting individual maybe-this-would-help changes. We
> need an actual solution which addresses all, or at least most, of the
> concerns described, followed by a patch which implements the mechanics
> of the solution. If the solution calls for an additional pg_proc field,
> then the patch should include it.
>
OK, it was too implementation-specific.

Please return to the categorization with 3-level that I mentioned at
the previous message.

I guess nobody opposes to disable pushing down on functions categorized
to high-threat, such as lowrite() and so on.
It actually can move given arguments to other tables, files, ...

And, I guess nobody wants to tackle an issue categorized to low-threat,
such as EXPLAIN, PK/FK constraints and so on.
It can imply existence of invisible objects, but no leaks of actual value.

Our headache is on functions categorized to middle-threat. It enables to
leak the given arguments using error messages. Here are several ideas,
but they have good and bad points.

At first, it is necessary whether we should fix up the threat in this
level, or not. It seems to me majority of the -hackers want to fix both
of high and middle level.

If we should fix up the issue, I think we have only two options semantically.

(A) It prevents leaky functions to be pushed down, so no invisible
information will be provided. But it makes performance regression.

(B) It prevents leaky functions to raise an error, although we allow
it to be pushed down. But is needs large scale of code changes.

Of course, it has trade-off. As TCSEC mentioned, we are facing with the
large potential cost of reducing the bandwidths of such covert channel

> Not sure if this would translate well, but asking for new flags to be
> added to pg_proc right now is putting the cart before the horse. We
> don't even know which functions we might mark as trusted or not yet, nor
> is it even clear that adding such a flag would actually help. Adding a
> flag to pg_proc is certainly nothing like a solution to this problem by
> itself.
>
For built-in functions, the code should be reviewed to ensure it does not
expose the given argument using error messages.
Then, we can mark it as trusted.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:15:36
Message-ID: AANLkTinLBHZtyjPH1kwE2AV9ofcEwlntPxIctgLK5ZdJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/6/7 KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> Our headache is on functions categorized to middle-threat. It enables to
> leak the given arguments using error messages. Here are several ideas,
> but they have good and bad points.

I think we are altogether off in the weeds here. We ought to start
with an implementation that pushes nothing down, and then try to
figure out how much we can relax that without too much compromising
security.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:19:52
Message-ID: 20100608021952.GK21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Robert Haas (robertmhaas(at)gmail(dot)com) wrote:
> 2010/6/7 KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>:
> > Our headache is on functions categorized to middle-threat. It enables to
> > leak the given arguments using error messages. Here are several ideas,
> > but they have good and bad points.
>
> I think we are altogether off in the weeds here. We ought to start
> with an implementation that pushes nothing down, and then try to
> figure out how much we can relax that without too much compromising
> security.

I agree with this- and it's more-or-less what I was trying to propose in
my previous comments. I'm not even sure we need to focus on not pushing
anything down at this point- I'd start with trying to get enough
information passed around/through the system to even *identify* the case
where there's a problem in the first place..

Thanks,

Stephen


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:25:18
Message-ID: 4C0DAA0E.20707@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 11:15), Robert Haas wrote:
> 2010/6/7 KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com>:
>> Our headache is on functions categorized to middle-threat. It enables to
>> leak the given arguments using error messages. Here are several ideas,
>> but they have good and bad points.
>
> I think we are altogether off in the weeds here. We ought to start
> with an implementation that pushes nothing down, and then try to
> figure out how much we can relax that without too much compromising
> security.
>
It seems to me fair enough.
I think we can adjust what functions are harmless, and whats are not later.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:28:56
Message-ID: 20100608022856.GL21875@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

For the sake of clarity..

* KaiGai Kohei (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
> OK, it was too implementation-specific.

No, that wasn't the problem. There isn't an actual implementation yet
for it to be too-specific on. The problem is that proposing a change to
the catalog without figuring out what it'd actually be used for in an
overall solution is a waste of time.

> Please return to the categorization with 3-level that I mentioned at
> the previous message.

As Robert said, we're off in the weeds here. I'm not convinced that
we've got 3 levels, for starters. It could well be fewer, or more.
Let's stop making assumptions about what's OK and what's not OK.

> For built-in functions, the code should be reviewed to ensure it does not
> expose the given argument using error messages.
> Then, we can mark it as trusted.

One thing that I think *is* clear- removing useful information from
error messages is *not* going to be an acceptable "solution".

Thanks,

Stephen


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-08 02:45:30
Message-ID: 4C0DAECA.1060702@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 11:28), Stephen Frost wrote:
> For the sake of clarity..
>
> * KaiGai Kohei (kaigai(at)ak(dot)jp(dot)nec(dot)com) wrote:
>> OK, it was too implementation-specific.
>
> No, that wasn't the problem. There isn't an actual implementation yet
> for it to be too-specific on. The problem is that proposing a change to
> the catalog without figuring out what it'd actually be used for in an
> overall solution is a waste of time.
>
Indeed,

>> Please return to the categorization with 3-level that I mentioned at
>> the previous message.
>
> As Robert said, we're off in the weeds here. I'm not convinced that
> we've got 3 levels, for starters. It could well be fewer, or more.
> Let's stop making assumptions about what's OK and what's not OK.
>
Indeed, we may find out the 4th category in the future.

>> For built-in functions, the code should be reviewed to ensure it does not
>> expose the given argument using error messages.
>> Then, we can mark it as trusted.
>
> One thing that I think *is* clear- removing useful information from
> error messages is *not* going to be an acceptable "solution".
>
Even if it is conditional, like as Greg Stark suggested?

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>


From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, marc(at)bloodnok(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Fix leaky VIEWs for RLS
Date: 2010-06-09 06:01:13
Message-ID: 4C0F2E29.3060602@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(2010/06/08 11:15), Robert Haas wrote:
> 2010/6/7 KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com>:
>> Our headache is on functions categorized to middle-threat. It enables to
>> leak the given arguments using error messages. Here are several ideas,
>> but they have good and bad points.
>
> I think we are altogether off in the weeds here. We ought to start
> with an implementation that pushes nothing down, and then try to
> figure out how much we can relax that without too much compromising
> security.
>

The attached patch tries to prevent pushing down anything into subqueries
from outside of them.

The distribute_qual_to_rels() tries to distribute the given qualifier
into a certain scanning-plan based on the dependency of qualifier.

E.g) SELECT * FROM (SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(t1.a)) WHERE f_user(t2.x);

In this case, f_user() function depends on only t2 table, so it is
reasonable to attach on the scanning plan of t2 from perspective of
performance.

However, f_user() may have a side-effect which writes arguments into
somewhere. If here is such a possibility, f_user() should not be called
before the joined tuples being filtered by f_policy().

In the case when we can ensure all functions within the qualifier are
enough trustable, we don't need to prevent them to be pushed down.
But the algorithm to determine it is under discussion. So, right now,
we prevent all the possible pushing down.

Example.1) CREATE VIEW v1 AS SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a);
SELECT * FROM v1 WHERE f_malicious(b);

* without this patch
postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=639.01..667.29 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=637.30..637.30 rows=137 width=36)
-> Seq Scan on t1 (cost=0.00..637.30 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(6 rows)

* with this patch
postgres=# EXPLAIN SELECT * FROM v1 WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=334.93..468.44 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
Join Filter: f_malicious(t1.b)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=329.80..329.80 rows=410 width=36)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: f_policy(a)
(7 rows)

It prevents to push down f_malicious() inside of the join loop.

Example.2) CREATE VIEW v1 AS SELECT * FROM t1 JOIN t2 ON a = x WHERE f_policy(a);
SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);

* without this patch
postgres=# EXPLAIN SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------------
Hash Join (cost=669.01..697.29 rows=137 width=108)
Hash Cond: (t3.s = t1.a)
-> Seq Scan on t3 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=667.29..667.29 rows=137 width=72)
-> Hash Join (cost=639.01..667.29 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=637.30..637.30 rows=137 width=36)
-> Seq Scan on t1 (cost=0.00..637.30 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(10 rows)

* with this patch
postgres=# EXPLAIN SELECT * FROM v1 JOIN t3 ON v1.a=t3.s WHERE f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------------
Hash Join (cost=470.15..498.43 rows=137 width=108)
Hash Cond: (t3.s = t1.a)
-> Seq Scan on t3 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=468.44..468.44 rows=137 width=72)
-> Hash Join (cost=334.93..468.44 rows=137 width=72)
Hash Cond: (t2.x = t1.a)
Join Filter: f_malicious(t1.b)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=329.80..329.80 rows=410 width=36)
-> Seq Scan on t1 (cost=0.00..329.80 rows=410 width=36)
Filter: f_policy(a)
(11 rows)

It also prevents f_malisious() to be pushed down into the join loop within view,
but we can push it down into same level of the query.

Please note that it specially handles equality operator at the bottom half of
the distribute_qual_to_rels(), so this patch does not care about these cases.
However, I'm not in hustle to prevent these optimization, because I guess
these should be entirely trusted. So, the patch is in just a start up phase,
not commitable anyway.

postgres=# EXPLAIN SELECT * FROM v1 WHERE b = 'aaa';
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..349.44 rows=2 width=72)
-> Seq Scan on t1 (cost=0.00..332.88 rows=2 width=36)
Filter: ((b = 'aaa'::text) AND f_policy(a))
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (t2.x = t1.a)
(5 rows)

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-fix-leaky-join-view.3.patch text/x-patch 7.4 KB