Re: leaky views, yet again

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: leaky views, yet again
Date: 2010-09-06 04:51:42
Message-ID: 4C84735E.4070604@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

(2010/09/02 11:57), Robert Haas wrote:
> 2010/9/1 KaiGai Kohei<kaigai(at)ak(dot)jp(dot)nec(dot)com>:
>> Right now, it stands on a strict assumption that considers operators
>> implemented with built-in functions are safe; it does not have no
>> possibility to leak supplied arguments anywhere.
>>
>> Please note that this patch does not case about a case when
>> a function inside a view and a function outside a view are
>> distributed into same level and the later function has lower
>> cost value.
>
> Without making some attempt to address these two points, I don't see
> the point of this patch.
>
> Also, I believe we decided previously do this deoptimization only in
> case the user requests it with CREATE SECURITY VIEW.
>

The attached patch tackles both of the above two points, and allows to
control this deoptimization when CREATE SECURITY VIEW is used.

Example
---------
CREATE FUNCTION f_policy(int) RETURNS bool LANGUAGE 'plpgsql'
AS 'begin return $1 % 2 = 0; end';
CREATE FUNCTION f_malicious(text) RETURNS bool COST 0.0001 LANGUAGE 'plpgsql'
AS 'begin raise notice ''leak: %'', $1; return true; end';
CREATE OR REPLACE VIEW v1
AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x);
CREATE OR REPLACE VIEW v2
AS SELECT * FROM t1 WHERE f_policy(a);

In this case, we don't want to invoke f_malicious() with argument come
from the tuples which should be filtered inside of the views, even if
people provides it on the WHERE clause.

Without this patch
====================

postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..287.63 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36)
Filter: f_malicious(b)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (x = t1.a)
Filter: f_policy(x)
(6 rows)

==> f_policy(x) was chained inside of the join loop, so it enables to
something to be filtered out.

postgres=# select * from v1 where f_malicious(b);
NOTICE: leak: aaa
NOTICE: leak: bbb
NOTICE: leak: ccc
NOTICE: leak: ddd
a | b | x | y
---+-----+---+-----
2 | bbb | 2 | xxx
4 | ddd | 4 | zzz
(2 rows)

postgres=# EXPLAIN select * from v2 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_malicious(b) AND f_policy(a))
(2 rows)

==> The view of v2 is pulled up to the upper level, because this subquery
is enough simple. In this case, f_malicious() and f_policy() are
handled in a same level, and ordered by its cost to execution.
Since f_malicious() has lower cost (0.001) the f_policy(), so it shall
be executed earlier, although f_policy() is used in more deep depth.

With this patch
=================

postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..287.63 rows=410 width=72)
-> Seq Scan on t1 (cost=0.00..22.30 rows=410 width=36)
Filter: f_malicious(b)
-> Index Scan using t2_pkey on t2 (cost=0.00..0.63 rows=1 width=36)
Index Cond: (x = t1.a)
Filter: f_policy(x)
(6 rows)

==> It has same result of the case when without this patch,
because this view is declared without "SECURITY"

postgres=# CREATE OR REPLACE SECURITY VIEW v1
AS SELECT * FROM t1 JOIN t2 ON t1.a = t2.x WHERE f_policy(x);
CREATE VIEW
postgres=# EXPLAIN select * from v1 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------------------
Hash Join (cost=37.68..384.39 rows=137 width=72)
Hash Cond: (t1.a = t2.x)
Join Filter: (f_policy(t2.x) AND f_malicious(t1.b))
-> Seq Scan on t1 (cost=0.00..22.30 rows=1230 width=36)
-> Hash (cost=22.30..22.30 rows=1230 width=36)
-> Seq Scan on t2 (cost=0.00..22.30 rows=1230 width=36)
(6 rows)

==> After we replaced this view with SECURITY option, it prevents
f_malicious() being come from outside of the view into inside
of the join loop.

postgres=# EXPLAIN select * from v1 where f_malicious(b) and a = 2;
QUERY PLAN
-------------------------------------------------------------------------
Nested Loop (cost=0.00..16.80 rows=1 width=72)
Join Filter: (f_policy(t2.x) AND f_malicious(t1.b))
-> Index Scan using t1_pkey on t1 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (a = 2)
-> Index Scan using t2_pkey on t2 (cost=0.00..8.27 rows=1 width=36)
Index Cond: (x = 2)
(6 rows)

==> We assume operators implemented with built-in functions are safe.
So, we don't prevent this pushing-down, if the clause does not
contains something leakable.

postgres=# EXPLAIN select * from v2 where f_malicious(b);
QUERY PLAN
-------------------------------------------------------
Seq Scan on t1 (cost=0.00..329.80 rows=137 width=36)
Filter: (f_policy(a) AND f_malicious(b))
(2 rows)

==> In addition, when functions come from different nest levels
are distributed a certain scan filter, these are ordered by
its original nest level, not only function cost.

Introduction of the patch
===========================
This patch tries to tackle the two matters.

The first one is pushing down leakable functions into inside of join-loops
being originated from security view.
The distribute_qual_to_rels() distributes qualifiers of WHERE/JOIN ... ON
clauses to appropriate set of relations. Even if a qualifier is depending
on a part of join-loop inside of security view, we need to prevent it to
be pushed down.
This patch informs distribute_qual_to_rels() what relations being come from
security views using Bitmapset of 'sec_barriers'. If a qualifier overlaps
with any members of security_barriers, its dependency shall be expanded to
the union set of 'sec_barriers'. In this case, this qualifier depends on
whole of the relations inside of the security view, we cannot push it down
anymore.

Views are expanded to sub-queries at query rewriter.
This patch add a flag to FromExpr to distinguish whether this sub-query was
expanded from security view, or not. When deconstruct_recurse() walks on
the supplied join-tree, it construct a bitmapset of the relations under the
FromExpr with security_view = true, then, it shall be delivered to
the distribute_qual_to_rels().

Instead of modifying the structure of pg_class, this patch stores a flag of
security view on the reloption. So, it needed to modify routines about
reloptions because it is the first case to store reloptions of views.

The other matter is priority to order qualifiers of a certain scanning
filter. As I mentioned above, the planner pull up simple subqueries to
join, so a function originated from inside of view and outside of view are
distributed to same scan plan. Then, these are ordered by cost value.
It means functions come from outside of views (maybe leakable) are invoked
with arguments come from tuples to be filtered out with functions come from
inside of the views.

So, I added 'depth' field to FuncExpr and so on. It shall be set just after
query rewriter, then referenced in order_qual_clauses().
If the supplied plan multiple qualifiers, these are ordered by the depth of
qualifier first, then ordered by the cost when here are multiple qualifiers
with same depth.
It makes sure qualifiers being originated inside of views are executed
earlier than others.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

Attachment Content-Type Size
pgsql-fix-leaky-join-view.6.patch text/x-patch 63.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2010-09-06 05:16:44 Re: string function - "format" function proposal
Previous Message Tom Lane 2010-09-06 04:09:46 Re: git: uh-oh