Re: API change advice: Passing plan invalidation info from the rewriter into the planner?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Adam Brightwell <adam(dot)brightwell(at)crunchydatasolutions(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Yeb Havinga <yeb(dot)havinga(at)portavita(dot)nl>
Subject: Re: API change advice: Passing plan invalidation info from the rewriter into the planner?
Date: 2014-06-24 16:27:33
Message-ID: 20140624162733.GO16098@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert,

I feel like we are getting to the point of simply talking past each
other and so I'll try anew, and I'll include my understanding of how the
different approaches would address the specific use-case you outlined
up-thread.

Single policy
-------------
The current implementation approach only allows a single policy to be
included. The concern raised with this approach is that it won't be
very performant due to the qual complexity, which you outlined
(reformatted a bit) as:

WHERE
sales_rep_id = (SELECT oid FROM pg_roles
WHERE rolname = current_user
AND
oid IN (SELECT id FROM person WHERE is_sales_rep))
OR
partner_id = (SELECT p.org_id
FROM pg_roles a, person p
WHERE a.rolname = current_user
AND a.oid = p.id)

Which I take to mean there is a 'person' table which looks like:

id, is_sales_rep, org_id

and a table which has the RLS qual which looks like:

pk_id, sales_rep_id, partner_id

Then, if the individual is_sales_rep and it's their account by
sales_rep_id, or if the individual's org_id matches the partner_id, they
can see the record.

Using this example with security barrier views and indexes on person.id,
data.pk_id, data.sales_rep_id, and data.partner_id, we'll get a bitmap
heap scan across the 'data' table by having the two OR's run as
InitPlan 1 and InitPlan 2.

Does that address the concern you had around multi-branch OR policies?
This works with more than two OR branches also, though of course we need
appropriate indexes to make use of a Bitmap Heap Scan.

Even with per-user policies, we would define a policy along these lines,
for the "sfrost" role:

WHERE
sales_rep_id = 16384
OR partner_id = 1

Which also ends up doing a Bitmap Heap Scan across the data table.

For the case where a sales rep isn't also a partner, you could simplify
this to:

WHERE
sales_rep_id = 16384

but I'm not sure that really buys you much? With the bitmap heap
scan, if one side of the OR ends up not returning anything then it
doesn't contribute to the blocks which have to be scanned. The index
might still need to be scanned, although I think you could avoid even
that with an EXISTS check to see if the user is a partner at all.
That's not to say that a bitmap scan is equivilant to an index scan, but
it's certainly likely to be far better than a sequential scan.

Now, if the query is "select * from data_view with pk_id = 1002;", then
we get an indexed lookup on the data table based on the PK. That's what
I was trying to point out previously regarding leakproof functions
(which comprise about half of the boolean functions we provide, if I
recall my previous analysis correctly). We also get indexed lookups
with "pk_id < 10" or similar as those are also leakproof.

Multiple, Overlapping policies
------------------------------
Per discussion, these would generally be OR'd together.

Building up the overall qual which has to include an OR branch for each
individual policy qual(s) looks like a complicated bit of work and one
which might be better left to the user (and, as just pointed out, the
user may actually want AND instead of OR in some cases..).

Managing the plan cache in a sensible way is certainly made more
complicated by this and might mean that it can't be used at all, which
has already been raised as a show-stopper issue.

In the example which you provided, while we could represent that the two
policies exist (sales representatives vs partners) and that they are to
be OR'd together in the catalog, but I don't immediately see how that
would change the qual which ends up being added to the query in this
case or really improving the overall query plan; at least, not without
eliminating one of the OR branches somehow- which I discuss below.

Multiple, Non-overlapping policies
----------------------------------
Preventing the overlap of policies ends up being very complicated if
many dimensions are allowed. For the simple case, perhaps only the
'current role' dimension is useful. I expect that going down that
route would very quickly lead to requests for other dimensions (client
IP, etc) which is why I'm not a big fan of it, but if that's the
concensus then let's work out the syntax and update the patch and move
on.

Another option might be to have a qual for each policy which
the user can define that indicates if that policy is to be applied or
not and then simply pick the first policy for which that qual which
returns 'true'. We would require an ordering to be defined in this
case, which I believe was an issue up-thread. If we allow all policies
matching the quals then we run into the complications mentioned under
"Overlapping policies" above.

If we decide that per-role policies need to be supported, I very
quickly see the need to have "groups" of roles to which a policy is to
be applied. This would differ from roles today as they would not be
allowed to overlap (otherwise we are into overlapping policies again, or
having to figure out which of the overlapping policies should be applied
for each query; another option would be to error at run-time, but that
seems pretty ugly). In this case we would still need to support "all"
as an option, which is what I would expect to have implemented for 9.5,
or at least in the early part of 9.5 (I really don't want to wait until
the last CF or even the CF before that to get anything in as I suspect
it will have grown by that point to be large enough to be an issue..),
adding the per-role(s) option could be for 9.6/10.0.

In your example, if sales representatives have distinct roles from
partners, then the specific policy could be chosen and used based on
which role is running the query, which might lead to, perhaps only
maginal, improved performance in those specific cases, as discussed
above.

General multi-policy concerns
-----------------------------
Choosing which policy or policies to apply for a given query gets very
complicated very quickly if we're to do so in an automated way. Dean
suggests that the user would pick which policy to use, to which I argued
that roles could be used to manage that instead (a user could 'set role'
to a role which has the access requested). That mechanism would also
work in the existing single-policy approach by having a policy which
depends on the calling role (eg: by looking up that role in a table
which defines what access that role should have). It would also work in
the above proposal for multiple non-overlapping policies where the
policy to use is based on the current role.

Overall, while I'm interested in defining where this is going in a way
which allows us implement an initial RLS capability while avoiding
future upgrade issues, I am perfectly happy to say that the 9.5 RLS
implementation may not be exactly syntax-compatible with 9.6 or 10.0.
What I wish to avoid is a case where what's in 9.5 includes RLS
definitions which can't be implemented in 9.6/10.0 and as would cause
upgrade problems. As long as what's in 9.5 can be represented and
supported in 9.6/10.0, we can implement the necessary logic to migrate
from one to the other in pg_dump. We do not guarantee syntax
compatibility between major versions and we often warn users of newer
features that there may be some changes in subsequent releases which
they'll need to address when they upgrade (and, of course, these are
noted in the release notes).

Hopefully this will help us move the discussion forward to a point where
we have a long-term design as well as a short-term goal which is
actionable for 9.5. The current work is around adding the GUCs
discussed previously to the RLS patch and modifying pg_dump to use them,
to address the concerns raised previously about pg_dump running user
code and possibly not having a complete copy of the data.

Thanks,

Stephen

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-06-24 16:39:13 Re: Autonomous Transaction (WIP)
Previous Message David G Johnston 2014-06-24 15:39:48 Re: idle_in_transaction_timeout