Re: [PATCH] Largeobject access controls

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>, KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Largeobject access controls
Date: 2009-10-14 05:34:32
Message-ID: 4AD562E8.9020804@ak.jp.nec.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com> writes:
>> The attached patch is the revised one for largeobejct access controls,
>> because it conflicted to the "GRANT/REVOKE ON ALL xxx IN SCHEMA".
>
> I started to look through this patch with the hope of committing it, but
> found out that it's not really ready.
>
> The most serious problem is that you ripped out myLargeObjectExists(),
> apparently because you didn't understand what it's there for. The reason
> it's there is to ensure that pg_dump runs don't fail. pg_dump is expected
> to produce a consistent dump of all large objects that existed at the
> time of its transaction snapshot. With this code, pg_dump would get a
> "large object doesn't exist" error on any LO that is deleted between the
> time of the snapshot and the time pg_dump actually tries to dump it ---
> which could be quite a large window in a large database.
>
> The reason this is a significant problem and not just an easily fixable
> oversight is that it's not entirely clear what to do instead. We can
> certainly make the pure existence test use the query snapshot instead of
> SnapshotNow, but what about the implied permissions tests? Should we
> attempt to make them run using the version of the LO's ACL found in the
> query-snapshot-time metadata row? The trouble with that is it might refer
> to roles that don't exist anymore, perhaps resulting in failures down
> inside the ACL checking routines. It would be safer to rely on the
> current metadata row contents, but then we have the question of whether to
> allow the access if the row doesn't exist according to SnapshotNow.
>
> Now these issues arise to some extent already in pg_dump, but the current
> window for failure is quite short because it obtains access share locks on
> all the tables it will dump at the start of the run. With large objects
> the window in which things could have changed is very much longer.
>
> Of course, in the cases that people are most concerned about, pg_dump is
> running as superuser and so the actual ACL contents don't really matter
> anyway, so long as we don't fail outright before getting to the check.
> So I'm kind of inclined to say that the least evil solution is to apply
> the permissions check using the query-snapshot-time metadata row.
> It's definitely a debatable question though. We'd also want to make sure
> that the aclcheck code doesn't fail if it finds a nonexistent role ID
> in the ACL.

The origin of this matter is the basis of existence was changed.
Our current basis is whether pg_largeobject has one or more data chunk for
the given loid in the correct snapshot, or not.
The newer one is whether pg_largeobject_metadata has the entry for the given
loid in the SnapshotNow, or not.

The newer basis is same as other database objects, such as functions.
But why pg_dump performs correctly for these database objects?
In my understanding, because it reads the system catalog directly in
the query snapshot. So, it will be visible, if concurrent transaction
dropped a function to be backed up.

Now, pg_dump uses libpq's large object interface which internally uses
loread()/lowrite().
If pg_dump fetches data chunks from the system catalog, it seems to me
the matter is reasonably solvable.

My assumption is that you're not talking about a generic situation when
a certain database object is unavailable even if we can find it within
the system catalog, apart from large object backups.

For example, we can easily produce a similar behavior when we tries to
use a function which can be found in pg_proc, but concurrent transaction
already removed it.
I don't believe PostgreSQL guarantees equivalence between the visibility
of system catalog and the availability of the related database object.
So, is it the simplest approach to patch on the pg_dump?

> Moving on to lesser but still significant problems, you probably already
> guessed my next one: there's no pg_dump support. If the system tracks
> owner and ACL for large objects, pg_dump *must* be prepared to dump that
> information. It'd also be worthwhile to teach pg_dump that in servers >=
> 8.5, it can look in the metadata catalog to make the list of LO OIDs
> instead of having to do a SELECT DISTINCT from pg_largeobject.

Hmm. I planed to add support to the pg_dump next to the serve-side enhancement.
If both of patches are necessary soon, I'll include it in this phase.

> I notice that the patch decides to change the pg_description classoid for
> LO comments from pg_largeobject's OID to pg_largeobject_metadata's. This
> will break existing clients that look at pg_description (eg, pg_dump and
> psql, plus any other clients that have any intelligence about comments,
> for instance it probably breaks pgAdmin). And there's just not a lot of
> return that I can see. I agree that using pg_largeobject_metadata would
> be more consistent given the new catalog layout, but I'm inclined to think
> we should stick to the old convention on compatibility grounds. Given
> that choice, for consistency we'd better also use pg_largeobject's OID not
> pg_largeobject_metadata's in pg_shdepend entries for LOs.

It seems to me reasonable.
I'll fix it with comments why it uses LargeObjectRelationId not
LargeObjectMetadataRelationId here.

> In the category of lesser issues that have still got to be fixed:
>
> * You need to pay more attention to updating comments. For example
> your changes to LargeObjectCreate render its header comment a complete
> lie, but you didn't change the comment.
OK, I'll fix it.

> (And what is the purpose of
> renaming it to CreateLargeObject, and similarly for the adjacent
> routines? You didn't change the API meaningfully, so there is no
> reason to break calling code that way.)
Yes, ExecAlterOwnerStmt() calls AlterXXXXOwner() but only largeobject
has different naming convention, if we follow the existing names such
as LargeObjectCreate().
But I don't have any strong motication for the name. I'll fix it.

> * The documentation needs work too, eg a new system catalog requires a
> page in catalogs.sgml, and I'll bet there's a few references to large
> objects and/or permissions that need to be updated.
I left for adding the page for pg_largeobject_metadata. I'll add it.

> * "largeobject" is not an English word. The occurrences in user-visible
> messages and documentation must get changed to "large object". I've got
> mixed emotions even about using it in code identifiers, although we
> certainly aren't going to rename pg_largeobject, so anything that's named
> in parallel to that should stay as-is.

OK, I'll fix it.

> In the same vein, "acl" is not
> a word we want to expose to users, so "largeobject_check_acl" is doubly
> bad as a GUC variable name. Perhaps "large_object_privilege_checks"
> would do.
Hmm. OK, I'll fix it.

> * I'm not really happy with the ac_largeobject_foo shim layer, and would
> frankly prefer to rip it out and put those tests inline. It's poorly
> thought out IMO --- eg, what the heck is that cascade boolean --- and
> doesn't look like any of the rest of the Postgres code stylistically, and
> it makes the calling code look different from similar tests elsewhere too.
> When and if SELinux support ever gets in, I'd expect that the stubs for
> it would get put into the aclchk.c routines not into all their callers.
> So this doesn't seem to me that it's going in the right direction even if
> we posit that that support will get in.

At the beginning of this commit fest, I was not clear which patch is
merged earlier, so I separated access control routines for easy integration
in the future.
However, it is not a matter to deploy inlined ACL checks in this stage.
If arguable, I'll implement it with the current style in this patch.

BTW, we tried to put SELinux hooks within pg_xxx_aclcheck() routines
but it was already failed on the first commit fest, so we are now working
for the abstranction layer for access controls.

> * Lastly, that change in psql is neither version-aware nor schema-safe.
> psql is expected to still work with older server versions, so you need
> two versions of that query, not just a replacement. And don't omit the
> "pg_catalog." qualifier. (Also, I wonder whether it shouldn't have a way
> to display permissions for LOs, though maybe that ought to be conditional.
> Time for "\lo_list+" perhaps?)

In the meantime, I'll add two versions of that query here.
If we need "\lo_list+", it can be fixed later.

Thanks,
--
OSS Platform Development Division, NEC
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2009-10-14 07:27:43 Re: Client application name
Previous Message Itagaki Takahiro 2009-10-14 04:34:31 Re: Triggers on columns