Re: [v9.1] sepgsql - userspace access vector cache

From: Kohei Kaigai <Kohei(dot)Kaigai(at)EMEA(dot)NEC(dot)COM>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [v9.1] sepgsql - userspace access vector cache
Date: 2011-07-20 16:04:02
Message-ID: D0C1A1F8BF513F469926E6C71461D9EC01DAB9@EX10MBX02.EU.NEC.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Yeb, Thanks for your volunteering.

> On 2011-07-14 21:46, Kohei KaiGai wrote:
> > Sorry, the syscache part was mixed to contrib/sepgsql part
> > in my previous post.
> > Please see the attached revision.
> >
> > Although its functionality is enough simple (it just reduces
> > number of system-call invocation), its performance
> > improvement is obvious.
> > So, I hope someone to volunteer to review these patches.
> This is a review of the two userspace access vector cache patches for
> SE-PostgreSQL. Proofreading the source I've seen mostly cosmetic things.
> Though I have a few questions, I think the overal shape of the patch is
> good enough to mark it ready for comitter.
>
> Remarks:
>
> * The patches apply cleanly, compile cleanly on Fedora 15. It depends on
> libselinux-2.0.99, and that is checked on by the configure script: good.
>
> * I run SELECT sepgsql_restorecon(NULL) and saw comparable numbers in
> speed gain and also backend process memory increase as indicated by
> KaiGai-san. The vmsize without the patch increases from running
> restorecon 3864kB, with the patch is increases 8160kB, a difference of
> ~5MB. Where this change comes from is unclear to me. The avc_cache holds
> only 7 entries, and the avc memory context stats indicate it's only at
> 8kB. Investigation into the SECLABELOID syscache revealed that even when
> that cache was set to a #buckets of 2, still after restorecon() the
> backend's vmsize increased about the ~5MB more.
>
The sepgsql_restorecon(NULL) assigns default security label on all the
database objects being controlled, thus, its workload caches security
label (including text data) of these objects.
So, ~5MB of difference is an upper limit of syscache usage because of
SECLABELOID.
The number of buckets is independent from the memory consumption.

> * The size of SECLABELOID is currently set to 2048, which is equal to
> sizes of the pg_proc and pg_attribute caches and hence makes sense. The
> initial contents of pg_seclabel is 3346 entries. Just to get some idea
> what the cachesize means for restorecon performance I tested some
> different values (debugging was on). Until a size of 256, restorecon had
> comparable performance. Small drop from ~415 to ~425 from 128 to 64.
> Cache of 32 had ~435ms performance. Cache of 2 had 680ms. Without
> debugging symbols, I also got a slightly better performance on the
> restorecon call with a 1024 syscache size. This might be something to
> tweak in later versions, when there is more experience what this cache
> size means for performance on real databases.
>
Thanks for your research.
The reason why I set 2048 is just a copy from the catalog which may
hold biggest number of entries (pg_proc). It may be improved later.

> * The cache is called userspace, presumably because it isn't cached in
> shared memory? If sepgsql is targeted at installations with many
> different kinds of clients (having different subject contexts), or
> access different objects, this is a good choice.
>
SELinux's kernel implementation also has access vector cache:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob;f=security/selinux/avc.c

The reason why I didn't choose shared memory is the length of security
label text is variable, so it is not feasible to acquire a fixed length
region on shared memory in startup time.

> * Checked that the cache keeps working after errors, ok.
>
> * uavc.c defines some static variables like lru_hint, threshold and
> unlabeled. It would be nice to have a small comment with each one that
> says what the variable represents (like is done for the avc_cache struct
> members right above it)
>
OK, I'll add comments here.

> * I had to google SELINUX_AVD_FLAGS_PERMISSIVE to understand what was
> going on. Since the goal why it is recorded a domain is permissive, is
> to prevent flooding the log, maybe renaming avc_cache.permissive into
> avc_cache.log_violations_once would explain itself.
>
It is a bit concern for me, because the permissive domain means it shall
be performed like as system is configured as permissive mode.
The behavior of log-violation-at-once is a property of permissive mode.
So, how about an idea to add comments about permissive domain around
the code to modify cache->allowed?

> * selinux_status_open() is called and it's man page mentions
> selinux_status_close(). It currently unmaps memory and closes a file
> descriptor, which is done on process exit anyway. I don't have enough
> experience with these kinds of system calls to have a feeling whether it
> might change in the future (and in such cases I'd have added a call to
> selinux_status_close() on proc_exit, just to be on the safe side).
>
I omitted it because OS will handle these things correctly on process
Exit time, however, it is good idea to move on safe side.

> * sepgsel_avs_unlabeled(). I was confused a bit because objects can have
> a label 'unlabeled', and the comments use wording like 'when unlabeled'.
> Does this mean with no label, or with a label 'unlabeled'?
>
It means object has no label. I'll fix this comment.

> * sepgsql_avc_compute(): the large comment in the beginning is hard to
> follow since sentences seem to have been a bit mixed up.
>
OK, I'll simplify the comment. How about your feeling about?

/*
* Validation check of the supplied security context.
* Because it always invoke system-call, frequent check should be avoided.
* Unless security policy is reloaded, validation status shall be kept,
* so we also cache whether the supplied security context was valid, or not.
*/

> * question: why is ucontext stored in avc_cache?
>
The 'tcontext' has to be kept on avc_cache as-is, even if the security label
is not valid according to security_check_context_raw(), because it is used to
hash-key.
If your point is that validation status should be kept in bool, not char *,
it is fair enough for me.

> * there is a new guc parameter for the user: avc_threshold, which is
> also documented. My initial question after reading the description was:
> what are the 'items' and how can I see current usage / what are numbers
> to expect? Without that knowledge any educated change of that parameter
> would be impossible. Would it be possible to remove this guc?
>
Hmm, it might require users deep knowledge about inside of sepgsql,
although it is useful to few situations.
OK, I'll remove it in this version.

> * avc_init has magic numbers 384, 4096. Maybe these can be defines (with
> a small comment what it is?)
>
I'll add a comment around static variable of avc_threshold.

> * Finally, IMO the call sites, e.g. check_relation_priviliges, have
> improved in readability with this patch.
>

Regarding to the point you suggested, I'll revise my patch within a day.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei <kohei(dot)kaigai(at)emea(dot)nec(dot)com>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-07-20 16:21:59 Re: [v9.1] sepgsql - userspace access vector cache
Previous Message Yeb Havinga 2011-07-20 16:00:28 Re: [v9.1] sepgsql - userspace access vector cache