bd2953ebbb
This patch makes exception changes propagate down the hierarchy, respecting
local exceptions when possible.

New exceptions allowing additional access to devices won't be propagated, but
it'll be possible to add an exception to access all or part of the newly
allowed device(s).

New exceptions disallowing access to devices will be propagated down and the
local group's exceptions will be revalidated for the new situation.
Example:
      A
     / \
        B

    group        behavior       exceptions
    A            allow          "b 8:* rwm", "c 116:1 rw"
    B            deny           "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a new exception is added to group A:
        # echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's local exceptions, the exception
"c 116:2 rwm" will be removed.

In case parent's exceptions change and local exceptions are not allowed anymore,
they'll be deleted.

v7:
- do not allow behavior change when the cgroup has children
- update documentation

v6: fixed issues pointed out by Serge Hallyn
- only copy parent's exceptions while propagating behavior if the local
  behavior is different
- while propagating exceptions, do not clear and copy parent's: it'd be against
  the premise we don't propagate access to more devices

v5: fixed issues pointed out by Serge Hallyn
- updated documentation
- not propagating when an exception is written to devices.allow
- when propagating a new behavior, clean the local exceptions list if they're
  for a different behavior

v4: fixed issues pointed out by Tejun Heo
- separated function to walk the tree and collect valid propagation targets

v3: fixed issues pointed out by Tejun Heo
- update documentation
- move css_online/css_offline changes to a new patch
- use cgroup_for_each_descendant_pre() instead of own descendant walk
- move exception_copy rework to a separated patch
- move exception_clean rework to a separated patch

v2: fixed issues pointed out by Tejun Heo
- instead of keeping the local settings that won't apply anymore, remove them

Cc: Tejun Heo <tj@kernel.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
117 lines
4.3 KiB
Plaintext
Device Whitelist Controller

1. Description:

Implement a cgroup to track and enforce open and mknod restrictions
on device files.  A device cgroup associates a device access
whitelist with each cgroup.  A whitelist entry has 4 fields: type,
major, minor, and access.  'type' is a (all), c (char), or b (block).
'all' means it applies to all types and all major and minor numbers.
Major and minor are either an integer or * for all.  Access is a
composition of r (read), w (write), and m (mknod).

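The entry format described above can be illustrated with a small parser.
This is only a sketch in Python, not the kernel's parser: the names Entry
and parse_entry are hypothetical, and treating every entry of type 'a' as
'a *:* rwm' is an assumption drawn from the description of 'all'.

```python
from dataclasses import dataclass

WILDCARD = "*"


@dataclass(frozen=True)
class Entry:
    type: str          # 'a' (all), 'c' (char) or 'b' (block)
    major: str         # decimal number as text, or '*'
    minor: str         # decimal number as text, or '*'
    access: frozenset  # subset of {'r', 'w', 'm'}


def parse_entry(text):
    """Parse text such as 'c 1:3 mr' into an Entry (illustrative only)."""
    parts = text.split()
    if not parts:
        raise ValueError("empty entry")
    if parts[0] == "a":
        # Assumption: 'all' always covers every type, number and access.
        return Entry("a", WILDCARD, WILDCARD, frozenset("rwm"))
    if len(parts) != 3:
        raise ValueError("expected 'type major:minor access'")
    dev_type, numbers, access = parts
    if dev_type not in ("b", "c"):
        raise ValueError("type must be a, b or c")
    major, sep, minor = numbers.partition(":")
    if not sep:
        raise ValueError("expected major:minor")
    for number in (major, minor):
        if number != WILDCARD and not (number.isascii() and number.isdigit()):
            raise ValueError("major and minor must be an integer or *")
    if not access or not set(access) <= set("rwm"):
        raise ValueError("access must be a composition of r, w and m")
    return Entry(dev_type, major, minor, frozenset(access))
```

For instance, parse_entry("c 1:3 mr") yields a char entry for major 1,
minor 3 with read and mknod access.
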
The root device cgroup starts with rwm to 'all'.  A child device
cgroup gets a copy of the parent.  Administrators can then remove
devices from the whitelist or add new entries.  A child cgroup can
never receive a device access which is denied by its parent.

2. User Interface

An entry is added using devices.allow, and removed using
devices.deny.  For instance

        echo 'c 1:3 mr' > /sys/fs/cgroup/1/devices.allow

allows cgroup 1 to read and mknod the device usually known as
/dev/null.  Doing

        echo a > /sys/fs/cgroup/1/devices.deny

will remove the default 'a *:* rwm' entry.  Doing

        echo a > /sys/fs/cgroup/1/devices.allow

will add the 'a *:* rwm' entry to the whitelist.

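The same writes can be issued from a program instead of the shell.  The
following is a minimal Python sketch; the helper names write_rule, allow
and deny are hypothetical, and the cgroup directory is passed in by the
caller rather than assumed.  The trailing newline only mirrors what echo
writes.

```python
from pathlib import Path

CONTROL_FILES = ("devices.allow", "devices.deny")


def write_rule(cgroup_dir, control_file, entry):
    """Write one whitelist entry to a device cgroup control file."""
    if control_file not in CONTROL_FILES:
        raise ValueError("unknown control file: " + control_file)
    path = Path(cgroup_dir) / control_file
    with open(path, "w") as handle:
        handle.write(entry + "\n")


def allow(cgroup_dir, entry):
    """Add an entry, like: echo ENTRY > CGROUP/devices.allow"""
    write_rule(cgroup_dir, "devices.allow", entry)


def deny(cgroup_dir, entry):
    """Remove an entry, like: echo ENTRY > CGROUP/devices.deny"""
    write_rule(cgroup_dir, "devices.deny", entry)
```

Calling allow("/sys/fs/cgroup/1", "c 1:3 mr") would correspond to the
first echo command above; as the Security section notes, modifying the
whitelist requires CAP_SYS_ADMIN.
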
3. Security

Any task can move itself between cgroups.  This clearly won't
suffice, but we can decide the best way to adequately restrict
movement as people get some experience with this.  We may just want
to require CAP_SYS_ADMIN, which at least is a separate bit from
CAP_MKNOD.  We may want to just refuse moving to a cgroup which
isn't a descendant of the current one.  Or we may want to use
CAP_MAC_ADMIN, since we really are trying to lock down root.

CAP_SYS_ADMIN is needed to modify the whitelist or move another
task to a new cgroup.  (Again we'll probably want to change that).

A cgroup may not be granted more permissions than the cgroup's
parent has.

4. Hierarchy

Device cgroups maintain hierarchy by making sure a cgroup never has more
access permissions than its parent.  Every time an entry is written to
a cgroup's devices.deny file, all its children will have that entry removed
from their whitelist and all the locally set whitelist entries will be
re-evaluated.  In case one of the locally set whitelist entries would provide
more access than the cgroup's parent, it'll be removed from the whitelist.

Example:
      A
     / \
        B

    group        behavior       exceptions
    A            allow          "b 8:* rwm", "c 116:1 rw"
    B            deny           "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

If a device is denied in group A:
        # echo "c 116:* r" > A/devices.deny
it'll propagate down and after revalidating B's entries, the whitelist entry
"c 116:2 rwm" will be removed:

    group        whitelist entries                        denied devices
    A            all                                      "b 8:* rwm", "c 116:* rw"
    B            "c 1:3 rwm", "b 3:* rwm"                 all the rest

If the parent's exceptions change and local exceptions are no longer
allowed, they'll be deleted.

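The revalidation step in the example above can be modelled in a few lines
of Python.  This is a sketch under stated assumptions, not the kernel
code: the function names are hypothetical, entries are kept as plain
strings, only explicit 'c' and 'b' entries are handled, and a child entry
is dropped whole as soon as any parent denial overlaps it.

```python
def parse(text):
    """Split 'c 116:2 rwm' into (type, major, minor, set of access bits)."""
    dev_type, numbers, access = text.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def overlaps(first, second):
    """True if two entries share at least one device and one access bit."""
    type_a, major_a, minor_a, access_a = parse(first)
    type_b, major_b, minor_b, access_b = parse(second)

    def same(x, y):
        return x == "*" or y == "*" or x == y

    return (type_a == type_b
            and same(major_a, major_b)
            and same(minor_a, minor_b)
            and bool(access_a & access_b))


def revalidate(parent_denied, child_allowed):
    """Keep only the child entries that no parent denial touches."""
    return [entry for entry in child_allowed
            if not any(overlaps(entry, denied) for denied in parent_denied)]


# Group A (behavior allow) lists denied devices; group B (behavior deny)
# lists allowed devices, as in the table above.
a_denied = ["b 8:* rwm", "c 116:1 rw"]
b_allowed = ["c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"]

a_denied.append("c 116:* r")          # echo "c 116:* r" > A/devices.deny
b_allowed = revalidate(a_denied, b_allowed)
print(b_allowed)                      # ['c 1:3 rwm', 'b 3:* rwm']
```

Before the new denial, "c 116:2 rwm" survives because A only denies minor
1 of major 116; once "c 116:* r" is denied, the read bit overlaps and the
entry is removed.
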
Notice that new whitelist entries will not be propagated:
      A
     / \
        B

    group        whitelist entries                        denied devices
    A            "c 1:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

when adding "c *:3 rwm":
        # echo "c *:3 rwm" >A/devices.allow

the result:
    group        whitelist entries                        denied devices
    A            "c *:3 rwm", "c 1:5 r"                   all the rest
    B            "c 1:3 rwm", "c 1:5 r"                   all the rest

but now it'll be possible to add new entries to B:
        # echo "c 2:3 rwm" >B/devices.allow
        # echo "c 50:3 r" >B/devices.allow
or even
        # echo "c *:3 rwm" >B/devices.allow

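Whether a new entry may be added to B can be thought of as a coverage
check against A's whitelist.  The Python sketch below illustrates that
idea; the names covers and may_add are hypothetical, and requiring a
single parent entry to cover the whole request is a simplifying
assumption (a real implementation may need to combine several parent
entries).

```python
def parse(text):
    """Split 'c 2:3 rwm' into (type, major, minor, set of access bits)."""
    dev_type, numbers, access = text.split()
    major, minor = numbers.split(":")
    return dev_type, major, minor, set(access)


def covers(parent_entry, child_entry):
    """True if parent_entry grants everything child_entry asks for."""
    p_type, p_major, p_minor, p_access = parse(parent_entry)
    c_type, c_major, c_minor, c_access = parse(child_entry)

    def within(parent_value, child_value):
        # A parent wildcard covers anything; otherwise values must match.
        return parent_value == "*" or parent_value == child_value

    return (p_type == c_type
            and within(p_major, c_major)
            and within(p_minor, c_minor)
            and c_access <= p_access)


def may_add(parent_whitelist, new_entry):
    """True if some parent whitelist entry covers new_entry."""
    return any(covers(entry, new_entry) for entry in parent_whitelist)


a_before = ["c 1:3 rwm", "c 1:5 r"]
a_after = ["c *:3 rwm", "c 1:5 r"]    # after echo "c *:3 rwm" >A/devices.allow
```

With a_before, may_add(a_before, "c 2:3 rwm") is false; with a_after, the
three entries written to B/devices.allow above are all covered, while a
request such as "c 1:5 rw" is still refused because A only grants read
on that device.
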
Allowing or denying all by writing 'a' to devices.allow or devices.deny will
not be possible once the device cgroup has children.

4.1 Hierarchy (internal implementation)

Device cgroups are implemented internally using a behavior (ALLOW, DENY) and a
list of exceptions.  The internal state is controlled using the same user
interface to preserve compatibility with the previous whitelist-only
implementation.  Removal or addition of exceptions that will reduce the access
to devices will be propagated down the hierarchy.
For every propagated exception, the effective rules will be re-evaluated based
on the parent's current access rules.
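
The behavior-plus-exceptions model can be illustrated as follows.  This
Python sketch is not the kernel's data structure: Behavior, matches and
may_access are hypothetical names, exceptions are kept as strings, and
only a single group is checked (the effective access of a real cgroup
also depends on its ancestors).

```python
from enum import Enum


class Behavior(Enum):
    ALLOW = "allow"   # everything allowed; exceptions list what is denied
    DENY = "deny"     # everything denied; exceptions list what is allowed


def matches(exception, dev_type, major, minor, access):
    """True if the exception applies to one access bit on one device."""
    ex_type, numbers, ex_access = exception.split()
    ex_major, ex_minor = numbers.split(":")
    return (ex_type == dev_type
            and ex_major in ("*", str(major))
            and ex_minor in ("*", str(minor))
            and access in ex_access)


def may_access(behavior, exceptions, dev_type, major, minor, access):
    """Decide a single access ('r', 'w' or 'm') for one group."""
    hit = any(matches(ex, dev_type, major, minor, access)
              for ex in exceptions)
    # Under ALLOW a matching exception denies; under DENY it allows.
    return (not hit) if behavior is Behavior.ALLOW else hit


a_exceptions = ["b 8:* rwm", "c 116:1 rw"]
b_exceptions = ["c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"]
```

With the groups from section 4, may_access(Behavior.ALLOW, a_exceptions,
"c", 116, 1, "m") is true because only r and w are excepted, while
may_access(Behavior.DENY, b_exceptions, "c", 1, 5, "r") is false because
no exception in B allows that device.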