This is the Linux kernel capabilities FAQ

Its history, to the extent that I am able to reconstruct it, is that
v2.0 was posted to the Linux kernel list on 1999/04/02 by Boris
Tobotras. Thanks to Denis Ducamp for forwarding me a copy.

Cheers

Andrew

Linux Capabilities FAQ 0.2
==========================

1) What is a capability?

The name "capabilities" as used in the Linux kernel can be confusing.
First there are Capabilities as defined in computer science. A
capability is a token used by a process to prove that it is allowed to
do an operation on an object. The capability identifies the object
and the operations allowed on that object. A file descriptor is a
capability. You create the file descriptor with the "open" call and
request read or write permissions. Later, when doing a read or write
operation, the kernel uses the file descriptor as an index into a
data structure that indicates what operations are allowed. This is an
efficient way to check permissions. The necessary data structures are
created once during the "open" call. Later read and write calls only
have to do a table lookup. Operations on capabilities include copying
capabilities, transferring capabilities between processes, modifying a
capability, and revoking a capability. Modifying a capability can be
something like taking a read-write file descriptor and making it
read-only. A capability often has a notion of an "owner" which is
able to invalidate all copies and derived versions of a capability.
Entire OSes are based on this "capability" model, with varying degrees
of purity. There are other ways of implementing capabilities than the
file descriptor model - traditionally special hardware has been used,
but modern systems also use the memory management unit of the CPU.

Then there is something quite different called "POSIX capabilities"
which is what Linux uses. These capabilities are a partitioning of
the all-powerful root privilege into a set of distinct privileges (but
look at securelevel emulation to find out that this isn't necessarily
the whole truth). Users familiar with VMS or "Trusted" versions of
other UNIX variants will know this under the name "privileges". The
name "capabilities" comes from the now defunct POSIX draft 1003.1e
which used this name.

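Going back to the file descriptor model in the first paragraph of this
answer, the following is a minimal sketch (in Python, on a scratch file
created just for the demonstration) of how the access mode requested at
"open" time travels with the descriptor. The helper name
write_via_read_only_fd is made up for this illustration.

```python
import errno
import os
import tempfile


def write_via_read_only_fd():
    """Open a file read-only, then try to write through that descriptor.

    Returns the errno of the refused write, or 0 if the write succeeded.
    """
    # Create a scratch file so the example does not depend on an existing path.
    tmp_fd, path = tempfile.mkstemp()
    os.close(tmp_fd)
    # The access mode requested here is recorded with the descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        os.write(fd, b"x")  # this descriptor does not carry write access
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)
        os.unlink(path)
    return 0


if __name__ == "__main__":
    print(errno.errorcode.get(write_via_read_only_fd(), "write succeeded"))
```

On Linux the write should be refused with EBADF even though the caller
owns the file: the check is made against what the descriptor allows,
not by looking at the file permissions again.
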
2) So what is a "POSIX capability"?

A process has three bitmaps called the inheritable (I),
permitted (P), and effective (E) capability sets. Each capability is
implemented as a bit in each of these bitmaps which is either set or
unset. When a process tries to do a privileged operation, the
operating system will check the appropriate bit in the effective set
of the process (instead of checking whether the effective uid of the
process is 0 as is normally done). For example, when a process tries
to set the clock, the Linux kernel will check that the process has the
CAP_SYS_TIME bit (which is currently bit 25) set in its effective set.

The permitted set of the process indicates the capabilities the
process can use. The process can have capabilities set in the
permitted set that are not in the effective set. This indicates that
the process has temporarily disabled this capability. A process is
allowed to set a bit in its effective set only if it is available in
the permitted set. The distinction between effective and permitted
exists so that processes can "bracket" operations that need privilege:
raise the bit just before the privileged operation and clear it again
right after.

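The relationship between the permitted and effective sets can be
sketched with plain integers as bitmaps. This is a toy model in Python,
not kernel code: the class ProcessCaps and the functions around it are
hypothetical names chosen for the illustration, and "setting the clock"
is reduced to the effective-set check itself. Only the bit number 25
for CAP_SYS_TIME comes from the text above.

```python
CAP_SYS_TIME = 25  # bit number quoted in the text above


class ProcessCaps:
    """Toy model of a process's permitted and effective bitmaps."""

    def __init__(self, permitted=0, effective=0):
        self.permitted = permitted
        # The effective set never holds a bit that is missing from permitted.
        self.effective = effective & permitted

    def capable(self, cap):
        """The check made for a privileged operation: effective set only."""
        return bool(self.effective & (1 << cap))

    def raise_cap(self, cap):
        """Enable a capability; allowed only if it is in the permitted set."""
        bit = 1 << cap
        if not self.permitted & bit:
            raise PermissionError(f"capability {cap} is not permitted")
        self.effective |= bit

    def lower_cap(self, cap):
        """Temporarily disable a capability; it stays in the permitted set."""
        self.effective &= ~(1 << cap)


def set_clock_bracketed(proc):
    """Raise CAP_SYS_TIME only around the step that needs it."""
    proc.raise_cap(CAP_SYS_TIME)
    try:
        return proc.capable(CAP_SYS_TIME)  # stands in for setting the clock
    finally:
        proc.lower_cap(CAP_SYS_TIME)
```

A process created as ProcessCaps(permitted=1 << CAP_SYS_TIME) fails the
capable() check until it raises the bit, passes it inside
set_clock_bracketed(), and fails it again afterwards. A process with an
empty permitted set cannot raise the bit at all.
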
The inheritable capabilities are the capabilities of the current
process that should be inherited by a program executed by the current
process. The permitted set of a process is masked against the
inheritable set during exec(). Nothing special happens during fork()
or clone(). Child processes and threads are given an exact copy of
the capabilities of the parent process.

3) What about other entities in the system? Users, Groups, Files?

Files have capabilities. Conceptually they have the same three
bitmaps that processes have, but to avoid confusion we call them by
other names. Only executable files have capabilities; libraries don't
have capabilities (yet). The three sets are called the allowed set,
the forced set, and the effective set.

The allowed set indicates what capabilities the executable is allowed
to receive from an execing process. This means that during exec(),
the capabilities of the old process are first masked against a set
which indicates what the process gives away (the inheritable set of
the process), and then they are masked against a set which indicates
what capabilities the new process image is allowed to receive (the
allowed set of the executable).

The forced set is a set of capabilities created out of thin air and
given to the process after execing the executable. The forced set is
similar in nature to the setuid feature. In fact, the setuid bit from
the filesystem is "read" as a full forced set by the kernel.

The effective set indicates which bits in the permitted set of the new
process should be transferred to the effective set of the new process.
The effective set is best thought of as a "capability aware" set. It
should consist of only 1s if the executable is capability-dumb, or
only 0s if the executable is capability-smart. Since the effective
set consists of only 0s or only 1s, the filesystem can implement this
set using a single bit.

NOTE: Filesystem support for capabilities is not part of Linux 2.2.

Users and Groups don't have associated capabilities from the kernel's
point of view, but it is entirely reasonable to associate users or
groups with capabilities. By letting the "login" program set some
capabilities it is possible to make role users such as a backup user
that will have the CAP_DAC_READ_SEARCH capability and be able to do
backups. This could also be implemented as a PAM module, but nobody
has implemented one yet.

4) What capabilities exist?

The capabilities available in Linux are listed and documented in the
file /usr/src/linux/include/linux/capability.h.

5) Are Linux capabilities hierarchical?

No, you cannot make a "subcapability" out of a Linux capability as in
capability-based OSes.

6) How can I use capabilities to make sure Mr. Evil Luser (eluser)
   can't exploit my "suid" programs?

This is the general outline of how this works, given that filesystem
capability support exists. First, you have a PAM module that sets the
inheritable capabilities of the login-shell of eluser. Then for all
"suid" programs on the system, you decide what capabilities they need
and set the _allowed_ set of the executable to that set of
capabilities. The capability rule

    new permitted = forced | (allowed & inheritable)

means that you should be careful about setting forced capabilities on
executables, since forced bits are granted no matter what the
inheritable set of the invoking process is. In a few cases, this can
be useful though. For example, the login program needs to set the
inheritable set of the new user and therefore needs an almost full
permitted set. So if you want eluser to be able to run login and log
in as a different user, you will have to set some forced bits on that
executable.

7) What about passing capabilities between processes?

Currently this is done by the system call "capset" which can set the
capabilities of another process. This requires the CAP_SETPCAP
capability which you really only want to grant a _few_ processes.
CAP_SETPCAP was originally intended as a workaround to be able to
implement filesystem support for capabilities using a daemon outside
the kernel.

There have been discussions about implementing socket-level capability
passing. This means that you can pass a capability over a socket. No
support for this exists in the official kernel yet.

8) I see securelevel has been removed from 2.2 and is superseded by
   capabilities. How do I emulate securelevel using capabilities?

The capset system call can remove a capability from _all_ processes on
the system in one atomic operation. The setcap utility from the
libcap distribution will do this for you. The utility requires the
CAP_SETPCAP privilege to do this. The CAP_SETPCAP capability is not
enabled by default.

libcap is available from
ftp://ftp.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.2/

9) I noticed that the capability.h file lacks some capabilities that
   are needed to fully emulate 2.0 securelevel. Is there a patch for
   this?

Actually yes - funny you should ask :-). The problem with 2.0
securelevel is that it, for example, stops root from accessing block
devices. At the same time it restricts the use of iopl. These two
changes are fundamentally different. Blocking access to block devices
means restricting something that usually isn't restricted.
Restricting access to the use of iopl on the other hand means
restricting (blocking) access to something that is already blocked.
Emulating the parts of 2.0 securelevel that restrict things that are
normally not restricted means that the kernel has to have a set of
capabilities that are usually _on_ for a normal process (note that
this breaks the explanation that capabilities are a partitioning of
the root privileges). There is an experimental patch at

    ftp://ftp.guardian.no/pub/free/linux/capabilities/patch-cap-exp-1

which implements a set of capabilities with the "CAP_USER" prefix:

    cap_user_sock - allowed to use socket()
    cap_user_dev  - allowed to open char/block devices
    cap_user_fifo - allowed to use pipes

These should be enough to emulate 2.0 securelevel (tell me if we need
something more).

10) Seems I need a CAP_SETPCAP capability that I don't have to make use
    of capabilities. How do I enable this capability?

Change the definition of CAP_INIT_EFF_SET and CAP_INIT_INH_SET to the
following in include/linux/capability.h:

    #define CAP_INIT_EFF_SET { ~0 }
    #define CAP_INIT_INH_SET { ~0 }

This will start init with a full capability set and not with
CAP_SETPCAP removed.

11) How do I start a process with a limited set of capabilities?

Get the libcap library and use the execcap utility. The following
example starts the update daemon with only the CAP_SYS_ADMIN
capability.

    execcap 'cap_sys_admin=eip' update

12) How do I start a process with a limited set of capabilities under
    another uid?

Use the sucap utility which changes uid from root without losing any
capabilities. Normally all capabilities are cleared when changing uid
from root. The sucap utility requires the CAP_SETPCAP capability.
The following example starts update under uid updated and gid updated
with CAP_SYS_ADMIN raised in the Effective set.

    sucap updated updated execcap 'cap_sys_admin=eip' update

[ Sucap is currently available from
ftp://ftp.guardian.no/pub/free/linux/capabilities/sucap.c. Put it in
the progs directory of libcap to compile.]

13) What are the "capability rules"?

The capability rules are the rules used to set the capabilities of the
new process image after an exec. They work like this:

          pI' = pI
    (***) pP' = fP | (fI & pI)
          pE' = pP' & fE          [NB. fE is 0 or ~0]

    I=Inheritable, P=Permitted, E=Effective // p=process, f=file
    ' indicates post-exec().

Now to make sense of the equations think of fP as the Forced set of
the executable, and fI as the Allowed set of the executable. Notice
how the Inheritable set isn't touched at all during exec().

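The three rules can be written out directly with integers as bitmaps.
The sketch below is only an illustration of the equations, not kernel
code. The function name exec_transform is made up here, and the bit
values used in the usage note are arbitrary placeholders, not real
capability numbers.

```python
def exec_transform(pI, fI, fP, fE):
    """Apply the exec() rules and return (pI', pP', pE').

    pI     -- inheritable set of the process before exec()
    fI     -- allowed set of the executable
    fP     -- forced set of the executable
    fE     -- effective flag of the executable: 0 or ~0 (all bits set)

    The old permitted and effective sets of the process do not appear in
    the rules, so they are not parameters.
    """
    if fE not in (0, ~0):
        raise ValueError("fE must be 0 or ~0")
    new_pI = pI                 # pI' = pI
    new_pP = fP | (fI & pI)     # pP' = fP | (fI & pI)
    new_pE = new_pP & fE        # pE' = pP' & fE
    return new_pI, new_pP, new_pE
```

With placeholder bits A=1, B=2 and C=4, a process with pI = A|B
executing a capability-dumb file (fE = ~0) with fI = B|C and fP = 0
ends up with pP' = pE' = B. The same exec with fP = C and fE = 0
(capability-smart) gives pP' = B|C and an empty effective set, so the
program has to raise the bits it needs itself.
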
14) What are the laws for setting capability bits in the Inheritable,
    Permitted, and Effective sets?

Bits can be transferred from Permitted to either Effective or
Inheritable set.

Bits can be removed from all sets.

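One possible reading of these two laws is sketched below as a check on
a requested change. This is an interpretation, not the kernel's code.
The function name change_allowed is hypothetical, and the choice to
test the new effective set against the *new* permitted set (so that a
bit dropped from Permitted cannot linger in Effective) is an assumption
that the text above does not spell out.

```python
def change_allowed(old, new):
    """Check a requested (I, P, E) triple against the two laws above.

    old and new are (inheritable, permitted, effective) bitmaps.
    """
    old_i, old_p, _old_e = old
    new_i, new_p, new_e = new
    # Permitted can only lose bits; nothing can be added to it.
    if new_p & ~old_p:
        return False
    # Inheritable may keep its bits or gain bits from the old permitted set.
    if new_i & ~(old_i | old_p):
        return False
    # Effective may only hold bits that are in the (new) permitted set.
    # Assumption: new_p rather than old_p is the reference here.
    if new_e & ~new_p:
        return False
    return True
```

With placeholder bits A=1 and B=2, a process holding only P = A|B may
move A into Inheritable and B into Effective, and any process may clear
all three sets, but no request can add B to a permitted set that only
contains A.
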
15) Where is the standard on which the Linux capabilities are based?

There used to be a POSIX draft called POSIX.6 and later POSIX 1003.1e.
However, after the committee had spent over 10 years on it, POSIX
decided that enough is enough and dropped the draft. There will
therefore not be a POSIX standard covering security anytime soon.
This may, however, lead to the POSIX draft being made available for
free.

--
Best regards, -- Boris.