| .\" Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" 6 Aug 2002 - Initial Creation |
| .\" Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com> |
| .\" Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com> |
| .\" 2004-12-08, mtk Added O_NOATIME for CAP_FOWNER |
| .\" 2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE |
| .\" 2008-07-15, Serge Hallyn <serue@us.bbm.com> |
| .\" Document file capabilities, per-process capability |
| .\" bounding set, changed semantics for CAP_SETPCAP, |
| .\" and other changes in 2.6.2[45]. |
| .\" Add CAP_MAC_ADMIN, CAP_MAC_OVERRIDE, CAP_SETFCAP. |
| .\" 2008-07-15, mtk |
| .\" Add text describing circumstances in which CAP_SETPCAP |
| .\" (theoretically) permits a thread to change the |
| .\" capability sets of another thread. |
| .\" Add section describing rules for programmatically |
| .\" adjusting thread capability sets. |
| .\" Describe rationale for capability bounding set. |
| .\" Document "securebits" flags. |
| .\" Add text noting that if we set the effective flag for one file |
| .\" capability, then we must also set the effective flag for all |
| .\" other capabilities where the permitted or inheritable bit is set. |
| .\" 2011-09-07, mtk/Serge hallyn: Add CAP_SYSLOG |
| .\" |
| .TH CAPABILITIES 7 2015-05-07 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| capabilities \- overview of Linux capabilities |
| .SH DESCRIPTION |
| For the purpose of performing permission checks, |
| traditional UNIX implementations distinguish two categories of processes: |
| .I privileged |
| processes (whose effective user ID is 0, referred to as superuser or root), |
| and |
| .I unprivileged |
| processes (whose effective UID is nonzero). |
| Privileged processes bypass all kernel permission checks, |
| while unprivileged processes are subject to full permission |
| checking based on the process's credentials |
| (usually: effective UID, effective GID, and supplementary group list). |
| |
| Starting with kernel 2.2, Linux divides the privileges traditionally |
| associated with superuser into distinct units, known as |
| .IR capabilities , |
| which can be independently enabled and disabled. |
| Capabilities are a per-thread attribute. |
| .\" |
| .SS Capabilities list |
| The following list shows the capabilities implemented on Linux, |
| and the operations or behaviors that each capability permits: |
| .TP |
| .BR CAP_AUDIT_CONTROL " (since Linux 2.6.11)" |
| Enable and disable kernel auditing; change auditing filter rules; |
| retrieve auditing status and filtering rules. |
| .TP |
| .BR CAP_AUDIT_READ " (since Linux 3.16)" |
| .\" commit a29b694aa1739f9d76538e34ae25524f9c549d59 |
| .\" commit 3a101b8de0d39403b2c7e5c23fd0b005668acf48 |
| Allow reading the audit log via a multicast netlink socket. |
| .TP |
| .BR CAP_AUDIT_WRITE " (since Linux 2.6.11)" |
| Write records to kernel auditing log. |
| .TP |
| .BR CAP_BLOCK_SUSPEND " (since Linux 3.5)" |
| Employ features that can block system suspend |
| .RB ( epoll (7) |
| .BR EPOLLWAKEUP , |
| .IR /sys/power/wake_lock ). |
| .TP |
| .B CAP_CHOWN |
| Make arbitrary changes to file UIDs and GIDs (see |
| .BR chown (2)). |
| .TP |
| .B CAP_DAC_OVERRIDE |
| Bypass file read, write, and execute permission checks. |
| (DAC is an abbreviation of "discretionary access control".) |
| .TP |
| .B CAP_DAC_READ_SEARCH |
| .PD 0 |
| .RS |
| .IP * 2 |
| Bypass file read permission checks and |
| directory read and execute permission checks; |
| .IP * |
| Invoke |
| .BR open_by_handle_at (2). |
| .RE |
| .PD |
| |
| .TP |
| .B CAP_FOWNER |
| .PD 0 |
| .RS |
| .IP * 2 |
| Bypass permission checks on operations that normally |
| require the filesystem UID of the process to match the UID of |
| the file (e.g., |
| .BR chmod (2), |
| .BR utime (2)), |
| excluding those operations covered by |
| .B CAP_DAC_OVERRIDE |
| and |
| .BR CAP_DAC_READ_SEARCH ; |
| .IP * |
| set extended file attributes (see |
| .BR chattr (1)) |
| on arbitrary files; |
| .IP * |
| set Access Control Lists (ACLs) on arbitrary files; |
| .IP * |
| ignore directory sticky bit on file deletion; |
| .IP * |
| specify |
| .B O_NOATIME |
| for arbitrary files in |
| .BR open (2) |
| and |
| .BR fcntl (2). |
| .RE |
| .PD |
| .TP |
| .B CAP_FSETID |
| Don't clear set-user-ID and set-group-ID mode |
| bits when a file is modified; |
| set the set-group-ID bit for a file whose GID does not match |
| the filesystem GID or any of the supplementary GIDs of the calling process. |
| .TP |
| .B CAP_IPC_LOCK |
| .\" FIXME . As at Linux 3.2, there are some strange uses of this capability |
| .\" in other places; they probably should be replaced with something else. |
| Lock memory |
| .RB ( mlock (2), |
| .BR mlockall (2), |
| .BR mmap (2), |
| .BR shmctl (2)). |
| .TP |
| .B CAP_IPC_OWNER |
| Bypass permission checks for operations on System V IPC objects. |
| .TP |
| .B CAP_KILL |
| Bypass permission checks for sending signals (see |
| .BR kill (2)). |
| This includes use of the |
| .BR ioctl (2) |
| .B KDSIGACCEPT |
| operation. |
| .\" FIXME . CAP_KILL also has an effect for threads + setting child |
| .\" termination signal to other than SIGCHLD: without this |
| .\" capability, the termination signal reverts to SIGCHLD |
| .\" if the child does an exec(). What is the rationale |
| .\" for this? |
| .TP |
| .BR CAP_LEASE " (since Linux 2.4)" |
| Establish leases on arbitrary files (see |
| .BR fcntl (2)). |
| .TP |
| .B CAP_LINUX_IMMUTABLE |
| Set the |
| .B FS_APPEND_FL |
| and |
| .B FS_IMMUTABLE_FL |
| .\" These attributes are now available on ext2, ext3, Reiserfs, XFS, JFS |
| inode flags (see |
| .BR chattr (1)). |
| .TP |
| .BR CAP_MAC_ADMIN " (since Linux 2.6.25)" |
| Allow MAC configuration or state changes. |
| Implemented for the Smack Linux Security Module (LSM). |
| .TP |
| .BR CAP_MAC_OVERRIDE " (since Linux 2.6.25)" |
| Override Mandatory Access Control (MAC). |
| Implemented for the Smack LSM. |
| .TP |
| .BR CAP_MKNOD " (since Linux 2.4)" |
| Create special files using |
| .BR mknod (2). |
| .TP |
| .B CAP_NET_ADMIN |
| Perform various network-related operations: |
| .PD 0 |
| .RS |
| .IP * 2 |
| interface configuration; |
| .IP * |
| administration of IP firewall, masquerading, and accounting; |
| .IP * |
| modify routing tables; |
| .IP * |
| bind to any address for transparent proxying; |
| .IP * |
| set type-of-service (TOS); |
| .IP * |
| clear driver statistics; |
| .IP * |
| set promiscuous mode; |
| .IP * |
| enable multicasting; |
| .IP * |
| use |
| .BR setsockopt (2) |
| to set the following socket options: |
| .BR SO_DEBUG , |
| .BR SO_MARK , |
| .BR SO_PRIORITY |
| (for a priority outside the range 0 to 6), |
| .BR SO_RCVBUFFORCE , |
| and |
| .BR SO_SNDBUFFORCE . |
| .RE |
| .PD |
| .TP |
| .B CAP_NET_BIND_SERVICE |
| Bind a socket to Internet domain privileged ports |
| (port numbers less than 1024). |
| .TP |
| .B CAP_NET_BROADCAST |
| (Unused) Make socket broadcasts, and listen to multicasts. |
| .TP |
| .B CAP_NET_RAW |
| .PD 0 |
| .RS |
| .IP * 2 |
| use RAW and PACKET sockets; |
| .IP * |
| bind to any address for transparent proxying. |
| .RE |
| .PD |
| .\" Also various IP options and setsockopt(SO_BINDTODEVICE) |
| .TP |
| .B CAP_SETGID |
| Make arbitrary manipulations of process GIDs and supplementary GID list; |
| forge GID when passing socket credentials via UNIX domain sockets; |
| write a group ID mapping in a user namespace (see |
| .BR user_namespaces (7)). |
| .TP |
| .BR CAP_SETFCAP " (since Linux 2.6.24)" |
| Set file capabilities. |
| .TP |
| .B CAP_SETPCAP |
| If file capabilities are not supported: |
| grant or remove any capability in the |
| caller's permitted capability set to or from any other process. |
| (This property of |
| .B CAP_SETPCAP |
| is not available when the kernel is configured to support |
| file capabilities, since |
| .B CAP_SETPCAP |
| has entirely different semantics for such kernels.) |
| |
| If file capabilities are supported: |
| add any capability from the calling thread's bounding set |
| to its inheritable set; |
| drop capabilities from the bounding set (via |
| .BR prctl (2) |
| .BR PR_CAPBSET_DROP ); |
| make changes to the |
| .I securebits |
| flags. |
| .TP |
| .B CAP_SETUID |
| Make arbitrary manipulations of process UIDs |
| .RB ( setuid (2), |
| .BR setreuid (2), |
| .BR setresuid (2), |
| .BR setfsuid (2)); |
| forge UID when passing socket credentials via UNIX domain sockets; |
| write a user ID mapping in a user namespace (see |
| .BR user_namespaces (7)). |
| .\" FIXME CAP_SETUID also an effect in exec(); document this. |
| .TP |
| .B CAP_SYS_ADMIN |
| .PD 0 |
| .RS |
| .IP * 2 |
| Perform a range of system administration operations including: |
| .BR quotactl (2), |
| .BR mount (2), |
| .BR umount (2), |
| .BR swapon (2), |
| .BR swapoff (2), |
| .BR sethostname (2), |
| and |
| .BR setdomainname (2); |
| .IP * |
| perform privileged |
| .BR syslog (2) |
| operations (since Linux 2.6.37, |
| .BR CAP_SYSLOG |
| should be used to permit such operations); |
| .IP * |
| perform |
| .B VM86_REQUEST_IRQ |
| .BR vm86 (2) |
| command; |
| .IP * |
| perform |
| .B IPC_SET |
| and |
| .B IPC_RMID |
| operations on arbitrary System V IPC objects; |
| .IP * |
| override |
| .B RLIMIT_NPROC |
| resource limit; |
| .IP * |
| perform operations on |
| .I trusted |
| and |
| .I security |
| Extended Attributes (see |
| .BR xattr (7)); |
| .IP * |
| use |
| .BR lookup_dcookie (2); |
| .IP * |
| use |
| .BR ioprio_set (2) |
| to assign |
| .B IOPRIO_CLASS_RT |
| and (before Linux 2.6.25) |
| .B IOPRIO_CLASS_IDLE |
| I/O scheduling classes; |
| .IP * |
| forge PID when passing socket credentials via UNIX domain sockets; |
| .IP * |
| exceed |
| .IR /proc/sys/fs/file-max , |
| the system-wide limit on the number of open files, |
| in system calls that open files (e.g., |
| .BR accept (2), |
| .BR execve (2), |
| .BR open (2), |
| .BR pipe (2)); |
| .IP * |
| employ |
| .B CLONE_* |
| flags that create new namespaces with |
| .BR clone (2) |
| and |
| .BR unshare (2) |
| (but, since Linux 3.8, |
| creating user namespaces does not require any capability); |
| .IP * |
| call |
| .BR perf_event_open (2); |
| .IP * |
| access privileged |
| .I perf |
| event information; |
| .IP * |
| call |
| .BR setns (2) |
| (requires |
| .B CAP_SYS_ADMIN |
| in the |
| .I target |
| namespace); |
| .IP * |
| call |
| .BR fanotify_init (2); |
| .IP * |
| call |
| .BR bpf (2); |
| .IP * |
| perform |
| .B KEYCTL_CHOWN |
| and |
| .B KEYCTL_SETPERM |
| .BR keyctl (2) |
| operations; |
| .IP * |
| perform |
| .BR madvise (2) |
| .B MADV_HWPOISON |
| operation; |
| .IP * |
| employ the |
| .B TIOCSTI |
| .BR ioctl (2) |
| to insert characters into the input queue of a terminal other than |
| the caller's controlling terminal; |
| .IP * |
| employ the obsolete |
| .BR nfsservctl (2) |
| system call; |
| .IP * |
| employ the obsolete |
| .BR bdflush (2) |
| system call; |
| .IP * |
| perform various privileged block-device |
| .BR ioctl (2) |
| operations; |
| .IP * |
| perform various privileged filesystem |
| .BR ioctl (2) |
| operations; |
| .IP * |
| perform administrative operations on many device drivers. |
| .RE |
| .PD |
| .TP |
| .B CAP_SYS_BOOT |
| Use |
| .BR reboot (2) |
| and |
| .BR kexec_load (2). |
| .TP |
| .B CAP_SYS_CHROOT |
| Use |
| .BR chroot (2). |
| .TP |
| .B CAP_SYS_MODULE |
| Load and unload kernel modules |
| (see |
| .BR init_module (2) |
| and |
| .BR delete_module (2)); |
| in kernels before 2.6.25: |
| drop capabilities from the system-wide capability bounding set. |
| .TP |
| .B CAP_SYS_NICE |
| .PD 0 |
| .RS |
| .IP * 2 |
| Raise process nice value |
| .RB ( nice (2), |
| .BR setpriority (2)) |
| and change the nice value for arbitrary processes; |
| .IP * |
| set real-time scheduling policies for calling process, |
| and set scheduling policies and priorities for arbitrary processes |
| .RB ( sched_setscheduler (2), |
| .BR sched_setparam (2), |
| .BR sched_setattr (2)); |
| .IP * |
| set CPU affinity for arbitrary processes |
| .RB ( sched_setaffinity (2)); |
| .IP * |
| set I/O scheduling class and priority for arbitrary processes |
| .RB ( ioprio_set (2)); |
| .IP * |
| apply |
| .BR migrate_pages (2) |
| to arbitrary processes and allow processes |
| to be migrated to arbitrary nodes; |
| .\" FIXME CAP_SYS_NICE also has the following effect for |
| .\" migrate_pages(2): |
| .\" do_migrate_pages(mm, &old, &new, |
| .\" capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE); |
| .\" Document this. |
| .IP * |
| apply |
| .BR move_pages (2) |
| to arbitrary processes; |
| .IP * |
| use the |
| .B MPOL_MF_MOVE_ALL |
| flag with |
| .BR mbind (2) |
| and |
| .BR move_pages (2). |
| .RE |
| .PD |
| .TP |
| .B CAP_SYS_PACCT |
| Use |
| .BR acct (2). |
| .TP |
| .B CAP_SYS_PTRACE |
| .PD 0 |
| .RS |
| .IP * 3 |
| Trace arbitrary processes using |
| .BR ptrace (2); |
| .IP * |
| apply |
| .BR get_robust_list (2) |
| to arbitrary processes; |
| .IP * |
| transfer data to or from the memory of arbitrary processes using |
| .BR process_vm_readv (2) |
| and |
| .BR process_vm_writev (2); |
| .IP * |
| inspect processes using |
| .BR kcmp (2). |
| .RE |
| .PD |
| .TP |
| .B CAP_SYS_RAWIO |
| .PD 0 |
| .RS |
| .IP * 2 |
| Perform I/O port operations |
| .RB ( iopl (2) |
| and |
| .BR ioperm (2)); |
| .IP * |
| access |
| .IR /proc/kcore ; |
| .IP * |
| employ the |
| .B FIBMAP |
| .BR ioctl (2) |
| operation; |
| .IP * |
| open devices for accessing x86 model-specific registers (MSRs, see |
| .BR msr (4)); |
| .IP * |
| update |
| .IR /proc/sys/vm/mmap_min_addr ; |
| .IP * |
| create memory mappings at addresses below the value specified by |
| .IR /proc/sys/vm/mmap_min_addr ; |
| .IP * |
| map files in |
| .IR /proc/bus/pci ; |
| .IP * |
| open |
| .IR /dev/mem |
| and |
| .IR /dev/kmem ; |
| .IP * |
| perform various SCSI device commands; |
| .IP * |
| perform certain operations on |
| .BR hpsa (4) |
| and |
| .BR cciss (4) |
| devices; |
| .IP * |
| perform a range of device-specific operations on other devices. |
| .RE |
| .PD |
| .TP |
| .B CAP_SYS_RESOURCE |
| .PD 0 |
| .RS |
| .IP * 2 |
| Use reserved space on ext2 filesystems; |
| .IP * |
| make |
| .BR ioctl (2) |
| calls controlling ext3 journaling; |
| .IP * |
| override disk quota limits; |
| .IP * |
| increase resource limits (see |
| .BR setrlimit (2)); |
| .IP * |
| override |
| .B RLIMIT_NPROC |
| resource limit; |
| .IP * |
| override maximum number of consoles on console allocation; |
| .IP * |
| override maximum number of keymaps; |
| .IP * |
| allow more than 64 Hz interrupts from the real-time clock; |
| .IP * |
| raise |
| .I msg_qbytes |
| limit for a System V message queue above the limit in |
| .I /proc/sys/kernel/msgmnb |
| (see |
| .BR msgop (2) |
| and |
| .BR msgctl (2)); |
| .IP * |
| use the |
| .B F_SETPIPE_SZ |
| .BR fcntl (2) |
| command to increase the capacity of a pipe above the limit specified by |
| .IR /proc/sys/fs/pipe-max-size ; |
| .IP * |
| override |
| .I /proc/sys/fs/mqueue/queues_max |
| limit when creating POSIX message queues (see |
| .BR mq_overview (7)); |
| .IP * |
| employ |
| .BR prctl (2) |
| .B PR_SET_MM |
| operation; |
| .IP * |
| set |
| .IR /proc/PID/oom_score_adj |
| to a value lower than the value last set by a process with |
| .BR CAP_SYS_RESOURCE . |
| .RE |
| .PD |
| .TP |
| .B CAP_SYS_TIME |
| Set system clock |
| .RB ( settimeofday (2), |
| .BR stime (2), |
| .BR adjtimex (2)); |
| set real-time (hardware) clock. |
| .TP |
| .B CAP_SYS_TTY_CONFIG |
| Use |
| .BR vhangup (2); |
| employ various privileged |
| .BR ioctl (2) |
| operations on virtual terminals. |
| .TP |
| .BR CAP_SYSLOG " (since Linux 2.6.37)" |
| .RS |
| .PD 0 |
| .IP * 3 |
| Perform privileged |
| .BR syslog (2) |
| operations. |
| See |
| .BR syslog (2) |
| for information on which operations require privilege. |
| .IP * |
| View kernel addresses exposed via |
| .I /proc |
| and other interfaces when |
| .IR /proc/sys/kernel/kptr_restrict |
| has the value 1. |
| (See the discussion of |
| .I kptr_restrict |
| in |
| .BR proc (5).) |
| .PD |
| .RE |
| .TP |
| .BR CAP_WAKE_ALARM " (since Linux 3.0)" |
| Trigger something that will wake up the system (set |
| .B CLOCK_REALTIME_ALARM |
| and |
| .B CLOCK_BOOTTIME_ALARM |
| timers). |
| .\" |
| .SS Past and current implementation |
| A full implementation of capabilities requires that: |
| .IP 1. 3 |
| For all privileged operations, |
| the kernel must check whether the thread has the required |
| capability in its effective set. |
| .IP 2. |
| The kernel must provide system calls allowing a thread's capability sets to |
| be changed and retrieved. |
| .IP 3. |
| The filesystem must support attaching capabilities to an executable file, |
| so that a process gains those capabilities when the file is executed. |
| .PP |
| Before kernel 2.6.24, only the first two of these requirements are met; |
| since kernel 2.6.24, all three requirements are met. |
| .\" |
| .SS Thread capability sets |
| Each thread has the following capability sets, each containing zero or more |
| of the above capabilities: |
| .TP |
| .IR Permitted : |
| This is a limiting superset for the effective |
| capabilities that the thread may assume. |
| It is also a limiting superset for the capabilities that |
| may be added to the inheritable set by a thread that does not have the |
| .B CAP_SETPCAP |
| capability in its effective set. |
| |
| If a thread drops a capability from its permitted set, |
| it can never reacquire that capability (unless it |
| .BR execve (2)s |
| either a set-user-ID-root program, or |
| a program whose associated file capabilities grant that capability). |
| .TP |
| .IR Inheritable : |
| This is a set of capabilities preserved across an |
| .BR execve (2). |
| Inheritable capabilities remain inheritable when executing any program, |
| and inheritable capabilities are added to the permitted set when executing |
| a program that has the corresponding bits set in the file inheritable set. |
| .IP |
| Because inheritable capabilities are not generally preserved across |
| .BR execve (2) |
| when running as a non-root user, applications that wish to run helper |
| programs with elevated capabilities should consider using ambient capabilities, |
| described below. |
| .TP |
| .IR Effective : |
| This is the set of capabilities used by the kernel to |
| perform permission checks for the thread. |
| .TP |
| .IR Ambient " (since Linux 4.3):" |
| This is a set of capabilities that are preserved across an |
| .BR execve (2) |
| of a program that does not have file capabilities. |
| The ambient capability set obeys the invariant that no capability can |
| ever be ambient if it is not both permitted and inheritable. |
| Ambient capabilities are preserved in the permitted set and added to |
| the effective set when |
| .BR execve (2) |
| is called. |
| The ambient capability set is modified using |
| .BR prctl (2). |
| Executing a program that changes UID or GID because of its |
| set-user-ID or set-group-ID bits, |
| or executing a program that has any file capabilities set, |
| clears the ambient set. |
| .PP |
| A child created via |
| .BR fork (2) |
| inherits copies of its parent's capability sets. |
| See below for a discussion of the treatment of capabilities during |
| .BR execve (2). |
| .PP |
| Using |
| .BR capset (2), |
| a thread may manipulate its own capability sets (see below). |
| .PP |
| Since Linux 3.2, the file |
| .I /proc/sys/kernel/cap_last_cap |
| .\" commit 73efc0394e148d0e15583e13712637831f926720 |
| exposes the numerical value of the highest capability |
| supported by the running kernel; |
| this can be used to determine the highest bit |
| that may be set in a capability set. |
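| .PP |
| As a rough illustration of how this value might be used, |
| the following Python sketch turns the contents of |
| .I /proc/sys/kernel/cap_last_cap |
| into a mask with every supported bit set, |
| and decodes an arbitrary mask into capability names. |
| The helper names are hypothetical, |
| and the name table assumes the numbering found in |
| .I <linux/capability.h> |
| up to |
| .B CAP_AUDIT_READ |
| (37). |

```python
# Capability names indexed by capability number (assumed to follow
# <linux/capability.h>, numbers 0 through 37).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ",
]


def parse_cap_last_cap(text):
    """Parse the decimal number stored in cap_last_cap."""
    return int(text.strip())


def full_mask(last_cap):
    """Return a mask with bits 0..last_cap set."""
    return (1 << (last_cap + 1)) - 1


def decode(mask):
    """Return the names of the capabilities whose bits are set in mask."""
    names = []
    bit = 0
    while mask >> bit:
        if mask & (1 << bit):
            # Fall back to a numeric label for capabilities newer than
            # the table above.
            if bit < len(CAP_NAMES):
                names.append(CAP_NAMES[bit])
            else:
                names.append("cap_%d" % bit)
        bit += 1
    return names


def read_last_cap(path="/proc/sys/kernel/cap_last_cap"):
    """Read the highest supported capability number (Linux 3.2 and later)."""
    with open(path) as f:
        return parse_cap_last_cap(f.read())
```

| .PP |
| On a running system, something like |
| .I decode(full_mask(read_last_cap())) |
| would list every capability the kernel supports, |
| subject to the assumptions above. |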
| .\" |
| .SS File capabilities |
| Since kernel 2.6.24, the kernel supports |
| associating capability sets with an executable file using |
| .BR setcap (8). |
| The file capability sets are stored in an extended attribute (see |
| .BR setxattr (2)) |
| named |
| .IR "security.capability" . |
| Writing to this extended attribute requires the |
| .BR CAP_SETFCAP |
| capability. |
| The file capability sets, |
| in conjunction with the capability sets of the thread, |
| determine the capabilities of a thread after an |
| .BR execve (2). |
| |
| The three file capability sets are: |
| .TP |
| .IR Permitted " (formerly known as " forced ): |
| These capabilities are automatically permitted to the thread, |
| regardless of the thread's inheritable capabilities. |
| .TP |
| .IR Inheritable " (formerly known as " allowed ): |
| This set is ANDed with the thread's inheritable set to determine which |
| inheritable capabilities are enabled in the permitted set of |
| the thread after the |
| .BR execve (2). |
| .TP |
| .IR Effective : |
| This is not a set, but rather just a single bit. |
| If this bit is set, then during an |
| .BR execve (2) |
| all of the new permitted capabilities for the thread are |
| also raised in the effective set. |
| If this bit is not set, then after an |
| .BR execve (2), |
| none of the new permitted capabilities is in the new effective set. |
| |
| Enabling the file effective capability bit implies that |
| whenever a file permitted or inheritable capability causes a |
| thread to acquire the corresponding permitted capability during an |
| .BR execve (2) |
| (see the transformation rules described below), |
| the thread also acquires that capability in its effective set. |
| Therefore, when assigning capabilities to a file |
| .RB ( setcap (8), |
| .BR cap_set_file (3), |
| .BR cap_set_fd (3)), |
| if we specify the effective flag as being enabled for any capability, |
| then the effective flag must also be specified as enabled |
| for all other capabilities for which the corresponding permitted or |
| inheritable flag is enabled. |
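| .PP |
| The all-or-nothing constraint on the effective flag can be expressed |
| as a small check. |
| The following Python sketch is only an illustration: |
| the function name is hypothetical, |
| and each argument is assumed to be an integer mask in which |
| bit n stands for capability number n. |

```python
def effective_flags_consistent(permitted, inheritable, effective):
    """Check the rule for per-capability effective flags on a file.

    permitted, inheritable: masks of capabilities whose permitted or
    inheritable flag is enabled for the file.
    effective: mask of capabilities whose effective flag was requested.

    Because the kernel stores a single effective bit, the request is
    consistent only if no effective flag is set at all, or if the flag
    is set for every capability that is permitted or inheritable.
    """
    if effective == 0:
        return True
    return effective == (permitted | inheritable)
```

| .PP |
| A request that enables the effective flag for only one of two |
| permitted capabilities would be rejected by this check. |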
| .\" |
| .SS Transformation of capabilities during execve() |
| .PP |
| During an |
| .BR execve (2), |
| the kernel calculates the new capabilities of |
| the process using the following algorithm: |
| .in +4n |
| .nf |
| |
| P'(ambient) = (file has capabilities or is setuid or setgid) ? 0 : P(ambient) |
| |
| P'(permitted) = (P(inheritable) & F(inheritable)) | |
| (F(permitted) & cap_bset) | P'(ambient) |
| |
| P'(effective) = F(effective) ? P'(permitted) : P'(ambient) |
| |
| P'(inheritable) = P(inheritable) [i.e., unchanged] |
| |
| .fi |
| .in |
| where: |
| .RS 4 |
| .IP P 10 |
| denotes the value of a thread capability set before the |
| .BR execve (2) |
| .IP P' |
| denotes the value of a capability set after the |
| .BR execve (2) |
| .IP F |
| denotes a file capability set |
| .IP cap_bset |
| is the value of the capability bounding set (described below). |
| .RE |
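| .PP |
| The algorithm above can be sketched in Python as follows. |
| This is an illustration of the formulas only, not kernel code: |
| the function and parameter names are hypothetical, |
| capability sets are assumed to be integer masks |
| (bit n for capability number n), |
| and the special handling of root described in the next subsection |
| is left out. |

```python
def execve_transform(p_inheritable, p_ambient,
                     f_inheritable, f_permitted, f_effective,
                     cap_bset, file_is_privileged):
    """Compute the thread capability sets after an execve().

    p_*: thread sets before the execve() (integer masks).
    f_inheritable, f_permitted: file sets (integer masks).
    f_effective: the file effective bit (bool).
    cap_bset: the capability bounding set (integer mask).
    file_is_privileged: True if the file has capabilities or is
    set-user-ID or set-group-ID.
    """
    # P'(ambient)
    new_ambient = 0 if file_is_privileged else p_ambient
    # P'(permitted)
    new_permitted = ((p_inheritable & f_inheritable)
                     | (f_permitted & cap_bset)
                     | new_ambient)
    # P'(effective)
    new_effective = new_permitted if f_effective else new_ambient
    # P'(inheritable) is unchanged
    return {
        "ambient": new_ambient,
        "permitted": new_permitted,
        "effective": new_effective,
        "inheritable": p_inheritable,
    }
```

| .PP |
| Note that only the file permitted set is ANDed with |
| .IR cap_bset ; |
| the inheritable term is not. |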
| .\" |
| .SS Capabilities and execution of programs by root |
| In order to provide an all-powerful |
| .I root |
| using capability sets, during an |
| .BR execve (2): |
| .IP 1. 3 |
| If a set-user-ID-root program is being executed, |
| or the real or effective user ID of the process is 0 (root), |
| then the file inheritable and permitted sets are defined to be all ones |
| (i.e., all capabilities enabled). |
| .IP 2. |
| If a set-user-ID-root program is being executed, |
| or the effective user ID of the process is 0 (root), |
| then the file effective bit is defined to be one (enabled). |
| .PP |
| The upshot of the above rules, |
| combined with the capabilities transformations described above, |
| is that when a process |
| .BR execve (2)s |
| a set-user-ID-root program, or when a process with an effective UID of 0 |
| .BR execve (2)s |
| a program, |
| it gains all capabilities in its permitted and effective capability sets, |
| except those masked out by the capability bounding set. |
| .\" If a process with real UID 0, and nonzero effective UID does an |
| .\" exec(), then it gets all capabilities in its |
| .\" permitted set, and no effective capabilities |
| This provides semantics that are the same as those provided by |
| traditional UNIX systems. |
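| .PP |
| A minimal Python sketch of this outcome follows. |
| It is one reading of the rules and of the summary above, |
| not a description of the kernel implementation: |
| the function name is hypothetical, |
| capability sets are assumed to be integer masks, |
| 38 capabilities are assumed to exist, |
| and ambient capabilities and real file capabilities are ignored. |

```python
ALL_CAPS = (1 << 38) - 1  # assumption: capabilities 0 through 37 exist


def exec_with_root_rules(p_inheritable, cap_bset,
                         setuid_root, real_uid, effective_uid):
    """Return (permitted, effective) after an execve() under the root rules.

    setuid_root: True if the program is set-user-ID-root.
    real_uid, effective_uid: IDs of the process before the execve().
    """
    # The process runs with effective UID 0 if the program is
    # set-user-ID-root or if the effective UID was already 0.
    runs_as_euid_0 = setuid_root or effective_uid == 0

    f_inheritable = f_permitted = 0
    f_effective = False
    if runs_as_euid_0 or real_uid == 0:
        # File inheritable and permitted sets are treated as all ones.
        f_inheritable = f_permitted = ALL_CAPS
    if runs_as_euid_0:
        # File effective bit is treated as enabled.
        f_effective = True

    permitted = (p_inheritable & f_inheritable) | (f_permitted & cap_bset)
    effective = permitted if f_effective else 0
    return permitted, effective
```

| .PP |
| Under these assumptions, a process with real UID 0 and a nonzero |
| effective UID ends up with a full permitted set and an empty |
| effective set. |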
| .SS Capability bounding set |
| The capability bounding set is a security mechanism that can be used |
| to limit the capabilities that can be gained during an |
| .BR execve (2). |
| The bounding set is used in the following ways: |
| .IP * 2 |
| During an |
| .BR execve (2), |
| the capability bounding set is ANDed with the file permitted |
| capability set, and the result of this operation is assigned to the |
| thread's permitted capability set. |
| The capability bounding set thus places a limit on the permitted |
| capabilities that may be granted by an executable file. |
| .IP * |
| (Since Linux 2.6.25) |
| The capability bounding set acts as a limiting superset for |
| the capabilities that a thread can add to its inheritable set using |
| .BR capset (2). |
| This means that if a capability is not in the bounding set, |
| then a thread can't add this capability to its |
| inheritable set, even if it was in its permitted capabilities, |
| and thereby cannot have this capability preserved in its |
| permitted set when it |
| .BR execve (2)s |
| a file that has the capability in its inheritable set. |
| .PP |
| Note that the bounding set masks the file permitted capabilities, |
| but not the inheritable capabilities. |
| If a thread maintains a capability in its inheritable set |
| that is not in its bounding set, |
| then it can still gain that capability in its permitted set |
| by executing a file that has the capability in its inheritable set. |
| .PP |
| Depending on the kernel version, the capability bounding set is either |
| a system-wide attribute, or a per-process attribute. |
| .PP |
| .B "Capability bounding set prior to Linux 2.6.25" |
| .PP |
| In kernels before 2.6.25, the capability bounding set is a system-wide |
| attribute that affects all threads on the system. |
| The bounding set is accessible via the file |
| .IR /proc/sys/kernel/cap-bound . |
| (Confusingly, this bit mask parameter is expressed as a |
| signed decimal number in |
| .IR /proc/sys/kernel/cap-bound .) |
| |
| Only the |
| .B init |
| process may set capabilities in the capability bounding set; |
| other than that, the superuser (more precisely: programs with the |
| .B CAP_SYS_MODULE |
| capability) may only clear capabilities from this set. |
| |
| On a standard system the capability bounding set always masks out the |
| .B CAP_SETPCAP |
| capability. |
| To remove this restriction (dangerous!), modify the definition of |
| .B CAP_INIT_EFF_SET |
| in |
| .I include/linux/capability.h |
| and rebuild the kernel. |
| |
| The system-wide capability bounding set feature was added |
| to Linux starting with kernel version 2.2.11. |
| .\" |
| .PP |
| .B "Capability bounding set from Linux 2.6.25 onward" |
| .PP |
| From Linux 2.6.25, the |
| .I "capability bounding set" |
| is a per-thread attribute. |
| (There is no longer a system-wide capability bounding set.) |
| |
| The bounding set is inherited at |
| .BR fork (2) |
| from the thread's parent, and is preserved across an |
| .BR execve (2). |
| |
| A thread may remove capabilities from its capability bounding set using the |
| .BR prctl (2) |
| .B PR_CAPBSET_DROP |
| operation, provided it has the |
| .B CAP_SETPCAP |
| capability. |
| Once a capability has been dropped from the bounding set, |
| it cannot be restored to that set. |
| A thread can determine if a capability is in its bounding set using the |
| .BR prctl (2) |
| .B PR_CAPBSET_READ |
| operation. |
| |
| Removing capabilities from the bounding set is supported only if file |
| capabilities are compiled into the kernel. |
| In kernels before Linux 2.6.33, |
| file capabilities were an optional feature configurable via the |
| .B CONFIG_SECURITY_FILE_CAPABILITIES |
| option. |
| Since Linux 2.6.33, |
| .\" commit b3a222e52e4d4be77cc4520a57af1a4a0d8222d1 |
| the configuration option has been removed |
| and file capabilities are always part of the kernel. |
| When file capabilities are compiled into the kernel, the |
| .B init |
| process (the ancestor of all processes) begins with a full bounding set. |
| If file capabilities are not compiled into the kernel, then |
| .B init |
| begins with a full bounding set minus |
| .BR CAP_SETPCAP , |
| because this capability has a different meaning when there are |
| no file capabilities. |
| |
| Removing a capability from the bounding set does not remove it |
| from the thread's inheritable set. |
| However, it does prevent the capability from being added |
| back into the thread's inheritable set in the future. |
| .\" |
| .\" |
| .SS Effect of user ID changes on capabilities |
| To preserve the traditional semantics for transitions between |
| 0 and nonzero user IDs, |
| the kernel makes the following changes to a thread's capability |
| sets on changes to the thread's real, effective, saved set, |
| and filesystem user IDs (using |
| .BR setuid (2), |
| .BR setresuid (2), |
| or similar): |
| .IP 1. 3 |
| If one or more of the real, effective or saved set user IDs |
| was previously 0, and as a result of the UID changes all of these IDs |
| have a nonzero value, |
| then all capabilities are cleared from the permitted and effective |
| capability sets. |
| .IP 2. |
| If the effective user ID is changed from 0 to nonzero, |
| then all capabilities are cleared from the effective set. |
| .IP 3. |
| If the effective user ID is changed from nonzero to 0, |
| then the permitted set is copied to the effective set. |
| .IP 4. |
| If the filesystem user ID is changed from 0 to nonzero (see |
| .BR setfsuid (2)), |
| then the following capabilities are cleared from the effective set: |
| .BR CAP_CHOWN , |
| .BR CAP_DAC_OVERRIDE , |
| .BR CAP_DAC_READ_SEARCH , |
| .BR CAP_FOWNER , |
| .BR CAP_FSETID , |
| .B CAP_LINUX_IMMUTABLE |
| (since Linux 2.6.30), |
| .BR CAP_MAC_OVERRIDE , |
| and |
| .B CAP_MKNOD |
| (since Linux 2.6.30). |
| If the filesystem UID is changed from nonzero to 0, |
| then any of these capabilities that are enabled in the permitted set |
| are enabled in the effective set. |
| .PP |
| If a thread that has a 0 value for one or more of its user IDs wants |
| to prevent its permitted capability set being cleared when it resets |
| all of its user IDs to nonzero values, it can do so using the |
| .BR prctl (2) |
| .B PR_SET_KEEPCAPS |
| operation or the |
| .B SECBIT_KEEP_CAPS |
| securebits flag described below. |
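| .PP |
| Rules 1 to 3 above can be modeled with a short Python sketch. |
| The function name and the tuple layout are hypothetical, |
| capability sets are assumed to be integer masks, |
| rule 4 (filesystem UID) is omitted, |
| and the sketch assumes that the keep-capabilities setting |
| suppresses rule 1 as a whole. |

```python
def apply_uid_change(old_ids, new_ids, permitted, effective,
                     keep_caps=False):
    """Return (permitted, effective) after a change of user IDs.

    old_ids, new_ids: (real, effective, saved set) user ID tuples.
    keep_caps: models PR_SET_KEEPCAPS / SECBIT_KEEP_CAPS.
    """
    old_euid = old_ids[1]
    new_euid = new_ids[1]

    # Rule 1: at least one ID was 0 and now all IDs are nonzero.
    if 0 in old_ids and 0 not in new_ids and not keep_caps:
        permitted = 0
        effective = 0

    # Rule 2: effective UID changes from 0 to nonzero.
    if old_euid == 0 and new_euid != 0:
        effective = 0

    # Rule 3: effective UID changes from nonzero to 0.
    if old_euid != 0 and new_euid == 0:
        effective = permitted

    return permitted, effective
```

| .PP |
| In this model, a thread that keeps its capabilities while dropping |
| all of its IDs to nonzero values still loses its effective set through |
| rule 2 if its effective UID was 0, |
| and would have to raise the capabilities it needs again. |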
| .\" |
| .SS Programmatically adjusting capability sets |
| A thread can retrieve and change its capability sets using the |
| .BR capget (2) |
| and |
| .BR capset (2) |
| system calls. |
| However, the use of |
| .BR cap_get_proc (3) |
| and |
| .BR cap_set_proc (3), |
| both provided in the |
| .I libcap |
| package, |
| is preferred for this purpose. |
| The following rules govern changes to the thread capability sets: |
| .IP 1. 3 |
| If the caller does not have the |
| .B CAP_SETPCAP |
| capability, |
| the new inheritable set must be a subset of the combination |
| of the existing inheritable and permitted sets. |
| .IP 2. |
| (Since Linux 2.6.25) |
| The new inheritable set must be a subset of the combination of the |
| existing inheritable set and the capability bounding set. |
| .IP 3. |
| The new permitted set must be a subset of the existing permitted set |
| (i.e., it is not possible to acquire permitted capabilities |
| that the thread does not currently have). |
| .IP 4. |
| The new effective set must be a subset of the new permitted set. |
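.PP
The rules above amount to a series of subset tests,
which can be sketched in C as follows.
This is an illustrative model only and not the kernel's implementation:
the type
.I model_capsets
and the functions
.IR is_subset ()
and
.IR capset_change_allowed ()
are hypothetical names,
and each set is assumed to be represented as a 64-bit mask with
one bit per capability.
.in +4n
.nf

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a thread's capability sets. */
struct model_capsets {
    uint64_t effective;
    uint64_t permitted;
    uint64_t inheritable;
};

/* True if every bit set in a is also set in b. */
static bool is_subset(uint64_t a, uint64_t b)
{
    return (a & ~b) == 0;
}

/*
 * Check a requested change from *old to *want against rules 1 to 4.
 * bounding is the capability bounding set; has_setpcap says whether
 * the caller has CAP_SETPCAP.
 */
bool capset_change_allowed(const struct model_capsets *old,
                           const struct model_capsets *want,
                           uint64_t bounding, bool has_setpcap)
{
    /* Rule 1 */
    if (!has_setpcap &&
        !is_subset(want->inheritable, old->inheritable | old->permitted))
        return false;

    /* Rule 2 */
    if (!is_subset(want->inheritable, old->inheritable | bounding))
        return false;

    /* Rule 3 */
    if (!is_subset(want->permitted, old->permitted))
        return false;

    /* Rule 4 */
    return is_subset(want->effective, want->permitted);
}
```

.fi
.in
In this model, a request that adds a bit to the permitted set is
always rejected, regardless of
.IR has_setpcap ,
which mirrors rule 3.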
| .SS The securebits flags: establishing a capabilities-only environment |
| .\" For some background: |
| .\" see http://lwn.net/Articles/280279/ and |
| .\" http://article.gmane.org/gmane.linux.kernel.lsm/5476/ |
| Starting with kernel 2.6.26, |
| and with a kernel in which file capabilities are enabled, |
| Linux implements a set of per-thread |
| .I securebits |
| flags that can be used to disable special handling of capabilities for UID 0 |
| .RI ( root ). |
| These flags are as follows: |
| .TP |
| .B SECBIT_KEEP_CAPS |
| Setting this flag allows a thread that has one or more 0 UIDs to retain |
| its capabilities when it switches all of its UIDs to a nonzero value. |
| If this flag is not set, |
| then such a UID switch causes the thread to lose all capabilities. |
| This flag is always cleared on an |
| .BR execve (2). |
| (This flag provides the same functionality as the older |
| .BR prctl (2) |
| .B PR_SET_KEEPCAPS |
| operation.) |
| .TP |
| .B SECBIT_NO_SETUID_FIXUP |
| Setting this flag stops the kernel from adjusting capability sets when |
| the thread's effective and filesystem UIDs are switched between |
| zero and nonzero values. |
| (See the subsection |
| .IR "Effect of User ID Changes on Capabilities" .) |
| .TP |
| .B SECBIT_NOROOT |
| If this flag is set, then the kernel does not grant capabilities |
| when a set-user-ID-root program is executed, or when a process with |
| an effective or real UID of 0 calls |
| .BR execve (2). |
| (See the subsection |
| .IR "Capabilities and execution of programs by root" .) |
| .TP |
| .B SECBIT_NO_CAP_AMBIENT_RAISE |
| Setting this flag disallows raising ambient capabilities via the |
| .BR prctl (2) |
| .B PR_CAP_AMBIENT_RAISE |
| operation. |
| .PP |
| Each of the above "base" flags has a companion "locked" flag. |
| Setting any of the "locked" flags is irreversible, |
| and has the effect of preventing further changes to the |
| corresponding "base" flag. |
| The locked flags are: |
| .BR SECBIT_KEEP_CAPS_LOCKED , |
| .BR SECBIT_NO_SETUID_FIXUP_LOCKED , |
| .BR SECBIT_NOROOT_LOCKED , |
| and |
| .BR SECBIT_NO_CAP_AMBIENT_RAISE_LOCKED . |
| .PP |
| The |
| .I securebits |
| flags can be modified and retrieved using the |
| .BR prctl (2) |
| .B PR_SET_SECUREBITS |
| and |
| .B PR_GET_SECUREBITS |
| operations. |
| The |
| .B CAP_SETPCAP |
| capability is required to modify the flags. |
| |
| The |
| .I securebits |
| flags are inherited by child processes. |
| During an |
| .BR execve (2), |
| all of the flags are preserved, except |
| .BR SECBIT_KEEP_CAPS , |
| which is always cleared. |
| |
| An application can use the following call to lock itself, |
| and all of its descendants, |
| into an environment where the only way of gaining capabilities |
| is by executing a program with associated file capabilities: |
| .in +4n |
| .nf |
| |
| prctl(PR_SET_SECUREBITS, |
|         SECBIT_KEEP_CAPS_LOCKED | |
|         SECBIT_NO_SETUID_FIXUP | |
|         SECBIT_NO_SETUID_FIXUP_LOCKED | |
|         SECBIT_NOROOT | |
|         SECBIT_NOROOT_LOCKED); |
| .fi |
| .in |
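.PP
To check whether such a lockdown is in effect, a program can read the
flags back with
.BR PR_GET_SECUREBITS .
The following is a minimal sketch;
the helper names
.IR get_securebits ()
and
.IR securebits_locked_down ()
are hypothetical,
and the flag constants are assumed to come from the kernel header
.IR <linux/securebits.h> .
.in +4n
.nf

```c
#include <sys/prctl.h>
#include <linux/securebits.h>

/*
 * Hypothetical helper: return the calling thread's securebits flags,
 * or -1 on error.
 */
int get_securebits(void)
{
    return prctl(PR_GET_SECUREBITS, 0L, 0L, 0L, 0L);
}

/*
 * Hypothetical helper: return 1 if bits contains every flag used in the
 * lockdown call shown above and SECBIT_KEEP_CAPS is off, 0 otherwise.
 */
int securebits_locked_down(int bits)
{
    const int want = SECBIT_KEEP_CAPS_LOCKED |
                     SECBIT_NO_SETUID_FIXUP |
                     SECBIT_NO_SETUID_FIXUP_LOCKED |
                     SECBIT_NOROOT |
                     SECBIT_NOROOT_LOCKED;

    return (bits & want) == want && (bits & SECBIT_KEEP_CAPS) == 0;
}
```

.fi
.in
Unlike modifying the flags, reading them requires no capability,
so a call such as
.I securebits_locked_down(get_securebits())
can serve as a sanity check in an unprivileged descendant.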
| .SS Interaction with user namespaces |
| For a discussion of the interaction of capabilities and user namespaces, see |
| .BR user_namespaces (7). |
| .SH CONFORMING TO |
| .PP |
| No standards govern capabilities, but the Linux capability implementation |
| is based on the withdrawn POSIX.1e draft standard; see |
| .UR http://wt.tuxomania.net\:/publications\:/posix.1e/ |
| .UE . |
| .SH NOTES |
| From kernel 2.5.27 to kernel 2.6.26, |
| .\" commit 5915eb53861c5776cfec33ca4fcc1fd20d66dd27 removed |
| .\" CONFIG_SECURITY_CAPABILITIES |
| capabilities were an optional kernel component, |
| and could be enabled/disabled via the |
| .B CONFIG_SECURITY_CAPABILITIES |
| kernel configuration option. |
| |
| The |
| .I /proc/PID/task/TID/status |
| file can be used to view the capability sets of a thread. |
| The |
| .I /proc/PID/status |
| file shows the capability sets of a process's main thread. |
| Before Linux 3.8, nonexistent capabilities were shown as being |
| enabled (1) in these sets. |
| Since Linux 3.8, |
| .\" 7b9a7ec565505699f503b4fcf61500dceb36e744 |
| all nonexistent capabilities (above |
| .BR CAP_LAST_CAP ) |
| are shown as disabled (0). |
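.PP
As an illustration, the capability sets can be read from the status
file programmatically.
The sketch below assumes that each set appears on its own line as a
field name followed by a colon and a hexadecimal mask
(for example, the
.I CapPrm
and
.I CapEff
fields);
the function name
.IR read_cap_field ()
is a hypothetical helper introduced for this example.
.in +4n
.nf

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one capability field (e.g., "CapEff")
 * in a status file and store its hexadecimal mask in *mask.
 * Returns 0 on success, or -1 if the file cannot be opened or the
 * field is not found.
 */
int read_cap_field(const char *path, const char *field,
                   unsigned long long *mask)
{
    FILE *fp = fopen(path, "r");
    char line[256];
    size_t n = strlen(field);
    int rc = -1;

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        /* Match "<field>:" at the start of the line. */
        if (strncmp(line, field, n) == 0 && line[n] == ':') {
            if (sscanf(line + n + 1, "%llx", mask) == 1)
                rc = 0;
            break;
        }
    }

    fclose(fp);
    return rc;
}
```

.fi
.in
Passing
.I /proc/self/status
as
.I path
reads the sets of the calling process's main thread.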
| |
| The |
| .I libcap |
| package provides a suite of routines for setting and |
| getting capabilities that is more comfortable and less likely |
| to change than the interface provided by |
| .BR capset (2) |
| and |
| .BR capget (2). |
| This package also provides the |
| .BR setcap (8) |
| and |
| .BR getcap (8) |
| programs. |
| It can be found at |
| .br |
| .UR http://www.kernel.org\:/pub\:/linux\:/libs\:/security\:/linux-privs |
| .UE . |
| |
| Before kernel 2.6.24, and from kernel 2.6.24 to kernel 2.6.32 if |
| file capabilities are not enabled, a thread with the |
| .B CAP_SETPCAP |
| capability can manipulate the capabilities of threads other than itself. |
| However, this is only theoretically possible, |
| since no thread ever has |
| .B CAP_SETPCAP |
| in either of these cases: |
| .IP * 2 |
| In the pre-2.6.25 implementation the system-wide capability bounding set, |
| .IR /proc/sys/kernel/cap-bound , |
| always masks out this capability, and this cannot be changed |
| without modifying the kernel source and rebuilding. |
| .IP * |
| If file capabilities are disabled in the current implementation, then |
| .B init |
| starts out with this capability removed from its per-process bounding |
| set, and that bounding set is inherited by all other processes |
| created on the system. |
| .SH SEE ALSO |
| .BR capsh (1), |
| .BR setpriv (1), |
| .BR prctl (2), |
| .BR setfsuid (2), |
| .BR cap_clear (3), |
| .BR cap_copy_ext (3), |
| .BR cap_from_text (3), |
| .BR cap_get_file (3), |
| .BR cap_get_proc (3), |
| .BR cap_init (3), |
| .BR capgetp (3), |
| .BR capsetp (3), |
| .BR libcap (3), |
| .BR credentials (7), |
| .BR user_namespaces (7), |
| .BR pthreads (7), |
| .BR getcap (8), |
| .BR setcap (8) |
| .PP |
| .I include/linux/capability.h |
| in the Linux kernel source tree |