| .\" Copyright (c) 2016 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" |
| .\" %%%LICENSE_START(VERBATIM) |
| .\" Permission is granted to make and distribute verbatim copies of this |
| .\" manual provided the copyright notice and this permission notice are |
| .\" preserved on all copies. |
| .\" |
| .\" Permission is granted to copy and distribute modified versions of this |
| .\" manual under the conditions for verbatim copying, provided that the |
| .\" entire resulting derived work is distributed under the terms of a |
| .\" permission notice identical to this one. |
| .\" |
| .\" Since the Linux kernel and libraries are constantly changing, this |
| .\" manual page may be incorrect or out-of-date. The author(s) assume no |
| .\" responsibility for errors or omissions, or for damages resulting from |
| .\" the use of the information contained herein. The author(s) may not |
| .\" have taken the same level of care in the production of this manual, |
| .\" which is licensed free of charge, as they might when working |
| .\" professionally. |
| .\" |
| .\" Formatted or processed versions of this manual, if unaccompanied by |
| .\" the source, must acknowledge the copyright and authors of this work. |
| .\" %%%LICENSE_END |
| .\" |
| .\" |
| .TH CGROUP_NAMESPACES 7 2016-07-17 "Linux" "Linux Programmer's Manual" |
| .SH NAME |
| cgroup_namespaces \- overview of Linux cgroup namespaces |
| .SH DESCRIPTION |
| For an overview of namespaces, see |
| .BR namespaces (7). |
| |
| Cgroup namespaces virtualize the view of a process's cgroups (see |
| .BR cgroups (7)) |
| as seen via |
| .IR /proc/[pid]/cgroup |
| and |
| .IR /proc/[pid]/mountinfo . |
| |
| Each cgroup namespace has its own set of cgroup root directories, |
| which are the base points for the relative locations displayed in |
| .IR /proc/[pid]/cgroup . |
| When a process creates a new cgroup namespace using |
| .BR clone (2) |
| or |
| .BR unshare (2) |
| with the |
| .BR CLONE_NEWCGROUP |
| flag, it enters a new cgroup namespace in which its current |
| cgroups directories become the cgroup root directories |
| of the new namespace. |
| (This applies both for the cgroups version 1 hierarchies |
| and the cgroups version 2 unified hierarchy.) |
| |
| When viewing |
| .IR /proc/[pid]/cgroup , |
| the pathname shown in the third field of each record will be |
| relative to the reading process's cgroup root directory. |
| If the cgroup directory of the target process lies outside |
| the root directory of the reading process's cgroup namespace, |
| then the pathname will show |
| .I ../ |
| entries for each ancestor level in the cgroup hierarchy. |
| |
| The following shell session demonstrates the effect of creating |
| a new cgroup namespace. |
| First, (as superuser) we create a child cgroup in the |
| .I freezer |
| hierarchy, and put the shell into that cgroup: |
| |
| .nf |
| .in +4n |
| # \fBmkdir \-p /sys/fs/cgroup/freezer/sub\fP |
| # \fBecho $$\fP # Show PID of this shell |
| 30655 |
| # \fBsh \-c 'echo 30655 > /sys/fs/cgroup/sub'\fP |
| # \fBcat /proc/self/cgroup | grep freezer\fP |
| 7:freezer:/sub |
| .in |
| .fi |
| |
| Next, we use |
| .BR unshare (1) |
| to create a process running a new shell in new cgroup and mount namespaces: |
| |
| .nf |
| .in +4n |
| # \fBunshare \-Cm bash\fP |
| .in |
| .fi |
| |
| We then inspect the |
| .IR /proc/[pid]/cgroup |
| files of, respectively, the new shell process started by the |
| .BR unshare (1) |
| command, a process that is in the original cgroup namespace |
| .RI ( init , |
| with PID 1), and a process in a sibling cgroup: |
| |
| .nf |
| .in +4n |
| $ \fBcat /proc/self/cgroup | grep freezer\fP |
| 7:freezer:/ |
| $ \fBcat /proc/1/cgroup | grep freezer\fP |
| 7:freezer:/.. |
| $ \fBcat /proc/20124/cgroup | grep freezer\fP |
| 7:freezer:/../sub2 |
| .in |
| .fi |
| |
| However, when we look in |
| .IR /proc/self/mountinfo |
| we see the following anomaly: |
| |
| .nf |
| .in +4n |
| # \fBcat /proc/self/mountinfo | grep freezer\fP |
| 155 145 0:32 /.. /sys/fs/cgroup/freezer ... |
| .in |
| .fi |
| |
| The fourth field of this file should show the |
| directory in the cgroup filesystem which forms the root of this mount. |
| Since by the definition of cgroup namespaces, the process's current |
| freezer cgroup directory became its root freezer cgroup directory, |
| we should see \(aq/\(aq in this field. |
| The problem here is that we are seeing a mount entry for the cgroup |
| filesystem corresponding to our initial shell process's cgroup namespace |
| (whose cgroup filesystem is indeed rooted in the parent directory of |
| .IR sub ). |
| We need to remount the freezer cgroup filesystem |
| inside this cgroup namespace, after which we see the expected results: |
| |
| .nf |
| .in +4n |
| # \fBmount \-\-make\-rslave /\fP # Don't propagate mount events |
| # to other namespaces |
| # \fBumount /sys/fs/cgroup/freezer\fP |
| # \fBmount \-t cgroup \-o freezer freezer /sys/fs/cgroup/freezer\fP |
| # \fBcat /proc/self/mountinfo | grep freezer\fP |
| 155 145 0:32 / /sys/fs/cgroup/freezer rw,relatime ... |
| .in |
| .fi |
| |
| Use of cgroup namespaces requires a kernel that is configured with the |
| .B CONFIG_CGROUPS |
| option. |
| .\" |
| .SH CONFORMING TO |
| Namespaces are a Linux-specific feature. |
| .SH NOTES |
| Among the purposes served by the |
| virtualization provided by cgroup namespaces are the following: |
| .IP * 2 |
| It prevents information leaks whereby cgroup directory paths outside of |
| a container would otherwise be visible to processes in the container. |
| Such leakages could, for example, |
| reveal information about the container framework |
| to containerized applications. |
| .IP * |
| It eases tasks such as container migration. |
| The virtualization provided by cgroup namespaces |
| allows containers to be isolated from knowledge of |
| the pathnames of ancestor cgroups. |
| Without such isolation, the full cgroup pathnames (displayed in |
| .IR /proc/self/cgroups ) |
| would need to be replicated on the target system when migrating a container; |
| those pathnames would also need to be unique, |
| so that they don't conflict with other pathnames on the target system. |
| .IP * |
| It allows better confinement of containerized processes, |
| because it is possible to mount the container's cgroup filesystems such that |
| the container processes can't gain access to ancestor cgroup directories. |
| Consider, for example, the following scenario: |
| .RS 4 |
| .IP \(bu 2 |
| We have a cgroup directory, |
| .IR /cg/1 , |
| that is owned by user ID 9000. |
| .IP \(bu |
| We have a process, |
| .IR X , |
| also owned by user ID 9000, |
| that is namespaced under the cgroup |
| .IR /cg/1/2 |
| (i.e., |
| .I X |
| was placed in a new cgroup namespace via |
| .BR clone (2) |
| or |
| .BR unshare (2) |
| with the |
| .BR CLONE_NEWCGROUP |
| flag). |
| .RE |
| .IP |
| In the absence of cgroup namespacing, because the cgroup directory |
| .IR /cg/1 |
| is owned (and writable) by UID 9000 and process |
| .I X |
| is also owned by user ID 9000, then process |
| .I X |
| would be able to modify the contents of cgroups files |
| (i.e., change cgroup settings) not only in |
| .IR /cg/1/2 |
| but also in the ancestor cgroup directory |
| .IR /cg/1 . |
| Namespacing process |
| .IR X |
| under the cgroup directory |
| .IR /cg/1/2 , |
| in combination with suitable mount operations |
| for the cgroup filesystem (as shown above), |
| prevents it modifying files in |
| .IR /cg/1 , |
| since it cannot even see the contents of that directory |
| (or of further removed cgroup ancestor directories). |
| Combined with correct enforcement of hierarchical limits, |
| this prevents process |
| .I X |
| from escaping the limits imposed by ancestor cgroups. |
| .SH SEE ALSO |
| .BR unshare (1), |
| .BR clone (2), |
| .BR setns (2), |
| .BR unshare (2), |
| .BR proc (5), |
| .BR cgroups (7), |
| .BR credentials (7), |
| .BR namespaces (7), |
| .BR user_namespaces (7) |