| '\" t |
| .\" Copyright (c) 2013, 2016, 2017 by Michael Kerrisk <mtk.manpages@gmail.com> |
| .\" and Copyright (c) 2012 by Eric W. Biederman <ebiederm@xmission.com> |
| .\" |
| .\" SPDX-License-Identifier: Linux-man-pages-copyleft |
| .\" |
| .\" |
| .TH namespaces 7 (date) "Linux man-pages (unreleased)" |
| .SH NAME |
| namespaces \- overview of Linux namespaces |
| .SH DESCRIPTION |
| A namespace wraps a global system resource in an abstraction that |
| makes it appear to the processes within the namespace that they |
| have their own isolated instance of the global resource. |
| Changes to the global resource are visible to other processes |
| that are members of the namespace, but are invisible to other processes. |
| One use of namespaces is to implement containers. |
| .P |
| This page provides pointers to information on the various namespace types, |
| describes the associated |
| .I /proc |
| files, and summarizes the APIs for working with namespaces. |
| .\" |
| .SS Namespace types |
| The following table shows the namespace types available on Linux. |
| The second column of the table shows the flag value that is used to specify |
| the namespace type in various APIs. |
| The third column identifies the manual page that provides details |
| on the namespace type. |
| The last column is a summary of the resources that are isolated by |
| the namespace type. |
| .P |
| .TS |
| lB lB lB lB |
| l lB lw(21n) lx. |
| Namespace Flag Page Isolates |
| _ |
| Cgroup CLONE_NEWCGROUP \fBcgroup_namespaces\fP(7) T{ |
| Cgroup root directory |
| T} |
| IPC CLONE_NEWIPC \fBipc_namespaces\fP(7) T{ |
| System V IPC, |
| POSIX message queues |
| T} |
| Network CLONE_NEWNET \fBnetwork_namespaces\fP(7) T{ |
| Network devices, |
| stacks, ports, etc. |
| T} |
| Mount CLONE_NEWNS \fBmount_namespaces\fP(7) Mount points |
| PID CLONE_NEWPID \fBpid_namespaces\fP(7) Process IDs |
| Time CLONE_NEWTIME \fBtime_namespaces\fP(7) T{ |
| Boot and monotonic |
| clocks |
| T} |
| User CLONE_NEWUSER \fBuser_namespaces\fP(7) T{ |
| User and group IDs |
| T} |
| UTS CLONE_NEWUTS \fButs_namespaces\fP(7) T{ |
| Hostname and NIS |
| domain name |
| T} |
| .TE |
| .\" |
| .\" ==================== The namespaces API ==================== |
| .\" |
| .SS The namespaces API |
| As well as various |
| .I /proc |
| files described below, |
| the namespaces API includes the following system calls: |
| .TP |
| .BR clone (2) |
| The |
| .BR clone (2) |
| system call creates a new process. |
| If the |
| .I flags |
| argument of the call specifies one or more of the |
| .B CLONE_NEW* |
| flags listed above, then new namespaces are created for each flag, |
| and the child process is made a member of those namespaces. |
| (This system call also implements a number of features |
| unrelated to namespaces.) |
| .TP |
| .BR setns (2) |
| The |
| .BR setns (2) |
| system call allows the calling process to join an existing namespace. |
| The namespace to join is specified via a file descriptor that refers to |
| one of the |
| .IR /proc/ pid /ns |
| files described below. |
| .TP |
| .BR unshare (2) |
| The |
| .BR unshare (2) |
| system call moves the calling process to a new namespace. |
| If the |
| .I flags |
| argument of the call specifies one or more of the |
| .B CLONE_NEW* |
| flags listed above, then new namespaces are created for each flag, |
| and the calling process is made a member of those namespaces. |
| (This system call also implements a number of features |
| unrelated to namespaces.) |
| .TP |
| .BR ioctl (2) |
| Various |
| .BR ioctl (2) |
| operations can be used to discover information about namespaces. |
| These operations are described in |
| .BR ioctl_nsfs (2). |
| .P |
| Creation of new namespaces using |
| .BR clone (2) |
| and |
| .BR unshare (2) |
| in most cases requires the |
| .B CAP_SYS_ADMIN |
| capability, since, in the new namespace, |
| the creator will have the power to change global resources |
| that are visible to other processes that are subsequently created in, |
| or join the namespace. |
| User namespaces are the exception: since Linux 3.8, |
| no privilege is required to create a user namespace. |
| .\" |
| .\" ==================== The /proc/[pid]/ns/ directory ==================== |
| .\" |
| .SS The \fI/proc/\fPpid\fI/ns/\fP directory |
| Each process has a |
| .IR /proc/ pid /ns/ |
| .\" See commit 6b4e306aa3dc94a0545eb9279475b1ab6209a31f |
| subdirectory containing one entry for each namespace that |
| supports being manipulated by |
| .BR setns (2): |
| .P |
| .in +4n |
| .EX |
| $ \fBls \-l /proc/$$/ns | awk \[aq]{print $1, $9, $10, $11}\[aq]\fP |
| total 0 |
| lrwxrwxrwx. cgroup \-> cgroup:[4026531835] |
| lrwxrwxrwx. ipc \-> ipc:[4026531839] |
| lrwxrwxrwx. mnt \-> mnt:[4026531840] |
| lrwxrwxrwx. net \-> net:[4026531969] |
| lrwxrwxrwx. pid \-> pid:[4026531836] |
| lrwxrwxrwx. pid_for_children \-> pid:[4026531834] |
| lrwxrwxrwx. time \-> time:[4026531834] |
| lrwxrwxrwx. time_for_children \-> time:[4026531834] |
| lrwxrwxrwx. user \-> user:[4026531837] |
| lrwxrwxrwx. uts \-> uts:[4026531838] |
| .EE |
| .in |
| .P |
| Bind mounting (see |
| .BR mount (2)) |
| one of the files in this directory |
| to somewhere else in the filesystem keeps |
| the corresponding namespace of the process specified by |
| .I pid |
| alive even if all processes currently in the namespace terminate. |
| .P |
| Opening one of the files in this directory |
| (or a file that is bind mounted to one of these files) |
| returns a file handle for |
| the corresponding namespace of the process specified by |
| .IR pid . |
| As long as this file descriptor remains open, |
| the namespace will remain alive, |
| even if all processes in the namespace terminate. |
| The file descriptor can be passed to |
| .BR setns (2). |
| .P |
| In Linux 3.7 and earlier, these files were visible as hard links. |
| Since Linux 3.8, |
| .\" commit bf056bfa80596a5d14b26b17276a56a0dcb080e5 |
| they appear as symbolic links. |
| If two processes are in the same namespace, |
| then the device IDs and inode numbers of their |
| .IR /proc/ pid /ns/ xxx |
| symbolic links will be the same; an application can check this using the |
| .I stat.st_dev |
| .\" Eric Biederman: "I reserve the right for st_dev to be significant |
| .\" when comparing namespaces." |
| .\" https://lore.kernel.org/lkml/87poky5ca9.fsf@xmission.com/ |
| .\" Re: Documenting the ioctl interfaces to discover relationships... |
| .\" Date: Mon, 12 Dec 2016 11:30:38 +1300 |
| and |
| .I stat.st_ino |
| fields returned by |
| .BR stat (2). |
| The content of this symbolic link is a string containing |
| the namespace type and inode number as in the following example: |
| .P |
| .in +4n |
| .EX |
| $ \fBreadlink /proc/$$/ns/uts\fP |
| uts:[4026531838] |
| .EE |
| .in |
| .P |
| The symbolic links in this subdirectory are as follows: |
| .TP |
| .IR /proc/ pid /ns/cgroup " (since Linux 4.6)" |
| This file is a handle for the cgroup namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/ipc " (since Linux 3.0)" |
| This file is a handle for the IPC namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/mnt " (since Linux 3.8)" |
| .\" commit 8823c079ba7136dc1948d6f6dcb5f8022bde438e |
| This file is a handle for the mount namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/net " (since Linux 3.0)" |
| This file is a handle for the network namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/pid " (since Linux 3.8)" |
| .\" commit 57e8391d327609cbf12d843259c968b9e5c1838f |
| This file is a handle for the PID namespace of the process. |
| This handle is permanent for the lifetime of the process |
| (i.e., a process's PID namespace membership never changes). |
| .TP |
| .IR /proc/ pid /ns/pid_for_children " (since Linux 4.12)" |
| .\" commit eaa0d190bfe1ed891b814a52712dcd852554cb08 |
| This file is a handle for the PID namespace of |
| child processes created by this process. |
| This can change as a consequence of calls to |
| .BR unshare (2) |
| and |
| .BR setns (2) |
| (see |
| .BR pid_namespaces (7)), |
| so the file may differ from |
| .IR /proc/ pid /ns/pid . |
| The symbolic link gains a value only after the first child process |
| is created in the namespace. |
| (Beforehand, |
| .BR readlink (2) |
| of the symbolic link will return an empty buffer.) |
| .TP |
| .IR /proc/ pid /ns/time " (since Linux 5.6)" |
| This file is a handle for the time namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/time_for_children " (since Linux 5.6)" |
| This file is a handle for the time namespace of |
| child processes created by this process. |
| This can change as a consequence of calls to |
| .BR unshare (2) |
| and |
| .BR setns (2) |
| (see |
| .BR time_namespaces (7)), |
| so the file may differ from |
| .IR /proc/ pid /ns/time . |
| .TP |
| .IR /proc/ pid /ns/user " (since Linux 3.8)" |
| .\" commit cde1975bc242f3e1072bde623ef378e547b73f91 |
| This file is a handle for the user namespace of the process. |
| .TP |
| .IR /proc/ pid /ns/uts " (since Linux 3.0)" |
| This file is a handle for the UTS namespace of the process. |
| .P |
| Permission to dereference or read |
| .RB ( readlink (2)) |
| these symbolic links is governed by a ptrace access mode |
| .B PTRACE_MODE_READ_FSCREDS |
| check; see |
| .BR ptrace (2). |
| .\" |
| .\" ==================== The /proc/sys/user directory ==================== |
| .\" |
| .SS The \fI/proc/sys/user\fP directory |
| The files in the |
| .I /proc/sys/user |
| directory (which is present since Linux 4.9) expose limits |
| on the number of namespaces of various types that can be created. |
| The files are as follows: |
| .TP |
| .I max_cgroup_namespaces |
| The value in this file defines a per-user limit on the number of |
| cgroup namespaces that may be created in the user namespace. |
| .TP |
| .I max_ipc_namespaces |
| The value in this file defines a per-user limit on the number of |
| ipc namespaces that may be created in the user namespace. |
| .TP |
| .I max_mnt_namespaces |
| The value in this file defines a per-user limit on the number of |
| mount namespaces that may be created in the user namespace. |
| .TP |
| .I max_net_namespaces |
| The value in this file defines a per-user limit on the number of |
| network namespaces that may be created in the user namespace. |
| .TP |
| .I max_pid_namespaces |
| The value in this file defines a per-user limit on the number of |
| PID namespaces that may be created in the user namespace. |
| .TP |
| .IR max_time_namespaces " (since Linux 5.7)" |
| .\" commit eeec26d5da8248ea4e240b8795bb4364213d3247 |
| The value in this file defines a per-user limit on the number of |
| time namespaces that may be created in the user namespace. |
| .TP |
| .I max_user_namespaces |
| The value in this file defines a per-user limit on the number of |
| user namespaces that may be created in the user namespace. |
| .TP |
| .I max_uts_namespaces |
| The value in this file defines a per-user limit on the number of |
| uts namespaces that may be created in the user namespace. |
| .P |
| Note the following details about these files: |
| .IP \[bu] 3 |
| The values in these files are modifiable by privileged processes. |
| .IP \[bu] |
| The values exposed by these files are the limits for the user namespace |
| in which the opening process resides. |
| .IP \[bu] |
| The limits are per-user. |
| Each user in the same user namespace |
| can create namespaces up to the defined limit. |
| .IP \[bu] |
| The limits apply to all users, including UID 0. |
| .IP \[bu] |
| These limits apply in addition to any other per-namespace |
| limits (such as those for PID and user namespaces) that may be enforced. |
| .IP \[bu] |
| Upon encountering these limits, |
| .BR clone (2) |
| and |
| .BR unshare (2) |
| fail with the error |
| .BR ENOSPC . |
| .IP \[bu] |
| For the initial user namespace, |
| the default value in each of these files is half the limit on the number |
| of threads that may be created |
| .RI ( /proc/sys/kernel/threads\-max ). |
| In all descendant user namespaces, the default value in each file is |
| .BR MAXINT . |
| .IP \[bu] |
| When a namespace is created, the object is also accounted |
| against ancestor namespaces. |
| More precisely: |
| .RS |
| .IP \[bu] 3 |
| Each user namespace has a creator UID. |
| .IP \[bu] |
| When a namespace is created, |
| it is accounted against the creator UIDs in each of the |
| ancestor user namespaces, |
| and the kernel ensures that the corresponding namespace limit |
| for the creator UID in the ancestor namespace is not exceeded. |
| .IP \[bu] |
| The aforementioned point ensures that creating a new user namespace |
| cannot be used as a means to escape the limits in force |
| in the current user namespace. |
| .RE |
| .\" |
| .SS Namespace lifetime |
| Absent any other factors, |
| a namespace is automatically torn down when the last process in |
| the namespace terminates or leaves the namespace. |
| However, there are a number of other factors that may pin |
| a namespace into existence even though it has no member processes. |
| These factors include the following: |
| .IP \[bu] 3 |
| An open file descriptor or a bind mount exists for the corresponding |
| .IR /proc/ pid /ns/* |
| file. |
| .IP \[bu] |
| The namespace is hierarchical (i.e., a PID or user namespace), |
| and has a child namespace. |
| .IP \[bu] |
| It is a user namespace that owns one or more nonuser namespaces. |
| .IP \[bu] |
| It is a PID namespace, |
| and there is a process that refers to the namespace via a |
| .IR /proc/ pid /ns/pid_for_children |
| symbolic link. |
| .IP \[bu] |
| It is a time namespace, |
| and there is a process that refers to the namespace via a |
| .IR /proc/ pid /ns/time_for_children |
| symbolic link. |
| .IP \[bu] |
| It is an IPC namespace, and a corresponding mount of an |
| .I mqueue |
| filesystem (see |
| .BR mq_overview (7)) |
| refers to this namespace. |
| .IP \[bu] |
| It is a PID namespace, and a corresponding mount of a |
| .BR proc (5) |
| filesystem refers to this namespace. |
| .SH EXAMPLES |
| See |
| .BR clone (2) |
| and |
| .BR user_namespaces (7). |
| .SH SEE ALSO |
| .BR nsenter (1), |
| .BR readlink (1), |
| .BR unshare (1), |
| .BR clone (2), |
| .BR ioctl_nsfs (2), |
| .BR setns (2), |
| .BR unshare (2), |
| .BR proc (5), |
| .BR capabilities (7), |
| .BR cgroup_namespaces (7), |
| .BR cgroups (7), |
| .BR credentials (7), |
| .BR ipc_namespaces (7), |
| .BR network_namespaces (7), |
| .BR pid_namespaces (7), |
| .BR user_namespaces (7), |
| .BR uts_namespaces (7), |
| .BR lsns (8), |
| .BR switch_root (8) |