Introduce v3 namespaced file capabilities
Root in a user ns cannot be trusted to write a traditional
security.capability xattr. If it were allowed to do so, then any
unprivileged user on the host could map his own uid to root in a
namespace, write the xattr, and execute the file with privilege on the
host.
However, supporting file capabilities in a user namespace is very
desirable. Not doing so means that any programs designed to run
with limited privilege must continue to support other methods of
gaining and dropping privilege. For instance a program installer
must detect whether file capabilities can be assigned, and assign
them if so but set setuid-root otherwise.
This patch introduces v3 of the security.capability xattr. It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data. This is the absolute uid_t (i.e. the uid_t in
init_user_ns) of the root id (uid 0 in a namespace) in whose namespaces
the file capabilities may take effect.
When a task in a user ns (which is privileged with CAP_SETFCAP toward
that user_ns) asks to write v2 security.capability, the kernel will
transparently rewrite the xattr as a v3 with the appropriate rootid.
Subsequently, any task executing the file which has the noted kuid as
its root uid, or which is in a descendent user_ns of such a user_ns,
will run the file with capabilities.
Similarly when asking to read a v2 capability, a v3 capability will
be presented as v2 if it applies to the caller'snamespace.
If a task writes a v3 security.capability, then it can provide a
uid (valid within its own user namespace, over which it has CAP_SETFCAP)
for the xattr. The kernel will translate that to the absolute uid, and
write that to disk. After this, a task in the writer's namespace will
not be able to use those capabilities, but a task in a namespace where
the given uid is root will.
Only a single security.capability xattr may be written. A task may
overwrite the existing one so long as it was written by a user mapped
into his own user_ns over which he has CAP_SETFCAP.
This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.
Changelog:
Nov 02 2016: fix invalid check at refuse_fcap_overwrite()
Nov 07 2016: convert rootid from and to fs user_ns
(From ebiederm: mar 28 2017)
commoncap.c: fix typos - s/v4/v3
get_vfs_caps_from_disk: clarify the fs_ns root access check
nsfscaps: change the code split for cap_inode_setxattr()
Apr 09 2017:
don't return v3 cap for caps owned by current root.
return a v2 cap for a true v2 cap in non-init ns
5 files changed