user-namespaced file capabilities - now with more magic

This patch introduces a new security.nscapability xattr.  It
is mostly like security.capability, but also records a 'rootid':
the uid_t (in init_user_ns) that uid 0 maps to in the user
namespace in which the file capabilities may take effect.
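This description does not spell out the on-disk encoding, so the following is a minimal sketch under one assumption: security.nscapability reuses the v2 security.capability value (modeled on struct vfs_cap_data from <linux/capability.h>) and appends the rootid as a little-endian 32-bit field. The sketch_* names are hypothetical, not kernel definitions.

```c
#define _DEFAULT_SOURCE      /* for le32toh()/htole32() in <endian.h> */
#include <endian.h>
#include <stdint.h>

#define SKETCH_VFS_CAP_U32 2 /* two 32-bit words per capability set (v2) */

/* Modeled on struct vfs_cap_data, the v2 on-disk value of
 * security.capability.  On disk every field is little-endian. */
struct sketch_vfs_cap_data {
    uint32_t magic_etc;                 /* revision and flags */
    struct {
        uint32_t permitted;
        uint32_t inheritable;
    } data[SKETCH_VFS_CAP_U32];
};

/* Hypothetical security.nscapability value: the same fields followed by
 * the rootid, i.e. the kuid (a uid_t in init_user_ns) of uid 0 in the
 * namespace the capabilities are meant for. */
struct sketch_vfs_ns_cap_data {
    struct sketch_vfs_cap_data cap;
    uint32_t rootid;                    /* little-endian on disk */
};

/* Decode the rootid from an on-disk value into host byte order. */
uint32_t sketch_nscap_rootid(const struct sketch_vfs_ns_cap_data *nscap)
{
    return le32toh(nscap->rootid);
}
```

Because every field is a 32-bit word, the structs have no padding: 20 bytes for the plain value and 24 bytes with the rootid.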

A privileged (cap_setfcap) process in the initial user ns may
set and read this xattr directly.  However, its primary purpose is
to serve as a transparent fallback in user namespaces.

Root in a user ns cannot be trusted to write security.capability
xattrs, because any user on the host could map his own uid to root
in a namespace, write the xattr, and execute the file with privilege
on the host.
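For illustration, the first step of that attack needs no privilege: an ordinary user can create a user namespace and map their own uid to 0 in it. A minimal sketch using the real unshare(2) call and the /proc/self/uid_map interface; become_ns_root and format_root_mapping are hypothetical helper names. It assumes unprivileged user namespaces are enabled on the host. The remaining steps (writing the xattr and executing the file) are what this patch is meant to neutralize and are not shown.

```c
#define _GNU_SOURCE          /* for unshare() and CLONE_NEWUSER */
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Format the single uid_map line "0 <host_uid> 1": uid 0 inside the new
 * namespace maps to host_uid outside it, for a range of one uid. */
int format_root_mapping(char *buf, size_t size, uid_t host_uid)
{
    return snprintf(buf, size, "0 %u 1\n", (unsigned)host_uid);
}

/* Unprivileged: enter a new user ns and become uid 0 in it.
 * Returns 0 on success, -1 on failure. */
int become_ns_root(void)
{
    char line[64];
    /* Read the euid first: after unshare() it shows as the overflow uid
     * until a mapping is written. */
    uid_t host_uid = geteuid();

    if (unshare(CLONE_NEWUSER) < 0)
        return -1;

    int len = format_root_mapping(line, sizeof(line), host_uid);
    int fd = open("/proc/self/uid_map", O_WRONLY);
    if (fd < 0)
        return -1;
    /* The mapping must be written in a single write(). */
    ssize_t n = write(fd, line, (size_t)len);
    close(fd);
    return n == len ? 0 : -1;
}
```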

With this patch, when root in a user ns asks to write security.capability,
the kernel transparently writes a security.nscapability xattr instead,
filling in the kuid that uid 0 of the caller's user ns maps to.
Subsequently, any task executing the file whose user_ns has that kuid as
its root uid, or which is in a descendant of such a user_ns, will run the
file with the file capabilities.
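A minimal model of the write and exec rules in the paragraph above, assuming a heavily simplified namespace structure that keeps only a parent link and the kuid that uid 0 maps to. sketch_user_ns, sketch_rootid_for_writer and sketch_nscap_applies are hypothetical names, not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t sketch_kuid_t;   /* a uid as seen from init_user_ns */

/* Hypothetical, simplified user namespace. */
struct sketch_user_ns {
    const struct sketch_user_ns *parent;   /* NULL for init_user_ns */
    sketch_kuid_t root_kuid;               /* kuid of uid 0 in this ns */
};

/* Write path: when root in caller_ns writes security.capability, the
 * value is stored as security.nscapability with this rootid. */
sketch_kuid_t sketch_rootid_for_writer(const struct sketch_user_ns *caller_ns)
{
    return caller_ns->root_kuid;
}

/* Exec path: the capabilities apply if the task's own ns, or any ancestor
 * up to and including init_user_ns, has rootid as its root kuid.  That is
 * the same as the task being in such a ns or in a descendant of it. */
bool sketch_nscap_applies(const struct sketch_user_ns *task_ns,
                          sketch_kuid_t rootid)
{
    for (const struct sketch_user_ns *ns = task_ns; ns != NULL; ns = ns->parent) {
        if (ns->root_kuid == rootid)
            return true;
    }
    return false;
}
```

Under this model, a file written by root in a namespace whose root is host uid 100000 runs with privilege in that namespace and its children, but not in a sibling namespace rooted at uid 200000, and not on the host, where uid 0 is kuid 0.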

When a task in a non-init user_ns reads the security.capability xattr, a
valid security.nscapability, if one exists, is returned in its place.  Such
a task is not allowed to read security.nscapability itself.  This could be
accommodated, but it would require the kernel to convert the kuid_t to a
valid uid in the reader's user_ns, so for now it is simply not supported.
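A minimal sketch of that read path for a reader outside init_user_ns, covering only the security.nscapability fallback (a plain security.capability is not modeled). It assumes the 24-byte layout of a 20-byte v2 value plus a trailing 4-byte rootid, takes the "is this nscapability valid for the reader" decision as an input, and uses EPERM for the refused read as an assumption; the real errno may differ. sketch_userns_getxattr is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CAP_LEN   20  /* v2 security.capability value */
#define SKETCH_NSCAP_LEN 24  /* the same value followed by a 4-byte rootid */

/* nscap: stored security.nscapability value, or NULL if absent.
 * valid_for_reader: whether its rootid applies to the reader's ns.
 * Returns the value length (size == 0 queries it), or a negative errno. */
long sketch_userns_getxattr(const char *name, const unsigned char *nscap,
                            bool valid_for_reader,
                            unsigned char *buf, size_t size)
{
    if (strcmp(name, "security.nscapability") == 0)
        return -EPERM;               /* not readable from a user ns */
    if (strcmp(name, "security.capability") != 0)
        return -ENODATA;             /* no other attributes in this sketch */
    if (nscap == NULL || !valid_for_reader)
        return -ENODATA;             /* nothing to show this reader */
    if (size == 0)
        return SKETCH_CAP_LEN;       /* size query, as getxattr(2) allows */
    if (size < SKETCH_CAP_LEN)
        return -ERANGE;
    memcpy(buf, nscap, SKETCH_CAP_LEN);  /* drop the trailing rootid */
    return SKETCH_CAP_LEN;
}
```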

Only a single security.nscapability xattr may be written.  This patch
could be expanded to allow a list of capabilities and rootids, however
I do not believe that to be a worthwhile use case.

This allows a simple setxattr to work, allows tar/untar to
work, and allows us to tar in one namespace and untar in
another while preserving the capability, without risking
leaking privilege into a parent namespace.
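What makes tar/untar work is that user space keeps using the ordinary security.capability name. A minimal sketch of the per-file copy that an xattr-preserving archiver might perform, using the real getxattr(2) and setxattr(2) calls; copy_xattr is a hypothetical helper and the 256-byte buffer is an assumption. Run as root inside a user namespace, the setxattr would, per the description in this message, be stored as security.nscapability with that namespace's rootid, without the program knowing.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Copy one extended attribute from src to dst.
 * Returns 0 on success, -1 with errno set on failure. */
int copy_xattr(const char *src, const char *dst, const char *name)
{
    char value[256];   /* ample for a capability xattr (an assumption) */
    ssize_t len = getxattr(src, name, value, sizeof(value));

    if (len < 0)
        return -1;     /* e.g. ENODATA if src has no such attribute */
    if (setxattr(dst, name, value, (size_t)len, 0) < 0)
        return -1;
    return 0;
}
```

Usage: copy_xattr("/archive/staging/ping", "/usr/bin/ping", "security.capability") would carry the capability across; the paths are illustrative only.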

Note: listxattr is not handled here, so its results may be inconsistent
with getxattr/setxattr.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
5 files changed