user-namespaced file capabilities - now with even more magic

Root in a user ns cannot be trusted to write a traditional
security.capability xattr, because any user on the host could map his
own uid to root in a namespace, write the xattr, and execute the file
with privilege on the host.

This patch introduces v4 of the security.capability xattr.  It builds a
vfs_ns_cap_data struct by appending a uid_t rootid to struct
vfs_cap_data.  The rootid is the uid (as seen in init_user_ns) that is
mapped to uid 0 in a user namespace; the file capabilities may take
effect only in namespaces that have that root id.
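
As a rough userspace sketch of that layout (not the patch's actual
definitions): the *_sketch names and CAP_U32_SKETCH are hypothetical
stand-ins, fixed-width integers replace the kernel's little-endian
types, and rootid is shown as a 32-bit value on the assumption that
uid_t is 32 bits wide.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Assumption: two 32-bit words per capability set, as in the
 * existing 64-bit-capability xattr format. */
#define CAP_U32_SKETCH 2

/* Stand-in for struct vfs_cap_data (values are little-endian on disk). */
struct vfs_cap_data_sketch {
    uint32_t magic_etc;            /* revision magic plus flag bits */
    struct {
        uint32_t permitted;
        uint32_t inheritable;
    } data[CAP_U32_SKETCH];
};

/* Stand-in for struct vfs_ns_cap_data: same fields, rootid appended. */
struct vfs_ns_cap_data_sketch {
    uint32_t magic_etc;
    struct {
        uint32_t permitted;
        uint32_t inheritable;
    } data[CAP_U32_SKETCH];
    uint32_t rootid;               /* uid in init_user_ns of the ns root */
};

/* rootid starts exactly where the older structure ends. */
static_assert(offsetof(struct vfs_ns_cap_data_sketch, rootid) ==
              sizeof(struct vfs_cap_data_sketch),
              "rootid must directly follow the vfs_cap_data fields");
```

Appending the field rather than reordering keeps the leading bytes
identical to the older format, so a reader only needs the revision in
magic_etc to know whether a rootid follows.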

When root in a user ns asks to write security.capability, the kernel
will transparently rewrite the xattr as a v4 with the appropriate
rootid.  Subsequently, any task executing the file whose user_ns has the
noted kuid as its root uid, or which is in a descendant user_ns of such
a user_ns, will run the file with capabilities.
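
The exec-time rule above can be modelled in userspace roughly as
follows.  This is a toy illustration, not the kernel implementation:
struct userns_sketch, its fields and fcaps_apply_sketch() are all
hypothetical names, and each namespace is reduced to a parent pointer
plus the init_user_ns uid that is mapped to its uid 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user namespace; NOT the kernel's struct user_namespace. */
struct userns_sketch {
    const struct userns_sketch *parent; /* NULL for the initial ns */
    uint32_t root_kuid;                 /* init_user_ns uid mapped to uid 0 */
};

/*
 * Return true if file capabilities tagged with rootid should take
 * effect for a task in task_ns: either task_ns itself has rootid as
 * its root uid, or one of its ancestors does.
 */
bool fcaps_apply_sketch(const struct userns_sketch *task_ns, uint32_t rootid)
{
    for (const struct userns_sketch *ns = task_ns; ns != NULL; ns = ns->parent) {
        if (ns->root_kuid == rootid)
            return true;
    }
    return false;
}

/* Usage example with made-up uid mappings. */
void fcaps_apply_example(void)
{
    struct userns_sketch init  = { .parent = NULL,  .root_kuid = 0 };
    struct userns_sketch c1    = { .parent = &init, .root_kuid = 100000 };
    struct userns_sketch c2    = { .parent = &c1,   .root_kuid = 101000 };
    struct userns_sketch other = { .parent = &init, .root_kuid = 200000 };

    assert(fcaps_apply_sketch(&c1, 100000));     /* same namespace */
    assert(fcaps_apply_sketch(&c2, 100000));     /* descendant namespace */
    assert(!fcaps_apply_sketch(&init, 100000));  /* parent: no privilege leak */
    assert(!fcaps_apply_sketch(&other, 100000)); /* unrelated sibling */
}
```

Walking only upward from the task's namespace is what keeps the
capability from taking effect in a parent or sibling namespace.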

Only a single security.capability xattr may be written.  A task may
overwrite the existing xattr so long as that xattr was written by a user
who is mapped into the task's own user_ns, and the task has CAP_SETFCAP
over that user_ns.

This allows a simple setxattr to work, allows tar/untar to work, and
allows us to tar in one namespace and untar in another while preserving
the capability, without risking leaking privilege into a parent
namespace.

Note - because getxattr will return a v4 file capability, userspace
does need updates.  The setcap(8) program will work unmodified, but
getcap(8) and filecap(8) may not.

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
5 files changed