capabilities: prevent bypassing lack of CAP_SETFCAP (v4)

Currently, a process running as uid 0 but without cap_setfcap can
unshare a new user namespace with uid 0 mapped to 0.  While this task
will not have new capabilities against the parent namespace, there is
a loophole due to the way namespaced file capabilities work.  File
capabilities valid in userns 1 are distinguished from file capabilities
valid in userns 2 by the kuid which underlies uid 0.  Therefore
the restricted root process can unshare a new self-mapping namespace,
add a namespaced file capability onto a file, then use that file
capability in the parent namespace.

To prevent that, mark any namespace which must not be allowed to
create file capabilities, and honor that mark when creating fscaps.

When a task creates a user namespace, mark in the child whether
the parent had cap_setfcap.

When a user namespace gets its uid 0 mapped, check whether that
uid 0 is shared with uid 0 of any ancestor.  If so, verify
that the ancestor had cap_setfcap when it created its immediate
child.  If not, then mark the new namespace as !may_setfcap.

When creating a namespaced file capability, refuse if may_setfcap
is false.
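
The three steps above can be sketched as a small userspace model.  This
is only an illustration of the described logic, not the patch itself:
struct ns_model and the ns_*() helpers are hypothetical names, each
namespace is reduced to the single kuid backing its uid 0, and
capability checks are passed in as plain booleans.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; this is NOT the kernel's struct user_namespace. */
struct ns_model {
	struct ns_model *parent;    /* NULL for the initial namespace */
	bool has_root;              /* has uid 0 been mapped yet? */
	unsigned int root_kuid;     /* kuid which underlies uid 0 here */
	bool parent_could_setfcap;  /* recorded when the namespace is created */
	bool may_setfcap;           /* decided when uid 0 gets mapped */
};

/* The initial namespace: uid 0 is kuid 0 and fscaps are allowed. */
static void ns_init(struct ns_model *ns)
{
	ns->parent = NULL;
	ns->has_root = true;
	ns->root_kuid = 0;
	ns->parent_could_setfcap = true;
	ns->may_setfcap = true;
}

/* Step 1: on creation, mark whether the creator had cap_setfcap. */
static void ns_create(struct ns_model *ns, struct ns_model *parent,
		      bool creator_has_setfcap)
{
	ns->parent = parent;
	ns->has_root = false;
	ns->root_kuid = 0;
	ns->parent_could_setfcap = creator_has_setfcap;
	ns->may_setfcap = true;
}

/*
 * Step 2: when uid 0 is mapped, walk the ancestors.  If an ancestor
 * shares the same kuid for uid 0 and lacked cap_setfcap when it created
 * its immediate child on this chain, mark the namespace !may_setfcap.
 */
static void ns_map_root(struct ns_model *ns, unsigned int kuid)
{
	const struct ns_model *child = ns;
	const struct ns_model *anc;

	ns->has_root = true;
	ns->root_kuid = kuid;
	ns->may_setfcap = true;

	for (anc = ns->parent; anc; child = anc, anc = anc->parent) {
		if (anc->has_root && anc->root_kuid == kuid &&
		    !child->parent_could_setfcap) {
			ns->may_setfcap = false;
			break;
		}
	}
}

/* Step 3: refuse to create a namespaced fscap if may_setfcap is false. */
static int ns_set_fscap(const struct ns_model *ns)
{
	return ns->may_setfcap ? 0 : -EPERM;
}
```

In this model, a namespace created by a task without cap_setfcap that
maps uid 0 onto kuid 0 gets -EPERM from ns_set_fscap(), while the same
task mapping uid 0 onto an unprivileged kuid (1000 is used below only
as an example) is unaffected.  Note that the model only covers the
fscap refusal; it does not model the uid_map write itself.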

With this patch:

1. unprivileged user can still unshare -Ur

ubuntu@caps:~$ unshare -Ur
root@caps:~# logout

2. root user can still unshare -Ur

ubuntu@caps:~$ sudo bash
root@caps:/home/ubuntu# unshare -Ur
root@caps:/home/ubuntu# logout

3. root user without CAP_SETFCAP cannot unshare -Ur:

root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
unable to set CAP_SETFCAP effective capability: Operation not permitted
root@caps:/home/ubuntu# unshare -Ur
unshare: write failed /proc/self/uid_map: Operation not permitted

Signed-off-by: Serge Hallyn <>
4 files changed