kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs

The kernfs open method - kernfs_fop_open() - inherited extra
permission checks from sysfs.  While the vfs layer allows ignoring the
read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
sysfs explicitly denied open regardless of the cap if the file doesn't
have any of the UGO perms of the requested access or doesn't implement
the requested operation.  It can be debated whether this was a good
idea or not but the behavior is too subtle and dangerous to change at
this point.

After cgroup got converted to kernfs, this extra perm check also got
applied to cgroup breaking libcgroup which opens write-only files with
O_RDWR as root.  This patch gates the extra open permission check with
a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs.
For sysfs, nothing changes.  For cgroup, root now can perform any
operation regardless of the permissions as it was before kernfs
conversion.  Note that kernfs still fails unimplemented operations
with -EINVAL.

While at it, add comments explaining KERNFS_ROOT flags.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Wagin <avagin@gmail.com>
Tested-by: Andrey Wagin <avagin@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
Fixes: 2bd59d48ebfb ("cgroup: convert to kernfs")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
diff --git a/fs/kernfs/file.c b/fs/kernfs/file.c
index e01ea4a..5e9a80c 100644
--- a/fs/kernfs/file.c
+++ b/fs/kernfs/file.c
@@ -610,6 +610,7 @@
 static int kernfs_fop_open(struct inode *inode, struct file *file)
 {
 	struct kernfs_node *kn = file->f_path.dentry->d_fsdata;
+	struct kernfs_root *root = kernfs_root(kn);
 	const struct kernfs_ops *ops;
 	struct kernfs_open_file *of;
 	bool has_read, has_write, has_mmap;
@@ -624,14 +625,16 @@
 	has_write = ops->write || ops->mmap;
 	has_mmap = ops->mmap;
 
-	/* check perms and supported operations */
-	if ((file->f_mode & FMODE_WRITE) &&
-	    (!(inode->i_mode & S_IWUGO) || !has_write))
-		goto err_out;
+	/* see the flag definition for details */
+	if (root->flags & KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK) {
+		if ((file->f_mode & FMODE_WRITE) &&
+		    (!(inode->i_mode & S_IWUGO) || !has_write))
+			goto err_out;
 
-	if ((file->f_mode & FMODE_READ) &&
-	    (!(inode->i_mode & S_IRUGO) || !has_read))
-		goto err_out;
+		if ((file->f_mode & FMODE_READ) &&
+		    (!(inode->i_mode & S_IRUGO) || !has_read))
+			goto err_out;
+	}
 
 	/* allocate a kernfs_open_file for the file */
 	error = -ENOMEM;
diff --git a/fs/sysfs/mount.c b/fs/sysfs/mount.c
index a66ad61..8794423 100644
--- a/fs/sysfs/mount.c
+++ b/fs/sysfs/mount.c
@@ -63,7 +63,8 @@
 {
 	int err;
 
-	sysfs_root = kernfs_create_root(NULL, 0, NULL);
+	sysfs_root = kernfs_create_root(NULL, KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK,
+					NULL);
 	if (IS_ERR(sysfs_root))
 		return PTR_ERR(sysfs_root);
 
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index b0122dc..ca1be5c 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -50,7 +50,24 @@
 
 /* @flags for kernfs_create_root() */
 enum kernfs_root_flag {
-	KERNFS_ROOT_CREATE_DEACTIVATED = 0x0001,
+	/*
+	 * kernfs_nodes are created in the deactivated state and invisible.
+	 * They require explicit kernfs_activate() to become visible.  This
+	 * can be used to make related nodes become visible atomically
+	 * after all nodes are created successfully.
+	 */
+	KERNFS_ROOT_CREATE_DEACTIVATED		= 0x0001,
+
+	/*
+	 * For regular flies, if the opener has CAP_DAC_OVERRIDE, open(2)
+	 * succeeds regardless of the RW permissions.  sysfs had an extra
+	 * layer of enforcement where open(2) fails with -EACCES regardless
+	 * of CAP_DAC_OVERRIDE if the permission doesn't have the
+	 * respective read or write access at all (none of S_IRUGO or
+	 * S_IWUGO) or the respective operation isn't implemented.  The
+	 * following flag enables that behavior.
+	 */
+	KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK	= 0x0002,
 };
 
 /* type-specific structures for kernfs_node union members */