block: handle BLK_OPEN_RESTRICT_WRITES correctly
Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By
default this option is set. When it is set the long-standing behavior
of being able to write to mounted block devices is enabled.
But in order to guard against unintended corruption by writing to the
block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned
off. In that case it isn't possible to write to mounted block devices
anymore.
A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES
which disallows concurrent BLK_OPEN_WRITE access. When we still had the
bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because
the mode was passed around. Since we managed to get rid of the bdev
handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based
on whether the file was opened writable and writes to that block device
are blocked. That logic doesn't work because we do allow
BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE.
So fix the detection logic. When we open a block device with
BLK_OPEN_RESTRICT_WRITES we know that a holder must've been specified as
we forbid BLK_OPEN_RESTRICT_WRITES without a holder in
bdev_permission(). The holder will be stashed in
bdev_file->private_data. So we check whether bdev_file->private_data is
set and whether writes are blocked. If so, we unblock writes. Otherwise,
if the bdev_file was opened writable we just decrement the write count.
Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot
directly be raised by userspace. It is implicitly raised during
mounting. So any concurrent writable request from userspace will fail
and so recognition based on bdev_file->private_data is safe.
Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and
unset.
Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox <willy@infradead.org>
Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
---
@Jan, I have one question for you. Currently your original changes in
v6.8 do allow for a block device to be reopened with
BLK_OPEN_RESTRICT_WRITES provided the same holder is used as per
bdev_may_open(). I think that may have a bug.
The first opener @f1 of that block device will set bdev->bd_writers to
-1. The second opener @f2 using the same holder will pass the check in
bdev_may_open() that bdev->bd_writers must not be greater than zero.
The first opener @f1 now closes the block device and in bdev_release()
will end up calling bdev_yield_write_access() which calls
bdev_writes_blocked() and sets bdev->bd_writers to 0 again.
Now @f2 holds a file to that block device which was opened with
exclusive write access but bdev->bd_writers has been reset to 0.
So now @f3 comes along and succeeds in opening the block device with
BLK_OPEN_WRITE betraying @f2's request to have exclusive write access.
This isn't a practical issue yet because afaict there's no codepath
inside the kernel that reopenes the same block device with
BLK_OPEN_RESTRICT_WRITES but it will be if there is.
If that's right then either this needs to be fixed by counting the
number of BLK_OPEN_RESTRICT_WRITES (could just be counting negative) or
we simply enforce that BLK_OPEN_RESTRICT_WRITES means that there's only
one opener a la:
diff --git a/block/bdev.c b/block/bdev.c
index e9f1b12bd75c..cefb94a75530 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -758,7 +758,7 @@ static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)
/* Writes blocked? */
if (mode & BLK_OPEN_WRITE && bdev_writes_blocked(bdev))
return false;
- if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers > 0)
+ if (mode & BLK_OPEN_RESTRICT_WRITES && bdev->bd_writers != 0)
return false;
return true;
}
But maybe I'm just not reading this right. Let me know.
Christian
---
1 file changed