fs: use READ_ONCE/WRITE_ONCE with the i_size helpers
I spent the last few weeks running down a weird regression in btrfs we
were seeing in production. It turned out to be introduced by
62b37622718c, which took the following
loff_t isize = i_size_read(inode);
actual_end = min_t(u64, isize, end + 1);
and turned it into
actual_end = min_t(u64, i_size_read(inode), end + 1);
The problem here is that the compiler is optimizing out the temporary
variables used in __cmp_once, so the resulting assembly looks like this
498 actual_end = min_t(u64, i_size_read(inode), end + 1);
0xffffffff814b08c1 <+145>: 48 8b 44 24 28 mov 0x28(%rsp),%rax
0xffffffff814b08c6 <+150>: 48 39 45 50 cmp %rax,0x50(%rbp)
0xffffffff814b08ca <+154>: 48 89 c6 mov %rax,%rsi
0xffffffff814b08cd <+157>: 48 0f 46 75 50 cmovbe 0x50(%rbp),%rsi
as you can see we read the value of the inode to compare, and then we
read it a second time to assign it.
This code is simply an optimization, so there's no locking to keep
i_size from changing, however we really need min_t to actually return
the minimum value for these two values, which it is failing to do.
We've reverted that patch for now to fix the problem, but it's only a
matter of time before the compiler becomes smart enough to optimize out
the loff_t isize intermediate variable as well.
Instead we want to make it explicit that i_size_read() should only read
the value once. This will keep this class of problem from happening in
the future, regardless of what the compiler chooses to do. With this
change we get the following assembly generated for this code
491 actual_end = min_t(u64, i_size_read(inode), end + 1);
0xffffffff8148f625 <+149>: 48 8b 44 24 20 mov 0x20(%rsp),%rax
./include/linux/compiler.h:
199 __READ_ONCE_SIZE;
0xffffffff8148f62a <+154>: 4c 8b 75 50 mov 0x50(%rbp),%r14
fs/btrfs/inode.c:
491 actual_end = min_t(u64, i_size_read(inode), end + 1);
0xffffffff8148f62e <+158>: 49 39 c6 cmp %rax,%r14
0xffffffff8148f631 <+161>: 4c 0f 47 f0 cmova %rax,%r14
and this works out properly, we only read the value once and so we won't
trip over this problem again.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
1 file changed