refs/tags/xfs-for-linus-v3.13-rc1-2 - pub/scm/linux/kernel/git/mbroz/linux

tag	02991738495c075cbdfc42e16b43acd38b942faf
tagger	Ben Myers <bpm@sgi.com>	Thu Nov 21 10:32:43 2013 -0600
object	2fe8c1c08b3fbd87dd2641c8f032ff6e965d5803

xfs: update #2 for v3.13-rc1

Here we have a performance fix for inode iversion, increased inode cluster size for v5 superblock filesystems, a fix for error handling in xfs_bmap_add_attrfork, and a MAINTAINERS update to add Dave.

commit	2fe8c1c08b3fbd87dd2641c8f032ff6e965d5803
author	Dave Chinner <dchinner@redhat.com>	Fri Nov 01 15:27:17 2013 +1100
committer	Ben Myers <bpm@sgi.com>	Mon Nov 18 09:42:08 2013 -0600
tree	6770197965f678c93bbbdbe77914bf7d3268266d
parent	8f80587bacb6eb893df0ee4e35fefa0dfcfdf9f4

xfs: open code inc_inode_iversion when logging an inode

Michael L Semon reported that generic/069 runtime increased on v5 superblocks by 100% compared to v4 superblocks. His perf-based analysis pointed directly at the timestamp updates being done by the write path in this workload. The append writers are doing 4-byte writes, so there are lots of timestamp updates occurring.

The thing is, they aren't being triggered by timestamp changes - they are being triggered by the inode change counter needing to be updated. That is, every write(2) system call needs to bump the inode version count, and it does that through the timestamp update mechanism. Hence test generic/069 is running 3 orders of magnitude more timestamp update transactions on v5 filesystems, because it does a huge number of *4 byte* write(2) calls.

This isn't a real world scenario we really need to address - anyone doing such sequential IO should be using fwrite(3), not write(2). i.e. fwrite(3) buffers the writes in userspace to minimise the number of write(2) syscalls, and the problem goes away.

However, there is a small change we can make to improve the situation - removing the expensive lock operation on the change counter update. All inode version counter changes in XFS occur under the ip->i_ilock during a transaction, and therefore we don't actually need the spin lock that provides exclusive access to it through inc_inode_iversion(). Hence avoid the lock and just open code the increment ourselves when logging the inode.

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>