writeback: throttle buffered writeback

Test patch that throttles buffered writeback to make it much
smoother and to give it far less impact on other system activity.
Background writeback should be, by definition, background
activity. The fact that we flush huge bundles of it at a time
means that it can heavily impact foreground workloads,
which isn't ideal. We can't easily limit the sizes of writes that
we do, since that would impact file system layout in the presence
of delayed allocation. So just throttle back buffered writeback,
unless someone is waiting for it.

This would likely need dynamic adaptation to the current device; so
far it has only been tested on NVMe. There, it brings the impact of
background activity down from 1-2s to tens of milliseconds.

This is just a test patch, and as such, it registers queue sysfs
entries both to monitor the current state and to tune it:

$ cat /sys/block/nvme0n1/queue/wb_stats
limit=4, batch=2, inflight=0, wait=0, timer=0

'limit' denotes how many requests we will allow in flight for buffered
writeback; this setting can be tweaked by writing to the 'wb_depth'
file. Writing '0' turns throttling off completely. 'inflight' shows
how many buffered-writeback requests are currently in flight, 'wait'
shows whether anyone is currently waiting for access, and 'timer'
shows whether processes are currently being deferred by the write-back
cache delay.
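A monitoring tool could read the wb_stats line in the format shown above
with sscanf(). This is a userspace sketch that assumes the exact field
order and ", " separators of the example output; struct wb_stats,
wb_stats_parse() and wb_stats_read() are hypothetical names, and the file
only exists with this test patch applied.

```c
#include <stdio.h>

/* Fields of one wb_stats line (hypothetical userspace mirror). */
struct wb_stats {
	unsigned int limit, batch, inflight, wait, timer;
};

/* Parse one wb_stats line; returns 0 on success, -1 if the format differs. */
static int wb_stats_parse(const char *line, struct wb_stats *s)
{
	int n = sscanf(line, "limit=%u, batch=%u, inflight=%u, wait=%u, timer=%u",
		       &s->limit, &s->batch, &s->inflight, &s->wait, &s->timer);

	return n == 5 ? 0 : -1;
}

/* Read and parse /sys/block/<dev>/queue/wb_stats, e.g. dev = "nvme0n1". */
static int wb_stats_read(const char *dev, struct wb_stats *s)
{
	char path[256], line[256];
	FILE *f;
	int ret = -1;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/wb_stats", dev);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(line, sizeof(line), f))
		ret = wb_stats_parse(line, s);
	fclose(f);
	return ret;
}
```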

Background buffered writeback will be throttled at depth 'wb_depth',
and even lower (QD=1) if the device recently completed "competing" IO.
If we are doing reclaim or other synchronous buffered writeback, the
limit is increased 4x so the device can reach full bandwidth.
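The depth selection described above can be summarised in one function.
This is a sketch, not the patch's code: struct wb_ctx and wb_limit() are
hypothetical, and it assumes that the sync/reclaim case takes precedence
over the "recent competing IO" case, which the text does not state
explicitly.

```c
#include <stdbool.h>

/* Hypothetical inputs describing the current buffered-writeback request. */
struct wb_ctx {
	unsigned int wb_depth;		/* value of the 'wb_depth' knob; 0 = off */
	bool sync_or_reclaim;		/* reclaim or sync writeback is waiting */
	bool recent_competing_io;	/* device recently completed other IO */
};

/* Return the in-flight limit for this request; 0 means unthrottled. */
static unsigned int wb_limit(const struct wb_ctx *c)
{
	if (c->wb_depth == 0)
		return 0;			/* throttling turned off */
	if (c->sync_or_reclaim)
		return 4 * c->wb_depth;		/* 4x for full device bandwidth */
	if (c->recent_competing_io)
		return 1;			/* QD=1 while other IO is active */
	return c->wb_depth;			/* plain background writeback */
}
```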

Finally, if the device has write-back caching, 'wb_cache_delay' sets
how many microseconds to wait after a write completes before more
writes are allowed.
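The cache delay amounts to a time check against the last write
completion. This is a sketch assuming microsecond timestamps from a
monotonic clock (so now_us >= last_done_us); wb_cache_wait_us() is a
hypothetical helper, and a nonzero result is how long a deferred writer
would need to wait, for example by arming a timer.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return how many microseconds a buffered writer must still wait before
 * issuing; 0 means it may issue now. Only devices with a write-back
 * cache are delayed.
 */
static uint64_t wb_cache_wait_us(bool has_wb_cache, uint64_t now_us,
				 uint64_t last_done_us,
				 unsigned int wb_cache_delay_us)
{
	uint64_t elapsed;

	if (!has_wb_cache)
		return 0;
	elapsed = now_us - last_done_us;	/* assumes a monotonic clock */
	if (elapsed >= wb_cache_delay_us)
		return 0;
	return wb_cache_delay_us - elapsed;	/* remaining delay */
}
```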

Signed-off-by: Jens Axboe <axboe@fb.com>
8 files changed