netfs: Combine prepare and issue ops and grab the buffers on request
Modify the way subrequests are generated in netfslib to try and simplify
the code. The issue, primarily, is in writeback: the code has to create
multiple streams of write requests to disparate targets with different
properties (e.g. server and fscache), where not every folio needs to go to
every target (e.g. data just read from the server may only need writing to
the cache).
The current model, in writeback at least, is to step carefully through every
folio, preparing a subrequest for each stream when it is detected that part
of the current folio needs to go to that stream, and repeating this within
and across contiguous folios; subrequests are then issued as they become
full or hit boundaries, after the buffer has first been set up. This is
quite difficult to follow and makes it tricky to handle discontiguous
folios in a request.
This is changed such that netfs now accumulates buffers and attaches them
to each stream when they become valid for that stream, then flushes the
stream when a limit or a boundary is hit. The issuing code in netfs then
loops around creating and issuing subrequests without calling a separate
prepare stage (though a function is provided to get an estimate of when
flushing should occur). The filesystem (or cache) then gets to take a
slice of the master bvec chain as its I/O buffer for each subrequest,
including discontiguities if it can support a sparse/vectored RPC (as Ceph
can).
Similar changes also apply to buffered read and to unbuffered read and
write, though in each of those cases there is only a single contiguous
stream; for buffered read, that stream consists of interwoven requests from
multiple sources (server or cache).
To this end, netfslib is changed in the following ways:
(1) ->prepare_xxx(), buffer selection and ->issue_xxx() are now collapsed
together such that one ->issue_xxx() call is made with the subrequest
defined to the maximum extent; the filesystem/cache then reduces the
length of the subrequest and calls back to netfslib to grab a slice of
the buffer, which may reduce the subrequest further if a maximum
segment limit is set. The filesystem/cache then dispatches the
operation.
(2) Retry buffer tracking is added to the netfs_io_request struct. The
retry buffer is then selected when the subrequest retry counter is
non-zero.
(3) The use of iov_iter is pushed down to the filesystem. Netfslib now
provides the filesystem with a bvecq holding the buffer rather than an
iov_iter. The bvecq can be duplicated, headers/trailers can be attached
to hold protocol elements, and several bvecqs can be linked together to
create a compound operation.
(4) The ->issue_xxx() functions now return an error code that allows them
to return an error without having to terminate the subrequest.
Netfslib will handle the error immediately if it can but may request
termination and punt responsibility to the result collector.
->issue_xxx() can return 0 if the operation completed synchronously, or
-EIOCBQUEUED if the operation will complete (or already has completed)
asynchronously.
(5) During writeback, netfslib now builds up an accumulation of buffered
data before issuing writes on each stream (one server, one cache). It
asks each stream for an estimate of how much data to accumulate before
it next generates subrequests on the stream. The filesystem or cache
is not required to use up all the data accumulated on a stream at that
time unless the end of the pagecache is hit.
(6) During read-gaps, in which the gaps at either end of a dirty
streaming-write page need to be filled, a buffer is constructed
consisting of the two ends plus a sink page repeated to cover the
middle portion. This is passed to the server as a single read. For
something like Ceph, this should probably be done either as a
vectored/sparse read or as two separate reads (if different Ceph
objects are involved).
(7) During unbuffered/DIO read/write, there is a single contiguous file
region to be read or written as a single stream. The dispatching
function just creates subrequests and calls ->issue_xxx() repeatedly
to eat through the bufferage.
(8) At the start of buffered read, the entire set of folios allocated by
VM readahead is loaded into a bvecq chain, rather than being loaded
piecemeal as needed. As the pages were already added and locked by
the VM, this is slightly more efficient since only a single iteration
of the xarray is required.
(9) During buffered read, there is a single contiguous file region to
read as a single stream; however, this stream may be stitched
together from subrequests to multiple sources. Which sources are used
where is now determined by querying the cache to find the next couple
of extents in which it has data; netfslib uses this to direct the
subrequests towards the appropriate sources.
Each subrequest is given the maximum length in the current extent and
then ->issue_read() is called. The filesystem then limits the size
and slices off a piece of the buffer for that extent.
(10) Cachefiles now provides an estimation function that indicates the
standard maxima for doing DIO (MAX_RW_COUNT and BIO_MAX_VECS).
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Paulo Alcantara <pc@manguebit.org>
cc: Matthew Wilcox <willy@infradead.org>
cc: Christoph Hellwig <hch@infradead.org>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
39 files changed