blob: c59cf8320fbf9d90a768059325ccf9938c1e4ec5 [file] [log] [blame]
Anticipatory IO scheduler
-------------------------
Nick Piggin <piggin@cyberone.com.au> 13 Sep 2003
Attention! Database servers, especially those using "TCQ" disks should
investigate performance with the 'deadline' IO scheduler. Any system with high
disk performance requirements should do so, in fact.
If you see unusual performance characteristics of your disk systems, or you
see big performance regressions versus the deadline scheduler, please email
me. Database users don't bother unless you're willing to test a lot of patches
from me ;) its a known issue.
Selecting IO schedulers
-----------------------
To choose IO schedulers at boot time, use the argument 'elevator=deadline'.
'noop' and 'as' (the default) are also available. IO schedulers are assigned
globally at boot time only presently.
Tuning the anticipatory IO scheduler
------------------------------------
When using 'as', the anticipatory IO scheduler there are 5 parameters under
/sys/block/*/iosched/. All are units of milliseconds.
The parameters are:
* read_expire
Controls how long until a request becomes "expired". It also controls the
interval between which expired requests are served, so set to 50, a request
might take anywhere < 100ms to be serviced _if_ it is the next on the
expired list. Obviously it won't make the disk go faster. The result
basically equates to the timeslice a single reader gets in the presence of
other IO. 100*((seek time / read_expire) + 1) is very roughly the %
streaming read efficiency your disk should get with multiple readers.
* read_batch_expire
Controls how much time a batch of reads is given before pending writes are
served. Higher value is more efficient. This might be set below read_expire
if writes are to be given higher priority than reads, but reads are to be
as efficient as possible when there are no writes. Generally though, it
should be some multiple of read_expire.
* write_expire, and
* write_batch_expire are equivalent to the above, for writes.
* antic_expire
Controls the maximum amount of time we can anticipate a good read before
giving up. Many other factors may cause anticipation to be stopped early,
or some processes will not be "anticipated" at all. Should be a bit higher
for big seek time devices though not a linear correspondence - most
processes have only a few ms thinktime.