| .\" Copyright Neil Brown and others. |
| .\" This program is free software; you can redistribute it and/or modify |
| .\" it under the terms of the GNU General Public License as published by |
| .\" the Free Software Foundation; either version 2 of the License, or |
| .\" (at your option) any later version. |
| .\" See file COPYING in distribution for details. |
| .if n .pl 1000v |
| .TH MD 4 |
| .SH NAME |
| md \- Multiple Device driver aka Linux Software RAID |
| .SH SYNOPSIS |
| .BI /dev/md n |
| .br |
| .BI /dev/md/ n |
| .br |
| .BR /dev/md/ name |
| .SH DESCRIPTION |
| The |
| .B md |
| driver provides virtual devices that are created from one or more |
| independent underlying devices. This array of devices often contains |
| redundancy and the devices are often disk drives, hence the acronym RAID |
| which stands for a Redundant Array of Independent Disks. |
| .PP |
| .B md |
| supports RAID levels |
| 1 (mirroring), |
| 4 (striped array with parity device), |
| 5 (striped array with distributed parity information), |
| 6 (striped array with distributed dual redundancy information), and |
| 10 (striped and mirrored). |
If some number of underlying devices fails while using one of these
levels, the array will continue to function; this number is one for
RAID levels 4 and 5, two for RAID level 6, all but one (N-1) for
RAID level 1, and dependent on the configuration for RAID level 10.
| .PP |
| .B md |
| also supports a number of pseudo RAID (non-redundant) configurations |
| including RAID0 (striped array), LINEAR (catenated array), |
| MULTIPATH (a set of different interfaces to the same device), |
| and FAULTY (a layer over a single device into which errors can be injected). |
| |
| .SS MD METADATA |
| Each device in an array may have some |
| .I metadata |
| stored in the device. This metadata is sometimes called a |
| .BR superblock . |
| The metadata records information about the structure and state of the array. |
| This allows the array to be reliably re-assembled after a shutdown. |
| |
| From Linux kernel version 2.6.10, |
| .B md |
| provides support for two different formats of metadata, and |
other formats can be added. Prior to this release, only one format was
supported.
| |
| The common format \(em known as version 0.90 \(em has |
| a superblock that is 4K long and is written into a 64K aligned block that |
| starts at least 64K and less than 128K from the end of the device |
| (i.e. to get the address of the superblock round the size of the |
| device down to a multiple of 64K and then subtract 64K). |
The available size of each device is the amount of space before the
superblock, so between 64K and 128K is lost when a device is
incorporated into an MD array.
| This superblock stores multi-byte fields in a processor-dependent |
| manner, so arrays cannot easily be moved between computers with |
| different processors. |
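.PP
For example (the device name is only an illustration; the numbers depend
solely on the device size), the expected location of a 0.90 superblock can
be computed with a short shell calculation:
.PP
.RS 4
.nf
# Round the device size down to a multiple of 64K, then subtract 64K.
# /dev/sdb1 is an example name; substitute a real component device.
dev=/dev/sdb1
size_kib=$(( $(blockdev --getsize64 "$dev") / 1024 ))
sb_offset_kib=$(( size_kib / 64 * 64 - 64 ))
echo "0.90 superblock expected at ${sb_offset_kib} KiB into $dev"
.fi
.RE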
| |
| The new format \(em known as version 1 \(em has a superblock that is |
| normally 1K long, but can be longer. It is normally stored between 8K |
| and 12K from the end of the device, on a 4K boundary, though |
| variations can be stored at the start of the device (version 1.1) or 4K from |
| the start of the device (version 1.2). |
| This metadata format stores multibyte data in a |
| processor-independent format and supports up to hundreds of |
| component devices (version 0.90 only supports 28). |
| |
| The metadata contains, among other things: |
| .TP |
| LEVEL |
| The manner in which the devices are arranged into the array |
| (LINEAR, RAID0, RAID1, RAID4, RAID5, RAID10, MULTIPATH). |
| .TP |
| UUID |
| a 128 bit Universally Unique Identifier that identifies the array that |
| contains this device. |
| |
| .PP |
| When a version 0.90 array is being reshaped (e.g. adding extra devices |
| to a RAID5), the version number is temporarily set to 0.91. This |
| ensures that if the reshape process is stopped in the middle (e.g. by |
| a system crash) and the machine boots into an older kernel that does |
| not support reshaping, then the array will not be assembled (which |
| would cause data corruption) but will be left untouched until a kernel |
that can complete the reshape process is used.
| |
| .SS ARRAYS WITHOUT METADATA |
| While it is usually best to create arrays with superblocks so that |
| they can be assembled reliably, there are some circumstances when an |
| array without superblocks is preferred. These include: |
| .TP |
| LEGACY ARRAYS |
| Early versions of the |
| .B md |
| driver only supported LINEAR and RAID0 configurations and did not use |
| a superblock (which is less critical with these configurations). |
| While such arrays should be rebuilt with superblocks if possible, |
| .B md |
| continues to support them. |
| .TP |
| FAULTY |
| Being a largely transparent layer over a different device, the FAULTY |
| personality doesn't gain anything from having a superblock. |
| .TP |
| MULTIPATH |
| It is often possible to detect devices which are different paths to |
| the same storage directly rather than having a distinctive superblock |
| written to the device and searched for on all paths. In this case, |
| a MULTIPATH array with no superblock makes sense. |
| .TP |
| RAID1 |
| In some configurations it might be desired to create a RAID1 |
| configuration that does not use a superblock, and to maintain the state of |
| the array elsewhere. While not encouraged for general use, it does |
| have special-purpose uses and is supported. |
| |
| .SS ARRAYS WITH EXTERNAL METADATA |
| |
| From release 2.6.28, the |
| .I md |
| driver supports arrays with externally managed metadata. That is, |
| the metadata is not managed by the kernel but rather by a user-space |
| program which is external to the kernel. This allows support for a |
| variety of metadata formats without cluttering the kernel with lots of |
| details. |
| .PP |
| .I md |
| is able to communicate with the user-space program through various |
| sysfs attributes so that it can make appropriate changes to the |
| metadata \- for example to mark a device as faulty. When necessary, |
| .I md |
| will wait for the program to acknowledge the event by writing to a |
| sysfs attribute. |
| The manual page for |
| .IR mdmon (8) |
| contains more detail about this interaction. |
| |
| .SS CONTAINERS |
| Many metadata formats use a single block of metadata to describe a |
| number of different arrays which all use the same set of devices. |
| In this case it is helpful for the kernel to know about the full set |
| of devices as a whole. This set is known to md as a |
| .IR container . |
| A container is an |
| .I md |
| array with externally managed metadata and with device offset and size |
| so that it just covers the metadata part of the devices. The |
| remainder of each device is available to be incorporated into various |
| arrays. |
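.PP
As an illustration (using IMSM as one example of an externally managed
format; device and array names are examples only), a container and a
volume inside it might be created with
.I mdadm
like this:
.PP
.RS 4
.nf
# Create the container that holds the external metadata.
mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 /dev/sda /dev/sdb
# Create a RAID1 volume from the space described by that container.
mdadm --create /dev/md/vol0 --level=1 --raid-devices=2 /dev/md/imsm0
.fi
.RE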
| |
| .SS LINEAR |
| |
| A LINEAR array simply catenates the available space on each |
| drive to form one large virtual drive. |
| |
| One advantage of this arrangement over the more common RAID0 |
| arrangement is that the array may be reconfigured at a later time with |
| an extra drive, so the array is made bigger without disturbing the |
| data that is on the array. This can even be done on a live |
| array. |
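.PP
For example (device names are illustrative), a drive can be appended to a
live LINEAR array with
.IR mdadm (8):
.PP
.RS 4
.nf
# Append another device; the array grows in place.
mdadm --grow /dev/md0 --add /dev/sdd1
.fi
.RE
.PP
Any filesystem on the array must then be grown separately to use the new
space.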
| |
| If a chunksize is given with a LINEAR array, the usable space on each |
| device is rounded down to a multiple of this chunksize. |
| |
| .SS RAID0 |
| |
| A RAID0 array (which has zero redundancy) is also known as a |
| striped array. |
| A RAID0 array is configured at creation with a |
| .B "Chunk Size" |
| which must be a power of two (prior to Linux 2.6.31), and at least 4 |
| kibibytes. |
| |
| The RAID0 driver assigns the first chunk of the array to the first |
| device, the second chunk to the second device, and so on until all |
| drives have been assigned one chunk. This collection of chunks forms a |
| .BR stripe . |
| Further chunks are gathered into stripes in the same way, and are |
| assigned to the remaining space in the drives. |
| |
| If devices in the array are not all the same size, then once the |
| smallest device has been exhausted, the RAID0 driver starts |
| collecting chunks into smaller stripes that only span the drives which |
| still have remaining space. |
| |
A bug was introduced in Linux 3.14 which changed the layout of blocks in
| a RAID0 beyond the region that is striped over all devices. This bug |
| does not affect an array with all devices the same size, but can affect |
| other RAID0 arrays. |
| |
| Linux 5.4 (and some stable kernels to which the change was backported) |
| will not normally assemble such an array as it cannot know which layout |
| to use. There is a module parameter "raid0.default_layout" which can be |
| set to "1" to force the kernel to use the pre-3.14 layout or to "2" to |
force it to use the 3.14-and-later layout. When creating a new RAID0
array,
| .I mdadm |
| will record the chosen layout in the metadata in a way that allows newer |
| kernels to assemble the array without needing a module parameter. |
| |
| To assemble an old array on a new kernel without using the module parameter, |
| use either the |
| .B "--update=layout-original" |
| option or the |
| .B "--update=layout-alternate" |
| option. |
| |
| Once you have updated the layout you will not be able to mount the array |
| on an older kernel. If you need to revert to an older kernel, the |
| layout information can be erased with the |
| .B "--update=layout-unspecificed" |
| option. If you use this option to |
| .B --assemble |
| while running a newer kernel, the array will NOT assemble, but the |
metadata will be updated so that it can be assembled on an older kernel.
| |
Note that setting the layout to "unspecified" removes protections against
| this bug, and you must be sure that the kernel you use matches the |
| layout of the array. |
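.PP
For example (device names are illustrative), the layout can be forced with
the module parameter, or recorded permanently in the metadata with an
assemble-time update:
.PP
.RS 4
.nf
# On the kernel command line (or via a modprobe option for the raid0 module):
#   raid0.default_layout=1      # pre-3.14 layout
#   raid0.default_layout=2      # 3.14-and-later layout

# Alternatively, record the layout in the metadata once:
mdadm --stop /dev/md0
mdadm --assemble /dev/md0 --update=layout-original /dev/sda1 /dev/sdb1
.fi
.RE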
| |
| .SS RAID1 |
| |
| A RAID1 array is also known as a mirrored set (though mirrors tend to |
| provide reflected images, which RAID1 does not) or a plex. |
| |
| Once initialised, each device in a RAID1 array contains exactly the |
| same data. Changes are written to all devices in parallel. Data is |
| read from any one device. The driver attempts to distribute read |
| requests across all devices to maximise performance. |
| |
| All devices in a RAID1 array should be the same size. If they are |
| not, then only the amount of space available on the smallest device is |
| used (any extra space on other devices is wasted). |
| |
| Note that the read balancing done by the driver does not make the RAID1 |
| performance profile be the same as for RAID0; a single stream of |
| sequential input will not be accelerated (e.g. a single dd), but |
| multiple sequential streams or a random workload will use more than one |
| spindle. In theory, having an N-disk RAID1 will allow N sequential |
| threads to read from all disks. |
| |
| Individual devices in a RAID1 can be marked as "write-mostly". |
| These drives are excluded from the normal read balancing and will only |
| be read from when there is no other option. This can be useful for |
| devices connected over a slow link. |
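.PP
As a sketch (device and array names are examples), a member can be flagged
write-mostly when the array is created, or later through its sysfs state
file:
.PP
.RS 4
.nf
# Mark /dev/sdc1 write-mostly at creation time:
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 --write-mostly /dev/sdc1

# Set or clear the flag on a running array:
echo writemostly > /sys/block/md0/md/dev-sdc1/state
echo -writemostly > /sys/block/md0/md/dev-sdc1/state
.fi
.RE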
| |
| .SS RAID4 |
| |
| A RAID4 array is like a RAID0 array with an extra device for storing |
| parity. This device is the last of the active devices in the |
| array. Unlike RAID0, RAID4 also requires that all stripes span all |
| drives, so extra space on devices that are larger than the smallest is |
| wasted. |
| |
| When any block in a RAID4 array is modified, the parity block for that |
| stripe (i.e. the block in the parity device at the same device offset |
| as the stripe) is also modified so that the parity block always |
| contains the "parity" for the whole stripe. I.e. its content is |
| equivalent to the result of performing an exclusive-or operation |
| between all the data blocks in the stripe. |
| |
| This allows the array to continue to function if one device fails. |
| The data that was on that device can be calculated as needed from the |
| parity block and the other data blocks. |
| |
| .SS RAID5 |
| |
| RAID5 is very similar to RAID4. The difference is that the parity |
| blocks for each stripe, instead of being on a single device, are |
| distributed across all devices. This allows more parallelism when |
| writing, as two different block updates will quite possibly affect |
| parity blocks on different devices so there is less contention. |
| |
| This also allows more parallelism when reading, as read requests are |
| distributed over all the devices in the array instead of all but one. |
| |
| .SS RAID6 |
| |
| RAID6 is similar to RAID5, but can handle the loss of any \fItwo\fP |
| devices without data loss. Accordingly, it requires N+2 drives to |
| store N drives worth of data. |
| |
| The performance for RAID6 is slightly lower but comparable to RAID5 in |
| normal mode and single disk failure mode. It is very slow in dual |
| disk failure mode, however. |
| |
| .SS RAID10 |
| |
| RAID10 provides a combination of RAID1 and RAID0, and is sometimes known |
as RAID1+0. Every data block is duplicated some number of times, and
the resulting collection of data blocks is distributed over multiple
| drives. |
| |
| When configuring a RAID10 array, it is necessary to specify the number |
| of replicas of each data block that are required (this will usually |
| be\ 2) and whether their layout should be "near", "far" or "offset" |
| (with "offset" being available since Linux\ 2.6.18). |
| |
| .B About the RAID10 Layout Examples: |
| .br |
| The examples below visualise the chunk distribution on the underlying |
| devices for the respective layout. |
| |
| For simplicity it is assumed that the size of the chunks equals the |
| size of the blocks of the underlying devices as well as those of the |
| RAID10 device exported by the kernel (for example \fB/dev/md/\fPname). |
| .br |
| Therefore the chunks\ /\ chunk numbers map directly to the blocks\ /\ |
| block addresses of the exported RAID10 device. |
| |
| Decimal numbers (0,\ 1, 2,\ ...) are the chunks of the RAID10 and due |
| to the above assumption also the blocks and block addresses of the |
| exported RAID10 device. |
| .br |
| Repeated numbers mean copies of a chunk\ /\ block (obviously on |
| different underlying devices). |
| .br |
| Hexadecimal numbers (0x00,\ 0x01, 0x02,\ ...) are the block addresses |
| of the underlying devices. |
| |
| .TP |
| \fB "near" Layout\fP |
| When "near" replicas are chosen, the multiple copies of a given chunk are laid |
| out consecutively ("as close to each other as possible") across the stripes of |
| the array. |
| |
With an even number of devices, they will likely (unless some misalignment is
present) lie at the very same offset on the different devices.
.br
This is similar to the "classic" RAID1+0; that is, two groups of mirrored devices (in the
| example below the groups Device\ #1\ /\ #2 and Device\ #3\ /\ #4 are each a |
| RAID1) both in turn forming a striped RAID0. |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an even number\ (4) of devices: |
| .TS |
| tab(;); |
| C - - - - |
| C | C | C | C | C | |
| | - | - | - | - | - | |
| | C | C | C | C | C | |
| | C | C | C | C | C | |
| | C | C | C | C | C | |
| | C | C | C | C | C | |
| | C | C | C | C | C | |
| | C | C | C | C | C | |
| | - | - | - | - | - | |
| C C S C S |
| C C S C S |
| C C S S S |
| C C S S S. |
| ; |
| ;Device #1;Device #2;Device #3;Device #4 |
| 0x00;0;0;1;1 |
| 0x01;2;2;3;3 |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\. |
| :;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\. |
| 0x80;254;254;255;255 |
| ;\\---------v---------/;\\---------v---------/ |
| ;RAID1;RAID1 |
| ;\\---------------------v---------------------/ |
| ;RAID0 |
| .TE |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an odd number\ (5) of devices: |
| .TS |
| tab(;); |
| C - - - - - |
| C | C | C | C | C | C | |
| | - | - | - | - | - | - | |
| | C | C | C | C | C | C | |
| | C | C | C | C | C | C | |
| | C | C | C | C | C | C | |
| | C | C | C | C | C | C | |
| | C | C | C | C | C | C | |
| | C | C | C | C | C | C | |
| | - | - | - | - | - | - | |
| C. |
| ; |
| ;Dev #1;Dev #2;Dev #3;Dev #4;Dev #5 |
| 0x00;0;0;1;1;2 |
| 0x01;2;3;3;4;4 |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\. |
| :;:;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\. |
| 0x80;317;318;318;319;319 |
| ; |
| .TE |
| |
| .TP |
| \fB "far" Layout\fP |
| When "far" replicas are chosen, the multiple copies of a given chunk |
| are laid out quite distant ("as far as reasonably possible") from each |
| other. |
| |
| First a complete sequence of all data blocks (that is all the data one |
| sees on the exported RAID10 block device) is striped over the |
| devices. Then another (though "shifted") complete sequence of all data |
| blocks; and so on (in the case of more than 2\ copies per chunk). |
| |
| The "shift" needed to prevent placing copies of the same chunks on the |
| same devices is actually a cyclic permutation with offset\ 1 of each |
| of the stripes within a complete sequence of chunks. |
| .br |
| The offset\ 1 is relative to the previous complete sequence of chunks, |
| so in case of more than 2\ copies per chunk one gets the following |
| offsets: |
| .br |
| 1.\ complete sequence of chunks: offset\ =\ \ 0 |
| .br |
| 2.\ complete sequence of chunks: offset\ =\ \ 1 |
| .br |
| 3.\ complete sequence of chunks: offset\ =\ \ 2 |
| .br |
| : |
| .br |
| n.\ complete sequence of chunks: offset\ =\ n-1 |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an even number\ (4) of devices: |
| .TS |
| tab(;); |
| C - - - - |
| C | C | C | C | C | |
| | - | - | - | - | - | |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | - | - | - | - | - | |
| C. |
| ; |
| ;Device #1;Device #2;Device #3;Device #4 |
| ; |
| 0x00;0;1;2;3;\\ |
| 0x01;4;5;6;7;> [#] |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| :;:;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| 0x40;252;253;254;255;/ |
| 0x41;3;0;1;2;\\ |
| 0x42;7;4;5;6;> [#]~ |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| :;:;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| 0x80;255;252;253;254;/ |
| ; |
| .TE |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an odd number\ (5) of devices: |
| .TS |
| tab(;); |
| C - - - - - |
| C | C | C | C | C | C | |
| | - | - | - | - | - | - | |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | - | - | - | - | - | - | |
| C. |
| ; |
| ;Dev #1;Dev #2;Dev #3;Dev #4;Dev #5 |
| ; |
| 0x00;0;1;2;3;4;\\ |
| 0x01;5;6;7;8;9;> [#] |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| :;:;:;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| 0x40;315;316;317;318;319;/ |
| 0x41;4;0;1;2;3;\\ |
| 0x42;9;5;6;7;8;> [#]~ |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| :;:;:;:;:;:;: |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;: |
| 0x80;319;315;316;317;318;/ |
| ; |
| .TE |
| |
| With [#]\ being the complete sequence of chunks and [#]~\ the cyclic permutation |
| with offset\ 1 thereof (in the case of more than 2 copies per chunk there would |
| be ([#]~)~,\ (([#]~)~)~,\ ...). |
| |
| The advantage of this layout is that MD can easily spread sequential reads over |
| the devices, making them similar to RAID0 in terms of speed. |
| .br |
| The cost is more seeking for writes, making them substantially slower. |
| |
| .TP |
| \fB"offset" Layout\fP |
| When "offset" replicas are chosen, all the copies of a given chunk are |
| striped consecutively ("offset by the stripe length after each other") |
| over the devices. |
| |
| Explained in detail, <number of devices> consecutive chunks are |
| striped over the devices, immediately followed by a "shifted" copy of |
| these chunks (and by further such "shifted" copies in the case of more |
| than 2\ copies per chunk). |
| .br |
| This pattern repeats for all further consecutive chunks of the |
| exported RAID10 device (in other words: all further data blocks). |
| |
| The "shift" needed to prevent placing copies of the same chunks on the |
| same devices is actually a cyclic permutation with offset\ 1 of each |
| of the striped copies of <number of devices> consecutive chunks. |
| .br |
| The offset\ 1 is relative to the previous striped copy of <number of |
| devices> consecutive chunks, so in case of more than 2\ copies per |
| chunk one gets the following offsets: |
| .br |
| 1.\ <number of devices> consecutive chunks: offset\ =\ \ 0 |
| .br |
| 2.\ <number of devices> consecutive chunks: offset\ =\ \ 1 |
| .br |
| 3.\ <number of devices> consecutive chunks: offset\ =\ \ 2 |
| .br |
| : |
| .br |
| n.\ <number of devices> consecutive chunks: offset\ =\ n-1 |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an even number\ (4) of devices: |
| .TS |
| tab(;); |
| C - - - - |
| C | C | C | C | C | |
| | - | - | - | - | - | |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | C | C | C | C | C | L |
| | - | - | - | - | - | |
| C. |
| ; |
| ;Device #1;Device #2;Device #3;Device #4 |
| ; |
| 0x00;0;1;2;3;) AA |
| 0x01;3;0;1;2;) AA~ |
| 0x02;4;5;6;7;) AB |
| 0x03;7;4;5;6;) AB~ |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;) \.\.\. |
| :;:;:;:;:; : |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;) \.\.\. |
| 0x79;251;252;253;254;) EX |
| 0x80;254;251;252;253;) EX~ |
| ; |
| .TE |
| |
| .ne 10 |
| .B Example with 2\ copies per chunk and an odd number\ (5) of devices: |
| .TS |
| tab(;); |
| C - - - - - |
| C | C | C | C | C | C | |
| | - | - | - | - | - | - | |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | C | C | C | C | C | C | L |
| | - | - | - | - | - | - | |
| C. |
| ; |
| ;Dev #1;Dev #2;Dev #3;Dev #4;Dev #5 |
| ; |
| 0x00;0;1;2;3;4;) AA |
| 0x01;4;0;1;2;3;) AA~ |
| 0x02;5;6;7;8;9;) AB |
| 0x03;9;5;6;7;8;) AB~ |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;) \.\.\. |
| :;:;:;:;:;:; : |
| \.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;\.\.\.;) \.\.\. |
| 0x79;314;315;316;317;318;) EX |
| 0x80;318;314;315;316;317;) EX~ |
| ; |
| .TE |
| |
| With AA,\ AB,\ ..., AZ,\ BA,\ ... being the sets of <number of devices> consecutive |
| chunks and AA~,\ AB~,\ ..., AZ~,\ BA~,\ ... the cyclic permutations with offset\ 1 |
| thereof (in the case of more than 2 copies per chunk there would be (AA~)~,\ ... |
| as well as ((AA~)~)~,\ ... and so on). |
| |
| This should give similar read characteristics to "far" if a suitably large chunk |
| size is used, but without as much seeking for writes. |
| .PP |
| |
| |
| It should be noted that the number of devices in a RAID10 array need |
not be a multiple of the number of replicas of each data block; however,
| there must be at least as many devices as replicas. |
| |
| If, for example, an array is created with 5 devices and 2 replicas, |
| then space equivalent to 2.5 of the devices will be available, and |
| every block will be stored on two different devices. |
| |
| Finally, it is possible to have an array with both "near" and "far" |
| copies. If an array is configured with 2 near copies and 2 far |
| copies, then there will be a total of 4 copies of each block, each on |
| a different drive. This is an artifact of the implementation and is |
| unlikely to be of real value. |
| |
| .SS MULTIPATH |
| |
| MULTIPATH is not really a RAID at all as there is only one real device |
| in a MULTIPATH md array. However there are multiple access points |
| (paths) to this device, and one of these paths might fail, so there |
| are some similarities. |
| |
| A MULTIPATH array is composed of a number of logically different |
devices, often fibre channel interfaces, that all refer to the same
| real device. If one of these interfaces fails (e.g. due to cable |
| problems), the MULTIPATH driver will attempt to redirect requests to |
| another interface. |
| |
The MULTIPATH driver is not receiving any ongoing development and
| should be considered a legacy driver. The device-mapper based |
| multipath drivers should be preferred for new installations. |
| |
| .SS FAULTY |
| The FAULTY md module is provided for testing purposes. A FAULTY array |
| has exactly one component device and is normally assembled without a |
| superblock, so the md array created provides direct access to all of |
| the data in the component device. |
| |
| The FAULTY module may be requested to simulate faults to allow testing |
| of other md levels or of filesystems. Faults can be chosen to trigger |
| on read requests or write requests, and can be transient (a subsequent |
| read/write at the address will probably succeed) or persistent |
| (subsequent read/write of the same address will fail). Further, read |
| faults can be "fixable" meaning that they persist until a write |
| request at the same address. |
| |
| Fault types can be requested with a period. In this case, the fault |
| will recur repeatedly after the given number of requests of the |
| relevant type. For example if persistent read faults have a period of |
| 100, then every 100th read request would generate a fault, and the |
| faulty sector would be recorded so that subsequent reads on that |
| sector would also fail. |
| |
| There is a limit to the number of faulty sectors that are remembered. |
| Faults generated after this limit is exhausted are treated as |
| transient. |
| |
| The list of faulty sectors can be flushed, and the active list of |
| failure modes can be cleared. |
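.PP
A rough sketch (the exact option syntax is described in
.IR mdadm (8);
device names are examples): a FAULTY array is normally built without a
superblock, and its failure mode is then changed through the layout:
.PP
.RS 4
.nf
# Build a FAULTY array directly on top of one device (no superblock):
mdadm --build /dev/md1 --level=faulty --raid-devices=1 /dev/sdb1

# Request a persistent read fault roughly every 100th read request:
mdadm --grow /dev/md1 --layout=read-persistent100

# Remove the active failure modes again:
mdadm --grow /dev/md1 --layout=clear
.fi
.RE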
| |
| .SS UNCLEAN SHUTDOWN |
| |
| When changes are made to a RAID1, RAID4, RAID5, RAID6, or RAID10 array |
| there is a possibility of inconsistency for short periods of time as |
each update requires at least two blocks to be written to different
devices, and these writes probably won't happen at exactly the same
time. Thus if a system with one of these arrays is shut down in the
| middle of a write operation (e.g. due to power failure), the array may |
| not be consistent. |
| |
| To handle this situation, the md driver marks an array as "dirty" |
| before writing any data to it, and marks it as "clean" when the array |
| is being disabled, e.g. at shutdown. If the md driver finds an array |
to be dirty at startup, it proceeds to correct any possible
inconsistency. For RAID1, this involves copying the contents of the
| first drive onto all other drives. For RAID4, RAID5 and RAID6 this |
| involves recalculating the parity for each stripe and making sure that |
| the parity block has the correct data. For RAID10 it involves copying |
one of the replicas of each block onto all the others. This process,
known as "resynchronising" or "resync", is performed in the background.
| The array can still be used, though possibly with reduced performance. |
| |
| If a RAID4, RAID5 or RAID6 array is degraded (missing at least one |
| drive, two for RAID6) when it is restarted after an unclean shutdown, it cannot |
| recalculate parity, and so it is possible that data might be |
| undetectably corrupted. The 2.4 md driver |
| .B does not |
| alert the operator to this condition. The 2.6 md driver will fail to |
| start an array in this condition without manual intervention, though |
| this behaviour can be overridden by a kernel parameter. |
| |
| .SS RECOVERY |
| |
| If the md driver detects a write error on a device in a RAID1, RAID4, |
| RAID5, RAID6, or RAID10 array, it immediately disables that device |
| (marking it as faulty) and continues operation on the remaining |
| devices. If there are spare drives, the driver will start recreating |
| on one of the spare drives the data which was on that failed drive, |
| either by copying a working drive in a RAID1 configuration, or by |
| doing calculations with the parity block on RAID4, RAID5 or RAID6, or |
| by finding and copying originals for RAID10. |
| |
| In kernels prior to about 2.6.15, a read error would cause the same |
| effect as a write error. In later kernels, a read-error will instead |
cause md to attempt a recovery by overwriting the bad block; i.e. it
will find the correct data from elsewhere, write it over the block
that failed, and then try to read it back again. If either the write
or the re-read fails, md will treat the error the same way that a write
| error is treated, and will fail the whole device. |
| |
| While this recovery process is happening, the md driver will monitor |
| accesses to the array and will slow down the rate of recovery if other |
| activity is happening, so that normal access to the array will not be |
| unduly affected. When no other activity is happening, the recovery |
| process proceeds at full speed. The actual speed targets for the two |
| different situations can be controlled by the |
| .B speed_limit_min |
| and |
| .B speed_limit_max |
| control files mentioned below. |
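.PP
For example (the array name is illustrative), the limits can be adjusted
system-wide or for a single array:
.PP
.RS 4
.nf
# System-wide limits, in KiB per second per device:
echo 50000  > /proc/sys/dev/raid/speed_limit_min
echo 200000 > /proc/sys/dev/raid/speed_limit_max

# Override the minimum for one array only:
echo 100000 > /sys/block/md0/md/sync_speed_min
.fi
.RE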
| |
| .SS SCRUBBING AND MISMATCHES |
| |
| As storage devices can develop bad blocks at any time it is valuable |
| to regularly read all blocks on all devices in an array so as to catch |
| such bad blocks early. This process is called |
| .IR scrubbing . |
| |
| md arrays can be scrubbed by writing either |
| .I check |
| or |
| .I repair |
| to the file |
| .I md/sync_action |
| in the |
| .I sysfs |
| directory for the device. |
| |
| Requesting a scrub will cause |
| .I md |
| to read every block on every device in the array, and check that the |
| data is consistent. For RAID1 and RAID10, this means checking that the copies |
| are identical. For RAID4, RAID5, RAID6 this means checking that the |
| parity block is (or blocks are) correct. |
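.PP
For example (the array name is illustrative), a read-only check can be
started and its result inspected with:
.PP
.RS 4
.nf
# Start a consistency check of /dev/md0:
echo check > /sys/block/md0/md/sync_action

# Watch progress, then see how many sectors did not match:
cat /proc/mdstat
cat /sys/block/md0/md/mismatch_cnt
.fi
.RE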
| |
| If a read error is detected during this process, the normal read-error |
| handling causes correct data to be found from other devices and to be |
written back to the faulty device. In many cases this will
| effectively |
| .I fix |
| the bad block. |
| |
| If all blocks read successfully but are found to not be consistent, |
| then this is regarded as a |
| .IR mismatch . |
| |
| If |
| .I check |
was used, then no action is taken to handle the mismatch; it is simply
recorded.
| If |
| .I repair |
| was used, then a mismatch will be repaired in the same way that |
| .I resync |
| repairs arrays. For RAID5/RAID6 new parity blocks are written. For RAID1/RAID10, |
| all but one block are overwritten with the content of that one block. |
| |
| A count of mismatches is recorded in the |
| .I sysfs |
| file |
| .IR md/mismatch_cnt . |
| This is set to zero when a |
| scrub starts and is incremented whenever a sector is |
| found that is a mismatch. |
| .I md |
| normally works in units much larger than a single sector and when it |
| finds a mismatch, it does not determine exactly how many actual sectors were |
| affected but simply adds the number of sectors in the IO unit that was |
| used. So a value of 128 could simply mean that a single 64KB check |
found an error (128 x 512 bytes = 64KB).
| |
| If an array is created by |
| .I mdadm |
| with |
| .I \-\-assume\-clean |
| then a subsequent check could be expected to find some mismatches. |
| |
| On a truly clean RAID5 or RAID6 array, any mismatches should indicate |
| a hardware problem at some level - software issues should never cause |
| such a mismatch. |
| |
| However on RAID1 and RAID10 it is possible for software issues to |
| cause a mismatch to be reported. This does not necessarily mean that |
| the data on the array is corrupted. It could simply be that the |
| system does not care what is stored on that part of the array - it is |
| unused space. |
| |
| The most likely cause for an unexpected mismatch on RAID1 or RAID10 |
| occurs if a swap partition or swap file is stored on the array. |
| |
| When the swap subsystem wants to write a page of memory out, it flags |
| the page as 'clean' in the memory manager and requests the swap device |
| to write it out. It is quite possible that the memory will be |
| changed while the write-out is happening. In that case the 'clean' |
| flag will be found to be clear when the write completes and so the |
| swap subsystem will simply forget that the swapout had been attempted, |
| and will possibly choose a different page to write out. |
| |
| If the swap device was on RAID1 (or RAID10), then the data is sent |
| from memory to a device twice (or more depending on the number of |
| devices in the array). Thus it is possible that the memory gets changed |
| between the times it is sent, so different data can be written to |
| the different devices in the array. This will be detected by |
| .I check |
| as a mismatch. However it does not reflect any corruption as the |
| block where this mismatch occurs is being treated by the swap system as |
| being empty, and the data will never be read from that block. |
| |
| It is conceivable for a similar situation to occur on non-swap files, |
| though it is less likely. |
| |
| Thus the |
| .I mismatch_cnt |
value cannot be interpreted very reliably on RAID1 or RAID10,
| especially when the device is used for swap. |
| |
| |
| .SS BITMAP WRITE-INTENT LOGGING |
| |
| From Linux 2.6.13, |
| .I md |
| supports a bitmap based write-intent log. If configured, the bitmap |
| is used to record which blocks of the array may be out of sync. |
| Before any write request is honoured, md will make sure that the |
| corresponding bit in the log is set. After a period of time with no |
| writes to an area of the array, the corresponding bit will be cleared. |
| |
| This bitmap is used for two optimisations. |
| |
| Firstly, after an unclean shutdown, the resync process will consult |
| the bitmap and only resync those blocks that correspond to bits in the |
| bitmap that are set. This can dramatically reduce resync time. |
| |
| Secondly, when a drive fails and is removed from the array, md stops |
| clearing bits in the intent log. If that same drive is re-added to |
| the array, md will notice and will only recover the sections of the |
| drive that are covered by bits in the intent log that are set. This |
| can allow a device to be temporarily removed and reinserted without |
| causing an enormous recovery cost. |
| |
| The intent log can be stored in a file on a separate device, or it can |
| be stored near the superblocks of an array which has superblocks. |
| |
| It is possible to add an intent log to an active array, or remove an |
| intent log if one is present. |
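.PP
For example (the array name is illustrative),
.I mdadm
can add or remove an internal bitmap on a live array:
.PP
.RS 4
.nf
# Add an internal write-intent bitmap to a running array:
mdadm --grow /dev/md0 --bitmap=internal

# Remove it again:
mdadm --grow /dev/md0 --bitmap=none
.fi
.RE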
| |
| In 2.6.13, intent bitmaps are only supported with RAID1. Other levels |
| with redundancy are supported from 2.6.15. |
| |
| .SS BAD BLOCK LIST |
| |
| From Linux 3.5 each device in an |
| .I md |
| array can store a list of known-bad-blocks. This list is 4K in size |
| and usually positioned at the end of the space between the superblock |
| and the data. |
| |
| When a block cannot be read and cannot be repaired by writing data |
| recovered from other devices, the address of the block is stored in |
| the bad block list. Similarly if an attempt to write a block fails, |
| the address will be recorded as a bad block. If attempting to record |
| the bad block fails, the whole device will be marked faulty. |
| |
| Attempting to read from a known bad block will cause a read error. |
| Attempting to write to a known bad block will be ignored if any write |
| errors have been reported by the device. If there have been no write |
| errors then the data will be written to the known bad block and if |
| that succeeds, the address will be removed from the list. |
| |
| This allows an array to fail more gracefully - a few blocks on |
| different devices can be faulty without taking the whole array out of |
| action. |
| |
| The list is particularly useful when recovering to a spare. If a few blocks |
| cannot be read from the other devices, the bulk of the recovery can |
| complete and those few bad blocks will be recorded in the bad block list. |
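.PP
The recorded bad blocks of each member can be inspected through sysfs
(names are examples; see Documentation/admin-guide/md.rst):
.PP
.RS 4
.nf
# Known bad blocks (sector offset and length) for one member device:
cat /sys/block/md0/md/dev-sda1/bad_blocks

# Bad blocks seen but not yet recorded in the metadata:
cat /sys/block/md0/md/dev-sda1/unacknowledged_bad_blocks
.fi
.RE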
| |
| .SS RAID WRITE HOLE |
| |
Due to the non-atomic nature of RAID write operations, an
interruption of write operations (system crash, etc.) to a RAID456
array can lead to inconsistent parity and data loss (the so-called
RAID-5 write hole).
To plug the write hole md supports two mechanisms described below.
| |
| .TP |
| DIRTY STRIPE JOURNAL |
From Linux 4.4, md supports a write-ahead journal for RAID456.
When the array is created, an additional journal device can be added to
the array through the write-journal option. The RAID write journal works
similarly to file system journals. Before writing to the data
disks, md persists data AND parity of the stripe to the journal
device. After crashes, md searches the journal device for
incomplete write operations, and replays them to the data disks.
| |
| When the journal device fails, the RAID array is forced to run in |
| read-only mode. |
| |
| .TP |
| PARTIAL PARITY LOG |
From Linux 4.12 md supports a Partial Parity Log (PPL) for RAID5 arrays only.
Partial parity for a write operation is the XOR of stripe data chunks not
modified by the write. PPL is stored in the metadata region of the RAID
member drives, so no additional journal drive is needed.
After a crash, if one of the unmodified data disks of
the stripe is missing, this partial parity can be used to recover its
data.
| |
| This mechanism is documented more fully in the file |
| Documentation/md/raid5-ppl.rst |
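.PP
As an illustration (device names are examples), either mechanism is enabled
at creation time with
.IR mdadm (8):
.PP
.RS 4
.nf
# RAID5 with a dedicated write-ahead journal device:
mdadm --create /dev/md0 -l 5 -n 3 --write-journal /dev/nvme0n1p1 /dev/sd[b-d]1

# RAID5 using a Partial Parity Log in the member metadata:
mdadm --create /dev/md0 -l 5 -n 3 --consistency-policy=ppl /dev/sd[b-d]1
.fi
.RE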
| |
| .SS WRITE-BEHIND |
| |
| From Linux 2.6.14, |
| .I md |
| supports WRITE-BEHIND on RAID1 arrays. |
| |
| This allows certain devices in the array to be flagged as |
| .IR write-mostly . |
| MD will only read from such devices if there is no |
| other option. |
| |
| If a write-intent bitmap is also provided, write requests to |
write-mostly devices will be treated as write-behind requests and md
will not wait for those writes to complete before
reporting the write as complete to the filesystem.
| |
| This allows for a RAID1 with WRITE-BEHIND to be used to mirror data |
| over a slow link to a remote computer (providing the link isn't too |
| slow). The extra latency of the remote link will not slow down normal |
| operations, but the remote system will still have a reasonably |
| up-to-date copy of all data. |
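.PP
A minimal sketch (names are examples; the "remote" member might be a
network block device): a two-device RAID1 where the remote member is
write-mostly, with an internal bitmap so its writes can be handled
write-behind:
.PP
.RS 4
.nf
mdadm --create /dev/md0 -l 1 -n 2 --bitmap=internal \e
      --write-behind=256 /dev/sdb1 --write-mostly /dev/sdc1
.fi
.RE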
| |
| .SS FAILFAST |
| |
| From Linux 4.10, |
.I md
| supports FAILFAST for RAID1 and RAID10 arrays. This is a flag that |
| can be set on individual drives, though it is usually set on all |
| drives, or no drives. |
| |
| When |
| .I md |
| sends an I/O request to a drive that is marked as FAILFAST, and when |
| the array could survive the loss of that drive without losing data, |
| .I md |
| will request that the underlying device does not perform any retries. |
| This means that a failure will be reported to |
| .I md |
| promptly, and it can mark the device as faulty and continue using the |
| other device(s). |
| .I md |
| cannot control the timeout that the underlying devices use to |
| determine failure. Any changes desired to that timeout must be set |
explicitly on the underlying device, separately from using
| .IR mdadm . |
| |
| If a FAILFAST request does fail, and if it is still safe to mark the |
| device as faulty without data loss, that will be done and the array |
| will continue functioning on a reduced number of devices. If it is not |
| possible to safely mark the device as faulty, |
| .I md |
| will retry the request without disabling retries in the underlying |
| device. In any case, |
| .I md |
| will not attempt to repair read errors on a device marked as FAILFAST |
by writing out the correct data. It will just mark the device as faulty.
| |
| FAILFAST is appropriate for storage arrays that have a low probability |
| of true failure, but will sometimes introduce unacceptable delays to |
| I/O requests while performing internal maintenance. The value of |
| setting FAILFAST involves a trade-off. The gain is that the chance of |
| unacceptable delays is substantially reduced. The cost is that the |
| unlikely event of data-loss on one device is slightly more likely to |
| result in data-loss for the array. |
| |
| When a device in an array using FAILFAST is marked as faulty, it will |
| usually become usable again in a short while. |
| .I mdadm |
| makes no attempt to detect that possibility. Some separate |
| mechanism, tuned to the specific details of the expected failure modes, |
| needs to be created to monitor devices to see when they return to full |
functionality, and to then re-add them to the array. In order for
| this "re-add" functionality to be effective, an array using FAILFAST |
| should always have a write-intent bitmap. |
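.PP
A sketch (device names are examples) of how the flag is typically applied,
and how a recovered device might later be returned to service:
.PP
.RS 4
.nf
# RAID1 whose members are marked FAILFAST, with a write-intent bitmap:
mdadm --create /dev/md0 -l 1 -n 2 --bitmap=internal \e
      --failfast /dev/sdb1 /dev/sdc1

# Once an ejected device is healthy again, return it; the bitmap keeps
# the recovery short:
mdadm /dev/md0 --re-add /dev/sdb1
.fi
.RE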
| |
| .SS RESTRIPING |
| |
| .IR Restriping , |
| also known as |
| .IR Reshaping , |
is the process of re-arranging the data stored in each stripe into a
| new layout. This might involve changing the number of devices in the |
| array (so the stripes are wider), changing the chunk size (so stripes |
| are deeper or shallower), or changing the arrangement of data and |
| parity (possibly changing the RAID level, e.g. 1 to 5 or 5 to 6). |
| |
| As of Linux 2.6.35, md can reshape a RAID4, RAID5, or RAID6 array to |
| have a different number of devices (more or fewer) and to have a |
| different layout or chunk size. It can also convert between these |
| different RAID levels. It can also convert between RAID0 and RAID10, |
| and between RAID0 and RAID4 or RAID5. |
| Other possibilities may follow in future kernels. |
| |
During any restripe process there is a 'critical section' during which
| live data is being overwritten on disk. For the operation of |
| increasing the number of drives in a RAID5, this critical section |
| covers the first few stripes (the number being the product of the old |
| and new number of devices). After this critical section is passed, |
| data is only written to areas of the array which no longer hold live |
data \(em the live data has already been relocated away.
| |
| For a reshape which reduces the number of devices, the 'critical |
| section' is at the end of the reshape process. |
| |
| md is not able to ensure data preservation if there is a crash |
| (e.g. power failure) during the critical section. If md is asked to |
| start an array which failed during a critical section of restriping, |
| it will fail to start the array. |
| |
| To deal with this possibility, a user-space program must |
| .IP \(bu 4 |
| Disable writes to that section of the array (using the |
| .B sysfs |
| interface), |
| .IP \(bu 4 |
| take a copy of the data somewhere (i.e. make a backup), |
| .IP \(bu 4 |
| allow the process to continue and invalidate the backup and restore |
| write access once the critical section is passed, and |
| .IP \(bu 4 |
| provide for restoring the critical data before restarting the array |
| after a system crash. |
| .PP |
| |
| .B mdadm |
| versions from 2.4 do this for growing a RAID5 array. |
| |
| For operations that do not change the size of the array, like simply |
| increasing chunk size, or converting RAID5 to RAID6 with one extra |
| device, the entire process is the critical section. In this case, the |
| restripe will need to progress in stages, as a section is suspended, |
| backed up, restriped, and released. |
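.PP
For example (names are illustrative), growing a RAID5 from 3 to 4 devices
with a backup file to protect the critical section:
.PP
.RS 4
.nf
mdadm /dev/md0 --add /dev/sde1
mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/md0-grow.backup
.fi
.RE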
| |
| .SS SYSFS INTERFACE |
| Each block device appears as a directory in |
| .I sysfs |
| (which is usually mounted at |
| .BR /sys ). |
| For MD devices, this directory will contain a subdirectory called |
| .B md |
| which contains various files for providing access to information about |
| the array. |
| |
| This interface is documented more fully in the file |
| .B Documentation/admin-guide/md.rst |
| which is distributed with the kernel sources. That file should be |
| consulted for full documentation. The following are just a selection |
| of attribute files that are available. |
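.PP
For example (the array name is illustrative), the attributes can simply be
listed and read:
.PP
.RS 4
.nf
ls /sys/block/md0/md/
cat /sys/block/md0/md/level
cat /sys/block/md0/md/sync_action
.fi
.RE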
| |
| .TP |
| .B md/sync_speed_min |
| This value, if set, overrides the system-wide setting in |
| .B /proc/sys/dev/raid/speed_limit_min |
| for this array only. |
| Writing the value |
| .B "system" |
| to this file will cause the system-wide setting to have effect. |
| |
| .TP |
| .B md/sync_speed_max |
| This is the partner of |
| .B md/sync_speed_min |
| and overrides |
| .B /proc/sys/dev/raid/speed_limit_max |
| described below. |
| |
| .TP |
| .B md/sync_action |
| This can be used to monitor and control the resync/recovery process of |
| MD. |
In particular, writing "check" here will cause the array to read all
data blocks and check that they are consistent (e.g. parity is correct,
| or all mirror replicas are the same). Any discrepancies found are |
| .B NOT |
| corrected. |
| |
| A count of problems found will be stored in |
.BR md/mismatch_cnt .
| |
| Alternately, "repair" can be written which will cause the same check |
| to be performed, but any errors will be corrected. |
| |
| Finally, "idle" can be written to stop the check/repair process. |
| |
| .TP |
| .B md/stripe_cache_size |
| This is only available on RAID5 and RAID6. It records the size (in |
| pages per device) of the stripe cache which is used for synchronising |
| all write operations to the array and all read operations if the array |
| is degraded. The default is 256. Valid values are 17 to 32768. |
| Increasing this number can increase performance in some situations, at |
| some cost in system memory. Note, setting this value too high can |
| result in an "out of memory" condition for the system. |
| |
| memory_consumed = system_page_size * nr_disks * stripe_cache_size |
| |
| .TP |
| .B md/preread_bypass_threshold |
| This is only available on RAID5 and RAID6. This variable sets the |
| number of times MD will service a full-stripe-write before servicing a |
| stripe that requires some "prereading". For fairness this defaults to |
| 1. Valid values are 0 to stripe_cache_size. Setting this to 0 |
| maximizes sequential-write throughput at the cost of fairness to threads |
| doing small or random writes. |
| |
| .TP |
| .B md/bitmap/backlog |
The value stored in this file only has any effect on RAID1 when write-mostly
devices are active, and write requests to those devices are processed in the
background.
| |
This variable sets a limit on the number of concurrent background writes.
The valid values are 0 to 16383; 0 means that write-behind is not allowed,
while any other number means it can happen. If there are more write requests
than this number, new writes will be synchronous.
| |
| .TP |
| .B md/bitmap/can_clear |
| This is for externally managed bitmaps, where the kernel writes the bitmap |
| itself, but metadata describing the bitmap is managed by mdmon or similar. |
| |
| When the array is degraded, bits mustn't be cleared. When the array becomes |
optimal again, bits can be cleared, but first the metadata needs to record
| the current event count. So md sets this to 'false' and notifies mdmon, |
| then mdmon updates the metadata and writes 'true'. |
| |
| There is no code in mdmon to actually do this, so maybe it doesn't even |
| work. |
| |
| .TP |
| .B md/bitmap/chunksize |
| The bitmap chunksize can only be changed when no bitmap is active, and |
the value should be a power of 2 and at least 512.
| |
| .TP |
| .B md/bitmap/location |
| This indicates where the write-intent bitmap for the array is stored. |
| It can be "none" or "file" or a signed offset from the array metadata |
| - measured in sectors. You cannot set a file by writing here - that can |
| only be done with the SET_BITMAP_FILE ioctl. |
| |
Writing 'none' to 'bitmap/location' will clear the bitmap, and the previous
location value must be written to it to restore the bitmap.
| |
| .TP |
| .B md/bitmap/max_backlog_used |
This keeps track of the maximum number of concurrent write-behind requests
for an md array. Writing any value to this file will clear it.
| |
| .TP |
| .B md/bitmap/metadata |
| This can be 'internal' or 'clustered' or 'external'. 'internal' is set |
by default, which means the metadata for the bitmap is stored in the first 256
| bytes of the bitmap space. 'clustered' means separate bitmap metadata are |
| used for each cluster node. 'external' means that bitmap metadata is managed |
| externally to the kernel. |
| |
| .TP |
| .B md/bitmap/space |
| This shows the space (in sectors) which is available at md/bitmap/location, |
| and allows the kernel to know when it is safe to resize the bitmap to match |
a resized array. It should be big enough to contain the total bytes in the bitmap.
| |
For 1.0 metadata, the bitmap may use the space up to the superblock if it is
located before the superblock, and otherwise up to 4K beyond the superblock.
For other metadata versions, assume no change is possible.
| |
| .TP |
| .B md/bitmap/time_base |
This shows the time (in seconds) between disk flushes, and is used when
looking for bits in the bitmap that can be cleared.
| |
| The default value is 5 seconds, and it should be an unsigned long value. |
| |
| .SS KERNEL PARAMETERS |
| |
The md driver recognises several different kernel parameters.
| .TP |
| .B raid=noautodetect |
| This will disable the normal detection of md arrays that happens at |
| boot time. If a drive is partitioned with MS-DOS style partitions, |
| then if any of the 4 main partitions has a partition type of 0xFD, |
| then that partition will normally be inspected to see if it is part of |
| an MD array, and if any full arrays are found, they are started. This |
| kernel parameter disables this behaviour. |
| |
| .TP |
| .B raid=partitionable |
| .TP |
| .B raid=part |
| These are available in 2.6 and later kernels only. They indicate that |
| autodetected MD arrays should be created as partitionable arrays, with |
| a different major device number to the original non-partitionable md |
| arrays. The device number is listed as |
| .I mdp |
| in |
| .IR /proc/devices . |
| |
| .TP |
| .B md_mod.start_ro=1 |
| .TP |
| .B /sys/module/md_mod/parameters/start_ro |
| This tells md to start all arrays in read-only mode. This is a soft |
| read-only that will automatically switch to read-write on the first |
| write request. However until that write request, nothing is written |
| to any device by md, and in particular, no resync or recovery |
| operation is started. |
| |
| .TP |
| .B md_mod.start_dirty_degraded=1 |
| .TP |
| .B /sys/module/md_mod/parameters/start_dirty_degraded |
| As mentioned above, md will not normally start a RAID4, RAID5, or |
| RAID6 that is both dirty and degraded as this situation can imply |
| hidden data loss. This can be awkward if the root filesystem is |
| affected. Using this module parameter allows such arrays to be started |
| at boot time. It should be understood that there is a real (though |
| small) risk of data corruption in this situation. |
| |
| .TP |
| .BI md= n , dev , dev ,... |
| .TP |
| .BI md=d n , dev , dev ,... |
| This tells the md driver to assemble |
| .B /dev/md n |
| from the listed devices. It is only necessary to start the device |
| holding the root filesystem this way. Other arrays are best started |
| once the system is booted. |
| |
| In 2.6 kernels, the |
| .B d |
| immediately after the |
| .B = |
| indicates that a partitionable device (e.g. |
| .BR /dev/md/d0 ) |
| should be created rather than the original non-partitionable device. |
| |
| .TP |
| .BI md= n , l , c , i , dev... |
| This tells the md driver to assemble a legacy RAID0 or LINEAR array |
| without a superblock. |
| .I n |
| gives the md device number, |
| .I l |
| gives the level, 0 for RAID0 or \-1 for LINEAR, |
| .I c |
| gives the chunk size as a base-2 logarithm offset by twelve, so 0 |
| means 4K, 1 means 8K. |
| .I i |
| is ignored (legacy support). |
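.PP
As an illustration (device names are examples), a root-holding legacy array
might be handled with kernel command line fragments such as:
.PP
.RS 4
.nf
# Disable autodetection and assemble /dev/md0 from two listed components:
raid=noautodetect md=0,/dev/sda1,/dev/sdb1

# Start arrays read-only and allow a dirty degraded array to start:
md_mod.start_ro=1 md_mod.start_dirty_degraded=1
.fi
.RE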
| |
| .SH FILES |
| .TP |
| .B /proc/mdstat |
Contains information about the status of currently running arrays.
| .TP |
| .B /proc/sys/dev/raid/speed_limit_min |
| A readable and writable file that reflects the current "goal" rebuild |
| speed for times when non-rebuild activity is current on an array. |
| The speed is in Kibibytes per second, and is a per-device rate, not a |
| per-array rate (which means that an array with more disks will shuffle |
| more data for a given speed). The default is 1000. |
| |
| .TP |
| .B /proc/sys/dev/raid/speed_limit_max |
| A readable and writable file that reflects the current "goal" rebuild |
| speed for times when no non-rebuild activity is current on an array. |
| The default is 200,000. |
| |
| .SH SEE ALSO |
| .BR mdadm (8), |