|  | Automatically bind swap device to numa node | 
|  | ------------------------------------------- | 
|  |  | 
|  | If the system has more than one swap device and swap device has the node | 
|  | information, we can make use of this information to decide which swap | 
|  | device to use in get_swap_pages() to get better performance. | 
|  |  | 
|  |  | 
|  | How to use this feature | 
|  | ----------------------- | 
|  |  | 
|  | Swap device has priority and that decides the order of it to be used. To make | 
|  | use of automatically binding, there is no need to manipulate priority settings | 
|  | for swap devices. e.g. on a 2 node machine, assume 2 swap devices swapA and | 
|  | swapB, with swapA attached to node 0 and swapB attached to node 1, are going | 
|  | to be swapped on. Simply swapping them on by doing: | 
|  | # swapon /dev/swapA | 
|  | # swapon /dev/swapB | 
|  |  | 
|  | Then node 0 will use the two swap devices in the order of swapA then swapB and | 
|  | node 1 will use the two swap devices in the order of swapB then swapA. Note | 
|  | that the order of them being swapped on doesn't matter. | 
|  |  | 
|  | A more complex example on a 4 node machine. Assume 6 swap devices are going to | 
|  | be swapped on: swapA and swapB are attached to node 0, swapC is attached to | 
|  | node 1, swapD and swapE are attached to node 2 and swapF is attached to node3. | 
|  | The way to swap them on is the same as above: | 
|  | # swapon /dev/swapA | 
|  | # swapon /dev/swapB | 
|  | # swapon /dev/swapC | 
|  | # swapon /dev/swapD | 
|  | # swapon /dev/swapE | 
|  | # swapon /dev/swapF | 
|  |  | 
|  | Then node 0 will use them in the order of: | 
|  | swapA/swapB -> swapC -> swapD -> swapE -> swapF | 
|  | swapA and swapB will be used in a round robin mode before any other swap device. | 
|  |  | 
|  | node 1 will use them in the order of: | 
|  | swapC -> swapA -> swapB -> swapD -> swapE -> swapF | 
|  |  | 
|  | node 2 will use them in the order of: | 
|  | swapD/swapE -> swapA -> swapB -> swapC -> swapF | 
|  | Similaly, swapD and swapE will be used in a round robin mode before any | 
|  | other swap devices. | 
|  |  | 
|  | node 3 will use them in the order of: | 
|  | swapF -> swapA -> swapB -> swapC -> swapD -> swapE | 
|  |  | 
|  |  | 
|  | Implementation details | 
|  | ---------------------- | 
|  |  | 
|  | The current code uses a priority based list, swap_avail_list, to decide | 
|  | which swap device to use and if multiple swap devices share the same | 
|  | priority, they are used round robin. This change here replaces the single | 
|  | global swap_avail_list with a per-numa-node list, i.e. for each numa node, | 
|  | it sees its own priority based list of available swap devices. Swap | 
|  | device's priority can be promoted on its matching node's swap_avail_list. | 
|  |  | 
|  | The current swap device's priority is set as: user can set a >=0 value, | 
|  | or the system will pick one starting from -1 then downwards. The priority | 
|  | value in the swap_avail_list is the negated value of the swap device's | 
|  | due to plist being sorted from low to high. The new policy doesn't change | 
|  | the semantics for priority >=0 cases, the previous starting from -1 then | 
|  | downwards now becomes starting from -2 then downwards and -1 is reserved | 
|  | as the promoted value. So if multiple swap devices are attached to the same | 
|  | node, they will all be promoted to priority -1 on that node's plist and will | 
|  | be used round robin before any other swap devices. |