| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2022-49394: blk-iolatency: Fix inflight count imbalances and IO hangs on offline |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| blk-iolatency: Fix inflight count imbalances and IO hangs on offline |
| |
| iolatency needs to track the number of inflight IOs per cgroup. As this |
| tracking can be expensive, it is disabled when no cgroup has iolatency |
| configured for the device. To ensure that the inflight counters stay |
| balanced, iolatency_set_limit() freezes the request_queue while manipulating |
| the enabled counter, which ensures that no IO is in flight and thus all |
| counters are zero. |
| |
| Unfortunately, iolatency_set_limit() isn't the only place where the enabled |
| counter is manipulated. iolatency_pd_offline() can also decrement the |
| counter and trigger disabling. As this disabling happens without freezing |
| the queue, it can easily happen while some IOs are in flight and thus leak |
| the counts. |
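| |
| The imbalance described above can be illustrated with a toy model. This is |
| only a sketch, not the kernel code: ToyIolatency, its fields, and drain() |
| are hypothetical names, and it assumes that both submission and completion |
| skip the accounting while the enabled flag is off. |

```python
class ToyIolatency:
    """Toy model of inflight accounting gated by an enable flag."""

    def __init__(self):
        self.enabled = False   # stands in for the device-wide enabled state
        self.inflight = 0      # tracked inflight count
        self.outstanding = 0   # IOs actually on the device

    def submit(self):
        self.outstanding += 1
        if self.enabled:       # accounting is skipped while disabled
            self.inflight += 1

    def complete(self):
        self.outstanding -= 1
        if self.enabled:       # assumption: completions are skipped too
            self.inflight -= 1

    def drain(self):
        """Stand-in for freezing the queue: finish every outstanding IO."""
        while self.outstanding:
            self.complete()


def disable_without_freeze():
    lat = ToyIolatency()
    lat.enabled = True
    for _ in range(3):
        lat.submit()
    lat.enabled = False        # flag flipped while IOs are in flight
    for _ in range(3):
        lat.complete()         # these completions are never accounted
    return lat.inflight


def disable_with_freeze():
    lat = ToyIolatency()
    lat.enabled = True
    for _ in range(3):
        lat.submit()
    lat.drain()                # no IO in flight when the flag flips
    lat.enabled = False
    return lat.inflight
```

| In this model, disable_without_freeze() leaves the counter above zero even |
| though no IO remains on the device, while disable_with_freeze() returns it |
| to zero. |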
| |
| This can be easily demonstrated by turning on iolatency on one empty |
| cgroup while IOs are in flight in other cgroups and then removing the |
| cgroup. Note that iolatency shouldn't have been enabled elsewhere in the |
| system to ensure that removing the cgroup disables iolatency for the whole |
| device. |
| |
| The following keeps flipping on and off iolatency on sda: |
| |
| echo +io > /sys/fs/cgroup/cgroup.subtree_control |
| while true; do |
|     mkdir -p /sys/fs/cgroup/test |
|     echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency |
|     sleep 1 |
|     rmdir /sys/fs/cgroup/test |
|     sleep 1 |
| done |
| |
| and there's concurrent fio generating direct rand reads: |
| |
| fio --name test --filename=/dev/sda --direct=1 --rw=randread \ |
| --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k |
| |
| while monitoring with the following drgn script: |
| |
| import time |
| |
| from drgn import container_of |
| from drgn.helpers.linux import * |
| |
| # `prog` is provided by the drgn CLI. |
| while True: |
|     # Walk every blkcg and each of its per-device blkgs. |
|     for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()): |
|         for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list): |
|             blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node') |
|             pd = blkg.pd[prog['blkcg_policy_iolatency'].plid] |
|             if pd.value_() == 0: |
|                 continue |
|             iolat = container_of(pd, 'struct iolatency_grp', 'pd') |
|             inflight = iolat.rq_wait.inflight.counter.value_() |
|             # Report only groups with a non-zero inflight count. |
|             if inflight: |
|                 print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} ' |
|                       f'{cgroup_path(css.cgroup).decode("utf-8")}') |
|     time.sleep(1) |
| |
| The monitoring output looks like the following: |
| |
| inflight=1 sda /user.slice |
| inflight=1 sda /user.slice |
| ... |
| inflight=14 sda /user.slice |
| inflight=13 sda /user.slice |
| inflight=17 sda /user.slice |
| inflight=15 sda /user.slice |
| inflight=18 sda /user.slice |
| inflight=17 sda /user.slice |
| inflight=20 sda /user.slice |
| inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19 |
| inflight=19 sda /user.slice |
| inflight=19 sda /user.slice |
| |
| If a cgroup with a stuck inflight count ends up getting throttled, the |
| throttled IOs will never be issued because no completion event arrives to |
| wake them up, leading to an indefinite hang. |
| |
| This patch fixes the bug by unifying enable handling into a work item that |
| is automatically kicked off from iolatency_set_min_lat_nsec(), which is |
| called from both the iolatency_set_limit() and iolatency_pd_offline() paths. |
| Punting to a work item is necessary because iolatency_pd_offline() is called |
| under spinlocks, while freezing a request_queue requires a sleepable context. |
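| |
| The deferral pattern can be sketched in the same toy style. This is an |
| illustration under stated assumptions, not the patch itself: ToyDevice, |
| set_min_lat(), run_pending() and the other names are hypothetical, a deque |
| stands in for the kernel work item, and the "freeze" simply completes all |
| outstanding IOs instead of blocking new submissions. |

```python
from collections import deque


class ToyDevice:
    """Toy model: enable/disable is decided in deferred work, not inline."""

    def __init__(self):
        self.enabled = False
        self.enable_cnt = 0    # number of cgroups with a latency target
        self.inflight = 0      # tracked inflight count
        self.outstanding = 0   # IOs actually on the device
        self.work = deque()    # stand-in for a scheduled work item

    def submit(self):
        self.outstanding += 1
        if self.enabled:
            self.inflight += 1

    def complete(self):
        self.outstanding -= 1
        if self.enabled:
            self.inflight -= 1

    def set_min_lat(self, old, new):
        """Safe in a non-sleepable context: only counts and schedules."""
        if not old and new:
            self.enable_cnt += 1
        elif old and not new:
            self.enable_cnt -= 1
        self.work.append(self._enable_work)

    def _enable_work(self):
        """Runs later, where sleeping (and thus freezing) is allowed."""
        want = self.enable_cnt > 0
        if want != self.enabled:   # freeze only when the state changes
            self._freeze()
            self.enabled = want

    def _freeze(self):
        while self.outstanding:
            self.complete()

    def run_pending(self):
        while self.work:
            self.work.popleft()()


def offline_with_io_in_flight():
    dev = ToyDevice()
    dev.set_min_lat(0, 100000)     # configure a target
    dev.run_pending()
    for _ in range(3):
        dev.submit()
    dev.set_min_lat(100000, 0)     # offline path: nothing flips here
    still_enabled = dev.enabled
    dev.run_pending()              # deferred work drains, then disables
    return still_enabled, dev.enabled, dev.inflight
```

| In this model the offline path leaves the enabled flag untouched; the flag |
| changes only in the deferred work, after the outstanding IOs have drained, |
| so the tracked count ends at zero. |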
| |
| This also simplifies the code, reducing LOC (not counting comments), and |
| avoids the unnecessary freezes that used to happen whenever a cgroup's |
| latency target was newly set or cleared. |
| |
| The Linux kernel CVE team has assigned CVE-2022-49394 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 4.19.29 with commit 6d482bc5697763eb1214f207286daa201b32d20a and fixed in 4.19.247 with commit 515d077ee3085ae343b6bea7fd031f9906645f38 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.4.198 with commit d19fa8f252000d141f9199ca32959c50314e1f05 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.10.121 with commit 77692c02e1517c54f2fd0535f41aa4286ac9f140 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.15.46 with commit a30acbb5dfb7bcc813ad6a18ca31011ac44e5547 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.17.14 with commit 968f7a239c590454ffba79c126fbe0e963a0ba78 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.18.3 with commit 5b0ff3ebbef791341695b718f8d2870869cf1d01 |
| Issue introduced in 5.0 with commit 8c772a9bfc7c07c76f4a58b58910452fbb20843b and fixed in 5.19 with commit 8a177a36da6c54c98b8685d4f914cb3637d53c0d |
| Issue introduced in 4.20.16 with commit beed6109acd4efc2f1717c31bddcd0ad7ebbf253 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2022-49394 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| block/blk-iolatency.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/515d077ee3085ae343b6bea7fd031f9906645f38 |
| https://git.kernel.org/stable/c/d19fa8f252000d141f9199ca32959c50314e1f05 |
| https://git.kernel.org/stable/c/77692c02e1517c54f2fd0535f41aa4286ac9f140 |
| https://git.kernel.org/stable/c/a30acbb5dfb7bcc813ad6a18ca31011ac44e5547 |
| https://git.kernel.org/stable/c/968f7a239c590454ffba79c126fbe0e963a0ba78 |
| https://git.kernel.org/stable/c/5b0ff3ebbef791341695b718f8d2870869cf1d01 |
| https://git.kernel.org/stable/c/8a177a36da6c54c98b8685d4f914cb3637d53c0d |