| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2022-49266: block: fix rq-qos breakage from skipping rq_qos_done_bio() |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| block: fix rq-qos breakage from skipping rq_qos_done_bio() |
| |
| a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't |
| tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set. |
| While this fixed a potential oops, it also broke blk-iocost by skipping the |
| done_bio callback for merged bios. |
| |
| Before that commit, whether a bio went through rq_qos_throttle() or |
| rq_qos_merge(), rq_qos_done_bio() was called on the bio on completion, with |
| BIO_TRACKED distinguishing the former from the latter. After it, |
| rq_qos_done_bio() is no longer called for bios which went through |
| rq_qos_merge(). This royally confuses blk-iocost, as the merged bios never |
| finish from its point of view and are considered perpetually in-flight. |
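
The accounting failure described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: the names `Bio`, `IocostModel` and `bio_endio_after_a647a524a467` are hypothetical, and the single in-flight counter stands in for blk-iocost's real per-cgroup state, which is considerably more involved.

```python
from dataclasses import dataclass


@dataclass
class Bio:
    """Toy stand-in for struct bio; only the BIO_TRACKED flag is modelled."""
    tracked: bool = False


class IocostModel:
    """Hypothetical in-flight counter standing in for blk-iocost state."""

    def __init__(self) -> None:
        self.inflight = 0

    def throttle(self, bio: Bio) -> None:
        # rq_qos_throttle() path: the bio is accounted and BIO_TRACKED is set.
        self.inflight += 1
        bio.tracked = True

    def merge(self, bio: Bio) -> None:
        # rq_qos_merge() path: the bio is accounted, BIO_TRACKED stays clear.
        self.inflight += 1

    def done_bio(self, bio: Bio) -> None:
        # Completion callback: releases the accounting taken at submission.
        self.inflight -= 1


def bio_endio_after_a647a524a467(qos: IocostModel, bio: Bio) -> None:
    # Models the check added by a647a524a467: untracked bios skip done_bio.
    if bio.tracked:
        qos.done_bio(bio)


def run_model() -> int:
    qos = IocostModel()
    throttled, merged = Bio(), Bio()
    qos.throttle(throttled)
    qos.merge(merged)
    bio_endio_after_a647a524a467(qos, throttled)
    bio_endio_after_a647a524a467(qos, merged)
    return qos.inflight


leaked = run_model()
print(leaked)  # prints 1: the merged bio is never released
```

In this model every merged bio permanently raises the counter by one, which corresponds to the "perpetually in-flight" state described in the commit message.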
| |
| One reliably reproducible failure mode is an intermediate cgroup getting |
| stuck active, preventing its children from being activated due to the |
| leaf-only rule, leading to loss of control. The following is from the |
| resctl-bench protection scenario, which emulates isolating a web-server-like |
| workload from a memory bomb, run on an iocost configuration which should |
| yield a reasonable level of protection. |
| |
| # cat /sys/block/nvme2n1/device/model |
| Samsung SSD 970 PRO 512GB |
| # cat /sys/fs/cgroup/io.cost.model |
| 259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025 |
| # cat /sys/fs/cgroup/io.cost.qos |
| 259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00 |
| # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 |
| ... |
| Memory Hog Summary |
| ================== |
| |
| IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m |
|             W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m |
| |
| Isolation and Request Latency Impact Distributions: |
| |
|             min    p01    p05    p10    p25    p50    p75    p90    p95    p99    max   mean  stdev |
| isol%     15.90  15.90  15.90  40.05  57.24  59.07  60.01  74.63  74.63  90.35  90.35  58.12  15.82 |
| lat-imp%      0      0      0      0      0   4.55  14.68  15.54  233.5  548.1  548.1  53.88  143.6 |
| |
| Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96% |
| |
| The isolation result of 58.12% is close to what this device would show |
| without any IO control. |
| |
| Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and |
| calling rq_qos_done_bio() on them too. For consistency and clarity, rename |
| BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into |
| rq_qos_done_bio() so that they sit next to the code paths that set the flags. |
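
The flag check introduced by the fix can be sketched as follows. This is a simplified userspace model under stated assumptions, not the kernel implementation: `BioFlag` and `should_call_done_bio` are hypothetical names, the numeric flag values are arbitrary, and the real rq_qos_done_bio() performs additional checks that are not modelled here.

```python
from enum import Flag


class BioFlag(Flag):
    """Hypothetical model of the two bio flags named in the fix."""
    NONE = 0
    QOS_THROTTLED = 1  # bio went through rq_qos_throttle() (was BIO_TRACKED)
    QOS_MERGED = 2     # bio went through rq_qos_merge() (new flag)


def should_call_done_bio(flags: BioFlag) -> bool:
    """Return True if the done_bio callback should run for this bio.

    With the fix, either flag is sufficient; before it, only the
    throttled (then BIO_TRACKED) flag was honoured.
    """
    return bool(flags & (BioFlag.QOS_THROTTLED | BioFlag.QOS_MERGED))


for flags in (BioFlag.NONE, BioFlag.QOS_THROTTLED, BioFlag.QOS_MERGED):
    print(flags, should_call_done_bio(flags))
```

A bio carrying neither flag still skips the callback, which preserves the protection against the oops that the earlier commit was meant to address.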
| |
| With the patch applied, the same benchmark shows: |
| |
| # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 |
| ... |
| Memory Hog Summary |
| ================== |
| |
| IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m |
|             W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m |
| |
| Isolation and Request Latency Impact Distributions: |
| |
|             min    p01    p05    p10    p25    p50    p75    p90    p95    p99    max   mean  stdev |
| isol%     84.91  84.91  89.51  90.73  92.31  94.49  96.36  98.04  98.71  100.0  100.0  94.42   2.81 |
| lat-imp%      0      0      0      0      0   2.81   5.73  11.11  13.92  17.53  22.61   4.10   4.68 |
| |
| Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0% |
| |
| The Linux kernel CVE team has assigned CVE-2022-49266 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.15 with commit a647a524a46736786c95cdb553a070322ca096e3 and fixed in 5.15.54 with commit af9452dfdba4bf7359ef7645eee2d243a1df0649 |
| Issue introduced in 5.15 with commit a647a524a46736786c95cdb553a070322ca096e3 and fixed in 5.16.19 with commit dbd20bb904ad5731aaca8d009367a930d6ada111 |
| Issue introduced in 5.15 with commit a647a524a46736786c95cdb553a070322ca096e3 and fixed in 5.17.2 with commit 09737db4c891eba25e6f6383a7c38afd4acc883f |
| Issue introduced in 5.15 with commit a647a524a46736786c95cdb553a070322ca096e3 and fixed in 5.18 with commit aa1b46dcdc7baaf5fec0be25782ef24b26aa209e |
| Issue introduced in 5.14.11 with commit 004b8f8a691205a93d9e80d98b786b2b97424d6e |
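
As a reading aid, the version ranges listed above can be turned into a small check. This is a sketch under stated assumptions: `parse_version` and `is_affected` are hypothetical helpers, only plain upstream version strings such as "5.15.53" are handled, and distribution kernels that backport fixes under their own version numbers are not covered. The 5.14 series is treated as affected from 5.14.11 onward because the list names no fix for it.

```python
# First fixed patch level per stable series, taken from the list above.
FIXED_PATCH_LEVEL = {(5, 15): 54, (5, 16): 19, (5, 17): 2}


def parse_version(version):
    """Split 'X.Y' or 'X.Y.Z' into three integers, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return parts[0], parts[1], parts[2]


def is_affected(version):
    """Return True if the upstream version falls in a listed affected range."""
    major, minor, patch = parse_version(version)
    series = (major, minor)
    if series == (5, 14):
        # Introduced in 5.14.11; no fixed 5.14.y release is listed.
        return patch >= 11
    if series in FIXED_PATCH_LEVEL:
        return patch < FIXED_PATCH_LEVEL[series]
    # Anything else (older than 5.14, or 5.18 and later) is outside the list.
    return False


for v in ("5.14.10", "5.14.11", "5.15.53", "5.15.54", "5.17.1", "5.18"):
    print(v, is_affected(v))
```

The official CVE record remains the authoritative source, since the ranges may change as fixes are backported.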
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2022-49266 |
| will be updated if fixes are backported; please check it for the most |
| up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| block/bio.c |
| block/blk-iolatency.c |
| block/blk-rq-qos.h |
| include/linux/blk_types.h |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/af9452dfdba4bf7359ef7645eee2d243a1df0649 |
| https://git.kernel.org/stable/c/dbd20bb904ad5731aaca8d009367a930d6ada111 |
| https://git.kernel.org/stable/c/09737db4c891eba25e6f6383a7c38afd4acc883f |
| https://git.kernel.org/stable/c/aa1b46dcdc7baaf5fec0be25782ef24b26aa209e |