| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-53219: virtiofs: use pages instead of pointer for kernel direct IO |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| virtiofs: use pages instead of pointer for kernel direct IO |
| |
| When trying to insert a 10MB kernel module kept in a virtio-fs with cache |
| disabled, the following warning was reported: |
| |
| ------------[ cut here ]------------ |
| WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ...... |
| Modules linked in: |
| CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123 |
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ...... |
| RIP: 0010:__alloc_pages+0x2bf/0x380 |
| ...... |
| Call Trace: |
| <TASK> |
| ? __warn+0x8e/0x150 |
| ? __alloc_pages+0x2bf/0x380 |
| __kmalloc_large_node+0x86/0x160 |
| __kmalloc+0x33c/0x480 |
| virtio_fs_enqueue_req+0x240/0x6d0 |
| virtio_fs_wake_pending_and_unlock+0x7f/0x190 |
| queue_request_and_unlock+0x55/0x60 |
| fuse_simple_request+0x152/0x2b0 |
| fuse_direct_io+0x5d2/0x8c0 |
| fuse_file_read_iter+0x121/0x160 |
| __kernel_read+0x151/0x2d0 |
| kernel_read+0x45/0x50 |
| kernel_read_file+0x1a9/0x2a0 |
| init_module_from_file+0x6a/0xe0 |
| idempotent_init_module+0x175/0x230 |
| __x64_sys_finit_module+0x5d/0xb0 |
| x64_sys_call+0x1c3/0x9e0 |
| do_syscall_64+0x3d/0xc0 |
| entry_SYSCALL_64_after_hwframe+0x4b/0x53 |
| ...... |
| </TASK> |
| ---[ end trace 0000000000000000 ]--- |
| |
| The warning is triggered as follows: |
| |
| 1) syscall finit_module() handles the module insertion and it invokes |
| kernel_read_file() to read the content of the module first. |
| |
| 2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and |
| passes it to kernel_read(). kernel_read() constructs a kvec iter by |
| using iov_iter_kvec() and passes it to fuse_file_read_iter(). |
| |
| 3) virtio-fs disables the cache, so fuse_file_read_iter() invokes |
| fuse_direct_io(). At present, the maximum read size for a kvec iter is |
| limited only by fc->max_read. For virtio-fs, max_read is UINT_MAX, so |
| fuse_direct_io() doesn't split the 10MB buffer. It saves the address and |
| the size of the 10MB buffer in out_args[0] of a fuse request and |
| passes the fuse request to virtio_fs_wake_pending_and_unlock(). |
| |
| 4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to |
| queue the request. Because virtiofs needs a DMA-able address, |
| virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for |
| all fuse args, copies these args into the bounce buffer and passes the |
| physical address of the bounce buffer to virtiofsd. The total length of |
| these fuse args for the passed fuse request is about 10MB, so |
| copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and |
| it triggers the warning in __alloc_pages(): |
| |
| if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp)) |
| return NULL; |
| |
| 5) virtio_fs_enqueue_req() will retry the memory allocation in a |
| kworker, but it won't help, because kmalloc() will always return NULL |
| due to the abnormal size, and finit_module() will hang forever. |
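
The size check in step 4 can be reproduced with a little arithmetic. The
following is a minimal user-space sketch, not kernel code: it assumes 4 KiB
pages and a MAX_PAGE_ORDER of 10 (the usual default, but configurable), and
the sketch_* names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for illustration only: 4 KiB pages, MAX_PAGE_ORDER of 10. */
#define SKETCH_PAGE_SHIFT     12
#define SKETCH_MAX_PAGE_ORDER 10

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int sketch_get_order(size_t size)
{
    size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
    size_t pages = (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
    unsigned int order = 0;

    assert(size > 0);
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}

/* Returns 1 when a physically contiguous allocation of `size` bytes
 * would exceed the assumed maximum order, 0 otherwise. */
static int sketch_exceeds_max_order(size_t size)
{
    return sketch_get_order(size) > SKETCH_MAX_PAGE_ORDER;
}
```

Under these assumptions, a 10MB request needs 2560 pages, which rounds up to
order 12, while the largest contiguous allocation is order 10 (4MB). Retrying
the same size therefore cannot succeed.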
| |
| A feasible solution is to limit the value of max_read for virtio-fs, so |
| the length passed to kmalloc() will be limited. However, it will affect |
| the maximum read size for normal reads. Writes initiated from the kernel |
| on virtio-fs have a similar problem, and there is currently no way to |
| limit fc->max_write in the kernel. |
| |
| So instead of limiting both the values of max_read and max_write in |
| the kernel, introduce use_pages_for_kvec_io in fuse_conn and set it to |
| true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use |
| pages instead of a pointer to pass the KVEC_IO data. |
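
To illustrate what passing pages instead of a pointer means, here is a
user-space sketch that splits a virtually contiguous buffer into per-page
pieces, so that no single piece needs a large contiguous allocation. It is
only a model under stated assumptions: 4 KiB pages, and a hypothetical
struct sketch_page_desc and sketch_split_into_pages() that do not exist in
the kernel. The actual change works on struct page entries inside fuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical descriptor for the part of the buffer inside one page. */
struct sketch_page_desc {
    uintptr_t page_base; /* page-aligned address that holds the piece */
    size_t offset;       /* where the piece starts within that page */
    size_t length;       /* number of buffer bytes inside that page */
};

/*
 * Split [addr, addr + len) into per-page pieces. Writes at most `max`
 * descriptors into `descs` and returns how many were written.
 */
static size_t sketch_split_into_pages(uintptr_t addr, size_t len,
                                      struct sketch_page_desc *descs,
                                      size_t max)
{
    size_t n = 0;

    assert(descs != NULL || max == 0);
    while (len > 0 && n < max) {
        size_t offset = addr & (SKETCH_PAGE_SIZE - 1);
        size_t chunk = SKETCH_PAGE_SIZE - offset;

        if (chunk > len)
            chunk = len;
        descs[n].page_base = addr - offset;
        descs[n].offset = offset;
        descs[n].length = chunk;
        n++;
        addr += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, a 5000-byte buffer starting 100 bytes into a page yields two
pieces: 3996 bytes at offset 100 of the first page and 1004 bytes at offset 0
of the next one.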
| |
| After switching to pages for KVEC_IO data, these pages will be used for |
| DMA through virtio-fs. If these pages are backed by vmalloc(), |
| {flush|invalidate}_kernel_vmap_range() are necessary to flush or |
| invalidate the cache before the DMA operation. So add two new fields in |
| fuse_args_pages to record the base address of vmalloc area and the |
| condition indicating whether invalidation is needed. Perform the flush |
| in fuse_get_user_pages() for write operations and the invalidation in |
| fuse_release_user_pages() for read operations. |
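
The ordering described above (flush before a write is handed to the device,
invalidate after a read completes) can be modelled in user space as follows.
This is a sketch, not the patch itself: the struct, its field names and the
sketch_* helpers are hypothetical, and the two cache operations are replaced
by counters standing in for flush_kernel_vmap_range() and
invalidate_kernel_vmap_range().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counters standing in for the real cache maintenance calls. */
static int sketch_flush_calls;
static int sketch_invalidate_calls;

static void sketch_flush_range(void *vaddr, size_t size)
{
    (void)vaddr; (void)size;
    sketch_flush_calls++;
}

static void sketch_invalidate_range(void *vaddr, size_t size)
{
    (void)vaddr; (void)size;
    sketch_invalidate_calls++;
}

/* Hypothetical per-request state, modelled on the two new fields. */
struct sketch_args_pages {
    void *vmap_base;      /* base of the vmalloc area, NULL if none */
    size_t vmap_len;      /* length of the range used for I/O */
    bool invalidate_vmap; /* true when invalidation is needed afterwards */
};

/* Before DMA: record the range; flush now if the device will read it. */
static void sketch_get_pages(struct sketch_args_pages *ap, void *buf,
                             size_t len, bool is_vmalloc, bool is_write)
{
    assert(ap != NULL);
    ap->vmap_base = NULL;
    ap->vmap_len = 0;
    ap->invalidate_vmap = false;
    if (!is_vmalloc)
        return;
    ap->vmap_base = buf;
    ap->vmap_len = len;
    if (is_write)
        sketch_flush_range(buf, len); /* write: data goes to the device */
    else
        ap->invalidate_vmap = true;   /* read: device fills the pages */
}

/* After DMA: invalidate only if the request was a read into vmalloc memory. */
static void sketch_release_pages(struct sketch_args_pages *ap)
{
    assert(ap != NULL);
    if (ap->vmap_base && ap->invalidate_vmap)
        sketch_invalidate_range(ap->vmap_base, ap->vmap_len);
}
```

In this model a write triggers exactly one flush and no invalidation, a read
triggers exactly one invalidation and no flush, and a buffer that is not
backed by vmalloc() triggers neither.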
| |
| It may seem necessary to introduce another field in fuse_conn to |
| indicate that these KVEC_IO pages are used for DMA. However, considering |
| that virtio-fs is currently the only user of use_pages_for_kvec_io, just |
| reuse use_pages_for_kvec_io to indicate that these pages will be used |
| for DMA. |
| |
| The Linux kernel CVE team has assigned CVE-2024-53219 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 5.4 with commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a and fixed in 6.11.11 with commit 9a8fde56d4b6d51930936ed50f6370a9097328d1 |
| Issue introduced in 5.4 with commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a and fixed in 6.12.2 with commit 2bc07714dc955a91d2923a440ea02c3cb3376b10 |
| Issue introduced in 5.4 with commit a62a8ef9d97da23762a588592c8b8eb50a8deb6a and fixed in 6.13 with commit 41748675c0bf252b3c5f600a95830f0936d366c1 |
| |
| Please see https://www.kernel.org for a full list of kernel versions |
| currently supported by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-53219 |
| will be updated if fixes are backported. Please check that entry for |
| the most up-to-date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| fs/fuse/file.c |
| fs/fuse/fuse_i.h |
| fs/fuse/virtio_fs.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If, however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/9a8fde56d4b6d51930936ed50f6370a9097328d1 |
| https://git.kernel.org/stable/c/2bc07714dc955a91d2923a440ea02c3cb3376b10 |
| https://git.kernel.org/stable/c/41748675c0bf252b3c5f600a95830f0936d366c1 |