| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2024-50200: maple_tree: correct tree corruption on spanning store |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| maple_tree: correct tree corruption on spanning store |
| |
| Patch series "maple_tree: correct tree corruption on spanning store", v3. |
| |
| There has been a nasty yet subtle maple tree corruption bug that appears |
| to have been in existence since the inception of the algorithm. |
| |
| This bug seems far more likely to happen since commit f8d112a4e657 |
| ("mm/mmap: avoid zeroing vma tree in mmap_region()"), which is the point |
| at which reports started to be submitted concerning this bug. |
| |
| We were made definitely aware of the bug thanks to the kind efforts of |
| Bert Karwatzki who helped enormously in my being able to track this down |
| and identify the cause of it. |
| |
| The bug arises when an attempt is made to perform a spanning store across |
| two leaf nodes, where the right leaf node is the rightmost child of the |
| shared parent, AND the store completely consumes the right-mode node. |
| |
| This results in mas_wr_spanning_store() mitakenly duplicating the new and |
| existing entries at the maximum pivot within the range, and thus maple |
| tree corruption. |
| |
| The fix patch corrects this by detecting this scenario and disallowing the |
| mistaken duplicate copy. |
| |
| The fix patch commit message goes into great detail as to how this occurs. |
| |
| This series also includes a test which reliably reproduces the issue, and |
| asserts that the fix works correctly. |
| |
| Bert has kindly tested the fix and confirmed it resolved his issues. Also |
| Mikhail Gavrilov kindly reported what appears to be precisely the same |
| bug, which this fix should also resolve. |
| |
| |
| This patch (of 2): |
| |
| There has been a subtle bug present in the maple tree implementation from |
| its inception. |
| |
| This arises from how stores are performed - when a store occurs, it will |
| overwrite overlapping ranges and adjust the tree as necessary to |
| accommodate this. |
| |
| A range may always ultimately span two leaf nodes. In this instance we |
| walk the two leaf nodes, determine which elements are not overwritten to |
| the left and to the right of the start and end of the ranges respectively |
| and then rebalance the tree to contain these entries and the newly |
| inserted one. |
| |
| This kind of store is dubbed a 'spanning store' and is implemented by |
| mas_wr_spanning_store(). |
| |
| In order to reach this stage, mas_store_gfp() invokes |
| mas_wr_preallocate(), mas_wr_store_type() and mas_wr_walk() in turn to |
| walk the tree and update the object (mas) to traverse to the location |
| where the write should be performed, determining its store type. |
| |
| When a spanning store is required, this function returns false stopping at |
| the parent node which contains the target range, and mas_wr_store_type() |
| marks the mas->store_type as wr_spanning_store to denote this fact. |
| |
| When we go to perform the store in mas_wr_spanning_store(), we first |
| determine the elements AFTER the END of the range we wish to store (that |
| is, to the right of the entry to be inserted) - we do this by walking to |
| the NEXT pivot in the tree (i.e. r_mas.last + 1), starting at the node we |
| have just determined contains the range over which we intend to write. |
| |
| We then turn our attention to the entries to the left of the entry we are |
| inserting, whose state is represented by l_mas, and copy these into a 'big |
| node', which is a special node which contains enough slots to contain two |
| leaf node's worth of data. |
| |
| We then copy the entry we wish to store immediately after this - the copy |
| and the insertion of the new entry is performed by mas_store_b_node(). |
| |
| After this we copy the elements to the right of the end of the range which |
| we are inserting, if we have not exceeded the length of the node (i.e. |
| r_mas.offset <= r_mas.end). |
| |
| Herein lies the bug - under very specific circumstances, this logic can |
| break and corrupt the maple tree. |
| |
| Consider the following tree: |
| |
| Height |
| 0 Root Node |
| / \ |
| pivot = 0xffff / \ pivot = ULONG_MAX |
| / \ |
| 1 A [-----] ... |
| / \ |
| pivot = 0x4fff / \ pivot = 0xffff |
| / \ |
| 2 (LEAVES) B [-----] [-----] C |
| ^--- Last pivot 0xffff. |
| |
| Now imagine we wish to store an entry in the range [0x4000, 0xffff] (note |
| that all ranges expressed in maple tree code are inclusive): |
| |
| 1. mas_store_gfp() descends the tree, finds node A at <=0xffff, then |
| determines that this is a spanning store across nodes B and C. The mas |
| state is set such that the current node from which we traverse further |
| is node A. |
| |
| 2. In mas_wr_spanning_store() we try to find elements to the right of pivot |
| 0xffff by searching for an index of 0x10000: |
| |
| - mas_wr_walk_index() invokes mas_wr_walk_descend() and |
| mas_wr_node_walk() in turn. |
| |
| - mas_wr_node_walk() loops over entries in node A until EITHER it |
| finds an entry whose pivot equals or exceeds 0x10000 OR it |
| reaches the final entry. |
| |
| - Since no entry has a pivot equal to or exceeding 0x10000, pivot |
| 0xffff is selected, leading to node C. |
| |
| - mas_wr_walk_traverse() resets the mas state to traverse node C. We |
| loop around and invoke mas_wr_walk_descend() and mas_wr_node_walk() |
| in turn once again. |
| |
| - Again, we reach the last entry in node C, which has a pivot of |
| 0xffff. |
| |
| 3. We then copy the elements to the left of 0x4000 in node B to the big |
| node via mas_store_b_node(), and insert the new [0x4000, 0xffff] entry |
| too. |
| |
| 4. We determine whether we have any entries to copy from the right of the |
| end of the range via - and with r_mas set up at the entry at pivot |
| 0xffff, r_mas.offset <= r_mas.end, and then we DUPLICATE the entry at |
| pivot 0xffff. |
| |
| 5. BUG! The maple tree is corrupted with a duplicate entry. |
| |
| This requires a very specific set of circumstances - we must be spanning |
| the last element in a leaf node, which is the last element in the parent |
| node. |
| |
| spanning store across two leaf nodes with a range that ends at that shared |
| pivot. |
| |
| A potential solution to this problem would simply be to reset the walk |
| each time we traverse r_mas, however given the rarity of this situation it |
| seems that would be rather inefficient. |
| |
| Instead, this patch detects if the right hand node is populated, i.e. has |
| anything we need to copy. |
| |
| We do so by only copying elements from the right of the entry being |
| inserted when the maximum value present exceeds the last, rather than |
| basing this on offset position. |
| |
| The patch also updates some comments and eliminates the unused bool return |
| value in mas_wr_walk_index(). |
| |
| The work performed in commit f8d112a4e657 ("mm/mmap: avoid zeroing vma |
| tree in mmap_region()") seems to have made the probability of this event |
| much more likely, which is the point at which reports started to be |
| submitted concerning this bug. |
| |
| The motivation for this change arose from Bert Karwatzki's report of |
| encountering mm instability after the release of kernel v6.12-rc1 which, |
| after the use of CONFIG_DEBUG_VM_MAPLE_TREE and similar configuration |
| options, was identified as maple tree corruption. |
| |
| After Bert very generously provided his time and ability to reproduce this |
| event consistently, I was able to finally identify that the issue |
| discussed in this commit message was occurring for him. |
| |
| The Linux kernel CVE team has assigned CVE-2024-50200 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Issue introduced in 6.1 with commit 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa and fixed in 6.1.114 with commit 7c7874977da9e47ca0f53d8b9a5b17385fed83f2 |
| Issue introduced in 6.1 with commit 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa and fixed in 6.6.58 with commit 677f1df179cb68c12ddf7707ec325eb50e99c7d9 |
| Issue introduced in 6.1 with commit 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa and fixed in 6.11.5 with commit 982dd0d26d1f015ed34866579480d2be5250b0ef |
| Issue introduced in 6.1 with commit 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa and fixed in 6.12 with commit bea07fd63192b61209d48cbb81ef474cc3ee4c62 |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2024-50200 |
| will be updated if fixes are backported, please check that for the most |
| up to date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| lib/maple_tree.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/7c7874977da9e47ca0f53d8b9a5b17385fed83f2 |
| https://git.kernel.org/stable/c/677f1df179cb68c12ddf7707ec325eb50e99c7d9 |
| https://git.kernel.org/stable/c/982dd0d26d1f015ed34866579480d2be5250b0ef |
| https://git.kernel.org/stable/c/bea07fd63192b61209d48cbb81ef474cc3ee4c62 |