| From bippy-5f407fcff5a0 Mon Sep 17 00:00:00 2001 |
| From: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| To: <linux-cve-announce@vger.kernel.org> |
| Reply-to: <cve@kernel.org>, <linux-kernel@vger.kernel.org> |
| Subject: CVE-2021-47275: bcache: avoid oversized read request in cache missing code path |
| |
| Description |
| =========== |
| |
| In the Linux kernel, the following vulnerability has been resolved: |
| |
| bcache: avoid oversized read request in cache missing code path |
| |
| In the cache missing code path of cached device, if a proper location |
| from the internal B+ tree is matched for a cache miss range, function |
| cached_dev_cache_miss() will be called in cache_lookup_fn() in the |
| following code block, |
| [code block 1] |
|     unsigned int sectors = KEY_INODE(k) == s->iop.inode |
|         ? min_t(uint64_t, INT_MAX, |
|                 KEY_START(k) - bio->bi_iter.bi_sector) |
|         : INT_MAX; |
|     int ret = s->d->cache_miss(b, s, bio, sectors); |
| |
| Here s->d->cache_miss() is the callback function pointer initialized as |
| cached_dev_cache_miss(). The last parameter 'sectors' is an important |
| hint for calculating the size of the read request sent to the backing |
| device for the missing cache data. |
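| |
| The calculation in code block 1 can be modelled in user space to see |
| how large the hint can become. This is only a sketch: the function |
| name miss_sectors_hint() and its parameters are hypothetical, and |
| same_inode stands in for the KEY_INODE(k) == s->iop.inode test. |

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * User-space model of the 'sectors' hint from code block 1.
 * key_start and bio_sector are plain sector numbers; the names are
 * illustrative and do not exist in the kernel source.
 */
static unsigned int miss_sectors_hint(uint64_t key_start, uint64_t bio_sector,
                                      int same_inode)
{
    uint64_t gap;

    if (!same_inode)
        return INT_MAX;

    /* Assumption of this model: the matched key starts at or after the bio. */
    assert(key_start >= bio_sector);
    gap = key_start - bio_sector;

    /* Same clamp as min_t(uint64_t, INT_MAX, ...): the only upper bound is INT_MAX. */
    return (unsigned int)(gap < INT_MAX ? gap : INT_MAX);
}
```

| With key_start = 131072 and bio_sector = 0 the model returns 131072, |
| so nothing in this step bounds the hint by the bkey size field or by |
| the bio bvecs limit. |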
| |
| The current calculation in the above code block may generate an |
| oversized value of 'sectors', which may consequently trigger two |
| different kernel panics by BUG() or BUG_ON(), as listed below, |
| |
| 1) BUG_ON() inside bch_btree_insert_key(), |
| [code block 2] |
|     BUG_ON(b->ops->is_extents && !KEY_SIZE(k)); |
| 2) BUG() inside biovec_slab(), |
| [code block 3] |
|     default: |
|         BUG(); |
|         return NULL; |
| |
| All of the above panics originate from cached_dev_cache_miss() through |
| the oversized parameter 'sectors'. |
| |
| Inside cached_dev_cache_miss(), parameter 'sectors' is used to calculate |
| the size of the data read from the backing device for the cache miss. |
| This size is stored in s->insert_bio_sectors by the following line of |
| code, |
| [code block 4] |
|     s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada); |
| |
| Then the actual key to be inserted into the internal B+ tree is |
| generated and stored in s->iop.replace_key by the following lines of |
| code, |
| [code block 5] |
|     s->iop.replace_key = KEY(s->iop.inode, |
|             bio->bi_iter.bi_sector + s->insert_bio_sectors, |
|             s->insert_bio_sectors); |
| The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from |
| the above code block. |
| |
| The bio sent to the backing device for the missing data is allocated |
| with a hint from s->insert_bio_sectors by the following lines of code, |
| [code block 6] |
|     cache_bio = bio_alloc_bioset(GFP_NOWAIT, |
|             DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS), |
|             &dc->disk.bio_split); |
| The oversized parameter 'sectors' may trigger panic 2) by BUG() from the |
| above code block. |
| |
| Now let me explain how the panics happen with the oversized 'sectors'. |
| In code block 5, replace_key is generated by macro KEY(). From the |
| definition of macro KEY(), |
| [code block 7] |
| #define KEY(inode, offset, size) \ |
| ((struct bkey) { \ |
|     .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode), \ |
|     .low = (offset) \ |
| }) |
| |
| Here 'size' is a 16-bit field embedded in the 64-bit member 'high' of |
| struct bkey. But in code block 1, "KEY_START(k) - |
| bio->bi_iter.bi_sector" is very likely to be larger than (1<<16) - 1, |
| which makes the bkey size calculation in code block 5 overflow. In one |
| bug report the value of parameter 'sectors' is 131072 (= 1 << 17). The |
| oversized 'sectors' results in an oversized s->insert_bio_sectors in |
| code block 4, which then makes the size field of s->iop.replace_key 0 |
| in code block 5. The 0-sized s->iop.replace_key is then inserted into |
| the internal B+ tree as the cache missing check key (a special key to |
| detect and avoid a race between a normal write request and a cache |
| missing read request) as, |
| [code block 8] |
|     ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key); |
| |
| Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey |
| size check BUG_ON() in code block 2, and causes the kernel panic 1). |
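| |
| The truncation can be reproduced outside the kernel. The sketch below |
| packs a key the same way the KEY() macro in code block 7 does and |
| reads the size back from bits 20..35 of 'high'. struct model_bkey, |
| model_key() and model_key_size() are hypothetical names used only for |
| this illustration, and the 20-bit inode width is an assumption taken |
| from the shift in the macro. |

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct bkey: only the two 64-bit words. */
struct model_bkey {
    uint64_t high;
    uint64_t low;
};

/* Same packing as KEY(): 'size' is shifted into 'high' starting at bit 20. */
static struct model_bkey model_key(uint64_t inode, uint64_t offset, uint64_t size)
{
    struct model_bkey k;

    /* Assumption: the inode occupies the 20 bits below the size field. */
    assert(inode < (1ULL << 20));

    k.high = (1ULL << 63) | (size << 20) | inode;
    k.low = offset;
    return k;
}

/* Read back the 16-bit size field; any bits of 'size' above bit 15 are lost. */
static uint64_t model_key_size(const struct model_bkey *k)
{
    return (k->high >> 20) & 0xffffULL;
}
```

| In this model a size of 131072 (= 1 << 17) reads back as 0, because |
| the only set bit lands above the 16-bit field, while 65535 still |
| reads back unchanged. A 0 size is exactly what the BUG_ON() in code |
| block 2 rejects. |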
| |
| The other kernel panic comes from code block 6, and is caused by the |
| oversized bvecs number derived from s->insert_bio_sectors in code |
| block 4, |
| min(sectors, bio_sectors(bio) + reada) |
| There are two possibilities for an oversized result, |
| - bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized. |
| - sectors < bio_sectors(bio) + reada, but sectors is oversized. |
| |
| From a bug report the result of "DIV_ROUND_UP(s->insert_bio_sectors, |
| PAGE_SECTORS)" from code block 6 can be 344, 282, 946, 342 and many |
| other values which are larger than BIO_MAX_VECS (a.k.a. 256). When |
| calling bio_alloc_bioset() with such a larger-than-256 value as the |
| 2nd parameter, this value will eventually be sent to biovec_slab() as |
| parameter 'nr_vecs' in the following code path, |
| bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab() |
| Because parameter 'nr_vecs' is a larger-than-256 value, the panic by |
| BUG() in code block 3 is triggered inside biovec_slab(). |
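| |
| The bvecs arithmetic can be checked with a small user-space sketch. |
| MODEL_PAGE_SECTORS = 8 is an assumption (4 KiB pages with 512-byte |
| sectors), MODEL_BIO_MAX_VECS = 256 is the limit quoted above, and the |
| function names are hypothetical. |

```c
#include <assert.h>
#include <limits.h>

#define MODEL_PAGE_SECTORS 8u   /* assumption: 4 KiB pages, 512-byte sectors */
#define MODEL_BIO_MAX_VECS 256u /* the bvec limit quoted in the text */

/* Page-sized bvecs needed for 'sectors', computed the way DIV_ROUND_UP() does. */
static unsigned int model_nr_vecs(unsigned int sectors)
{
    /* Guard the rounding addition against unsigned wrap-around. */
    assert(sectors <= UINT_MAX - (MODEL_PAGE_SECTORS - 1));
    return (sectors + MODEL_PAGE_SECTORS - 1) / MODEL_PAGE_SECTORS;
}

/* Returns 1 if the request fits the bvec limit, 0 if it would exceed it. */
static int model_nr_vecs_ok(unsigned int sectors)
{
    return model_nr_vecs(sectors) <= MODEL_BIO_MAX_VECS;
}
```

| Under these assumptions 2048 sectors is the largest request that |
| still fits (256 bvecs). A request of 2752 sectors needs 344 bvecs, |
| one of the values seen in the bug report, and falls into the |
| default: branch of code block 3. |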
| |
| From the above analysis, we know that the 4th parameter 'sectors' sent |
| into cached_dev_cache_miss() may cause overflow in code block 5 and 6, |
| and finally cause kernel panic in code block 2 and 3. And if the result |
| of bio_sectors(bio) + reada exceeds the valid bvecs number, it may also |
| trigger kernel panic in code block 3 from code block 6. |
| |
| Now that the almost-useless readahead size for cache missing requests |
| sent to the backing device has been removed, this patch can fix the |
| oversized issue with a simpler method. |
| - add a local variable size_limit, set it by the minimum value from |
| the max bkey size and max bio bvecs number. |
| - set s->insert_bio_sectors by the minimum value from size_limit, |
| sectors, and the sectors size of bio. |
| - replace sectors by s->insert_bio_sectors to do bio_next_split. |
| |
| By the above method with size_limit, s->insert_bio_sectors will never |
| result in an oversized replace_key size or bio bvecs number. The split |
| bio 'miss' from bio_next_split() will always match the size of |
| 'cache_bio', that is, the current maximum bio size we can send to the |
| backing device for fetching the cache missing data. |
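| |
| The clamping described above can be sketched in user space as |
| follows. This is a model under stated assumptions, not the patch |
| itself: the MODEL_* constants assume 4 KiB pages, the 256 bvec limit |
| and the 16-bit size field mentioned earlier, and the function names |
| are hypothetical. |

```c
#include <assert.h>

#define MODEL_PAGE_SECTORS  8u   /* assumption: 4 KiB pages, 512-byte sectors */
#define MODEL_BIO_MAX_VECS  256u /* bvec limit quoted in the text */
#define MODEL_KEY_SIZE_BITS 16u  /* width of the bkey size field */

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Clamp the read size so it fits both the bkey size field and one bio. */
static unsigned int model_insert_bio_sectors(unsigned int sectors,
                                             unsigned int bio_sectors)
{
    /* Smaller of the max bio size in sectors and the max bkey size. */
    unsigned int size_limit = min_u(MODEL_BIO_MAX_VECS * MODEL_PAGE_SECTORS,
                                    (1u << MODEL_KEY_SIZE_BITS) - 1);
    unsigned int result = min_u(size_limit, min_u(sectors, bio_sectors));

    /* The two properties the text claims for the clamped value. */
    assert(result <= (1u << MODEL_KEY_SIZE_BITS) - 1);
    assert((result + MODEL_PAGE_SECTORS - 1) / MODEL_PAGE_SECTORS
           <= MODEL_BIO_MAX_VECS);
    return result;
}
```

| With these constants size_limit is 2048, so the 131072-sector hint |
| from the bug report is reduced to at most 2048 sectors, while small |
| requests pass through unchanged. |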
| |
| The current problematic code can be partially found since Linux |
| v3.13-rc1, therefore all maintained stable kernels should try to apply |
| this fix. |
| |
| The Linux kernel CVE team has assigned CVE-2021-47275 to this issue. |
| |
| |
| Affected and fixed versions |
| =========================== |
| |
| Fixed in 5.12.11 with commit 555002a840ab88468e252b0eedf0b05e2ce7099c |
| Fixed in 5.13 with commit 41fe8d088e96472f63164e213de44ec77be69478 |
| |
| Please see https://www.kernel.org for a full list of currently supported |
| kernel versions by the kernel community. |
| |
| Unaffected versions might change over time as fixes are backported to |
| older supported kernel versions. The official CVE entry at |
| https://cve.org/CVERecord/?id=CVE-2021-47275 |
| will be updated if fixes are backported, please check that for the most |
| up to date information about this issue. |
| |
| |
| Affected files |
| ============== |
| |
| The file(s) affected by this issue are: |
| drivers/md/bcache/request.c |
| |
| |
| Mitigation |
| ========== |
| |
| The Linux kernel CVE team recommends that you update to the latest |
| stable kernel version for this, and many other bugfixes. Individual |
| changes are never tested alone, but rather are part of a larger kernel |
| release. Cherry-picking individual commits is not recommended or |
| supported by the Linux kernel community at all. If however, updating to |
| the latest release is impossible, the individual changes to resolve this |
| issue can be found at these commits: |
| https://git.kernel.org/stable/c/555002a840ab88468e252b0eedf0b05e2ce7099c |
| https://git.kernel.org/stable/c/41fe8d088e96472f63164e213de44ec77be69478 |