| From: Ryan Roberts <ryan.roberts@arm.com> |
| Subject: mm/readahead: terminate async readahead on natural boundary |
| Date: Mon, 9 Jun 2025 10:27:24 +0100 |
| |
| Previously asynchronous readahead would read ra_pages (usually 128K) |
| directly after the end of the synchronous readahead. Given that the |
| synchronous readahead portion had no alignment guarantees (beyond page |
| boundaries), it is possible (and likely) that the end of the initial 128K |
| region would not fall on a natural boundary for the folio size being used. |
| Therefore smaller folios were used to align down to the required |
| boundary, both at the end of the previous readahead block and at the start |
| of the new one. |
| |
| In the worst cases, this can result in never properly ramping up the folio |
| size, and instead getting stuck oscillating between order-0, -1 and -2 |
| folios. The next readahead will try to use folios whose order is +2 |
| bigger than the folio that had the readahead marker. But because of the |
| alignment requirements, that folio (the first one in the readahead block) |
| can end up being order-0 in some cases. |
| |
| There will be 2 modifications to solve this issue: |
| |
| 1) Calculate the readahead size so the end is aligned to a folio |
| boundary. This prevents needing to allocate small folios to align |
| down at the end of the window and fixes the oscillation problem. |
| |
| 2) Remember the "preferred folio order" in the ra state instead of |
| inferring it from the folio with the readahead marker. This solves |
| the slow ramp up problem (discussed in a subsequent patch). |
| |
| This patch addresses (1) only. A subsequent patch will address (2). |
| |
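As a rough illustration of modification (1), the end-alignment arithmetic can be sketched in user-space C. This is only a sketch: `trim_ra_size()` and `lowest_set_bit()` are hypothetical names that do not exist in the kernel, `lowest_set_bit()` stands in for `ffs(max_pages) - 1`, and `max_pages` is assumed to be nonzero.

```c
/* Index of the lowest set bit; stands in for ffs(x) - 1 when x != 0. */
static unsigned int lowest_set_bit(unsigned long x)
{
	unsigned int bit = 0;

	while (x && !(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Hypothetical helper: trim a readahead window of `size` pages starting at
 * page `start` so that it ends on a (1 << order)-page boundary. The
 * alignment is capped by the lowest set bit of max_pages. If rounding down
 * would leave an empty window, the size is returned unchanged.
 */
static unsigned long trim_ra_size(unsigned long start, unsigned long size,
				  unsigned int order, unsigned long max_pages)
{
	unsigned int cap = lowest_set_bit(max_pages);
	unsigned long align = 1UL << (order < cap ? order : cap);
	unsigned long end = start + size;
	unsigned long aligned_end = end - (end % align);

	if (aligned_end > start)
		size -= end - aligned_end;
	return size;
}
```

With the numbers from the worked example below (a 32-page window starting at page 33, order 2, max_pages of 32), the window would shrink from 32 to 31 pages so that it ends at page 64 rather than page 65.
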
| Worked example: |
| |
| The following shows the previous pathological behaviour when the initial |
| synchronous readahead is unaligned. We start reading at page 17 in the |
| file and read sequentially from there. I'm showing a dump of the pages in |
| the page cache just after we read the first page of the folio with the |
| readahead marker. |
| |
| Initially there are no pages in the page cache: |
| |
| TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA |
| ----- ---------- ---------- ---------- ------- ------- ----- ----- -- |
| HOLE 0x00000000 0x00800000 8388608 0 2048 2048 |
| |
| Then we access page 17, causing synchronous read-around of 128K with a |
| readahead marker set up at page 25. So far, all as expected: |
| |
| TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA |
| ----- ---------- ---------- ---------- ------- ------- ----- ----- -- |
| HOLE 0x00000000 0x00001000 4096 0 1 1 |
| FOLIO 0x00001000 0x00002000 4096 1 2 1 0 |
| FOLIO 0x00002000 0x00003000 4096 2 3 1 0 |
| FOLIO 0x00003000 0x00004000 4096 3 4 1 0 |
| FOLIO 0x00004000 0x00005000 4096 4 5 1 0 |
| FOLIO 0x00005000 0x00006000 4096 5 6 1 0 |
| FOLIO 0x00006000 0x00007000 4096 6 7 1 0 |
| FOLIO 0x00007000 0x00008000 4096 7 8 1 0 |
| FOLIO 0x00008000 0x00009000 4096 8 9 1 0 |
| FOLIO 0x00009000 0x0000a000 4096 9 10 1 0 |
| FOLIO 0x0000a000 0x0000b000 4096 10 11 1 0 |
| FOLIO 0x0000b000 0x0000c000 4096 11 12 1 0 |
| FOLIO 0x0000c000 0x0000d000 4096 12 13 1 0 |
| FOLIO 0x0000d000 0x0000e000 4096 13 14 1 0 |
| FOLIO 0x0000e000 0x0000f000 4096 14 15 1 0 |
| FOLIO 0x0000f000 0x00010000 4096 15 16 1 0 |
| FOLIO 0x00010000 0x00011000 4096 16 17 1 0 |
| FOLIO 0x00011000 0x00012000 4096 17 18 1 0 |
| FOLIO 0x00012000 0x00013000 4096 18 19 1 0 |
| FOLIO 0x00013000 0x00014000 4096 19 20 1 0 |
| FOLIO 0x00014000 0x00015000 4096 20 21 1 0 |
| FOLIO 0x00015000 0x00016000 4096 21 22 1 0 |
| FOLIO 0x00016000 0x00017000 4096 22 23 1 0 |
| FOLIO 0x00017000 0x00018000 4096 23 24 1 0 |
| FOLIO 0x00018000 0x00019000 4096 24 25 1 0 |
| FOLIO 0x00019000 0x0001a000 4096 25 26 1 0 Y |
| FOLIO 0x0001a000 0x0001b000 4096 26 27 1 0 |
| FOLIO 0x0001b000 0x0001c000 4096 27 28 1 0 |
| FOLIO 0x0001c000 0x0001d000 4096 28 29 1 0 |
| FOLIO 0x0001d000 0x0001e000 4096 29 30 1 0 |
| FOLIO 0x0001e000 0x0001f000 4096 30 31 1 0 |
| FOLIO 0x0001f000 0x00020000 4096 31 32 1 0 |
| FOLIO 0x00020000 0x00021000 4096 32 33 1 0 |
| HOLE 0x00021000 0x00800000 8253440 33 2048 2015 |
| |
| Now access pages 18-25 inclusive. This causes an asynchronous 128K |
| readahead starting at page 33. But since we are unaligned, even though |
| the preferred folio order is 2, the first folio in this batch (the one |
| with the new readahead marker) is order-0: |
| |
| TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA |
| ----- ---------- ---------- ---------- ------- ------- ----- ----- -- |
| HOLE 0x00000000 0x00001000 4096 0 1 1 |
| FOLIO 0x00001000 0x00002000 4096 1 2 1 0 |
| FOLIO 0x00002000 0x00003000 4096 2 3 1 0 |
| FOLIO 0x00003000 0x00004000 4096 3 4 1 0 |
| FOLIO 0x00004000 0x00005000 4096 4 5 1 0 |
| FOLIO 0x00005000 0x00006000 4096 5 6 1 0 |
| FOLIO 0x00006000 0x00007000 4096 6 7 1 0 |
| FOLIO 0x00007000 0x00008000 4096 7 8 1 0 |
| FOLIO 0x00008000 0x00009000 4096 8 9 1 0 |
| FOLIO 0x00009000 0x0000a000 4096 9 10 1 0 |
| FOLIO 0x0000a000 0x0000b000 4096 10 11 1 0 |
| FOLIO 0x0000b000 0x0000c000 4096 11 12 1 0 |
| FOLIO 0x0000c000 0x0000d000 4096 12 13 1 0 |
| FOLIO 0x0000d000 0x0000e000 4096 13 14 1 0 |
| FOLIO 0x0000e000 0x0000f000 4096 14 15 1 0 |
| FOLIO 0x0000f000 0x00010000 4096 15 16 1 0 |
| FOLIO 0x00010000 0x00011000 4096 16 17 1 0 |
| FOLIO 0x00011000 0x00012000 4096 17 18 1 0 |
| FOLIO 0x00012000 0x00013000 4096 18 19 1 0 |
| FOLIO 0x00013000 0x00014000 4096 19 20 1 0 |
| FOLIO 0x00014000 0x00015000 4096 20 21 1 0 |
| FOLIO 0x00015000 0x00016000 4096 21 22 1 0 |
| FOLIO 0x00016000 0x00017000 4096 22 23 1 0 |
| FOLIO 0x00017000 0x00018000 4096 23 24 1 0 |
| FOLIO 0x00018000 0x00019000 4096 24 25 1 0 |
| FOLIO 0x00019000 0x0001a000 4096 25 26 1 0 |
| FOLIO 0x0001a000 0x0001b000 4096 26 27 1 0 |
| FOLIO 0x0001b000 0x0001c000 4096 27 28 1 0 |
| FOLIO 0x0001c000 0x0001d000 4096 28 29 1 0 |
| FOLIO 0x0001d000 0x0001e000 4096 29 30 1 0 |
| FOLIO 0x0001e000 0x0001f000 4096 30 31 1 0 |
| FOLIO 0x0001f000 0x00020000 4096 31 32 1 0 |
| FOLIO 0x00020000 0x00021000 4096 32 33 1 0 |
| FOLIO 0x00021000 0x00022000 4096 33 34 1 0 Y |
| FOLIO 0x00022000 0x00024000 8192 34 36 2 1 |
| FOLIO 0x00024000 0x00028000 16384 36 40 4 2 |
| FOLIO 0x00028000 0x0002c000 16384 40 44 4 2 |
| FOLIO 0x0002c000 0x00030000 16384 44 48 4 2 |
| FOLIO 0x00030000 0x00034000 16384 48 52 4 2 |
| FOLIO 0x00034000 0x00038000 16384 52 56 4 2 |
| FOLIO 0x00038000 0x0003c000 16384 56 60 4 2 |
| FOLIO 0x0003c000 0x00040000 16384 60 64 4 2 |
| FOLIO 0x00040000 0x00041000 4096 64 65 1 0 |
| HOLE 0x00041000 0x00800000 8122368 65 2048 1983 |
| |
| Which means that when we now read pages 26-33 and readahead is kicked off |
| again, the new preferred order is 2 (0 + 2), not 4 as we intended: |
| |
| TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA |
| ----- ---------- ---------- ---------- ------- ------- ----- ----- -- |
| HOLE 0x00000000 0x00001000 4096 0 1 1 |
| FOLIO 0x00001000 0x00002000 4096 1 2 1 0 |
| FOLIO 0x00002000 0x00003000 4096 2 3 1 0 |
| FOLIO 0x00003000 0x00004000 4096 3 4 1 0 |
| FOLIO 0x00004000 0x00005000 4096 4 5 1 0 |
| FOLIO 0x00005000 0x00006000 4096 5 6 1 0 |
| FOLIO 0x00006000 0x00007000 4096 6 7 1 0 |
| FOLIO 0x00007000 0x00008000 4096 7 8 1 0 |
| FOLIO 0x00008000 0x00009000 4096 8 9 1 0 |
| FOLIO 0x00009000 0x0000a000 4096 9 10 1 0 |
| FOLIO 0x0000a000 0x0000b000 4096 10 11 1 0 |
| FOLIO 0x0000b000 0x0000c000 4096 11 12 1 0 |
| FOLIO 0x0000c000 0x0000d000 4096 12 13 1 0 |
| FOLIO 0x0000d000 0x0000e000 4096 13 14 1 0 |
| FOLIO 0x0000e000 0x0000f000 4096 14 15 1 0 |
| FOLIO 0x0000f000 0x00010000 4096 15 16 1 0 |
| FOLIO 0x00010000 0x00011000 4096 16 17 1 0 |
| FOLIO 0x00011000 0x00012000 4096 17 18 1 0 |
| FOLIO 0x00012000 0x00013000 4096 18 19 1 0 |
| FOLIO 0x00013000 0x00014000 4096 19 20 1 0 |
| FOLIO 0x00014000 0x00015000 4096 20 21 1 0 |
| FOLIO 0x00015000 0x00016000 4096 21 22 1 0 |
| FOLIO 0x00016000 0x00017000 4096 22 23 1 0 |
| FOLIO 0x00017000 0x00018000 4096 23 24 1 0 |
| FOLIO 0x00018000 0x00019000 4096 24 25 1 0 |
| FOLIO 0x00019000 0x0001a000 4096 25 26 1 0 |
| FOLIO 0x0001a000 0x0001b000 4096 26 27 1 0 |
| FOLIO 0x0001b000 0x0001c000 4096 27 28 1 0 |
| FOLIO 0x0001c000 0x0001d000 4096 28 29 1 0 |
| FOLIO 0x0001d000 0x0001e000 4096 29 30 1 0 |
| FOLIO 0x0001e000 0x0001f000 4096 30 31 1 0 |
| FOLIO 0x0001f000 0x00020000 4096 31 32 1 0 |
| FOLIO 0x00020000 0x00021000 4096 32 33 1 0 |
| FOLIO 0x00021000 0x00022000 4096 33 34 1 0 |
| FOLIO 0x00022000 0x00024000 8192 34 36 2 1 |
| FOLIO 0x00024000 0x00028000 16384 36 40 4 2 |
| FOLIO 0x00028000 0x0002c000 16384 40 44 4 2 |
| FOLIO 0x0002c000 0x00030000 16384 44 48 4 2 |
| FOLIO 0x00030000 0x00034000 16384 48 52 4 2 |
| FOLIO 0x00034000 0x00038000 16384 52 56 4 2 |
| FOLIO 0x00038000 0x0003c000 16384 56 60 4 2 |
| FOLIO 0x0003c000 0x00040000 16384 60 64 4 2 |
| FOLIO 0x00040000 0x00041000 4096 64 65 1 0 |
| FOLIO 0x00041000 0x00042000 4096 65 66 1 0 Y |
| FOLIO 0x00042000 0x00044000 8192 66 68 2 1 |
| FOLIO 0x00044000 0x00048000 16384 68 72 4 2 |
| FOLIO 0x00048000 0x0004c000 16384 72 76 4 2 |
| FOLIO 0x0004c000 0x00050000 16384 76 80 4 2 |
| FOLIO 0x00050000 0x00054000 16384 80 84 4 2 |
| FOLIO 0x00054000 0x00058000 16384 84 88 4 2 |
| FOLIO 0x00058000 0x0005c000 16384 88 92 4 2 |
| FOLIO 0x0005c000 0x00060000 16384 92 96 4 2 |
| FOLIO 0x00060000 0x00061000 4096 96 97 1 0 |
| HOLE 0x00061000 0x00800000 7991296 97 2048 1951 |
| |
| This cycle of ramping up from order-0, with smaller orders at the edges |
| for alignment, continues all the way to the end of the file (not shown). |
| |
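The folio orders in the dumps above can be reproduced with a simplified model of how a readahead window is carved into folios. This is a sketch under assumptions, not the kernel's code: `pick_folio_order()` and `layout_window()` are hypothetical names. The model assumes only two rules: a misaligned index falls back to the order given by its lowest set bit, and a folio that would overrun the end of the window is shrunk until it fits.

```c
#include <stddef.h>

/* Index of the lowest set bit; only called here with x != 0. */
static unsigned int lowest_set_bit(unsigned long x)
{
	unsigned int bit = 0;

	while (x && !(x & 1UL)) {
		x >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Hypothetical model: order of the folio that starts at `index`, given the
 * last page of the window (`limit`, inclusive) and the preferred order.
 */
static unsigned int pick_folio_order(unsigned long index, unsigned long limit,
				     unsigned int preferred)
{
	unsigned int order = preferred;

	/* Misaligned start: use the largest order the index is aligned to. */
	if (index & ((1UL << order) - 1))
		order = lowest_set_bit(index);
	/* Do not run past the end of the window. */
	while (order > 0 && index + (1UL << order) - 1 > limit)
		order--;
	return order;
}

/*
 * Record the order of each folio in the window [start, start + size) and
 * return the number of folios, writing at most `max` entries.
 */
static size_t layout_window(unsigned long start, unsigned long size,
			    unsigned int preferred, unsigned int *orders,
			    size_t max)
{
	unsigned long index = start;
	unsigned long limit = start + size - 1;
	size_t n = 0;

	while (index <= limit && n < max) {
		unsigned int order = pick_folio_order(index, limit, preferred);

		orders[n++] = order;
		index += 1UL << order;
	}
	return n;
}
```

Under this model, a 32-page window starting at page 33 with a preferred order of 2 yields orders 0, 1, seven order-2 folios, and a trailing order-0 folio at page 64, as in the second dump. Shortening the window to 31 pages removes that trailing order-0 folio.
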
| After the change, we round down the end boundary to the order boundary so |
| we no longer get stuck in the cycle and can ramp up the order over time. |
| Note that the rate of the ramp up is still not as we would expect it. We |
| will fix that next. Here we are touching pages 17-256 sequentially: |
| |
| TYPE STARTOFFS ENDOFFS SIZE STARTPG ENDPG NRPG ORDER RA |
| ----- ---------- ---------- ---------- ------- ------- ----- ----- -- |
| HOLE 0x00000000 0x00001000 4096 0 1 1 |
| FOLIO 0x00001000 0x00002000 4096 1 2 1 0 |
| FOLIO 0x00002000 0x00003000 4096 2 3 1 0 |
| FOLIO 0x00003000 0x00004000 4096 3 4 1 0 |
| FOLIO 0x00004000 0x00005000 4096 4 5 1 0 |
| FOLIO 0x00005000 0x00006000 4096 5 6 1 0 |
| FOLIO 0x00006000 0x00007000 4096 6 7 1 0 |
| FOLIO 0x00007000 0x00008000 4096 7 8 1 0 |
| FOLIO 0x00008000 0x00009000 4096 8 9 1 0 |
| FOLIO 0x00009000 0x0000a000 4096 9 10 1 0 |
| FOLIO 0x0000a000 0x0000b000 4096 10 11 1 0 |
| FOLIO 0x0000b000 0x0000c000 4096 11 12 1 0 |
| FOLIO 0x0000c000 0x0000d000 4096 12 13 1 0 |
| FOLIO 0x0000d000 0x0000e000 4096 13 14 1 0 |
| FOLIO 0x0000e000 0x0000f000 4096 14 15 1 0 |
| FOLIO 0x0000f000 0x00010000 4096 15 16 1 0 |
| FOLIO 0x00010000 0x00011000 4096 16 17 1 0 |
| FOLIO 0x00011000 0x00012000 4096 17 18 1 0 |
| FOLIO 0x00012000 0x00013000 4096 18 19 1 0 |
| FOLIO 0x00013000 0x00014000 4096 19 20 1 0 |
| FOLIO 0x00014000 0x00015000 4096 20 21 1 0 |
| FOLIO 0x00015000 0x00016000 4096 21 22 1 0 |
| FOLIO 0x00016000 0x00017000 4096 22 23 1 0 |
| FOLIO 0x00017000 0x00018000 4096 23 24 1 0 |
| FOLIO 0x00018000 0x00019000 4096 24 25 1 0 |
| FOLIO 0x00019000 0x0001a000 4096 25 26 1 0 |
| FOLIO 0x0001a000 0x0001b000 4096 26 27 1 0 |
| FOLIO 0x0001b000 0x0001c000 4096 27 28 1 0 |
| FOLIO 0x0001c000 0x0001d000 4096 28 29 1 0 |
| FOLIO 0x0001d000 0x0001e000 4096 29 30 1 0 |
| FOLIO 0x0001e000 0x0001f000 4096 30 31 1 0 |
| FOLIO 0x0001f000 0x00020000 4096 31 32 1 0 |
| FOLIO 0x00020000 0x00021000 4096 32 33 1 0 |
| FOLIO 0x00021000 0x00022000 4096 33 34 1 0 |
| FOLIO 0x00022000 0x00024000 8192 34 36 2 1 |
| FOLIO 0x00024000 0x00028000 16384 36 40 4 2 |
| FOLIO 0x00028000 0x0002c000 16384 40 44 4 2 |
| FOLIO 0x0002c000 0x00030000 16384 44 48 4 2 |
| FOLIO 0x00030000 0x00034000 16384 48 52 4 2 |
| FOLIO 0x00034000 0x00038000 16384 52 56 4 2 |
| FOLIO 0x00038000 0x0003c000 16384 56 60 4 2 |
| FOLIO 0x0003c000 0x00040000 16384 60 64 4 2 |
| FOLIO 0x00040000 0x00044000 16384 64 68 4 2 |
| FOLIO 0x00044000 0x00048000 16384 68 72 4 2 |
| FOLIO 0x00048000 0x0004c000 16384 72 76 4 2 |
| FOLIO 0x0004c000 0x00050000 16384 76 80 4 2 |
| FOLIO 0x00050000 0x00054000 16384 80 84 4 2 |
| FOLIO 0x00054000 0x00058000 16384 84 88 4 2 |
| FOLIO 0x00058000 0x0005c000 16384 88 92 4 2 |
| FOLIO 0x0005c000 0x00060000 16384 92 96 4 2 |
| FOLIO 0x00060000 0x00070000 65536 96 112 16 4 |
| FOLIO 0x00070000 0x00080000 65536 112 128 16 4 |
| FOLIO 0x00080000 0x000a0000 131072 128 160 32 5 |
| FOLIO 0x000a0000 0x000c0000 131072 160 192 32 5 |
| FOLIO 0x000c0000 0x000e0000 131072 192 224 32 5 |
| FOLIO 0x000e0000 0x00100000 131072 224 256 32 5 |
| FOLIO 0x00100000 0x00120000 131072 256 288 32 5 |
| FOLIO 0x00120000 0x00140000 131072 288 320 32 5 Y |
| HOLE 0x00140000 0x00800000 7077888 320 2048 1728 |
| |
| Link: https://lkml.kernel.org/r/20250609092729.274960-3-ryan.roberts@arm.com |
| Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> |
| Reviewed-by: Jan Kara <jack@suse.cz> |
| Cc: Chaitanya S Prakash <chaitanyas.prakash@arm.com> |
| Cc: David Hildenbrand <david@redhat.com> |
| Cc: Will Deacon <will@kernel.org> |
| Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
| --- |
| |
| mm/readahead.c | 10 +++++++--- |
| 1 file changed, 7 insertions(+), 3 deletions(-) |
| |
| --- a/mm/readahead.c~mm-readahead-terminate-async-readahead-on-natural-boundary |
| +++ a/mm/readahead.c |
| @@ -620,7 +620,7 @@ void page_cache_async_ra(struct readahea |
| unsigned long max_pages; |
| struct file_ra_state *ra = ractl->ra; |
| pgoff_t index = readahead_index(ractl); |
| - pgoff_t expected, start; |
| + pgoff_t expected, start, end, aligned_end, align; |
| unsigned int order = folio_order(folio); |
| |
| /* no readahead */ |
| @@ -652,7 +652,6 @@ void page_cache_async_ra(struct readahea |
| * the readahead window. |
| */ |
| ra->size = max(ra->size, get_next_ra_size(ra, max_pages)); |
| - ra->async_size = ra->size; |
| goto readit; |
| } |
| |
| @@ -673,9 +672,14 @@ void page_cache_async_ra(struct readahea |
| ra->size = start - index; /* old async_size */ |
| ra->size += req_count; |
| ra->size = get_next_ra_size(ra, max_pages); |
| - ra->async_size = ra->size; |
| readit: |
| order += 2; |
| + align = 1UL << min(order, ffs(max_pages) - 1); |
| + end = ra->start + ra->size; |
| + aligned_end = round_down(end, align); |
| + if (aligned_end > ra->start) |
| + ra->size -= end - aligned_end; |
| + ra->async_size = ra->size; |
| ractl->_index = ra->start; |
| page_cache_ra_order(ractl, ra, order); |
| } |
| _ |