| From 651740a502411793327e2f0741104749c4eedcd1 Mon Sep 17 00:00:00 2001 |
| From: Josef Bacik <josef@toxicpanda.com> |
| Date: Mon, 13 Dec 2021 14:22:33 -0500 |
| Subject: btrfs: check WRITE_ERR when trying to read an extent buffer |
| |
| From: Josef Bacik <josef@toxicpanda.com> |
| |
| commit 651740a502411793327e2f0741104749c4eedcd1 upstream. |
| |
| Filipe reported a hang when we have errors on btrfs. This turned out to |
| be a side-effect of my fix c2e39305299f01 ("btrfs: clear extent buffer |
| uptodate when we fail to write it") which made it so we clear |
| EXTENT_BUFFER_UPTODATE on an eb when we fail to write it out. |
| |
| Below is a paste of Filipe's analysis he got from using drgn to debug |
| the hang |
| |
| """ |
| btree readahead code calls read_extent_buffer_pages(), sets ->io_pages to |
| a value while writeback of all pages has not yet completed: |
| --> writeback for the first 3 pages finishes, we clear |
| EXTENT_BUFFER_UPTODATE from eb on the first page when we get an |
| error. |
| --> at this point eb->io_pages is 1 and we cleared Uptodate bit from the |
| first 3 pages |
| --> read_extent_buffer_pages() does not see EXTENT_BUFFER_UPTODATE() so |
| it continues, it's able to lock the pages since we obviously don't |
| hold the pages locked during writeback |
| --> read_extent_buffer_pages() then computes 'num_reads' as 3, and sets |
| eb->io_pages to 3, since only the first page does not have Uptodate |
| bit set at this point |
| --> writeback for the remaining page completes, we ended decrementing |
| eb->io_pages by 1, resulting in eb->io_pages == 2, and therefore |
| never calling end_extent_buffer_writeback(), so |
| EXTENT_BUFFER_WRITEBACK remains in the eb's flags |
| --> of course, when the read bio completes, it doesn't and shouldn't |
| call end_extent_buffer_writeback() |
| --> we should clear EXTENT_BUFFER_UPTODATE only after all pages of |
| the eb finished writeback? or maybe make the read pages code |
| wait for writeback of all pages of the eb to complete before |
| checking which pages need to be read, touch ->io_pages, submit |
| read bio, etc |
| |
| writeback bit never cleared means we can hang when aborting a |
| transaction, at: |
| |
| btrfs_cleanup_one_transaction() |
| btrfs_destroy_marked_extents() |
| wait_on_extent_buffer_writeback() |
| """ |
| |
| This is a problem because our writes are not synchronized with reads in |
| any way. We clear the UPTODATE flag and then we can easily come in and |
| try to read the EB while we're still waiting on other bio's to |
| complete. |
| |
| We have two options here, we could lock all the pages, and then check to |
| see if eb->io_pages != 0 to know if we've already got an outstanding |
| write on the eb. |
| |
| Or we can simply check to see if we have WRITE_ERR set on this extent |
| buffer. We set this bit _before_ we clear UPTODATE, so if the read gets |
| triggered because we aren't UPTODATE because of a write error we're |
| guaranteed to have WRITE_ERR set, and in this case we can simply return |
| -EIO. This will fix the reported hang. |
| |
| Reported-by: Filipe Manana <fdmanana@suse.com> |
| Fixes: c2e39305299f01 ("btrfs: clear extent buffer uptodate when we fail to write it") |
| CC: stable@vger.kernel.org # 5.4+ |
| Reviewed-by: Filipe Manana <fdmanana@suse.com> |
| Signed-off-by: Josef Bacik <josef@toxicpanda.com> |
| Signed-off-by: David Sterba <dsterba@suse.com> |
| Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
| --- |
| fs/btrfs/extent_io.c | 8 ++++++++ |
| 1 file changed, 8 insertions(+) |
| |
| --- a/fs/btrfs/extent_io.c |
| +++ b/fs/btrfs/extent_io.c |
| @@ -6547,6 +6547,14 @@ int read_extent_buffer_pages(struct exte |
| if (test_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags)) |
| return 0; |
| |
| + /* |
| + * We could have had EXTENT_BUFFER_UPTODATE cleared by the write |
| + * operation, which could potentially still be in flight. In this case |
| + * we simply want to return an error. |
| + */ |
| + if (unlikely(test_bit(EXTENT_BUFFER_WRITE_ERR, &eb->bflags))) |
| + return -EIO; |
| + |
| if (eb->fs_info->sectorsize < PAGE_SIZE) |
| return read_extent_buffer_subpage(eb, wait, mirror_num); |
| |