blob: 2f6df9c9226e4fa6b0a5c6e7f1d1f53feac6c3a6 [file] [log] [blame]
From: Boris Burkov <boris@bur.io>
Subject: mm/filemap: add AS_UNCHARGED
Date: Mon, 18 Aug 2025 17:36:53 -0700
Patch series "introduce uncharged file mapped folios", v3.
I would like to revisit Qu's proposal to not charge btrfs extent_buffer
allocations to the user's cgroup.
https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
I believe it is detrimental to randomly account these global pages to the
cgroup using them, basically at random. More justification and
explanation may be found in the patches themselves.
This patch (of 4):
Btrfs currently tracks its metadata pages in the page cache, using a fake
inode (fs_info->btree_inode) with offsets corresponding to where the
metadata is stored in the filesystem's full logical address space.
A consequence of this is that when btrfs uses filemap_add_folio(), this
usage is charged to the cgroup of whichever task happens to be running at
the time. These folios don't belong to any particular user cgroup, so I
don't think it makes much sense for them to be charged in that way. Some
negative consequences as a result:
- A task can be holding some important btrfs locks, then need to lookup
some metadata and go into reclaim, extending the duration it holds that
lock for, and unfairly pushing its own reclaim pain onto other cgroups.
- If that cgroup goes into reclaim, it might reclaim these folios a
different non-reclaiming cgroup might need soon. This is naturally
offset by LRU reclaim, but still.
A very similar proposal to use the root cgroup was previously made by Qu,
where he eventually proposed the idea of setting it per address_space.
This makes good sense for the btrfs use case, as the uncharged behavior
should apply to all use of the address_space, not select allocations.
I.e., if someone adds another filemap_add_folio() call using btrfs's
btree_inode, we would almost certainly want the uncharged behavior.
Link: https://lkml.kernel.org/r/cover.1755562487.git.boris@bur.io
Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
Link: https://lkml.kernel.org/r/43fed53d45910cd4fa7a71d2e92913e53eb28774.1755562487.git.boris@bur.io
Signed-off-by: Boris Burkov <boris@bur.io>
Suggested-by: Qu Wenruo <wqu@suse.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Tested-by: syzbot@syzkaller.appspotmail.com
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Chris Mason <clm@fb.com>
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/pagemap.h | 1 +
mm/filemap.c | 12 ++++++++----
2 files changed, 9 insertions(+), 4 deletions(-)
--- a/include/linux/pagemap.h~mm-filemap-add-as_uncharged
+++ a/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
folio contents */
AS_INACCESSIBLE = 8, /* Do not attempt direct R/W access to the mapping */
AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+ AS_UNCHARGED = 10, /* Do not charge usage to a cgroup */
/* Bits 16-25 are used for FOLIO_ORDER */
AS_FOLIO_ORDER_BITS = 5,
AS_FOLIO_ORDER_MIN = 16,
--- a/mm/filemap.c~mm-filemap-add-as_uncharged
+++ a/mm/filemap.c
@@ -960,15 +960,19 @@ int filemap_add_folio(struct address_spa
{
void *shadow = NULL;
int ret;
+ bool charge_mem_cgroup = !test_bit(AS_UNCHARGED, &mapping->flags);
- ret = mem_cgroup_charge(folio, NULL, gfp);
- if (ret)
- return ret;
+ if (charge_mem_cgroup) {
+ ret = mem_cgroup_charge(folio, NULL, gfp);
+ if (ret)
+ return ret;
+ }
__folio_set_locked(folio);
ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
if (unlikely(ret)) {
- mem_cgroup_uncharge(folio);
+ if (charge_mem_cgroup)
+ mem_cgroup_uncharge(folio);
__folio_clear_locked(folio);
} else {
/*
_