io_uring: switch away from percpu refcounts

For the common cases, the io_uring ref counts are all batched, and hence
need not be percpu references. This saves some memory, but beyond that,
it gets rid of the full RCU grace period needed when tearing down a
percpu reference. With io_uring now waiting on cancelations and IO
during exit, that grace period slows down the teardown a lot, making it
up to 100x slower.
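
As a rough illustration of why batching makes a plain atomic count
sufficient, here is a minimal userspace sketch using C11 atomics. It is
not the code in this patch: struct ctx_ref and the ctx_ref_*() helpers
are hypothetical names made up for the example. The assumption is that a
batch of requests takes and drops its references with one atomic
operation each, so the shared counter is touched once per batch rather
than once per request.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a ring context's reference count. */
struct ctx_ref {
	atomic_long count;
};

/* Start with one reference, held by the owner of the context. */
static void ctx_ref_init(struct ctx_ref *ref)
{
	atomic_init(&ref->count, 1);
}

/* Take nr references with a single atomic operation (one per batch). */
static void ctx_ref_get_many(struct ctx_ref *ref, long nr)
{
	assert(nr > 0);
	atomic_fetch_add_explicit(&ref->count, nr, memory_order_relaxed);
}

/*
 * Drop nr references with a single atomic operation. Returns true if
 * this call dropped the last reference, in which case the caller can
 * tear the context down right away; there is no grace period to wait
 * for in this scheme.
 */
static bool ctx_ref_put_many(struct ctx_ref *ref, long nr)
{
	assert(nr > 0);
	return atomic_fetch_sub_explicit(&ref->count, nr,
					 memory_order_acq_rel) == nr;
}
```

In this sketch, the final put observes the count reaching zero directly,
which is what lets teardown proceed immediately.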

Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 files changed