percpu_counter: add a cmpxchg-based _add_batch variant
This was previously titled "percpu_counter: reimplement _add_batch with
__this_cpu_cmpxchg". I chatted with vbabka a bit and he pointed me at
mod_zone_state, which does the same thing I need except it does not have
to disable preemption -- it turns out cmpxchg with a gs-prefixed
argument is safe here.
================ cut here ================
Interrupt disable/enable trips are quite expensive on x86-64 compared to
a mere cmpxchg (note: no lock prefix!) and percpu counters are used
quite often.
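The fast path boils down to an optimistic read-and-cmpxchg retry loop,
folding into the global counter only when the batch threshold is
crossed. A userspace sketch of that loop, with C11 atomics standing in
for this_cpu_try_cmpxchg and all names (sketch_add_batch, percpu_slot,
global_count) being illustrative rather than the kernel API:

```c
#include <stdatomic.h>
#include <assert.h>

/* Stand-ins for one fbc->counters slot and fbc->count. */
static _Atomic long percpu_slot;
static long global_count;

static void sketch_add_batch(long amount, long batch)
{
	long count = atomic_load(&percpu_slot);

	for (;;) {
		if (count + amount >= batch || count + amount <= -batch) {
			/*
			 * Slow path: fold the slot into the global counter.
			 * The kernel takes fbc->lock with IRQs off here;
			 * this single-threaded sketch elides that.
			 */
			global_count += atomic_exchange(&percpu_slot, 0) + amount;
			return;
		}
		/*
		 * Fast path: one plain (non-locked in the kernel) cmpxchg,
		 * no interrupt disable/enable round trip. On failure
		 * 'count' is refreshed with the current value and we retry,
		 * mirroring this_cpu_try_cmpxchg semantics.
		 */
		if (atomic_compare_exchange_strong(&percpu_slot, &count,
						   count + amount))
			return;
	}
}
```

For example, sketch_add_batch(10, 32) stays in the per-CPU slot, while a
subsequent sketch_add_batch(30, 32) exceeds the batch and folds
everything into global_count.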
With this change I see a 1% bump in ops/s for negative path lookups,
as measured with the following will-it-scale testcase:
void testcase(unsigned long long *iterations, unsigned long nr)
{
	while (1) {
		int fd = open("/tmp/nonexistent", O_RDONLY);
		assert(fd == -1);
		(*iterations)++;
	}
}
The win would be higher were it not for other slowdowns, but one has
to start somewhere.
v2:
- dodge preemption
- use this_cpu_try_cmpxchg
- keep the old variant depending on CONFIG_HAVE_CMPXCHG_LOCAL
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
1 file changed