perf/x86/intel/qos: Support per-task events

Add support for per-task events in addition to system-wide events. This
significantly changes the way that we gather L3 cache occupancy values
in intel_qos_event_read().

Currently, for system-wide (per-cpu) events we defer processing to
userland, which knows how to discard all but one per-cpu result per
socket using the 'readers' cpumask.
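
As a rough illustration of that userland filtering (not the actual tool
code), the sketch below keeps only the results from CPUs named in a
readers mask. The function name, the 64-bit mask representation and the
choice to sum the surviving values are all assumptions made for the
example.

```c
#include <stdint.h>

/*
 * Hypothetical userland helper: keep only the per-cpu results whose CPU
 * bit is set in readers_mask (assumed to name exactly one CPU per
 * socket) and add them up. Results from all other CPUs are discarded.
 */
static uint64_t sum_reader_results(const uint64_t *per_cpu,
				   unsigned int nr_cpus,
				   uint64_t readers_mask)
{
	uint64_t total = 0;
	unsigned int cpu;

	/* The mask is a single 64-bit word here, so cap at 64 CPUs. */
	for (cpu = 0; cpu < nr_cpus && cpu < 64; cpu++)
		if (readers_mask & (UINT64_C(1) << cpu))
			total += per_cpu[cpu];

	return total;
}
```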

Things aren't so simple for task events because we need to do the value
aggregation ourselves. To do this, we cache the L3 occupancy value for
the current socket in intel_qos_event_read() and calculate the total by
summing all the previously cached values for all other sockets.
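
The aggregation described above can be sketched roughly as follows.
This is a standalone illustration, not the code in this patch: the
struct, the function name and MAX_SOCKETS are hypothetical, and the
fresh reading is passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SOCKETS 4	/* hypothetical limit, for illustration only */

/* Hypothetical per-event state: last occupancy seen on each socket. */
struct task_occupancy {
	uint64_t cached[MAX_SOCKETS];
};

/*
 * Cache the fresh reading for the socket we are running on, then return
 * the sum over all sockets. Entries for the other sockets hold whatever
 * was cached the last time a read happened there, so they may be stale.
 */
static uint64_t task_occupancy_read(struct task_occupancy *occ,
				    unsigned int cur_socket,
				    uint64_t fresh)
{
	uint64_t total = 0;
	unsigned int i;

	assert(cur_socket < MAX_SOCKETS);
	occ->cached[cur_socket] = fresh;

	for (i = 0; i < MAX_SOCKETS; i++)
		total += occ->cached[i];

	return total;
}
```

Note that only the current socket's entry is ever refreshed by a read,
which is why the total can lag behind the true occupancy.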

Ideally we'd do a cross-CPU call in intel_qos_event_read() to read the
instantaneous value for all other sockets instead of relying on the
cached (stale) copy, but that's not possible because we execute with
interrupts disabled, and cross-CPU calls cannot safely be made in that
context.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2 files changed