Merge branch 'perf-core-for-linus' of git://

Pull perf tooling updates from Ingo Molnar:

   - Streaming compression of perf ring buffer into
     PERF_RECORD_COMPRESSED user space records, resulting in ~3-5x file size reduction on variety of tested workloads what
     saves storage space on larger server systems where size
     can easily reach several tens or even hundreds of GiBs, especially
     when profiling with DWARF-based stacks and tracing of context

  perf record:

   - Improve -user-regs/intr-regs suggestions to overcome errors

  perf annotate:

   - Remove hist__account_cycles() from callback, speeding up branch
     processing (perf record -b)

  perf stat:

   - Add a 'percore' event qualifier, e.g.: -e
     cpu/event=0,umask=0x3,percore=1/, that sums up the event counts for
     both hardware threads in a core.

     We can already do this with --per-core, but it's often useful to do
     this together with other metrics that are collected per hardware

     I.e. now its possible to do this per-event, and have it mixed with
     other events not aggregated by core.


   - Map Brahma-B53 CPUID to cortex-a53 events.

   - Add Cortex-A57 and Cortex-A72 events.


   - Add DWARF register mappings for libdw, allowing --call-graph=dwarf
     to work on the C-SKY arch.


   - Add support for recording and printing XMM registers, available,
     for instance, on Icelake.

   - Add uncore_upi (Intel's "Ultra Path Interconnect" events) JSON
     support. UPI replaced the Intel QuickPath Interconnect (QPI) in
     Xeon Skylake-SP.

  Intel PT:

   - Fix instructions sampling rate.

   - Timestamp fixes.

   - Improve exported-sql-viewer GUI, allowing, for instance, to
     copy'n'paste the trees, useful for e-mailing"

* 'perf-core-for-linus' of git:// (73 commits)
  perf stat: Support 'percore' event qualifier
  perf stat: Factor out aggregate counts printing
  perf tools: Add a 'percore' event qualifier
  perf docs: Add description for stderr
  perf intel-pt: Fix sample timestamp wrt non-taken branches
  perf intel-pt: Fix improved sample timestamp
  perf intel-pt: Fix instructions sampling rate
  perf regs x86: Add X86 specific arch__intr_reg_mask()
  perf parse-regs: Add generic support for arch__intr/user_reg_mask()
  perf parse-regs: Split parse_regs
  perf vendor events arm64: Add Cortex-A57 and Cortex-A72 events
  perf vendor events arm64: Map Brahma-B53 CPUID to cortex-a53 events
  perf vendor events arm64: Remove [[:xdigit:]] wildcard
  perf jevents: Remove unused variable
  perf test zstd: Fixup verbose mode output
  perf tests: Implement Zstd comp/decomp integration test
  perf inject: Enable COMPRESSED record decompression
  perf report: Implement record decompression
  perf record: Implement -z,--compression_level[=<n>] option
  perf report: Add stub processing of compressed events for -D