perf tools changes for v6.9

perf stat
---------
* Support a new 'cluster' aggregation mode for resources shared at the
  cluster level, depending on the hardware configuration.

    $ sudo perf stat -a --per-cluster -e cycles,instructions sleep 1

     Performance counter stats for 'system wide':

    S0-D0-CLS0    2         85,051,822      cycles
    S0-D0-CLS0    2         73,909,908      instructions      #    0.87  insn per cycle
    S0-D0-CLS2    2         93,365,918      cycles
    S0-D0-CLS2    2         83,006,158      instructions      #    0.89  insn per cycle
    S0-D0-CLS4    2        104,157,523      cycles
    S0-D0-CLS4    2         53,234,396      instructions      #    0.51  insn per cycle
    S0-D0-CLS6    2         65,891,079      cycles
    S0-D0-CLS6    2         41,478,273      instructions      #    0.63  insn per cycle

           1.002407989 seconds time elapsed

* Various fixes and cleanups for event metrics including NaN handling.
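
  As a rough illustration of how the 'insn per cycle' column in the
  --per-cluster output above relates to the raw counts, here is a minimal
  Python sketch.  It is not part of perf; the helper names
  (parse_per_cluster, insn_per_cycle) are hypothetical and the line format
  is assumed from the sample output only.

```python
import re
from collections import defaultdict

# Sample lines copied from the 'perf stat --per-cluster' output above.
SAMPLE = """\
S0-D0-CLS0    2         85,051,822      cycles
S0-D0-CLS0    2         73,909,908      instructions      #    0.87  insn per cycle
S0-D0-CLS2    2         93,365,918      cycles
S0-D0-CLS2    2         83,006,158      instructions      #    0.89  insn per cycle
S0-D0-CLS4    2        104,157,523      cycles
S0-D0-CLS4    2         53,234,396      instructions      #    0.51  insn per cycle
S0-D0-CLS6    2         65,891,079      cycles
S0-D0-CLS6    2         41,478,273      instructions      #    0.63  insn per cycle
"""

# Assumed layout: <cluster id> <number of CPUs> <count> <event name> ...
LINE_RE = re.compile(r"^\s*(S\d+-D\d+-CLS\d+)\s+(\d+)\s+([\d,]+)\s+(\S+)")


def parse_per_cluster(text):
    """Return {cluster_id: {event_name: count}} for matching lines."""
    counts = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip headers, blank lines and the elapsed-time line
        cluster, _ncpus, value, event = match.groups()
        counts[cluster][event] = int(value.replace(",", ""))
    return dict(counts)


def insn_per_cycle(counts):
    """Compute instructions / cycles per cluster, rounded to 2 digits."""
    return {
        cluster: round(events["instructions"] / events["cycles"], 2)
        for cluster, events in counts.items()
        if events.get("cycles") and "instructions" in events
    }


if __name__ == "__main__":
    for cluster, ipc in insn_per_cycle(parse_per_cluster(SAMPLE)).items():
        print(f"{cluster}: {ipc:.2f} insn per cycle")
```

  The computed ratios match the metric column that perf stat prints for
  each cluster.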

perf script
-----------
* Use libcapstone if available to disassemble the instructions.  This enables
  'perf script -F disasm' and 'perf script --insn-trace=disasm' (for Intel-PT).

    $ perf script -F event,ip,disasm
    cycles:P:  ffffffffa988d428             wrmsr
    cycles:P:  ffffffffa9839d25             movq %rax, %r14
    cycles:P:  ffffffffa9cdcaf0             endbr64
    cycles:P:  ffffffffa988d428             wrmsr
    cycles:P:  ffffffffa988d428             wrmsr
    cycles:P:  ffffffffaa401f86             iretq
    cycles:P:  ffffffffa99c4de5             movq 0x30(%rcx), %r8
    cycles:P:  ffffffffa988d428             wrmsr
    cycles:P:  ffffffffaa401f86             iretq
    cycles:P:  ffffffffa9907983             movl 0x68(%rbx), %eax
    cycles:P:  ffffffffa988d428             wrmsr

* Expose sample ID / stream ID to Python scripts.

perf test
---------
* Add more perf test cases from Red Hat internal test suites.  This time it
  adds the base infrastructure and a few perf probe tests.  More to come. :)

* Add 'perf test -p' for parallel execution and fix some issues found by the
  parallel test.

* Support printing symbols in a given (active) module in the symbol test:

    $ perf test -F -v Symbols --dso /lib/modules/$(uname -r)/kernel/fs/ext4/ext4.ko
    --- start ---
    Testing /lib/modules/6.5.13-1rodete2-amd64/kernel/fs/ext4/ext4.ko
    Overlapping symbols:
     7a990-7a9a0 l __pfx_ext4_exit_fs
     7a990-7a9a0 g __pfx_cleanup_module
    Overlapping symbols:
     7a9a0-7aa1c l ext4_exit_fs
     7a9a0-7aa1c g cleanup_module
    ...
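
  To show what "overlapping" means in the output above, here is a minimal
  Python sketch that groups symbols whose address ranges intersect.  It is
  not the code used by 'perf test'; the Symbol type and overlapping_groups()
  are hypothetical, and the end address is assumed to be exclusive.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    start: int    # first address covered by the symbol
    end: int      # assumed exclusive end address
    binding: str  # 'l' (local) or 'g' (global), as printed above
    name: str


def overlapping_groups(symbols: List[Symbol]) -> List[List[Symbol]]:
    """Group symbols whose [start, end) ranges intersect."""
    groups: List[List[Symbol]] = []
    current: List[Symbol] = []
    current_end = 0
    # Sorting by address lets a single pass find every overlap.
    for sym in sorted(symbols, key=lambda s: (s.start, s.end)):
        if current and sym.start < current_end:
            current.append(sym)
            current_end = max(current_end, sym.end)
        else:
            if len(current) > 1:
                groups.append(current)
            current = [sym]
            current_end = sym.end
    if len(current) > 1:
        groups.append(current)
    return groups


# Symbols taken from the ext4.ko example output above.
EXT4_SYMBOLS = [
    Symbol(0x7A990, 0x7A9A0, "l", "__pfx_ext4_exit_fs"),
    Symbol(0x7A990, 0x7A9A0, "g", "__pfx_cleanup_module"),
    Symbol(0x7A9A0, 0x7AA1C, "l", "ext4_exit_fs"),
    Symbol(0x7A9A0, 0x7AA1C, "g", "cleanup_module"),
]

if __name__ == "__main__":
    for group in overlapping_groups(EXT4_SYMBOLS):
        print("Overlapping symbols:")
        for sym in group:
            print(f" {sym.start:x}-{sym.end:x} {sym.binding} {sym.name}")
```

  With an exclusive end address, the two adjacent ranges are reported as
  separate groups, as in the test output.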

JSON metric updates
-------------------
* A new round of Intel metric updates.

* Support Power11 PVR (compatible with Power10).

* Fix cache latency events on Zen 4 to set SliceId properly.

Internal
--------
* Fix reference counting for the 'map' data structure.  Tireless work from Ian!

* More memory optimizations for struct thread and the annotate histogram.
  Now, 'perf report' (TUI) and 'perf annotate' should be much lighter-weight
  in terms of memory footprint.

* Support cross-arch perf register access.  Clean up the build configuration
  so that it can detect arch-register support at runtime.  This allows parsing
  register data in samples that were recorded on a different arch.

Others
------
* Sync task state in 'perf sched' with the kernel using trace event fields.
  The task states have changed, so tools cannot assume a fixed encoding.

* Clean up 'perf mem' to generalize the arch-specific events.

* Add support for local and global variables to data type profiling.  This
  should increase the success rate of type resolution with DWARF.

* Add short option -H for --hierarchy in 'perf report' and 'perf top'.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
perf annotate: Add comments in the data structures

Reviewed-by: Ian Rogers <irogers@google.com>
Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240304230815.1440583-5-namhyung@kernel.org
1 file changed