perf tools updates for 7.1

perf report:

 - Add 'comm_nodigit' sort key to combine similar threads whose comm
   differs only in its numbers.  In the following example, the
   'comm_nodigit' key merges samples from all threads whose comm starts
   with "bpfrb/" into a single entry "bpfrb/<N>".

    $ perf report -s comm_nodigit,comm -H
    ...
    #
    #    Overhead  CommandNoDigit / Command
    # ...........  ........................
    #
        20.30%     swapper
           20.30%     swapper
        13.37%     chrome
           13.37%     chrome
        10.07%     bpfrb/<N>
            7.47%     bpfrb/0
            0.70%     bpfrb/1
            0.47%     bpfrb/3
            0.46%     bpfrb/2
            0.25%     bpfrb/4
            0.23%     bpfrb/5
            0.20%     bpfrb/6
            0.14%     bpfrb/10
            0.07%     bpfrb/7

 - Support flat layout for symfs.  The --symfs option specifies the
   location of debugging symbol files.  The default 'hierarchy' layout
   searches for the symbol file under the symfs root using the same
   path as the original file.  The new 'flat' layout searches only in
   the root directory.

 - Update 'simd' sort key for ARM SIMD flags to cover ASE/SME and more
   predicate flags.
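
As a rough illustration of the 'comm_nodigit' and symfs layout items
above, here is a minimal Python sketch (not perf's actual C code).  It
assumes that 'comm_nodigit' replaces every run of digits in the comm
with "<N>", and that the two symfs layouts differ only in whether the
original directory part is kept under the symfs root.  The helpers
comm_nodigit() and symfs_candidate() and the example paths are
hypothetical, made up for this sketch.

```python
import posixpath
import re


def comm_nodigit(comm):
    # Assumption: every run of digits collapses to "<N>", so that
    # "bpfrb/0" and "bpfrb/10" end up in the same entry.
    return re.sub(r"[0-9]+", "<N>", comm)


def symfs_candidate(symfs_root, dso_path, layout="hierarchy"):
    # Hypothetical helper: where a symbol file would be looked up.
    if layout == "hierarchy":
        # Keep the original path under the symfs root.
        return posixpath.join(symfs_root, dso_path.lstrip("/"))
    if layout == "flat":
        # Look only in the symfs root itself.
        return posixpath.join(symfs_root, posixpath.basename(dso_path))
    raise ValueError("unknown layout: " + layout)


if __name__ == "__main__":
    for comm in ("swapper", "chrome", "bpfrb/0", "bpfrb/10"):
        print(comm, "->", comm_nodigit(comm))
    # Made-up paths, for illustration only.
    print(symfs_candidate("/tmp/symfs", "/usr/lib/libfoo.so"))
    print(symfs_candidate("/tmp/symfs", "/usr/lib/libfoo.so", "flat"))
```

With these assumptions, comms without digits such as "swapper" are
left untouched, and the flat layout drops the "/usr/lib" part of the
original path.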

perf stat:

 - Add --pmu-filter option to select specific PMUs.  This is useful
   when you measure metrics from multiple instances of uncore PMUs
   with similar names.

    # perf stat -M cpa_p0_avg_bw
     Performance counter stats for 'system wide':

        19,417,779,115      hisi_sicl0_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl0_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl0_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl0_cpa0/cpa_p0_rd_dat_32b/
        19,417,751,103      hisi_sicl10_cpa0/cpa_cycles/     #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl10_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl10_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl10_cpa0/cpa_p0_rd_dat_32b/
        19,417,730,679      hisi_sicl2_cpa0/cpa_cycles/      #     0.31 cpa_p0_avg_bw
            75,635,749      hisi_sicl2_cpa0/cpa_p0_wr_dat/
            18,520,640      hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/
        19,417,674,227      hisi_sicl8_cpa0/cpa_cycles/      #     0.00 cpa_p0_avg_bw
                     0      hisi_sicl8_cpa0/cpa_p0_wr_dat/
                     0      hisi_sicl8_cpa0/cpa_p0_rd_dat_64b/
                     0      hisi_sicl8_cpa0/cpa_p0_rd_dat_32b/

          19.417734480 seconds time elapsed

   With --pmu-filter, users can select only the hisi_sicl2_cpa0 PMU.

    # perf stat --pmu-filter hisi_sicl2_cpa0 -M cpa_p0_avg_bw
     Performance counter stats for 'system wide':

         6,234,093,559      cpa_cycles                       #     0.60 cpa_p0_avg_bw
            50,548,465      cpa_p0_wr_dat
             7,552,182      cpa_p0_rd_dat_64b
                     0      cpa_p0_rd_dat_32b

           6.234139320 seconds time elapsed
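
The effect of the filter on the event list can be sketched in Python as
below.  This is only an illustration, not perf's implementation: how
perf actually matches the filter (exact name, prefix or glob) is not
described here, so the sketch assumes an exact match on the PMU name.
split_event() and filter_by_pmu() are hypothetical helpers, and EVENTS
is a subset of the event names from the first output above.

```python
def split_event(name):
    # "pmu/event/" -> ("pmu", "event")
    pmu, _, rest = name.partition("/")
    return pmu, rest.rstrip("/")


def filter_by_pmu(events, pmu_filter):
    # Assumption: keep an event only when its PMU name equals the
    # filter exactly; perf's real matching rule may differ.
    kept = []
    for name in events:
        pmu, event = split_event(name)
        if pmu == pmu_filter:
            kept.append(event)
    return kept


EVENTS = [
    "hisi_sicl0_cpa0/cpa_cycles/",
    "hisi_sicl0_cpa0/cpa_p0_wr_dat/",
    "hisi_sicl10_cpa0/cpa_cycles/",
    "hisi_sicl2_cpa0/cpa_cycles/",
    "hisi_sicl2_cpa0/cpa_p0_wr_dat/",
    "hisi_sicl2_cpa0/cpa_p0_rd_dat_64b/",
    "hisi_sicl2_cpa0/cpa_p0_rd_dat_32b/",
]

if __name__ == "__main__":
    for event in filter_by_pmu(EVENTS, "hisi_sicl2_cpa0"):
        print(event)
```

Only the four hisi_sicl2_cpa0 events survive the filter, which mirrors
the shorter event list in the second output.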

Data type profiling:

 - Quality improvements by tracking register state more precisely.
 - Ensure array members get their type.
 - Handle more cases for global variables.

Vendor event/metric updates:

 - Update various Intel events and metrics.
 - Add NVIDIA Tegra 410 Olympus events.

Internal changes:

 - Verify perf.data header for maliciously crafted files.
 - Update perf test to cover more use cases and make the tests more
   robust.
 - Move a couple of copied kernel headers so as not to annoy the
   objtool build.
 - Fix a bug in sorting maps by name.
 - Remove some unused code.

Misc:

 - Fix module symbol resolution with non-zero text address.
 - Add -t/--threads option to `perf bench mem mmap`.
 - Track the duration of exit*() syscalls in `perf trace -s`.
 - Add core.addr2line-timeout and core.addr2line-disable-warn config
   items.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Merge tag 'v7.0-rc6' into perf-tools

To get the latest updates and fixes.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>