perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)

The ARMv9.2 architecture introduces the optional Branch Record Buffer
Extension (BRBE), which records information about branches as they are
executed into a set of branch record registers. BRBE is similar to x86's
Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
(BHRB).

BRBE supports filtering by exception level. When only one end of a
branch is in an excluded exception level, BRBE can omit just that source
or target address to avoid leaking privileged addresses. The h/w filter
would be sufficient except when there are multiple events with disjoint
filtering requirements. In this case, BRBE
is configured with a union of all the events' desired branches, and then
the recorded branches are filtered based on each event's filter. For
example, with one event capturing kernel events and another event
capturing user events, BRBE will be configured to capture both kernel
and user branches. When handling event overflow, the branch records have
to be filtered by software to only include kernel or user branch
addresses for that event.
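
As a rough illustration of the union-then-filter scheme described
above, here is a minimal userspace sketch. It is not the driver code:
the FILT_* flags, struct branch_rec, and all function names are
hypothetical, and it assumes that kernel addresses have bit 63 set and
that a hidden address is reported as 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event filter flags; not the real perf ABI bits. */
#define FILT_USER	(1u << 0)
#define FILT_KERNEL	(1u << 1)

struct branch_rec {
	uint64_t from;
	uint64_t to;
};

/* Assumption: kernel addresses have the top bit set. */
static bool addr_is_kernel(uint64_t addr)
{
	return (addr >> 63) != 0;
}

/* The hardware is programmed with the union of all events' filters. */
static unsigned int hw_filter_union(const unsigned int *filters, size_t n)
{
	unsigned int hw = 0;

	for (size_t i = 0; i < n; i++)
		hw |= filters[i];
	return hw;
}

static bool addr_allowed(uint64_t addr, unsigned int filter)
{
	return (filter & (addr_is_kernel(addr) ? FILT_KERNEL : FILT_USER)) != 0;
}

/*
 * Copy the records one event is allowed to see. An address the event
 * may not see is zeroed; a record with no visible address is dropped.
 * Returns the number of records written to 'out'.
 */
static size_t sw_filter(const struct branch_rec *in, size_t n,
			unsigned int filter, struct branch_rec *out)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		bool from_ok = addr_allowed(in[i].from, filter);
		bool to_ok = addr_allowed(in[i].to, filter);

		if (!from_ok && !to_ok)
			continue;
		out[kept].from = from_ok ? in[i].from : 0;
		out[kept].to = to_ok ? in[i].to : 0;
		kept++;
	}
	return kept;
}
```

With a user-only event and a kernel-only event active, the hardware
filter becomes FILT_USER | FILT_KERNEL, and sw_filter() is run once per
overflowing event with that event's own filter.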

The event and branch exception level filtering are separately
controlled. On x86, it is possible to request filtering which is
disjoint (e.g. kernel only event with user only branches). It is also
possible on x86 to configure the branch filter such that no branches are
ever recorded (e.g. -j save_type). For BRBE, events with mismatched
exception level filtering or a configuration that will result in no
samples are rejected. This can be relaxed in the future if such a need
arises.
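
The rejection rule above could look roughly like the following sketch.
It is illustrative only: struct ev_attr, the BR_* flags, and the choice
of -EOPNOTSUPP as the error code are assumptions made for this example,
not the perf ABI or the actual driver check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical branch filter flags; not the real perf ABI bits. */
#define BR_USER		(1u << 0)	/* record user branches */
#define BR_KERNEL	(1u << 1)	/* record kernel branches */
#define BR_ANY		(1u << 2)	/* a branch type selector */
#define BR_SAVE_TYPE	(1u << 3)	/* modifier only, selects nothing */

#define BR_TYPE_MASK	(BR_ANY)	/* all branch type selectors */

/* Hypothetical subset of the event attributes. */
struct ev_attr {
	bool exclude_user;
	bool exclude_kernel;
	unsigned int branch_type;
};

static int brbe_validate_sketch(const struct ev_attr *attr)
{
	bool br_user = (attr->branch_type & BR_USER) != 0;
	bool br_kernel = (attr->branch_type & BR_KERNEL) != 0;

	/* No branch type selected: no samples would ever be recorded. */
	if (!(attr->branch_type & BR_TYPE_MASK))
		return -EOPNOTSUPP;

	/* Event and branch exception level filters must agree. */
	if (br_user == attr->exclude_user ||
	    br_kernel == attr->exclude_kernel)
		return -EOPNOTSUPP;

	return 0;
}
```

Under these assumptions, a user-only event with BR_USER | BR_ANY is
accepted, while a kernel-only event asking for user branches, or an
event passing only BR_SAVE_TYPE, is rejected.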

The handling of KVM guests is similar to the above. On x86, branch
recording is always disabled when a guest is running. However,
requesting branch recording in guests is allowed. The guest events are
recorded, but the resulting branches are all from the host. For BRBE,
branch recording is similarly disabled when a guest is running. In
addition, events with branch recording and "exclude_host" set are
rejected. Requiring "exclude_guest" to be set did not work: the perf
tool sets "exclude_guest" by default only if no exception level options
are specified, whereas specifying kernel or user defaults to including
both host and guest. In this case, only host branches are recorded.

BRBE can support some additional exception, FIQ, and debug branch
types, but they are not supported currently. There's no control in the
perf ABI to enable/disable these branch types, so they could only be
enabled for the 'any' filter which might be undesired or unexpected.
The other architectures don't have any support for similar events (at
least with perf). These can be added in the future, if there is demand,
by adding additional specific filter types.

Questions:
Is it wrong/unnecessary to invalidate after reading the branch buffer?
If 2 events happened close to each other, their branch records could
overlap and we'd want the same records recorded for both events? If the
events are not close together, then the branch records have probably all
been overwritten since the previous event. But if the same event occurs
again before the branch records have all been overwritten, then the
branch record would be recorded twice.

What about nVHE EL2 recording? When is PERF_SAMPLE_BRANCH_HV useful?

TODO/explain:
* Explain filtering and potential conflicts
  - Note: LBR allows the last-installed event to override all others,
    which is broken. This means some events get garbage data, and others
    expose information which they should not have (e.g. kernel
    addresses), and is a security risk.
[rh: done]

* Explain when/why records get discarded
  - Note: architecture permits HW to (infrequently) discard all records,
    but not to omit records mid-sequence
  - Note: saving/restoring at context-switch doesn't interact well with
    event rotation (e.g. if filters change)
[rh: Not sure what I do on these 2?]

TODO/discover:
* Why does branch_mask_set_arch() exist?
  - Why are these not additional filter types?
  - Do any users actually want these events?
[rh: removed Arm specific filter types]

* What should the generic branch filters include/exclude?
* How should the generic branch filters combine with the event filters?
* How do async exceptions (e.g. IRQ) get handled by LBR or BHRB?
[rh: For LBR, only enabled for 'any' or 'anycall', disabled otherwise]
  - Do exception returns need to be filtered out in any cases?

TODO/decide:
* Decide *how* event filter conflicts should be handled
  (a) Do the same as x86, and use "last event overrides all"
      - Not really an option
  (b) Reject conflicting filters when scheduling events
      - Does this break realistic usage?
      - May this need tool changes and/or new ABI?
  (c) Combine filters and program most permissive settings in to HW,
      filter in SW when reading
      - Is this possible/reliable?
      - Can this lose enough records/samples to not be worthwhile?
[rh: c is the current operation. The s/w filtering did not really
match the h/w filtering and has been re-written.]

  (d) Something else?

TODO/fix:
* Fix filtering to avoid exposing kernel addresses when consumer cannot
  record kernel, SRC is kernel and DST is user.
  - Note: LBR has this bug today
[rh: Fixed in reworking filtering]
* Fix the barriers in the IRQ handling flow.
  - The existing ISB usage for starting/stopping the PMU isn't quite right
  - This needs comments to explain the potential races
[rh: fixed. Or at least attempted to]
  - We *might* want to remove the ISB from the irqchip code
    ... as we only need to manipulate PMU if OVSR shows overflow

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v19:
- Various style cleanups
- Got rid of added armpmu ops. All BRBE support contained within pmuv3
  code.
- Dropped armpmu.num_branch_records as reg_brbidr has same info.
- Make sched_task() callback actually get called. Enabling requires a
  call to perf_sched_cb_inc().
- Fix freeze on overflow for VHE
- The cycle counter doesn't freeze BRBE on overflow, so avoid assigning
  it when BRBE is enabled.
- Drop all the Arm specific exception branches. Not a clear need for
  them.
- Simplify enable/disable to avoid RMW and document the need for ISBs
- Fix handling of branch 'cycles' reading. CC field is
  mantissa/exponent, not an integer.
- Rework s/w filtering to better match h/w filtering
- Reject events with disjoint event filter and branch filter
- Reject events if exclude_host is set
7 files changed