This tag contains habanalabs driver and accel changes for v6.4:

- uAPI changes:

  - Add opcodes to the CS ioctl to allow the user to stall/resume specific
    engines inside Gaudi2. This allows the user to perform power
    testing/measurements when training different topologies.

  - Expose in the INFO ioctl the amount of device memory that the driver
    and f/w reserve for themselves.

  - Expose in the INFO ioctl a bit-mask of the available rotator engines
    in Gaudi2. This is to align with other engines that are already exposed.

  - Expose in the INFO ioctl the address of the f/w register that should
    be used to trigger interrupts from within the user's code running in the
    compute engines.

  - Add a critical-event bit in the eventfd bitmask so the user will know that
    the received event was critical and that a reset will now occur.

  - Expose in the INFO ioctl two new opcodes to fetch information on h/w and
    f/w events. The events recorded are the events that were reported in the

- New features and improvements:

  - Add a dedicated interrupt ID in MSI-X in the device for the notification
    of an unexpected user-related event in Gaudi2. Handle it in the driver by
    reporting this event.

  - Allow the user to fetch the current device memory usage even when the
    device is undergoing compute-reset (a reset type that only clears the
    compute engines).

  - Enable graceful reset mechanism for compute-reset. This will give the
    user a few seconds before the device is reset. For example, the user can,
    during that time, perform certain device operations (dump data for debug)
    or close the device in an orderly fashion.

  - Align the decoder with the rest of the engines in regard to notification
    to the user about interrupts and in regard to performing graceful reset
    when needed (instead of immediate reset).

  - Add support for assert interrupt from the TPC engine.

  - Get the reset type required for each event from the auto-generated
    irq_map array.

  - Print the specific reason why a device is still in use when notifying
    the user about it (after the user closed the device's FD).

  - Move to threaded IRQ when handling interrupts of workload completions.

- Firmware related fixes:

  - Fix RAZWI event handler to match newest f/w version.

  - Read the error cause register in DMA core events, because the f/w
    doesn't do that.

  - Increase maximum time to wait for completion of Gaudi2 reset due to f/w

  - Align to the latest firmware specs.

- Enforce the release order of the compute device and dma-buf, i.e.
  increment the device file refcount for every dma-buf that was exported
  for that device. This makes sure the compute device release function
  won't be called until the user closes all the FDs of the relevant
  dma-bufs. Without this change, closing the device's FD before/without
  closing the dma-buf's FD would always lead to a hard-reset of the device.

- Fix a link in the drm documentation to correctly point to the accel section.

- Compilation warning cleanups

- Misc bug fixes and code cleanups
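The dma-buf release-order change can be pictured as a simple reference count: the open device FD holds one reference, each exported dma-buf takes another, and the release function runs only when the count drops to zero. The sketch below is a hypothetical userspace model of that rule, not driver code; every struct and function name is made up for illustration, and a plain int stands in for the kernel's refcounting of the device file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the release-order rule; not habanalabs code.
 * A plain int stands in for the kernel's device file refcount. */
struct hl_dev_model {
	int refcount;  /* open device FD + one ref per exported dma-buf */
	bool released; /* set once the device release function has run */
};

struct dmabuf_model {
	struct hl_dev_model *dev; /* device this dma-buf was exported from */
};

static void dev_get(struct hl_dev_model *dev)
{
	dev->refcount++;
}

static void dev_put(struct hl_dev_model *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		/* Reached only after the device FD and all dma-buf FDs close. */
		dev->released = true;
	}
}

static void dev_open(struct hl_dev_model *dev)
{
	dev->refcount = 1; /* reference held by the open device FD */
	dev->released = false;
}

static void dev_close(struct hl_dev_model *dev)
{
	dev_put(dev); /* drop the device FD's reference */
}

static void dmabuf_export(struct hl_dev_model *dev, struct dmabuf_model *buf)
{
	dev_get(dev); /* pin the device for the lifetime of the dma-buf */
	buf->dev = dev;
}

static void dmabuf_close(struct dmabuf_model *buf)
{
	dev_put(buf->dev); /* the last put triggers the device release */
	buf->dev = NULL;
}
```

For example, opening the device, exporting one dma-buf and then closing the device FD leaves the count at 1, so the release is deferred until the dma-buf FD is closed as well.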

accel/habanalabs: remove redundant TODOs

As the MMU refactor and NIC resume are no longer relevant, remove
their TODO comments.

Signed-off-by: Ofir Bitton <>
Reviewed-by: Oded Gabbay <>
Signed-off-by: Oded Gabbay <>
1 file changed