| =============== | 
 |  GPU Debugging | 
 | =============== | 
 |  | 
 | General Debugging Options | 
 | ========================= | 
 |  | 
 | The DebugFS section provides documentation on a number files to aid in debugging | 
 | issues on the GPU. | 
 |  | 
 |  | 
 | GPUVM Debugging | 
 | =============== | 
 |  | 
 | To aid in debugging GPU virtual memory related problems, the driver supports a | 
 | number of options module parameters: | 
 |  | 
 | `vm_fault_stop` - If non-0, halt the GPU memory controller on a GPU page fault. | 
 |  | 
 | `vm_update_mode` - If non-0, use the CPU to update GPU page tables rather than | 
 | the GPU. | 
 |  | 
 |  | 
 | Decoding a GPUVM Page Fault | 
 | =========================== | 
 |  | 
 | If you see a GPU page fault in the kernel log, you can decode it to figure | 
 | out what is going wrong in your application.  A page fault in your kernel | 
 | log may look something like this: | 
 |  | 
 | :: | 
 |  | 
 |  [gfxhub0] no-retry page fault (src_id:0 ring:24 vmid:3 pasid:32777, for process glxinfo pid 2424 thread glxinfo:cs0 pid 2425) | 
 |    in page starting at address 0x0000800102800000 from IH client 0x1b (UTCL2) | 
 |  VM_L2_PROTECTION_FAULT_STATUS:0x00301030 | 
 |  	Faulty UTCL2 client ID: TCP (0x8) | 
 |  	MORE_FAULTS: 0x0 | 
 |  	WALKER_ERROR: 0x0 | 
 |  	PERMISSION_FAULTS: 0x3 | 
 |  	MAPPING_ERROR: 0x0 | 
 |  	RW: 0x0 | 
 |  | 
 | First you have the memory hub, gfxhub and mmhub.  gfxhub is the memory | 
 | hub used for graphics, compute, and sdma on some chips.  mmhub is the | 
 | memory hub used for multi-media and sdma on some chips. | 
 |  | 
 | Next you have the vmid and pasid.  If the vmid is 0, this fault was likely | 
 | caused by the kernel driver or firmware.  If the vmid is non-0, it is generally | 
 | a fault in a user application.  The pasid is used to link a vmid to a system | 
 | process id.  If the process is active when the fault happens, the process | 
 | information will be printed. | 
 |  | 
 | The GPU virtual address that caused the fault comes next. | 
 |  | 
 | The client ID indicates the GPU block that caused the fault. | 
 | Some common client IDs: | 
 |  | 
 | - CB/DB: The color/depth backend of the graphics pipe | 
 | - CPF: Command Processor Frontend | 
 | - CPC: Command Processor Compute | 
 | - CPG: Command Processor Graphics | 
 | - TCP/SQC/SQG: Shaders | 
 | - SDMA: SDMA engines | 
 | - VCN: Video encode/decode engines | 
 | - JPEG: JPEG engines | 
 |  | 
 | PERMISSION_FAULTS describe what faults were encountered: | 
 |  | 
 | - bit 0: the PTE was not valid | 
 | - bit 1: the PTE read bit was not set | 
 | - bit 2: the PTE write bit was not set | 
 | - bit 3: the PTE execute bit was not set | 
 |  | 
 | Finally, RW, indicates whether the access was a read (0) or a write (1). | 
 |  | 
 | In the example above, a shader (cliend id = TCP) generated a read (RW = 0x0) to | 
 | an invalid page (PERMISSION_FAULTS = 0x3) at GPU virtual address | 
 | 0x0000800102800000.  The user can then inspect their shader code and resource | 
 | descriptor state to determine what caused the GPU page fault. | 
 |  | 
 | UMR | 
 | === | 
 |  | 
 | `umr <https://gitlab.freedesktop.org/tomstdenis/umr>`_ is a general purpose | 
 | GPU debugging and diagnostics tool.  Please see the umr | 
 | `documentation <https://umr.readthedocs.io/en/main/>`_ for more information | 
 | about its capabilities. | 
 |  | 
 | Debugging backlight brightness | 
 | ============================== | 
 | Default backlight brightness is intended to be set via the policy advertised | 
 | by the firmware.  Firmware will often provide different defaults for AC or DC. | 
 | Furthermore, some userspace software will save backlight brightness during | 
 | the previous boot and attempt to restore it. | 
 |  | 
 | Some firmware also has support for a feature called "Custom Backlight Curves" | 
 | where an input value for brightness is mapped along a linearly interpolated | 
 | curve of brightness values that better match display characteristics. | 
 |  | 
 | In the event of problems happening with backlight, there is a trace event | 
 | that can be enabled at bootup to log every brightness change request. | 
 | This can help isolate where the problem is. To enable the trace event add | 
 | the following to the kernel command line: | 
 |  | 
 |   tp_printk trace_event=amdgpu_dm:amdgpu_dm_brightness:mod:amdgpu trace_buf_size=1M |