|  | ================================== | 
|  | Long running workloads and compute | 
|  | ================================== | 
|  |  | 
Long running workloads (compute) are workloads that will not complete in 10
seconds (roughly the time a user will wait before reaching for the power
button). This means that other techniques need to be used to manage those
workloads, because they cannot use dma-fences.
|  |  | 
Some hardware may schedule compute jobs without any way to preempt them, or
to have their memory swapped out from under them. Or users may simply want
their workload not to be preempted or swapped out at all.
|  |  | 
This means that the handling of these workloads differs from what is
described in driver-api/dma-buf.rst.
|  |  | 
For such compute jobs, dma-fence may not be used at all, in this case not
even to force preemption. The driver is then simply forced to unmap a BO from
the long running compute job's address space immediately on unbind, without
waiting for the workload to complete. Effectively this terminates the
workload when there is no hardware support to recover.
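
As a rough sketch of what this means for a driver's unbind path, consider the
following; the mydrv_* structures and helpers and the long_running flag are
hypothetical, only dma_resv_wait_timeout() is an existing kernel function::

  /* Hedged sketch: mydrv_* structures and helpers are hypothetical. */
  #include <linux/dma-resv.h>
  #include <linux/sched.h>

  static void mydrv_vma_unbind(struct mydrv_vma *vma)
  {
          if (vma->vm->long_running) {
                  /*
                   * Long running jobs install no dma-fence, so there is
                   * nothing to wait on: tear down the mapping immediately.
                   * The workload faults and is effectively terminated
                   * unless the hardware can recover.
                   */
                  mydrv_unmap_pages(vma);
                  return;
          }

          /* Normal jobs: wait for the job's fences before unmapping. */
          dma_resv_wait_timeout(vma->bo->resv, DMA_RESV_USAGE_BOOKKEEP,
                                false, MAX_SCHEDULE_TIMEOUT);
          mydrv_unmap_pages(vma);
  }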
|  |  | 
Since this is undesirable, there need to be mitigations to prevent a workload
from being terminated. There are several possible approaches, each with its
own advantages and drawbacks.
|  |  | 
The first approach you will likely try is to pin all buffers used by compute.
This guarantees that the job will run uninterrupted, but it also enables a
very easy denial of service attack: by pinning as much memory as possible, a
client can hog all GPU memory, and possibly a huge chunk of CPU memory.
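
A minimal sketch of this approach, assuming the driver tracks a job's buffer
objects in an array of TTM BOs (the surrounding job tracking is hypothetical;
ttm_bo_reserve(), ttm_bo_pin() and ttm_bo_unpin() are the existing TTM
helpers, header path as of recent kernels)::

  /* Sketch only; locking order and error handling are simplified. */
  #include <drm/ttm/ttm_bo.h>

  static int longrun_job_pin_all(struct ttm_buffer_object **bos,
                                 unsigned int count)
  {
          unsigned int i;
          int ret;

          for (i = 0; i < count; i++) {
                  ret = ttm_bo_reserve(bos[i], true, false, NULL);
                  if (ret)
                          goto err_unpin;
                  /* A pinned BO is never considered for eviction. */
                  ttm_bo_pin(bos[i]);
                  ttm_bo_unreserve(bos[i]);
          }
          return 0;

  err_unpin:
          while (i--) {
                  ttm_bo_reserve(bos[i], false, false, NULL);
                  ttm_bo_unpin(bos[i]);
                  ttm_bo_unreserve(bos[i]);
          }
          return ret;
  }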
|  |  | 
A second approach that will work slightly better on its own is adding an
option not to evict when creating a new job (of any kind). If all of userspace
opts in to this flag, it would prevent cooperating userspace from
force-terminating older compute jobs to start a new one.
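
At the uAPI level this could look roughly like the sketch below; the ioctl
struct and flag name are purely illustrative and do not describe an existing
interface::

  /* Hypothetical uAPI sketch, not an existing ioctl. */
  #include <linux/types.h>

  #define MYDRV_EXEC_FLAG_NO_EVICT      (1 << 0)

  struct drm_mydrv_exec {
          /* GPU address of the batch/command buffer to run. */
          __u64 batch_address;
          /*
           * MYDRV_EXEC_FLAG_NO_EVICT: do not evict other clients' memory
           * to make room for this job. If every client sets it, newly
           * submitted jobs can no longer force-terminate older long
           * running compute jobs.
           */
          __u32 flags;
          __u32 pad;
  };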
|  |  | 
If job preemption and recoverable pagefaults are not available, those are the
only possible approaches. But even with job preemption and recoverable
pagefaults, you still want a separate way of controlling resources. The
standard kernel way of doing so is cgroups.
|  |  | 
This creates a third option: using cgroups to prevent eviction. Both GPU and
driver-allocated CPU memory would be accounted to the correct cgroup, and
eviction would be made cgroup aware. This allows the GPU to be partitioned
into cgroups, which will allow jobs to run next to each other without
interference.
|  |  | 
|  | The interface to the cgroup would be similar to the current CPU memory | 
|  | interface, with similar semantics for min/low/high/max, if eviction can | 
|  | be made cgroup aware. | 
|  |  | 
|  | What should be noted is that each memory region (tiled memory for example) | 
|  | should have its own accounting. | 
|  |  | 
The key is the region id assigned by the driver, for example "tile0".
|  | For the value of $card, we use drmGetUnique(). |
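
As a purely illustrative example of how such a keyed cgroup file could look
(the file name, layout and values are not a merged interface)::

  $ cat /sys/fs/cgroup/compute-partition/drm.memory.max
  pci:0000:03:00.0 tile0=8589934592 tile1=8589934592

Here "pci:0000:03:00.0" stands in for the $card value returned by
drmGetUnique(), and "tile0"/"tile1" are region ids assigned by the driver.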