==================================
Long running workloads and compute
==================================

Long running workloads (compute) are workloads that will not complete in 10
seconds (roughly the time a user is willing to wait before reaching for the
power button). This means that other techniques need to be used to manage
those workloads, because they cannot use dma-fences.

Some hardware may schedule compute jobs and have no way to preempt them, or to
have their memory swapped out from under them. Or users may simply want their
workload not to be preempted or swapped out at all.

This means that the handling of these workloads differs from what is described
in driver-api/dma-buf.rst.

With long running compute jobs, dma-fence may not be used at all, not even to
force preemption. The driver is then simply forced to unmap a BO from the long
running job's address space immediately on unbind, without waiting for the
workload to complete. Effectively this terminates the workload when there is
no hardware support to recover.
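
As a rough sketch of what this means for a driver, the unbind path below tears
down the mapping immediately because there is no dma-fence to wait on. All of
the foo_* names are hypothetical and only illustrate the flow::

    /*
     * Hypothetical unbind path for a VM used by a long running job.
     * Since the job has no dma-fence, there is nothing to wait on: the
     * page tables are zapped right away, and the workload is effectively
     * terminated unless the hardware can recover, for example through a
     * recoverable pagefault.
     */
    static void foo_vm_unbind_long_running(struct foo_vm *vm,
                                           struct foo_vma *vma)
    {
            foo_vm_clear_ptes(vm, vma->start, vma->end);
            foo_vm_invalidate_tlb(vm);
    }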

Since this is undesirable, there need to be mitigations to prevent a workload
from being terminated. There are several possible approaches, all with their
own advantages and drawbacks.

The first approach you will likely try is to pin all buffers used by compute.
This guarantees that the job will run uninterrupted, but also allows a very
easy denial of service attack by pinning as much memory as possible, hogging
all GPU memory, and possibly a huge chunk of CPU memory.
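
For a TTM based driver, the pinning approach could look roughly like the
sketch below. The foo_* names are made up for illustration; ttm_bo_pin() and
the dma_resv locking around it are the real kernel interfaces, and error
unwinding on failure is omitted for brevity::

    #include <drm/ttm/ttm_bo.h>
    #include <linux/dma-resv.h>

    /* Pin every BO referenced by a long running job at submission time. */
    static int foo_job_pin_bos(struct foo_job *job)
    {
            struct ttm_buffer_object *bo;
            int ret;

            foo_job_for_each_bo(job, bo) {
                    ret = dma_resv_lock(bo->base.resv, NULL);
                    if (ret)
                            return ret;
                    ttm_bo_pin(bo);
                    dma_resv_unlock(bo->base.resv);
            }
            return 0;
    }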

A second approach that works slightly better on its own is adding an option
not to evict when creating a new job (of any kind). If all of userspace opts
in to this flag, it would prevent cooperating userspace from force-terminating
older compute jobs in order to start a new one.
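
With TTM, one way to implement such a flag is through the driver's
eviction_valuable() callback. FOO_BO_NO_EVICT and to_foo_bo() below are
hypothetical driver-side names; ttm_bo_eviction_valuable() is the stock TTM
helper that everything else falls back to::

    #include <drm/ttm/ttm_bo.h>
    #include <drm/ttm/ttm_placement.h>

    /*
     * Refuse to evict BOs that were flagged as not evictable at job
     * creation time; all other BOs use the default TTM policy.
     */
    static bool foo_eviction_valuable(struct ttm_buffer_object *bo,
                                      const struct ttm_place *place)
    {
            struct foo_bo *fbo = to_foo_bo(bo);

            if (fbo->flags & FOO_BO_NO_EVICT)
                    return false;

            return ttm_bo_eviction_valuable(bo, place);
    }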

If job preemption and recoverable pagefaults are not available, those are the
only approaches possible. But even with those approaches, you want a separate
way of controlling resources. The standard kernel way of doing so is cgroups.

This creates a third option: using cgroups to prevent eviction. Both GPU and
driver-allocated CPU memory would be accounted to the correct cgroup, and
eviction would be made cgroup aware. This allows the GPU to be partitioned
into cgroups, which allows jobs to run next to each other without
interference.
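
What cgroup aware eviction could mean in practice is sketched below. No such
controller exists today, so the drmcg_* helper and the foo_* names are purely
hypothetical; the point is only that the eviction loop would skip buffers
whose owning cgroup is still within its protected share of the memory region::

    /*
     * Only consider a BO for eviction if its owning cgroup has exceeded
     * the protection (min/low) configured for this memory region.
     */
    static bool foo_bo_evictable(struct foo_bo *fbo)
    {
            if (drmcg_region_protected(fbo->cgroup, fbo->region))
                    return false;

            return true;
    }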

The interface to the cgroup would be similar to the current CPU memory
interface, with similar semantics for min/low/high/max, if eviction can
be made cgroup aware.

What should be noted is that each memory region (tiled memory for example)
should have its own accounting.

The key is the regionid set by the driver, for example "tile0". For the value
of $card, we use drmGetUnique().
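
As an illustration only, and not a defined interface: a hypothetical
drm.$card.max file in a cgroup directory, where $card stands for the string
returned by drmGetUnique(), could then contain one limit per region::

    tile0 8589934592
    tile1 8589934592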