anv: set ComputeMode.PixelAsyncComputeThreadLimit = 4

Heuristic-based optimization that throttles CCS work (async compute).

Without throttling, background compute work consumes all threads, diminishing the performance gains from running dispatches in parallel with 3D work. Because the optimization is heuristic, a workload might still slow down when using async compute.

Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this equates to a max CCS thread occupancy of 37.5%.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
2023-09-13 20:56:59 +00:00
parent 8ff4847b64
commit b561bcd78c
1 changed files with 4 additions and 1 deletions
--- a/src/intel/vulkan/genX_init_state.c
+++ b/src/intel/vulkan/genX_init_state.c
@@ -654,7 +654,10 @@ init_compute_queue_state(struct anv_queue *queue)
          ANV_PIPE_HDC_PIPELINE_FLUSH_BIT);
   }

-   anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), zero);
+   anv_batch_emit(&batch, GENX(STATE_COMPUTE_MODE), cm) {
+      cm.PixelAsyncComputeThreadLimit = 4;
+      cm.PixelAsyncComputeThreadLimitMask = 0x7;
+   }
 #endif

   init_common_queue_state(queue, &batch);
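
The hunk above sets both a value (`PixelAsyncComputeThreadLimit = 4`) and a companion mask (`PixelAsyncComputeThreadLimitMask = 0x7`). A minimal sketch of how such a value/mask pair is commonly interpreted follows, assuming the mask acts as a per-bit write enable for the 3-bit limit field. The bit positions (`LIMIT_SHIFT`, `LIMIT_MASK_SHIFT`) and the helper names are hypothetical and chosen only for illustration; they are not taken from the hardware documentation or from Mesa.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only: the 3-bit limit value sits in
 * bits 2:0 and its write-enable mask sits in bits 18:16 of the same dword. */
#define LIMIT_SHIFT      0u
#define LIMIT_MASK_SHIFT 16u
#define LIMIT_WIDTH_BITS 0x7u

/* Pack a limit value and its write-enable mask into one command dword. */
static uint32_t
pack_thread_limit(uint32_t limit, uint32_t mask)
{
   return ((mask & LIMIT_WIDTH_BITS) << LIMIT_MASK_SHIFT) |
          ((limit & LIMIT_WIDTH_BITS) << LIMIT_SHIFT);
}

/* Model of a masked write: only bits whose mask bit is set are replaced,
 * all other bits keep the currently programmed value. */
static uint32_t
apply_masked_write(uint32_t current, uint32_t dword)
{
   uint32_t mask  = (dword >> LIMIT_MASK_SHIFT) & LIMIT_WIDTH_BITS;
   uint32_t value = (dword >> LIMIT_SHIFT) & LIMIT_WIDTH_BITS;

   return (current & ~mask) | (value & mask);
}
```

Under this assumption, a mask of `0x7` updates all three bits of the limit field, while a mask of `0` would leave the previously programmed limit untouched. That would explain why the commit sets the mask together with the value instead of emitting the value alone.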