intel/gfx12.5: Enable L3 partial write merging for compressible surfaces among other cases.

This enables L3 partial write merging for a number of cases that seem
to be getting accidentally disabled by the kernel, which was causing a
serious performance bottleneck on DG2 and MTL platforms.  The
"Compressible Partial Write Merge Enable", "Coherent Partial Write
Merge Enable" and "Cross-Tile Partial Write Merge Enable" bits in
L3SQCREG5 were expected to be enabled by default (and confusingly,
they even read back as enabled if you run 'intel_reg read 0xb158' on an
idle system), but they are getting clobbered during 3D context
initialization by an i915 workaround.
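
To illustrate what checking these enables on a raw register value could
look like, here is a minimal C sketch.  The register offset (0xb158) and
the 0x7f timer value come from this patch; the bit positions, mask and
helper names below are assumptions made for illustration only and
should be verified against the genxml definitions or the hardware spec
before being relied upon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register offset as used with 'intel_reg read 0xb158'. */
#define L3SQCREG5_OFFSET 0xb158u

/* ASSUMPTION: the field layout below is illustrative and is not taken
 * from this commit message.  Check it against genxml or the hardware
 * spec before use.
 */
#define L3SQCREG5_PWM_TIMER_MASK          0x3ffu       /* assumed bits 9:0 */
#define L3SQCREG5_COMPRESSIBLE_PWM_ENABLE (1u << 23)   /* assumed bit 23 */
#define L3SQCREG5_COHERENT_PWM_ENABLE     (1u << 24)   /* assumed bit 24 */
#define L3SQCREG5_CROSS_TILE_PWM_ENABLE   (1u << 27)   /* assumed bit 27 */

/* Hypothetical helper: build a register value from the timer initial
 * value and the three partial write merge enables.
 */
static inline uint32_t
l3sqcreg5_pack(uint32_t timer, bool compressible, bool coherent,
               bool cross_tile)
{
   uint32_t value = timer & L3SQCREG5_PWM_TIMER_MASK;

   if (compressible)
      value |= L3SQCREG5_COMPRESSIBLE_PWM_ENABLE;
   if (coherent)
      value |= L3SQCREG5_COHERENT_PWM_ENABLE;
   if (cross_tile)
      value |= L3SQCREG5_CROSS_TILE_PWM_ENABLE;

   return value;
}

/* Hypothetical helper: true only if all three partial write merge
 * enables are set in a value read back from the register.
 */
static inline bool
l3sqcreg5_all_merges_enabled(uint32_t value)
{
   const uint32_t enables = L3SQCREG5_COMPRESSIBLE_PWM_ENABLE |
                            L3SQCREG5_COHERENT_PWM_ENABLE |
                            L3SQCREG5_CROSS_TILE_PWM_ENABLE;

   return (value & enables) == enables;
}
```

Note that, as described above, a value read on an idle system may not
reflect what is actually programmed while a 3D context is active, so a
check like this is only meaningful on a value sampled in the right
context.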

Enabling L3 partial write merging of compressible surfaces in
particular seems to increase rendering fillrate by over 3x in some
cases (e.g. the
"VulkanFillRate/FillRateGPU/resolution:1[0-3]/format:*/blend:0"
fillrate-bound microbenchmarks).  Significant improvements can also be
reproduced in most real-world workloads we've tested so far,
e.g. Counter Strike GO improves by ~11%, Shadow Of the Tomb Raider
improves by ~5.5%, and AztecRuins-VK improves by ~6.5% on DG2-512.
Thanks a lot to Caleb Callaway for these figures.  No regressions have
been observed so far.

Even though this patch might strike one as surprisingly simple for such
a large payoff, it's the result of Felix DeGrood and me trying, on and
off over the last year, to root-cause the rendering performance gap of
DG2 on Linux vs Windows.  Some of the OA statistics captured by Felix
early this month were greatly helpful for me to connect the last few
dots, so Felix deserves a big chunk of the credit for this work.

Cc: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23783>
Francisco Jerez
2023-06-20 18:12:12 -07:00
committed by Marge Bot
parent d7ec6f1724
commit 427fee3507
3 changed files with 36 additions and 0 deletions

@@ -179,6 +179,21 @@ init_common_queue_state(struct anv_queue *queue, struct anv_batch *batch)
   device->l3_config = cfg;
#endif
#if GFX_VERx10 == 125
   /* Even though L3 partial write merging is supposed to be enabled
    * by default on Gfx12.5 according to the hardware spec, i915
    * appears to accidentally clear the enables during context
    * initialization, so make sure to enable them here since partial
    * write merging has a large impact on rendering performance.
    */
   anv_batch_write_reg(batch, GENX(L3SQCREG5), reg) {
      reg.L3CachePartialWriteMergeTimerInitialValue = 0x7f;
      reg.CompressiblePartialWriteMergeEnable = true;
      reg.CoherentPartialWriteMergeEnable = true;
      reg.CrossTilePartialWriteMergeEnable = true;
   }
#endif
#if GFX_VER >= 125
   /* Wa_14014427904 - We need additional invalidate/flush when
    * emitting NP state commands with ATS-M in compute mode.