From 4420251947443e5f29ecc702900e560e66e73f0e Mon Sep 17 00:00:00 2001
From: Francisco Jerez
Date: Wed, 19 Oct 2022 16:13:24 -0700
Subject: [PATCH] intel/rt: Fix L3 bank performance bottlenecks due to SW stack
 stride alignment.

Power-of-two SW stack sizes are prone to causing collisions in the
hashing function used by the L3 to map memory addresses to banks,
which can cause stack accesses from most DSSes to bottleneck on a
single L3 bank. Fix it by padding the SW stack stride by a single
cacheline if it was a power of two.

This has been reported by Felix DeGrood to improve Quake2 RTX
performance by ~30% on DG2-512 in combination with other RT patches
Lionel Landwerlin has been working on.

Many thanks to Felix DeGrood for doing much of the legwork and
providing several iterations of Q2RTX performance counter dumps which
eventually prompted me to consider the hash collision theory and
motivated this patch, and for providing additional performance counter
dumps confirming that there is no longer an appreciable imbalance in
traffic across L3 banks after this change.

Reviewed-by: Lionel Landwerlin
Part-of:
---
 src/intel/compiler/brw_rt.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/src/intel/compiler/brw_rt.h b/src/intel/compiler/brw_rt.h
index d03187636f6..15c024072f1 100644
--- a/src/intel/compiler/brw_rt.h
+++ b/src/intel/compiler/brw_rt.h
@@ -230,6 +230,18 @@ brw_rt_compute_scratch_layout(struct brw_rt_scratch_layout *layout,
    assert(size % 64 == 0);
    layout->sw_stack_start = size;
    layout->sw_stack_size = ALIGN(sw_stack_size, 64);
+
+   /* Currently it's always the case that sw_stack_size is a power of
+    * two, but power-of-two SW stack sizes are prone to causing
+    * collisions in the hashing function used by the L3 to map memory
+    * addresses to banks, which can cause stack accesses from most
+    * DSSes to bottleneck on a single L3 bank. Fix it by padding the
+    * SW stack by a single cacheline if it was a power of two.
+    */
+   if (layout->sw_stack_size > 64 &&
+       util_is_power_of_two_nonzero(layout->sw_stack_size))
+      layout->sw_stack_size += 64;
+
    size += num_stack_ids * layout->sw_stack_size;
 
    layout->total_size = size;