radeonsi: enable nir_group_loads for better performance
The best case I have is one viewperf subtest getting +9% performance.

56979 shaders in 34726 tests
Totals:
SGPRS: 2667522 -> 2669178 (0.06 %)
VGPRS: 1543608 -> 1553472 (0.64 %)
Spilled SGPRs: 4090 -> 4100 (0.24 %)
Spilled VGPRs: 1600 -> 1791 (11.94 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 1872 -> 2076 (10.90 %) dwords per thread
Code Size: 59443980 -> 59479804 (0.06 %) bytes
Max Waves: 867280 -> 865634 (-0.19 %)

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>

v2: No change in pixels but the hash changed.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13604>
@@ -37,7 +37,7 @@ traces:
   - path: gputest/pixmark-piano.trace
     expectations:
       - device: gl-radeonsi-stoney
-        checksum: 58a86d233d03e2a174cb79c16028f916
+        checksum: a7317d54d452d19ce630c7f554f2279b
   - path: gputest/triangle.trace
     expectations:
       - device: gl-radeonsi-stoney
@@ -1426,6 +1426,11 @@ struct nir_shader *si_get_nir_shader(struct si_shader_selector *sel,
                  nir_var_shader_out);
    }
 
+   /* This helps LLVM form VMEM clauses and thus get more GPU cache hits.
+    * 200 is tuned for Viewperf. It should be done last.
+    */
+   NIR_PASS_V(nir, nir_group_loads, nir_group_same_resource_only, 200);
+
    return nir;
 }
 
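
For readers unfamiliar with load grouping, the effect of the new pass can be pictured in plain C. This is only a rough sketch of the instruction ordering, not NIR and not how nir_group_loads is implemented; the function names and the four-element buffer are hypothetical, and a C compiler is free to reorder these statements on its own. Both functions compute the same value; the second one issues all loads from the same resource back to back before any arithmetic, which is the shape that lets the backend form a VMEM clause.

```c
/* Hypothetical stand-in for a shader body: loads and ALU work interleaved,
 * so each memory read is separated from the next one by arithmetic. */
static float sum_scaled_interleaved(const float *buf, float k)
{
   float acc = 0.0f;
   acc += buf[0] * k; /* load, then ALU */
   acc += buf[1] * k; /* load, then ALU */
   acc += buf[2] * k; /* load, then ALU */
   acc += buf[3] * k; /* load, then ALU */
   return acc;
}

/* The same computation with the loads grouped: all four reads from the
 * same buffer come first, then the arithmetic in the original order. */
static float sum_scaled_grouped(const float *buf, float k)
{
   const float a = buf[0];
   const float b = buf[1];
   const float c = buf[2];
   const float d = buf[3];

   float acc = 0.0f;
   acc += a * k;
   acc += b * k;
   acc += c * k;
   acc += d * k;
   return acc;
}
```

In the grouped form all four loaded values are live at once, which is one plausible reason the shader-db totals above show slightly higher VGPR usage and spilling alongside the performance gain.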