ac/nir: Enable nir_opt_large_constants

vkpipeline-db numbers:

Totals:
SGPRS: 1740306 -> 1741322 (0.06 %)
VGPRS: 1331124 -> 1331712 (0.04 %)
Spilled SGPRs: 21201 -> 21316 (0.54 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size: 79022628 -> 78694788 (-0.41 %) bytes
LDS: 6500 -> 6500 (0.00 %) blocks
Max Waves: 301413 -> 301302 (-0.04 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 53633 -> 54649 (1.89 %)
VGPRS: 53000 -> 53588 (1.11 %)
Spilled SGPRs: 3454 -> 3569 (3.33 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 5284232 -> 4956392 (-6.20 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 4239 -> 4128 (-2.62 %)
Wait states: 0 -> 0 (0.00 %)

(The biggest VGPR and max wave regression is due to unrolling a loop,
which made the scheduler more aggressive, but in this case it's able to
effectively hide latency so it's actually probably a win.)

shader-db numbers with radeonsi NIR:

Totals:
SGPRS: 3526496 -> 3526512 (0.00 %)
VGPRS: 2198576 -> 2198576 (0.00 %)
Spilled SGPRs: 10463 -> 10463 (0.00 %)
Spilled VGPRs: 86 -> 86 (0.00 %)
Private memory VGPRs: 3182 -> 2528 (-20.55 %)
Scratch size: 3308 -> 2640 (-20.19 %) dwords per thread
Code Size: 74117280 -> 74106140 (-0.02 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 775846 -> 775844 (-0.00 %)
Wait states: 0 -> 0 (0.00 %)

Totals from affected shaders:
SGPRS: 856 -> 872 (1.87 %)
VGPRS: 680 -> 680 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 654 -> 0 (-100.00 %)
Scratch size: 668 -> 0 (-100.00 %) dwords per thread
Code Size: 49652 -> 38512 (-22.44 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 182 -> 180 (-1.10 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
@@ -442,6 +442,13 @@ radv_shader_compile_to_nir(struct radv_device *device,
 	 */
 	nir_lower_var_copies(nir);
 
+	/* Lower large variables that are always constant with load_constant
+	 * intrinsics, which get turned into PC-relative loads from a data
+	 * section next to the shader.
+	 */
+	NIR_PASS_V(nir, nir_opt_large_constants,
+		   glsl_get_natural_size_align_bytes, 16);
+
 	/* Indirect lowering must be called after the radv_optimize_nir() loop
 	 * has been called at least once. Otherwise indirect lowering can
 	 * bloat the instruction count of the loop and cause it to be
@@ -986,6 +986,13 @@ void si_lower_nir(struct si_shader_selector *sel)
 	};
 	NIR_PASS_V(sel->nir, nir_lower_subgroups, &subgroups_options);
 
+	/* Lower large variables that are always constant with load_constant
+	 * intrinsics, which get turned into PC-relative loads from a data
+	 * section next to the shader.
+	 */
+	NIR_PASS_V(sel->nir, nir_opt_large_constants,
+		   glsl_get_natural_size_align_bytes, 16);
+
 	ac_lower_indirect_derefs(sel->nir, sel->screen->info.chip_class);
 
 	si_nir_opts(sel->nir);
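
The effect described in the code comment can be pictured with a plain C analogy. This is only a sketch and not Mesa code: the function and table names are hypothetical, and it assumes the trailing `16` in the diff is a size threshold in bytes. The first function fills a local array with values that never change and then indexes it dynamically, which is the pattern that tends to end up in scratch (private) memory on the GPU. The second reads the same values from read-only data, which is roughly what a `load_constant` from a data section next to the shader amounts to.

```c
#include <stddef.h>

#define TABLE_LEN 8u

/* "Before" (hypothetical): a per-invocation local array whose contents are
 * always the same. With a dynamic index it cannot simply live in registers. */
static float weight_local(unsigned i)
{
    float table[TABLE_LEN];

    table[0] = 0.5f;
    table[1] = 0.25f;
    table[2] = 0.125f;
    table[3] = 0.0625f;
    table[4] = 0.0625f;
    table[5] = 0.125f;
    table[6] = 0.25f;
    table[7] = 0.5f;

    return table[i % TABLE_LEN];
}

/* "After" (hypothetical): the same values stored once in read-only data.
 * The table is 32 bytes, larger than the assumed 16-byte threshold. */
static const float weight_table[TABLE_LEN] = {
    0.5f, 0.25f, 0.125f, 0.0625f, 0.0625f, 0.125f, 0.25f, 0.5f,
};

static float weight_from_data(unsigned i)
{
    return weight_table[i % TABLE_LEN];
}
```

Both functions return the same value for every index; only the storage of the table differs. That is consistent with the radeonsi numbers above, where the affected shaders drop to zero private memory VGPRs and zero scratch.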