intel/brw: Unconditionally run optimizations after nir_opt_uniform_subgroup

I observed some ray tracing shaders where a resource_intel inside a
loop was non-uniform, and some code was lowered to account for that.
Eventually the loop containing the resource_intel was unrolled, and the
resource_intel became uniform.

For example, nir_opt_uniform_subgroup can transform something like

    con loop {
        con block b5:   // preds: b4 b8
        con 32    %330 = @read_first_invocation (%329)
        con 1     %331 = ieq %330, %329
                         // succs: b6 b7
        if %331 {
            con block b6:   // preds: b5
            con 32    %332 = iadd %120.b, %330
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
            break
            // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

into

    con loop {
        con block b5:   // preds: b4 b8
        con 1     %331 = ieq %329, %329
                         // succs: b6 b7
        if %331 {
            con block b6:   // preds: b5
            con 32    %332 = iadd %120.b, %329
            con 32    %333 = @resource_intel (%125 (0xdeaddeed), %332, %125 (0xdeaddeed), %3 (0x0)) (desc_set=1, binding=2, resource_intel=bindless|non-uniform, resource_block_intel=-1)
            div 32x4  %334 = (float32)txl %333 (texture_handle), %130 (sampler_handle), %327 (coord), %275 (lod), 0 (texture), 0 (sampler)
            break
            // succs: b9
        } else {
            con block b7:  // preds: b5, succs: b8
        }
        con block b8:  // preds: b7, succs: b5
    }

Notice that %331 is now a tautology. Running brw_nir_optimize again
eliminates the loop.

v2: Add a comment in the code explaining the rationale. Suggested by
Ken. Update the commit message. Suggested by Caio.

shader-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733448 -> 19733330 (<.01%)
instructions in affected programs: 14120 -> 14002 (-0.84%)
helped: 32 / HURT: 3

total cycles in shared programs: 916254496 -> 916226288 (<.01%)
cycles in affected programs: 2035116 -> 2006908 (-1.39%)
helped: 19 / HURT: 13

total spills in shared programs: 5807 -> 5807 (0.00%)
spills in affected programs: 26 -> 26 (0.00%)
helped: 1 / HURT: 1

total fills in shared programs: 6794 -> 6792 (-0.03%)
fills in affected programs: 84 -> 82 (-2.38%)
helped: 1 / HURT: 1

LOST:   1
GAINED: 1

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20393084 -> 20392971 (<.01%)
instructions in affected programs: 21750 -> 21637 (-0.52%)
helped: 31 / HURT: 4

total cycles in shared programs: 880273065 -> 880247818 (<.01%)
cycles in affected programs: 2546748 -> 2521501 (-0.99%)
helped: 18 / HURT: 9

total spills in shared programs: 4628 -> 4630 (0.04%)
spills in affected programs: 287 -> 289 (0.70%)
helped: 1 / HURT: 2

total fills in shared programs: 5381 -> 5376 (-0.09%)
fills in affected programs: 711 -> 706 (-0.70%)
helped: 2 / HURT: 2

LOST:   1
GAINED: 1

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151513669 -> 151505520 (-0.01%); split: -0.01%, +0.00%
Send messages: 7459339 -> 7459396 (+0.00%)
Loop count: 49111 -> 47588 (-3.10%)
Cycle count: 17208178205 -> 17201385104 (-0.04%); split: -0.05%, +0.01%
Spill count: 80830 -> 80827 (-0.00%); split: -0.02%, +0.01%
Fill count: 152754 -> 152693 (-0.04%); split: -0.04%, +0.00%
Scratch Memory Size: 4136960 -> 4130816 (-0.15%)
Max live registers: 32016493 -> 32015955 (-0.00%); split: -0.00%, +0.00%

Totals from 672 (0.11% of 630198) affected shaders:
Instrs: 1352428 -> 1344279 (-0.60%); split: -0.78%, +0.17%
Send messages: 54302 -> 54359 (+0.10%)
Loop count: 6124 -> 4601 (-24.87%)
Cycle count: 1260266379 -> 1253473278 (-0.54%); split: -0.69%, +0.16%
Spill count: 15967 -> 15964 (-0.02%); split: -0.09%, +0.08%
Fill count: 36245 -> 36184 (-0.17%); split: -0.18%, +0.01%
Scratch Memory Size: 740352 -> 734208 (-0.83%)
Max live registers: 50699 -> 50161 (-1.06%); split: -1.45%, +0.39%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 149976046 -> 149971100 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 7685264 -> 7685256 (-0.00%)
Cycle count: 15566401168 -> 15566405478 (+0.00%); split: -0.00%, +0.00%
Spill count: 61238 -> 61240 (+0.00%)
Fill count: 107301 -> 107289 (-0.01%)
Max live registers: 31992969 -> 31993857 (+0.00%); split: -0.00%, +0.00%

Totals from 553 (0.09% of 629912) affected shaders:
Instrs: 557027 -> 552081 (-0.89%); split: -0.90%, +0.01%
Subgroup size: 8648 -> 8640 (-0.09%)
Cycle count: 150154496 -> 150158806 (+0.00%); split: -0.23%, +0.24%
Spill count: 181 -> 183 (+1.10%)
Fill count: 440 -> 428 (-2.73%)
Max live registers: 33698 -> 34586 (+2.64%); split: -0.02%, +2.65%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30251>
@@ -1806,8 +1806,15 @@ brw_postprocess_nir(nir_shader *nir, const struct brw_compiler *compiler,
       /* Some of the optimizations can generate 64-bit integer multiplication
        * that must be lowered.
        */
-      if (OPT(nir_lower_int64))
-         brw_nir_optimize(nir, devinfo);
+      OPT(nir_lower_int64);
+
+      /* Even if nir_lower_int64 did not make progress, re-run the main
+       * optimization loop. nir_opt_uniform_subgroup may have made some things
+       * that previously appeared divergent be marked as convergent. This
+       * allows the elimination of some loops over, say, a TXF instruction
+       * with a non-uniform texture handle.
+       */
+      brw_nir_optimize(nir, devinfo);
 
       OPT(nir_lower_subgroups, &subgroups_options);
    }
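The effect of the hunk above can be illustrated with a toy model. This is only a sketch, not Mesa code: `struct toy_shader` and every `toy_*` function are hypothetical stand-ins, and the passes are reduced to counters. It assumes, as the commit message describes, that the uniform-subgroup pass turns each loop guard into a tautology and that only the main optimization loop then removes the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a shader. None of these names exist in Mesa. */
struct toy_shader {
   int loops;        /* loops guarded by "first_invocation == value" */
   int tautologies;  /* guards already rewritten to "value == value" */
   int int64_ops;    /* 64-bit integer ops that still need lowering */
};

/* Models nir_opt_uniform_subgroup: every loop guard becomes a tautology. */
static bool toy_opt_uniform_subgroup(struct toy_shader *s)
{
   assert(s != NULL);
   bool progress = s->tautologies < s->loops;
   s->tautologies = s->loops;
   return progress;
}

/* Models nir_lower_int64: reports progress only when there was work. */
static bool toy_lower_int64(struct toy_shader *s)
{
   assert(s != NULL);
   bool progress = s->int64_ops > 0;
   s->int64_ops = 0;
   return progress;
}

/* Models brw_nir_optimize: a loop with a tautological guard is removed. */
static void toy_optimize(struct toy_shader *s)
{
   assert(s != NULL);
   s->loops -= s->tautologies;
   s->tautologies = 0;
}

/* Old behavior: optimize only if the int64 lowering made progress. */
static void toy_postprocess_old(struct toy_shader *s)
{
   toy_opt_uniform_subgroup(s);
   if (toy_lower_int64(s))
      toy_optimize(s);
}

/* New behavior: always optimize after the uniform-subgroup pass. */
static void toy_postprocess_new(struct toy_shader *s)
{
   toy_opt_uniform_subgroup(s);
   toy_lower_int64(s);
   toy_optimize(s);
}
```

In this model, a shader with one loop and no 64-bit operations keeps its loop under `toy_postprocess_old` and loses it under `toy_postprocess_new`, which mirrors the drop in loop count reported in the fossil-db results.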