nir/rematerialize: Rematerialize ALUs used only by compares with zero
This was 4th on the list of things to try in 3ee2e84c60 ("nir:
Rematerialize compare instructions").

This is implemented as a separate subpass that tries to find ALU
instructions (with restrictions) that are only used by comparisons
with zero that are in turn only used as conditions for bcsel or
if-statements.

There are two restrictions implemented. One of the sources must be a
constant. This is done in an attempt to prevent increasing register
pressure. Additionally, the opcode of the instruction must be one that
has a high probability of getting a conditional modifier on Intel GPUs.
Not all instructions can have a conditional modifier (e.g., min and
max), so I don't think there is any benefit to moving these
instructions.

v2: Rebase on many, many recent NIR infrastructure changes.

v3: Make data in commit message more clear. Suggested by Matt. Rebase
on b5d6b7c402 ("nir: Drop most uses if nir_instr_rewrite_src()").

All of the affected shaders on ILK and G45 are in CS:GO. There is some
brief analysis of the changes in the MR.

Reviewed-by: Matt Tuner <mattst88@gmail.com>

Shader-db results:

DG2
total instructions in shared programs: 22824637 -> 22824258 (<.01%)
instructions in affected programs: 365742 -> 365363 (-0.10%)
helped: 190 / HURT: 97

total cycles in shared programs: 832186193 -> 832157290 (<.01%)
cycles in affected programs: 41245259 -> 41216356 (-0.07%)
helped: 208 / HURT: 117

total spills in shared programs: 4072 -> 4060 (-0.29%)
spills in affected programs: 366 -> 354 (-3.28%)
helped: 4 / HURT: 2

total fills in shared programs: 3601 -> 3607 (0.17%)
fills in affected programs: 708 -> 714 (0.85%)
helped: 4 / HURT: 2

LOST: 0
GAINED: 1

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 20320934 -> 20320689 (<.01%)
instructions in affected programs: 236592 -> 236347 (-0.10%)
helped: 176 / HURT: 29

total cycles in shared programs: 849846341 -> 849843856 (<.01%)
cycles in affected programs: 41277336 -> 41274851 (<.01%)
helped: 195 / HURT: 110

LOST: 0
GAINED: 1

Skylake
total instructions in shared programs: 18550811 -> 18550470 (<.01%)
instructions in affected programs: 233908 -> 233567 (-0.15%)
helped: 182 / HURT: 25

total cycles in shared programs: 835910983 -> 835889167 (<.01%)
cycles in affected programs: 38764359 -> 38742543 (-0.06%)
helped: 207 / HURT: 94

total spills in shared programs: 4522 -> 4506 (-0.35%)
spills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0

total fills in shared programs: 5296 -> 5280 (-0.30%)
fills in affected programs: 324 -> 308 (-4.94%)
helped: 4 / HURT: 0

LOST: 0
GAINED: 1

Broadwell
total instructions in shared programs: 18199130 -> 18197920 (<.01%)
instructions in affected programs: 214664 -> 213454 (-0.56%)
helped: 191 / HURT: 0

total cycles in shared programs: 935131908 -> 934870248 (-0.03%)
cycles in affected programs: 75770568 -> 75508908 (-0.35%)
helped: 203 / HURT: 84

total spills in shared programs: 13896 -> 13734 (-1.17%)
spills in affected programs: 162 -> 0
helped: 3 / HURT: 0

total fills in shared programs: 16989 -> 16761 (-1.34%)
fills in affected programs: 228 -> 0
helped: 3 / HURT: 0

Haswell
total instructions in shared programs: 16969502 -> 16969085 (<.01%)
instructions in affected programs: 185498 -> 185081 (-0.22%)
helped: 121 / HURT: 1

total cycles in shared programs: 925290863 -> 924806827 (-0.05%)
cycles in affected programs: 30200863 -> 29716827 (-1.60%)
helped: 100 / HURT: 85

total spills in shared programs: 13565 -> 13533 (-0.24%)
spills in affected programs: 736 -> 704 (-4.35%)
helped: 8 / HURT: 0

total fills in shared programs: 15468 -> 15436 (-0.21%)
fills in affected programs: 740 -> 708 (-4.32%)
helped: 8 / HURT: 0

LOST: 0
GAINED: 1

Ivy Bridge
total instructions in shared programs: 15839127 -> 15838947 (<.01%)
instructions in affected programs: 77776 -> 77596 (-0.23%)
helped: 58 / HURT: 0

total cycles in shared programs: 459852774 -> 459739770 (-0.02%)
cycles in affected programs: 11970210 -> 11857206 (-0.94%)
helped: 79 / HURT: 53

Sandy Bridge
total instructions in shared programs: 14106847 -> 14106831 (<.01%)
instructions in affected programs: 1611 -> 1595 (-0.99%)
helped: 10 / HURT: 0

total cycles in shared programs: 775004024 -> 775007516 (<.01%)
cycles in affected programs: 2530686 -> 2534178 (0.14%)
helped: 55 / HURT: 48

Iron Lake
total cycles in shared programs: 257753356 -> 257754900 (<.01%)
cycles in affected programs: 2977374 -> 2978918 (0.05%)
helped: 12 / HURT: 106

GM45
total cycles in shared programs: 169711382 -> 169712816 (<.01%)
cycles in affected programs: 2402070 -> 2403504 (0.06%)
helped: 12 / HURT: 57

Fossil-db results:

All Intel platforms had similar results. (DG2 shown)
Totals:
Instrs: 193884596 -> 193465896 (-0.22%); split: -0.25%, +0.03%
Cycles: 14050193354 -> 14048194826 (-0.01%); split: -0.34%, +0.33%
Spill count: 114944 -> 100449 (-12.61%); split: -13.59%, +0.98%
Fill count: 201525 -> 179534 (-10.91%); split: -11.22%, +0.31%
Scratch Memory Size: 10028032 -> 8468480 (-15.55%)

Totals from 16912 (2.59% of 653124) affected shaders:
Instrs: 34173709 -> 33755009 (-1.23%); split: -1.41%, +0.19%
Cycles: 2945969110 -> 2943970582 (-0.07%); split: -1.62%, +1.55%
Spill count: 97753 -> 83258 (-14.83%); split: -15.98%, +1.15%
Fill count: 176355 -> 154364 (-12.47%); split: -12.82%, +0.35%
Scratch Memory Size: 8619008 -> 7059456 (-18.09%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20176>
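
Note: the candidate filter described in the commit message (opcode likely to take a conditional modifier, at least one constant source for two-source ops, every use a comparison with zero that only feeds bcsel or an if) can be sketched on toy data as below. This is a minimal illustration, not the code from this commit: the toy_* types, enums, and functions are hypothetical and do not exist in Mesa, and the opcode list is cut down to three entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NIR concepts; none of these exist in Mesa. */
enum toy_op { TOY_IADD, TOY_FMUL, TOY_IAND, TOY_FMIN, TOY_FMAX };

enum toy_use_kind {
   TOY_USE_CMP_ZERO_TO_BCSEL, /* compare with 0 that only feeds bcsel/if */
   TOY_USE_CMP_NONZERO,       /* compare against something other than 0 */
   TOY_USE_OTHER,             /* any non-comparison use */
};

struct toy_alu {
   enum toy_op op;
   unsigned num_inputs;
   bool src_is_const[2];
   const enum toy_use_kind *uses;
   size_t num_uses;
};

/* Restriction 2: only opcodes assumed likely to get a conditional modifier. */
static bool
toy_op_can_take_cmod(enum toy_op op)
{
   switch (op) {
   case TOY_IADD:
   case TOY_FMUL:
   case TOY_IAND:
      return true;
   default:
      return false; /* e.g. min and max */
   }
}

static bool
toy_is_remat_candidate(const struct toy_alu *alu)
{
   assert(alu->num_inputs >= 1 && alu->num_inputs <= 2);

   if (!toy_op_can_take_cmod(alu->op))
      return false;

   /* Restriction 1: a two-source op needs at least one constant source. */
   if (alu->num_inputs == 2 &&
       !alu->src_is_const[0] &&
       !alu->src_is_const[1])
      return false;

   /* Every use must be a compare-with-zero that only feeds bcsel/if. */
   for (size_t i = 0; i < alu->num_uses; i++) {
      if (alu->uses[i] != TOY_USE_CMP_ZERO_TO_BCSEL)
         return false;
   }

   return true;
}
```

In this model an iadd with one constant source whose only use is a compare with zero qualifies, while the same instruction with two non-constant sources, or an fmin, does not.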
@@ -57,6 +57,36 @@ is_two_src_comparison(const nir_alu_instr *instr)
    }
 }
 
+static inline bool
+is_zero(const nir_alu_instr *instr, unsigned src, unsigned num_components,
+        const uint8_t *swizzle)
+{
+   /* only constant srcs: */
+   if (!nir_src_is_const(instr->src[src].src))
+      return false;
+
+   for (unsigned i = 0; i < num_components; i++) {
+      nir_alu_type type = nir_op_infos[instr->op].input_types[src];
+      switch (nir_alu_type_get_base_type(type)) {
+      case nir_type_int:
+      case nir_type_uint: {
+         if (nir_src_comp_as_int(instr->src[src].src, swizzle[i]) != 0)
+            return false;
+         break;
+      }
+      case nir_type_float: {
+         if (nir_src_comp_as_float(instr->src[src].src, swizzle[i]) != 0)
+            return false;
+         break;
+      }
+      default:
+         return false;
+      }
+   }
+
+   return true;
+}
+
 static bool
 all_uses_are_bcsel(const nir_alu_instr *instr)
 {
@@ -79,6 +109,28 @@ all_uses_are_bcsel(const nir_alu_instr *instr)
    return true;
 }
 
+static bool
+all_uses_are_compare_with_zero(const nir_alu_instr *instr)
+{
+   nir_foreach_use(use, &instr->def) {
+      if (use->parent_instr->type != nir_instr_type_alu)
+         return false;
+
+      nir_alu_instr *const alu = nir_instr_as_alu(use->parent_instr);
+      if (!is_two_src_comparison(alu))
+         return false;
+
+      if (!is_zero(alu, 0, 1, alu->src[0].swizzle) &&
+          !is_zero(alu, 1, 1, alu->src[1].swizzle))
+         return false;
+
+      if (!all_uses_are_bcsel(alu))
+         return false;
+   }
+
+   return true;
+}
+
 static bool
 nir_opt_rematerialize_compares_impl(nir_shader *shader, nir_function_impl *impl)
 {
@@ -160,6 +212,106 @@ nir_opt_rematerialize_compares_impl(nir_shader *shader, nir_function_impl *impl)
    return progress;
 }
 
+static bool
+nir_opt_rematerialize_alu_impl(nir_shader *shader, nir_function_impl *impl)
+{
+   bool progress = false;
+
+   nir_foreach_block(block, impl) {
+      nir_foreach_instr(instr, block) {
+         if (instr->type != nir_instr_type_alu)
+            continue;
+
+         nir_alu_instr *const alu = nir_instr_as_alu(instr);
+
+         /* This list only include ALU ops that are likely to be able to have
+          * cmod propagation on Intel GPUs.
+          */
+         switch (alu->op) {
+         case nir_op_ineg:
+         case nir_op_iabs:
+         case nir_op_fneg:
+         case nir_op_fabs:
+         case nir_op_fadd:
+         case nir_op_iadd:
+         case nir_op_iadd_sat:
+         case nir_op_uadd_sat:
+         case nir_op_isub_sat:
+         case nir_op_usub_sat:
+         case nir_op_irhadd:
+         case nir_op_urhadd:
+         case nir_op_fmul:
+         case nir_op_inot:
+         case nir_op_iand:
+         case nir_op_ior:
+         case nir_op_ixor:
+         case nir_op_ffloor:
+         case nir_op_ffract:
+         case nir_op_uclz:
+         case nir_op_ishl:
+         case nir_op_ishr:
+         case nir_op_ushr:
+         case nir_op_urol:
+         case nir_op_uror:
+            break; /* ... from switch. */
+         default:
+            continue; /* ... with loop. */
+         }
+
+         /* To help prevent increasing live ranges, require that one of the
+          * sources be a constant.
+          */
+         if (nir_op_infos[alu->op].num_inputs == 2 &&
+             !nir_src_is_const(alu->src[0].src) &&
+             !nir_src_is_const(alu->src[1].src))
+            continue;
+
+         if (!all_uses_are_compare_with_zero(alu))
+            continue;
+
+         /* At this point it is known that the alu is only used by a
+          * comparison with zero that is used by nir_op_bcsel and possibly by
+          * if-statements (though the latter has not been explicitly checked).
+          *
+          * Iterate through each use of the ALU. For every use that is in a
+          * different block, emit a copy of the ALU. Care must be taken here.
+          * The original instruction must be duplicated only once in each
+          * block because CSE cannot be run after this pass.
+          */
+         nir_foreach_use_safe(use, &alu->def) {
+            nir_instr *const use_instr = use->parent_instr;
+
+            /* If the use is in the same block as the def, don't
+             * rematerialize.
+             */
+            if (use_instr->block == alu->instr.block)
+               continue;
+
+            nir_alu_instr *clone = nir_alu_instr_clone(shader, alu);
+
+            nir_instr_insert_before(use_instr, &clone->instr);
+
+            nir_alu_instr *const use_alu = nir_instr_as_alu(use_instr);
+            for (unsigned i = 0; i < nir_op_infos[use_alu->op].num_inputs; i++) {
+               if (use_alu->src[i].src.ssa == &alu->def) {
+                  nir_src_rewrite(&use_alu->src[i].src, &clone->def);
+                  progress = true;
+               }
+            }
+         }
+      }
+   }
+
+   if (progress) {
+      nir_metadata_preserve(impl, nir_metadata_block_index |
+                                  nir_metadata_dominance);
+   } else {
+      nir_metadata_preserve(impl, nir_metadata_all);
+   }
+
+   return progress;
+}
+
 bool
 nir_opt_rematerialize_compares(nir_shader *shader)
 {
@@ -167,6 +319,8 @@ nir_opt_rematerialize_compares(nir_shader *shader)
 
    nir_foreach_function_impl(impl, shader) {
       progress = nir_opt_rematerialize_compares_impl(shader, impl) || progress;
+
+      progress = nir_opt_rematerialize_alu_impl(shader, impl) || progress;
    }
 
    return progress;
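
Note: the per-use loop in nir_opt_rematerialize_alu_impl() leaves same-block uses alone and gives each use in another block its own copy of the ALU result. A rough model of that control flow on plain integers might look like the following. The struct, the function, and the id-allocation scheme are all hypothetical and only mirror the shape of the loop shown in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a "use" is a comparison living in some block and
 * reading some SSA value, both identified by plain integers.
 */
struct toy_use {
   int block;   /* index of the block containing the comparison */
   int src_def; /* id of the value the comparison reads */
};

/* For every use of def_id outside def_block, pretend to clone the ALU
 * right before the use and point the use at the clone. Fresh ids are
 * taken from *next_id. Returns the number of clones emitted.
 */
static size_t
toy_rematerialize(int def_id, int def_block,
                  struct toy_use *uses, size_t num_uses, int *next_id)
{
   size_t clones = 0;

   assert(next_id != NULL);

   for (size_t i = 0; i < num_uses; i++) {
      if (uses[i].src_def != def_id)
         continue;

      /* Same block as the def: nothing to rematerialize. */
      if (uses[i].block == def_block)
         continue;

      uses[i].src_def = (*next_id)++;
      clones++;
   }

   return clones;
}
```

With a definition in block 0 and uses in blocks 0, 1, and 2, this model leaves the first use untouched and rewrites the other two to fresh ids.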