nir: Eliminate nir_op_i2b

There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
This commit is contained in:
Ian Romanick
2022-02-15 09:35:47 -08:00
committed by Marge Bot
parent 8b37046765
commit eb76cee9f8
31 changed files with 23 additions and 180 deletions

View File

@@ -1569,23 +1569,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr,
inst = bld.emit(SHADER_OPCODE_RSQ, result, op[0]);
break;
case nir_op_i2b32:
case nir_op_f2b32: {
uint32_t bit_size = nir_src_bit_size(instr->src[0].src);
if (bit_size == 64) {
/* two-argument instructions can't take 64-bit immediates */
fs_reg zero;
fs_reg tmp;
fs_reg zero = vgrf(glsl_type::double_type);
fs_reg tmp = vgrf(glsl_type::double_type);
if (instr->op == nir_op_f2b32) {
zero = vgrf(glsl_type::double_type);
tmp = vgrf(glsl_type::double_type);
bld.MOV(zero, setup_imm_df(bld, 0.0));
} else {
zero = vgrf(glsl_type::int64_t_type);
tmp = vgrf(glsl_type::int64_t_type);
bld.MOV(zero, brw_imm_q(0));
}
bld.MOV(zero, setup_imm_df(bld, 0.0));
/* A SIMD16 execution needs to be split in two instructions, so use
* a vgrf instead of the flag register as dst so instruction splitting
@@ -1596,11 +1587,10 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr,
} else {
fs_reg zero;
if (bit_size == 32) {
zero = instr->op == nir_op_f2b32 ? brw_imm_f(0.0f) : brw_imm_d(0);
zero = brw_imm_f(0.0f);
} else {
assert(bit_size == 16);
zero = instr->op == nir_op_f2b32 ?
retype(brw_imm_w(0), BRW_REGISTER_TYPE_HF) : brw_imm_w(0);
zero = retype(brw_imm_w(0), BRW_REGISTER_TYPE_HF);
}
bld.CMP(result, op[0], zero, BRW_CONDITIONAL_NZ);
}

View File

@@ -1576,10 +1576,6 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
}
break;
case nir_op_i2b32:
emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ));
break;
case nir_op_unpack_half_2x16_split_x:
case nir_op_unpack_half_2x16_split_y:
case nir_op_pack_half_2x16_split: