nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match
('ine', a, 0), but there are almost none that match i2b. Instead of
adding a huge pile of additional patterns (including variations that
combine ine and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(b2f(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove the (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp; previously that
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new"

   (('b2i32', ('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1))

algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
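For reference, a minimal sketch of what the lowering described in the
first paragraph looks like in nir_opt_algebraic.py's search/replace
tuple syntax (the rule-list name lower_i2b is illustrative, not the
upstream name):

   # Hypothetical sketch: rewrite i2b(a) as a != 0 so that the many
   # existing ('ine', a, 0) patterns in opt_algebraic can match it.
   a = 'a'   # pattern variable, as nir_opt_algebraic.py defines them
   lower_i2b = [
      (('i2b', a), ('ine', a, 0)),
   ]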
--- a/src/intel/compiler/brw_fs_nir.cpp
+++ b/src/intel/compiler/brw_fs_nir.cpp
@@ -1569,23 +1569,14 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr,
       inst = bld.emit(SHADER_OPCODE_RSQ, result, op[0]);
       break;
 
-   case nir_op_i2b32:
    case nir_op_f2b32: {
       uint32_t bit_size = nir_src_bit_size(instr->src[0].src);
       if (bit_size == 64) {
          /* two-argument instructions can't take 64-bit immediates */
-         fs_reg zero;
-         fs_reg tmp;
+         fs_reg zero = vgrf(glsl_type::double_type);
+         fs_reg tmp = vgrf(glsl_type::double_type);
 
-         if (instr->op == nir_op_f2b32) {
-            zero = vgrf(glsl_type::double_type);
-            tmp = vgrf(glsl_type::double_type);
-            bld.MOV(zero, setup_imm_df(bld, 0.0));
-         } else {
-            zero = vgrf(glsl_type::int64_t_type);
-            tmp = vgrf(glsl_type::int64_t_type);
-            bld.MOV(zero, brw_imm_q(0));
-         }
+         bld.MOV(zero, setup_imm_df(bld, 0.0));
 
          /* A SIMD16 execution needs to be split in two instructions, so use
           * a vgrf instead of the flag register as dst so instruction splitting
@@ -1596,11 +1587,10 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, nir_alu_instr *instr,
       } else {
          fs_reg zero;
          if (bit_size == 32) {
-            zero = instr->op == nir_op_f2b32 ? brw_imm_f(0.0f) : brw_imm_d(0);
+            zero = brw_imm_f(0.0f);
          } else {
             assert(bit_size == 16);
-            zero = instr->op == nir_op_f2b32 ?
-               retype(brw_imm_w(0), BRW_REGISTER_TYPE_HF) : brw_imm_w(0);
+            zero = retype(brw_imm_w(0), BRW_REGISTER_TYPE_HF);
          }
          bld.CMP(result, op[0], zero, BRW_CONDITIONAL_NZ);
       }
--- a/src/intel/compiler/brw_vec4_nir.cpp
+++ b/src/intel/compiler/brw_vec4_nir.cpp
@@ -1576,10 +1576,6 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
       }
       break;
 
-   case nir_op_i2b32:
-      emit(CMP(dst, op[0], brw_imm_d(0), BRW_CONDITIONAL_NZ));
-      break;
-
    case nir_op_unpack_half_2x16_split_x:
    case nir_op_unpack_half_2x16_split_y:
    case nir_op_pack_half_2x16_split: