intel/fs: Fix constant propagation into 32x16 integer multiplication
Don't copy propagate the constant in situations like mov(8) g8<1>D 0x7fffffffD mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W On platforms that only have a 32x16 multiplier, this will result in lowering the multiply to mul(8) g15<1>D g14<8,8,1>D 0xffffUW mul(8) g16<1>D g14<8,8,1>D 0x7fffUW add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD Volume 2a of the Skylake PRM says: When multiplying a DW and any lower precision integer, the DW operand must on src0. See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104. Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I don't think it would be possible to create a situation where this could occur. I discovered this via some optimizations that can determine that the non-constant source must be able to fit in 16-bits. The case listed above came from piglit's "ext_transform_feedback-order arrays points" with those optimizations in place. No shader-db or fossil-db changes on any Intel platform. Fixes:de6c0f8487
("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718> (cherry picked from commitdb20412168
)
This commit is contained in:

committed by
Dylan Baker

parent
3f1fe4b6bd
commit
b7a7e78ade
@@ -184,7 +184,7 @@
|
||||
"description": "intel/fs: Fix constant propagation into 32x16 integer multiplication",
|
||||
"nominated": true,
|
||||
"nomination_type": 1,
|
||||
"resolution": 0,
|
||||
"resolution": 1,
|
||||
"main_sha": null,
|
||||
"because_sha": "de6c0f848797d26d28223bcad25da03654461874"
|
||||
},
|
||||
|
@@ -836,6 +836,33 @@ fs_visitor::try_constant_propagate(fs_inst *inst, acp_entry *entry)
|
||||
inst->src[i] = val;
|
||||
progress = true;
|
||||
} else if (i == 0 && inst->src[1].file != IMM) {
|
||||
/* Don't copy propagate the constant in situations like
|
||||
*
|
||||
* mov(8) g8<1>D 0x7fffffffD
|
||||
* mul(8) g16<1>D g8<8,8,1>D g15<16,8,2>W
|
||||
*
|
||||
* On platforms that only have a 32x16 multiplier, this will
|
||||
* result in lowering the multiply to
|
||||
*
|
||||
* mul(8) g15<1>D g14<8,8,1>D 0xffffUW
|
||||
* mul(8) g16<1>D g14<8,8,1>D 0x7fffUW
|
||||
* add(8) g15.1<2>UW g15.1<16,8,2>UW g16<16,8,2>UW
|
||||
*
|
||||
* On Gfx8 and Gfx9, which have the full 32x32 multiplier, it
|
||||
* results in
|
||||
*
|
||||
* mul(8) g16<1>D g15<16,8,2>W 0x7fffffffD
|
||||
*
|
||||
* Volume 2a of the Skylake PRM says:
|
||||
*
|
||||
* When multiplying a DW and any lower precision integer, the
|
||||
* DW operand must on src0.
|
||||
*/
|
||||
if (inst->opcode == BRW_OPCODE_MUL &&
|
||||
type_sz(inst->src[1].type) < 4 &&
|
||||
type_sz(val.type) == 4)
|
||||
break;
|
||||
|
||||
/* Fit this constant in by commuting the operands.
|
||||
* Exception: we can't do this for 32-bit integer MUL/MACH
|
||||
* because it's asymmetric.
|
||||
|
Reference in New Issue
Block a user