nir/algebraic: Optimize some bit operation nonsense observed in some shaders

In updates (not post at the time of this writing) to !29884, a change
caused many spill and fill regressions shader for OpenGL Tomb
Raider. While looking at that shader, I noticed some odd patterns. I
initially added these patterns to counteract the regressions caused by
the other change, but I had no luck. On Ice Lake... this cuts 99
instructions from the shader.

shader-db:

All Intel platforms had simliar results. (Meteor Lake shown)
total instructions in shared programs: 19732341 -> 19732295 (<.01%)
instructions in affected programs: 1744 -> 1698 (-2.64%)
helped: 1 / HURT: 0

total cycles in shared programs: 916273716 -> 916273068 (<.01%)
cycles in affected programs: 14266 -> 13618 (-4.54%)
helped: 1 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151519575 -> 151519393 (-0.00%)
Cycle count: 17208402120 -> 17208246858 (-0.00%); split: -0.00%, +0.00%

Totals from 159 (0.03% of 630198) affected shaders:
Instrs: 51970 -> 51788 (-0.35%)
Cycle count: 11474176 -> 11318914 (-1.35%); split: -1.36%, +0.01%

Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30158>
This commit is contained in:
Ian Romanick
2024-07-06 11:24:43 -07:00
committed by Marge Bot
parent 92befad89f
commit 1c7e35d4e0

View File

@@ -1876,6 +1876,22 @@ optimizations.extend([
(('iadd', ('pack_half_2x16_rtz_split', a, 0), ('pack_half_2x16_rtz_split', 0, b)), ('pack_half_2x16_rtz_split', a, b)),
(('ior', ('pack_half_2x16_rtz_split', a, 0), ('pack_half_2x16_rtz_split', 0, b)), ('pack_half_2x16_rtz_split', a, b)),
(('bfi', 0xffff0000, ('pack_half_2x16_split', a, b), ('pack_half_2x16_split', c, d)),
('pack_half_2x16_split', c, a)),
# Part of the BFI operation is src2&~src0. This expands to (b & 3) & ~0xc
# which is (b & 3) & 3.
(('bfi', 0x0000000c, a, ('iand', b, 3)), ('bfi', 0x0000000c, a, b)),
# The important part here is that ~0xf & 0xfffffffc = ~0xf.
(('iand', ('bfi', 0x0000000f, '#a', b), 0xfffffffc),
('bfi', 0x0000000f, ('iand', a, 0xfffffffc), b)),
(('iand', ('bfi', 0x00000007, '#a', b), 0xfffffffc),
('bfi', 0x00000007, ('iand', a, 0xfffffffc), b)),
# 0x0f << 3 == 0x78, so that's already the maximum possible value.
(('umin', ('ishl', ('iand', a, 0xf), 3), 0x78), ('ishl', ('iand', a, 0xf), 3)),
(('extract_i8', ('pack_32_4x8_split', a, b, c, d), 0), ('i2i', a)),
(('extract_i8', ('pack_32_4x8_split', a, b, c, d), 1), ('i2i', b)),
(('extract_i8', ('pack_32_4x8_split', a, b, c, d), 2), ('i2i', c)),