intel/brw: Support CSE of ADD3

This one is a bit more complex in that we need to handle 3-source
commutative opcodes.  But it's also quite useful:

fossil-db results on Alchemist (A770):

    Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01%
    Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12%
    Subgroup size: 7589608 -> 7589592 (-0.00%)
    Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00%
    Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01%
    Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41%
    Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47%
    Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23%

Our older Shadow of the Tomb Raider fossil is particularly helped with
over a 90% reduction in scratch access (spills, fills, and scratch
size).  However, benchmarking in the actual game shows no change in
performance.  We're thinking the game's shaders have been updated since
our capture.

Ian noted that there was a bug here where we'd accidentally CSE two ADD3
instructions with null destinations and different src[2] that couldn't
be dead code eliminated due to conditional mods.  However, this is only
a bug in the new cse_defs pass so we don't need to nominate this for
stable branches.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This commit is contained in:
Kenneth Graunke
2024-06-16 02:45:53 -07:00
committed by Marge Bot
parent e1b1114bc2
commit 7adccbd48d

View File

@@ -66,6 +66,7 @@ is_expression(const fs_visitor *v, const fs_inst *const inst)
case BRW_OPCODE_FBH:
case BRW_OPCODE_FBL:
case BRW_OPCODE_CBIT:
case BRW_OPCODE_ADD3:
case BRW_OPCODE_RNDU:
case BRW_OPCODE_RNDD:
case BRW_OPCODE_RNDE:
@@ -208,6 +209,13 @@ operands_match(const fs_inst *a, const fs_inst *b, bool *negate)
}
}
return match;
} else if (a->sources == 3) {
return (xs[0].equals(ys[0]) && xs[1].equals(ys[1]) && xs[2].equals(ys[2])) ||
(xs[0].equals(ys[0]) && xs[1].equals(ys[2]) && xs[2].equals(ys[1])) ||
(xs[0].equals(ys[1]) && xs[1].equals(ys[0]) && xs[2].equals(ys[2])) ||
(xs[0].equals(ys[1]) && xs[1].equals(ys[2]) && xs[2].equals(ys[1])) ||
(xs[0].equals(ys[2]) && xs[1].equals(ys[0]) && xs[2].equals(ys[1])) ||
(xs[0].equals(ys[2]) && xs[1].equals(ys[1]) && xs[2].equals(ys[0]));
} else {
return (xs[0].equals(ys[0]) && xs[1].equals(ys[1])) ||
(xs[1].equals(ys[0]) && xs[0].equals(ys[1]));
@@ -319,10 +327,11 @@ hash_inst(const void *v)
hash = src_hash[0] * src_hash[1];
} else if (inst->is_commutative()) {
/* Commutatively combine both sources */
/* Commutatively combine the sources */
uint32_t hash0 = hash_reg(hash, inst->src[0]);
uint32_t hash1 = hash_reg(hash, inst->src[1]);
hash = hash0 * hash1;
uint32_t hash2 = inst->sources > 2 ? hash_reg(hash, inst->src[2]) : 1;
hash = hash0 * hash1 * hash2;
} else {
/* Just hash all the sources */
for (int i = 0; i < inst->sources; i++)