nir/find_array_copies: Handle wildcards and overlapping copies

This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:

foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...

that wasn't recognized by the previous pass.

In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.

All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:

Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This commit is contained in:
Connor Abbott
2019-06-18 12:12:49 +02:00
parent c6543efe7a
commit 156306e5e6
3 changed files with 416 additions and 196 deletions

View File

@@ -1223,6 +1223,7 @@ nir_deref_instr_get_variable(const nir_deref_instr *instr)
}
bool nir_deref_instr_has_indirect(nir_deref_instr *instr);
bool nir_deref_instr_is_known_out_of_bounds(nir_deref_instr *instr);
bool nir_deref_instr_has_complex_use(nir_deref_instr *instr);
bool nir_deref_instr_remove_if_unused(nir_deref_instr *instr);