nir: intel/compiler: Add and use nir_op_pack_32_4x8_split

A lot of CTS tests write a u8vec4 or an i8vec4 to an SSBO. This results in a lot of shifts and MOVs. When that pattern can be recognized, the individual 8-bit components can be packed much more efficiently. v2: Rebase on b4369de27f ("nir/lower_packing: use shader_instructions_pass") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9025>
2021-01-25 16:31:17 -08:00
parent 89f639c0ca
commit f0a8a9816a
6 changed files with 31 additions and 2 deletions
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -68,6 +68,7 @@
   .lower_usub_sat64 = true,                                                  \
   .lower_hadd64 = true,                                                      \
   .avoid_ternary_with_two_constants = true,                                  \
+   .has_pack_32_4x8 = true,                                                   \
   .max_unroll_iterations = 32,                                               \
   .force_indirect_unrolling = nir_var_function_temp