intel/nir: Lower 8-bit ops to 16-bit in NIR on Gen11+

Intel hardware supports 8-bit arithmetic but it's tricky and annoying:

 - Byte operations don't actually execute with a byte type.  The
   execution type for byte operations is actually word.  (I don't know
   if this has implications for the HW implementation.  Probably?)

 - Destinations are required to be strided out to at least the
   execution type size.  This means that B-type operations always have
   a stride of at least 2.  This wreaks havoc on the back-end in
   multiple ways.

 - Thanks to the strided destination, we don't actually save register
   space by storing things in bytes.  We could, in theory, interleave
   two byte values into a single 2B-strided register but that's a pain
   for RA and would lead to piles of false dependencies pre-Gen12.  On
   Gen12+, we'd need some significant improvements to the SWSB pass.

 - Also thanks to the strided destination, all byte writes are treated
   as partial writes by the back-end and we don't know how to copy-prop
   them.

 - On Gen11, they added a new hardware restriction that byte types
   aren't allowed in the 2nd and 3rd sources of instructions.  This
   means that we have to emit B->W conversions all over to resolve
   things.  If we emit said conversions in NIR instead, there's a
   chance NIR can get rid of some of them for us.

We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us.  It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus.

There is still a bit we have to handle in the back-end: in particular,
basic MOVs and conversions, because 8-bit load/store ops still require
8-bit types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
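
As a rough illustration of what lowering 8-bit arithmetic to 16-bit means
(a standalone C sketch, not Mesa or NIR code; the helper names below are
hypothetical), each 8-bit source is widened to 16 bits, the operation runs
at 16-bit width, and the result is narrowed back to 8 bits.  When
operations are chained, the intermediate narrow/widen pair can be dropped
for wrapping add and multiply, which is the kind of conversion the message
above hopes NIR can eliminate:

```c
#include <stdint.h>

/* Hypothetical helper: an 8-bit add carried out at 16-bit width. */
static uint8_t add_u8_via_u16(uint8_t a, uint8_t b)
{
   uint16_t wa = a;                     /* B->W conversion of source 0 */
   uint16_t wb = b;                     /* B->W conversion of source 1 */
   uint16_t sum = (uint16_t)(wa + wb);  /* arithmetic at word width */
   return (uint8_t)sum;                 /* W->B conversion, keeps low 8 bits */
}

/* Hypothetical helper: a chained multiply-add kept at 16-bit width.
 * The low 8 bits of a wrapping multiply or add depend only on the low
 * 8 bits of the inputs, so no narrowing is needed between the two ops. */
static uint8_t mul_add_u8_via_u16(uint8_t a, uint8_t b, uint8_t c)
{
   uint16_t w = (uint16_t)((uint16_t)a * (uint16_t)b);
   w = (uint16_t)(w + c);
   return (uint8_t)w;                   /* single final W->B conversion */
}
```

This shortcut only holds for operations whose low bits are independent of
the high bits; right shifts, division, and comparisons would need the
sources properly zero- or sign-extended first.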
2020-11-05 23:19:31 -06:00
parent b98f0d3d7c
commit 68092df8d8
4 changed files with 40 additions and 35 deletions
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -49,7 +49,6 @@
   .vertex_id_zero_based = true,                                              \
   .lower_base_vertex = true,                                                 \
   .use_scoped_barrier = true,                                                \
-   .support_8bit_alu = true,                                                  \
   .support_16bit_alu = true,                                                 \
   .lower_uniforms_to_ubo = true

@@ -187,6 +186,9 @@ brw_compiler_create(void *mem_ctx, const struct gen_device_info *devinfo)
      nir_options->lower_int64_options = int64_options;
      nir_options->lower_doubles_options = fp64_options;

+      /* Starting with Gen11, we lower away 8-bit arithmetic */
+      nir_options->support_8bit_alu = devinfo->gen < 11;
+
      nir_options->unify_interfaces = i < MESA_SHADER_FRAGMENT;

      compiler->glsl_compiler_options[i].NirOptions = nir_options;