intel/nir: Lower 8-bit scan/reduce ops to 16-bit

We can't really support these directly on any platform. May as well let NIR lower them. The NIR lowering is potentially one more instruction for scan/reduce ops thanks to not being able to do the B->W conversion as part of SEL_EXEC. For imax/imin exclusive scan, it's yet another instruction thanks to the extra imax/imin NIR has to insert to deal with the fact that the first live channel will contain the identity value which, when signed, will cast wrong. However, it does let us drop some complexity from our back-end so it's probably worth it. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
2020-11-05 23:23:07 -06:00
parent 3ad2d85995
commit b98f0d3d7c
2 changed files with 33 additions and 39 deletions
--- a/src/intel/compiler/brw_nir.c
+++ b/src/intel/compiler/brw_nir.c
@@ -672,6 +672,36 @@ lower_bit_size_callback(const nir_instr *instr, UNUSED void *data)
      break;
   }

+   case nir_instr_type_intrinsic: {
+      nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+      switch (intrin->intrinsic) {
+      case nir_intrinsic_reduce:
+      case nir_intrinsic_inclusive_scan:
+      case nir_intrinsic_exclusive_scan:
+         /* There are a couple of register region issues that make things
+          * complicated for 8-bit types:
+          *
+          *    1. Only raw moves are allowed to write to a packed 8-bit
+          *       destination.
+          *    2. If we use a strided destination, the efficient way to do
+          *       scan operations ends up using strides that are too big to
+          *       encode in an instruction.
+          *
+          * To get around these issues, we just do all 8-bit scan operations
+          * in 16 bits.  It's actually fewer instructions than what we'd have
+          * to do if we were trying to do it in native 8-bit types and the
+          * results are the same once we truncate to 8 bits at the end.
+          */
+         if (intrin->dest.ssa.bit_size == 8)
+            return 16;
+         return 0;
+
+      default:
+         return 0;
+      }
+      break;
+   }
+
   default:
      return 0;
   }