intel/nir: Lower 8-bit ops to 16-bit in NIR on Gen11+

Intel hardware supports 8-bit arithmetic but it's tricky and annoying:

 - Byte operations don't actually execute with a byte type.  The
   execution type for byte operations is actually word.  (I don't know
   if this has implications for the HW implementation.  Probably?)

 - Destinations are required to be strided out to at least the
   execution type size.  This means that B-type operations always have
   a stride of at least 2.  This wreaks havoc on the back-end in
   multiple ways.

 - Thanks to the strided destination, we don't actually save register
   space by storing things in bytes.  We could, in theory, interleave
   two byte values into a single 2B-strided register but that's both a
   pain for RA and would lead to piles of false dependencies
   pre-Gen12; on Gen12+, we'd need some significant improvements to
   the SWSB pass.

 - Also thanks to the strided destination, all byte writes are treated
   as partial writes by the back-end and we don't know how to
   copy-prop them.

 - On Gen11, they added a new hardware restriction that byte types
   aren't allowed in the 2nd and 3rd sources of instructions.  This
   means that we have to emit B->W conversions all over to resolve
   things.  If we emit said conversions in NIR instead, there's a
   chance NIR can get rid of some of them for us.

We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us.  It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus.

There is still a bit we have to handle in the back-end, in particular
basic MOVs and conversions, because 8-bit load/store ops still require
8-bit types.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
2020-11-05 23:19:31 -06:00
parent b98f0d3d7c
commit 68092df8d8
4 changed files with 40 additions and 35 deletions
--- a/src/intel/compiler/brw_fs_builder.h
+++ b/src/intel/compiler/brw_fs_builder.h
@@ -304,11 +304,11 @@ namespace brw {
         case SHADER_OPCODE_INT_REMAINDER:
            return emit(instruction(opcode, dispatch_width(), dst,
                                    fix_math_operand(src0),
-                                    fix_math_operand(fix_byte_src(src1))));
+                                    fix_math_operand(src1)));

         default:
            return emit(instruction(opcode, dispatch_width(), dst,
-                                    src0, fix_byte_src(src1)));
+                                    src0, src1));

         }
      }
@@ -327,12 +327,12 @@ namespace brw {
         case BRW_OPCODE_LRP:
            return emit(instruction(opcode, dispatch_width(), dst,
                                    fix_3src_operand(src0),
-                                    fix_3src_operand(fix_byte_src(src1)),
-                                    fix_3src_operand(fix_byte_src(src2))));
+                                    fix_3src_operand(src1),
+                                    fix_3src_operand(src2)));

         default:
            return emit(instruction(opcode, dispatch_width(), dst,
-                                    src0, fix_byte_src(src1), fix_byte_src(src2)));
+                                    src0, src1, src2));
         }
      }

@@ -394,8 +394,8 @@ namespace brw {
         /* In some cases we can't have bytes as operand for src1, so use the
          * same type for both operand.
          */
-         return set_condmod(mod, SEL(dst, fix_unsigned_negate(fix_byte_src(src0)),
-                                     fix_unsigned_negate(fix_byte_src(src1))));
+         return set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
+                                     fix_unsigned_negate(src1)));
      }

      /**
@@ -659,8 +659,8 @@ namespace brw {
                            emit(BRW_OPCODE_CSEL,
                                 retype(dst, BRW_REGISTER_TYPE_F),
                                 retype(src0, BRW_REGISTER_TYPE_F),
-                                 retype(fix_byte_src(src1), BRW_REGISTER_TYPE_F),
-                                 fix_byte_src(src2)));
+                                 retype(src1, BRW_REGISTER_TYPE_F),
+                                 src2));
      }

      /**
@@ -721,21 +721,6 @@ namespace brw {

      backend_shader *shader;

-      /**
-       * Byte sized operands are not supported for src1 on Gen11+.
-       */
-      src_reg
-      fix_byte_src(const src_reg &src) const
-      {
-         if (shader->devinfo->gen < 11 || type_sz(src.type) != 1)
-            return src;
-
-         dst_reg temp = vgrf(src.type == BRW_REGISTER_TYPE_UB ?
-                             BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_D);
-         MOV(temp, src);
-         return src_reg(temp);
-      }
-
   private:
      /**
       * Workaround for negation of UD registers.  See comment in