intel/nir: Lower 8-bit ops to 16-bit in NIR on Gen11+

Intel hardware supports 8-bit arithmetic but it's tricky and annoying:

  - Byte operations don't actually execute with a byte type.  The
    execution type for byte operations is actually word.  (I don't know
    if this has implications for the HW implementation.  Probably?)

  - Destinations are required to be strided out to at least the
    execution type size.  This means that B-type operations always have
    a stride of at least 2.  This wreaks havoc on the back-end in
    multiple ways.

  - Thanks to the strided destination, we don't actually save register
    space by storing things in bytes.  We could, in theory, interleave
    two byte values into a single 2B-strided register but that's both a
    pain for RA and would lead to piles of false dependencies pre-Gen12.
    On Gen12+, we'd need some significant improvements to the SWSB
    pass.

  - Also thanks to the strided destination, all byte writes are treated
    as partial writes by the back-end and we don't know how to copy-prop
    them.

  - On Gen11, they added a new hardware restriction that byte types
    aren't allowed in the 2nd and 3rd sources of instructions.  This
    means that we have to emit B->W conversions all over to resolve
    things.  If we emit said conversions in NIR, instead, there's a
    chance NIR can get rid of some of them for us.
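
To make the stride argument above concrete, here is a minimal sketch of
the register footprint of a strided destination.  It is a simplified
model, not back-end code: DstRegion, footprint_bytes, byte_dst and
word_dst are hypothetical names, and the model assumes the stride is
counted in elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model of a destination region.
// None of these names come from the Intel back-end.
struct DstRegion {
   std::size_t type_size; // bytes per element
   std::size_t stride;    // distance between elements, in elements
   std::size_t width;     // number of SIMD channels written
};

// Bytes of register space spanned by the region.
static std::size_t footprint_bytes(const DstRegion &r)
{
   return r.width * r.stride * r.type_size;
}

// A byte destination must be strided out to the 2-byte execution type.
static DstRegion byte_dst(std::size_t width) { return {1, 2, width}; }

// A packed word destination for comparison.
static DstRegion word_dst(std::size_t width) { return {2, 1, width}; }

// Under this model a byte destination never spans less than a word one.
void check_footprint()
{
   for (std::size_t width = 1; width <= 32; width *= 2)
      assert(footprint_bytes(byte_dst(width)) ==
             footprint_bytes(word_dst(width)));
}
```

Under these assumptions a SIMD8 byte destination spans 16 bytes, the
same as a packed SIMD8 word destination, which is why storing values as
bytes saves no register space.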

We can get rid of a lot of this pain by just asking NIR to get rid of
8-bit arithmetic for us.  It may lead to a few more conversions in some
cases but having back-end copy-prop actually work is probably a bigger
bonus.  There is still a bit we have to handle in the back-end: in
particular, basic MOVs and conversions, because 8-bit load/store ops
still require 8-bit types.
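
As a rough illustration of why this lowering is safe, the sketch below
widens 8-bit operands to 16 bits, does the operation at 16 bits, and
truncates the result back to 8 bits.  It is a standalone model, not the
NIR pass: every function name is hypothetical, and it only covers
unsigned add, multiply and logical shift right with zero-extension.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers only; none of these names exist in Mesa or NIR.

// Reference semantics: 8-bit results wrap modulo 256.
static uint8_t add8_ref(uint8_t a, uint8_t b)
{
   return static_cast<uint8_t>((a + b) & 0xff);
}

static uint8_t mul8_ref(uint8_t a, uint8_t b)
{
   return static_cast<uint8_t>((a * b) & 0xff);
}

static uint8_t ushr8_ref(uint8_t a, unsigned s)
{
   return static_cast<uint8_t>(a >> s);
}

// Lowered form: B->W conversion, 16-bit operation, W->B conversion.
static uint8_t add8_via16(uint8_t a, uint8_t b)
{
   uint16_t wa = a, wb = b;                       // zero-extend to word
   uint16_t wr = static_cast<uint16_t>(wa + wb);  // 16-bit add
   return static_cast<uint8_t>(wr);               // truncate to byte
}

static uint8_t mul8_via16(uint8_t a, uint8_t b)
{
   uint16_t wa = a, wb = b;
   uint16_t wr = static_cast<uint16_t>(wa * wb);  // 16-bit multiply
   return static_cast<uint8_t>(wr);
}

static uint8_t ushr8_via16(uint8_t a, unsigned s)
{
   uint16_t wa = a;                               // zero-extension matters here
   uint16_t wr = static_cast<uint16_t>(wa >> s);
   return static_cast<uint8_t>(wr);
}

// Exhaustively compare the lowered form against the reference.
void check_lowering()
{
   for (unsigned a = 0; a < 256; a++) {
      for (unsigned b = 0; b < 256; b++) {
         const uint8_t x = static_cast<uint8_t>(a);
         const uint8_t y = static_cast<uint8_t>(b);
         assert(add8_via16(x, y) == add8_ref(x, y));
         assert(mul8_via16(x, y) == mul8_ref(x, y));
      }
      for (unsigned s = 0; s < 8; s++)
         assert(ushr8_via16(static_cast<uint8_t>(a), s) ==
                ushr8_ref(static_cast<uint8_t>(a), s));
   }
}
```

The conversions at the edges are the "few more conversions" mentioned
above.  Signed operations would need sign-extension instead, which this
sketch does not cover.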

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7482>
Jason Ekstrand
2020-11-05 23:19:31 -06:00
committed by Marge Bot
parent b98f0d3d7c
commit 68092df8d8
4 changed files with 40 additions and 35 deletions

@@ -304,11 +304,11 @@ namespace brw {
case SHADER_OPCODE_INT_REMAINDER:
return emit(instruction(opcode, dispatch_width(), dst,
fix_math_operand(src0),
-fix_math_operand(fix_byte_src(src1))));
+fix_math_operand(src1)));
default:
return emit(instruction(opcode, dispatch_width(), dst,
-src0, fix_byte_src(src1)));
+src0, src1));
}
}
@@ -327,12 +327,12 @@ namespace brw {
case BRW_OPCODE_LRP:
return emit(instruction(opcode, dispatch_width(), dst,
fix_3src_operand(src0),
-fix_3src_operand(fix_byte_src(src1)),
-fix_3src_operand(fix_byte_src(src2))));
+fix_3src_operand(src1),
+fix_3src_operand(src2)));
default:
return emit(instruction(opcode, dispatch_width(), dst,
-src0, fix_byte_src(src1), fix_byte_src(src2)));
+src0, src1, src2));
}
}
@@ -394,8 +394,8 @@ namespace brw {
/* In some cases we can't have bytes as operand for src1, so use the
* same type for both operand.
*/
-return set_condmod(mod, SEL(dst, fix_unsigned_negate(fix_byte_src(src0)),
-fix_unsigned_negate(fix_byte_src(src1))));
+return set_condmod(mod, SEL(dst, fix_unsigned_negate(src0),
+fix_unsigned_negate(src1)));
}
/**
@@ -659,8 +659,8 @@ namespace brw {
emit(BRW_OPCODE_CSEL,
retype(dst, BRW_REGISTER_TYPE_F),
retype(src0, BRW_REGISTER_TYPE_F),
-retype(fix_byte_src(src1), BRW_REGISTER_TYPE_F),
-fix_byte_src(src2)));
+retype(src1, BRW_REGISTER_TYPE_F),
+src2));
}
/**
@@ -721,21 +721,6 @@ namespace brw {
backend_shader *shader;
-/**
-* Byte sized operands are not supported for src1 on Gen11+.
-*/
-src_reg
-fix_byte_src(const src_reg &src) const
-{
-if (shader->devinfo->gen < 11 || type_sz(src.type) != 1)
-return src;
-dst_reg temp = vgrf(src.type == BRW_REGISTER_TYPE_UB ?
-BRW_REGISTER_TYPE_UD : BRW_REGISTER_TYPE_D);
-MOV(temp, src);
-return src_reg(temp);
-}
private:
/**
* Workaround for negation of UD registers. See comment in