v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
With the following SEND instruction :
send(1) nullUD nullUD g0UD 0x4200c504 a0.1<0>UD
This instruction although valid but somewhat nonsensical (SEND message
to write at offset contained in NULL register), triggers an error in
the validator.
The restriction is that we cannot have overlapping sources. The
validator not checking the type of register incorrectly thinks that
the null register (offset 0) is the same as g0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
When this rule started causing issues, I looked it up in the
documentation, and found the rule for 64-bit destinations and
integer DWord multiplication, but there was no mention of floating
point destinations, as the text in brackets suggested. The actual
restriction text had been updated, so this led to some confusion
where I thought the conditions had been changed in newer docs.
However, what's actually going on is that there are two separate
conditions, each listed in separate rows of the table. One lists
64-bit destinations or integer DWord multiplication, and the other
mentions floating-point destinations. In both cases, the actual
restrictions are identical, so we handle them together in the code.
Try to update the comment to avoid future confusion.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
Recently, we started using <1;1,0> register regions for consecutive
channels, rather than the <8;8,1> we've traditionally used, as the
<1;1,0> encoding can be compacted on XeHP. Since then, one of the
EU validator rules has been flagging tons of instructions as errors:
mov(16) g114<1>F g112<1,1,0>UD { align1 1H I@2 compacted };
ERROR: Register Regioning patterns where register data bit locations are changed between source and destination are not supported except for broadcast of a scalar.
Our code for this restriction checked three things:
#1: vstride != width * hstride ||
#2: src_stride != dst_stride ||
#3: subreg != dst_subreg
Destination regions are always linear (no replicated values, nor
any overlapping components), as they only have hstride. Rule #1 is
requiring that the source region be linear as well. Rules #2-3 are
straightforward: the subregister must match (for the first channel to
line up), and the source/destination strides must match (for any
subsequent channels to line up).
Unfortunately, rules #1-2 weren't working when horizontal stride was 0.
In that case, regions are linear if width == 1, and the stride between
consecutive channels is given by vertical stride instead.
So we adjust our src_stride calculation from
src_stride = hstride * type_size;
to:
src_stride = (hstride ? hstride : vstride) * type_size;
and adjust rule #1 to allow hstride == 0 as long as width == 1.
While here, we also update the text of the rule to match the latest
documentation, which apparently clarifies that it's the location of
the LSB of the channel which matters.
Fixes: 3f50dde8b3 ("intel/eu: Teach EU validator about FP/DP pipeline regioning restrictions.")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
When the EU validator encountered an error, it would add an annotation
to the disassembly. Unfortunately, the code to insert an error assumed
that the next instruction would start at (offset + sizeof(brw_inst)),
which is not true if the instruction with an error is compacted.
This could lead to cascading disassembly errors, where we started trying
to decode the next instruction at the wrong offset, and getting lots of
scary looking output:
ERROR: Register Regioning patterns where [...]
(-f0.1.any16h) illegal(*** invalid execution size value 6 ) { align1 $7.src atomic };
(+f0.1.any16h) illegal.sat(*** invalid execution size value 6 ) { align1 $9.src AccWrEnable };
illegal(*** invalid execution size value 6 ) { align1 $11.src };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 F@2 AccWrEnable };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 F@2 AccWrEnable };
(+f0.1) illegal.sat(*** invalid execution size value 6 ) { align1 $15.src AccWrEnable };
illegal(*** invalid execution size value 6 ) { align1 $15.src };
(+f0.1) illegal.sat.g.f0.1(*** invalid execution size value 6 ) { align1 $13.src AccWrEnable };
Only the first instruction was actually wrong - the rest are just a
result of starting the disassembler at the wrong offset. Trash ensues!
To fix this, just pass the instruction size in a few layers so we can
record the next offset properly.
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17624>
If these checks had been in place previously, some bugs
that... eh-hem... practically took down the Intel CI would have been
caught earlier. *blush*
v2: Update to account for split sends.
v3: Add some more Gfx version checks. Remove the redundant "src0 is a
GRF" check. Both suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17379>
Specifically, execution size, register file, and register type. I did
not add validation for vertical stride and width because I don't believe
it's possible to have an otherwise valid instruction with an invalid
vertical stride or width, due to all of the other regioning
restrictions.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.
Flagged by UBSan:
../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
#0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
#1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
#2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
#3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
#4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
#16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)
Reviewed-by: Adam Jackson <ajax@redhat.com>
They look like a NULL source if you don't look at the address mode.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following fix-up by Jordan Justen is squashed in:
intel/eu/validate: gen12 send instruction doesn't have a dst type field
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change brw_inst_set_opcode() and brw_inst_opcode() to call
brw_opcode_encode/decode() transparently in order to translate between
hardware and IR opcodes, and update the EU compaction code in order to
do the same as needed, so we can eventually drop the one-to-one
correspondence between hardware and IR opcodes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The brw_inst opcode accessors are going away in one of the following
commits. We could potentially replace them with the new helpers that
do opcode remapping, but that would lead to a circular dependency
between brw_inst.h and brw_eu.h. This way we also avoid ordering
issues that can cause the semantics of the ex_desc accessors to change
depending on whether the ex_desc field is set after or before the
opcode instruction field.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The simulator complains about using byte operands, we also have
documentation telling us.
Note that add operations on bytes seems to work fine on HW (like ADD).
Using dwords operands with CMP & SEL fixes the following tests :
dEQP-VK.spirv_assembly.type.vec*.i8.*
v2: Drop the GLK changes (Matt)
Add validator tests (Matt)
v3: Drop GLK ref (Matt)
Don't mix float/integer in MAD (Matt)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
v2:
- Adapted unit tests to make them consistent with the changes done
to the validation of half-float conversions.
v3 (Curro):
- Check all the accummulators
- Constify declarations
- Do not check src1 type in single-source instructions.
- Check for all instructions that read accumulator (either implicitly or
explicitly)
- Check restrictions in src1 too.
- Merge conditional block
- Add invalid test case.
v4 (Curro):
- Assert on 3-src instructions, as they are not validated.
- Get rid of types_are_mixed_float(), as we know instruction is mixed
float at that point.
- Remove conditions from not verified case.
- Fix brackets on conditional.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>