Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:
../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
#0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
#1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
#2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
#13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
#14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.
Flagged by UBSan:
../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
#0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
#1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
#2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
#3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
#4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
#16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)
Reviewed-by: Adam Jackson <ajax@redhat.com>
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.
v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask
v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler
v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This can be useful to measure whether memory access optimizations are
having the desired effect. For example, we might see a reduction in
image loads/stores, or constant buffer loads. We can already see this
in cycle estimates to some extent, but this is a more direct approach,
minus a lot of the noise of random scheduler shuffling.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
DPH isn't actually commutative, so this doesn't work. If the immediate
in src0 would be a VF candidate, we could do better. *shrug*
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
Tests the combinations of cases of RAW, WAW and WAR hazards involving
both inorder and outoforder instructions. Also tests that
dependencies combine and propagate correctly through control
flow (loops and conditionals).
v2: Add an extra test illustrating that the non-logical CFG edge
between then-block and else-block is being taking into
account. (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch.
This implementation isn't an entirely complete as it only works on SIMD8
and SIMD16 and not dual-SIMD8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently the ts_request_type and ts_resource_select thread spawner
message descriptor bits were removed from the hardware at least since
ICL. Drop them in order to avoid assertion failures on Gen12+
platforms which don't have any encoding for this. On Gen9+ these are
probably just ignored by the hardware, so this is unlikely to have had
any functional implications prior to Gen12.
v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The WAIT instruction has been removed, but SYNC.bar can be used
instead to wait for a notification on n0.0.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Confirmed no regressions after a full Piglit run on TGL with the
brw_fs_test_dispatch_packing() test enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They look like a NULL source if you don't look at the address mode.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The following fix-up by Jordan Justen is squashed in:
intel/eu/validate: gen12 send instruction doesn't have a dst type field
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Due to hardware bug filed as HSDES#1604601757.
v2: Only return if result of fs_inst::can_do_source_mods() is known to
be false for the case new orthogonal restrictions are implemented
below in the future. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Kept as a separate commit in order to avoid distracting reviewers of
the software scoreboard pass with memory management boilerplate.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations. This lowering pass runs after register allocation in
order to make up for it.
It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserts any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.
v2: Drop unnecessary _safe list iteration (Caio).
v3: Temporarily workaround potential WaR hazard between FPU
instruction and subsequent out-of-order write, pending
clarification from the hardware team. Drop redundant tracking of
implicit access of acc0-1, since the hardware guarantees coherency
of these (but not the other accumulators...).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewers are encouraged to audit the code generation pass
independently for the case I missed some potential data hazard or new
code has been added in the meantime.
v2: Add SYNC instruction to cr0 workaround in brw_float_controls_mode().
v3: Drop likely redundant (and potentially harmful) RegDist SWSB
annotation from ce0 read in brw_find_live_channel() (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
An effect similar to the one formerly provided by setting thread
control to "switch" can be achieved now by setting a RegDist of 1 on
the SWSB field.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A future lowering pass will simulate the same behavior originally
provided by NoDDChk/NoDDClr at the IR level by using appropriate SWSB
annotations.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The new SEND instruction behaves like the former SENDS instruction.
The original single-payload SEND instruction is gone.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SEND instruction is now four-source. The descriptor is no longer
part of source 1, so avoid touching it to avoid corruption while
initializing the descriptor.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Quite a lot of churn because the encoding of most hardware opcodes has
changed unfortunately.
v2: Split dot-product description fixes to separate patch (Caio).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>