Without looking at the assembly or something, I'm not sure what the
compiler does here. The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t. That's the only reason I
could think that comparing with -1 would be always true.
This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.
src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1515 | return *type != -1;
| ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
1518 | return *type != -1;
| ^~
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
This reverts commit 52c7df1643. The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:
1. It doesn't work for type-converting MOVs because f > 0 is not the
same as f2i(f) > 0
2. CSEL is a 3src instruction and only supports one source type; it
doesn't take this into account and tries to create instructions
which do a F compare and a D select. This is especially nasty to
debug because you don't see that in the dumped assembly because we
don't properly assert that types are the same in codegen.
3. While you can handle 2, in theory, by reinterpreting types, you
can't do that in the presence of source modifiers. This pass
doesn't even attempt to detect that.
Those are just the ones I found with the one almost trival shader I was
debugging. There very likely may be more and. Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.
Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.
Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t
Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.
arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Shader-db results on Kaby Lake:
total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
instructions in affected programs: 72428 -> 23244 (-67.91%)
helped: 6
HURT: 2
helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
HURT stats (abs) min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
HURT stats (rel) min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
95% mean confidence interval for instructions value: -11947.03 -348.97
95% mean confidence interval for instructions %-change: -125.72% 202.37%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
cycles in affected programs: 28144921 -> 2116965 (-92.48%)
helped: 6
HURT: 2
helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
HURT stats (abs) min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
HURT stats (rel) min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
95% mean confidence interval for cycles value: -5900438.73 -606550.27
95% mean confidence interval for cycles %-change: -140.79% 146.16%
Inconclusive result (%-change mean confidence interval includes 0).
total spills in shared programs: 9243 -> 8901 (-3.70%)
spills in affected programs: 2718 -> 2376 (-12.58%)
helped: 4
HURT: 4
total fills in shared programs: 21831 -> 10141 (-53.55%)
fills in affected programs: 11804 -> 114 (-99.03%)
helped: 6
HURT: 2
total sends in shared programs: 815912 -> 815912 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 1
GAINED: 3
The helped shaders are all compute shaders in Aztec Ruins. There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop. The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This commit fills in a number of different pieces:
1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
new intrinsics. This involves simple plumbing work as well as a
tiny bit of extra logic to always scalarize scratch intrinsics
2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
into byte/dword scattered read/write messages which use the A32
stateless model.
3. Add code to lower_surface_logical_send to handle dword scattered
messages and the A32 stateless model.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.
This code was breaking some of the i8vec2 tests on GLK, for example:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too. When it didn't find
it, the script would fail with:
Traceback (most recent call last):
File "./report.py", line 351, in <module>
main()
File "./report.py", line 182, in main
before_count = before[p][m]
KeyError: 'sends'
Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These reworks were combined into this patch:
* Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
* Francisco Jerez: intel/eu/validate/gen12: Disable
qword_low_power_no_depctrl eu_validate test.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.
enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise a smaller type may be promoted to int, which can hit undefined
behaviour:
../src/intel/compiler/brw_packed_float.c:66:17: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
#0 0x5604a03969aa in brw_vf_to_float ../src/intel/compiler/brw_packed_float.c:66
#1 0x5604a0391305 in vf_float_conversion_test_test_vf_to_float_Test::TestBody() ../src/intel/compiler/test_vf_float_conversions.cpp:70
#2 0x5604a041a323 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#3 0x5604a0405c31 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#4 0x5604a03ab03b in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#5 0x5604a03ad714 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#6 0x5604a03afea2 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#7 0x5604a03cb87c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#8 0x5604a041df3c in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#9 0x5604a0409609 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#10 0x5604a03c2e9e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#11 0x5604a0442d57 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#12 0x5604a0442c17 in main ../src/gtest/src/gtest_main.cc:37
#13 0x7f9a1983dbba in __libc_start_main ../csu/libc-start.c:308
#14 0x5604a0390d89 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/vf_float_conversions+0x8dd89)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
To avoid it, use the modulo of the number of bits in the value being
shifted, which is presumably what ended up happening on x86.
Flagged by UBSan:
../src/intel/compiler/brw_eu_validate.c:974:33: runtime error: shift exponent 64 is too large for 64-bit type 'long unsigned int'
#0 0x561abb612ab3 in general_restrictions_on_region_parameters ../src/intel/compiler/brw_eu_validate.c:974
#1 0x561abb617574 in brw_validate_instructions ../src/intel/compiler/brw_eu_validate.c:1851
#2 0x561abb53bd31 in validate ../src/intel/compiler/test_eu_validate.cpp:106
#3 0x561abb555369 in validation_test_source_cannot_span_more_than_2_registers_Test::TestBody() ../src/intel/compiler/test_eu_validate.cpp:486
#4 0x561abb742651 in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#5 0x561abb72e64d in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#6 0x561abb6d5451 in testing::Test::Run() ../src/gtest/src/gtest.cc:2474
#7 0x561abb6d7b2a in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
#8 0x561abb6da2b8 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
#9 0x561abb6f5c92 in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
#10 0x561abb74626a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
#11 0x561abb732025 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
#12 0x561abb6ed2b4 in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
#13 0x561abb768b3b in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
#14 0x561abb7689fb in main ../src/gtest/src/gtest_main.cc:37
#15 0x7f525e5a9bba in __libc_start_main ../csu/libc-start.c:308
#16 0x561abb538ed9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/eu_validate+0x1b8ed9)
Reviewed-by: Adam Jackson <ajax@redhat.com>
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.
v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)
Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.
v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask
v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler
v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This can be useful to measure whether memory access optimizations are
having the desired effect. For example, we might see a reduction in
image loads/stores, or constant buffer loads. We can already see this
in cycle estimates to some extent, but this is a more direct approach,
minus a lot of the noise of random scheduler shuffling.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
DPH isn't actually commutative, so this doesn't work. If the immediate
in src0 would be a VF candidate, we could do better. *shrug*
No shader-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: b04beaf41d ("intel/vec4: Try both sources as candidates for being immediates")
Tests the combinations of cases of RAW, WAW and WAR hazards involving
both inorder and outoforder instructions. Also tests that
dependencies combine and propagate correctly through control
flow (loops and conditionals).
v2: Add an extra test illustrating that the non-logical CFG edge
between then-block and else-block is being taking into
account. (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The bit moved on gen12 in order to prepare for dual-SIMD8 dispatch.
This implementation isn't an entirely complete as it only works on SIMD8
and SIMD16 and not dual-SIMD8.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently the ts_request_type and ts_resource_select thread spawner
message descriptor bits were removed from the hardware at least since
ICL. Drop them in order to avoid assertion failures on Gen12+
platforms which don't have any encoding for this. On Gen9+ these are
probably just ignored by the hardware, so this is unlikely to have had
any functional implications prior to Gen12.
v2: Mark TS message fields as non-existing in brw_inst.h on ICL. (Caio)
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The WAIT instruction has been removed, but SYNC.bar can be used
instead to wait for a notification on n0.0.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Apparently this field was removed on SKL, and according to the
hardware docs for previous platforms "This field is only valid for a
ForwardMsg message. It is ignored for other messages. The BarrierMsg
message always increments the N0 notification counter".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Confirmed no regressions after a full Piglit run on TGL with the
brw_fs_test_dispatch_packing() test enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
They look like a NULL source if you don't look at the address mode.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>