Commit Graph

2553 Commits

Author SHA1 Message Date
Francisco Jerez
f3352745ad intel/disasm/gfx12+: Use helper instead of hardcoded bit access for 64-bit immediates.
So we don't have to duplicate code to handle differences in the
encoding of 64-bit immediates across platforms.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
4a2e7306dd intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics.
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.

The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:

 x <- load.ugm // Atomic read sequenced-before the barrier
 y <- fence.ugm
 z <- fence.tgm
 wait(y, z)
 w <- load.tgm // Read sequenced-after the barrier

In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.

v2: Apply the workaround regardless of whether the NIR barrier
    intrinsic specifies multiple storage classes or a single one,
    since an acquire barrier is required to order subsequent requests
    relative to previous atomic requests of unknown storage class not
    necessarily specified by the memory scope information of the
    intrinsic.

Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
2023-01-18 21:34:33 -08:00
Tapani Pälli
53de48f1c4 intel/compiler: add cpp_std=c++17 when building tests
Otherwise build fails:

"../src/intel/compiler/brw_private.h:40:4: note:
 ‘std::variant’ is only available from C++17 onwards"

Fixes: 6c194ddd18 ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20725>
2023-01-17 13:58:03 +00:00
Nico Cortes
29adbb132f Revert "intel/compiler: fine-grained control of dispatch widths"
This reverts commit bed18ab3e2.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8063
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20654>
2023-01-12 00:33:25 +00:00
Marcin Ślusarz
bed18ab3e2 intel/compiler: fine-grained control of dispatch widths
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20535>
2023-01-11 08:17:12 +00:00
Ian Romanick
51be623372 intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
e0f409c5d8 intel/eu/validate: Add validation for csel
v2: Also check the condition modifier. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3a7c23973b intel/eu/validate: Add validation for bfi2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
f34821d998 intel/eu/validate: More validation for logic ops
v2: Use number of source to condition validating src1 instead of using
the opcode. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
8be7406c81 intel/compiler: Assert that ARF used is the accumulator
v2: Move the new check to be with similar existing checks. Suggested by
Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3b579a2ea8 intel/compiler: Validate 3-source instruction source strides
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
c5684019f6 intel/compiler: Validate 3-source instruction sources have same base type
This can't be checked in EU validation because the bits to describe the
base type of the individual sources no longer exist.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Lionel Landwerlin
6b494745be intel/fs: only avoid SIMD32 if strictly inferior in throughput
This enabled SIMD32 in blorp shaders and seems to be give a small FPS
bump when using a DG2 GPU as secondary (requires copies to linear
buffers to exchange with main GPU).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19341>
2023-01-09 08:41:47 +00:00
Ian Romanick
8ab7ec0129 intel/compiler: Enable lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts for pre-Gfx7
GLSL IR opcodes generated for bitfieldExtract and bitfieldInsert are
lowered by lower_instructions.  4dff3ff005 ("nir/opt_algebraic:
Optimize open coded bfm.") adds an optimization that can rematerialize
nir_op_bfm that was prevented by the GLSL IR lowering.

It appears that every piece of hardware, except older Intel GPUS, that
has real integers (i.e., lower_bitops is not set) also sets
lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4dff3ff005 ("nir/opt_algebraic: Optimize open coded bfm.")
Closes: #7874
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
Lionel Landwerlin
25608659a0 intel/compiler: mark shader_record_ptr as uniform
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>
2022-12-23 09:22:13 +00:00
Jordan Justen
78a75e0d25 intel/common/intel_genX_state.h: Add intel_set_ps_dispatch_state()
This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c47 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.

So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>
2022-12-15 00:54:59 -08:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Ian Romanick
edae161d98 intel/fs: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nit_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Lionel Landwerlin
94bb4a13fa intel/fs: make Wa_1806565034 conditional to non robust access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>
2022-12-13 18:05:19 +00:00
Marcin Ślusarz
75375233f6 intel/compiler/mesh: extract emit_urb_direct_vec4_write
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:49 +00:00
Marcin Ślusarz
3a60112ce5 intel/compiler: optimize away local_inv_index and local_inv_id if workgroup size is 1
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:49 +00:00
Marcin Ślusarz
85b1c89e20 intel/compiler: split lower_cs_intrinsics_convert_block
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Marcin Ślusarz
bb93f1bda1 intel/compiler/mesh: extract shared code for offset adjustment
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Marcin Ślusarz
7fbd1dfb18 anv,intel/compiler/mesh: drop lowering of gl_Primitive*IndicesEXT
Until U888X index format lands this change shouldn't have any impact on performance.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Caio Oliveira
e9efd05af5 intel/compiler: Remove leftover declarations of old NIR passes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19805>
2022-12-12 10:03:04 +00:00
Lionel Landwerlin
6106396825 intel/nir/rt: fixup primitive id
There is a delta index value in the hit structure, we forgot to add it
to the base value.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7565
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19346>
2022-12-12 10:16:21 +02:00
Paulo Zanoni
a099d6ae4d intel: add devinfo->has_64bit_float_via_math_pipe
Unusual hardware features that require special hanlding usually get a
devinfo field, so do this for MTL's unordered DF types. This will
guarantee that any platform based on MTL (thus inheriting from
MTL_FEATURES) will automatically be handled in these special cases.

v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
eac00f4ec7 intel/compiler: fix intel_swsb_decode for newer platforms
In the previous patch we adjusted the scoreboard pass to take into
consideration a new case of unordered operations for TGL. Fix the
decoding as well.

v2: use intel_device_info_is_mtl()  (Curro, Jordan)
v3: the part where we export num_sources_from_inst() is now a separate patch
    (Curro).
v4: Work around false positive maybe-unitialized warning since Marge
    uses -Werror=maybe-uninitialized (Marge).

Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v3)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
295c5f59e0 intel/compiler: export brw_num_sources_from_inst
We want to call this from brw_disasm.c, so move it out to brw_eu.c
since it's about to become more of a shared utility function than
something specific to the EU validator.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
df50add27e intel/compiler: avoid 64bit SEL_EXEC on MTL
On MTL, instructions with DF type are unordered, executed in the math
pipe. This means that they require different SWSB dependency handling,
and also that in some cases such as MOVs it's generally faster to
simply use 2 smaller ordered moves than a single unordered MOV.

One problem we have with the current code is that generate_code() is
not setting the proper SWSB dependencies for the generated DF MOVs,
causing some tests to fail.

One solution would be to fix generate_code() by making it set the
appropriate dependencies. This was the first patch I wrote. Another
solution to this problem, pointed to us by Curro, is to change
required_exec_type() so we use UD instructions instead of DF, just
like we do with platforms that don't have 64 bit instructions, which
means there won't be anything to fix in generate_code(). The second
solution is what this patch implements.

This fixes at least:
 - dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmin_double_vertex

Thanks to Francisco Jerez for all the major help provided with this
problem.

Credits-to: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
951855c349 intel/compiler: avoid (RegDist, SBID) on DF instructions on MTL
When we use this form there's no way to specify which pipe RegDist
refers to, so there are a few rules to figure this out, which is what
inferred_sync_pipe() implements. But for MTL there's no long pipe and
the documentation does not explicitly explain what should be the
inferred type for its long (DF) instructions - which are out-of-order,
by the way.  One way to interpret this is that such case should be
avoided.  So add the extra check to entirely avoid this case.

Notice that this is not actually fixing any bug, since returning
TGL_PIPE_LONG (what we do today) will actually make these DF
instructions incompatible with every in-order instruction, so we'll
never opt to use the (RegDist, SBID) form anyway. But still, it's
better to have this case explicitly documented instead of having it
covered by a semi coincidence.

v2: use intel_device_info_is_mtl()  (Curro, Jordan)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
16b9f87104 intel/compiler: on MTL, DF instructions run in the math pipe
Adjust the scoreboard code to take that into account.

Fixes at least:
  - dEQP-VK.glsl.builtin.precision_double.refract.compute.vec3
  - dEQP-VK.glsl.builtin.precision_double.matrixcompmult.compute.mat4

v2: use intel_device_info_is_mtl()  (Curro, Jordan)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Francisco Jerez
051887fbf3 intel/fs: Make the result of is_unordered() dependent on devinfo.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Kenneth Graunke
8c2448d4e6 intel/compiler: Delete sampler key handling for planar format stuff
i965 used these, but Gallium drivers do this lowering via a separate
nir_lower_tex call from st/mesa.  Vulkan drivers don't use these at all.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
88918baf5c intel/compiler: Delete key->msaa_16
None of the drivers have used this since we dropped i965, and BLORP
no longer uses it as of the previous commit.  We can also drop the
former compressed_multisample_tex_mask (now padding) field so that
things remain 64-bit aligned.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
584e18863e intel: Drop compressed_multisample_layout_mask from the compiler keys
The compiler looks at this key field to determine whether to perform
an MCS fetch for a txf_ms or samples_identical texture message, if a
nir_tex_src_ms_mcs_intel source wasn't provided.  If it isn't set,
it instead uses constant 0 (nothing is compressed).

All of the drivers (iris, crocus, anv, hasvk) unconditionally set this
to ~0 because we don't want to pay for costly shader recompiles (which
can cause nasty stuttering).  Most textures are compressed anyway, and
the hardware ignores the l2dms MCS parameter if MCS is disabled.

The only user was BLORP, which sets the key field based on whether the
texture's aux usage has MCS.  But if it has MCS, it also does the MCS
fetch itself and supplies it directly.  Otherwise, it relies on the
compiler to fill in the 0 value.  But it could easily just provide the
0 value itself in that case and not rely on the compiler at all.

With that fixed, we can just drop the key fields entirely.  We leave
them as padding for now to avoid repacking structures; we won't need
to after the next commits anyway.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Lionel Landwerlin
b4b4294a78 intel/fs: add a saturation propagation test
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20206>
2022-12-09 00:39:05 +00:00
Oleksii Bozhenko
d5d8bb1dbb brw: fix saturate propagation region overlap range
Fixes: 947c828d5c
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7691

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Oleksii Bozhenko <oleksii.bozhenko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20206>
2022-12-09 00:39:05 +00:00
Tapani Pälli
bc4b7de0d0 intel/fs: implement Wa_14017989577
The first instruction of any kernel should have non-zero emask. This
restriction needs to be obeyed to avoid GPU hangs.

Patch adds a function to insert dummy mov as first instruction
to make sure this requirement is fulfilled.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20194>
2022-12-08 23:58:32 +00:00
Kenneth Graunke
bafbe7c23a intel/compiler: Set NoMask on cr0 access for float controls mode
This is trying to clear a bit in the control register.  However, it's
executing with whatever channel mask happens to be active.  Typically
this is the one at the start of the program, so at least some channels
will be active.  Typically the first channel will be active due to
packed dispatch, but that's not always guaranteed.  Without NoMask,
the float controls writes may randomly not happen.

Recent GPUs also seem to have a hang issue when the first instruction in
the shader doesn't have any active channels.  Having an instruction with
NoMask at the start of the program works around the issue.  See HSD bug
14017989577.  In our case, the float controls preamble was breaking that
restriction every time, causing us to run into this problem frequently.

Thanks to Tapani Pälli for finding this hang issue, and Francisco
Jerez and Lionel Landwerlin for helping pinpoint this issue during
review of a workaround patch in !20194.

Fixes GPU hangs in Elder Scrolls Online, Witcher 3, and likely more.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7639
Fixes: 9da56ffc52 ("i965/fs: add emit_shader_float_controls_execution_mode() and aux functions")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20214>
2022-12-08 09:54:09 +00:00
Lionel Landwerlin
e25e17dd0c intel/fs: clamp per vertex input accesses to patchControlPoints
In a tesselation control shader where an input array is accessed using
the index gl_InvocationID, we can end up accessing elements beyond the
number of input vertices specified in the shader key.

This happens because of the lowering in nir_lower_indirect_derefs().
This lowering will affect compact variables which happens in this
case :

  in gl_PerVertex {
      vec4  gl_Position;
      float gl_ClipDistance[1];
  } gl_in[gl_MaxPatchVertices];

The lowered code produced by NIR is somewhat ineffecient (implements a
binary seach) :

  if (gl_InvocationID < 16) {
     if (gl_InvocationID < 8) {
        if (gl_InvocationID < 4) {
          vec4 vals = load_at_offset(0);
          value = bcsel(vals, gl_InvocationID);
        } else {
          vec4 vals = load_at_offset(4);
          value = bcsel(vals, gl_InvocationID - 4);
        }
     } else {
        if (gl_InvocationID < 12) {
          vec4 vals = load_at_offset(8);
          value = bcsel(vals, gl_InvocationID - 8);
        } else {
          vec4 vals = load_at_offset(12);
          value = bcsel(vals, gl_InvocationID - 12);
        }
     }
  } else {
     if (gl_InvocationID < 24) {
        ...
     } else {
        ...
     }
  }

By default the gl_MaxPatchVertices must be set at 32 items and that's
what the lowering code will use to divide the access into chunks of 4.
But when running with 3 input vertices, this means we'll pull one more
item than what was delivered in the shader payload.

This triggers issues further down the register scheduling where the
g5UD (register for the 4th item) is overwritten by a previous SEND,
leading the URB read to use an invalid handle.

This pass clamps any access load_per_vertex_input intrinsic vertex
indice to (input_vertices - 1).

Fixes issues with tests like :
dEQP-VK.clipping.user_defined.clip_distance.vert_tess.*

Also fixes a hang with zink/anv on :
KHR-GL46.draw_elements_base_vertex_tests.AEP_shader_stages

v2: Don't replace source register

v3: Implement in NIR

v4: Clamp per vertex array sizes in NIR (Jason)

v5: Move the clamping on the intel compiler

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9749>
2022-12-07 08:16:03 +00:00
Marcin Ślusarz
7809f76fe8 intel/compiler/mesh: align payload size to the size of vec4
This reduces the number of instructions in task shaders when payload
size is not aligned to vec4 and payload_in_shared WA is enabled,
because nir_lower_task_shader will not need to handle the unaligned
size case.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20080>
2022-12-06 16:31:11 +00:00
Lionel Landwerlin
d4cd33630a intel: add missing restriction on fragment simd dispatch
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7755
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169>
2022-12-06 00:37:50 +02:00
Lionel Landwerlin
b9403b1c47 intel: factor out dispatch PS enabling logic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169>
2022-12-06 00:37:47 +02:00
Lionel Landwerlin
df38426072 intel/rt/nir: add support for RayCullMaskKHR
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20011>
2022-12-02 09:28:23 +00:00
Lionel Landwerlin
6202a2c6b4 intel/rt/nir: enable the trampoline shader to load the indirect ray shader bsr
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20011>
2022-12-02 09:28:23 +00:00
Lionel Landwerlin
a855bdbf47 intel/nir/rt: switch to workgroup_id_zero_base
RT don't use a base workgroup id so no reason of using workgroup_id.
Additionally the lowering introduced in b4dd3df227 requires something
provides base_workgroup_id which we don't have for RT as it's not
needed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b4dd3df227 ("intel/nir: Set has_base_workgroup_id for lower_compute_system_values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7812
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20115>
2022-12-02 05:25:22 +00:00
Marcin Ślusarz
db0e6f9a07 intel/compiler: user payload starts after TUE header & its padding
All data written by the user are offset by TUE header size.
Without this patch we copy the correct amount of user data, but both
"from" and "to" offsets are wrong.

Fixes: 37e78803d7 ("intel/compiler: use nir_lower_task_shader pass")

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
2022-12-01 11:19:47 +00:00
Marcin Ślusarz
7aaafaa8ae intel/compiler: adjust [store|load]_task_payload.base too
Base also needs to be converted from bytes to words.

Fixes: c36ae42e4c ("intel/compiler: Use nir_var_mem_task_payload")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
2022-12-01 11:19:47 +00:00
Jason Ekstrand
b4dd3df227 intel/nir: Set has_base_workgroup_id for lower_compute_system_values
This option didn't exist half a decade ago when I first implemented base
workgroup support in ANV.  It's cleaner to just have split system values
like all the other zero_base+base things do.

We currently only do this for COMPUTE and not KERNEL because it lets us
avoid changing intel_clc for now.  We can add KERNEL later if needed.
We also don't do this lowering for task/mesh.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00