2366 Commits

Author SHA1 Message Date
Kenneth Graunke
7092c1218a intel/compiler: Use more symbolic source names in components_read()
Rather than hardcoding source 1, source 2, etc.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
16b66ab659 intel/compiler: Drop dest checking in atomic code
NIR atomic operation intrinsics all have destinations.  This is just
copied and pasted from other generic intrinsic handling where that may
or may not be the case.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
780f3e2e6b intel/compiler: Delete all the A64 atomic variants for type sizes
These are handled identically in almost all cases.  There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine.  If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
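A minimal sketch of the idea in 780f3e2e6b: derive the atomic's bit size from the destination type rather than encoding it in the opcode. The `sketch_` names and the three-entry type enum are hypothetical stand-ins, not Mesa's actual types; only the `type_sz(...) * 8` expression comes from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for a register type enum (not Mesa's). */
enum sketch_type {
   SKETCH_TYPE_W,   /* 16-bit */
   SKETCH_TYPE_D,   /* 32-bit */
   SKETCH_TYPE_Q,   /* 64-bit */
};

/* Size in bytes of one element of the given type. */
static unsigned sketch_type_sz(enum sketch_type t)
{
   switch (t) {
   case SKETCH_TYPE_W: return 2;
   case SKETCH_TYPE_D: return 4;
   case SKETCH_TYPE_Q: return 8;
   }
   assert(!"unknown type");
   return 0;
}

/* One code path for every size: no per-size opcode is needed. */
static unsigned sketch_atomic_bit_size(enum sketch_type dst_type)
{
   return sketch_type_sz(dst_type) * 8;
}
```

With this shape, a single atomic opcode can serve 16-, 32- and 64-bit operations, since the size travels with the destination type.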
Kenneth Graunke
03ddde1230 intel/compiler: Combine nir_emit_{ssbo,shared}_atomic into one helper
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one

It's not worth copy and pasting the entire function for this.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
b84939c678 intel/compiler: Delete fs_visitor::nir_emit_{ssbo,shared}_atomic_float()
These are now basically identical to their non-float counterparts.  The
only thing that differed was the opcode checking to determine which
operands existed.  Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
f7b29d7924 intel/compiler: Drop redundant 32-bit expansion for shared float atomics
We already expanded data to 32-bit a few lines earlier, so this is just
redundantly doing it a second time.

Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
02129eee3a intel/compiler: Eliminate SHADER_OPCODE_UNTYPED_ATOMIC_FLOAT
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation.  Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor.  No need for a separate opcode.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Kenneth Graunke
284f0c9a57 intel/compiler: Add an lsc_op_num_data_values() helper
There are a number of places that need to know how many operands an LSC
atomic takes (0 for inc/dec, 1 for most things, 2 for cmpxchg).  We can
add a helper for that and eliminate some code (with more to come).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
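The helper described in 284f0c9a57 might look roughly like the sketch below. The enum is a hypothetical, abbreviated opcode list used only for illustration; the operand counts (0 for inc/dec, 1 for most operations, 2 for cmpxchg) are the ones stated in the commit message.

```c
#include <assert.h>

/* Hypothetical, abbreviated opcode list; not the real lsc_opcode enum. */
enum sketch_atomic_op {
   SKETCH_ATOMIC_INC,
   SKETCH_ATOMIC_DEC,
   SKETCH_ATOMIC_ADD,
   SKETCH_ATOMIC_AND,
   SKETCH_ATOMIC_FADD,
   SKETCH_ATOMIC_CMPXCHG,
};

/* Number of data operands the atomic consumes besides the address. */
static unsigned sketch_op_num_data_values(enum sketch_atomic_op op)
{
   switch (op) {
   case SKETCH_ATOMIC_INC:
   case SKETCH_ATOMIC_DEC:
      return 0;   /* the operand is implicit */
   case SKETCH_ATOMIC_CMPXCHG:
      return 2;   /* comparison value and replacement value */
   default:
      return 1;   /* a single data operand */
   }
}
```

Callers can then loop over `sketch_op_num_data_values(op)` sources instead of switching on the opcode at every site.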
Kenneth Graunke
90a2137cd5 intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines.  We have to translate one way or
another, and using the modern set makes sense going forward.

One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
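The overlap problem mentioned in 90a2137cd5 can be illustrated as follows. All `SKETCH_` names are hypothetical; the only fact taken from the commit message is that the legacy AND and FMAX values are both 1.

```c
#include <stdbool.h>

/* Legacy-style defines: integer and float operations share the value 1. */
#define SKETCH_AOP_AND  1
#define SKETCH_AOP_FMAX 1

/* Unified-style enum: every operation gets its own value. */
enum sketch_unified_op {
   SKETCH_OP_AND,
   SKETCH_OP_FMAX,
};

/* With overlapping values, the caller has to pass extra context. */
static const char *sketch_legacy_name(unsigned aop, bool is_float)
{
   if (aop == 1)
      return is_float ? "fmax" : "and";
   return "other";
}

/* With a unified enum, the opcode alone is enough. */
static bool sketch_unified_op_is_float(enum sketch_unified_op op)
{
   return op == SKETCH_OP_FMAX;
}
```

In the legacy scheme the `is_float` flag has to be threaded through every call, which is roughly the role the separate float opcode played before this series.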
Kenneth Graunke
8d2dc52a14 intel/compiler: Move atomic op translation into emit_*_atomic()
There's no need to pass both the intrinsic and an opcode computed from
that same intrinsic.  Just do it in the functions themselves.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Francisco Jerez
f40e17059a intel/fs/gfx12+: Drop redundant handling of SHADER_OPCODE_BROADCAST in exec pipe inference.
Commit c80c0ed943 introduced handling of
SHADER_OPCODE_BROADCAST into inferred_exec_pipe(), but it was already
being handled, so drop the redundant handling.  Shouldn't lead to any
functional changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
b867d1b851 intel/eu/gfx12+: Implement decoding of 64-bit immediates.
C.f. a12533f2ce.  The corresponding
change for the decoding path was never implemented so the disassembler
was printing incorrect immediate values.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
f80f29dc4b intel/disasm/gfx12+: Fix print out of non-existing condmod field with 64-bit immediate.
The conditional mode field doesn't exist for instructions with a
64-bit immediate, so this would currently print garbage.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
f3352745ad intel/disasm/gfx12+: Use helper instead of hardcoded bit access for 64-bit immediates.
So we don't have to duplicate code to handle differences in the
encoding of 64-bit immediates across platforms.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20543>
2023-01-19 06:14:03 +00:00
Francisco Jerez
4a2e7306dd intel/fs/gfx12: Ensure that prior reads have executed before barrier with acquire semantics.
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.

The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:

 x <- load.ugm // Atomic read sequenced-before the barrier
 y <- fence.ugm
 z <- fence.tgm
 wait(y, z)
 w <- load.tgm // Read sequenced-after the barrier

In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.

v2: Apply the workaround regardless of whether the NIR barrier
    intrinsic specifies multiple storage classes or a single one,
    since an acquire barrier is required to order subsequent requests
    relative to previous atomic requests of unknown storage class not
    necessarily specified by the memory scope information of the
    intrinsic.

Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
2023-01-18 21:34:33 -08:00
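As a CPU-side analogy only (not the GPU code changed by 4a2e7306dd), the ordering guarantee at stake resembles the C11 pattern below: an atomic read, then an acquire fence, then a read that must not observe data older than what the atomic read saw. The function names are hypothetical, and the sketch assumes POSIX threads are available.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;        /* plain data published by the writer */
static atomic_int ready;   /* flag written after the payload */

static void *sketch_writer(void *arg)
{
   (void)arg;
   payload = 42;
   atomic_thread_fence(memory_order_release);
   atomic_store_explicit(&ready, 1, memory_order_relaxed);
   return NULL;
}

static void *sketch_reader(void *arg)
{
   int *out = arg;

   /* Atomic read sequenced before the barrier (like "x" above). */
   while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
      ;

   atomic_thread_fence(memory_order_acquire);

   /* Read sequenced after the barrier (like "w" above). */
   *out = payload;
   return NULL;
}

/* Runs one writer and one reader; returns the payload the reader saw. */
static int sketch_run_message_passing(void)
{
   pthread_t w, r;
   int seen = -1;

   payload = 0;
   atomic_store(&ready, 0);
   pthread_create(&r, NULL, sketch_reader, &seen);
   pthread_create(&w, NULL, sketch_writer, NULL);
   pthread_join(w, NULL);
   pthread_join(r, NULL);
   return seen;
}
```

If the later read were allowed to complete before the atomic read, the reader could see the flag set and still read a stale payload; that is the kind of inversion the commit message describes for the TGM and UGM caches.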
Tapani Pälli
53de48f1c4 intel/compiler: add cpp_std=c++17 when building tests
Otherwise build fails:

"../src/intel/compiler/brw_private.h:40:4: note:
 ‘std::variant’ is only available from C++17 onwards"

Fixes: 6c194ddd18 ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20725>
2023-01-17 13:58:03 +00:00
Nico Cortes
29adbb132f Revert "intel/compiler: fine-grained control of dispatch widths"
This reverts commit bed18ab3e2.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8063
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20654>
2023-01-12 00:33:25 +00:00
Marcin Ślusarz
bed18ab3e2 intel/compiler: fine-grained control of dispatch widths
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20535>
2023-01-11 08:17:12 +00:00
Ian Romanick
51be623372 intel/eu/validate: Check predication and cmod for SEL, CMP, and CMPN
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
e0f409c5d8 intel/eu/validate: Add validation for csel
v2: Also check the condition modifier. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3a7c23973b intel/eu/validate: Add validation for bfi2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
f34821d998 intel/eu/validate: More validation for logic ops
v2: Use the number of sources to condition validation of src1 instead
of using the opcode. Suggested by Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
8be7406c81 intel/compiler: Assert that ARF used is the accumulator
v2: Move the new check to be with similar existing checks. Suggested by
Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
3b579a2ea8 intel/compiler: Validate 3-source instruction source strides
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Ian Romanick
c5684019f6 intel/compiler: Validate 3-source instruction sources have same base type
This can't be checked in EU validation because the bits to describe the
base type of the individual sources no longer exist.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20527>
2023-01-09 19:15:19 +00:00
Lionel Landwerlin
6b494745be intel/fs: only avoid SIMD32 if strictly inferior in throughput
This enabled SIMD32 in blorp shaders and seems to give a small FPS
bump when using a DG2 GPU as secondary (requires copies to linear
buffers to exchange with main GPU).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19341>
2023-01-09 08:41:47 +00:00
Ian Romanick
8ab7ec0129 intel/compiler: Enable lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts for pre-Gfx7
GLSL IR opcodes generated for bitfieldExtract and bitfieldInsert are
lowered by lower_instructions.  4dff3ff005 ("nir/opt_algebraic:
Optimize open coded bfm.") adds an optimization that can rematerialize
nir_op_bfm that was prevented by the GLSL IR lowering.

It appears that every piece of hardware, except older Intel GPUs, that
has real integers (i.e., lower_bitops is not set) also sets
lower_bitfield_extract_to_shifts and lower_bitfield_insert_to_shifts.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 4dff3ff005 ("nir/opt_algebraic: Optimize open coded bfm.")
Closes: #7874
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20323>
2023-01-03 18:37:53 -08:00
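For readers unfamiliar with what lowering bitfield operations to shifts means in 8ab7ec0129, here is an illustrative sketch for 32-bit unsigned values. It is not the exact sequence NIR emits; the function names are hypothetical, and both functions assume `bits >= 1` and `offset + bits <= 32`.

```c
#include <stdint.h>

/* Extract `bits` bits starting at `offset` using only two shifts:
 * shift the field up to the top of the word, then back down. */
static uint32_t sketch_ubitfield_extract(uint32_t value, unsigned offset,
                                         unsigned bits)
{
   return (value << (32 - bits - offset)) >> (32 - bits);
}

/* Insert the low `bits` bits of `insert` into `base` at `offset`.
 * The mask plays the role of a bfm-style (bitfield mask) value. */
static uint32_t sketch_bitfield_insert(uint32_t base, uint32_t insert,
                                       unsigned offset, unsigned bits)
{
   uint32_t mask = (UINT32_MAX >> (32 - bits)) << offset;
   return (base & ~mask) | ((insert << offset) & mask);
}
```

For example, `sketch_ubitfield_extract(0xABCD1234, 4, 8)` yields `0x23`, and `sketch_bitfield_insert(0xFFFFFFFF, 0, 8, 8)` yields `0xFFFF00FF`.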
Lionel Landwerlin
25608659a0 intel/compiler: mark shader_record_ptr as uniform
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20413>
2022-12-23 09:22:13 +00:00
Jordan Justen
78a75e0d25 intel/common/intel_genX_state.h: Add intel_set_ps_dispatch_state()
This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c47 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.

So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>
2022-12-15 00:54:59 -08:00
Ian Romanick
eb76cee9f8 nir: Eliminate nir_op_i2b
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b.  Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.

At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.

The failing test on d3d12 is a pre-existing bug that is triggered by
this change.  I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.

v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.

v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.

v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress.  The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).

v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(b2f(x)) optimization.

v6: Just eliminate the i2b instruction.

v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷

No shader-db changes on any Intel platform.

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2

Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2

The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.

Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
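The algebraic pattern quoted in eb76cee9f8 rewrites b2i32(ine(ubfe(a, b, 1), 0)) to ubfe(a, b, 1). A small C model shows why that is safe: a one-bit unsigned extract is already 0 or 1. The helpers below are simplified, hypothetical models of the NIR opcodes (offsets are assumed to be in the range 0..31), not their exact semantics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified unsigned bitfield extract; assumes bits >= 1 and
 * offset + bits <= 32. */
static uint32_t sketch_ubfe(uint32_t a, unsigned offset, unsigned bits)
{
   return (a >> offset) & (UINT32_MAX >> (32 - bits));
}

/* Integer "not equal" comparison producing a boolean. */
static bool sketch_ine(uint32_t x, uint32_t y)
{
   return x != y;
}

/* Boolean to 32-bit integer: true becomes 1, false becomes 0. */
static uint32_t sketch_b2i32(bool b)
{
   return b ? 1u : 0u;
}

/* True when the original expression and its replacement agree. */
static bool sketch_pattern_holds(uint32_t a, unsigned offset)
{
   uint32_t bit = sketch_ubfe(a, offset, 1);
   return sketch_b2i32(sketch_ine(bit, 0)) == bit;
}
```

The same model also shows the lowering the commit relies on: i2b(a) is simply `sketch_ine(a, 0)`, so existing ('ine', a, 0) patterns cover it.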
Ian Romanick
edae161d98 intel/fs: Use nir_type_convert instead of nir_type_conversion_op
In a future commit, nir_type_conversion_op won't be able to handle i2b
(and in a much later commit f2b), so switch many users to the fully
featured function.

No shader-db or fossil-db changes on any Intel platform.

Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
2022-12-14 06:23:21 +00:00
Lionel Landwerlin
94bb4a13fa intel/fs: make Wa_1806565034 conditional to non robust access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>
2022-12-13 18:05:19 +00:00
Marcin Ślusarz
75375233f6 intel/compiler/mesh: extract emit_urb_direct_vec4_write
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:49 +00:00
Marcin Ślusarz
3a60112ce5 intel/compiler: optimize away local_inv_index and local_inv_id if workgroup size is 1
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:49 +00:00
Marcin Ślusarz
85b1c89e20 intel/compiler: split lower_cs_intrinsics_convert_block
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Marcin Ślusarz
bb93f1bda1 intel/compiler/mesh: extract shared code for offset adjustment
No functional changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Marcin Ślusarz
7fbd1dfb18 anv,intel/compiler/mesh: drop lowering of gl_Primitive*IndicesEXT
Until U888X index format lands this change shouldn't have any impact on performance.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20292>
2022-12-13 13:00:48 +00:00
Caio Oliveira
e9efd05af5 intel/compiler: Remove leftover declarations of old NIR passes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19805>
2022-12-12 10:03:04 +00:00
Lionel Landwerlin
6106396825 intel/nir/rt: fixup primitive id
There is a delta index value in the hit structure, we forgot to add it
to the base value.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7565
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19346>
2022-12-12 10:16:21 +02:00
Paulo Zanoni
a099d6ae4d intel: add devinfo->has_64bit_float_via_math_pipe
Unusual hardware features that require special handling usually get a
devinfo field, so do this for MTL's unordered DF types. This will
guarantee that any platform based on MTL (thus inheriting from
MTL_FEATURES) will automatically be handled in these special cases.

v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
eac00f4ec7 intel/compiler: fix intel_swsb_decode for newer platforms
In the previous patch we adjusted the scoreboard pass to take into
consideration a new case of unordered operations for TGL. Fix the
decoding as well.

v2: use intel_device_info_is_mtl()  (Curro, Jordan)
v3: the part where we export num_sources_from_inst() is now a separate patch
    (Curro).
v4: Work around false positive maybe-uninitialized warning since Marge
    uses -Werror=maybe-uninitialized (Marge).

Reviewed-by: Francisco Jerez <currojerez@riseup.net> (v3)
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
295c5f59e0 intel/compiler: export brw_num_sources_from_inst
We want to call this from brw_disasm.c, so move it out to brw_eu.c
since it's about to become more of a shared utility function than
something specific to the EU validator.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
df50add27e intel/compiler: avoid 64bit SEL_EXEC on MTL
On MTL, instructions with DF type are unordered, executed in the math
pipe. This means that they require different SWSB dependency handling,
and also that in some cases such as MOVs it's generally faster to
simply use 2 smaller ordered moves than a single unordered MOV.

One problem we have with the current code is that generate_code() is
not setting the proper SWSB dependencies for the generated DF MOVs,
causing some tests to fail.

One solution would be to fix generate_code() by making it set the
appropriate dependencies. This was the first patch I wrote. Another
solution to this problem, pointed to us by Curro, is to change
required_exec_type() so we use UD instructions instead of DF, just
like we do with platforms that don't have 64 bit instructions, which
means there won't be anything to fix in generate_code(). The second
solution is what this patch implements.

This fixes at least:
 - dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmin_double_vertex

Thanks to Francisco Jerez for all the major help provided with this
problem.

Credits-to: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
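The approach taken in df50add27e (moving 64-bit data with 32-bit UD operations instead of a DF operation) relies on the fact that a 64-bit value can be copied bit-exactly as two 32-bit halves. The sketch below shows that on the CPU; the function name is hypothetical and it assumes `double` is 64 bits wide.

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == 2 * sizeof(uint32_t),
               "sketch assumes a 64-bit double");

/* Copy a double as two independent 32-bit moves instead of one
 * 64-bit move. No floating-point operation touches the value. */
static void sketch_mov_df_as_2x_ud(double *dst, const double *src)
{
   uint32_t in[2], out[2];

   memcpy(in, src, sizeof(in));
   out[0] = in[0];   /* first 32-bit move */
   out[1] = in[1];   /* second 32-bit move */
   memcpy(dst, out, sizeof(out));
}
```

Because only integer moves are involved, the copy never needs the 64-bit float path, which on MTL is the unordered math pipe according to the commit message.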
Paulo Zanoni
951855c349 intel/compiler: avoid (RegDist, SBID) on DF instructions on MTL
When we use this form there's no way to specify which pipe RegDist
refers to, so there are a few rules to figure this out, which is what
inferred_sync_pipe() implements. But for MTL there's no long pipe and
the documentation does not explicitly explain what should be the
inferred type for its long (DF) instructions - which are out-of-order,
by the way.  One way to interpret this is that such a case should be
avoided.  So add the extra check to entirely avoid this case.

Notice that this is not actually fixing any bug, since returning
TGL_PIPE_LONG (what we do today) will actually make these DF
instructions incompatible with every in-order instruction, so we'll
never opt to use the (RegDist, SBID) form anyway. But still, it's
better to have this case explicitly documented instead of having it
covered by a semi coincidence.

v2: use intel_device_info_is_mtl()  (Curro, Jordan)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
16b9f87104 intel/compiler: on MTL, DF instructions run in the math pipe
Adjust the scoreboard code to take that into account.

Fixes at least:
  - dEQP-VK.glsl.builtin.precision_double.refract.compute.vec3
  - dEQP-VK.glsl.builtin.precision_double.matrixcompmult.compute.mat4

v2: use intel_device_info_is_mtl()  (Curro, Jordan)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Francisco Jerez
051887fbf3 intel/fs: Make the result of is_unordered() dependent on devinfo.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Kenneth Graunke
8c2448d4e6 intel/compiler: Delete sampler key handling for planar format stuff
i965 used these, but Gallium drivers do this lowering via a separate
nir_lower_tex call from st/mesa.  Vulkan drivers don't use these at all.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
88918baf5c intel/compiler: Delete key->msaa_16
None of the drivers have used this since we dropped i965, and BLORP
no longer uses it as of the previous commit.  We can also drop the
former compressed_multisample_tex_mask (now padding) field so that
things remain 64-bit aligned.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Kenneth Graunke
584e18863e intel: Drop compressed_multisample_layout_mask from the compiler keys
The compiler looks at this key field to determine whether to perform
an MCS fetch for a txf_ms or samples_identical texture message, if a
nir_tex_src_ms_mcs_intel source wasn't provided.  If it isn't set,
it instead uses constant 0 (nothing is compressed).

All of the drivers (iris, crocus, anv, hasvk) unconditionally set this
to ~0 because we don't want to pay for costly shader recompiles (which
can cause nasty stuttering).  Most textures are compressed anyway, and
the hardware ignores the l2dms MCS parameter if MCS is disabled.

The only user was BLORP, which sets the key field based on whether the
texture's aux usage has MCS.  But if it has MCS, it also does the MCS
fetch itself and supplies it directly.  Otherwise, it relies on the
compiler to fill in the 0 value.  But it could easily just provide the
0 value itself in that case and not rely on the compiler at all.

With that fixed, we can just drop the key fields entirely.  We leave
them as padding for now to avoid repacking structures; we won't need
to after the next commits anyway.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Lionel Landwerlin
b4b4294a78 intel/fs: add a saturation propagation test
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20206>
2022-12-09 00:39:05 +00:00