Georg Lehmann
05ca6e2478
amd/common: set COMPUTE_STATIC_THREAD_MGMT_SE2-3 correctly on gfx10-11
...
There is a hole between SE1 and SE2 occupied by COMPUTE_TMPRING_SIZE.
Fixes: 3c8b48e310 ("ac,radeonsi: add a function to initialize compute preambles")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29622>
2024-06-08 19:18:53 +00:00
Karol Herbst
5d013da038
rusticl/memory: copies might overlap for host ptrs
...
We can't really guarantee there is no overlap, because applications might
pass in arbitrary host pointers.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29604>
2024-06-08 17:03:31 +00:00
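The overlap concern in the commit above has a direct analogue in C: memcpy has undefined behavior when source and destination overlap, while memmove is defined for that case. A minimal sketch, assuming a hypothetical helper named copy_host_region (it is not rusticl code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not rusticl code: copy `len` bytes between two
 * caller-supplied host pointers that may point into the same allocation. */
static void copy_host_region(void *dst, const void *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    /* memmove is defined for overlapping ranges; memcpy is not. */
    memmove(dst, src, len);
}
```

Because nothing is known about where application pointers come from, the overlap-safe call is the conservative choice.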
Karol Herbst
e522c91d5c
rusticl/spirv: do not pass a NULL pointer to slice::from_raw_parts
...
Fixes: e8de580998 ("rusticl/kernel: basic implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29604>
2024-06-08 17:03:31 +00:00
Kenneth Graunke
3da444b79e
intel/brw: Refactor code to commute immediates into legal positions
...
This will let us reuse this in a new pass shortly.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:12 -07:00
Kenneth Graunke
d45da713e7
intel/brw: Refactor try_constant_propagate()
...
This will let us reuse the bulk of this code in a new copy propagation
pass without replicating it. We retain a wrapper function for dealing
with ACP entries, which the new pass won't have.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:10 -07:00
Kenneth Graunke
85aa6f80af
intel/brw: Drop BRW_OPCODE_IF from try_constant_propagate
...
This was for Sandybridge's IF with embedded comparison, which only
existed for a single generation of hardware. Since the compiler fork,
we no longer support Sandybridge here.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:08 -07:00
Kenneth Graunke
7019bc4469
intel/brw: Drop compiler parameter from try_constant_propagate()
...
This is unused.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:06 -07:00
Kenneth Graunke
43ab997951
intel/brw: Update instructions_match() to compare more fields
...
We were missing the following "newer" fields:
- ex_desc
- predicate_trivial
- sdepth
- rcount
- writes_accumulator
- no_dd_clear
- no_dd_check
- check_tdr
- send_is_volatile
- send_ex_desc_scratch
- send_ex_bso
- last_rt
- keep_payload_trailing_zeroes
- has_packed_lod_ai_src
We can actually just check ex_desc and the new "bits" union to handle
most of them with fewer checks.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:03 -07:00
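The idea of checking one "bits" word instead of many individual flags can be sketched as follows. The union and its field names are illustrative assumptions and do not mirror the real instruction layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical instruction flags; the field names are illustrative only. */
union inst_flags {
    struct {
        uint32_t saturate : 1;
        uint32_t eot : 1;
        uint32_t last_rt : 1;
        uint32_t pad : 29;
    };
    uint32_t bits; /* every flag above, viewed as one word */
};

static_assert(sizeof(union inst_flags) == sizeof(uint32_t),
              "all flags must fit in the bits word");

static bool flags_match(union inst_flags a, union inst_flags b)
{
    /* One comparison covers every flag declared in the struct above. */
    return a.bits == b.bits;
}
```

A flag added to the struct later is compared automatically, which is how a field-by-field comparison can fall behind.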
Kenneth Graunke
061da9f748
intel/brw: Make brw_reg::bits publicly accessible from fs_reg
...
I want to be able to hash an fs_reg, including all the brw_reg fields.
It's easiest to do this if I can use the "bits" union field that
incorporates many of the other ones.
We also move the using declaration for "nr" down because that field was
moved to the second section a while back.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:19:01 -07:00
Kenneth Graunke
b4a595204b
intel/brw: Add a idom_tree::dominates(a, b) helper.
...
Simpler to use than the existing methods.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:56 -07:00
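A dominance query over an immediate-dominator tree can be sketched like this. The array layout (idom[i] holds the immediate dominator of block i, and the entry block is its own immediate dominator) is an assumption for illustration, not the real idom_tree representation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if block `a` dominates block `b`.
 * Assumed layout: idom[i] is the immediate dominator of block i, and the
 * entry block is its own immediate dominator. */
static bool dominates(const int *idom, int a, int b)
{
    assert(idom != NULL && a >= 0 && b >= 0);
    /* Walk up from b; a dominates b if a is met on the way to the root. */
    while (b != a) {
        if (idom[b] == b)
            return false; /* reached the entry block without meeting a */
        b = idom[b];
    }
    return true;
}
```

Every block dominates itself under this definition, so dominates(idom, x, x) returns true without walking the tree.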
Kenneth Graunke
e2d9ff8004
intel/brw: Handle scratch address swizzling of constants
...
Pass in the nir_src and check if it's constant, handling it via CPU-side
arithmetic instead of emitting instructions. While we can constant fold
these via our optimization passes, we have to do opt_algebraic to fold
the binary operation with constant sources into a MOV of an immediate,
then opt_copy_propagation to put it in the next expression, and so on,
until the entire expression is folded. This can take several iterations
of the optimization loop, which is inefficient.
For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch
intrinsics with constant sources, and this patch removes a number of
optimization passes according to INTEL_DEBUG=optimizer.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:54 -07:00
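The approach of folding constants while emitting, instead of leaving them for later optimization iterations, can be sketched as below. The operand and builder types are hypothetical, and the multiply by a stride is only a stand-in for the actual swizzle arithmetic, which the commit message does not spell out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical operand: either a known constant or a virtual register. */
struct operand {
    bool is_const;
    uint32_t value; /* constant value, or register number */
};

/* Hypothetical builder that only counts what it would emit. */
struct builder {
    unsigned num_insts; /* instructions emitted so far */
    uint32_t next_reg;  /* next free virtual register */
};

/* Scale an address by a constant stride. A constant source is folded on
 * the CPU; otherwise one instruction is counted as emitted. */
static struct operand scale_address(struct builder *b, struct operand addr,
                                    uint32_t stride)
{
    assert(b != NULL);
    if (addr.is_const)
        return (struct operand){ .is_const = true,
                                 .value = addr.value * stride };

    b->num_insts++;
    return (struct operand){ .is_const = false, .value = b->next_reg++ };
}
```

With a constant source no instruction is emitted at all, so there is nothing left for opt_algebraic and opt_copy_propagation to clean up afterwards.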
Kenneth Graunke
07745752d6
intel/brw: Skip fs_nir_setup_outputs for compute shaders
...
There aren't any outputs, so there's no point to doing this work.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:54 -07:00
Kenneth Graunke
fa1564fb87
intel/brw: Recreate GS output registers after EmitVertex
...
Geometry shaders write outputs multiple times, with EmitVertex()
between them. The value of output variables becomes undefined after
calling EmitVertex(), so we don't need to preserve those. This lets
us recreate new registers after each EmitVertex(), assuming we aren't
in control flow, allowing them to have separate live ranges. It also
means that those registers are more likely to be written once, rather
than having multiple writes, which can make optimization easier.
This is pretty much a total hack, but it's helpful.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
2024-06-08 02:18:51 -07:00
Eric Engestrom
cb30b266ca
ci/deqp: uprev gl & gles cts
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29602>
2024-06-08 08:19:47 +00:00
Eric Engestrom
c02329ded1
ci: set a common B2C_JOB_SUCCESS_REGEX with the message that's printed for all jobs
...
Simpler code, and more reliable against serial corruption because that
message is printed 4 times (vs only once for the other ones).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29608>
2024-06-08 07:16:27 +00:00
Marek Olšák
dc113c418d
ac/nir: import the dispatch logic for the universal compute clear/blit shader
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
6b15e45908
ac/nir: import the universal compute clear/blit shader
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
1becc6953c
ac/nir: import the MSAA resolving pixel shader from radeonsi
...
It has a lot of options for efficiency.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
f96bbb64d6
radeonsi: add decision code to select when to use compute blit for performance
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
3424e16ece
radeonsi: add decision code to select when to use CB_RESOLVE for performance
...
The answer is "almost never".
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
c5641387f3
radeonsi: add a new blit microbenchmark
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
0c545e2fca
radeonsi: add fail_if_slow parameter into si_msaa_resolve_blit_via_CB
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
77d81fb8b0
radeonsi: add a custom MSAA resolving pixel shader
...
This is faster for 8 samples because it forms a VMEM clause, unlike
the default shader.
It also uses 16-bit types in the shader when possible and averages fewer
components if the format has fewer than 4.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
21e90d9c6e
radeonsi: clear color buffers via compute for special tiling cases
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
2a0b9839ca
radeonsi: add use_aco into CS blit shader key
...
It will be set in a future commit.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
fe7a4ed708
radeonsi: use shader_info::use_aco_amd to determine whether to use ACO
...
It's set by si_nir_scan_shader, so we need to use it after that.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
c83225cd0a
radeonsi: print the compute shader blit key for AMD_DEBUG
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
d62ad0da5f
radeonsi: use MIMG A16 (16-bit image coordinates) in compute blits
...
This reduces VGPR usage for MSAA blits and blitting multiple pixels per
lane.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
d6c96024a8
radeonsi: extend NIR compute helpers to allow returning 16-bit results
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
5b3e1a0532
radeonsi: change the compute blit to clear/blit multiple pixels per lane
...
The target is 8-16B per lane regardless of the format and number of
samples. This is needed to fully utilize the memory bandwidth instead
of only a small fraction of it. These are optimal numbers identified by
benchmarking.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
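How a bytes-per-lane target translates into a pixel count can be sketched as follows. The heuristic here (the largest power-of-two pixel count whose payload fits in 16 bytes) is an assumption for illustration. The commit message states only the 8-16B target, not the rule radeonsi uses to reach it:

```c
#include <assert.h>

/* Hypothetical heuristic: the largest power-of-two pixel count whose
 * payload (all samples included) fits in 16 bytes per lane. */
static unsigned pixels_per_lane(unsigned bytes_per_pixel, unsigned num_samples)
{
    assert(bytes_per_pixel > 0 && num_samples > 0);
    unsigned bytes_all_samples = bytes_per_pixel * num_samples;
    unsigned count = 1;
    while (count * 2 * bytes_all_samples <= 16)
        count *= 2;
    return count;
}
```

Under this assumption a 4-byte single-sample format gets 4 pixels per lane (16B), while a pixel that already exceeds 16B across its samples stays at 1 pixel per lane.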
Marek Olšák
d4c066abaf
radeonsi: adds flags parameter into si_compute_blit to replace fail_if_slow
...
So that we can also specify sync flags.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
30af861bff
radeonsi: restructure (rewrite) the compute blit shader
...
This merges the separate MSAA, downsampling, upsampling, and non-MSAA blocks.
It's not meant to change behavior, but some changes are necessary:
- disallow 16 samples
- loads only load the number of components that we need
- optimization barriers are placed optimally and include the sample index
  in the same vector as the coordinates, so that LLVM is forced to form VMEM
  clauses for loads and stores
- the shader queries the descriptor for the dst image manually and passes
  it to the image store instead of the image variable (this is needed to get
  latency hiding for scalar loads in the presence of optimization barriers)
This is a prerequisite for blitting multiple pixels per lane.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
d2ce5fc07a
radeonsi: split xy_clamp_to_edge to separate X and Y flags for the compute blit
...
This generates less shader code if only one of the axes needs clamping.
Use util_is_box_out_of_bounds instead of doing it manually.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
7ee936bf65
radeonsi: convert the compute blit shader hash table to u64 keys
...
32 bits is not enough anymore. We'll add more.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
40bcb588dd
radeonsi: remove the old si_compute_copy_image
...
It's replaced by the compute blit.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák
b0c0cca3a7
radeonsi: switch the old compute image copy to the new one using the blit
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
f3a59fe216
radeonsi: add a new version of si_compute_copy_image using the compute blit
...
It's faster and handles more stuff.
This is mostly the same code as the old version, but it calls
si_compute_blit at the end.
A later commit will remove the old version, so that there is no code
duplication.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
b7389615c6
radeonsi: rename si_compute_copy_image -> si_compute_copy_image_old
...
It will be replaced in several stages.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
8b030ac588
radeonsi: rename si_compute_blit "testing" parameter to "fail_if_slow"
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
a4602395d2
radeonsi: switch compute image clears to the compute blit shader
...
The compute blit shader is faster and handles more stuff.
This removes the old clear_render_target shader.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
9915289bdf
radeonsi: extend the compute blit to do image clears as well
...
The compute blit is faster and handles more stuff than
the clear_render_target shader. We can just pass a clear value to it
to replace the source image.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
e41887c6a4
radeonsi: cosmetic and robustness changes for the compute blit
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
0c5d727a5e
radeonsi: document better how X/Y flipping in the compute blit works
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
bb86366fee
radeonsi/gfx11: enable MSAA image stores in the compute blit
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
5897dde3f7
radeonsi: don't fail due to DCC when using the compute blit on compute queues
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
fcd9f0069f
radeonsi: don't use si_can_use_compute_blit in the compute blit
...
It makes supporting compute queues on all chips more complicated.
Other uses of si_can_use_compute_blit will be removed, so the function
will be removed too.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
1b924bad5e
radeonsi: reject unsupported parameters as the first thing in the compute blit
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
993c30af06
radeonsi: fix sample0_only for the compute blit
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák
0ca93e8090
radeonsi: optimize unaligned compute blits
...
If a blit starts on a coordinate that is not at the beginning of a tile
(e.g. 8x8), launch extra threads before 0,0,0 to make all following blocks
start at the beginning of such tiles. This makes such blits faster.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
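The alignment trick described above can be sketched for a single axis. The struct and function names are hypothetical, and the real dispatch works on three axes with hardware-specific tile sizes:

```c
#include <assert.h>

/* Hypothetical one-axis dispatch description. */
struct dispatch_1d {
    unsigned first; /* first coordinate covered by the launched threads */
    unsigned count; /* number of threads launched along this axis */
    unsigned skip;  /* leading threads that must not write anything */
};

/* Move the start of the dispatch back to the previous tile boundary so
 * that every following block begins at the start of a tile. */
static struct dispatch_1d align_dispatch(unsigned start, unsigned size,
                                         unsigned tile)
{
    assert(tile > 0 && (tile & (tile - 1)) == 0); /* power of two */
    unsigned skip = start & (tile - 1);
    return (struct dispatch_1d){
        .first = start - skip,
        .count = size + skip,
        .skip = skip,
    };
}
```

For a blit starting at x = 13 with an 8-wide tile, this launches 5 extra threads that cover 8..12 and do nothing. That is the cost paid for tile-aligned blocks everywhere else.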
Marek Olšák
2423c5ad2f
radeonsi: use MIMG D16 (16-bit data) for image instructions in compute blits
...
This reduces VGPR usage.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00