Commit Graph

184483 Commits

Author SHA1 Message Date
Lionel Landwerlin
a26c7b0b03 intel/ds: new tracepoints for generated commands
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
472f49ef43 genxml: remove NDEBUG_UNUSED
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
41b2ed65e2 genxml: generate opencl packing headers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
2a0328ba8b genxml: enable opencl code generation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
e6b5196079 intel-clc: print text input
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
4fd7495c69 intel/clc: add ability to output NIR
This will be used to generate a serialized NIR of functions for
internal shaders.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
2bae1b6b66 intel-clc: move ISA generation to its own function
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
2a1ff08376 intel/compiler: make default NIR compiler options visible
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:45 +00:00
Lionel Landwerlin
012489e55c meson: add a new option to enable intel-clc without building RT shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lionel Landwerlin
c53a4711cb anv: fix incorrect flushing on shader query copy
When doing query result copies in 3D mode, we're flushing the render
target cache, but the shader writes go through the dataport.

Fixes flakes/fails in piglit with shader query copies forced with Zink :

  $ query_copy_with_shader_threshold=0 ./bin/arb_query_buffer_object-coherency -auto -fbo

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b3b12c2c27 ("anv: enable CmdCopyQueryPoolResults to use shader for copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lionel Landwerlin
2437556d83 intel/fs: rerun divergence prior to lowering non-uniform interpolate at sample
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 74a40cc4b6 ("intel/fs: move lower of non-uniform at_sample barycentric to NIR")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lionel Landwerlin
8f5a7f57df intel/fs: indent lowering code to make it more readable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lionel Landwerlin
c517088cf1 anv: factor out post submit queue debug code
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lionel Landwerlin
67f3fa896e intel/dev: fix missing dependency on generated packing heaers
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
2024-02-13 00:06:44 +00:00
Lepton Wu
04d26ceb0a llvmpipe: Set "+64bit" for X86_64
Without this, on some "buggy" qemu cpu setup, LLVM could crash
if LLVM detects the wrong CPU type.

Fixes: f92cadccc6 ("llvmpipe: Always using util_get_cpu_caps to get cpu caps for llvm on x86")

Signed-off-by: Lepton Wu <lepton@chromium.org>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27539>
2024-02-12 22:43:46 +00:00
Danylo Piliaiev
5dd5d4c4b5 tu: Exclude more a7xx regs from stomping
Stomping these regs even for a short time leads to crashes.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
e4631bee61 freedreno/devices: Update magic regs for a7xx
These regs are written by blob, for some of them blob could
write non-zero values. So executing Turnip after blob without
writing these regs could lead to nasty GPU crashes.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
eb1e71e707 freedreno,tu: Move varying interp and varying repl modes to xml
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
78c843230c tu/a750: Consider vertex attr buff in gmem allocation
A750 added a new optimization - placement of vertex attributes
into GMEM, so part of GMEM is carved out for it and needs to
be considered during GMEM allocations.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Mark Collins
5266815ca9 tu/a7xx: Update CCU layout logic for A7XX
A7XX introduces some changes into the CCU such as having different
amounts of memory per CCU for depth and color and dividing up CCU
control into two registers A7XX_RB_CCU_CNTL and A7XX_RB_CCU_CNTL2
where CNTL2 no longer requires a complete flush to be updated, we
currently don't take advantage of this as any CCU updates set both
registers but it's a potential optimization we can add in the future.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
98d6d93a82 turnip,ir3/a750: Implement inline uniforms via ldg.k
Inline consts suffer the same issue as driver params, so they also
should be preloaded via preamble. There is special instruction to
load from global memory into consts.

Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott
6a744ddebc ir3: Initial support for pushing globals with ldg.k
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott
513fa1873c ir3/a7xx: Fix load_global_ir3 with immediate offset
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott
45c71803f9 tu: Add more info to ldg inline uniform path
This will let us push the ldg into the preamble.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
b87b8fdf73 tu: Use SS6_INDIRECT for VS params
a750 has SS6_DIRECT path broken, we should either use UBO lowering
or SS6_INDIRECT path.

It is implemented as INDIRECT load even on a750+ because with UBO
lowering it would be tricky to get const offset for to use in multidraw,
also we would need to ensure the offset is not 0.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
76e417ca59 turnip,ir3/a750: Implement consts loading via preamble
A750 expects driver params loaded through the preamble, old path
does work but has issues when the same LOAD_STATE is used between
several draw calls (it seems that LOAD_STATE is executed only for
the first draw call).

To solve this we now lower driver params to UBOs and let NIR deal with
them.

Notes:
- VS params are loaded via old path since blob do the same and there
  are no issues observed.
- FDM is not supported at the moment.
- For now driver params data is emitted via CP_NOP because it's tricky
  to allocate space for the data. (It is emitted when we are already in
  sub_cs)

Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
7429ca3115 tu: Use SS6_INDIRECT consts upload path for 3d blits
3d blits used DIRECT consts upload path, which doesn't work
properly on a750+, however uploading them via SS6_INDIRECT
seem to be working.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
30597970a5 tu/a7xx: Do not preload shaders, HW does it by default
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
ac75edb8c4 tu/a7xx: Correctly set A7XX_HLSQ_UNKNOWN_A9AE.SYSVAL_REGS_COUNT
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
bc6b847017 ir3: Add ldg.k instruction
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev
ad52f92cb8 tu: Define and set to zero all SP_*_VGPR_CONFIG regs
SP_FS_VGPR_CONFIG was found to be correlated with blob using avgs/uvgs.
Other SP_*_VGPR_CONFIG where undefined per-stage regs and it was tested
via rddecompiler that they "fix" hangs in respective shader stage,
when such stage uses the following instructions pattern:

  avgs.s.1.tex.0
  (ss) avgs.e;
  uvgs.s.tex.0;
  uvgs.e

The exact meaning of SP_*_VGPR_CONFIG is to be investigated.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:12 +00:00
Jonathan Marek
c166c5100b tu/a750: Basic a750 support
Could run vkcube.

Based on changes from Jonathan Marek <jonathan@marek.ca>

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:12 +00:00
Danylo Piliaiev
cdadead230 tu/a7xx: Make A7XX_RB_UNKNOWN_8E06 value configurable per-gen
It is some kind of DBG register which has different value
on different gens.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:12 +00:00
Sagar Ghuge
98b62434bd intel/compiler: Lower texture operation to combine LOD and AI
We have to push the lowering of texture operations a bit further in
pipeline since nir_lower_tex gets invoked twice and if there is no LOD
source present, nir_lower_tex adds that as a source. Once that's all
done we can easily combine the LOD and array index into a single 32-bit
value.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Sagar Ghuge
c984d6e2fc nir: Drop intel specific lowering code
In previous patches, we have moved the Intel specific lowering code in
brw_nir_lower_texture file. We can go ahead and drop the Intel specific
texture source too.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Sagar Ghuge
15129c7634 intel/compiler: Use nir_tex_src_backend1 to pack LOD and array index
Since this lowering is totally Intel specific, we don't have to
introduce the new texture source. We can use the nir_tex_src_backend1
source to pack LOD/LOD Bias and array index into 32 bit single value.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Sagar Ghuge
73a3257968 intel/compiler: Add texture operation lowering pass
This pass combines the LOD or LOD bias and array index into a single
32-bit value since Xe2+ sampler messages requires us to do that.

v2: (Alyssa)
- Use nir_iand_imm instead of nir_iand and nir_imm_int
- Use nir_trim_vector instead of nir_swizzle

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
2024-02-12 21:25:48 +00:00
Lionel Landwerlin
646a7c864d anv: re-introduce BO CCS allocations
On Gfx12.0, CCS allocations have to be allocated per image because the
format of the image goes into the AUX-TT PTEs. The effect on memory
allocations is limited since the main surface granularity in the
AUX-TT PTE is 64KB.

On Gfx12.5, the granularity of the AUX-TT PTE is 1MB. This creates a
lot of waste in the application memory allocations. Fortunately the HW
doesn't care about the format put into the PTEs anymore. So it becomes
possible to have 2 images share the same PTE.

To implement this we bring back an earlier version of AUX-TT mappings
where we used to allocate additional CCS space at the end of the
VkDeviceMemory objects. On Gfx12.5, if the BO has additional CCS
space, we will now map the main surface to that space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
bd197c6bcf intel/aux_map: add helper to compute offset in aux data
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
c0889a127b intel/aux_map: add BSpec reference
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
da6484a8a4 anv: use address helper to compute address u64 value
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
7763e75eea anv: move ALLOC_HOST_CACHED_COHERENT as define
That way gdb can decode the other flags when looking at the variables.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
3f64ec141e isl: add a no-aux-align usage flag
This flag signals that the driver will be dealing with aux-tt
alignment requirements on its own.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Lionel Landwerlin
44515bb92c isl: printout sparse usage
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
2024-02-12 21:00:27 +00:00
Rhys Perry
926d9f1cef radv: support minmax filter for more formats
Support should be the same as AMDVLK, except for these formats:
- VK_FORMAT_R4G4_UNORM_PACK8
- VK_FORMAT_A4R4G4B4_UNORM_PACK16_EXT
- VK_FORMAT_A4B4G4R4_UNORM_PACK16_EXT
- VK_FORMAT_A1B5G5R5_UNORM_PACK16_KHR
- VK_FORMAT_A8_UNORM_KHR
- VK_FORMAT_X8_D24_UNORM_PACK32
- VK_FORMAT_D24_UNORM_S8_UINT
And the various emulated compressed formats.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27551>
2024-02-12 20:05:27 +00:00
Faith Ekstrand
05cf04ac97 nvk: Convert shader addresses to offsets in nvk_shader.c
Fixes: e162c2e78e ("nvk: Use VM_BIND for contiguous heaps instead of copying")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27565>
2024-02-12 18:47:07 +00:00
Faith Ekstrand
afd42f5951 nvk/heap: Rework over-allocation
Instead of making it part of every BO, just reserve a bit of space at
the end of the top buffer as part of setting up our vma_heap.  This
reduces our memory allocation by nvk_heap::overalloc per BO and means
that the over-allocation is taken into account when sparse binding heap
BOs in the contiguous case.

Fixes: e162c2e78e ("nvk: Use VM_BIND for contiguous heaps instead of copying")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27565>
2024-02-12 18:47:07 +00:00
Faith Ekstrand
728256e994 nvk/heap: Use nvk_heap_bo::addr instead of bo->offset
Fixes: e162c2e78e ("nvk: Use VM_BIND for contiguous heaps instead of copying")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27565>
2024-02-12 18:47:07 +00:00
Faith Ekstrand
83521dd486 nvk: Don't set CONSTANT_BUFFER_SELECTOR with a zero size
Kepler complains about this and it's unnecessary since we set
ENABLE_FALSE whenever we have a zero size anyway.

Fixes: 55413e33dc ("nvk: Disable all cbufs in nvk_queue_init_context_draw_state()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27565>
2024-02-12 18:47:07 +00:00
Sviatoslav Peleshko
28ad2f488a anv: Store host-located copy of NULL surface state for faster memcpy
Real null_surface_state is located in the GPU memory, so copying from
there will be slow for dGPUs.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10594
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27577>
2024-02-12 17:48:15 +00:00