Indirect draw calls upload VS params themselves, if indirect draw call
has zero instances GPU seem to still try to upload consts, however
with zer instances it doesn't apply draw state. Without the draw state
the the HLSQ_VS_CNTL values is stale, so less constants may be specified
than draw call expects. It is found that if CP_DRAW_INDIRECT_MULTI_1_DST_OFF
is less than 0x3f - GPU is happy even if the constlen is less than that.
As a workaround we allocate driver params first and ensure that VS
constlen always has the minimum size which is enough to upload driver
params.
Fixes one of the GPU hangs in "Disney Epic Mickey: Rebrushed" on a750.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
With all consts going through generic allocations it's now possible
to call ir3_setup_const_state once, and have lowerings that dynamically
lower things to consts just to update the max consts being used.
The only exception for now are immediates, since they eat up the space
that was left and allocated much later.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
The order of allocation was backed into ir3_setup_const_state and
some other parts of ir3, which is rather brittle.
And don't assume offsets for consts in other part of code, their order
and offset calculation is not guaranteed.
This also potentially fixes indirect UBO effect on constlen size.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32140>
In the current API, precomp implicitly assumes full barriers both before & after
every dispatch. That's not good for performance. However, dropping the barriers
and requiring user to explicitly call barrier functions before/after would have
bad ergonomics.
So, we add a new parameter to the standard MESA_DISPATCH_PRECOMP signature
representing the barriers required around the dispatch. As usual, the actual
type & semantic is left to drivers to define what makes sense for their
hardware. We just reserve the place for it. (I think most drivers will want
bitflags here, but I don't think the actual flags are worth. If a driver wanted
to use a struct here, that would work too.)
Since the asahi stack doesn't do anything clever with barriers yet, we
mechnically add an AGX_BARRIER_ALL barrier to all precomp users in-tree. We can
optimize that later, this just gets the flag-day change in with no functional
change.
For JM panfrost, this will provide a convenient place to stash both their "job
barrier" bit and their "suppress prefetch" bit (which is really a sort of
barrier / cache flush, if you think about it).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32980>
When cross-building for Android meson fails to find supported compiler
arguments for the C++ cross-compiler, e.g.:
```
Compiler for C++ supports arguments -Wno-array-bounds: NO (cached)
Compiler for C++ supports arguments -Wno-overflow: NO
Compiler for C++ supports arguments -Wno-c++11-narrowing: NO (cached)
Compiler for C++ supports arguments -Wno-vla-cxx-extension: NO (cached)
```
This is due to an **unrelated** and more generic compilation failure
when testing for the supported arguments, e.g.:
```
Command line: `/tmp/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android34-clang++ -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -static-libstdc++ /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpv6_hke9l/testfile.cpp -o /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpv6_hke9l/output.obj -c -Wno-error=c99-designator -Wno-error=unused-variable -Wno-error=unused-but-set-variable -Wno-error=self-assign -D_FILE_OFFSET_BITS=64 -O0 -fpermissive -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -Werror=ignored-optimization-argument -Wvla-cxx-extension -Wno-vla-cxx-extension` -> 1
stderr:
clang++: error: argument unused during compilation: '-static-libstdc++' [-Werror,-Wunused-command-line-argument]
-----------
Compiler for C++ supports arguments -Wno-vla-cxx-extension: NO
```
The issue is caused by how the cross compiler is set up by
.gitlab-ci/container/create-android-cross-file.sh
Allow the cross compiler to still start even when the
`-Werror,-Wunused-command-line-argument` error occurs in order to be
able to actually detect other arguments:
```
Command line: `/tmp/android-ndk-r27c/toolchains/llvm/prebuilt/linux-x86_64/bin/aarch64-linux-android34-clang++ -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables --start-no-unused-arguments -static-libstdc++ --end-no-unused-arguments /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpm6eolxed/testfile.cpp -o /home/ao2/Collabora/GOO0042/mesa/_build/meson-private/tmpm6eolxed/output.obj -c -Wno-error=c99-designator -Wno-error=unused-variable -Wno-error=unused-but-set-variable -Wno-error=self-assign -D_FILE_OFFSET_BITS=64 -O0 -fpermissive -Werror=implicit-function-declaration -Werror=unknown-warning-option -Werror=unused-command-line-argument -Werror=ignored-optimization-argument -Wvla-cxx-extension -Wno-vla-cxx-extension` -> 0
Compiler for C++ supports arguments -Wno-vla-cxx-extension: YES
```
This makes argument detection work again, e.g:
```
Compiler for C++ supports arguments -Wno-array-bounds: YES (cached)
Compiler for C++ supports arguments -Wno-overflow: YES
Compiler for C++ supports arguments -Wno-c++11-narrowing: YES (cached)
Compiler for C++ supports arguments -Wno-vla-cxx-extension: YES (cached)
```
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12441
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Valentine Burley <valentine.burley@collabora.com>
Reviewed-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33013>
The Xe uAPI is designed to use bind queues such that binds without input
dependencies (sync objects) do not block on binds with input
dependencies.
For example:
- Bind A (sparse) is submitted with a list of input dependencies.
- Bind B (immediate) is subsequently submitted without a list of input
dependencies.
If Bind A and Bind B share a single bind queue, Bind B will not be
scheduled until Bind A completes. Using individual bind queues decouples
Bind A and Bind B, allowing Bind B to make immediate progress.
This change creates a separate bind queue for each ANV queue, enabling
support for sparse bindings that may have input dependencies.
v2:
- Bail on bind queue creation failure (Linoel)
- Only create bind queue if VK_QUEUE_SPARSE_BINDING_BIT is set (Jose)
v3:
- Add comment around submit->queue usage (Jose)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32873>
It's split into ac_nir_lower_ps_early ac_nir_lower_ps_late.
ac_nir_lower_ps_early doesn't generate any AMD specific intrinsics except
some system values and is mainly an optimization pass with some lowering.
The new change here is that it also eliminates output components not needed
by spi_shader_col_format.
ac_nir_lower_ps_late lowers output stores to exports and does the bc_optimize
thing.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
Add support for getting time elapsed values via glBeginQuery/glEndQuery.
When recording query start & end time, we ensure that all pending jobs have
been completed by using v3d cpu_queue & the multisync extension.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support for getting timestamp values via
glGet(GL_TIMESTAMP) and glQueryCounter(GL_TIMESTAMP). For the case of
glQueryCounter, we make use of v3d cpu jobs via
DRM_IOCTL_V3D_SUBMIT_CPU and DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
The spec says
vkCmdCopyQueryPoolResults is considered to be a transfer operation,
and its writes to buffer memory must be synchronized using
VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before
using the results.
While STORE_MULTIPLE is not exactly VK_PIPELINE_STAGE_TRANSFER_BIT /
VK_ACCESS_TRANSFER_WRITE_BIT, we can still rely on user barriers to do
the right thing (e.g., flush caches for host access).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>