Zink creates all libaries with CREATE_RETAIN_LINK_TIME_OPTIMIZATION,
then it first creates unoptimized pipelines and it enqueues optimized
pipelines in the background with CREATE_LINK_TIME_OPTIMIZATION.
If a pipeline is linked without CREATE_LINK_TIME_OPTIMIZATION, the
driver should import binaries instead of retained NIR shaders. This
was broken because RADV wasn't compiling binaries at all in presence
of CREATE_RETAIN_LINK_TIME_OPTIMIZATIONS. Now, it always compiles
binaries in libraries but can also retain NIR if requested.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8150
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21008>
Our hardware requires that we write to URB using full vec4s at aligned
addresses. It gives us an ability to mask-off dwords within vec4 we don't
want to write, but we have to know their positions at compile time.
Let's assume that:
- V represents one dword we want to write
- ? is an unitinitialized value
- "|" is a vec4 boundary.
When we want to write 2-dword value at offset 0 we generate 1 write message:
| V1 V2 ? ? |
with mask:
| 1 1 0 0 |
When we want to write 4-dword value at offset 2 we generate 2 write messages:
| ? ? V1 V2 | V3 V4 ? ? |
with mask:
| 0 0 1 1 | 1 1 0 0 |
However if we don't know the offset within vec4 at *compile time* we
currently generate 4 write messages:
| V1 V1 V1 V1 |
| 0 0 1 0 |
| V2 V2 V2 V2 |
| 0 0 0 1 |
| V3 V3 V3 V3 |
| 1 0 0 0 |
| V4 V4 V4 V4 |
| 0 1 0 0 |
where masks are determined at *run time*.
This is quite wasteful and slow.
However, if we could determine the offset modulo 4 statically at compile time,
we could generate only 1 or 2 write messages (1 if modulo is 0) instead of 4.
This is what this patch does: it analyzes the addressing expression for
modulo 4 value and if it can determine it at compile time, we generate
1 or 2 writes, and if it can't we fallback to the old 4 writes method.
In mesh shader, the value of offset modulo 4 should be known for all outputs,
with an exception of primitive indices.
The modulo value should be known because of MUE layout restrictions, which
require that user per-primitive and per-vertex data start at address aligned
to 8 dwords and we should statically always know the offset from this base.
There can be some cases where the offset from the base is more dynamic
(e.g. indirect array access inside a per-vertex value), so we always do
the analysis.
Primitive indices are an exception, because they form vec3s (for triangles),
which means that the offset will not be easy to analyse.
When U888X index format lands, primitive indices will use only one dword
per triangle, which means that we'll always write them using one message.
Task shaders don't have any predetermined structure of output memory, so
always do the analysis.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20050>
Coded buffer will only be read on CPU, setting
PIPE_USAGE_STAGING instead of PIPE_USAGE_STREAM
makes the CPU reads much faster.
On 6700XT this reduces the CPU copy by around
3ms to 0.3 ms on average while under high GPU
load - real-time game streaming.
Signed-off-by: David Rosca <nowrep@gmail.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20989>
We have a common pain point with fractional CTS coverage, where the test
list changes on a CTS uprev or board load rebalancing, so you get a
different subset of tests run. The dev updates the list of xfails (a
pain), but also we end up with xfails left behind that aren't tested any
more and don't reflect reality.
For some drivers (tu, freedreno, zink-anv) we have manual jobs available
for curious devs to look at the current state of the CTS, but without
anyone having to keep the full xfails updated during uprevs, you don't
necessarily know what to do with the results you get on your MR.
So, let's introduce nightly testing for the tests that aren't guaranteed
green by Marge. With that, Someone (possibly me? sigh) can review the
nightly results and push up updates for full-run xfails so everyone can be
on the same page other than a day or so of delay. We also have some hope
for automated tooling to do this thanks to what Collabora has been working
on for automated CI uprev MR generation.
Reviewed-by: Martin Roukala <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20950>
ac/surface puts the raw pip_bank_xor there, which needs the extra
shift for the actual tile_swizzle.
(I think long term we should refactor this in ac/surface but for
now lets fix like radeonsi to avoid race conditions.)
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20979>
If we hit the race condition of looking up an already imported BO that
is in the process of being destroyed, the handle will be GEM_CLOSE'd,
meaning that the handle that we just got from the kernel is probably not
valid. So in this case we should retry.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20961>
The RESOURCE_CREATE_BLOB ioctl can carry a ccmd payload, similarly to
EXECBUF. But we need to preserve the order of buffered execbuf cmds
which haven't been flushed to the guest kernel yet, rather than let the
CREATE_BLOB payload jump to the head of the queue. Otherwise, for ex,
the host could see the guest requesting an iova that has not yet been
(from it's perspective) released.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20961>
This introduces radv_compute_pipeline_compile() which is used for
compute and ray tracing pipelines. I think it's better than having a
single function for compiling everything, and that will allow us to do
more cleanups.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20943>
The Vulkan spec says:
"VK_PIPELINE_CREATE_FAIL_ON_PIPELINE_COMPILE_REQUIRED_BIT specifies
that pipeline creation will fail if a compile is required for
creation of a valid VkPipeline object; VK_PIPELINE_COMPILE_REQUIRED
will be returned by pipeline creation, and the VkPipeline will be
set to VK_NULL_HANDLE."
Given the implementation is expected to set the pipeline to
VK_NULL_HANDLE, it's unecessary to handle pipeline feedback.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20943>
"If this texture instruction has a nir_tex_src_texture_offset source,
then the texture index is given by texture_index + texture_offset."
This fixes the failures for:
spec@arb_arrays_of_arrays@execution@sampler@fs-nested-struct-arrays-nonconst-nested-array
spec@arb_gl_spirv@execution@uniform@sampler2d-nonconst-nested-array
Signed-off-by: Amber Amber <amber@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20954>
list_is_linked() isn't the right function to use in order to check if
the BO is on a cache bucket or the zombie list, as this checks if the
next pointer of the list isn't NULL. This is always the case with the
BO list item as it's always initialized, so the next pointer points to
the list head itself when the BO isn't on any list.
Use list_is_empty() to check if the BO is actually linked into one
of the deferred destroy lists.
Fixes: 1b1f8592c0 ("etnaviv: drm: properly handle reviving BOs via a lookup")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20940>
Fixes 3DMark Wild Lands. Otherwise, we'd end up loading a DXIL shader
that had invalid linkage with another shader in the pipeline. We can
only load a DXIL shader if it's being linked against the same before
and after as a previous compilation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20962>