If task redistribution is enabled, then some mesh shaders read
garbage from task payload.
It may be a hardware bug, or it may be our bug. Who knows :(
This change will probably negatively affect performance of task
shader-enabled workloads on multi-slice GPUs, because mesh shaders
will be executed only on the slice where task shader was spawned.
Fixes: ef04caea9b ("anv: Implement Mesh Shading pipeline")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16197>
(cherry picked from commit 4eaecd7965)
Export instructions allow burst writes, so it makes send to try
to allocate consecutive registers, but for ring writes we don't
schedule the outputs correctly to exploit this, so for now
don't mark these instructions as export to let the RA restart
picking colors.
When the scheduler starts to emit the ring writes in the right order
to allow for bust writes we might revisit this.
This fixes
spec@glsl-1.50@execution@variable-indexing@gs-output-array-vec4-index-wr
Fixes: 79ca456b48
r600/sfn: rewrite NIR backend
Related: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6975
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18212>
(cherry picked from commit fd71cd0b6a)
When linear rendering is used together with TS, the color tiles must be fully
contained in a single row of pixels. When wrapping around to the next row
TS gets confused and records wrong tile status information, leading to visual
corruption when the surface is resolved/decompressed.
The corruption can be fixed by increasing the stride alignment for linear
render targets, but that would break some existing use-cases, as some display
engines used together with Vivante GPUs currently don't support strides that
don't match the horizontal display resolution.
For now only enable linear PE rendering when the surface is properly aligned
already. This allows to use the optimization in a lot of common use-cases, but
falls back to the proven tiled rendering with subsequent resolve into linear
for the problematic cases.
CC: mesa-stable #22.2
Fixes: 53445284a4 ("etnaviv: add linear PE support")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18232>
(cherry picked from commit ea8fc9592c)
Conflicts:
src/gallium/drivers/etnaviv/etnaviv_surface.c
The decision whether to use fast clear aka TS currently checks for two
feature bits: FAST_CEAR and MC20. We check for MC20, as TS on MC1.0 bypasses
the memory offset and we don't have any way to fixup the GPU address to
account for that. It could be done with some support of the kernel driver,
but then GPUs with MC1.0 are very rare to find these days, so not sure if we
are ever going to bother with that.
Instead of checking two separate feature bits to determine if TS can be used,
mask out the FAST_CLEAR bit from the features when MC20 isn't present. This
way we only have to check for a single feature bit.
CC: mesa-stable #22.2
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Guido Günther <agx@sigxcpu.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18232>
(cherry picked from commit 09953d7b75)
Sync fd fence export implies a payload reset operation, and application
can immediately do another submission with the same fence after export.
Concurrent use of the same feedback slot is incorrect. Keeping a list of
feedback slots for sync_fd external fence is a bit over designed given
those fences are usually not checked or waited by the app, but will hand
off the ownership via sync fd to an external client.
Fixes: d7f2e6c8d0 ("venus: add fence feedback")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17975>
(cherry picked from commit 5457f4c0a4)
With the following SEND instruction :
send(1) nullUD nullUD g0UD 0x4200c504 a0.1<0>UD
This instruction although valid but somewhat nonsensical (SEND message
to write at offset contained in NULL register), triggers an error in
the validator.
The restriction is that we cannot have overlapping sources. The
validator not checking the type of register incorrectly thinks that
the null register (offset 0) is the same as g0.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17555>
(cherry picked from commit 3c6fa2703d)
tcs_out_lds_offsets is not sure to be 16 byte aligned, it's
calculated like this:
num_patches * patch_vertices * lshs_vertex_stride
num_patches and patch_vertices are not sure to be any value aligned,
lshs_vertex_stride is added one extra dword, so it's only 4 byte
aligned.
This may cause problem even before we switch to nir tess output
lower when write tess factor before read tail of input. But it's
more likely to cause problem after we switch to nir tess output
lower because the main body won't eliminate the low 4bit offset
but epilog will, so they use different offset to read/write tess
factor.
Fixes: 7598bfd768 ("radeonsi: replace llvm tcs output with nir lower pass")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7083
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18174>
(cherry picked from commit ff7c59672f)
For AMD we kinda have some modifiers with a max size ... (Which is
really a compositor/kms issue, but getting them to try kinda falls
into the unsolved "how to allocate/what pitch to use" bucket, so
we solve it on the allocating side)
Cc: mesa-stable
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18139>
(cherry picked from commit bb2a444324)
Even with dynamic rendering, you have to bind both aspects of the image
if the image contains both depth and stencil. One day, we may see this
restriction lifted but that will require deeper driver surgery into the
way we handle depth/stencil layouts.
Fixes: 42db590006 ("radv: convert the meta blit 2d path to dynamic rendering")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18084>
(cherry picked from commit 76b8b854a5)
With VK_FORMAT_B10G11R11_UFLOAT_PACK32 in particular, we're seeing
applications create image views with swizzle = R,G,B,0
But since the format has no alpha channel, the swizzle value for it
does not matter for the equivalence we're trying to verify.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a9edc268b9 ("anv: validate image view lowered storage formats for storage")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18081>
(cherry picked from commit 4ab38112f3)