Memory can be free before images it is bound to. When unmapping the
CCS range in the AUX-TT, we cannot rely on the anv_bo::offset field
because the anv_bo might have been freed.
Just save the mapping address/size and use those values at unmapping
time.
Fixes an assert on CI with :
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e519e06f4b ("anv: add missing alignment for AUX-TT mapping")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27304>
Even though CS2R can fetch a whole 64 bits at a time, that doesn't mean
it does so atomically. Instead, we need to loop, alternating high and
low until we fetch the same high value twice.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27303>
syncobjs provide the same features and allow to unify code
paths because we don't need to handle imported syncobj
separately.
This simplifies the code and doesn't seem to have any perf
impact.
Syncobjs are supported in amdgpu since kernel commit 660e855813f78
during 4.12 cycle but the minor version wasn't bumped so use
the next bump value asthe minimum supported version.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24724>
- no need to flush anything before as we're working on a clean
buffer (SI_OP_SKIP_CACHE_INV_BEFORE)
- L2 must be flushed after the job to avoid rendering artifacts.
Instead of setting it manually, use SI_OP_SYNC_AFTER +
SI_COHERENCY_NONE.
Fixes: 1a99f50c7f ("radeonsi: use a compute shader to convert unsupported indices format")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27095>
This fixes#9807 but I don't understand why.
Emitting cache flushes before VGT_PRIMITIVE_TYPE is what makes
the problem go away but changing the order in si_draw() is clearer.
The only cases where sctx->flags is modified in si_emit_draw_registers
is handled using si_emit_cache_flush_direct so we can move cache
flushing up without any addtional conditionals.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9807
Fixes: 1e4b539042 ("radeonsi: handle deferred cache flushes as a state (si_atom)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27095>
The runtime is turning GENERAL layouts into FEEDBACK_LOOP ones when it
detects feedback loops in a render pass. This is breaking drivers that
would like to use a different HW layout for those 2 layouts because if
the application inserts barrier in the render pass, the barriers the
driver sees are inconsistent.
This could lead to barrier of this type :
- GENERAL -> FEEDBACK_LOOP (runtime)
- GENERAL -> GENERAL (app)
- FEEDBACK_LOOP -> GENERAL (runtime)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23523>
NIR caching is useful for two use cases:
- Shader permutations involving reused VS or FS.
- GPL-like engine that compiles a separate (library) variant and an
optimized (monolithic) variant, e.g. DXVK.
By caching the result of radv_shader_spirv_to_nir, permutations hitting
the cache can have their compilation time reduced by 50% or more.
There are still open questions about the memory and storage footprint of
NIR caches, which is why this is gated behind a perftest flag. In
particular, Steam doesn't want to ship NIR cache since they are
unnecessary in presence of a full precompiled shader cache. In this
commit, the cache entries do not reside in memory and are immediately
written to the disk. Further design around how the caches are stored and
how to coordinate cache type with Steam etc. is left as future work.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26696>
Buffers that are not dedicated can also be used for CCS mapped images,
so they need to be aligned to the AUX-TT requirements.
GTK+ is running into such case where it creates an image with a CCS
modifier. When requesting the alignment through
vkGetImageMemoryRequirements() the 64KB/1MB alignment is returned, but
the binding fails with an assert because the VkDeviceMemory has not
been aligned to the AUX-TT requirement and we cannot disable CCS since
the modifier requires it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cdd3178fb ("anv: Meet CCS alignment reqs with dedicated allocs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10433
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27258>
The HW can't do comparison opcodes in the FS, so the old way was to
lower all to ADD and CMP in the backend. However, the comparison bool
result is used either in if (where we can encode the comparions in the
aluresults calculation) or is uses in another bcsel. Therefore, we now
end with two CMPs, one from the original bcsel and one from the lowered
comparison.
This patch fixes it by doing the comparison lowering early for some nice
shader-db gains. Similarly to vertex shaders, we need some special
passes for this, because we can't nir_lower_bool_to_float too early, so
we just manually lower the most common patterns in the main opt loop and
than clean up the rest later.
Shader-db RV530:
total instructions in shared programs: 130797 -> 130400 (-0.30%)
instructions in affected programs: 34591 -> 34194 (-1.15%)
helped: 203
HURT: 133
total presub in shared programs: 8175 -> 8220 (0.55%)
presub in affected programs: 1738 -> 1783 (2.59%)
helped: 62
HURT: 53
total omod in shared programs: 414 -> 412 (-0.48%)
omod in affected programs: 4 -> 2 (-50.00%)
helped: 2
HURT: 0
total temps in shared programs: 17570 -> 17566 (-0.02%)
temps in affected programs: 1122 -> 1118 (-0.36%)
helped: 41
HURT: 43
total consts in shared programs: 94362 -> 94359 (<.01%)
consts in affected programs: 381 -> 378 (-0.79%)
helped: 13
HURT: 10
total lits in shared programs: 2951 -> 2961 (0.34%)
lits in affected programs: 104 -> 114 (9.62%)
helped: 3
HURT: 9
total cycles in shared programs: 198965 -> 198744 (-0.11%)
cycles in affected programs: 55784 -> 55563 (-0.40%)
helped: 177
HURT: 144
LOST: 0
GAINED: 1
Shader-db RV380:
total instructions in shared programs: 84224 -> 84109 (-0.14%)
instructions in affected programs: 6039 -> 5924 (-1.90%)
helped: 106
HURT: 38
total presub in shared programs: 1401 -> 1372 (-2.07%)
presub in affected programs: 113 -> 84 (-25.66%)
helped: 27
HURT: 10
total temps in shared programs: 13231 -> 13224 (-0.05%)
temps in affected programs: 303 -> 296 (-2.31%)
helped: 22
HURT: 12
total consts in shared programs: 82484 -> 82505 (0.03%)
consts in affected programs: 271 -> 292 (7.75%)
helped: 4
HURT: 25
total cycles in shared programs: 132957 -> 132840 (-0.09%)
cycles in affected programs: 12696 -> 12579 (-0.92%)
helped: 101
HURT: 43
LOST: 0
GAINED: 1
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27089>
This is slightly different from the stock is_only_used_as_float since
that one will return false when called for uses of bcsel if any of the
uses actually uses uint, like vec or another bcsel. This version will
recursivelly go further in such cases and return false only if it sees
any instructions specifically neededing bools or ints.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27089>
When updating an AFBC-packed resource, we need to make sure it is
legalized before blitting the staging resource to it. We can't rely
on the blit to properly convert the resource as it will result in
blit recursion and a crash.
If the whole texture is updated however, there is no need to unpack
as the content can be discarded. Just create a new BO with the right
format.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 33b48a5585 ("panfrost: Add debug flag to force packing of AFBC textures on upload")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27208>
There might be a more efficient path when legalizing a resource if
we don't need to worry about its content. For example, it doesn't
make sense to copy the resource content when converting the modifier
if the resource content is discarded anyway.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 33b48a5585 ("panfrost: Add debug flag to force packing of AFBC textures on upload")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27208>