The entire point of resource shadowing is to avoid unnecessary flushing.
Flushing readers after shadowing is counterproductive. A refresher on
how resource shadowing is supposed to work:
First, we determine if it's beneficial to shadow resources. If so, we
create a new backing buffer object. We flush the current writer of the
resource, if there is one, so the current contents become known to the
CPU. If we are not discarding the original resource, we then copy the
existing contents of the buffer to the new shadow buffer on the CPU.
Finally, we swap the resource's backing buffer for our shadow. Any batch
that reads the resource will continue to read the old copy of the
resource, and any future draw calls will see the new copy with the
change implemented.
Where did we go wrong?
In 988d5aae74 ("panfrost: Flush resources when shadowing"), we started
flushing all readers. We didn't actually need to flush, we just needed
to avoid dangling references on the batches reading the old copy of the
resource. But that's easily enough avoided: just remove the references.
The batches still hold a reference to the underlying BO, which will be
freed at the right time regardless.
Originally motivated by glmark2 -bbuffer:update-method=subdata, which
has some pathological access paterns.
Firefox is a lot faster anecdotally (now scrolling at 60fps in firefox).
But what actually motivated this is an apitrace from Duckstation's GLES
renderer. With this patch, the in-game portion is improved 3fps to 21fps.
Closes: #4028
Fixes: 988d5aae74 ("panfrost: Flush resources when shadowing")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19361>
(cherry picked from commit 2d8f28df73)
If a synchronized transfer_map is going to overwrite an entire resource,
there's no need to memcpy in the original contents ahead-of-time. This
memcpy is particularly bad for large buffers where it's copying WC->WC,
although that could be mitigated with threaded_context's cpu_storage in
the future if needed.
Prevents a performance regression in glmark2's buffer scenes from the
next patch, hence the Cc.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19361>
(cherry picked from commit 0b26a9f773)
To match PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK, having it be higher in any
dimension is nonsensical and can confuse apps. Fixes tests in
KHR-GLES31.core.texture_buffer.* on Mali-T860.
Fixes: 9b19104a30 ("pan/mdg: Lower PIPE_COMPUTE_CAP_MAX_THREADS_PER_BLOCK on Midgard")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
(cherry picked from commit 027ee6c9e9)
Now we're back to a single XFB implementation for all gens. Fixes:
KHR-GLES31.core.draw_indirect.advanced-twoPasses-transformFeedback-arrays
KHR-GLES31.core.draw_indirect.advanced-twoPasses-transformFeedback-elements
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19238>
(cherry picked from commit 0955fe8fe2)
rc_adjust_channels is inteded for moving the swizzles to a new channels
when rewriting the writemask of an instruction. However for readers one
needs to keep the swizzles in the old channels but rather convert to the
new values, so use the proper helper rc_rewrite_swizzle.
With the new swizzle fixed, we should properly detect that it would be
invalid and thus we can select the proper register class to prevent the
writemask rewrite in the regalloc.
Documentation was added to rc_adjust_channels to make it more clear what
it actually does.
Fixes a bunch of dEQP tests.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7521
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19158>
(cherry picked from commit 1e9e561811)
Conflicts:
src/gallium/drivers/r300/ci/r300-r480-fails.txt
It's undefined behavior in C to read a union member if another member
has been written to more recently. Let's be more careful here!
Fixes: a9d2b86c2c ("zink: store the spirv_shader to the zink_shader struct for generated tcs")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19457>
(cherry picked from commit 090a111c5d)
6698753cdb switched our GS output stores to use MUBUF.
The stride doesn't matter for the ESGS descriptor (because idxen=false and
the index stride is 64), but this fixes it anyway.
This also changes ACO to use MUBUF store too, since MTBUF doesn't seem to
work correctly with an invalid data format in the descriptor.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 6698753cdb ("ac/llvm: don't use tbuffer_store as a fallback for swizzled stores")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18885>
(cherry picked from commit a71d068fd0)
Recently, a regression was reported where videos in Firefox had shifted/
glitched colors on certain Kepler hardware. This was bisected to
bf02bffe15, however, the issue already
existed but didn't hit users until TGSI was switched to NIR as default.
The issue was traced to a YUV-to-RGB fragment shader used by Firefox,
which uses three samplers for the Y/U/V components. The Y component was
handled correctly, but the U/V components were bogus, causing the issue.
After analysis, it appears the TXF/TXQ ops. should only handle the texture
(r) but not the sampler (s), see 63b850403c
and 346ce0b988.
Similarly, handleTXQ/handleTXF on nv50_ir_from_tgsi always sets s=0.
Only Kepler was affected because other hardware ignores s at codegen.
Always set s=0 on NIR for TXF/TXQ, to keep TGSI behavior and fix the
regression.
Thanks: Karol Herbst and M Henning for help diagnosing the issue.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7416
Cc: mesa-stable
Suggested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19453>
(cherry picked from commit 6014a642ae)
Documentation is worded in a confusing way, which may be understood that
we don't have to set this field to get good results.
MESH part of this commit improves performance of vk_meshlet_cadscene
by a factor of 2 on A380.
Fixes: ef04caea9b ("anv: Implement Mesh Shading pipeline")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19412>
(cherry picked from commit d1d2dee970)
From empirical tests (on a660) R8G8 with UBWC enabled requires 256b
alignment, otherwise there would be a GPU fault during blits.
Set alignment to 4096 for all UBWC images since that's what blob does
and this area is heavily undertested.
Fixes GPU fault in Borderlands 3 running through DXVK.
cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19298>
(cherry picked from commit 5eaca461a7)
In 7f6ecb8667 we added reference counting for descriptor set layouts,
however, we didn't realize that pools created without the flag
VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT don't free individual
descriptors and can only be reset or destroyed. Since we only drop
references when individual descriptor sets were destroyed, we would
leak set layouts referenced from descriptor sets allocated from these
pools.
Fix that by keeping a list of all allocated descriptor sets (no matter
whether VK_DESCRIPTOR_POOL_CREATE_FREE_DESCRIPTOR_SET_BIT is present or
not) and then traversing the list dropping the references on pool resets
and destroys.
Fixes: 7f6ecb8667 ('v3dv: add reference counting for descriptor set layouts')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19337>
(cherry picked from commit 07c7d846e5)
The flush_resources recorded in the context need to stay alive until
the context is flushed, at which point additional resolve operations
are done to those resources. While the backing BO is alive due to being
referenced in the cmdstream, the resource might already be destroyed
at this point.
Keep a reference to the resource to make sure it is still available at
context flush time.
Fixes: 7b9d8d1936 ("etnaviv: flush used render buffers on context flush when neccessary")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19280>
(cherry picked from commit 16e0702ec7)
The previous list of passes contained several errors: "constprop" does not
exist anymore and a trailing ',' is not allowed. This made LLVMRunPasses
fail, but the error is silently ignored. Thus none of the listed passes
ran at all.
https://reviews.llvm.org/D85159 suggests "InstSimplify really should be
used anywhere ConstProp is being used" so replace constprop with
instsimplify and remove the trailing comma.
By enabling pass logging with
LLVMPassBuilderOptionsSetDebugLogging(opts, true),
the difference is visible. Before:
Running pass: AlwaysInlinerPass on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: ProfileSummaryAnalysis on [module]
Running pass: CoroConditionalWrapper on [module]
Running pass: AnnotationRemarksPass on fs_variant_partial (1162 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: AnnotationRemarksPass on fs_variant_whole (1110 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
After:
Running pass: AlwaysInlinerPass on [module]
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running analysis: ProfileSummaryAnalysis on [module]
Running pass: CoroConditionalWrapper on [module]
Running pass: AnnotationRemarksPass on fs_variant_partial (1162 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: AnnotationRemarksPass on fs_variant_whole (1110 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
Running analysis: InnerAnalysisManagerProxy<llvm::FunctionAnalysisManager, llvm::Module> on [module]
Running pass: SROAPass on fs_variant_partial (1162 instructions)
Running analysis: DominatorTreeAnalysis on fs_variant_partial
Running analysis: AssumptionAnalysis on fs_variant_partial
Running analysis: TargetIRAnalysis on fs_variant_partial
Running pass: EarlyCSEPass on fs_variant_partial (1111 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_partial
Running pass: SimplifyCFGPass on fs_variant_partial (961 instructions)
Running pass: ReassociatePass on fs_variant_partial (961 instructions)
Running pass: PromotePass on fs_variant_partial (897 instructions)
Running pass: InstCombinePass on fs_variant_partial (897 instructions)
Running analysis: OptimizationRemarkEmitterAnalysis on fs_variant_partial
Running analysis: AAManager on fs_variant_partial
Running analysis: BasicAA on fs_variant_partial
Running analysis: ScopedNoAliasAA on fs_variant_partial
Running analysis: TypeBasedAA on fs_variant_partial
Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on fs_variant_partial
Running pass: SROAPass on fs_variant_whole (1110 instructions)
Running analysis: DominatorTreeAnalysis on fs_variant_whole
Running analysis: AssumptionAnalysis on fs_variant_whole
Running analysis: TargetIRAnalysis on fs_variant_whole
Running pass: EarlyCSEPass on fs_variant_whole (1059 instructions)
Running analysis: TargetLibraryAnalysis on fs_variant_whole
Running pass: SimplifyCFGPass on fs_variant_whole (912 instructions)
Running pass: ReassociatePass on fs_variant_whole (912 instructions)
Running pass: PromotePass on fs_variant_whole (844 instructions)
Running pass: InstCombinePass on fs_variant_whole (844 instructions)
Running analysis: OptimizationRemarkEmitterAnalysis on fs_variant_whole
Running analysis: AAManager on fs_variant_whole
Running analysis: BasicAA on fs_variant_whole
Running analysis: ScopedNoAliasAA on fs_variant_whole
Running analysis: TypeBasedAA on fs_variant_whole
Running analysis: OuterAnalysisManagerProxy<llvm::ModuleAnalysisManager, llvm::Function> on fs_variant_whole
Fixes: 2037c34f24 ("gallivm: move to new pass manager to handle coroutines change.")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19217>
(cherry picked from commit 28f4fcaa4f)