Now that we have proper handling of FCV_CCS_E everywhere, we can turn
this on for Gen12.5.
This helps fix a performance regression where enabling fast clears to
non-zero values with CCS_E caused additional partial resolves in
certain games. Performance improves in the following games:
- F1'22: +45%
- RDR2: +6%
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Surfaces with FCV_CCS_E aux usage should be marked as fast cleared when
being rendered to, to ensure proper fast clear state tracking. We also
need to ensure that we're not trying to partially resolve surfaces with
level > 0 or layer > 0 since we don't track fast clear states for
those.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25589>
Our implementation of secondary command buffers already jumps into
them and edits the end of the secondary command buffer to jump back
into the primary.
That implementation works just the same with any level of secondary
nesting. The only possible issue would be a secondary calling itself,
but that's not possible.
We also cannot support simultaneous execution with self-modifying
command buffers. That's not actually a problem at the moment because
we don't have multiple queues of the same family, but we choose to
reflect that in the feature bits.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25600>
Take advantage of some vk_sampler goodness and migrate all pvr
tex_formats to map to pipe_formats in pvr_formats.c. This allows us to
get rid of all the nasty manual packing functions.
This cleanup incidentally fixes some bad swizzling that was happening
in the manual handling.
Fixes: 4a2e6284 pvr: Add support for sampler border colors
Signed-off-by: Matt Coster <matt.coster@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25270>
Tile buffer emits require a load from the tile buffer into the
output regs, so they must be placed at the end of the EOT program
so as not to corrupt the output register emits.
This commit orders the emit state to place output register emits
first, and tile buffer emits last.
dEQP test fixed:
dEQP-VK.renderpass.suballocation.attachment.4.422
... and others from the dEQP-VK.renderpass.suballocation.*
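The ordering described above can be sketched as a stable reorder. This is
only an illustration: the sketch_* types and names are hypothetical and are
not the driver's actual emit state structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emit kinds for illustration; not the driver's real enum. */
enum sketch_emit_kind {
   SKETCH_EMIT_OUTPUT_REG,
   SKETCH_EMIT_TILE_BUFFER,
};

struct sketch_emit {
   enum sketch_emit_kind kind;
   unsigned id; /* Original position, kept to show the order is stable. */
};

/* Copy `in` to `out` with all output register emits first and all tile
 * buffer emits last, preserving the relative order within each group.
 */
static void
sketch_order_emits(const struct sketch_emit *in, struct sketch_emit *out,
                   size_t count)
{
   size_t n = 0;

   for (size_t i = 0; i < count; i++) {
      if (in[i].kind == SKETCH_EMIT_OUTPUT_REG)
         out[n++] = in[i];
   }

   for (size_t i = 0; i < count; i++) {
      if (in[i].kind == SKETCH_EMIT_TILE_BUFFER)
         out[n++] = in[i];
   }

   /* Every emit is one of the two kinds, so nothing may be dropped. */
   assert(n == count);
}
```

With this ordering, the loads performed by the tile buffer emits can only
happen after every output register emit has already been issued.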
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25584>
Dynamic rendering requires that the client be able to bind just one
aspect of a depth/stencil image. Because we only have interleaved
depth/stencil on NVIDIA and no actual disable bits, this means we need
to implicitly AND any enables with a vk_format != UNDEFINED check. In
future, we might want to do that with a macro but we'll keep it simple
for today.
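A minimal sketch of the implicit AND described above. The sketch_* enum,
struct and helpers are hypothetical stand-ins, not the driver's real state
or Vulkan's VkFormat; they only show the shape of the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the driver's types. */
enum sketch_format {
   SKETCH_FORMAT_UNDEFINED = 0,
   SKETCH_FORMAT_D24_UNORM_S8_UINT,
};

struct sketch_render_state {
   enum sketch_format depth_format;   /* UNDEFINED: no depth aspect bound */
   enum sketch_format stencil_format; /* UNDEFINED: no stencil aspect bound */
};

/* The effective enable is the API enable ANDed with "an aspect is bound". */
static bool
sketch_depth_test_enabled(const struct sketch_render_state *rs,
                          bool api_enable)
{
   assert(rs != NULL);
   return api_enable && rs->depth_format != SKETCH_FORMAT_UNDEFINED;
}

static bool
sketch_stencil_test_enabled(const struct sketch_render_state *rs,
                            bool api_enable)
{
   assert(rs != NULL);
   return api_enable && rs->stencil_format != SKETCH_FORMAT_UNDEFINED;
}
```

When only the stencil aspect is bound, the depth helper reports false even
if the client enabled depth testing, so the unbound half of the interleaved
image is never tested or written.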
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25653>
This is a temporary fix. With AMD_DEBUG=useaco we currently mix LLVM
and ACO when compiling shaders, but the disk cache needs a function
identifier at creation time, and ACO-compiled shaders should not use
the LLVM function identifier. So we have to disable the disk cache
when using ACO for now.
Once ACO is able to compile all shaders, we can re-enable the disk
cache by dropping the LLVM function identifier when ACO is in use.
Fixes: d1dd36a74e ("radeonsi: be able to use aco compiler for mono ps")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9673
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25607>
The number of const shared registers was being used for the allocation size
rather than the number of bytes. In practice this doesn't make a difference as
the max allocation size is 24 bytes, which then gets rounded up to 64 bytes by
the buffer allocation function. However, we might as well make the allocation
size correct to avoid any future confusion. Noticed through code inspection.
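The arithmetic above can be sketched as follows. It assumes each const
shared register is 32 bits wide, and the register count of 6 used in the
note below is simply 24 / 4 under that assumption; the sketch_* names are
hypothetical, not the driver's macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one const shared register is one 32-bit dword. */
#define SKETCH_REG_SIZE_BYTES sizeof(uint32_t)

/* Rounding granularity taken from the commit message above. */
#define SKETCH_ALLOC_GRANULARITY ((size_t)64)

/* Convert a register count into the byte size the allocator expects. */
static size_t
sketch_regs_to_bytes(size_t reg_count)
{
   return reg_count * SKETCH_REG_SIZE_BYTES;
}

/* Round size up to a power-of-two alignment, as an allocator might. */
static size_t
sketch_align_up(size_t size, size_t alignment)
{
   assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
   return (size + alignment - 1) & ~(alignment - 1);
}
```

Under these assumptions, passing the register count (6) or the byte size
(24) both round up to 64 bytes, which is why the bug had no visible effect.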
Fixes: 7509e259f8 ("pvr: Implement color/depth/depth+stencil attachment clear.")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25489>
pvr_is_stencil_store_load_needed() may be called on secondary command buffers,
which don't have any attachments. This wasn't being taken into account, meaning
a segfault could occur.
Fixes a segfault seen in:
dEQP-VK.renderpass.suballocation.attachment_allocation.input_output.39
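A minimal sketch of the kind of guard involved. The sketch_* structs and
fields are hypothetical, and returning false when there are no attachments
is an assumption made for illustration rather than a description of the
actual fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the driver's structs. */
struct sketch_attachment {
   bool has_stencil;
   bool stencil_store_or_load;
};

struct sketch_cmd_state {
   /* NULL for secondary command buffers, which have no attachments. */
   const struct sketch_attachment *attachments;
   size_t attachment_count;
};

static bool
sketch_is_stencil_store_load_needed(const struct sketch_cmd_state *state,
                                    size_t idx)
{
   assert(state != NULL);

   /* Bail out before dereferencing when no attachments are available.
    * Returning false here is an assumption of this sketch.
    */
   if (state->attachments == NULL || idx >= state->attachment_count)
      return false;

   const struct sketch_attachment *att = &state->attachments[idx];
   return att->has_stencil && att->stencil_store_or_load;
}
```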
Fixes: 54876512a1 ("pvr: Add mid fragment pipeline barrier if needed.")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25486>