So far we have only been restoring dirty dynamic states used by meta
pipelines. However, static state from meta pipelines will also clear
dirty flags, preventing follow-up draw calls in the command buffer
from honoring these states if they are flagged as dynamic in their
pipelines. Fix this by always resetting all dirty state flags after
a meta operation so we re-emit all the state we need with the next draw
call.
Fixes:
dEQP-VK.dynamic_state.monolithic.image.clear
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20356>
Nothing in the spec seems to require that the number of stages for
which creation feedback is requested must match the number of stages
available in the pipeline. In fact, the spec explicitly mentions
that this number could be 0:
"If pipelineStageCreationFeedbackCount is not 0,
pPipelineStageCreationFeedbacks must be a valid pointer to an
array of pipelineStageCreationFeedbackCount
VkPipelineCreationFeedback structures"
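
A minimal sketch of what tolerating a mismatched count could look like.
The struct, the flag and the function name are hypothetical stand-ins
rather than the real Vulkan or driver definitions, and clamping to the
smaller of the two counts is an assumption about one reasonable way to
handle it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VkPipelineCreationFeedback. */
struct stage_feedback {
   uint32_t flags;
   uint64_t duration;
};

#define FEEDBACK_VALID_BIT 0x1u /* hypothetical flag value */

/* Fill per-stage feedback for as many stages as the application asked
 * for, never more than the pipeline actually has. `feedbacks` may be
 * NULL only when `feedback_count` is 0, mirroring the spec quote above.
 * Returns the number of entries written.
 */
static uint32_t
write_stage_feedback(struct stage_feedback *feedbacks,
                     uint32_t feedback_count,
                     const uint64_t *stage_durations,
                     uint32_t stage_count)
{
   assert(feedback_count == 0 || feedbacks != NULL);

   /* Do not assume feedback_count == stage_count. */
   uint32_t n = feedback_count < stage_count ? feedback_count : stage_count;
   for (uint32_t i = 0; i < n; i++) {
      feedbacks[i].flags = FEEDBACK_VALID_BIT;
      feedbacks[i].duration = stage_durations[i];
   }
   return n;
}
```

With a count of 0 the loop never runs and the NULL pointer is never
dereferenced.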
Fixes an assert crash in:
dEQP-VK.pipeline.monolithic.creation_feedback.graphics_tests.vertex_stage_fragment_stage_no_cache_zero_out_feedback_cout
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20352>
This may allow applications making use of buffer age to save some effort
in some cases.
v2: (Simon Ser)
* Add space between struct member and "<" operator.
* Remove break statement which prevented the change from working as
  intended in swrast_update_buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18269>
If we can't use the TLB to do a subpass resolve we have a fallback
that emits separate image resolves, but this fallback was only
handling color resolves. This adds depth/stencil as well.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
For these we always want to use sample_0, averaging is reserved for
color formats. We were already doing this correctly for depth/stencil
resolves in render passes, but not for those happening through
vkCmdResolveImage.
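
The two resolve modes can be modelled on the CPU as follows. This is an
illustrative sketch only: the function name and the use of plain 32-bit
integer samples are assumptions, and the driver performs the resolve on
the GPU rather than with a loop like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Resolve one multisampled pixel to a single value.
 * Depth/stencil takes sample 0; color averages all samples.
 */
static uint32_t
resolve_pixel(const uint32_t *samples, uint32_t sample_count,
              bool is_depth_stencil)
{
   assert(sample_count > 0);

   if (is_depth_stencil)
      return samples[0];

   /* Accumulate in 64 bits so the sum cannot overflow. */
   uint64_t sum = 0;
   for (uint32_t i = 0; i < sample_count; i++)
      sum += samples[i];
   return (uint32_t)(sum / sample_count);
}
```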
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
Attachment state is only relevant during render passes. However,
there is a corner case: if we can't resolve an attachment in a
subpass using the hardware, we emit a manual image resolve in the
driver which can trigger a meta operation via blit. In this case,
we pretend we are not in a render pass (since Vulkan disallows
blits/resolves in a render pass) but we really want to keep the
attachment state after the meta operation.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
Pre-patch, anv_descriptor_pool used a free list for host allocations
that never merged adjacent free blocks. If the pool only allocated
fixed-sized blocks, then this would not be a problem. But the pool
allocations are variable-sized, and this caused over half of the pool's
memory to be consumed by unusable free blocks in some workloads,
needlessly inflating its memory footprint.
Replacing the free list with util_vma_heap, which does merge adjacent
free blocks, fixes the memory explosion in the target workload.
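
To illustrate why merging matters, here is a small sketch of a free
routine that coalesces adjacent holes. It is not util_vma_heap's
implementation: the fixed-size sorted array and all names here are
hypothetical and chosen only to keep the example short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_HOLES 16

struct hole {
   uint32_t offset;
   uint32_t size;
};

struct hole_list {
   struct hole holes[MAX_HOLES]; /* kept sorted by offset */
   uint32_t count;
};

/* Return [offset, offset + size) to the list, merging it with the hole
 * immediately before and/or after it when they touch.
 */
static void
hole_list_free(struct hole_list *l, uint32_t offset, uint32_t size)
{
   /* Find the first hole that starts at or after the freed block. */
   uint32_t i = 0;
   while (i < l->count && l->holes[i].offset < offset)
      i++;

   bool merge_prev = i > 0 &&
      l->holes[i - 1].offset + l->holes[i - 1].size == offset;
   bool merge_next = i < l->count &&
      offset + size == l->holes[i].offset;

   if (merge_prev && merge_next) {
      /* The freed block bridges two holes: fold all three into one. */
      l->holes[i - 1].size += size + l->holes[i].size;
      for (uint32_t j = i; j + 1 < l->count; j++)
         l->holes[j] = l->holes[j + 1];
      l->count--;
   } else if (merge_prev) {
      l->holes[i - 1].size += size;
   } else if (merge_next) {
      l->holes[i].offset = offset;
      l->holes[i].size += size;
   } else {
      /* No neighbour touches the block: insert a new hole at index i. */
      assert(l->count < MAX_HOLES);
      for (uint32_t j = l->count; j > i; j--)
         l->holes[j] = l->holes[j - 1];
      l->holes[i] = (struct hole){ offset, size };
      l->count++;
   }
}
```

Freeing three adjacent 16-byte blocks in any order leaves a single
48-byte hole, whereas a list that never merges would keep three 16-byte
entries that cannot satisfy a 48-byte request.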
Disadvantages of util_vma_heap compared to the free list:
- The heap calls malloc() when a new hole is created.
- The heap calls free() when a hole disappears or is merged with an
  adjacent hole.
- The Vulkan spec expects descriptor set creation/destruction to be
  thread-local lockless in the common case. For workloads that
  create/destroy with high frequency, malloc/free may cause overhead.
  Profiling is needed.
Tested with a ChromeOS internal TensorFlow benchmark, provided by
package 'tensorflow', running with its OpenCL backend on clvk.
cmdline: benchmark_model --graph=mn2.tflite --use_gpu=true --min_secs=60
gpu: adl
memory footprint from start of benchmark:
before: init=132.691MB max=227.684MB
after: init=134.988MB max=134.988MB
Reported-by: Romaric Jodin <rjodin@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20289>
This reverts commit b1126abb38.
This breaks all hell at least on DG2, as there are several cases left
where current_pipeline gets checked against GPGPU to decide what to do,
and the value doesn't match that of ANV_HW_PIPELINE_STATE_COMPUTE.
On top of that, it also misses checking for
ANV_HW_PIPELINE_STATE_RAYTRACING.
Then there's the fact that in some cases, current_pipeline will be
UINT32_MAX, because it's the original undefined state and also used
after executing a secondary command buffer because we are not tracking
which pipeline the secondary left us on.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7910
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20349>
Buffer maps that don't invalidate their destination range work better
as direct CPU maps than staging blits. The application may write only
part of the range, effectively combining the new data with existing
data. So even if the map would stall, the staging blit path won't help
us, as we have to read the existing data to populate the staging buffer
before returning it. This incurs a stall anyway - plus a read and copy.
In contrast, a direct map doesn't need to read any data - it can just
write the destination and the existing data will still be there.
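
The decision described above can be sketched as a small predicate. The
flag names and the function are hypothetical and greatly simplified;
the real heuristic in the driver takes more inputs than these two:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical map flags for this sketch only. */
enum map_flags {
   MAP_WRITE         = 1 << 0,
   MAP_DISCARD_RANGE = 1 << 1, /* caller promises to overwrite the range */
};

/* Decide whether a write map should go through a staging blit.
 * Only write maps are modelled here.
 */
static bool
prefer_staging_blit(unsigned flags, bool map_would_stall)
{
   assert(flags & MAP_WRITE);

   /* A direct map that does not stall is always at least as good. */
   if (!map_would_stall)
      return false;

   /* Without invalidation, the staging buffer would first have to be
    * filled with the existing data, which stalls anyway.
    */
   if (!(flags & MAP_DISCARD_RANGE))
      return false;

   return true;
}
```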
Fixes excessive blits for stalling buffer writes that don't invalidate
the buffer since my recent map heuristic rework.
Fixes: bec68a85a2 ("iris: Improve direct CPU map heuristics")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7895
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20330>
Based on Jianxun's ("iris: don't get format bits in AUX tables").
With gfx12.5+, the compression format is once again coming from the
surface state programming. MTL once again uses an aux-map, but it
ignores the format bits within the aux-map metadata.
Ref: Bspec 44930: "Compression format from AUX page walk is ignored.
Instead compression format from Surface State is used."
gfx12.5+ also uses tile-4 rather than y-tiling, so if we don't see
y-tiling, we can return 0 from intel_aux_map_format_bits() for the
ignored format bits.
Rework:
* Just return 0 if not using y-tiling as suggested by Nanley.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>
Direct leak of 3912 byte(s) in 2 object(s) allocated from:
#0 0x7fbd4641b0 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xa41b0)
#1 0x7f74413518 in parse_and_validate_cache_item ../src/util/disk_cache_os.c:549
#2 0x7f74414b84 in disk_cache_load_item ../src/util/disk_cache_os.c:599
#3 0x7f74410364 in disk_cache_get ../src/util/disk_cache.c:551
#4 0x7f775695ac in panfrost_disk_cache_retrieve ../src/gallium/drivers/panfrost/pan_disk_cache.c:125
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20336>
A bit count of es_accepted works both when ngg is dynamically enabled
and when it isn't. Unlike the other sequence, this should only be a
single SALU instruction.
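
As a CPU-side model of the idea, assuming es_accepted is available as a
wave-wide mask with one bit per lane (the function name and the
explicit wave_size parameter are hypothetical), the count is simply the
number of set bits:

```c
#include <assert.h>
#include <stdint.h>

/* Count the lanes whose bit is set in the mask. The loop clears the
 * lowest set bit on each iteration; hardware would do the whole count
 * in one scalar instruction.
 */
static unsigned
count_accepted_lanes(uint64_t es_accepted_mask, unsigned wave_size)
{
   assert(wave_size == 32 || wave_size == 64);

   /* Ignore bits beyond the wave in wave32 mode. */
   if (wave_size == 32)
      es_accepted_mask &= 0xffffffffull;

   unsigned count = 0;
   while (es_accepted_mask) {
      es_accepted_mask &= es_accepted_mask - 1;
      count++;
   }
   return count;
}
```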
fossil-db (gfx1100, nggc):
Totals from 41388 (30.75% of 134574) affected shaders:
Instrs: 25783544 -> 25432959 (-1.36%); split: -1.36%, +0.00%
CodeSize: 127281160 -> 125878820 (-1.10%); split: -1.10%, +0.00%
Latency: 92849566 -> 92723047 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 9542194 -> 9485012 (-0.60%); split: -0.60%, +0.00%
Copies: 2031074 -> 1928796 (-5.04%); split: -5.04%, +0.00%
Branches: 642407 -> 642409 (+0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20321>
This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c47 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.
So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>
Some GPUs are able to render more efficiently when all channels of a
color attachment are written, since whole pixels are being overwritten,
rather than hitting a read-modify-write cycle where newly written data
has to be combined with existing unmodified image data.
When faking GL_RGB as RGBA (in case RGB/RGBX isn't color renderable),
we introduce an extra channel that doesn't exist from the application
point of view. With such a format, a color mask of 0x7 (RGB) would mean
to write all channels. But because we've added an alpha channel behind
their back, this becomes a partial write. We are free to write whatever
garbage we want to the alpha channel, however. So we can enable alpha
writes, making this a more efficient full pixel write again.
This is done unconditionally as it's expected to address a problem
common to many drivers and isn't expected to be harmful, even on GPUs
where it may not help much.
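
A minimal sketch of the mask adjustment, using the 0x7 (RGB) encoding
mentioned above. The macro and function names are hypothetical, and the
real change also has to deal with per-render-target blend state, which
is left out here:

```c
#include <assert.h>
#include <stdbool.h>

#define MASK_RGB  0x7u
#define MASK_A    0x8u
#define MASK_RGBA (MASK_RGB | MASK_A)

/* Return the color mask to program for one render target.
 * `alpha_is_faked` is true when the application sees an RGB format but
 * the underlying surface is RGBA.
 */
static unsigned
effective_colormask(unsigned app_mask, bool alpha_is_faked)
{
   assert((app_mask & ~MASK_RGBA) == 0);

   /* The application writes every channel it knows about, so also
    * write the hidden alpha channel to get a full pixel write.
    */
   if (alpha_is_faked && (app_mask & MASK_RGB) == MASK_RGB)
      return MASK_RGBA;

   return app_mask;
}
```

Partial masks such as 0x3 are left untouched, since the write is
already a read-modify-write and enabling alpha would not change that.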
Improves WebGL Aquarium performance on Alderlake GT1 by around 2.4x in
Chromium using Wayland (the --enable-features=UseOzonePlatform and
--ozone-platform=wayland flags).
v2: Don't require PIPE_CAP_RGB_OVERRIDE_DST_ALPHA_BLEND (Marek)
v3: Fix independent blending enables (Emma) - now set when needed,
skipped when not needed, and PIPE_CAP_INDEP_BLEND_ENABLE is no
longer a requirement. We just optimize where we can.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7864
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v2]
Reviewed-by: Emma Anholt <emma@anholt.net> [v3]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20290>
While all Panfrost-supported Mali GPUs support all the compressed texture
formats architecturally, the system integrator decides which formats will
actually be wired up in the production system-on-chip. In the past there may
have been legal considerations; I'm neither a lawyer nor a system integrator, so
I couldn't say.
It's useful for users to know which compressed texture formats are supported by
their hardware, to understand its performance characteristics (and perhaps to
buy systems that support their needs, especially if they need BCn formats which
are omitted in many Mali implementations).
To help with that, this commit adds a small standalone tool that prints which
formats are supported. It is tested so far on Mali-T860 and Mali-G57.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Tested-by: Chris Healy <healych@amazon.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20086>