If a subpass clears one aspect of Depth/Stencil but loads the other,
the clear might get lost. Fix this by emitting the clear as a draw
call instead of relying on the TLB clear.
Fixes:
dEQP-VK.renderpass.suballocation.attachment.3.307
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far V3DV_ENABLE_DEFAULT_PIPELINE_CACHE allowed configuring the
driver to avoid any caching through a pipeline cache.
With this change we can be more fine-grained. The envvar is no longer
a boolean. Allowed values:
* "off": no pipeline cache at all. PipelineCache objects behave as
no-op objects.
* "no-default-cache": user PipelineCache objects cache nir/variants,
but we don't provide a default cache when the user doesn't provide a
PipelineCache object, nor for internal pipelines.
* "full" (default): we provide a default PipelineCache, used when
the user doesn't provide one when creating a Pipeline, and for
internal Pipelines.
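As an illustration of the three values above, a minimal sketch of how such
an envvar could be parsed. The enum, the function names, and the choice to
treat unrecognized values as "full" are assumptions of this sketch, not
taken from the driver:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names, for illustration only. */
enum cache_mode {
   CACHE_MODE_OFF,
   CACHE_MODE_NO_DEFAULT_CACHE,
   CACHE_MODE_FULL,
   CACHE_MODE_COUNT,
};

static const char *cache_mode_names[CACHE_MODE_COUNT] = {
   "off",
   "no-default-cache",
   "full",
};

static const char *
cache_mode_name(enum cache_mode mode)
{
   assert((unsigned)mode < CACHE_MODE_COUNT);
   return cache_mode_names[mode];
}

static enum cache_mode
parse_cache_mode(const char *value)
{
   if (value != NULL) {
      for (unsigned i = 0; i < CACHE_MODE_COUNT; i++) {
         if (strcmp(value, cache_mode_names[i]) == 0)
            return (enum cache_mode)i;
      }
   }
   /* Unset is the documented default; falling back to "full" for
    * unrecognized values is an assumption of this sketch. */
   return CACHE_MODE_FULL;
}

static enum cache_mode
cache_mode_from_env(void)
{
   return parse_cache_mode(getenv("V3DV_ENABLE_DEFAULT_PIPELINE_CACHE"));
}
```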
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We don't want to let the default pipeline cache grow without limit. We
choose a maximum number of entries that should work for all real world
applications. CTS will exceed that limit, but that is okay, as it will
prevent us from running out of memory.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Some shaders that need to spill hundreds of registers can take very long times
to compile as each allocation attempt spills a single register and restarts
the allocation process. We can significantly cut down these times if we allow
the compiler to spill in batches, which should be possible if we are spilling
uniforms, which is in fact the kind of spills that we do first because they
have lower cost than TMU spills.
Doing this could cause us to slightly over spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.
v2:
- Keep it simple and just try to spill a fixed amount of registers in a
batch instead of trying to compute this dynamically based on accumulated
spills and current register pressure. (Eric).
v3:
- Check if the node is valid before doing anything with it.
- Drop the environment variable to select batch size and just fix it to 20.
With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5
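The batching policy described above can be sketched roughly as follows. The
batch size of 20 comes from the v3 note; the threshold value and all
function names are hypothetical, and the real allocator is considerably
more involved:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPILL_BATCH_SIZE      20   /* fixed batch size (v3 note) */
#define SPILL_BATCH_THRESHOLD 100  /* hypothetical threshold */

/* How many registers to spill in the next allocation attempt. */
static uint32_t
spills_for_next_attempt(uint32_t spilled_so_far, bool is_uniform_spill)
{
   /* Only uniform spills are batched in this sketch. */
   if (!is_uniform_spill)
      return 1;

   /* Below the threshold, spill one at a time to avoid over-spilling. */
   if (spilled_so_far < SPILL_BATCH_THRESHOLD)
      return 1;

   /* Past the threshold, favor compile time over shader performance. */
   return SPILL_BATCH_SIZE;
}

/* Number of allocation attempts needed to spill 'needed' uniforms. */
static uint32_t
attempts_to_spill(uint32_t needed)
{
   uint32_t spilled = 0, attempts = 0;
   while (spilled < needed) {
      uint32_t n = spills_for_next_attempt(spilled, true);
      assert(n >= 1);
      spilled += n;
      attempts++;
   }
   return attempts;
}
```

With these assumed values, spilling 300 uniforms takes 110 allocation
attempts instead of 300.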
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We had some code in blit_tfu to handle 3D images but it was wrong. For
example, it executed a copy on the 3D image regardless of which depth
slices actually needed to be copied. This was not detected until
vk-gl-cts 1.2.4 introduced more 1D and 3D blitting tests.
Also add checks to fall back to blit_shader when needed, such as when
mirroring on the depth component.
Fixes the following tests:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.whole_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.mirror_z_3d.nearest
dEQP-VK.api.copy_and_blit.dedicated_allocation.blit_image.simple_tests.whole_3d.nearest
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When sampling the stencil aspect we want to reinterpret the D24S8 format
as RGBA8 and read stencil values from the R component.
Fixes:
dEQP-VK.renderpass.suballocation.formats.d24_unorm_s8_uint.input.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Gets tests like the following one properly skipped:
dEQP-VK.api.copy_and_blit.core.image_to_image.all_formats.color.1d.etc2_r8g8b8a8_unorm_block.etc2_r8g8b8a8_unorm_block.optimal_general
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we have only been exposing linear for WSI formats and UIF on
everything else, but we should instead expose linear or UIF based
on whether the underlying format supports any features for the given
layout.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
When negotiating DRM modifiers, applications may use this to validate the
features that are supported with a particular modifier. The WSI code in
Mesa relies on this to validate its modifiers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
By basing the tex_coord on the max layer instead of the min (similarly
to what we do for mirroring x/y).
Avoids all crashes, and gets most of the following tests to pass:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.*
The only one failing is this one:
dEQP-VK.api.copy_and_blit.core.blit_image.simple_tests.mirror_z_3d.nearest
but it looks like the root cause is different, as there are other
3D nearest tests failing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Command buffer object destruction callbacks take 64-bit object
handles, but we defined the color clear pipeline callback to take
a 32-bit argument.
Should fix recent crash regressions with some CTS tests on Rpi4.
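A minimal sketch of a destruction callback whose signature matches a 64-bit
handle parameter. The typedef, struct, and function names are hypothetical
stand-ins, not the driver's actual definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback type: always takes a 64-bit object handle. */
typedef void (*destroy_cb)(uint64_t object);

/* Stand-in for the object owned by the command buffer. */
struct clear_pipeline {
   int id;
};

static unsigned destroyed_count;

/* The parameter is uint64_t to match destroy_cb exactly. Declaring it
 * as a 32-bit type and calling it through destroy_cb would be undefined
 * behavior. */
static void
destroy_clear_pipeline(uint64_t object)
{
   struct clear_pipeline *p = (struct clear_pipeline *)(uintptr_t)object;
   assert(p != NULL);
   free(p);
   destroyed_count++;
}

static void
run_destroy_cb(destroy_cb cb, uint64_t object)
{
   cb(object);
}
```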
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Subpass color clear pipelines are those used to emit partial attachment
clears as draw calls inside the render pass currently bound by the
application in the command buffer, leading to a huge performance improvement
compared to the case where we emit them in their own render pass.
Unfortunately, because the pipeline references the render pass
object in which it is used and the render pass object is owned by the
application (and can be destroyed at any point), we can't cache these
pipelines (unless we implement a refcounting mechanism or other
similar strategy).
Performance impact looks negligible based on experiments with vkQuake3,
probably because the underlying pipeline cache is preventing the
redundant shader recompiles.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Specifically, we should select the slice to blit from on the source
image to be in the middle of the depth step.
This issue was only raised recently after the CTS improved the 3D
blitting tests.
Fixes:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.*.3d.*
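A sketch of the slice selection described above, assuming unnormalized
source Z bounds and one sample per destination slice. The function name and
parameters are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Source Z coordinate (in slices) to sample for a destination slice.
 * src_z0/src_z1 are the source depth bounds; src_z0 > src_z1 mirrors. */
static float
blit_src_z_center(float src_z0, float src_z1,
                  uint32_t dst_depth, uint32_t dst_slice)
{
   assert(dst_depth > 0 && dst_slice < dst_depth);

   /* Source depth covered by each destination slice. */
   const float step = (src_z1 - src_z0) / (float)dst_depth;

   /* Sample at the middle of the step, not at its start. */
   return src_z0 + ((float)dst_slice + 0.5f) * step;
}
```

For example, blitting source slices [0, 8) to 4 destination slices samples
at Z = 1, 3, 5 and 7 rather than at 0, 2, 4 and 6.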
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Originally, copies between buffers and images required a buffer offset
that was a multiple of 4 bytes. However, the spec was later fixed to
relax this rule and only require offsets with texel alignment.
Our implementation of image to buffer copies using the blit path needs
to bind the destination buffer as a linear image and be able to bind
the requested buffer memory at the required offset, so for that to work
we need to change the alignment requirements for linear images to match
the relaxed texel alignment requirement.
Fixes new tests in Vulkan CTS 1.2.4:
dEQP-VK.api.copy_and_blit.core.image_to_buffer.buffer_offset_relaxed
dEQP-VK.api.copy_and_blit.dedicated_allocation.image_to_buffer.buffer_offset_relaxed
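To make the difference between the two rules concrete, a small sketch with
hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
offset_is_aligned(uint64_t offset, uint32_t alignment)
{
   assert(alignment > 0);
   return offset % alignment == 0;
}

/* Original rule: the buffer offset must be a multiple of 4 bytes. */
static bool
buffer_offset_valid_strict(uint64_t offset)
{
   return offset_is_aligned(offset, 4);
}

/* Relaxed rule: the buffer offset only needs texel alignment. */
static bool
buffer_offset_valid_relaxed(uint64_t offset, uint32_t texel_size)
{
   return offset_is_aligned(offset, texel_size);
}
```

An offset of 6 bytes with a 2-byte texel is rejected by the original rule
but accepted by the relaxed one.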
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The lowering will get all the interpolateAt() functions from GLSL lowered to
the corresponding intrinsics we have just implemented in the compiler backend,
which was the last piece we needed to enable the feature.
This gets us to pass all the relevant tests in:
dEQP-VK.pipeline.multisample_interpolation.*
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
As we have just set proper values for point granularity etc, we can
enable largePoints. With this change tests like this:
dEQP-VK.rasterization.primitive_size.points.point_size_*
go from Skip to Pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
While we are here, we also tweak some line-related limits, as some use
the same value as for points, and so that they use the enum we added
recently at common/v3d_limits.h
Fixes the following test:
dEQP-VK.glsl.builtin_var.simple.pointcoord
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
The PTB assumes the instance id to be 0 at the start of a tile, but the
hw does not do that, so we need to set it.
This fixes some Vulkan CTS tests that started to fail after some other
tests used an instance id.
So for example, before this commit, the following tests, executed
in this order, gave the following behaviour:
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Pass
dEQP-VK.draw.indexed_draw.draw_instanced_indexed_triangle_strip => Pass
dEQP-VK.pipeline.vertex_input.multiple_attributes.binding_one_to_many.attributes.float.mat2.mat3 => Fails
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were pre-generating two variants, an all 16-bit return_size
and an all 32-bit return_size, as at pipeline creation time we don't
know the texture format that will finally be used.
But it is possible to override or at least refine the 32bit case, as
we know in advance that all shadow textures can (and in fact should)
use return_size 16bit.
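The refinement can be sketched as follows. The function name and its
parameters are hypothetical, and the real code works on per-texture
compile key state rather than on plain integers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return size (in bits) for one texture of a pre-generated variant.
 * variant_default is 16 or 32 depending on the variant being built. */
static uint32_t
texture_return_size(bool is_shadow, uint32_t variant_default)
{
   assert(variant_default == 16 || variant_default == 32);

   /* Shadow textures can always use 16-bit, even in the 32-bit variant. */
   return is_shadow ? 16 : variant_default;
}
```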
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To be used to decide the texture return size. We add it to the
descriptor map because it is the easiest place to do so. As we are
lowering the texture accesses we can check instr->is_shadow at that
point. Admittedly this is somewhat odd, as so far the descriptor
map held general descriptor info, whereas is_shadow only applies to
textures. But it doesn't make sense to rework this now, as it is
possible that we will add more descriptor-specific info to the map in
the future. We can revisit that later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
There are some potential advantages to that. Even if we are not
taking advantage of them yet, it is worth using this path now,
especially as the non-deref path could be removed at some point.
Note that instead of returning a vec2 for both resource_index and
vulkan_descriptor, we return a scalar for the first one, as that
is what the v3d backend expects (like for get_ssbo_size). For this to
work, we rebuild the vec2 at vulkan_descriptor using the index and
an unused 0 value.
As far as I can see, turnip avoids that by also lowering load_ssbo/ubo,
so it just gets the index lowered (which in their case is a vec3 with a
fixed 0 in the third component), but for now it is easier to do this.
v2: return a single-component for the index, to avoid the backend
needing to handle it (Eric, Jason).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Ask the simulator for the total memory it is using, instead of using
sysinfo (which returned the host system memory).
Fixes the following CTS tests when using the simulator:
dEQP-VK.memory.allocation.basic.percent_1.forward.count_12
dEQP-VK.memory.allocation.basic.percent_1.reverse.count_12
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Although we don't support texture buffers on the OpenGL driver, we are
already doing that for the Vulkan driver. This would be needed for the
OpenGL driver in any case.
Fixes the following tests on v3dv:
dEQP-VK.memory.pipeline_barrier.host_write_uniform_texel_buffer.*
dEQP-VK.memory.pipeline_barrier.transfer_dst_uniform_texel_buffer.*
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
So far we were using the local key variable directly for the
insertion, whereas the hash table expects a permanent address. We add a
key field to all the meta structures (which are already basically a
wrapper over v3dv_pipeline).
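The lifetime problem can be illustrated with a toy table that, like the
hash table described above, stores the key pointer it is given instead of
copying the key bytes. Everything here (the table, the struct, the sizes)
is a hypothetical stand-in for the driver's data structures:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define KEY_SIZE    16
#define TABLE_SLOTS 8

/* Toy table: keeps the key pointer, so the key must outlive the entry. */
struct toy_table {
   const uint8_t *keys[TABLE_SLOTS];
   void *data[TABLE_SLOTS];
   unsigned count;
};

/* Stand-in for a meta pipeline wrapper that now carries its own key. */
struct meta_pipeline {
   uint8_t key[KEY_SIZE];
   int pipeline;
};

static void
toy_insert(struct toy_table *t, const uint8_t *key, void *data)
{
   assert(t->count < TABLE_SLOTS);
   t->keys[t->count] = key;
   t->data[t->count] = data;
   t->count++;
}

static void *
toy_search(const struct toy_table *t, const uint8_t *key)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (memcmp(t->keys[i], key, KEY_SIZE) == 0)
         return t->data[i];
   }
   return NULL;
}

static struct meta_pipeline *
cache_pipeline(struct toy_table *t, const uint8_t *local_key, int pipeline)
{
   struct meta_pipeline *p = malloc(sizeof(*p));
   if (p == NULL)
      return NULL;

   /* Copy the caller's (possibly stack-allocated) key into the entry... */
   memcpy(p->key, local_key, KEY_SIZE);
   p->pipeline = pipeline;

   /* ...and insert using the copy, which lives as long as the entry. */
   toy_insert(t, p->key, p);
   return p;
}
```

Inserting with local_key instead of p->key would leave the table pointing
at memory that is reused once the caller returns, so later lookups would
only succeed by luck.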
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We were inserting the local key variable used to search for entries
directly as the key, but hash_table expects a pointer that remains
valid. Fixed by using the array of keys that we already had at
v3dv_pipeline.
Fixes failures on the rpi4 like:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.a1r5g5b5_unorm_pack16.a1r5g5b5_unorm_pack16.general_general_linear
but fwiw, this test on the simulator, and several other tests on both
the simulator and rpi4, were working just by luck.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>