Some hardware that doesn't support true static samplers, emulates it
by copying all static samplers into a reserved portion of every descriptor
heap. To support Vulkan's required 4000 live sampler limit in bindless
mode, D3D is now able to create descriptor heaps which do not have a reserved
portion. Any descriptor heaps above the MaxSamplerDescriptorHeapSizeWithStaticSamplers
limit will not have that reserved portion and cannot be used with static samplers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27348>
DXIL doesn't have instruction-level coherency. We have 3 options:
1. Promote the instruction to an atomic instruction. We can only do this
for 32-bit or 64-bit ops.
2. If using bindless, declare the local resource declaration as globally-coherent.
3. If not using bindless, add globally-coherent to the global resource declaration.
This pass does all 3 of these, stopping at the intrinsic level for supported types
of atomics, otherwise assigning to the global resource declaration, which will be
unused if we're doing bindless, where instead we'll get it from the instruction.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27348>
The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.
All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.
DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)
Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
As per this optimisations description:
"Takes assignments to variables that are dereferenced only
once and pastes the RHS expression into where the variables
dereferenced."
However the optimisation is run at compile time before multiple
shaders from the same stage could have been pasted together.
So this optimisation can incorrectly assume a global is only
referenced once since it cannot see the other pieces of the
shader stage until link time.
Here we skip the optimisation if the variable is a global. We
could change it to only run at link time however this
optimisation is only run at link time if we are being forced
to use GLSL IR to inline a function that glsl to nir cannot
handle and this will also be removed in a future patchset.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10482
Fixes: d75a36a9ee ("glsl: remove do_copy_propagation_elements() optimisation pass")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27351>
I put dashes in the dates when I first introduced the image tags; it
made sense to improve date readability as we had only a handful of these
and they barely combined.
Nowadays we combine a lot of these tags to form the docker image tags,
and we often run out of space.
Let's remove these dashes, making dates slightly harder to read, and
instead allow these two extra characters to be used in the
unique/descriptive part of the tag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27379>
Use template specialization to handle the static control flow based
on template parameters during code generation instead of relying
on the optimizer to remove the unused code path.
v2: - Fix function opening brace location (zmike)
- remove accidently added dead code
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>
Use template specialization to handle the static control flow based
on template parameters during code generation instead of relying
on the optimizer to remove the unused code path.
v2: - move function opening braces to new line and fix indetion (zmike)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>
Replace the generic true/false by an enum to make the intent clearer.
Factor out the emission of the barrier, and use template specialization
to pick the type of barrier that is to be emitted, because with template
specialization the control flow is avoided altogether, whereas with
the static code flow it is up to the optimizer to remove the unused bits -
which may not happen in debug builds.
v2: Fix function start braces (zmike)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>
Previously we apply implicit sync to all external memory, which is a bit
redundant since we only need it for the dedicated image scenario (media
image imported into Vulkan). This change optimizes just like that while
also excluding wsi which has its own way of synchronizing with the
compositor.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27398>
If the stride is larger than the component with the largest offset plus
the size of that component, it is still considered an overflow if there's
not enough space in the buffer to fit the whole stride-sized thing,
even when there would be enough space to actually write all components.
This is actually much simpler too, since we don't need to verify the
individual components at all (stride is guaranteed to be larger or equal
to the component with the largest offset plus the size of that component).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27368>
We are slightly over-subscribed right now, with 21 merge jobs (10 vk
+ 10 gl + 1 traces) for 20 RPi4, so let's split the tests between
slightly fewer RPi4 and make each split job run for slightly longer,
because it also means that all the jobs start immediately, reducing the
overall delay in merging any MR that triggers this job.
The typical run time for this job was around 8 min with a 10-split; with
an 8-split it is now 9 min, which is still within the 10 min target and
well below the 15min limit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27391>
It's sometimes really, really useful if GL_BGRA8 can be used as a
sized internal format, and the combination of EXT_texture_storage
and EXT_texture_format_BGRA8888 allows this (only when using
texture-storage, which is good enough in some cases). Until now,
we've only implemented ARB_texture_storage, and not the EXT
version.
So let's implement the EXT version as well, so we get the benefit
of the interaction here. This pulls in a lot of other similar
interactions as well, which also seems useful.
...because the ARB version is created from the EXT version, let's move
the EXT function definitions to the EXT extension. These should probably
have been suffixed with ARB in the ARB-version, but things seems to have
just ended up kinda confused. Oh well.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27222>
Similar to what was done for Wayland in 58f90fd03f:
the glthread unmarhsal thread needs to be idle to avoid
concurrent calls to get_back_bo.
Also the existing code flushed after setting dri2_surf->back
to NULL so a new back buffer was always allocated by the
glthread flush:
|---------------> dri2_drm_swap_buffers
| get_back_bo (back=0x55eb93c6c488) > # First get_back_bo call
| get_back_bo (back=0x55eb93c6c488 age: 0)<
| # dri2_surf->back = NULL
|-----> FLUSH
| get_back_bo (back=nil) > # Another get_back_bo call
| get_back_bo (back=0x55eb93c6c4c8 age: 3)<
|-----< FLUSH
|---------------< dri2_drm_swap_buffers
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10437
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27143>