The NIR lowering for mediump can sometimes detect stores of 16-bit values
and demote the outputs, but even better is to have them decorated properly
in the first place. Fixes a bunch of full-precision outputs in gfxbench
Aztec Ruins.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18960>
You can't do it unless GL called the sampler mediump. Also, the SPIR-V spec
says "For image-sampling operations, decorations can appear on both the
sampling instruction and the image variable being sampled. If either is
decorated, they both should be decorated, and if both are decorated their
decorations must match. If only one is decorated, the sampling instruction
can behave either as if both were decorated or neither were decorated." so
emit it on the declaration too.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18960>
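A minimal sketch of the rule quoted above, assuming a toy word buffer rather than the driver's real SPIR-V builder. The names word_buf, emit_relaxed_precision and decorate_mediump_sample are hypothetical; only the OpDecorate opcode (71) and the RelaxedPrecision decoration value (0) come from the SPIR-V specification. The decoration is emitted on both the sampler variable and the sampling result, and only when the GL sampler was mediump, so the two can never disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values from the SPIR-V specification. */
#define SPV_OP_DECORATE 71u                 /* OpDecorate */
#define SPV_DECORATION_RELAXED_PRECISION 0u /* RelaxedPrecision */

/* Hypothetical fixed-size word buffer standing in for a real SPIR-V builder. */
struct word_buf {
   uint32_t words[16];
   size_t count;
};

/* Append "OpDecorate <id> RelaxedPrecision": three words, with the word
 * count in the high 16 bits of the first word and the opcode in the low 16. */
static void
emit_relaxed_precision(struct word_buf *buf, uint32_t id)
{
   assert(buf->count + 3 <= sizeof(buf->words) / sizeof(buf->words[0]));
   buf->words[buf->count++] = (3u << 16) | SPV_OP_DECORATE;
   buf->words[buf->count++] = id;
   buf->words[buf->count++] = SPV_DECORATION_RELAXED_PRECISION;
}

/* Decorate the sampler variable and the sampling result together, and only
 * when the GL sampler was declared mediump, so the decorations always match. */
static void
decorate_mediump_sample(struct word_buf *buf, int sampler_is_mediump,
                        uint32_t sampler_var_id, uint32_t sample_result_id)
{
   if (!sampler_is_mediump)
      return;
   emit_relaxed_precision(buf, sampler_var_id);
   emit_relaxed_precision(buf, sample_result_id);
}
```

Calling decorate_mediump_sample() with sampler_is_mediump set to 0 leaves the buffer untouched; with 1 it appends two three-word OpDecorate instructions.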
The pipe_context::link_shader hook is called when shaders are
linked into a program by the application.
By leveraging this, it becomes possible to use the existing
graphics pipeline library support to implement precompilation:
create a partial pipeline containing only the shader stages,
then add the vertex input and fragment output stages
dynamically using the fast-link feature.
If all goes well, and if the Vulkan driver's fast-linking is
truly fast, the full pipeline should be dynamically combined
in time to avoid stuttering, and an optimized variant will be
queued for async compile to be used the next time the pipeline
triggers a draw.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18961>
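The flow described above can be modelled roughly as follows. This is a sketch under stated assumptions, not the driver's code: the gpl_* names and helper functions are hypothetical, and no Vulkan calls are made. Only the four bit values are real; they mirror VkGraphicsPipelineLibraryFlagBitsEXT from VK_EXT_graphics_pipeline_library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values mirror VkGraphicsPipelineLibraryFlagBitsEXT; the short names
 * are local to this sketch. */
enum gpl_part {
   GPL_VERTEX_INPUT    = 0x1,
   GPL_PRE_RASTER      = 0x2,
   GPL_FRAGMENT_SHADER = 0x4,
   GPL_FRAGMENT_OUTPUT = 0x8,
   GPL_ALL_PARTS       = 0xf,
};

/* Hypothetical model of a pipeline (or pipeline library). */
struct gpl_pipeline {
   uint32_t parts;  /* which of the four parts are present */
   bool optimized;  /* true only for the async-compiled variant */
};

static bool
is_complete(struct gpl_pipeline p)
{
   return p.parts == GPL_ALL_PARTS;
}

/* link_shader time: precompile a partial pipeline with only shader stages. */
static struct gpl_pipeline
precompile_shader_stages(void)
{
   struct gpl_pipeline lib = { GPL_PRE_RASTER | GPL_FRAGMENT_SHADER, false };
   return lib;
}

/* Draw time: fast-link the precompiled shaders with the state-only parts. */
static struct gpl_pipeline
fast_link(struct gpl_pipeline shaders, uint32_t state_parts)
{
   assert((shaders.parts & state_parts) == 0);
   struct gpl_pipeline linked = { shaders.parts | state_parts, false };
   return linked;
}

/* Stand-in for the background job that produces the optimized variant. */
static struct gpl_pipeline
compile_optimized(struct gpl_pipeline linked)
{
   assert(is_complete(linked));
   linked.optimized = true;
   return linked;
}
```

The first draw would use the result of fast_link(); later draws would switch to the result of compile_optimized() once the async job has finished.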
This change implements 3 states in one go:
- depth clamp enable
- depth clip enable
- depth clip negative one to one
This affects the following packets:
- 3DSTATE_CLIP
- 3DSTATE_VIEWPORT_STATE_POINTERS_CC
- 3DSTATE_RASTER
v2: remove clip enable bit check from viewport emit (Lionel)
v3: use helper function from runtime to get depth clip (Lionel)
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18879>
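As a rough illustration of how the three states interact at the API level, here is a sketch that follows the Vulkan rules for depth clipping and for the viewport depth transform. The enum, struct and function names are hypothetical and are not the runtime helper mentioned in v3; nothing here shows how the hardware packets are programmed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state: explicitly off, explicitly on, or "not specified",
 * in which case clipping is the inverse of depth clamp. */
enum depth_clip_mode {
   DEPTH_CLIP_FALSE,
   DEPTH_CLIP_TRUE,
   DEPTH_CLIP_NOT_CLAMP,
};

static bool
effective_depth_clip(enum depth_clip_mode mode, bool depth_clamp_enable)
{
   switch (mode) {
   case DEPTH_CLIP_FALSE:     return false;
   case DEPTH_CLIP_TRUE:      return true;
   case DEPTH_CLIP_NOT_CLAMP: return !depth_clamp_enable;
   }
   assert(!"invalid depth clip mode");
   return false;
}

/* Window-space depth is z_w = scale * z_ndc + offset. */
struct depth_xform {
   float scale;
   float offset;
};

static struct depth_xform
viewport_depth_xform(float min_depth, float max_depth, bool negative_one_to_one)
{
   struct depth_xform x;
   if (negative_one_to_one) {
      /* NDC z spans [-1, 1] */
      x.scale = (max_depth - min_depth) / 2.0f;
      x.offset = (max_depth + min_depth) / 2.0f;
   } else {
      /* NDC z spans [0, 1] */
      x.scale = max_depth - min_depth;
      x.offset = min_depth;
   }
   return x;
}
```

With a [0, 1] viewport depth range and negative-one-to-one enabled, NDC z = -1 maps to 0 and NDC z = 1 maps to 1.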
When the output patch size is <= 32, we can be sure, regardless
of wave size, that each wave will take this branch, so
the jump can be removed.
Fossil DB stats on Navi 21:
Totals from 1385 (1.03% of 134906) affected shaders:
CodeSize: 2664436 -> 2658896 (-0.21%)
Instrs: 488618 -> 487233 (-0.28%)
Latency: 2290157 -> 2289199 (-0.04%)
InvThroughput: 898658 -> 898364 (-0.03%)
Branches: 6554 -> 5169 (-21.13%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
The GPU can skip LDS instructions when LGKMCNT==0, and for these
branches this should always be faster than a jump.
Fossil DB stats on Navi 21:
Totals from 60918 (45.16% of 134906) affected shaders:
CodeSize: 158624792 -> 157893776 (-0.46%)
Instrs: 30234254 -> 30051500 (-0.60%)
Latency: 139521675 -> 139434597 (-0.06%); split: -0.06%, +0.00%
InvThroughput: 21184146 -> 21183653 (-0.00%); split: -0.00%, +0.00%
Branches: 1115134 -> 932380 (-16.39%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (while still modifying exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.
This commit adds the possibility to request this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (e.g. exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (i.e.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
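The decision described above could be sketched like this. It is a simplified model, not ACO's implementation: the sel_ctrl enum is a local stand-in covering only the values discussed here, and branch_info with its two flags is a hypothetical summary of what the backend knows about a branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-in for the selection control values discussed above. */
enum sel_ctrl {
   SEL_CTRL_NONE,
   SEL_CTRL_FLATTEN,                /* requested by the application */
   SEL_CTRL_DIVERGENT_ALWAYS_TAKEN, /* decided by the compiler stack */
};

/* Hypothetical summary of a divergent branch. */
struct branch_info {
   enum sel_ctrl control;
   bool unsafe_with_empty_exec; /* e.g. the branch contains an exp instruction */
   bool cheap_with_empty_exec;  /* stand-in for the backend's own heuristic */
};

/* Returns true when the jump at the start of the branch should be kept. */
static bool
should_emit_jump(const struct branch_info *b)
{
   assert(b);

   /* Correctness comes first: never run unsafe code with an empty exec mask. */
   if (b->unsafe_with_empty_exec)
      return true;

   /* Explicit request: remove the jump. */
   if (b->control == SEL_CTRL_FLATTEN ||
       b->control == SEL_CTRL_DIVERGENT_ALWAYS_TAKEN)
      return false;

   /* Implicit removal, based on the backend's own cost estimate. */
   return !b->cheap_with_empty_exec;
}
```

In this model the explicit setting only overrides the cost heuristic; the safety check still wins.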
The new enum value is called nir_selection_control_divergent_always_taken,
and it's almost the same as nir_selection_control_flatten.
The main difference between the two is that "flatten" represents
a choice made by the application but "divergent_always_taken" may
be applied by the compiler stack when it thinks this is beneficial.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
On DG2 the HW will fetch the binding entries into the cache
for every single thread when a compute walker is dispatched,
wiping out the advantages of the cache prefetch.
The spec also advises against doing a cache prefetch when there are more
than 31 binding table entries, but most real world applications will never
hit that limit.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18498>
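One way to express the policy implied above, as a sketch only: bt_prefetch_count and its parameters are hypothetical, and the actual field programmed by the driver is not shown. It assumes that a count of 0 means "no prefetch", that prefetch is skipped for DG2 compute walkers, and that the 31-entry advice is honoured as well, which the message above does not claim the change does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limit taken from the spec advice quoted above. */
#define BT_PREFETCH_LIMIT 31u

/* Hypothetical helper: how many binding table entries to request for
 * prefetch; 0 is assumed to mean "do not prefetch". */
static uint32_t
bt_prefetch_count(bool dg2_compute_walker, uint32_t surface_count)
{
   uint32_t count = surface_count;

   /* On DG2 compute walkers every thread fetches the entries anyway, and
    * above the limit the spec advises against prefetching at all. */
   if (dg2_compute_walker || surface_count > BT_PREFETCH_LIMIT)
      count = 0;

   assert(count <= BT_PREFETCH_LIMIT);
   return count;
}
```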