In Vulkan, descriptors for samplers, SSBO's, etc. are collected into
descriptor sets, and shaders can use multiple descriptor sets. At
command-recording time, users can swap out only some of the descriptor
sets, and the driver is supposed to do the minimum amount necessary to
update any internal binding tables, knowing that only some of the
descriptors have changed.
With the old binding model, focused on GL, where there are separate
tables for each type of resource, we can do somewhat better than now by
preserving descriptors from lower descriptor sets when switching higher
descriptor sets. However we still have to copy around descriptors before
each draw.
At least for a6xx, qualcomm went further, essentially copying the Vulkan
binding model as an alternate way to load resources. There's an array of
registers (actually an array for compute and one for everything else),
where each register holds a pointer to a descriptor set that can contain
various different descriptor types. The descriptors are padded out to 16
dwords, so that every instruction can use an index instead of a dword
offset. It's called "bindless", I think, because it can also be used to
implement the old GL bindless extensions (presumably it allows more
samplers and textures than the old model).
This commit adds the register and cmdstream parts. Next up will be the
instruction encoding.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
Carve out some space at the beginning for push constants, and push them
directly, rather than remapping them to a UBO and then relying on the
UBO pushing code. Remapping to a UBO is easy now, where there's a single
table of UBO's, but with the bindless model it'll be a lot harder. I
haven't removed all the code to move the remaining UBO's over by 1,
though, because it's going to all get rewritten with bindless anyways.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
An improved hashing greatly reduces the number of collisions,
and thus, increases the speed for lookups in the hash table.
The hash function now uses Murmur3 written by Austin Appleby.
This patch also pre-reserves space for the hashmap to avoid rehashing.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4130>
* Take tile_mode as input directly
* tu6_format_gmem to tu6_base_format, use may not be limited to GMEM
* Add new helpers that will return the correct tile_mode as for image level
as part of the format.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3783>
This gives +8% with Wolfeinstein Youngblood on my Vega64, and
according to someone else, it also improves performance with Doom
2016 and Wolfenstein 2 (and probably other ID Tech games).
This improvement is because Youngblood uses GENERAL for the main
depth-only pass and TC-compat HTILE is now enabled with GENERAL if
we know that we are outside of a render loop. This obviously also
reduces the number of HTILE decompressions from/to GENERAL.
Note that Youngblood violates the Vulkan spec regarding render loops
because they are only allowed with input attachments. Expect possible
rendering issues if apps use render loops with the wrong way (ie.
without input attachmens) because HTILE might not be coherent if
a depth-stencil texture is sampled and rendered in the same draw.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2704
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4391>
This disables shaderFloat16 on GFX8 because only GFX9+ supports
double rate packed math.
This improves consistency regarding other AMD Vulkan drivers and
it makes no sense to enable that feature without packed math.
This also reduces performance with Wolfeinstein Youngblood if
fp16 is forced enabled on GFX8, while it's similar on GFX9.
We might re-introduce that feature in the future with ACO support
if it ends up being faster and correct.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4453>
This replaces emit_vertex with:
if (vertex_count < max_vertices) {
emit_vertex_with_counter vertex_count ...
vertex_count += 1
}
Which is exactly what NIR->LLVM was doing but at NIR level. This
pass is already called by ACO.
pipeline-db changes on GFX10:
Totals from affected shaders:
SGPRS: 1952 -> 1912 (-2.05 %)
VGPRS: 2112 -> 2044 (-3.22 %)
Code Size: 189368 -> 185620 (-1.98 %) bytes
Max Waves: 494 -> 491 (-0.61 %)
No pipeline-db changes on other generations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4182>
The goal of this function was to return whether a depth-stencil image
has HTILE, in comparison to radv_layout_is_htile_compressed() which
is used to know whether a depth-stencil image has HTILE compressed.
These two functions are actually similar and they have never been
used for what they were supposed to. Remove radv_layout_has_htile()
in favour of radv_layout_is_htile_compressed() for now. If it's
needed in the future, I will re-introduce this concept properly.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4389>