This lets us stop tracking the pipeline layout. It also means less
indirection on a very hot path. As an extra bonus, we can make some of
our data structures smaller. No measurable CPU overhead improvement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
On commit f6e7de41d7, we started emitting 3DSTATE_LINE_STIPPLE as part
of the non-dynamic state. That gets re-emitted every time we bind a new
VkPipeline. But that instruction is non-pipelined, and it caused a perf
regression of about 9-10% on Dota2.
This commit makes anv_dynamic_state_copy() return a mask with only the
state that has changed when copying it. 3DSTATE_LINE_STIPPLE won't be
emitted anymore unless it has changed, fixing the problem above.
v2: Improve commit message and add documentation about skipped checks
(Jason)
Fixes: f6e7de41d7 ("anv: Implement VK_EXT_line_rasterization")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Previously, we assumed that the dirty bit was always 1 << VK_DYNAMIC_*
and this assumption is about to be false. Extensions which define new
VK_DYNAMIC_* enums won't be nice and tightly packed which this really
requires. Instead, add functions to don the conversions and rework the
bits a bit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().
These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:
"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
that descriptor sets must not be allocated using this layout, and
descriptors are instead pushed by vkCmdPushDescriptorSetKHR."
"If flags contains
VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
elements of pBindings must not have a descriptorType of
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".
There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The key reason for that mechanism is gone: all the extra optional data
that could be in the anv_push_constants was moved elsewhere. At this
point, just put anv_push_constants directly in anv_cmd_state (part of
anv_cmd_buffer).
v2: Remove a NULL check we don't need anymore in
anv_cmd_buffer_push_constants(). (Lionel)
Fix size we consider for valid push params. (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Found while running Talos Principle.
As far as I can tell running a draw call with a pipeline having push
constants without the application having called vkCmdPushConstants
gives undefined push constant values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
This commit adds a new way for ANV to do SSBO bindings by just passing a
GPU address in through the descriptor buffer and using the A64 messages
to access the GPU address directly. This means that our variable
pointers are now "real" pointers instead of a vec2(BTI, offset) pair.
This carries a few of advantages:
1. It lets us support a virtually unbounded number of SSBO bindings.
2. It lets us implement VK_KHR_shader_atomic_int64 which we couldn't
implement before because those atomic messages are only available
in the bindless A64 form.
3. It's way better than messing around with bindless handles for SSBOs
which is the only other option for VK_EXT_descriptor_indexing.
4. It's more future looking, maybe? At the least, this is what NVIDIA
does (they don't have binding based SSBOs at all). This doesn't a
priori mean it's better, it just means it's probably not terrible.
The big disadvantage, of course, is that we have to start doing our own
bounds checking for robustBufferAccess again have to push in dynamic
offsets.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This buffer goes along side the CPU data structure and may contain
pointers, bindless handles, or any other descriptor information.
Currently, all descriptors are size zero and nothing goes in the buffer
but this commit sets up the framework we will need later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Pull the common code out of the two entrypoints into the helper which
fetches the push descriptor set for us. Now that it does more than just
get a thing, call it anv_cmd_buffer_push_descriptor_set.
Cc: "19.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Make them all take a device followed by a set. This is consistent
with how the actual Vulkan entrypoint parameters are laid out.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Conditional rendering affects next functions:
- vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
- vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
- vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
- vkCmdClearAttachments
Value from conditional buffer is cached into designated register,
MI_PREDICATE is emitted every time conditional rendering is enabled
and command requires it.
v2: by Jason Ekstrand
- Use vk_find_struct_const instead of manually looping
- Move draw count loading to prepare function
- Zero the top 32-bits of MI_ALU_REG15
v3: Apply pipeline flush before accessing conditional buffer
(The issue was found by Samuel Iglesias)
v4: - Remove support of Haswell due to possible hardware bug
- Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to
define registers in one place.
v5: thanks to Jason Ekstrand and Lionel Landwerlin
- Workaround the fact that MI_PREDICATE_RESULT is not
accessible on Haswell by manually calculating
MI_PREDICATE_RESULT and re-emitting MI_PREDICATE
when necessary.
v6: suggested by Lionel Landwerlin
- Instead of calculating the result of predicate once - re-emit
MI_PREDICATE to make it easier to investigate error states.
v7: suggested by Jason
- Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL
if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set.
v8: suggested by Lionel
- Precompute conditional predicate's result to
support secondary command buffers.
- Make prepare_for_draw_count_predicate more readable.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The assert is checking that we are not binding more descriptor sets
than the supported by the driver. When binding the descriptor set
number MAX_SETS-1, it was breaking the assert because
descriptorSetCount = 1.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This requires us to rename any Vulkan API entrypoints which became core
in 1.1 to no longer have the KHR suffix.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
We were setting current_pipeline to UINT32_MAX and then calling
cmd_cmd_state_reset which memsets the entire state struct to 0 which
implicitly resets current_pipeline to 3D. I have no idea how this
hasn't caused everything to explode.
Fixes: cd3feea745 "anv/cmd_buffer: Rework anv_cmd_state_reset"
cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Currently, this helper does nothing but we call it every place where an
image is written through the render pipeline. This will allow us to
properly mark the aux state so that we can handle resolves correctly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We need to access the pipeline layout to compute correct dynamic
offsets for dyamic UBO/SSBO descriptors when we emit draw commands.
Instead of taking it from the pipeline object, store the layout
in the command buffer pipeline state.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The spec states that descriptor set layouts can be destroyed almost
at any time:
"VkDescriptorSetLayout objects may be accessed by commands that
operate on descriptor sets allocated using that layout, and those
descriptor sets must not be updated with vkUpdateDescriptorSets
after the descriptor set layout has been destroyed. Otherwise,
descriptor set layouts can be destroyed any time they are not in
use by an API command."
v2: allocate off the device allocator with DEVICE scope (Jason)
Fixes the following work-in-progress CTS tests:
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.graphics
dEQP-VK.api.descriptor_set.descriptor_set_layout_lifetime.compute
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec says:
"pipelineBindPoint is a VkPipelineBindPoint indicating whether the
descriptors will be used by graphics pipelines or compute pipelines.
There is a separate set of bind points for each of graphics and
compute, so binding one does not disturb the other."
Up until now, we've been ignoring the pipeline bind point and had just
one bind point for everything. This commit separates things out into
separate bind points.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102897
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
It's now a function which returns the push descriptor set. Since we set
the error on the command buffer, returning the error is a little
redundant. Returning the descriptor set (or NULL on error) is more
convenient.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
This splits anv_cmd_state_reset into separate init and finish functions.
This lets us share init code with cmd_buffer_create. This potentially
fixes subtle bugs where we may have missed some bit of state that needs
to get initialized on command buffer creation.
Tested-by: Józef Kucia <joseph.kucia@gmail.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "18.0" <mesa-stable@lists.freedesktop.org>
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it. It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing. The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is a lot more natural than special casing it all over the place.
We still have to do a bit of special-casing in assign_constant_locations
but it's not special-cased quite as bad as it was before.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This ensures that everything gets cleaned up properly. In particular,
it fixes a memory leak where we were leaking the push constants
structs.
Valgrind stats on
dEQP-VK.pipeline.push_constant.graphics_pipeline.range_size_128 :
Before:
HEAP SUMMARY:
in use at exit: 2,467,513 bytes in 1,305 blocks
total heap usage: 697,853 allocs, 696,530 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 1,068 bytes in 11 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
After:
HEAP SUMMARY:
in use at exit: 2,467,381 bytes in 1,304 blocks
total heap usage: 697,853 allocs, 696,531 frees, 138,466,600 bytes allocated
LEAK SUMMARY:
definitely lost: 936 bytes in 10 blocks
indirectly lost: 24,669 bytes in 412 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 2,441,776 bytes in 882 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
When writing to set > 0, we were just wrongly writing to set 0. This
commit fixes this by lazily allocating each set as we write to them.
We didn't go for having them directly into the command buffer as this
would require an additional ~45Kb per command buffer.
v2: Allocate push descriptors from system memory rather than in BO
streams. (Lionel)
Cc: "17.2 17.1" <mesa-stable@lists.freedesktop.org>
Fixes: 9f60ed98e5 ("anv: add VK_KHR_push_descriptor support")
Reported-by: Daniel Ribeiro Maciel <daniel.maciel@gmail.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>