Conditional rendering affects next functions:
- vkCmdDraw, vkCmdDrawIndexed, vkCmdDrawIndirect, vkCmdDrawIndexedIndirect
- vkCmdDrawIndirectCountKHR, vkCmdDrawIndexedIndirectCountKHR
- vkCmdDispatch, vkCmdDispatchIndirect, vkCmdDispatchBase
- vkCmdClearAttachments
Value from conditional buffer is cached into designated register,
MI_PREDICATE is emitted every time conditional rendering is enabled
and command requires it.
v2: by Jason Ekstrand
- Use vk_find_struct_const instead of manually looping
- Move draw count loading to prepare function
- Zero the top 32-bits of MI_ALU_REG15
v3: Apply pipeline flush before accessing conditional buffer
(The issue was found by Samuel Iglesias)
v4: - Remove support of Haswell due to possible hardware bug
- Made TMP_REG_PREDICATE and TMP_REG_DRAW_COUNT defines to
define registers in one place.
v5: thanks to Jason Ekstrand and Lionel Landwerlin
- Workaround the fact that MI_PREDICATE_RESULT is not
accessible on Haswell by manually calculating
MI_PREDICATE_RESULT and re-emitting MI_PREDICATE
when necessary.
v6: suggested by Lionel Landwerlin
- Instead of calculating the result of predicate once - re-emit
MI_PREDICATE to make it easier to investigate error states.
v7: suggested by Jason
- Make anv_pipe_invalidate_bits_for_access_flag add CS_STALL
if VK_ACCESS_CONDITIONAL_RENDERING_READ_BIT is set.
v8: suggested by Lionel
- Precompute conditional predicate's result to
support secondary command buffers.
- Make prepare_for_draw_count_predicate more readable.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Currently, this helper does nothing but we call it every place where an
image is written through the render pipeline. This will allow us to
properly mark the aux state so that we can handle resolves correctly.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
We'll be performing a GPU memcpy in more places to copy small amounts of
data. Add an alternate function that thrashes less state.
v2:
- Make a new function (Jason Ekstrand).
- Move the #define into the function.
v3:
- Update the function name (Jason).
- Update comments.
v4: Use an indirect drawing register as TEMP_REG (Jason Ekstrand).
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
A GPU memcpy function could alternatively be implemented using MI_*
commands. Provide more detail into how this one operates in case another
memcpy function is created.
v2:
- Update the commit message.
v3:
- Use 'memcpy' instead of 'cpy' (Jason Ekstrand)
- Shorten 'streamout' to 'so'
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helps Dota 2 on Broadwell by 8-9%. I also hacked up the driver and
used the Sascha "shadowmapping" demo to get some results. Setting
uses_kill to true dropped the framerate on the demo by 25-30%. Enabling
the PMA fix brought it back up to around 90% of the original framerate.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This code is far too complicated to cut and paste.
v2: Update the newly added genX_gpu_memcpy.c; const a few things.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
This method of doing copies has the advantage of touching very little of
the GPU state. While it does disable all the shader stages, it doesn't
have to blow away binding tables, viewports, scissors, or any other bits of
dynamic state other than VBO 32 which is already reserved. All of the
state that it does touch is contained within a pipeline anyway so that's
the only thing that has to be dirtied.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
It really should have gone here all along. We were trying a bit too hard
to make it gen-agnostic just because it didn't have any #if's.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Now that we don't have meta, we have no need for a gen-agnostic pipeline
create path. We can, instead, just generate one Create*Pipelines function
per gen and be done with it.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Now that we no longer have meta, all pipelines get created via the normal
Vulkan pipeline creation mechanics. There is no more need for this bit of
extra magic data that we've been passing around.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Update the comment to reflect the correct filename and add a guard to
catch incorrect inclusion of the header.
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Create a function that performs one of three HiZ operations -
depth/stencil clears, HiZ resolve, and depth resolves.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Initially, we had intended set_subpass to be an interesting function that
did whatever (presumably a lot) setup we needed for a subpass. In reality,
it just sets a pointer and a dirty bit and then emits depth and stencil
state. When we call BeginCommandBuffer on a secondary, there's no point in
setting depth and stencil state since it will already be set by the
primary. Instead, the only thing we need to do at the start of a secondary
is set the subpass pointer and the dirty bit.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
This is the only remaining part of genX_l3.c and there's really no good
reason for it to be in its own file.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Now that we're using gen_l3_config.c, we no longer have one set of l3
config functions per gen and we can simplify a bit. Also, we know that
only compute uses SLM so we don't need to look for it in all of the stages.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
This is in contrast to emitting it directly in vkCmdPipelineBarrier. This
has a couple of advantages. First, it means that no matter how many
vkCmdPipelineBarrier calls the application strings together it gets one or
two PIPE_CONTROLs. Second, it allow us to better track when we need to do
stalls because we can flag when a flush has happened and we need a stall.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>