This way we're properly using the vulkan barrier paradigm instead
of adhoc guessing what caches need to be flushed. This is more robust
for cache policy changes as we now don't have to revisit all the meta
operations all the time.
Note that a barrier has both a src and dst part though. So
barrier:
flush src
meta op
flush dst
becomes
barrier:
flush barrier src
flush meta op dst
meta op
flush meta op src
flush barrier dst
And there are some places where we've been able to replace a CB flush
with a shader flush because that is what we'd need according to vulkan rules
(and it turns out that in the cases the CB flush mattered the app will set the
bit in one of the relevant flushes or it was needed as a result of an optimization
that we counter-acted in the previous patch.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7202>
This cleans up a bunch of gross sprintfs and keeps the caller from needing
to remember to ralloc_strdup. I added a couple of '"%s", name ? name :
""' to radv where I didn't fully trace through whether a non-null name was
being passed in.
I also took the liberty of adding a basic name to a few shaders (pan_blit,
unit tests)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7323>
For extra args. Unlike image creation, I'm not embedding the vk
struct in there, so all the inline structs can be kept.
Reviewed-by: Dave Airlie <airlied@redhat.com>
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When using graphics, the driver doesn't need to decompress HTILE
before resolving. This path currently doesn't support layers
so we have to fallback to the compute path.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The old code was not wrong because the transitions performed
after the resolves should re-emit the framebuffer if needed.
This change is mostly a no-op but it improves consistency
regarding other meta operations that need to save/restore subpasses.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan spec says:
"If pResolveAttachments is not NULL, for each resolve attachment
that does not have the value VK_ATTACHMENT_UNUSED, the
corresponding color attachment must not have the value
VK_ATTACHMENT_UNUSED."
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Seems that in a single case we use the renderpass before checking
the pipeline, so check the renderpass before we use it.
Fixes: fbcd167314 "radv: Add on-demand compilation of built-in shaders."
Tested-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In environments where we cannot cache, e.g. Android (no homedir),
ChromeOS (readonly rootfs) or sandboxes (cannot open cache), the
startup cost of creating a device in radv is rather high, due
to compiling all possible built-in pipelines up front. This meant
depending on the CPU a 1-4 sec cost of creating a Device.
For CTS this cost is unacceptable, and likely for starting random
apps too.
So if there is no cache, with this patch radv will compile shaders
on demand. Once there is a cache from the first run, even if
incomplete, the driver knows that it can likely write the cache
and precompiles everything.
Note that I did not switch the buffer and itob/btoi compute pipelines
to on-demand, since you cannot really do anything in Vulkan without
them and there are only a few.
This reduces the CTS runtime for the no caches scenario on my
threadripper from 32 minutes to 8 minutes.
Reviewed-by: Dave Airlie <airlied@redhat.com>
The goal is to use radv_barrier()/radv_subpass_barrier() as
much as possible for further optimizations.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Multisampled source images (ie. color attachments) can be now
DCC compressed, so the driver needs to perform a DCC decompression
pass before resolving
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This helper shares common code before resolving using either
a fragment or a compute shader.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It should already be valid there + the RB will update it during
rendering.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
For fast clear eliminate and decompressions, we always use the most compressed
format.
For clears, the code already creates a renderpass on demand with the exact same
layout as specified.
Otherwise we start distinguishing between GENERAL and TRANSFER_DST_OPTIMAL.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Framebuffer is from 0,0, not (dst.x, dst.y).
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
The position start at (dst.x, dst.y), so if we want the source to
start at (src.x, src.y), we have to offset by (src.x-dst.x,src.y-dst.y).
Haven't tested that this fixed anything yet, but found by inspection.
Fixes: 69136f4e63 "radv/meta: add resolve pass using fragment/vertex shaders"
Reviewed-by: Dave Airlie <airlied@redhat.com>
And merge radv_meta_save_novertex() with
radv_meta_save_graphics_reset_vport_scissor_novertex().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>