Venus implements guest WSI on host external memory and thus cannot
transition guest wsi images to/from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Thus, when a client would attempt to transition a Venus wsi image
to/from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, Venus instead transitions
to/from VK_IMAGE_LAYOUT_GENERAL and performs an explicit ownership
transfer to/from VK_QUEUE_FAMILY_FOREIGN_EXT. Unfortunately, the
read-only guarantee of VK_IMAGE_LAYOUT_PRESENT_SRC_KHR is lost.
Upon the "acquire from foreign queue" side of that symmetry, when a
client would attempt to retain the contents of the image (i.e.
transition from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR instead of
VK_IMAGE_LAYOUT_UNDEFINED), Venus knows that the image's backing memory
has not been modified. Thus, when those "acquire from FOREIGN queue"
ownership transfers flow to the native driver, Venus can signal it to
skip any acquisition-time validation of an image's internal data,
obtaining the same optimization as native WSI.
This is useful for drivers such as ARM's Mali (with Transaction
Elimination) that would otherwise need to recompute costly per-tile
checksums (CRCs) to ensure that they haven't gone stale during FOREIGN
ownership of the image's memory.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29777>
Image memory barriers don't need to be fixed when Venus' internal
"presentable" layout is PRESENT_SRC (generally only in specific types of
debugging). In that case, skip barrier fixes as early as possible and
remove early returns from procedures deeper in the call stack.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29777>
Prepare to allocate VkExternalMemoryAcquireUnmodifiedEXT structs from
command pool cached storage with the same lifetime as
VkImageMemoryBarrier(2) structs.
Also use common parameter naming and function call signatures for the
both the barrier and barrier2 variants.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29777>
lower_fdot outputs fsum ops like fsum3, which in this stage may
go through nir_to_tgsi paths and tgsi doesn't implement them.
This hits an assert in ntt_emit_alu:
feedback: ../src/gallium/auxiliary/nir/nir_to_tgsi.c:1804:
ntt_emit_alu: Assertion `!"" "Unknown NIR opcode"' failed.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30079>
This removes the unrequired dependance on _mesa_init_debug() and moves
all log code to the util file so that _mesa_log* can now be used without
creating a dependance on mesa/main. Since the code we are moving depends
on the code already in the util (as it was moved here previously) this is
also a much better spot for the code.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30012>
`with_platform_windows` depends on the `platforms` option,
and is an inaccurate proxy check for `host_machine.system() == 'windows'`.
The ICD path and shared library name are dependent on the host system,
not whether or not Lavapipe is built with `WIN32` platform support.
In fact Lavapipe can be built with no platforms (`-Dplatforms=`)
if you only need headless (then this check would be incorrect)
fixes: e030ab5163
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29900>
Previously we tried to map GPU resources directly wherever we could,
however this was always causing random issues and was overall not very
robust.
Now we just allocate a staging buffer on the host to copy into, with some
short-cut for host_ptr allocations.
Fixes the following tests across various drivers:
1Dbuffer tests (radeonsi, zink)
buffers map_read_* (zink)
multiple_device_context context_* (zink)
thread_dimensions quick_* (zink)
math_brute_force (zink)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30082>
The linear_phi changes are already reflected in the live-in demand
of the successor. The logical_phi_sgpr_ops can only reduce the
register demand at the predecessor. Although this might slightly
overestimate the register-demand, no differences in code quality
were found.
Totals from 1610 (2.03% of 79395) affected shaders: (GFX11)
PreSGPRs: 54002 -> 56954 (+5.47%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29962>
Add the ability to deduplicate hoisted expressions, which will be
necessary to avoid repeatedly hoisting the same descriptors and blowing
our budget. The offset calculation may have itself been hoisted into the
preamble, so we also have to be able to hoist a bindless_resource_ir3
referencing a load_preamble and connect it to the source of the
corresponding store_preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>
We have to be careful here because r63.x is also used to mean "register
not assigned yet," but dummy destinations for prefetches will use r63.x
and we don't want them to count as GPRs when determining whether to
enable early preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>
We currently assume that the instruction is already inserted and we are
optimizing it away, but in the use case I have where we are hoisting
instructions into a preamble and deduplicating as we go along, that
isn't the case. Move this responsibility onto the caller, which also
makes it a bit clearer what's going on and turns this into something
more similar to an actual set.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29873>