Commit 582bf4d9 turned on write-combining for most (all?) memory
allocations. This caused a fairly large performance drop in some of
our VMware tests (application traces, such as Windows Metro Paint).
This patch adds a third memory type configuration: DEVICE_LOCAL,
HOST_VISIBLE, HOST_COHERENT. This is uncached. Then, in
anv_AllocateMemory() we only use write-combining for this uncached
type. This memory type is found in the Intel Windows Vulkan driver.
And according to
https://asawicki.info/news_1740_vulkan_memory_types_on_pc_and_how_to_use_them
uncached memory correlates to write-combined memory.
This fixes our performance regression (and actually produced the
fastest ever results for our test suite).
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20770>
Handle lookup (for example PRIME_FD_TO_HANDLE) must be synchronized with
GEM_CLOSE, otherwise re-import can race with bo_del path, resulting in
the handle of the newly (re)imported BO getting closed. Now that the
finalize step has been decoupled, fixing this is mostly just deleting
code.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
The complexity around batching up handle closing is simply to allow the
virtgpu to back up ccmd's to the host (because virtio/virtgpu is pretty
inefficient when it comes to lots of small msgs to the host, and it is
common that when we are deleting BOs, we delete a lot of them at the
same time. But that will make the locking fix in the next commit
impossible (without nested locks). So let's flip this around and do the
step that virtgpu wants to batch up first, before we get into closing
GEM handles, etc.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
When importing from a GEM name or dmabuf fd, we can race with the final
unref of the same BO, in which case we can get a hit in the handle
table for an fd_bo that another thread is about to free(). Detect and
handle this case.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20918>
This was a merge conflict from the Win32 WSI DXGI swapchain changes.
I missed moving a new line of code that was added when rearranging
things for using the common helpers.
Fixes: cfa260cd ("dzn: Use common physical device list/enumeration helpers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20944>
VKD3D-Proton DXBC f32 to f16 conversion implements a float conversion using PackHalf2x16.
Because the spec does not specify a rounding mode, it emits a sequence to ensure
D3D-like behaviour for infinity.
When we know the current backend has pack_half_2x16_rtz_split,
we can eliminate the extra sequence.
Fossil DB stats on GFX11:
Totals from 835 (0.62% of 134913) affected shaders:
VGPRs: 49368 -> 49224 (-0.29%)
CodeSize: 5341956 -> 5124564 (-4.07%)
Instrs: 1024062 -> 987041 (-3.62%)
Latency: 6530956 -> 6465120 (-1.01%); split: -1.01%, +0.00%
InvThroughput: 908189 -> 870253 (-4.18%)
VClause: 18704 -> 18702 (-0.01%); split: -0.02%, +0.01%
SClause: 33406 -> 33284 (-0.37%); split: -0.38%, +0.01%
Copies: 67440 -> 65992 (-2.15%); split: -2.15%, +0.00%
Branches: 18498 -> 18465 (-0.18%)
PreSGPRs: 38409 -> 38331 (-0.20%)
PreVGPRs: 44089 -> 43834 (-0.58%)
Note, some fossils are from before this pattern was added to VKD3D-Proton,
so the above may not reflect real-world impact.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15838>
We can lower FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD into other more
generic sends and drop this internal opcode.
The idea behind this change is to allow bindless surfaces to be used
for UBO pulls and why it's interesting to be able to reuse
setup_surface_descriptors(). But that will come in a later change.
No shader-db changes on TGL & DG2.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20416>
The code emitted by lp_build_fpstate_set to reset the FP state could be
jumped over when the write mask was zero, leading to denormals not being
flushed to zero.
Spotted by Roland Scheidegger.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20901>
It's legal to create a library with FRAGMENT_OUTPUT_INTERFACE and with
all CB states as dynamic, in this case the PS epilog should be dynamic.
This fixes a bunch of regressions while running Zink/RADV CTS with
RADV_PERFTEST=gpl.
Zink is the final boss.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20882>
For the common case where we're emitting packet we don't need to
update the cl_out pointer and then store the result in cl->next,
we can directly update cl->next.
This shows a small improvement in vkoverhead's scores for basic
draw tests.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20897>
Fix: Acquire/release should have one valid access/sync and one set
to none.
Workaround: D3D doesn't like simultaneous access resources leaving
COMMON layout, nor does it like setting UAV/RTV access bits for the
COMMON layout.
Use UNDEFINED -> UNDEFINED layout transitions, where the access bits
just aren't validated.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Float controls are emitted as function attributes on the entrypoint.
These function attributes are not the standard build-in LLVM kind, but
are strings, which the DXIL backend didn't know how to emit. So, this
change adds string attribute support and uses it for fp32 ftz/preserve.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20919>
Allow the compiler to assume that the shader always has full subgroups,
meaning that the initial EXEC mask is -1 in all waves (all lanes enabled).
This assumption is incorrect for ray tracing and internal (meta) shaders
because they can use unaligned dispatch.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>