Particularly, this makes compilation stop as soon as we get a
valid shader and doesn't try to optimize spilling by trying
fallback strategies.
Might come in handy to reduce CTS execution time, for example,
dEQP-VK.ssbo.layout.random.8bit.all_per_block_buffers.6 goes from
43m46.715s down to 15m15.068s.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20601>
The CPU copy is horribly slow, so let's hook-up DXGI swapchains. Note
that we're still limited in term of features. For instance, we can't
support more than 2 images per swapchain because of the DXGI present
ordering constraint. We also have to do an extra copy, because DXGI
only allows rendering to a resource on the queue that the swapchain
was created against, but swapchains in Vulkan don't have a queue.
The swapchain is bound to the window using DirectComposition aka
DComp. The DComp infrastructure is set up in the surface, and is
transitioned from one swapchain to the next when the new swapchain
begins presenting.
Unlike Wayland and X, there's no requirement that the compositor has
to release a surface before you can start rendering against it. However,
since we're now supporting the non-sw path, we do need to prevent apps
from rendering to a resource *while* the blit is occurring. We do this
by blocking for a fence while acquiring an image.
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
The win32 swapchain can be backed by a DXGI swapchain, but such swapchains
are incompatible with STORAGE images (AKA UNORDERED_ACCESS usage in
DXGI). So, we need to allocate an intermediate image that will serve as
a render-target, and copy this image to the WSI image when QueuePresent()
is called. That's pretty similar to what we do for the buffer blit case,
except the image -> buffer copy is replaced by an image -> image copy.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
Right now, the WSI core supports copying WSI images to a linear buffer
for implementations that want the result in this form. This being said,
most of the blit logic can be re-used for image to image copies, and that's
exactly what we'll need if we want to hook-up DXGI swapchains in the
win32 WSI implementation. So let's rename a few fields so we no longer
imply that images are copied to a buffer, and the use_buffer_blit boolean
an enum so we can extend the implementation to support image -> image
copies.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
If a shader's sampler state is dirty often, the sampler descriptor heap
can get used up quickly, forcing flushing. If that happens quickly, we
run out of batches and have to wait for batches to finish on the GPU.
When this happens, it is often because the sampler state is switching,
not because it's truly unique. This change hashes and saves sampler
descriptor tables that can be reused in subsequent draws in the same
batch, instead of re-copying the same descriptors and consuming the
heap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20618>
The maximum number of mipmap levels supported for cubemap can be
determined from the maximum 2D texture size. There is no need
to limit the max to 12.
This fixes a regression in creating GL4.1 and up context since
commit 2658d02516 is now explicitly checking for
MaxCubeTextureLevels >= 15 for GL4.1 context.
Reviewed-by: Neha Bhende <bhenden@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20600>
We are now extremelly careful when copy propagating a mov that uses
relative addressing. The search for readers will trigger abort when it
sees any other instruction using a relative addressing, irrespective of
the actual used registers or whether an address register load was seen.
Additionally, since ntt switch all movs using the relative addressing are
actually used only once right on the next line, and are result of ntt converting
vec4 32 ssa_10 = intrinsic load_ubo_vec4 (ssa_0, ssa_9) (access=0, base=11, component=0)
into
5: ARL ADDR[0].x, TEMP[0].xxxx
6: MOV TEMP[2], CONST[0][ADDR[0].x+11]
RV530 shader-db:
total instructions in shared programs: 132966 -> 131904 (-0.80%)
instructions in affected programs: 29896 -> 28834 (-3.55%)
helped: 234
HURT: 2
total temps in shared programs: 16969 -> 16905 (-0.38%)
temps in affected programs: 604 -> 540 (-10.60%)
helped: 68
HURT: 12
Partial fix for: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7723
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20577>
When computing line smoothing, we can also do something similar to
compute the line stippling. This can be useful for some drivers, who
can't easily split the lines before rasterizing them.
This does lead to slightly inaccurate stippling, because the
line-smoothing extends the line-length by a small amount. That leads to
the line-stippling pattern being over-stretched over the line-segment by
a fraction of a pixel in lenght. For short lines, this can be quite a
lot of error.
Reviewed-by: Soroush Kashani <soroush.kashani@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20134>
I'm convinced that u_blitter interactions with fbfetch can't be handled
in si_update_ps_colorbuf0_slot alone, so it has to be force-disabled
by si_blitter_begin. Another reason why it has to be disabled for u_blitter
and not ignored is because FBFETCH with MSAA enables sample shading
regardless of context states, and we don't want that for u_blitter.
Also, si_update_ps_colorbuf0_slot now disables FBFETCH explicitly before
its own DCC and CMASK decompression because even though u_blitter can't do
anything (due to blitter_running), si_blitter_end calls it too.
The result is that no recursion can occur thanks to the blitter_running
and suppress_update_ps_colorbuf0_slot flags, and FBFETCH is always
force-disabled before those flags are set, which is the state we want
to be in.
Fixes: bc6d22b920 ("radeonsi: fix ps_uses_fbfetch value")
Acked-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20318>