Fixes assertion fails in piglit isinf-and-isnan, which uses a constant infinity,
which has an out-of-bounds mantissa (but the function contract says that's
fine and we just return something undefined.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
This acts as a depth/stencil write. The AGX compiler checks outputs_written to
determine what conservative depth settings the driver needs. Nominally, this
should work: the original store_output(FRAG_RESULT_DEPTH) intrinsic causes the
DEPTH outputs_written bit to be set, so the metadata is still correct after
lowering store_output to store_zs_agx. However, there are a handful of places
that call nir_gather_info late, which *resets* the existing outputs_written
value and regathers, causing Asahi to use the wrong conservative depth settings
when shuffling NIR pass order and breaking gl_FragDepth.
To fix, handle store_zs_agx conservatively when gathering info so we don't have
to play games with the pass order or stashing info in a sideband.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20563>
Use the new get_decoder_fence vfunc to implement
vaQuerySurfaceStatus and vaSyncSurface in the va state tracker.
A pointer to the surface's fence is passed to the codecs before the
end_frame vfunc and the codec is responsible for allocating a fence on
command stream submission.
This fence is then queried on vaQuerySurfaceStatus and waited on in
vaSyncSurface.
Notably both functions were not implemented as per the VA-API docs for
PIPE_VIDEO_ENTRYPOINT_BITSTREAM.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
Add a get_decoder_fence vfunc that can be used to query the status
of the previous decode job denoted by 'fence' given 'timeout'.
A pointer to a fence pointer can be passed to the codecs before the
end_frame vfunc and the codec should then be responsible for allocating
a fence on command stream submission.
Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20133>
A pretty significant amount of time spent in fd6_draw_vbo is calling
memset to zero init the on-stack struct. And a big part of the size
of the struct is fd6_state, of which we only need to initialize
num_groups to zero. This is worth a 15% improvement in drawoverhead
test 0 ("no state change").
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
Split out the build-up of CP_SET_DRAW_STATE packet, as we are going to
want to re-use this for compute state later when we switch to bindless
IBO descriptors.
While we are at it, drop the enable_mask param, as this is determined
solely by the group_id, and it is easier to maintain a table for the
handful of exceptions to ENABLE_ALL. The compiler should be able to
optimize away the table lookup.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20572>
Same thing as https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20530:
newly added `src/vulkan/util/rmv/vk_rmv_tokens.h` (see !17331) includes
`src/util/` files, so anything that includes it needs `idep_mesautil`.
In file included from ../src/vulkan/util/rmv/vk_rmv_common.h:29,
from ../src/vulkan/runtime/vk_device.h:26,
from ../src/vulkan/wsi/wsi_common.c:31:
../src/util/simple_mtx.h:34:12: fatal error: valgrind.h: No such file or directory
34 | # include <valgrind.h>
| ^~~~~~~~~~~~
compilation terminated.
Fixes: 5f30a7538b ("vulkan: Add RMV token definitions")
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20642>
NIR will automatically lower all of these opcodes unless the driver
specifies that it can handle them natively. We don't have any hardware
support for any of these opcodes though, so we just let NIR lower
all of them.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20639>
Particularly, this makes compilation stop as soon as we get a
valid shader and doesn't try to optimize spilling by trying
fallback strategies.
Might come in handy to reduce CTS execution time, for example,
dEQP-VK.ssbo.layout.random.8bit.all_per_block_buffers.6 goes from
43m46.715s down to 15m15.068s.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20601>
The CPU copy is horribly slow, so let's hook-up DXGI swapchains. Note
that we're still limited in term of features. For instance, we can't
support more than 2 images per swapchain because of the DXGI present
ordering constraint. We also have to do an extra copy, because DXGI
only allows rendering to a resource on the queue that the swapchain
was created against, but swapchains in Vulkan don't have a queue.
The swapchain is bound to the window using DirectComposition aka
DComp. The DComp infrastructure is set up in the surface, and is
transitioned from one swapchain to the next when the new swapchain
begins presenting.
Unlike Wayland and X, there's no requirement that the compositor has
to release a surface before you can start rendering against it. However,
since we're now supporting the non-sw path, we do need to prevent apps
from rendering to a resource *while* the blit is occurring. We do this
by blocking for a fence while acquiring an image.
Co-authored-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Acked-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
The win32 swapchain can be backed by a DXGI swapchain, but such swapchains
are incompatible with STORAGE images (AKA UNORDERED_ACCESS usage in
DXGI). So, we need to allocate an intermediate image that will serve as
a render-target, and copy this image to the WSI image when QueuePresent()
is called. That's pretty similar to what we do for the buffer blit case,
except the image -> buffer copy is replaced by an image -> image copy.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>
Right now, the WSI core supports copying WSI images to a linear buffer
for implementations that want the result in this form. This being said,
most of the blit logic can be re-used for image to image copies, and that's
exactly what we'll need if we want to hook-up DXGI swapchains in the
win32 WSI implementation. So let's rename a few fields so we no longer
imply that images are copied to a buffer, and the use_buffer_blit boolean
an enum so we can extend the implementation to support image -> image
copies.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16200>