The following instructions :
- 3DSTATE_VIEWPORT_STATE_POINTERS_SF_CLIP
- 3DSTATE_VIEWPORT_STATE_POINTERS_CC
- 3DSTATE_SCISSOR_STATE_POINTERS
do not care about the content/format/count of the render targets, only
the size of the render area and count of viewport/scissor.
By tracking render targets & render area we can reduce the emission of
those instructions.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25778>
VK_ATTACHMENT_STORE_OP_NONE_EXT is already supported via VK_KHR_dynamic_rendering, logic dictates it shouldn't need much else for VK_ATTACHMENT_LOAD_OP_NONE_EXT.
CTS is passing with the following results on dEQP-VK.renderpass*.load_store_op_none.* tests
Test run totals:
Passed: 78/110 (70.9%)
Failed: 0/110 (0.0%)
Not supported: 32/110 (29.1%)
Warnings: 0/110 (0.0%)
Waived: 0/110 (0.0%)
Requires !24596 to achieve parity with proprietary driver
Closes#9638
Signed-off-by: Sid Pranjale <sidpranjale127@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25771>
gcc is able to optimize away either the modulo or the logical and. This
makes no difference to gcc.
clang is only able to optimize away the logical and. This allows clang
to generate faster code for BITFIELD_MASK.
As for BITFIELD64_MASK, this also makes no difference to clang except it
fixes a compile error for BITFIELD64_MASK(64):
error: shift count >= width of type [-Werror,-Wshift-count-overflow]
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9989
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25757>
This is for GLSL versions where Infs are undefined.
It also helps piglit/glsl-fs-fogscale that breaks when we move the fragment
shader code into the vertex shader across interpolation.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Jesse Natalie on IRC
Acked-by: Erico Nunes on Gitlab
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25391>
Replaced by the new NIR pass.
i915g results:
total instructions in shared programs: 510678 -> 510714 (<.01%)
total temps in shared programs: 30429 -> 30426 (<.01%)
rv370 results:
total instructions in shared programs: 737649 -> 737656 (<.01%)
instructions in affected programs: 82 -> 89 (8.54%)
total temps in shared programs: 112093 -> 112094 (<.01%)
temps in affected programs: 6 -> 7 (16.67%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24763>
i915g and r300-r400 don't have if statements, and discards are all
nir_intrinsic_discard_if. We can flatten those discards here, saving a
separate GLSL pass to try to do so.
i915g:
GAINED: shaders/closed/xcom-enemy-unknown/413.shader_test FS
rv370:
GAINED: shaders/closed/xcom-enemy-unknown/12.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/122.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/132.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/145.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/146.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/19.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/413.shader_test FS
GAINED: shaders/closed/xcom-enemy-unknown/415.shader_test FS
Closes: #9918
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24763>
in cases where apps (stupidly) do swapbuffers->blitframebuffer, there's
no functional way to (legitimately) perform readback on the just-presented
vk image, which leads to the existing acquire+present loop dance
this adds a counter threshold which, when exceeded, begins copying the
scanout image for swapchains to provide local readback on images without
the massive perf penalty of roundtrips
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25754>
In order to return correct offsets, strides and possibly other
parameters. This is relevant for formats like NV12 and P010 where
the second plane, when produced by the Intel VAAPI decoder, uses the
same FD like the first one, but with an offset.
Right now there are only two modifiers with well defined indices
of auxiliary planes for semi-planar formats according to the local
copy of drm_fourcc.h, `I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS` and
`I915_FORMAT_MOD_4_TILED_MTL_MC_CCS`, sharing the same layout.
Assume that future `MC_CCS` modifiers will get defined accordingly
for now.
Cc: mesa-stable
Tested-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9952
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25603>
Things have been working by accident for split display/render SoCs when
using the GBM platform.
The device fd given to the GBM platform may be associated with a
KMS-only device, so _eglFindDevice() should find nothing (because the
global EGLDevice list only has render-capable devices). The only thing
making it work is that we don't error out when we go through the
EGLDevice list and can't find the device we are looking for. We simply
return the last EGLDevice from the list.
This patch fixes that. Now we look for a compatible render-only device
for the KMS-only in the GBM platform.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24825>
With this change we are able to query the render node fd of a
render-only device compatible with a given KMS-only device (at the
egl/dri2 level).
It uses pipe_loader_get_compatible_render_capable_device_fd(), which was
added in commit "pipe-loader: add
pipe_loader_get_compatible_render_capable_device_fd()".
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24825>
pipe_loader_get_compatible_render_capable_device_fd() receives the fd of
a KMS-only platform device, find a compatible render-only device that is
available and returns the fd of its DRM render node.
This function will be helpful to fix a long standing issue that is
preventing us to add support for EGL_EXT_device_drm_render_node for
split display/render SoCs. And it will also help KMSRO to select a
render-only driver that we are sure that is compatible, because
currently KMSRO uses whatever render-only driver is available.
In sort, in the EGL GBM platform case, the GBM device may be created
with a KMS-only device. The information of what render driver will be
selected by KMSRO is not available before creating the EGLDevice global
list. Without this information we don't have a render node to use in the
EGL_EXT_device_drm_render_node query. We've tried to fix this before,
but failed. See [1-2].
For the moment, this function only works for platform KMS-only devices.
For other types of KMS-only devices, we'll need to add more heuristics.
[1] Detailed explanation of the issue:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/5591
[2] Previous attempt to fix:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12796
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24825>
In a later commit in this series, we'll need to open the first supported
render-only platform device that we can find.
In order to avoid calling loader_open_render_node_platform_device()
multiple times (what is quite expensive), change this function to take a
driver list (instead of a single driver name) as parameter.
Signed-off-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24825>
Heuristic-based optimization throttling CCS work (async compute).
Without throttling, background compute work consumes all threads,
deminishing performance gains by running dispatch in parallel with
3D work.
Optimization is heuristics based, meaning a workload might slow
down when using async compute.
Best value: PixelAsyncComputeThreadLimit = 4. On DG2, this
equates to a max CCS thread occupancy of 37.5%.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25508>
The padded width/height is stored in samples, while the blit box
dimensions need to be specified in pixels. Use the unpadded
width/height of the resource levels to generate the blit box
dimensions used to copy a resource. The blit code already extends
those sizes to the padded sizes when necessary and possible.
Fixes crashes in some piglit tests with MSAA active.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25751>
There is not real need to use the spirv-linker here at all,
we can just read all the CL C files into one buffer, then compile
that buffer in a single pass.
This worksaround an issue seen with llvm17 and opaque pointers
and the spirv linker.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25759>