drm_shim.c undefines the _FILE_OFFSET_BITS macro, so plain off_t might
be 32 bits, while it's 64 bits in device.c. To avoid this mismatch,
use off64_t which will always be 64 bits in both source files.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
loader_open_render_node returns the first device in /dev/dri that it
can use. To make sure the drm-shim device always gets chosen, return
the fake entries in readdir before returning the real ones.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12203>
Will allow us to drop more GLSL IR code in future once we switch
all drivers to NIR. Also stops the need for all drivers to call
this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
We've wanted to remove destructors from ralloc's API for a long time (it's
an extra storage cost per ralloc for a rarely-used feature), and for the
suballoc change we'd need to spend more storage on storing the tu_device
pointer per result since destructors don't get anything else but the
pointer passed into them.
Fixes use-after-frees:
=================================================================
==2383==ERROR: AddressSanitizer: heap-use-after-free on address 0xffff88fe1940 at pc 0xffff934f427c bp 0xfffff5481e90 sp 0xfffff5481ea8
WRITE of size 8 at 0xffff88fe1940 thread T0
#0 0xffff934f4278 in list_del ../src/util/list.h:108
#1 0xffff934f4278 in result_destructor ../src/freedreno/vulkan/tu_autotune.c:237
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#5 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#6 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#7 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
[...]
0xffff88fe1940 is located 80 bytes inside of 112-byte region [0xffff88fe18f0,0xffff88fe1960)
freed by thread T0 here:
#0 0xffff9c1c90d8 in __interceptor_free ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:127
#1 0xffff934f4368 in history_destructor ../src/freedreno/vulkan/tu_autotune.c:229
#2 0xffff9377793c in unsafe_free ../src/util/ralloc.c:300
#3 0xffff9377793c in ralloc_free ../src/util/ralloc.c:265
#4 0xffff934f5990 in tu_autotune_on_submit ../src/freedreno/vulkan/tu_autotune.c:442
#5 0xffff935cf2ac in tu_queue_submit_locked ../src/freedreno/vulkan/tu_drm.c:997
[...]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
We don't return unused space to the suballocator, so it's a little useful
to limit how much we overallocate to reduce memory footprint. I took a
look through the tu_cs_emit_array() calls and accounted for a couple of
them in the variant-specific space calculation, then dropped the base
allocation by factors of 2 until we started throwing asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
Allocating a BO for each pipeline meant that for apps with many pipelines
(such as Asphalt9 under ANGLE), we would end up spending too much time in
the kernel tracking the BO references.
Looking at CS:Source on zink, before we had 85 BOs for the pipelines for a
total of 1036 kb, and now we have 7 BOs for a total of 896 kb.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15038>
The intel_perf_query path used for performance queries on GL was
passing a bogus "end" pointer to intel_perf_query_result_accumulate(),
causing it to accumulate garbage values. This was causing the values
of many performance counters to be corrupted.
The "end" pointer was incorrect because the current code was assuming
that different OA reports were located TOTAL_QUERY_DATA_SIZE bytes
apart, which is a hard-coded preprocessor define. However recent
(Gfx12+) hardware generations use a variable query size determined by
the query layout. Use the size derived from it instead, and remove
the stale define.
Fixes: 3c51325025 ("intel/perf: switch query code to use query layout")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15783>
gallium queries have to be mapped onto multiple vulkan queries,
this can happen for two reasons.
1. primitives generated and overflow any don't map directly, and
multiple vulkan queries are needs per iteration. These are stored
inside the "starts" as zink_vk_query ptrs.
2. suspending/resuming queries uses multiple queries, these are
the "starts". Every suspend/resume cycle adds a new start.
Vulkan also requires that multiple queries of the same time don't
execute at once, which affects the overflow any vs xfb normal
queries, so vk_query structs are refcounted and can be shared
between starts. Due to this when the draw state changes, it's
simple to just suspend/resume all queries so the shared vulkan
queries get handled properly.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15785>
This enables a lower power mode in the sampler hardware in certain
common scenarios. On Tigerlake, SAMPLER_MODE is not programmable by
userspace but the kernel already sets this bit for us.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
While this register still exists, it's no longer a per-context register.
Instead, on Gfx12+, SAMPLER_MODE exists per dual-subslice and is
accessed as a "multicast" register, where you write control which
version is accessed by the "steering control register".
At any rate, userspace cannot write it any longer, and so there's not
much point to it existing in our genxml (which was missing most of the
fields anyway).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
This allows the sampler to perform faster filtering of 8-bit UNORM
textures by filtering them at a different precision. The filtering
is intended to still be OpenGL and DirectX spec compliant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15628>
according to spec, a barrier is required any time the pixels of the
framebuffer are changed. since zink defers clears and runs them at
a later time, it must also be responsible for handling the required
synchronization for such operations
fixes (radv):
KHR-GL46.blend_equation_advanced.blend_all*
fixes#5572
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
according to spec, for fbfetch this should match the subpass self-dependency of
* stage FRAGMENT_STAGE -> FRAGMENT_STAGE
* access 0 -> INPUT_ATTACHMENT_READ
zs fbfetch doesn't seem to be a thing, so that code is left for historical
and/or future purposes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15831>
this is super annoying since it means that a build of zink cannot
be mix-and-matched with an existing build of lavapipe, e.g., for faster
bisecting
the env var should be sufficient to handle this, and if someone sets it
and doesn't have swrast enabled then they can deal with it
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15844>