util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
NIR doesn't have that knob currently, so we end up throwing errors about
it being ignored.
This should fix cases of "tgsi_to_nir: unhandled TGSI property 23 = 1",
and presumably do better at DX9 muls on nv50 and r600.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
Otherwise, we get an ishl that the HW can't support, and a ushr if the NIR
ends up being lowered to ubo_vec4, which may not get constant-folded if
the offset was non-constant.
This matches what mesa/st uses for this arg to uniform lowering.
Fixes: #5971
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14883>
On virglrenderer it is of interest to not re-use temporaries when we
want to handle precise, invariant, and highp/mediump with better
possibility for optimization.
v2: Force optimized RA if the number of registers is too large
(Emma: only 16 bit signed int are reserved for register indices)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16051>
Found in virgl, where a glslparsertest accidentally gets its inputs
lowered to undefs, and 64-bit undefs don't get split by the normal
alu/intrinsic splitter (and would be hard to split because other passes
would see reconstruction of the vec4 from undefs and turn it back into
vec3/vec4 undef).
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16043>
Getting opengl32*.def consistence with Windows SDK.
Getting osmesa.mingw.def's gl* functions consistence with Windows SDK.
stw_* functions are cdecl, not stdcall, so there is no need mangling the symbol.
Fixes egl.def for x86
d3d10sw: Move the place of d3d10_sw.def to d3d10_sw.def.in
Fixes vulkan_lvp.def for x86
Fixes#5552
Remove stdcall-fixup
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14041>
gfxbench sets these between the gbuffer subpass and the following ones.
They should be no-ops as subpass dependencies. gfxbench vk-5-debug perf
12.8 -> 14.6 fps thanks to getting gmem on the gbuffer rendering.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
gfxbench vk-5-normal has a shader that sampels into a texels[] array at
the top, then in a loop calls a GLSL function passing texels[] in by
value. This resulted in a copy to a temp inside the loop, which got
lowered to scratch stores since it was pretty big.
By doing find_array_copies(), we notice that it's equivalent to
copy_deref, then get to copy-propagate from the array at the top. Then we
only have to set up the scratch array outside of the loop and load_scratch
from it in the called function inside the loop. This also causes there to
be less spilling, stps 1144 -> 354 and ldps 826->36.
However, it doesn't seem to change performance on the test. So, while
this seems to be an improvement for the shader, and we could maybe even do
better by rematerializing the txl samples inside the loop instead of
storing the texture fetches to scratch in the first place, it doesn't
currently seem worth pursuing more optimization of this shader.
No change on freedreno shader-db.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
This was useful for comparing image allocations between gfxbench
gl_5_normal and vk_5_normal to see if rendering was generally equivalent
(formats, MSAA, UBWC choices, and notably gfxbench vk was choosing DXT5
instead of ASTC on non-android builds!)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15982>
Refactored tu6_calculate_lrz_state and added comments.
1) If there is no depth write we could keep LRZ valid with any
compare op, we just have to temporary disable LRZ for incompatible
ops in such case.
2) Found that VK_COMPARE_OP_EQUAL is not compatible with LRZ,
and since it doesn't change LRZ buffer - LRZ could be just
temporary disabled. This fixes rendering of grass/trees in
PUBG mobile on angle.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6127
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16014>
The tgsi path already marked all aliasing loads of atomic counters with
CACHE_CG, so we don't need to emit a cctl. This patch uses the cache
flag on the atomic to model whether the L1 cache needs the stale
values to be flushed or not.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14386>
Using the vulkan-helpers from C++ code has turned out to have a lot of
friction, because no other driver uses C++ for this.
So let's bite the bullet and call the D3D12 C-API instead. The C-API
wasn't really around when we started out, but it's there now.
This is still far from ideal; we should really create some wrapping
macros to generate the extremely verbose COM calls.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15816>