Even although the option is called shaderdb, it is not really used by
shaderdb (for V3D shaderdb uses the debug option "precompile"). And in
fact, right now the output format is not compatible with shaderdb.
This commit tries to fix that, and as we are here, also try to make
the option more useful for the Vulkan case, as that debug option also
works with v3dv.
We can't really fully imitate shaderdb use with OpenGL (run with a set
of glsl shader tests), but we can at least assign a unique name (the
pipeline sha1 in text format) so we can compare executions of the same
vulkan application. For that remember to disable the on-disk cache.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13938>
Mali4x0 supports writing depth and stencil from fragment shader
and we've been using it quite a while for depth/stencil buffer reload.
The missing part was specifying output register for depth/stencil.
To figure it out, I changed reload shader to use register $4 as output
and poked RSW bits (or rather consecutive 4 bit groups) until tests
that rely on reload started to pass again.
It turns out that register number for gl_FragDepth/gl_FragStencil is in
rsw->depth_test and register number for gl_FragColor is in
rsw->multi_sample and it's repeated 4 times for some reason (likely for
MSAA?)
With this knowledge we now can modify ppir compiler to support multiple
store_output intrinsics.
To do that just add destination SSA for store_output to the registers
list for regalloc and mark them explicitly as output. Since it's never
read in shader we have to take care about it in liveness analysis -
basically just mark it alive from the time when it's written to the end
of the block. If it's live only in the last instruction, mark it as
live_internal, so regalloc doesn't clobber it.
Then just let regalloc do its job, and then copy register number to the
shader state and program it in RSW.
The tricky part is gl_FragStencil, since it resides in the same register
as gl_FragDepth and with the current design of the compiler it's hard to
merge them. However gl_FragStencil doesn't seem to be part of GL2
or GLES2, so we can just leave it not implemented.
Also we need to take care of stop bit for instructions - now we can't
just set it in every instruction that stores output, since there may be
several outputs. So if there's any store_output instructions in the
block just mark that block has a stop, and set stop bit in the last
instruction in the block. The only exception is discard - we always need
to set stop bit in discard instruction.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
We can't insert mul node into add node instruction if it's a virtual dep
(sequence or write_or_read dep), so use ppir_node_has_single_src_succ
in addition to ppir_node_has_single_succ.
We can't use ppir_node_has_single_src_succ alone, since node may have
a virtual dependency in addition to source dependency, and we can't
insert it either in this case.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13830>
The function need_temp_reg_initialization looks suspecious.
It will only ever return true if we get past this if:
if (!(emit->info.indirect_files && (1u << TGSI_FILE_TEMPORARY)) ...
Using the logical && means the intended initialization done
based on the result of this check is not performed.
This code was both introduced and altered in MR 5317.
ccb4ea5a introduces the function.
ba37d408 is a collection of performance improvements and misc
fixes. This altered the if from using bitwise to logical and.
This commit changes it back to bitwise.
Spotted from a compile warning.
Fixes: ba37d408da ("svga: Performance fixes")
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12157>
We don't advertise bufferDeviceAddressCaptureReplay capability and
neither does blob, because at the moment there is no way to allocate
bo with predefined iova.
There is no support of any arithmetic with addresses since shaderInt64
is not enabled. However, we could enable int64 support whenever we want.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
Separating atomic opcodes makes possible to express a6xx global
atomics which take iova in SRC1. They would be needed by
VK_KHR_buffer_device_address.
The change also makes easier to distiguish atomics in conditions.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8717>
LLVM has all the required intrinsics available on IBM Z, so use them for
rounding operations (they will be implemented as a single instruction).
This change makes the test case lp_test_arit pass, because it avoids
using the buggy generic code.
v2: update .gitlab-ci/cross-xfail-s390x to reflect passing lp_test_arit
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>
As preparation for changing the behavior of LLVMpipe on IBM Z, add a
flag to detect that platform. As it is always known at compile-time, we
do not add it to the struct for cpu flags to avoid inflating that
struct's size.
Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13927>
Add a new command associated to glLinkProgram. With this we should be
able to compile and link shaders when requested by the user, thus
avoiding that to happen in the middle of a frame.
Together with the command we pass an array of shader handles attached to
the program, where each position of the array corresponds to a pipe
shader type.
Signed-off-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13674>
If we did, we would have the instruction coming right after ldvary write
to the same implicit destination as ldvary at the same time. We prevent
this when merging instructions, but we should make sure we prevent this
when we move ldvary around for pipelining too.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13921>
In the following scenario:
CmdBindPipeline()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
CmdBindVertexBuffers()
CmdSetVertexInput()
CmdDraw()
The VBO won't be updated for the second draw because the state is
cleared when the dynamic state is emitted and the pipeline isn't dirty.
Found by inspection.
Cc: 21.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13855>
For SIMD8 half float payload, each component takes a full register, so
we can use existing LOAD_PAYLOAD infrastruture for required padding by
alternating plain 8-wide half float vector and null vector.
Also this patch removes an unwanted assertion from
opt_copy_propagation_local for LOAD_PAYLOAD.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
To support SIMD8 half float payloads, each component takes one full
32bit wide register in both SIMD8H and SIMD16H mode. So we can make use
of existing LOAD_PAYLOAD infrastructure alternating a half float vector
and a null vector, in order to handle required padding.
v2: (Francisco)
- Skip header sources
- Fix comparision units
- Don't allocate VGRF for padded source
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11766>
This function is big and I don't think it will won't get meaningfully
constant-propagated during inlining without LTO. Move it to a .c file so
we just have one copy, saving 2.8MB from libnir.a on an amd64 release
build.
text data bss total filename
before:
18953406 7768312 687260 27408978 build-release/driver-symlinks/iris_dri.so
9734366 5542453 481692 15758511 build-release/lib/libvulkan_intel.so
28687772 13310765 1168952 43167489 (TOTALS)
after:
15478350 7767864 687260 23933474 build-release/driver-symlinks/iris_dri.so
6810366 5541685 481692 12833743 build-release/lib/libvulkan_intel.so
22288716 13309549 1168952 36767217 (TOTALS)
No statistically significant performance difference on iris shader-db, n=8.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13889>