We cannot send negative viewport coordinates to the hardware,
so clamp them since negative min.x/y is valid per spec.
The negative origin still counts in calculations of guardband.
Fixes crash in 3DMark's "Sling Shot Extreme" test.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9629>
The previous approach attempted to construct phi nodes
on-demand and on-the-fly. Due to several bugs, it became
necessary to always create incomplete phis for all live-in
variables on loop headers, which is highly inefficient.
The new approach assumes that live-in variables on loop-
headers don't get renamed, and afterwards does one renaming
pass per loop nest. This greatly simplifies the code and
reduces the memory footprint.
Totals from 37 (0.03% of 136546) affected shaders (Navi10):
CodeSize: 588148 -> 588020 (-0.02%); split: -0.03%, +0.01%
Instrs: 111793 -> 111761 (-0.03%); split: -0.04%, +0.01%
Latency: 4546013 -> 4545611 (-0.01%); split: -0.02%, +0.01%
InvThroughput: 2806217 -> 2805730 (-0.02%); split: -0.03%, +0.01%
VClause: 2044 -> 2046 (+0.10%)
SClause: 3889 -> 3884 (-0.13%)
Copies: 17730 -> 17700 (-0.17%); split: -0.23%, +0.06%
Branches: 3282 -> 3280 (-0.06%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9763>
OpEntryPoint is only supposed to include the Input and Output storage
classes prior to SPIR-V 1.4.
Since we're always emitting SPIR-V 1.0, we should simply omit these from
OpEntryPoint for now. If we start emitting SPIR-V 1.4 or later, we
should add these back in.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9828>
The liveness code in ppir can be a bottleneck in complicated shaders
with multiple spills, and the use of the mesa set data structure is one
of the main reasons it is expensive to run.
With some changes, it can be adapted to using bitsets which makes it run
substantially faster.
ppir liveness can't run with a regular bitset for registers since we
need to track inviditual component masks for non-ssa registers, but we
can switch to using a separate packed bit array just for the masks,
rather than a full blown hash set. This also makes operations such as
liveness propagation much more straightforward.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
The live_out set was serving mostly as a temporary storage of data
propagated from the next instructions.
If we do an early copy to preserve the instruction live_in set and
propagate the liveness information directly to the live_in set, we can
skip having a live_out set entirely.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
liveness information was stored in blocks so that the algorithm doesn't
have to find a valid instruction to inherit liveness information from.
It turns out this part of the code can be a performance bottleneck in
applications loading with complicated shaders, so skip the liveness
information in blocks to reduce the number of times we have to copy
liveness information.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9745>
Loosely based on ANV implementation.
For executable's internal representation we output:
- Initial NIR after spirv_to_nir
- Final optimized NIR
- IR3 disassembly
Note, that vkGetPipelineExecutablePropertiesKHR is required to
return executable properties even if pipeline was not created with
CAPTURE_STATISTICS or CAPTURE_INTERNAL_REPRESENTATIONS bits set.
So the executables array is unconditionally populated, however
NIR and IR3 disassemlies are filled only when
CAPTURE_INTERNAL_REPRESENTATIONS is set.
Passes dEQP-VK.pipeline.executable_properties.*
Works with RenderDoc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8877>
When always using 128-bit operations, an access at the end of an SSBO
could spill over to the next page and cause a GPU fault. To avoid
this, pick the smallest instruction that fits.
The if ladder might need extending for sizes of less than 32 bits, but
those currently fail in other parts of the compiler so are untested.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9747>
Integer add/sub can be implemented as either an add or a mul instruction
but we always emit them as add instructions at VIR level. We can use this
flexibility to improve our QPU scheduling so we can be more effective
at instruction merging by converting these to mul instructions when we
are attempting to merge them with another add instruction.
total instructions in shared programs: 13721549 -> 13691004 (-0.22%)
instructions in affected programs: 3340493 -> 3309948 (-0.91%)
helped: 12805
HURT: 1656
Instructions are helped.
total max-temps in shared programs: 2319528 -> 2319317 (<.01%)
max-temps in affected programs: 5285 -> 5074 (-3.99%)
helped: 195
HURT: 3
Max-temps are helped.
total sfu-stalls in shared programs: 31616 -> 31752 (0.43%)
sfu-stalls in affected programs: 469 -> 605 (29.00%)
helped: 52
HURT: 161
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13753165 -> 13722756 (-0.22%)
inst-and-stalls in affected programs: 3340383 -> 3309974 (-0.91%)
helped: 12782
HURT: 1666
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9769>
So for example, on v3dv_CmdDrawIndexed we can return early if
instanceCount is 0.
This fixes failures when using the simulator with tests with the
following pattern:
dEQP-VK.draw.instanced.draw_indexed_vk_primitive_topology*
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9820>
We already have code to lower away these to something we don't need so
much special-case code to handle.
The sad part here is that we generate slightly worse code; trees of
OpLogicalAnd / OpLogicalOr instead of OpAny OpAll. But I would imagine
that for any GPU where this matters, these would be easy to combine
back, so I'm not losing a lot of sleep over this.
But this makes things simpler. And if we *really* care about OpAny or
OpAll, we should add NIR ALU instructions for them so we can optimize
them on other places as well, not open-code all places where it could
improve things.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9797>
Specifically, fix this error (which is covered in existing tests):
../src/compiler/glsl/glcpp/pp.c:198:28: runtime error: applying non-zero offset 1 to null pointer
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../src/compiler/glsl/glcpp/pp.c:198:28 in
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9669>