First, we need to give the parent_instr field a unique name to be able to
replace with a helper. We have parent_instr fields for both nir_src and
nir_def, so let's rename nir_src::parent_instr in preparation for rework.
This was done with a combination of sed and manual fix-ups.
Then we use semantic patches plus manual fixups:
@@
expression s;
@@
-s->renamed_parent_instr
+nir_src_parent_instr(s)
@@
expression s;
@@
-s.renamed_parent_instr
+nir_src_parent_instr(&s)
@@
expression s;
@@
-s->parent_if
+nir_src_parent_if(s)
@@
expression s;
@@
-s.renamed_parent_if
+nir_src_parent_if(&s)
@@
expression s;
@@
-s->is_if
+nir_src_is_if(s)
@@
expression s;
@@
-s.is_if
+nir_src_is_if(&s)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24671>
Instead, we replace it directly with nir_def. We could replace it with
nir_dest but the next commit gets rid of that so this avoids unnecessary
churn. Most of this commit was generated by sed:
sed -i -e 's/dest.dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
There were a few manual fixups required in the nir_legacy.c and
nir_from_ssa.c as nir_legacy_reg and nir_parallel_copy_entry both have a
similar pattern.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:
@@
expression A, B, C, D, E;
@@
-nir_ssa_dest_init(A, B, C, D, E);
+nir_ssa_dest_init(A, B, C, D);
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
Builds on the work of !15121. This gets to delete even more code
because many drivers shared a lot of code for i2b and f2b.
No shader-db or fossil-db changes on any Intel platform.
v2: Rebase on 1a35acd8d9.
v3: Update a comment in nir_opcodes_c.py. Suggested by Konstantin.
v4: Another rebase. Remove f2b stuff from Midgard.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20509>
There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b. Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.
At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.
The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.
v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.
v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.
v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).
v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.
v6: Just eliminate the i2b instruction.
v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷
No shader-db changes on any Intel platform.
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2
Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2
The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.
Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
Soon we'll be allocating instructions out of a per-shader pool, which
means that if we don't free too many instructions during the main
optimization loop, the final nir_sweep() call will create holes which
can't be filled. By freeing instructions more aggressively, we can
allocate more instructions from the freelist which will reduce the final
memory usage.
Modified from Connor Abbott's original patch to rebase on top of
refactored DCE and so that the use-after-free in nir_algebraic_impl() is
fixed.
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12910>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each variable assuming
variables mostly don't have conditions or the conditions are mostly
reused). Reduces libvulkan_intel.so size by 49kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
This helps concentrate the dirty pages from the relocations, reduces how
many relocations there are, and reduces the size of each expression
(assuming expressions mostly don't have conditions or the conditions are
mostly reused). Reduces libvulkan_intel.so size by 8.7kb.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Even with packing all 3 types into a 40-byte union (nir_search_constant
being 24 bytes and nir_search_expression having formerly been 32), and
having a single array of them, this cuts 1.7MB from each of
libvulkan_intel.so and libgallium_dri.so.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13987>
Many places need to know the maximum or minimum possible value for a
given size integer... so everyone just open-codes their favorite
version. There is some potential to hit either undefined or
implementation-defined behavior, so having one version that Just Works
seems beneficial.
v2: Fix copy-and-pasted bug (INT64_MAX instead of INT64_MIN) in
u_intmin. Noticed by CI. Lol. Rename functions
`s/u_(uint|int)(min|max)/u_\1N_\2/g`. Suggested by Jason. Add some
unit tests that would have caught the copy-and-paste bug before wasting
CI time. Change the implementation of u_intN_min to use the same
pattern as stdint.h. This avoids the integer division. Noticed by
Jason.
v3: Add changes to convert_clear_color
(src/gallium/drivers/iris/iris_clear.c). Suggested by Nanley.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12177>
Basically every pass in NIR uses nir_ssa_def_rewrite_uses which calls
nir_instr_rewrite_src which is fairly complex because it handles all
sorts of non-SSA cases. Since we already know a priori that every
source written by nir_ssa_def_rewrite_uses is SSA, we can check new_src
once at the top of the function and cut out all that complexity.
While we're at it, we expose a new SSA-only nir_ssa_def_rewrite_uses_ssa
helper which takes an SSA def which avoids the one SSA check. It's also
more convenient 90% of the time.
Compile time as tested by Rhys Perry <pendingchaos02@gmail.com>
Difference at 95.0% confidence
-797.166 +/- 418.649
-0.566174% +/- 0.296441%
(Student's t, pooled s = 325.459)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8790>
If the expression tree that is being replaced has a unary operation at
its root, set the cursor (location where new instructions are inserted)
at the source instruction instead.
This doesn't do much now because there are very few patterns that have a
unary operation as the root. Almost all of the patterns that do have a
unary operation as the root have inot. All of the shaders that are
affected by this commit have expression trees with an inot at the root.
This change prevents some significant, spurious caused by the next
commit. There is further explanation in the large comment added in
the code.
I also considered a couple other options that may still be worth exploring.
1. Add some mark-up to the search pattern to denote where new
instructions should be added. I considered using "@" to denote the
cursor location. For example,
(('fneg', ('fadd@', a, b)), ...)
2. To prevent other kinds of unintended code motion, add the ability to
name expressions in the search pattern so that they can be reused in
the replacement. For example,
(('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b),
An alternative would be to add some kind of CSE at the time of
inserting the replacements. Create a new instruction, then check to
see if it already exists. That option might be better overall.
Over the years I know Matt has heard me complain, "I added a pattern
that just deleted an instruction, but it added a bunch of spills!" This
was always in large, complex shaders that are very hard to analyze. I
always blamed these cases on the scheduler being dumb. I am now very
suspicious that unintended code motion was the real problem.
All Gen4+ Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611405 -> 17611333 (<.01%)
instructions in affected programs: 18613 -> 18541 (-0.39%)
helped: 41
HURT: 13
helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4
helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34%
HURT stats (abs) min: 1 max: 20 x̄: 8.54 x̃: 7
HURT stats (rel) min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38%
95% mean confidence interval for instructions value: -3.29 0.63
95% mean confidence interval for instructions %-change: -0.95% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 338366118 -> 338365223 (<.01%)
cycles in affected programs: 257889 -> 256994 (-0.35%)
helped: 42
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34
helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76%
HURT stats (abs) min: 6 max: 204 x̄: 50.60 x̃: 34
HURT stats (rel) min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56%
95% mean confidence interval for cycles value: -30.39 -1.02
95% mean confidence interval for cycles %-change: -0.66% -0.02%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
This introduces new vec8 and vec16 instructions (which are the only
instructions taking more than 4 sources), in order to construct 8 and 16
component vectors.
In order to avoid fixing up the non-autogenerated nir_build_alu() sites
and making them pass 16 src args for the benefit of the two instructions
that take more than 4 srcs (ie vec8 and vec16), nir_build_alu() is has
nir_build_alu_tail() split out and re-used by nir_build_alu2() (which is
used for the > 4 src args case).
v2 (Karol Herbst):
use nir_build_alu2 for vec8 and vec16
use python's array multiplication syntax
add nir_op_vec helper
simplify nir_vec
nir_build_alu_tail -> nir_builder_alu_instr_finish_and_insert
use nir_build_alu for opcodes with <= 4 sources
v3 (Karol Herbst):
fix nir_serialize
v4 (Dave Airlie):
fix serialization of glsl_type
handle vec8/16 in lowering of bools
v5 (Karol Herbst):
fix load store vectorizer
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The algebraic pass was exhibiting O(n^2) behavior in
dEQP-GLES2.functional.uniform_api.random.3 and
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 (along with
other code-generated tests, and likely real-world loop-unroll cases).
In the process of using fmul(b2f(x), b2f(x)) -> b2f(iand(x, y)) to
transform:
result = b2f(a == b);
result *= b2f(c == d);
...
result *= b2f(z == w);
->
temp = (a == b)
temp = temp && (c == d)
...
temp = temp && (z == w)
result = b2f(temp);
nir_opt_algebraic, proceeding bottom-to-top, would match and convert
the top-most fmul(b2f(), b2f()) case each time, leaving the new b2f to
be matched by the next fmul down on the next time algebraic got run by
the optimization loop.
Back in 2016 in 7be8d07732 ("nir: Do opt_algebraic in reverse
order."), Matt changed algebraic to go bottom-to-top so that we would
match the biggest patterns first. This helped his cases, but I
believe introduced this failure mode. Instead of reverting that, now
that we've got the automaton, we can update the automaton's state
recursively and just re-process any instructions whose state has
changed (indicating that they might match new things). There's a
small chance that the state will hash to the same value and miss out
on this round of algebraic, but this seems to be good enough to fix
dEQP.
Effects with NIR_VALIDATE=0 (improvement is better with validation enabled):
Intel shader-db runtime -0.954712% +/- 0.333844% (n=44/46, obvious throttling
outliers removed)
dEQP-GLES2.functional.uniform_api.random.3 runtime
-65.3512% +/- 4.22369% (n=21, was 1.4s)
dEQP-GLES31.functional.ubo.random.all_per_block_buffers.13 runtime
-68.8066% +/- 6.49523% (was 4.8s)
v2: Use two worklists, suggested by @cwabbott, to cut out a bunch of
tricky code. Runtime of uniform_api.random.3 down -0.790299% +/-
0.244213% compred to v1.
v3: Re-add the nir_instr_remove() that I accidentally dropped in v2,
fixing infinite loops.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
My motivation was to clarify the changes in the following commit, but
incidentally, it reduces runtime of
dEQP-GLES2.functional.uniform_api.random.3 (an algebraic-heavy
testcase) by -5.39524% +/- 2.21179% (n=15)
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
In order to have nir_opt_algebraic be able to do further algebraic
work on the output of a replacement, we need to maintain the
automaton's state.
Reviewed-by: Eric Anholt <eric@anholt.net>
Working on the algebraic implementation, I was being driven nuts by my
editor not highlighting and handling indentation for the C code. It turns
out that it's basically not pass-specific code, and we can move it over to
the relevant .c file. Replaces 30KB of code with 34KB of data on my i965
build. No perf diff on shader-db (n=3)
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Having passes generate these is just making more work for copy
propagation (and thus probably calling more optimization passes)
later. Noticed while trying to debug nir_opt_algebraic()
top-to-bottom having O(n^2) behavior due to not finding new matches in
replacement code.
Reviewed-by: Ian Romanick <ian.d.romainck@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
MAYBE_UNUSED is going away, so let's replace legitimate uses of it with
UNUSED, which the former aliased to so far anyway.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
No shader-db change on any Intel platform. No shader-db run-time
difference on a certain 36-core / 72-thread system at 95% confidence
(n=20).
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>