Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstsrand <jason@jleksrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Fixes: 09705747d7 ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern")
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Most integer operations are omitted because dealing with integer
overflow is hard. There are a few things that could be smarter if there
was a small amount more tracking of ranges of integer types (i.e.,
operands are Boolean, operand values fit in 16 bits, etc.).
The changes to nir_search_helpers.h are included in this patch to
simplify reordering the changes to nir_opt_algebraic.py.
v2: Memoize range analysis results. Without this, some shaders appear
to get stuck in infinite loops.
v3: Rebase on many months of Mesa changes, including 1-bit Boolean
changes.
v4: Rebase on "nir: Drop imov/fmov in favor of one mov instruction".
v5: Use nir_alu_srcs_equal for detecting (a*a). Previously just the SSA
value was compared, and this incorrectly matched (a.x*a.y).
v6: Many code improvements including (but not limited to) better names,
more comments, and better use of helper functions. All suggested by
Caio. Rework the handling of several opcodes to use a table for mapping
source ranges to a result range. This change fixed a bug that caused
fmax(gt_zero, ge_zero) to be incorrectly recognized as ge_zero.
Slightly tighten the range of fmul by recognizing that x*x is gt_zero if
x is gt_zero. Add similar handling for -x*x.
v7: Use _______ in the tables as an alias for unknown. Suggested by
Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The helpers are needed so we can use the syntax `instr(cond)` in the
algebraic rules. Add simple rule for dropping a pair of mul-div of
the same value when wrapping is guaranteed to not happen.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For umul_low (al * bl), zero is returned if the low 16-bits word of either
source is zero.
for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.
A couple of nir_search_helpers are added:
is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src are zero.
is_lower_half_zero() returns true if the lowest word of all components of
an integer nir alu src are zero.
Reviewed-by: Eric Anholt <eric@anholt.net>
The goal is to avoid having an extra MOV instruction to perform the
saturate. Doing the subtraction first allows the saturate to be applied
to the ADD instruction making the MOV unnecessary. Values generated in
different block and values from non-ALU instructions (e.g., texture
instructions) almost always need the extra MOV.
Multiply instructions are restricted because doing this rearrangement
can interfere with the generation of flrp and ffma instructions.
v2: Now that the final method has been selected, squash three commits
into one.
All Intel platforms has similar results. (Ice Lake shown)
total instructions in shared programs: 17223214 -> 17219386 (-0.02%)
instructions in affected programs: 1524376 -> 1520548 (-0.25%)
helped: 2686
HURT: 26
helped stats (abs) min: 1 max: 32 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.03% max: 16.67% x̄: 0.54% x̃: 0.37%
HURT stats (abs) min: 1 max: 2 x̄: 1.69 x̃: 2
HURT stats (rel) min: 0.33% max: 1.67% x̄: 0.54% x̃: 0.35%
95% mean confidence interval for instructions value: -1.46 -1.36
95% mean confidence interval for instructions %-change: -0.56% -0.50%
Instructions are helped.
total cycles in shared programs: 360811571 -> 360791896 (<.01%)
cycles in affected programs: 103650214 -> 103630539 (-0.02%)
helped: 1557
HURT: 675
helped stats (abs) min: 1 max: 1773 x̄: 41.44 x̃: 16
helped stats (rel) min: <.01% max: 26.77% x̄: 1.37% x̃: 0.64%
HURT stats (abs) min: 1 max: 1513 x̄: 66.44 x̃: 14
HURT stats (rel) min: <.01% max: 46.16% x̄: 2.00% x̃: 0.49%
95% mean confidence interval for cycles value: -14.82 -2.81
95% mean confidence interval for cycles %-change: -0.50% -0.20%
Cycles are helped.
LOST: 2
GAINED: 0
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This pass attempts to dectect code sequences like
if (x < y) {
z = y - x;
...
}
and replace them with sequences like
t = x - y;
if (t < 0) {
z = -t;
...
}
On architectures where the subtract can generate the flags used by the
if-statement, this saves an instruction. It's also possible that moving
an instruction out of the if-statement will allow
nir_opt_peephole_select to convert the whole thing to a bcsel.
Currently only floating point compares and adds are supported. Adding
support for integer will be a challenge due to integer overflow. There
are a couple possible solutions, but they may not apply to all
architectures.
v2: Fix a typo in the commit message and a couple typos in comments.
Fix possible NULL pointer deref from result of push_block(). Add
missing (-A + B) case. Suggested by Caio.
v3: Fix is_not_const_zero to work correctly with types other than
nir_type_float32. Suggested by Ken.
v4: Add some comments explaining how this works. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This not only makes them safe for more bit sizes but it also fixes a bug
in is_zero_to_one where it would return true for constant NaN.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
V2: mark float opts as inexact
If one of the inputs to an mul/add is the result of another
mul/add there is a chance that we can reuse the result of that
mul/add in other calls if we do the multiplication in the right
order.
Also by attempting to move all constants to the top we increase
the chance of constant folding.
For example it is a fairly common pattern for shaders to do something
similar to this:
const float a = 0.5;
in vec4 b;
in float c;
...
b.x = b.x * c;
b.y = b.y * c;
...
b.x = b.x * a + a;
b.y = b.y * a + a;
So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:
...
c = a * c;
...
b.x = b.x * c + a;
b.y = b.y * c + a;
Shader-db results BDW:
total instructions in shared programs: 13011050 -> 12967888 (-0.33%)
instructions in affected programs: 4118366 -> 4075204 (-1.05%)
helped: 17739
HURT: 1343
total cycles in shared programs: 246717952 -> 246410716 (-0.12%)
cycles in affected programs: 166870802 -> 166563566 (-0.18%)
helped: 18493
HURT: 7965
total spills in shared programs: 14937 -> 14560 (-2.52%)
spills in affected programs: 9331 -> 8954 (-4.04%)
helped: 284
HURT: 33
total fills in shared programs: 20211 -> 19671 (-2.67%)
fills in affected programs: 12586 -> 12046 (-4.29%)
helped: 286
HURT: 33
LOST: 39
GAINED: 33
Some of the hurt will go away when we shuffle things back down to the
bottom in the following patch. It's also noteworthy that almost all of the
spill changes are in Deus Ex both hurt and helped.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Didn't turn out as useful as I'd hoped, but it will help alot more on
i965 by reducing regressions when we drop brw_do_channel_expressions()
and brw_do_vector_splitting().
I'm not sure how much sense 'is_not_used_by_conditional' makes on
platforms other than i965 but since this is a new opt it at least
won't do any harm.
shader-db BDW:
total instructions in shared programs: 13029581 -> 13029415 (-0.00%)
instructions in affected programs: 15268 -> 15102 (-1.09%)
helped: 86
HURT: 0
total cycles in shared programs: 247038346 -> 247036198 (-0.00%)
cycles in affected programs: 692634 -> 690486 (-0.31%)
helped: 183
HURT: 27
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Otherwise we will end up with an extra instruction to compare the
result of the inot.
On BDW:
total instructions in shared programs: 13060620 -> 13060481 (-0.00%)
instructions in affected programs: 103379 -> 103240 (-0.13%)
helped: 127
HURT: 0
total cycles in shared programs: 256590950 -> 256587408 (-0.00%)
cycles in affected programs: 11324730 -> 11321188 (-0.03%)
helped: 114
HURT: 21
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We turn these from bcsel into inot/b2f combos in order for other
optimisation passes to get further. Once we have finished turn
the ones that remain and are used in more than a single expression
back into a bcsel.
On BDW:
total instructions in shared programs: 13060965 -> 13060297 (-0.01%)
instructions in affected programs: 835701 -> 835033 (-0.08%)
helped: 670
HURT: 2
total cycles in shared programs: 256599536 -> 256598006 (-0.00%)
cycles in affected programs: 114655488 -> 114653958 (-0.00%)
helped: 419
HURT: 240
LOST: 0
GAINED: 1
The 2 HURT is because inserting bcsel creates the only use of
const 1.0 in two shaders from tri-of-friendship-and-madness.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Some optimizations, like converting integer multiply/divide into left/
right shifts, have additional constraints on the search expression.
Like requiring that a variable is a constant power of two. Support
these cases by allowing a fxn name to be appended to the search var
expression (ie. "a#32(is_power_of_two)").
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>