pan/bi: Optimize bitwise arithmetic of booleans
This is easier to schedule on Bifrost. In theory it's also better on Valhall,
but in practice the CVT unit is too overloaded on Valhall for this to help
at the moment. We can revisit these rules for Valhall in the future, once the
Valhall optimizer is more mature and/or Valhall grows a scheduler to balance the
execution units.
total instructions in shared programs: 2415350 -> 2414877 (-0.02%)
instructions in affected programs: 120948 -> 120475 (-0.39%)
helped: 192
HURT: 49
helped stats (abs) min: 1.0 max: 5.0 x̄: 2.89 x̃: 4
helped stats (rel) min: 0.25% max: 4.35% x̄: 0.66% x̃: 0.52%
HURT stats (abs) min: 1.0 max: 3.0 x̄: 1.67 x̃: 1
HURT stats (rel) min: 0.11% max: 7.14% x̄: 1.73% x̃: 0.77%
95% mean confidence interval for instructions value: -2.24 -1.68
95% mean confidence interval for instructions %-change: -0.37% 0.02%
Inconclusive result (%-change mean confidence interval includes 0).
total tuples in shared programs: 1928474 -> 1927478 (-0.05%)
tuples in affected programs: 146482 -> 145486 (-0.68%)
helped: 514
HURT: 73
helped stats (abs) min: 1.0 max: 8.0 x̄: 2.11 x̃: 1
helped stats (rel) min: 0.18% max: 9.52% x̄: 1.35% x̃: 0.76%
HURT stats (abs) min: 1.0 max: 2.0 x̄: 1.23 x̃: 1
HURT stats (rel) min: 0.15% max: 7.14% x̄: 1.07% x̃: 0.76%
95% mean confidence interval for tuples value: -1.85 -1.55
95% mean confidence interval for tuples %-change: -1.19% -0.91%
Tuples are helped.
total clauses in shared programs: 354985 -> 354853 (-0.04%)
clauses in affected programs: 8562 -> 8430 (-1.54%)
helped: 124
HURT: 22
helped stats (abs) min: 1.0 max: 8.0 x̄: 1.24 x̃: 1
helped stats (rel) min: 0.83% max: 7.14% x̄: 2.47% x̃: 1.72%
HURT stats (abs) min: 1.0 max: 1.0 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.25% max: 20.00% x̄: 5.08% x̃: 4.35%
95% mean confidence interval for clauses value: -1.11 -0.70
95% mean confidence interval for clauses %-change: -1.92% -0.75%
Clauses are helped.
total cycles in shared programs: 166575.48 -> 166542.56 (-0.02%)
cycles in affected programs: 4556.58 -> 4523.67 (-0.72%)
helped: 395
HURT: 65
helped stats (abs) min: 0.041665999999999315 max: 0.33333199999999863 x̄: 0.09 x̃: 0
helped stats (rel) min: 0.19% max: 11.11% x̄: 1.42% x̃: 0.81%
HURT stats (abs) min: 0.041665999999999315 max: 0.08333400000000069 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.15% max: 8.33% x̄: 1.21% x̃: 0.83%
95% mean confidence interval for cycles value: -0.08 -0.06
95% mean confidence interval for cycles %-change: -1.22% -0.87%
Cycles are helped.
total arith in shared programs: 73687.88 -> 73643 (-0.06%)
arith in affected programs: 6339 -> 6294.13 (-0.71%)
helped: 570
HURT: 72
helped stats (abs) min: 0.041665999999999315 max: 0.3333340000000007 x̄: 0.08 x̃: 0
helped stats (rel) min: 0.19% max: 12.50% x̄: 1.41% x̃: 0.77%
HURT stats (abs) min: 0.041665999999999315 max: 0.08333400000000069 x̄: 0.05 x̃: 0
HURT stats (rel) min: 0.15% max: 8.33% x̄: 1.13% x̃: 0.75%
95% mean confidence interval for arith value: -0.08 -0.06
95% mean confidence interval for arith %-change: -1.27% -0.98%
Arith are helped.
total quadwords in shared programs: 1674486 -> 1673974 (-0.03%)
quadwords in affected programs: 117696 -> 117184 (-0.44%)
helped: 424
HURT: 127
helped stats (abs) min: 1.0 max: 6.0 x̄: 1.64 x̃: 1
helped stats (rel) min: 0.19% max: 4.88% x̄: 1.00% x̃: 0.82%
HURT stats (abs) min: 1.0 max: 5.0 x̄: 1.46 x̃: 1
HURT stats (rel) min: 0.15% max: 6.25% x̄: 1.31% x̃: 0.88%
95% mean confidence interval for quadwords value: -1.07 -0.79
95% mean confidence interval for quadwords %-change: -0.58% -0.36%
Quadwords are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17266>
Committed by: Marge Bot
Parent: e348985cd3
Commit: 1ef20f1f35
@@ -4571,6 +4571,10 @@ bi_optimize_nir(nir_shader *nir, unsigned gpu_id, bool is_blend)
                 NIR_PASS(progress, nir, nir_opt_cse);
         }

+        /* This opt currently helps on Bifrost but not Valhall */
+        if (gpu_id < 0x9000)
+                NIR_PASS(progress, nir, bifrost_nir_opt_boolean_bitwise);
+
         NIR_PASS(progress, nir, nir_lower_alu_to_scalar, bi_scalarize_filter, NULL);
         NIR_PASS(progress, nir, nir_lower_phis_to_scalar, true);
         NIR_PASS(progress, nir, nir_opt_vectorize, bi_vectorize_filter, NULL);
@@ -27,3 +27,4 @@

 bool bifrost_nir_lower_algebraic_late(nir_shader *shader);
 bool bifrost_nir_lower_xfb(nir_shader *shader);
+bool bifrost_nir_opt_boolean_bitwise(nir_shader *shader);
@@ -28,6 +28,20 @@ a = 'a'
 b = 'b'
 c = 'c'

+# In general, bcsel is cheaper than bitwise arithmetic on Mali. On
+# Bifrost, we can implement bcsel as either CSEL or MUX to schedule to either
+# execution unit. On Valhall, bitwise arithmetic may be on the SFU whereas MUX
+# is on the higher throughput CVT unit. We get a zero argument for free relative
+# to the bitwise op, which would be LSHIFT_* internally taking a zero anyway.
+#
+# As such, it's beneficial to reexpress bitwise arithmetic of booleans as bcsel.
+opt_bool_bitwise = [
+    (('iand', 'a@1', 'b@1'), ('bcsel', a, b, False)),
+    (('ior', 'a@1', 'b@1'), ('bcsel', a, a, b)),
+    (('iand', 'a@1', ('inot', 'b@1')), ('bcsel', b, 0, a)),
+    (('ior', 'a@1', ('inot', 'b@1')), ('bcsel', b, a, True)),
+]
+
 algebraic_late = [
     # Canonical form. The scheduler will convert back if it makes sense.
     (('fmul', a, 2.0), ('fadd', a, a)),
@@ -69,6 +83,8 @@ def run():

     print('#include "bifrost_nir.h"')

+    print(nir_algebraic.AlgebraicPass("bifrost_nir_opt_boolean_bitwise",
+                                      opt_bool_bitwise).render())
     print(nir_algebraic.AlgebraicPass("bifrost_nir_lower_algebraic_late",
                                       algebraic_late).render())

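The four rewrites in opt_bool_bitwise are only valid if each bcsel form computes the same 1-bit result as the bitwise expression it replaces. The following is a minimal standalone sketch of that check, not part of the commit: it models bcsel as a plain Python conditional and brute-forces all four input combinations. The names bcsel, RULES, and check_rules are hypothetical helpers introduced for illustration, and the constant 0 in the third rule is modeled as False on the assumption that it denotes a 1-bit false.

```python
from itertools import product


def bcsel(cond: bool, if_true: bool, if_false: bool) -> bool:
    """Model of a select on 1-bit booleans: if_true when cond is set, else if_false."""
    return if_true if cond else if_false


# (label, original bitwise expression, bcsel replacement), mirroring the
# four entries of opt_bool_bitwise. These lambdas are an illustrative model,
# not NIR itself.
RULES = [
    ("iand(a, b)",       lambda a, b: a and b,     lambda a, b: bcsel(a, b, False)),
    ("ior(a, b)",        lambda a, b: a or b,      lambda a, b: bcsel(a, a, b)),
    ("iand(a, inot(b))", lambda a, b: a and not b, lambda a, b: bcsel(b, False, a)),
    ("ior(a, inot(b))",  lambda a, b: a or not b,  lambda a, b: bcsel(b, a, True)),
]


def check_rules() -> dict:
    """Return {label: True/False} for whether each rewrite holds on all inputs."""
    results = {}
    for label, before, after in RULES:
        results[label] = all(
            before(a, b) == after(a, b)
            for a, b in product([False, True], repeat=2)
        )
    return results


if __name__ == "__main__":
    for label, ok in check_rules().items():
        print(f"{label}: {'equivalent' if ok else 'MISMATCH'}")
```

Since each rule has only two boolean inputs, the exhaustive check covers the whole domain. It says nothing about scheduling cost, which is what the shader-db numbers above measure.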