nir,radeonsi: move ffma fusing to late optimizations for better codegen
The freedreno trace changes were suggested by Rob Clark.

ALU performance is higher, because ffma is used more often, but so is
register usage, because trinary opcodes (such as ffma) usually need
at least 3 live registers.

54793 shaders in 33659 tests
Totals:
SGPRS: 2639746 -> 2642938 (0.12 %)
VGPRS: 1534120 -> 1536392 (0.15 %)
Spilled SGPRs: 3541 -> 3618 (2.17 %)
Spilled VGPRs: 33 -> 44 (33.33 %)
Scratch size: 292 -> 312 (6.85 %) dwords per thread
Code Size: 55639836 -> 55620116 (-0.04 %) bytes
Max Waves: 964785 -> 963977 (-0.08 %)

Totals from affected shaders:
SGPRS: 1105800 -> 1108992 (0.29 %)
VGPRS: 635292 -> 637564 (0.36 %)
Spilled SGPRs: 3193 -> 3270 (2.41 %)
Spilled VGPRs: 33 -> 44 (33.33 %)
Scratch size: 36 -> 56 (55.56 %) dwords per thread
Code Size: 31568708 -> 31548988 (-0.06 %) bytes
Max Waves: 319991 -> 319183 (-0.25 %)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6596>
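The fusing rules in this change carry a leading `~`, which in nir_opt_algebraic.py marks a pattern as inexact. As a rough illustration of why fusing a multiply and an add cannot be treated as an exact rewrite, the sketch below compares the two forms on one hand-picked input. It is not Mesa code: the helper names are made up for this example, it uses Python doubles rather than the 32-bit floats common in shaders, and the fused result is emulated with exact rational arithmetic followed by a single rounding.

```python
from fractions import Fraction

def fmul_fadd(a, b, c):
    # Unfused form: the product is rounded to a double, then the sum is rounded.
    return a * b + c

def fused(a, b, c):
    # Emulated fused form (hypothetical helper): compute a*b + c exactly
    # with rationals, then round to a double once.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

# Inputs chosen so the exact product is 1 - 2**-60, which rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused_result = fmul_fadd(a, b, c)
fused_result = fused(a, b, c)
print(unfused_result, fused_result)
```

The two results differ only in rounding, which is why such a rewrite is only acceptable for operations that are not required to be exact.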
@@ -194,7 +194,8 @@ optimizations.extend([
    (('ffract', a), ('fsub', a, ('ffloor', a)), 'options->lower_ffract'),
    (('fceil', a), ('fneg', ('ffloor', ('fneg', a))), 'options->lower_fceil'),
    (('ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->lower_ffma'),
-   (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
+   # Always lower inexact ffma, because it will be fused back by late optimizations (nir_opt_algebraic_late).
+   (('~ffma', a, b, c), ('fadd', ('fmul', a, b), c), 'options->fuse_ffma'),
 
    (('~fmul', ('fadd', ('iand', ('ineg', ('b2i', 'a@bool')), ('fmul', b, c)), '#d'), '#e'),
     ('bcsel', a, ('fmul', ('fadd', ('fmul', b, c), d), e), ('fmul', d, e))),
@@ -2027,6 +2028,7 @@ late_optimizations = [
    (('fneg', a), ('fsub', 0.0, a), 'options->lower_negate'),
    (('ineg', a), ('isub', 0, a), 'options->lower_negate'),
    (('iabs', a), ('imax', a, ('ineg', a)), 'options->lower_iabs'),
+   (('~fadd', ('fmul', a, b), c), ('ffma', a, b, c), 'options->fuse_ffma'),
 
    # These are duplicated from the main optimizations table. The late
    # patterns that rearrange expressions like x - .5 < 0 to x < .5 can create
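The two hunks above lower inexact ffma in the main table and fuse fadd(fmul) back in the late table. A toy sketch of that ordering follows, assuming tuples stand in for NIR ALU instructions. It is not the nir_algebraic machinery: `rewrite`, `fold_add_zero`, and the other helpers are hypothetical, and the add-zero rule is only a stand-in for whatever main-pass rules might benefit from seeing a separate fmul and fadd, which is one plausible reason the split helps codegen.

```python
def lower_ffma(expr):
    # Shape of the main-table rule: ffma(a, b, c) -> fadd(fmul(a, b), c)
    if isinstance(expr, tuple) and expr[0] == 'ffma':
        _, a, b, c = expr
        return ('fadd', ('fmul', a, b), c)
    return expr

def fuse_ffma(expr):
    # Shape of the late-table rule: fadd(fmul(a, b), c) -> ffma(a, b, c)
    if (isinstance(expr, tuple) and expr[0] == 'fadd'
            and isinstance(expr[1], tuple) and expr[1][0] == 'fmul'):
        _, (_, a, b), c = expr
        return ('ffma', a, b, c)
    return expr

def fold_add_zero(expr):
    # Hypothetical inexact main-pass rule: fadd(x, 0.0) -> x
    if isinstance(expr, tuple) and expr[0] == 'fadd' and expr[2] == 0.0:
        return expr[1]
    return expr

def rewrite(expr, rules):
    # Rewrite operands first, then retry this node until no rule applies.
    if not isinstance(expr, tuple):
        return expr
    expr = (expr[0],) + tuple(rewrite(e, rules) for e in expr[1:])
    for rule in rules:
        new = rule(expr)
        if new != expr:
            return rewrite(new, rules)
    return expr

# (a * b + 0.0) + c, with the inner operation already written as an ffma.
shader_expr = ('fadd', ('ffma', 'a', 'b', 0.0), 'c')

main_result = rewrite(shader_expr, [lower_ffma, fold_add_zero])
late_result = rewrite(main_result, [fuse_ffma])
print(main_result)
print(late_result)
```

In this sketch the lowering and fusing rules are never in the same rule list, since together they would rewrite each other's output indefinitely.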