Commit Graph

826 Commits

Author SHA1 Message Date
Francisco Jerez
14e1b9ee69 intel/fs/xe2+: Update TES payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
0b23df3951 intel/compiler/xe2: Update fs_visitor::setup_vs_payload to account for Xe2 reg size
[ Francisco Jerez: Simplify. ]

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
a573531785 intel/compiler/xe2+: Represent dispatch_grf_start_reg in native GRF units.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
17ef5e7ead intel/fs/xe2+: Allow increased SIMD width for various get_fpu_lowered_simd_width() restrictions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
421d43fe62 intel/fs/xe2+: Fixes for increased accumulator register width.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
bd98df5d8e intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Iván Briano
4eddeea7bf intel/fs: handle URB setup for fast linked mesh pipelines
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.

Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
2023-09-12 02:51:31 +00:00
Lionel Landwerlin
c9739e8912 intel/fs: limit register flag interaction of FIND_*LIVE_CHANNEL
Those instructions do not access the flag registers on Gfx8+. Removing
the interaction enables CSE to remove more of those instructions.

Results are a bit mixed (DG2 vulkan fossils):

  ACO:
  Totals from 127 (5.97% of 2128) affected shaders:
  Instrs: 139966 -> 138972 (-0.71%); split: -0.85%, +0.14%
  Cycles: 1685747 -> 1667480 (-1.08%); split: -2.35%, +1.26%
  Max live registers: 10582 -> 10544 (-0.36%)
  Max dispatch width: 1048 -> 1040 (-0.76%)

  Cyberpunk 2077:
  Totals from 2879 (27.95% of 10301) affected shaders:
  Instrs: 4264789 -> 4225666 (-0.92%); split: -1.01%, +0.09%
  Cycles: 72380209 -> 71619521 (-1.05%); split: -1.63%, +0.58%
  Subgroup size: 30624 -> 30632 (+0.03%)
  Spill count: 98 -> 101 (+3.06%)
  Fill count: 90 -> 93 (+3.33%)
  Scratch Memory Size: 8192 -> 9216 (+12.50%)
  Max live registers: 217807 -> 217098 (-0.33%); split: -0.59%, +0.26%
  Max dispatch width: 23792 -> 24112 (+1.34%)

  Gaining 40 SIMD16 shaders

  Rise Of The Tomb Raider:
  Totals from 622 (5.06% of 12289) affected shaders:
  Instrs: 437380 -> 434760 (-0.60%); split: -0.72%, +0.12%
  Cycles: 261843085 -> 261580703 (-0.10%); split: -0.73%, +0.63%
  Max live registers: 27731 -> 27766 (+0.13%); split: -1.01%, +1.14%
  Max dispatch width: 5832 -> 5432 (-6.86%); split: +0.27%, -7.13%

  Loosing 26 SIMD32 shaders

  Strange Brigade:
  Totals from 1298 (31.48% of 4123) affected shaders:
  Instrs: 1504408 -> 1487968 (-1.09%); split: -1.17%, +0.08%
  Cycles: 20735976 -> 20443216 (-1.41%); split: -1.60%, +0.19%
  Max live registers: 89911 -> 89957 (+0.05%)

DG2 shader-db run:

  total instructions in shared programs: 23130895 -> 23130036 (<.01%)
  instructions in affected programs: 260956 -> 260097 (-0.33%)
  helped: 234
  HURT: 101
  helped stats (abs) min: 1 max: 54 x̄: 6.36 x̃: 4
  helped stats (rel) min: 0.05% max: 8.16% x̄: 2.01% x̃: 1.90%
  HURT stats (abs)   min: 1 max: 37 x̄: 6.23 x̃: 3
  HURT stats (rel)   min: 0.02% max: 5.67% x̄: 0.89% x̃: 0.55%
  95% mean confidence interval for instructions value: -3.62 -1.51
  95% mean confidence interval for instructions %-change: -1.33% -0.94%
  Instructions are helped.

  total loops in shared programs: 6071 -> 6071 (0.00%)
  loops in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total cycles in shared programs: 898610645 -> 898557166 (<.01%)
  cycles in affected programs: 18308201 -> 18254722 (-0.29%)
  helped: 315
  HURT: 48
  helped stats (abs) min: 1 max: 19312 x̄: 404.23 x̃: 128
  helped stats (rel) min: 0.02% max: 28.98% x̄: 3.92% x̃: 2.65%
  HURT stats (abs)   min: 2 max: 14478 x̄: 1538.60 x̃: 409
  HURT stats (rel)   min: <.01% max: 23.24% x̄: 3.34% x̃: 0.41%
  95% mean confidence interval for cycles value: -333.68 39.03
  95% mean confidence interval for cycles %-change: -3.51% -2.41%
  Inconclusive result (value mean confidence interval includes 0).

  total spills in shared programs: 5964 -> 5964 (0.00%)
  spills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total fills in shared programs: 6909 -> 6909 (0.00%)
  fills in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  total sends in shared programs: 1040266 -> 1040266 (0.00%)
  sends in affected programs: 0 -> 0
  helped: 0
  HURT: 0

  LOST:   3
  GAINED: 1

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24553>
2023-09-06 14:47:40 +00:00
Kenneth Graunke
08fc4603dd intel/fs: Dump IR for pre-RA scheduler modes in DEBUG_OPTIMIZER
This lets us more easily compare and contrast the various scheduling
options that the compiler considered.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
07f2ad32e4 intel/fs: Pick the lowest register pressure schedule when spilling
We try various pre-RA scheduler modes and see if any of them allow
us to register allocate without spilling.  If all of them spill,
however, we left it on the last mode: LIFO.  This is unfortunately
sometimes significantly worse than other modes (such as "none").

This patch makes us instead select the pre-RA scheduling mode that
gives the lowest register pressure estimate, if none of them manage
to avoid spilling.  The hope is that this scheduling will spill the
least out of all of them.

fossil-db stats (on Alchemist) speak for themselves:

    Totals:
    Instrs: 197297092 -> 195326552 (-1.00%); split: -1.02%, +0.03%
    Cycles: 14291286956 -> 14303502596 (+0.09%); split: -0.55%, +0.64%
    Spill count: 190886 -> 129204 (-32.31%); split: -33.01%, +0.70%
    Fill count: 361408 -> 225038 (-37.73%); split: -39.17%, +1.43%
    Scratch Memory Size: 12935168 -> 10868736 (-15.98%); split: -16.08%, +0.10%

    Totals from 1791 (0.27% of 668386) affected shaders:
    Instrs: 7628929 -> 5658389 (-25.83%); split: -26.50%, +0.67%
    Cycles: 719326691 -> 731542331 (+1.70%); split: -10.95%, +12.65%
    Spill count: 110627 -> 48945 (-55.76%); split: -56.96%, +1.20%
    Fill count: 221560 -> 85190 (-61.55%); split: -63.89%, +2.34%
    Scratch Memory Size: 4471808 -> 2405376 (-46.21%); split: -46.51%, +0.30%

Improves performance when using XeSS in Cyberpunk 2077 by 90% on A770.
Improves performance of Borderlands 3 by 1.54% on A770.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
158ac265df intel/fs: Make helpers for saving/restoring instruction order
This moves a bit of code out of a large function, but also lets us reuse
it a few extra places in the next commit.

I opted to stop using ralloc here since this is short-lived data that
doesn't need to stick around for the rest of the compile, and it's easy
enough to free.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
2dd56921c9 intel/fs: Index scheduler mode string table by mode enum
pre_modes[] is an array with the modes ordered in our desired
preference.  scheduler_mode_name[] was also in that order, and the two
had to be kept in sync.  This is a little silly; we should just have
a mode enum -> string table and look it up via the enum.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
7eba19245d intel/compiler: Move SCHEDULE_NONE handling into schedule_instructions()
I'm going to introduce another call site for this function, and just
handling SCHEDULE_NONE in the scheduler itself makes more sense than
duplicating the logic.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
743fd60bea intel/fs: Account for payload GRFs when calculating register pressure
The register pressure analysis I wrote in 2013 only considered VGRFs,
and not other GRFs, such as payload registers and push constants.  We
need to consider those too, because payload registers definitely occupy
space and add to pressure.

In 2015, Connor already made the scheduler account for this, so the only
real use for this is in shader statistic dumps and optimizer printouts.
But we should make it more accurate.  (We will use it in more places
shortly, a few commits from now.)

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24707>
2023-08-23 21:34:38 +00:00
Kenneth Graunke
d7daf78f62 intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL for DEBUG_OPTIMIZER
If the NIR_DEBUG_PRINT_INTERNAL flag is not set, don't print debugging
information for internal shaders in INTEL_DEBUG=optimizer dumps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24684>
2023-08-17 18:19:53 +00:00
Matt Turner
d142c845d0 Revert "intel/fs: only avoid SIMD32 if strictly inferior in throughput"
This reverts commit 6b494745be.

The logic is not entirely correct: the comparison is between two
static-analysis estimates of a dynamic system with variables that aren't
captured by the shader source, so using ">" will always have greater potential
to cause regressions whenever the performance difference between the two builds
is something not captured by the static model, no matter how much the model is
improved.

Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24615>
2023-08-16 14:56:15 +00:00
Faith Ekstrand
4695bebc79 nir: Drop nir_dest
Instead, we replace every use of it with nir_def.  Most of this commit
was generated by sed:

   sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp

A few manual fixups were required in lima and the nir_legacy code.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Lionel Landwerlin
6f694432e4 intel/fs: add variable for output of debug backend optimizer
It can be useful to compare 2 runs with different compiler changes.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
2023-08-10 06:39:57 +00:00
Lionel Landwerlin
0e244d56e3 intel/fs: track more steps with INTEL_DEBUG=optimizer
One particular nice thing to have is the first generated backend IR
before validation. Especially if you made a mistake in the NIR
translation, you can at least look at it before validation tells you
off.

Then the last 2 steps of the optimize() function can be interesting to
look at.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24552>
2023-08-10 06:39:57 +00:00
Lionel Landwerlin
9934613c74 anv/hasvk: track robustness per pipeline stage
And split them into UBO and SSBO

v2 (Lionel):
 - Get rid of robustness fields in anv_shader_bin
v3 (Lionel):
 - Do not pass unused parameters around

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17545>
2023-08-09 09:00:12 +03:00
Alyssa Rosenzweig
ab0d878932 treewide: Remove more is_ssa asserts
Stuff Coccinelle missed.

   sed -i -e '/assert(.*\.is_ssa)/d' $(git grep -l is_ssa)
   sed -i -e '/ASSERT.*\.is_ssa)/d' $(git grep -l is_ssa)

+ a manual fixup to restore the assert for parallel copy lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432>
2023-08-03 22:40:28 +00:00
Yonggang Luo
86bcc90c0e intel/compiler,intel/blorp,intel/vulkan: decouple vulkan driver and compiler from gallium
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24438>
2023-08-03 22:00:15 +00:00
Lionel Landwerlin
d33aff783d intel/fs: add support for sparse accesses
Purely from the backend point of view it's just an additional
parameter to sampler messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:30 +03:00
Lionel Landwerlin
d099e47de0 intel/fs: add more UNDEFs around SEND messages
lower_find_live_channel() in particular is used a lot in control flow
to find the live channel for the surface/sampler handle. Adding UNDEFs
on the temporary registers used for finding the live channels helps
reduce the liveness of those temporary registers, especially in loops.

Some titles affected :

Rise Of The Tomb Raider:
Totals from 2780 (22.58% of 12311) affected shaders:
Instrs: 1294455 -> 1294592 (+0.01%); split: -0.15%, +0.16%
Cycles: 1473136441 -> 1471302617 (-0.12%); split: -1.52%, +1.40%
Max live registers: 144282 -> 143595 (-0.48%)
Max dispatch width: 22200 -> 22232 (+0.14%)

Red Dead Redemption 2:
Totals from 435 (7.28% of 5972) affected shaders:
Instrs: 488472 -> 487594 (-0.18%); split: -0.31%, +0.14%
Cycles: 11354732 -> 11384928 (+0.27%); split: -0.44%, +0.71%
Spill count: 1217 -> 1172 (-3.70%)
Fill count: 3521 -> 3447 (-2.10%)
Scratch Memory Size: 64512 -> 62464 (-3.17%)
Max live registers: 35997 -> 35798 (-0.55%)

Fallout 4:
Totals from 8 (0.49% of 1638) affected shaders:
Instrs: 41908 -> 40509 (-3.34%)
Cycles: 3638464 -> 3555680 (-2.28%); split: -2.67%, +0.39%
Spill count: 717 -> 665 (-7.25%)
Fill count: 2542 -> 2438 (-4.09%)
Scratch Memory Size: 32768 -> 16384 (-50.00%)
Max live registers: 567 -> 534 (-5.82%)

Cyberpunk 2077:
Totals from 2984 (28.97% of 10301) affected shaders:
Instrs: 3888874 -> 3891600 (+0.07%); split: -0.20%, +0.27%
Cycles: 67906489 -> 67767721 (-0.20%); split: -0.68%, +0.47%
Spill count: 200 -> 98 (-51.00%)
Fill count: 237 -> 90 (-62.03%)
Scratch Memory Size: 10240 -> 8192 (-20.00%)
Max live registers: 215715 -> 212727 (-1.39%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:33 +00:00
Lionel Landwerlin
5c72724819 intel/fs: consider UNDEF as non-partial write
A few titles show max live register reductions, but nothing
significant in instruction count or other stats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:32 +00:00
Ian Romanick
cb0de0a1d3 intel/fs: Constant fold OR and AND
The path taken in fs_visitor::swizzle_nir_scratch_addr for DG2 generates
some AND and OR instructions before the SHL. This commit folds those so
the whold calculation becomes a constant (like on older platforms).

v2: Fix return type of src_as_uint. Noticed by Marcin.

shader-db results:

DG2
total instructions in shared programs: 23190475 -> 23179540 (-0.05%)
instructions in affected programs: 36026 -> 25091 (-30.35%)
helped: 7 / HURT: 0

total cycles in shared programs: 841196807 -> 841142563 (<.01%)
cycles in affected programs: 1660670 -> 1606426 (-3.27%)
helped: 7 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780372 -> 197773966 (-0.00%)
Cycles: 14066410782 -> 14066399378 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 8438104 -> 8438112 (+0.00%)
Send messages: 8049445 -> 8049446 (+0.00%)
Scratch Memory Size: 14263296 -> 14264320 (+0.01%)

Totals from 9 (0.00% of 668055) affected shaders:
Instrs: 24547 -> 18141 (-26.10%)
Cycles: 1984791 -> 1973387 (-0.57%); split: -0.98%, +0.40%
Subgroup size: 88 -> 96 (+9.09%)
Send messages: 867 -> 868 (+0.12%)
Scratch Memory Size: 69632 -> 70656 (+1.47%)

No fossil-db changes on any older Intel platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
61c786bad5 intel/fs: Constant fold SHL
This is a modified version of a commit originally in !7698. This version
add the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.

DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.

What follows is the commit change history from that older MR.

v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`.  However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`.  That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.

v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't matically come back.

v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").

v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.

shader-db results:

DG2
No changes.

Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0

total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0

Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0

total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0

total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0

total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%

Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%

Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)

Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
56e6186dcf intel/fs: Always do opt_algebraic after opt_copy_propagation makes progress
opt_copy_propagation can create invalid instructions like

    shl(8) vgrf96:UD, 2d, 8u

These instructions will be cleaned up by opt_algebraic.  The irony is
opt_algebraic converts these to simple mov instructions that
opt_copy_propagation should clean up.  I don't think we want a loop like

   do {
      progress = false;
      if (OPT(opt_copy_propagation)) {
         OPT(opt_algebraic);
         OPT(dead_code_eliminate);
      }
   } while (progress);

But maybe we do?

Maybe this would be sufficient:

   while (OPT(opt_copy_propagation))
      OPT(opt_algebraic);
   OPT(dead_code_eliminate);

No shader-db or fossil-db changes (yet) on any Intel platform.  This is
expected.

v2: Do opt_algebraic immediately after every call to
opt_copy_propagation instead of being clever. Suggested by Lionel.

Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Marcin Ślusarz
c1685f08dd intel/compiler,anv: put some vertex and primitive data in headers
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).

Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.

Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.

Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.

This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.

Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.

16/16 is better than 24/16, so packing makes sense.

Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.

24/24 is worse than 24/16, so packing doesn't make sense.

This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Marcin Ślusarz
a252123363 intel/compiler/mesh: compactify MUE layout
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
   float[N] requires N 1-dword slots, but
   i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Lionel Landwerlin
3384f029be intel/compiler: rework input parameters
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Marcin Ślusarz
36ff6c0004 intel/compiler: remove NV_mesh_shader support
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24071>
2023-07-14 08:27:14 +00:00
Lionel Landwerlin
c26c0a36d3 intel/fs: disable coarse pixel shader with interpolater messages at sample
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9292
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23962>
2023-07-06 12:48:52 +00:00
Yonggang Luo
68b8aa788d intel/compiler: Switch to use nir_foreach_function_impl
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23920>
2023-06-29 11:29:54 +00:00
Ian Romanick
ed5d346868 intel/fs: Add missing newline
Emacs will add a newline to the end of this file whether I've edited
that line or not. It was driving me up the wall, so... yeah.

Trivial.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23777>
2023-06-21 19:57:58 +00:00
Caio Oliveira
fde8bf7b7f intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL flag
If flag is not set, don't print debugging
information for internal shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23756>
2023-06-21 00:01:10 +00:00
Lionel Landwerlin
ff3494fce3 intel/fs: print identation for control flow
INTEL_DEBUG=optimizer output changes from :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42: txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43: and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44: cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45: (+f0.0) if(8) (null):UD,
{ 11}   46: mov(8) vgrf270:D, -1082130432d
{ 12}   47: mov(8) vgrf271:D, 1082130432d
{ 14}   48: mov(8) vgrf274+0.0:D, 0d
{ 14}   49: mov(8) vgrf274+1.0:D, 0d

to :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42:   txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43:   and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44:   cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45:   (+f0.0) if(8) (null):UD,
{ 11}   46:     mov(8) vgrf270:D, -1082130432d
{ 12}   47:     mov(8) vgrf271:D, 1082130432d
{ 14}   48:     mov(8) vgrf274+0.0:D, 0d
{ 14}   49:     mov(8) vgrf274+1.0:D, 0d

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Caio Oliveira
2bb26cc01d intel/compiler: Refactor dump_instruction(s)
Delete unnecessary virtual functions, we need just two.  Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.

Rename the virtuals so overrides don't hide the common convenience
functions.  Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
2023-06-08 22:00:21 +00:00
Ian Romanick
4cc3206218 intel/fs: Don't munge source order of 3-src instructions in opt_algebraic
This only impacts ADD3, so at this point it should not have any
affect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.

The worst part is, the ADD3 instrutions that are broken by the old code
aren't even "progress" on this pass.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Kenneth Graunke
2d9a3bb093 intel/compiler: Fix a fallthrough in components_read() for atomics
In commit 284f0c9a57 I refactored the
handling of the data source to just call a helper rather than special
casing opcodes with 0 or 2 sources.  Unfortunately, I also dropped the
"else return 1", creating a fallthrough for all sources other than
SURFACE_LOGICAL_SRC_ADDRESS and SURFACE_LOGICAL_SRC_DATA.

The case below happened to return the correct value for all cases except
SURFACE_LOGICAL_SRC_SURFACE, which has been returning 2 instead of 1
since that commit.

Restore the else case.  Thanks to Marcin Ślusarz for catching this.

Fixes: 284f0c9a57 ("intel/compiler: Add an lsc_op_num_data_values() helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23347>
2023-06-01 21:06:57 +00:00
Rohan Garg
ef2b763d9c anv: fix incorrect asserts when combining CPS and per sample interpolation
CPS is dynamically turned off when per sample interpolation is active.
Update the asserts to reflect this.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5644011f06 ("intel/compiler: Convert wm_prog_key::persample_interp to a tri-state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23103>
2023-05-31 19:26:59 +00:00
Lionel Landwerlin
ad9bc1ffb5 intel/fs: enable UBO accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
e09cfda0de intel/fs: lower get_buffer_size like other logical sends
This will also enable the use of the bindless heap.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Lionel Landwerlin
21c7b55f6f intel/fs: fix size_read() for LOAD_PAYLOAD
With Anv/Zink, the piglit test :

  arb_shader_storage_buffer_object-max-ssbo-size -auto -fbo fsexceed

is failing validation after copy propagation :

load_payload(8) vgrf15:F, vgrf1+0.12<0>:F, vgrf1+0.0<0>:F, vgrf1+0.4<0>:F, vgrf1+0.8<0>:F, vgrf1+0.12<0>:F
../src/intel/compiler/brw_fs_validate.cpp:191: A <= B failed
  A = inst->src[i].offset / REG_SIZE + regs_read(inst, i) = 2
  B = alloc.sizes[inst->src[i].nr] = 1

In most cases it works because src[0] would be at offset 0 and so
reading a full reg passes validation, but Anv/Zink started emitting
slightly different code adding an offset maybe the size read 2 GRFs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23126>
2023-05-23 12:39:08 +00:00
Rohan Garg
a15cc833f9 intel: drop unused is_scalar function parameter in brw_nir_apply_key
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
212810ac8a intel: infer scalar'ness locally for brw_postprocess_nir
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Lionel Landwerlin
b4b17f8aaa Revert "intel/compiler: make uses_pos_offset a tri-state"
This reverts commit 5489033fa8.

The problem I was trying to address is that we were programming the
3DSTATE_PS::PositionXYOffsetSelect bit differently with GPL (CENTROID)
than without (NONE).

I failed to understand that this bit also impacts the thread payload
layout. GPL fragment shaders don't know ahead of time if pos_offset is
going to be used. It'll be choosen at runtime base on push constant
bits. So we need to program this bit different just to have a payload
matching the compiled shader code.

This fixes the freedoom replay with GPL FS shader in SIMD32.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22938>
2023-05-11 08:01:46 +00:00
Lionel Landwerlin
5489033fa8 intel/compiler: make uses_pos_offset a tri-state
This value depends on the per-sample value which can be unknown at
compile time with graphics pipeline libraries. So we need to have this
dynamic has well and pick the right value when generating the
3DSTATE_PS/3DSTATE_WM packet.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d8dfd153c5 ("intel/fs: Make per-sample and coarse dispatch tri-state")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22728>
2023-05-03 10:03:57 +00:00
Tapani Pälli
ccf16693e1 intel/fs: use intel_needs_workaround for Wa_22013689345
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22437>
2023-04-13 07:33:50 +00:00