Commit Graph

2568 Commits

Author SHA1 Message Date
Tapani Pälli
12d7aaf2b8 intel/compiler: add more validation for acc register usage
This is described in Wa_14014617373 and a programming note has
been added to specification.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23682>
2023-06-21 08:15:59 +00:00
Caio Oliveira
fde8bf7b7f intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL flag
If flag is not set, don't print debugging
information for internal shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23756>
2023-06-21 00:01:10 +00:00
Caio Oliveira
59a72570b6 compiler: Move spirv into a module of its own
For historical reasons, nir and vtn were compiled together,
and a bunch of vtn specific targets were defined in
src/compiler/meson.build.

Now that we can, make src/compiler/spirv produce an internal
library that depends on NIR, and is used by the drivers/tools.
Also move the vtn specific targets into that directory's
meson.build.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23668>
2023-06-20 16:18:08 +00:00
Caio Oliveira
cb588d5d6e compiler/clc: Move related NIR passes to the common mesa clc
These were historically in the spirv+nir combo, but the common mesa clc
is a better home for them.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Nora Allen <blackcatgames@protonmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:41 +00:00
Caio Oliveira
be3e4c8aaf compiler/clc: Rename the internal library from libclc to libmesaclc
There is an actual external libclc and we do use it, so rename the
internal common library to avoid confusion.

Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Nora Allen <blackcatgames@protonmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:41 +00:00
Caio Oliveira
0c387249e1 intel/compiler: Move brw_kernel.c to the intel_clc target
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:40 +00:00
Caio Oliveira
59cc77f0fa compiler: Move from nir_scope to mesa_scope
Just moving the enum and performing renames, no behavior change.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23328>
2023-06-19 23:29:26 +00:00
Erik Faye-Lund
a593de7cf3 nir: add missed nir_cmp_imm-helpers
Seems I missed these in my previous round, let's fix them up now!

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23461>
2023-06-15 13:34:49 +00:00
Erik Faye-Lund
2a71e332aa nir: use new immediate comparison helpers
There's plenty of places we can use these new and shiny helpers, so
let's clean up the code a bit.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23460>
2023-06-15 13:33:58 +02:00
Ian Romanick
96cde9cc01 intel/fs: Emit better code for bfi(..., 0)
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20570141 -> 20570063 (<.01%)
instructions in affected programs: 30679 -> 30601 (-0.25%)
helped: 77 / HURT: 0

total cycles in shared programs: 902113977 -> 902118723 (<.01%)
cycles in affected programs: 3255958 -> 3260704 (0.15%)
helped: 60 / HURT: 19

Broadwell
total instructions in shared programs: 18524633 -> 18524547 (<.01%)
instructions in affected programs: 34095 -> 34009 (-0.25%)
helped: 75 / HURT: 2

total cycles in shared programs: 949532394 -> 949543761 (<.01%)
cycles in affected programs: 3419107 -> 3430474 (0.33%)
helped: 57 / HURT: 24

total spills in shared programs: 22484 -> 22484 (0.00%)
spills in affected programs: 516 -> 516 (0.00%)
helped: 2 / HURT: 2

total fills in shared programs: 29346 -> 29338 (-0.03%)
fills in affected programs: 572 -> 564 (-1.40%)
helped: 4 / HURT: 0

Haswell
total instructions in shared programs: 17331356 -> 17331523 (<.01%)
instructions in affected programs: 27920 -> 28087 (0.60%)
helped: 41 / HURT: 4

total cycles in shared programs: 936603192 -> 936574664 (<.01%)
cycles in affected programs: 3417695 -> 3389167 (-0.83%)
helped: 28 / HURT: 21

total spills in shared programs: 19718 -> 19756 (0.19%)
spills in affected programs: 436 -> 474 (8.72%)
helped: 0 / HURT: 4

total fills in shared programs: 22547 -> 22607 (0.27%)
fills in affected programs: 444 -> 504 (13.51%)
helped: 0 / HURT: 4

Ivy Bridge
total cycles in shared programs: 463451277 -> 463451273 (<.01%)
cycles in affected programs: 95870 -> 95866 (<.01%)
helped: 3 / HURT: 2

DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152825278 -> 152819969 (-0.00%); split: -0.00%, +0.00%
Cycles: 15014075626 -> 15014628652 (+0.00%); split: -0.01%, +0.01%
Subgroup size: 8528536 -> 8528560 (+0.00%)
Send messages: 7711431 -> 7711464 (+0.00%)
Spill count: 99907 -> 99509 (-0.40%); split: -0.40%, +0.00%
Fill count: 202459 -> 201598 (-0.43%); split: -0.43%, +0.00%
Scratch Memory Size: 4376576 -> 4371456 (-0.12%)

Totals from 2915 (0.44% of 662497) affected shaders:
Instrs: 2288842 -> 2283533 (-0.23%); split: -0.24%, +0.01%
Cycles: 471633295 -> 472186321 (+0.12%); split: -0.27%, +0.39%
Subgroup size: 27488 -> 27512 (+0.09%)
Send messages: 151344 -> 151377 (+0.02%)
Spill count: 48091 -> 47693 (-0.83%); split: -0.83%, +0.00%
Fill count: 59053 -> 58192 (-1.46%); split: -1.46%, +0.00%
Scratch Memory Size: 1827840 -> 1822720 (-0.28%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
e419eefd34 intel/fs: Use nir_opt_reassociate_bfi
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907072 -> 19907054 (<.01%)
instructions in affected programs: 8859 -> 8841 (-0.20%)
helped: 9 / HURT: 0

total cycles in shared programs: 855791238 -> 855779334 (<.01%)
cycles in affected programs: 3308294 -> 3296390 (-0.36%)
helped: 12 / HURT: 13

Broadwell
total instructions in shared programs: 17818231 -> 17817440 (<.01%)
instructions in affected programs: 9887 -> 9096 (-8.00%)
helped: 9 / HURT: 0

total cycles in shared programs: 902970035 -> 902941221 (<.01%)
cycles in affected programs: 2767243 -> 2738429 (-1.04%)
helped: 14 / HURT: 5

total spills in shared programs: 17784 -> 17718 (-0.37%)
spills in affected programs: 318 -> 252 (-20.75%)
helped: 1 / HURT: 0

total fills in shared programs: 25458 -> 24949 (-2.00%)
fills in affected programs: 1346 -> 837 (-37.82%)
helped: 1 / HURT: 0

Haswell
total instructions in shared programs: 16707799 -> 16707586 (<.01%)
instructions in affected programs: 24049 -> 23836 (-0.89%)
helped: 41 / HURT: 0

total cycles in shared programs: 882730648 -> 882723174 (<.01%)
cycles in affected programs: 5096737 -> 5089263 (-0.15%)
helped: 25 / HURT: 12

total spills in shared programs: 14937 -> 14909 (-0.19%)
spills in affected programs: 436 -> 408 (-6.42%)
helped: 4 / HURT: 0

total fills in shared programs: 17569 -> 17529 (-0.23%)
fills in affected programs: 444 -> 404 (-9.01%)
helped: 4 / HURT: 0

No shader-db changes on any older Intel platforms.

All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 153118594 -> 153117340 (-0.00%); split: -0.00%, +0.00%
Cycles: 15011967556 -> 15011904351 (-0.00%); split: -0.00%, +0.00%
Fill count: 203692 -> 203684 (-0.00%)

Totals from 703 (0.11% of 662496) affected shaders:
Instrs: 192826 -> 191572 (-0.65%); split: -0.65%, +0.00%
Cycles: 29937640 -> 29874435 (-0.21%); split: -0.25%, +0.04%
Fill count: 4146 -> 4138 (-0.19%)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Emma Anholt
10b94772d2 intel: Reduce cost of resetting last_grf_write.
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler.  Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
7d4769e802 intel: Allocate the last_grf_write once per scheduler.
No need to re-calloc it per block when we're going to use it again.  Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
2ad865b219 intel: Count reads_remaining across all blocks.
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?".  Saves some memsetting, which was showing up high in the
profile (but not from this source).

No change on iris SKL shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:55 +00:00
Lionel Landwerlin
6b9f838d62 intel/fs: handle load_global_constant_uniform_block_intel
Again, load the data just once in GRF, share it across lanes.

Shader-db on dg2:

total instructions in shared programs: 23214555 -> 23215400 (<.01%)
instructions in affected programs: 199977 -> 200822 (0.42%)
helped: 3
HURT: 38
helped stats (abs) min: 5 max: 670 x̄: 283.67 x̃: 176
helped stats (rel) min: 1.34% max: 49.41% x̄: 22.15% x̃: 15.70%
HURT stats (abs)   min: 1 max: 185 x̄: 44.63 x̃: 32
HURT stats (rel)   min: 0.13% max: 42.86% x̄: 10.25% x̃: 9.30%
95% mean confidence interval for instructions value: -18.65 59.87
95% mean confidence interval for instructions %-change: 3.29% 12.47%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5928 -> 5928 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 851137495 -> 851152449 (<.01%)
cycles in affected programs: 16406137 -> 16421091 (0.09%)
helped: 9
HURT: 32
helped stats (abs) min: 10 max: 13498 x̄: 6443.22 x̃: 5581
helped stats (rel) min: 0.11% max: 4.75% x̄: 1.45% x̃: 0.34%
HURT stats (abs)   min: 3 max: 15056 x̄: 2279.47 x̃: 735
HURT stats (rel)   min: 0.10% max: 23.71% x̄: 4.58% x̃: 4.65%
95% mean confidence interval for cycles value: -1315.40 2044.87
95% mean confidence interval for cycles %-change: 1.71% 4.80%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 11856 -> 11825 (-0.26%)
spills in affected programs: 2368 -> 2337 (-1.31%)
helped: 4
HURT: 0

total fills in shared programs: 16258 -> 16207 (-0.31%)
fills in affected programs: 2930 -> 2879 (-1.74%)
helped: 4
HURT: 0

total sends in shared programs: 1038194 -> 1038185 (<.01%)
sends in affected programs: 40 -> 31 (-22.50%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.25 x̃: 2
helped stats (rel) min: 10.00% max: 33.33% x̄: 21.46% x̃: 21.25%
95% mean confidence interval for sends value: -4.64 0.14
95% mean confidence interval for sends %-change: -40.41% -2.51%
Inconclusive result (value mean confidence interval includes 0).

LOST:   0
GAINED: 0

Some VK/DX titles result (on DG2 only), it's mostly additional
instruction counts except for the unity spaceship demo where a CS
shader gets additional SIMDness. The reason for additional
instructions is that since we're doing block loads, we need to find
the live channels in control flow to select a single lane value that
is valid.

aztec_ruins_high:
Totals from 3 (1.12% of 269) affected shaders:
Instrs: 17732 -> 17896 (+0.92%)
Cycles: 796518 -> 819302 (+2.86%)

cyberpunk_2077:
Totals from 17 (0.17% of 10301) affected shaders:
Instrs: 10848 -> 11658 (+7.47%)
Cycles: 248243 -> 259168 (+4.40%); split: -0.57%, +4.97%

fallout_4_dxvk_g2:
Totals from 2 (0.12% of 1638) affected shaders:
Instrs: 3157 -> 3368 (+6.68%)
Cycles: 487807 -> 490426 (+0.54%); split: -0.26%, +0.79%
Max live registers: 139 -> 141 (+1.44%)

red_dead_redemption2:
Totals from 68 (1.14% of 5970) affected shaders:
Instrs: 34871 -> 36486 (+4.63%)
Cycles: 551430 -> 565211 (+2.50%)
Send messages: 2074 -> 2072 (-0.10%)
Max live registers: 5078 -> 5077 (-0.02%)

total_war_warhammer2:
Totals from 5 (1.05% of 478) affected shaders:
Instrs: 6905 -> 6971 (+0.96%); split: -0.16%, +1.12%
Cycles: 97035 -> 97989 (+0.98%); split: -0.07%, +1.05%

unity spaceship demo (instruction count going up due to a CS shader
                      bump from SIMD8->16):
Totals from 53 (9.71% of 546) affected shaders:
Instrs: 223748 -> 233223 (+4.23%); split: -0.01%, +4.25%
Cycles: 23134697 -> 25207080 (+8.96%); split: -0.17%, +9.13%
Subgroup size: 480 -> 488 (+1.67%)
Spill count: 2156 -> 2242 (+3.99%); split: -0.19%, +4.17%
Fill count: 4617 -> 4845 (+4.94%); split: -0.09%, +5.02%
Max live registers: 5991 -> 6050 (+0.98%); split: -0.40%, +1.39%
Max dispatch width: 480 -> 488 (+1.67%)

witcher_3_dxvk_g2:
Totals from 27 (2.51% of 1074) affected shaders:
Instrs: 57067 -> 57677 (+1.07%); split: -0.03%, +1.10%
Cycles: 1397871 -> 1436704 (+2.78%); split: -0.35%, +3.13%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
5ae8a78d8c intel/fs: make use of load_ubo_uniform_block_intel
The principle is the same as the load_ssbo_uniform_block_intel.
Whenever we see a uniform offset, load the data only once in GRFs to
reduce register pressure.

Iris shader-db run on DG2 :

total instructions in shared programs: 23001325 -> 23094969 (0.41%)
instructions in affected programs: 1775989 -> 1869633 (5.27%)
helped: 764
HURT: 2097
helped stats (abs) min: 1 max: 102 x̄: 6.96 x̃: 2
helped stats (rel) min: 0.03% max: 16.91% x̄: 1.36% x̃: 0.63%
HURT stats (abs)   min: 1 max: 2461 x̄: 47.19 x̃: 7
HURT stats (rel)   min: <.01% max: 199.34% x̄: 5.91% x̃: 2.60%
95% mean confidence interval for instructions value: 25.43 40.03
95% mean confidence interval for instructions %-change: 3.60% 4.33%
Instructions are HURT.

total loops in shared programs: 5847 -> 5847 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 839329852 -> 845491482 (0.73%)
cycles in affected programs: 130229434 -> 136391064 (4.73%)
helped: 1098
HURT: 2228
helped stats (abs) min: 1 max: 130102 x̄: 1340.64 x̃: 22
helped stats (rel) min: <.01% max: 64.25% x̄: 4.03% x̃: 0.71%
HURT stats (abs)   min: 1 max: 185309 x̄: 3426.24 x̃: 87
HURT stats (rel)   min: <.01% max: 92.85% x̄: 8.12% x̃: 3.82%
95% mean confidence interval for cycles value: 1342.16 2362.97
95% mean confidence interval for cycles %-change: 3.70% 4.52%
Cycles are HURT.

total spills in shared programs: 10768 -> 11856 (10.10%)
spills in affected programs: 9717 -> 10805 (11.20%)
helped: 25
HURT: 28

total fills in shared programs: 13720 -> 16258 (18.50%)
fills in affected programs: 12016 -> 14554 (21.12%)
helped: 25
HURT: 28

total sends in shared programs: 1034790 -> 1031266 (-0.34%)
sends in affected programs: 33416 -> 29892 (-10.55%)
helped: 1005
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.51 x̃: 3
helped stats (rel) min: 1.69% max: 60.00% x̄: 15.20% x̃: 14.08%
95% mean confidence interval for sends value: -3.72 -3.29
95% mean confidence interval for sends %-change: -15.82% -14.57%
Sends are helped.

LOST:   26
GAINED: 183

shader-db on a number of VK/DX titles on DG2 :

 PERCENTAGE DELTAS  Shaders   Instrs    Cycles
 age_of_wonders_III 1928      +0.02%    -0.19%

 PERCENTAGE DELTAS       Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Max live registers Max dispatch width
 assassins_creed_odyssey 2119      +1.12%    -0.42%      -0.03%        -0.29%       -9.10%     -4.26%         -0.64%             +0.65%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Max live registers
 aztec_ruins_high  269       -0.05%    -0.45%     -0.29%     -7.27%         -0.33%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 dark_souls_3_dxvk_g2 1420      +0.09%    +0.24%        +0.21%             +0.12%

(stats look bad, but it's just one shader affected)
 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers
 fallout_4_dxvk_g2 1638      +0.67%    +8.32%    +16.02%     +7.17%         +100.00%            +0.48%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Send messages Spill count Fill count Max live registers Max dispatch width
 red_dead_redemption2 5969      +0.16%    -0.04%      -0.04%       +0.01%     +0.05%         -0.20%             +0.04%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12129     +2.19%    +1.36%      -1.23%          -0.36%             +2.04%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers
 shooter-game      693       +0.07%    -0.89%      -0.09%          -0.09%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 talos_g2          1140      +0.37%    +3.80%      -0.86%          -0.67%             +0.19%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 total_war_warhammer2 477       +0.25%    +0.66%        -0.17%             +0.10%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 witcher_3_dxvk_g2 1074      +0.75%   -10.45%      -0.15%          -0.16%             -0.16%

 PERCENTAGE DELTAS      Shaders   Instrs    Cycles  Send messages Max live registers
 wolfenstein_youngblood 1111      +0.52%    +0.66%      -0.59%          -0.03%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
7eb1e2a690 intel/fs: avoid reusing the VGRF for uniform load_ubo
Only found 3 shaders affected in Red Dead Redemption :

Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%

This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
ff3494fce3 intel/fs: print identation for control flow
INTEL_DEBUG=optimizer output changes from :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42: txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43: and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44: cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45: (+f0.0) if(8) (null):UD,
{ 11}   46: mov(8) vgrf270:D, -1082130432d
{ 12}   47: mov(8) vgrf271:D, 1082130432d
{ 14}   48: mov(8) vgrf274+0.0:D, 0d
{ 14}   49: mov(8) vgrf274+1.0:D, 0d

to :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42:   txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43:   and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44:   cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45:   (+f0.0) if(8) (null):UD,
{ 11}   46:     mov(8) vgrf270:D, -1082130432d
{ 12}   47:     mov(8) vgrf271:D, 1082130432d
{ 14}   48:     mov(8) vgrf274+0.0:D, 0d
{ 14}   49:     mov(8) vgrf274+1.0:D, 0d

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
0cd9f0c3d3 intel/fs: fix bindless/shared surface mistake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 068bf1378d ("intel/fs: enable SSBO accesses through the bindless heap")
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>
2023-06-14 07:42:57 +00:00
Alyssa Rosenzweig
1d4a59448c treewide: Remove use_scoped_barrier
It is now set by all relevant drivers and not checked anywhere.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
2023-06-13 16:36:10 +00:00
Jesse Natalie
082eba6165 nir_lower_mem_access_bit_sizes: Move options into a struct
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Jesse Natalie
4217353e2d nir_lower_mem_access_bit_sizes: Add a bit_size input to the callback
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Caio Oliveira
26f6ea5c30 intel/compiler: Remove unused functions and declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23539>
2023-06-09 20:09:51 +00:00
Lionel Landwerlin
25de091753 intel/nir: switch ray query state tracking to local variables uint16_t
We should be able to use uint8_t but there appears to be a backend
bug.

Q2RTX shader compute shader improvement with ray queries :

Totals:
Instrs: 102221 -> 101499 (-0.71%); split: -0.82%, +0.12%
Cycles: 4451260 -> 4396025 (-1.24%)
Send messages: 3587 -> 3585 (-0.06%)
Spill count: 717 -> 658 (-8.23%)
Fill count: 1248 -> 1214 (-2.72%); split: -3.21%, +0.48%
Scratch Memory Size: 21504 -> 16384 (-23.81%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19982>
2023-06-09 08:29:43 +03:00
Caio Oliveira
2bb26cc01d intel/compiler: Refactor dump_instruction(s)
Delete unnecessary virtual functions, we need just two.  Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.

Rename the virtuals so overrides don't hide the common convenience
functions.  Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
2023-06-08 22:00:21 +00:00
Alyssa Rosenzweig
99a00e2247 treewide: Use nir_trim_vector more
Via Coccinelle patches

    @@
    expression a, b, c;
    @@

    -nir_channels(b, a, (1 << c) - 1)
    +nir_trim_vector(b, a, c)

    @@
    expression a, b, c;
    @@

    -nir_channels(b, a, BITFIELD_MASK(c))
    +nir_trim_vector(b, a, c)

    @@
    expression a, b;
    @@

    -nir_channels(b, a, 3)
    +nir_trim_vector(b, a, 2)

    @@
    expression a, b;
    @@

    -nir_channels(b, a, 7)
    +nir_trim_vector(b, a, 3)

Plus a fixup for pointless trimming an immediate in RADV and radeonsi.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
2023-06-06 18:52:25 +00:00
Lionel Landwerlin
049c791a63 intel/fs: fix pull-constant-load prior to gfx7
In ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
we added a new source, we need to fixup the source index for the
generator.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>
2023-06-06 14:47:41 +00:00
Ian Romanick
78dd15d8e8 intel/eu/validate: Add some validation of ADD3
v2: Remove spurious ALIGN_1 checks. Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
1c4c76032b intel/eu/validate: Add Gfx12.5
This required updating the expected results in a number of test. The
vast majority of these are cases where Gfx12.5 platforms don't allow
mixing F and HF sources.

In all honesty... I just updated the half_float_conversion expected
results until the test passed.

The next commit will add changes specific to Gfx12.5.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
a3cfec0690 intel/eu/validate: Use a single macro define half_float_conversion cases
This is what other tests do. The next commit will add a third set of
possible results (for Gfx12.5+), and the multiple macro method does not
scale.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
7ef45e661f intel/fs: Add constant propagation for ADD3
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.

v3: Remove redundant patterns. Noticed by Matt.

shader-db:

DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15

total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32

Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.

All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.

fossil-db:

DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210

Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922

Lost: 90

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
9a9a86013c intel/fs: Allow HF const in MAD on Gfx12.5 if all sources are HF
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4f272bf001 intel/fs: Fix handling of W, UW, and HF constants in combine_constants
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.

For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4cc3206218 intel/fs: Don't munge source order of 3-src instructions in opt_algebraic
This only impacts ADD3, so at this point it should not have any
affect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.

The worst part is, the ADD3 instrutions that are broken by the old code
aren't even "progress" on this pass.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Erik Faye-Lund
6d142078bc nir: use generated immediate comparison helpers
This makes the code a bit less verbose, so let's use the helpers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23393>
2023-06-05 13:40:08 +00:00
Erik Faye-Lund
28b1c5bca1 nir: use nir_i{ne,eq}_imm helpers
We already have these, so let's use them more.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23393>
2023-06-05 13:40:07 +00:00
Yonggang Luo
12256136e0 compiler: Rename shader_prim to mesa_prim and replace all usage of pipe_prim_type with mesa_prim
This is a prepare step to remove depends on p_defines.h in src/util/*

This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX  with MESA_PRIM_COUNT
replace SHADER_PRIM_  with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_

This patch only replace code only

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
2023-06-03 03:29:03 +00:00
Mark Janes
a98f246857 isl: use generated workaround helpers for Wa_1806565034
This workaround was enabled for gen12+, but only applies to gen12.0.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21912>
2023-06-02 16:17:34 +00:00
Kenneth Graunke
2d9a3bb093 intel/compiler: Fix a fallthrough in components_read() for atomics
In commit 284f0c9a57 I refactored the
handling of the data source to just call a helper rather than special
casing opcodes with 0 or 2 sources.  Unfortunately, I also dropped the
"else return 1", creating a fallthrough for all sources other than
SURFACE_LOGICAL_SRC_ADDRESS and SURFACE_LOGICAL_SRC_DATA.

The case below happened to return the correct value for all cases except
SURFACE_LOGICAL_SRC_SURFACE, which has been returning 2 instead of 1
since that commit.

Restore the else case.  Thanks to Marcin Ślusarz for catching this.

Fixes: 284f0c9a57 ("intel/compiler: Add an lsc_op_num_data_values() helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23347>
2023-06-01 21:06:57 +00:00
Lionel Landwerlin
018e306b8e intel/fs: fix a couple of descriptor mistakes
I found those issues while testing DOOM eternal and Ian also ran into
it with other shaders.

We write the desc register in SIMD1 exec_all, so all the data is in
the first component. We need to make sure to pass that component in
the lower SEND instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23354>
2023-06-01 19:53:41 +00:00
Rohan Garg
ef2b763d9c anv: fix incorrect asserts when combining CPS and per sample interpolation
CPS is dynamically turned off when per sample interpolation is active.
Update the asserts to reflect this.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5644011f06 ("intel/compiler: Convert wm_prog_key::persample_interp to a tri-state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23103>
2023-05-31 19:26:59 +00:00
Mark Janes
d0669f3ede intel/dev: switch defect identifiers to use lineage numbers
Update existing workarounds when necessary to match changed
identifiers.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23226>
2023-05-30 22:13:41 +00:00
Alyssa Rosenzweig
ebf4eff7eb treewide: Use nir_replicate
Via coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23259>
2023-05-30 16:24:21 -04:00
Lionel Landwerlin
3f1ff326e0 anv: reduce push constant size for descriptor sets
Now that descriptor sets are located a in a 1Gb area, we can avoid
storing the whole address to the descriptor and add the base address
of the area to a 32bit offset.

Replay a bunch of fossils with this and changes not really significant
one way or another :

Totals:
Instrs: 9278246 -> 9277148 (-0.01%); split: -0.01%, +0.00%
Cycles: 3547598421 -> 3547579435 (-0.00%); split: -0.00%, +0.00%

Totals from 353 (1.14% of 31021) affected shaders:
Instrs: 581546 -> 580448 (-0.19%); split: -0.23%, +0.04%
Cycles: 25885422 -> 25866436 (-0.07%); split: -0.31%, +0.24%

No difference on send messages or spills/fills.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:38 +00:00
Lionel Landwerlin
04777171e0 intel/fs: try to rematerialize surface computation code
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kind of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).

There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.

Here are some stats (all titles seem to have similar gains) :

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 red_dead_redemption2 5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected         4716     -37.29%    -5.67%      +0.95%        +0.07%      -81.26%     -79.16%        -70.62%             -9.15%             -8.47%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected               11732    -37.27%   -22.14%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.67%             -5.11%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                      12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 total_war_warhammer2 462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%
 -----------------------------------------------------------------------------------------------------------------------------------
 All affected         335      -28.31%   -12.77%    -82.35%     -88.46%        -66.67%             -6.25%             -7.24%
 -----------------------------------------------------------------------------------------------------------------------------------
 Total                462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 witcher_3_dxvk_g2 1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected      693      -41.93%   -58.45%      +0.09%        +0.01%      -98.52%     -97.29%        -98.10%             -10.25%            -1.33%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total             1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
b28609a756 intel/fs: enable uniform block accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
05089f305f intel/fs: enable bindless sampler state offsets
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
6d6877bf99 intel/fs: enable extended bindless surface offset
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
01fc9a06bd intel/fs: enable get_buffer_size on bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
ad9bc1ffb5 intel/fs: enable UBO accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00