Marcin Ślusarz
7ed9ec70c0
intel/compiler: simplify reading of gl_NumWorkGroups in task/mesh
...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Marcin Ślusarz
1ac1d5d62e
anv,intel/compiler: enable shortcut in wg id to wg idx lowering on >= gfx12.5
...
This speeds up vk_meshlet_cadscene in the "VK mesh ext" renderer by 1.4%.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Marcin Ślusarz
7ec1ef75d3
intel/compiler: pass num_workgroups from task to mesh shaders
...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Konstantin Seurer
05269047d3
intel: Use nir_builder_at
...
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23883>
2023-07-03 15:21:38 +00:00
Rohan Garg
c3110ef1e9
intel/compiler: reuse previously computed bitsize
...
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23933>
2023-06-30 09:19:57 +00:00
Rohan Garg
7f48c70bab
intel/compiler: construct masks instead of using magic values
...
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23933>
2023-06-30 09:19:57 +00:00
Yonggang Luo
68b8aa788d
intel/compiler: Switch to use nir_foreach_function_impl
...
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23920>
2023-06-29 11:29:54 +00:00
Erik Faye-Lund
c4b6b0d949
intel: use imm-helpers
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23855>
2023-06-29 07:08:19 +00:00
Alyssa Rosenzweig
173b9ee69a
treewide: Use nir_builder_create more
...
perl -p0e 's/nir_builder_init\(&([^,]*), /\1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860>
2023-06-27 18:13:02 +00:00
Alyssa Rosenzweig
815efcdf7e
nir: Use nir_builder_create
...
perl -p0e 's/nir_builder ([^;]*);\s*nir_builder_init\(&\1, /nir_builder \1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860>
2023-06-27 18:13:02 +00:00
Konstantin Seurer
8f3db26d14
intel: Use nir_ instead of nir_build_ helpers
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23858>
2023-06-27 17:37:54 +00:00
Alyssa Rosenzweig
6689c678fe
nir/lower_locals_to_regs: Add bool bitsize knob
...
GLSL booleans (and hence bool derefs) may be translated either as 1-bit or
32-bit NIR registers, depending on whether the backend uses nir_lower_bool_to_int32
or not. Add a knob for this and choose the right type for different backends.
Fixes nir_validate failure on
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcast_bvec3 run under
lavapipe. That test indexes into a bvec3 array, and gallivm first lowers bools
and then lowers derefs to registers, resulting in random 1-bit booleans mixed in
with 32-bit bools.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23804>
2023-06-26 08:22:06 -04:00
Karol Herbst
570c263ea3
nir/load_libclc: run some opt passes for everybody
...
Cuts down serialized size from 2850288 to 1377780 bytes.
Reduces clinfo with Rusticl time by 40% for debug builds.
(Old data, but the point stands)
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15996>
2023-06-22 21:02:57 +00:00
Michel Zou
badb85edb8
util: reinstate ENUM_PACKED
...
Gets rid of the warning: 'gcc_struct' attribute ignored [-Wattributes], introduced by !23338.
Fixes: 86532fa21d ("util: Use the gcc_struct attribute for packed structures in mingw")
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23478>
2023-06-21 21:51:59 +00:00
Ian Romanick
ed5d346868
intel/fs: Add missing newline
...
Emacs will add a newline to the end of this file whether I've edited
that line or not. It was driving me up the wall, so... yeah.
Trivial.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23777>
2023-06-21 19:57:58 +00:00
Ian Romanick
5336cbff3b
intel/fs: Constant propagate into SHADER_OPCODE_SHUFFLE
...
Code already exists to convert SHADER_OPCODE_SHUFFLE into a simple MOV
when either source is constant. However... the constants have to
actually get into those sources!
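As a rough illustration of why getting the constant into the source matters, here is a standalone sketch (not the backend code; the 8-lane arrays and function names are made up for the example). A shuffle whose index is the same constant in every lane only ever reads one source lane, which is what a simple broadcast MOV does.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8 /* hypothetical SIMD8 model */

/* General case: every lane reads the source lane named by its own index. */
static void shuffle(const uint32_t src[LANES], const uint32_t index[LANES],
                    uint32_t dst[LANES])
{
   for (unsigned lane = 0; lane < LANES; lane++) {
      assert(index[lane] < LANES);
      dst[lane] = src[index[lane]];
   }
}

/* Constant index: every lane reads the same source lane, i.e. a broadcast. */
static void shuffle_const_index(const uint32_t src[LANES], uint32_t index,
                                uint32_t dst[LANES])
{
   assert(index < LANES);
   for (unsigned lane = 0; lane < LANES; lane++)
      dst[lane] = src[index];
}

/* Constant value: whatever the indices are, every lane gets the constant. */
static void shuffle_const_value(uint32_t value, uint32_t dst[LANES])
{
   for (unsigned lane = 0; lane < LANES; lane++)
      dst[lane] = value;
}
```

Once the index (or the value) is known at compile time, the per-lane indirect read is no longer needed.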
On a shader that I'm working on that multiplies very large matrices using
lots of subgroup operations,
-SIMD8 shader: 1378 instructions. 3 loops. 793896 cycles. 0:0 spills:fills, 23 sends, scheduled with mode non-lifo. Promoted 0 constants. Compacted 22048 to 21664 bytes (2%)
+SIMD8 shader: 346 instructions. 3 loops. 61742 cycles. 0:0 spills:fills, 23 sends, scheduled with mode top-down. Promoted 0 constants. Compacted 5536 to 5216 bytes (6%)
No changes in shader-db or fossil-db on any Intel platform.
v2: Merge a bunch of identical cases. Suggested by Ken.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23609>
2023-06-21 17:16:57 +00:00
Tapani Pälli
12d7aaf2b8
intel/compiler: add more validation for acc register usage
...
This is described in Wa_14014617373 and a programming note has
been added to the specification.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23682>
2023-06-21 08:15:59 +00:00
Caio Oliveira
fde8bf7b7f
intel/compiler: Respect NIR_DEBUG_PRINT_INTERNAL flag
...
If the flag is not set, don't print debugging
information for internal shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23756>
2023-06-21 00:01:10 +00:00
Caio Oliveira
59a72570b6
compiler: Move spirv into a module of its own
...
For historical reasons, nir and vtn were compiled together,
and a bunch of vtn specific targets were defined in
src/compiler/meson.build.
Now that we can, make src/compiler/spirv produce an internal
library that depends on NIR, and is used by the drivers/tools.
Also move the vtn specific targets into that directory's
meson.build.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23668>
2023-06-20 16:18:08 +00:00
Caio Oliveira
cb588d5d6e
compiler/clc: Move related NIR passes to the common mesa clc
...
These were historically in the spirv+nir combo, but the common mesa clc
is a better home for them.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Nora Allen <blackcatgames@protonmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:41 +00:00
Caio Oliveira
be3e4c8aaf
compiler/clc: Rename the internal library from libclc to libmesaclc
...
There is an actual external libclc and we do use it, so rename the
internal common library to avoid confusion.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Nora Allen <blackcatgames@protonmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:41 +00:00
Caio Oliveira
0c387249e1
intel/compiler: Move brw_kernel.c to the intel_clc target
...
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23667>
2023-06-20 03:43:40 +00:00
Caio Oliveira
59cc77f0fa
compiler: Move from nir_scope to mesa_scope
...
Just moving the enum and performing renames, no behavior change.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23328>
2023-06-19 23:29:26 +00:00
Erik Faye-Lund
a593de7cf3
nir: add missed nir_cmp_imm-helpers
...
Seems I missed these in my previous round, let's fix them up now!
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23461>
2023-06-15 13:34:49 +00:00
Erik Faye-Lund
2a71e332aa
nir: use new immediate comparison helpers
...
There's plenty of places we can use these new and shiny helpers, so
let's clean up the code a bit.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23460>
2023-06-15 13:33:58 +02:00
Ian Romanick
96cde9cc01
intel/fs: Emit better code for bfi(..., 0)
...
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20570141 -> 20570063 (<.01%)
instructions in affected programs: 30679 -> 30601 (-0.25%)
helped: 77 / HURT: 0
total cycles in shared programs: 902113977 -> 902118723 (<.01%)
cycles in affected programs: 3255958 -> 3260704 (0.15%)
helped: 60 / HURT: 19
Broadwell
total instructions in shared programs: 18524633 -> 18524547 (<.01%)
instructions in affected programs: 34095 -> 34009 (-0.25%)
helped: 75 / HURT: 2
total cycles in shared programs: 949532394 -> 949543761 (<.01%)
cycles in affected programs: 3419107 -> 3430474 (0.33%)
helped: 57 / HURT: 24
total spills in shared programs: 22484 -> 22484 (0.00%)
spills in affected programs: 516 -> 516 (0.00%)
helped: 2 / HURT: 2
total fills in shared programs: 29346 -> 29338 (-0.03%)
fills in affected programs: 572 -> 564 (-1.40%)
helped: 4 / HURT: 0
Haswell
total instructions in shared programs: 17331356 -> 17331523 (<.01%)
instructions in affected programs: 27920 -> 28087 (0.60%)
helped: 41 / HURT: 4
total cycles in shared programs: 936603192 -> 936574664 (<.01%)
cycles in affected programs: 3417695 -> 3389167 (-0.83%)
helped: 28 / HURT: 21
total spills in shared programs: 19718 -> 19756 (0.19%)
spills in affected programs: 436 -> 474 (8.72%)
helped: 0 / HURT: 4
total fills in shared programs: 22547 -> 22607 (0.27%)
fills in affected programs: 444 -> 504 (13.51%)
helped: 0 / HURT: 4
Ivy Bridge
total cycles in shared programs: 463451277 -> 463451273 (<.01%)
cycles in affected programs: 95870 -> 95866 (<.01%)
helped: 3 / HURT: 2
DG2, Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
Totals:
Instrs: 152825278 -> 152819969 (-0.00%); split: -0.00%, +0.00%
Cycles: 15014075626 -> 15014628652 (+0.00%); split: -0.01%, +0.01%
Subgroup size: 8528536 -> 8528560 (+0.00%)
Send messages: 7711431 -> 7711464 (+0.00%)
Spill count: 99907 -> 99509 (-0.40%); split: -0.40%, +0.00%
Fill count: 202459 -> 201598 (-0.43%); split: -0.43%, +0.00%
Scratch Memory Size: 4376576 -> 4371456 (-0.12%)
Totals from 2915 (0.44% of 662497) affected shaders:
Instrs: 2288842 -> 2283533 (-0.23%); split: -0.24%, +0.01%
Cycles: 471633295 -> 472186321 (+0.12%); split: -0.27%, +0.39%
Subgroup size: 27488 -> 27512 (+0.09%)
Send messages: 151344 -> 151377 (+0.02%)
Spill count: 48091 -> 47693 (-0.83%); split: -0.83%, +0.00%
Fill count: 59053 -> 58192 (-1.46%); split: -1.46%, +0.00%
Scratch Memory Size: 1827840 -> 1822720 (-0.28%)
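For reference, a sketch of what bfi reduces to when the base operand is zero. The (mask, insert, base) operand order and the semantics below are an assumption of this example, modeled on NIR's bitfield insert; the commit message does not spell out the instruction sequence the backend emits, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the lowest set bit; mask must be non-zero. */
static unsigned find_lsb(uint32_t mask)
{
   assert(mask != 0);
   unsigned bit = 0;
   while (!((mask >> bit) & 1u))
      bit++;
   return bit;
}

/* Assumed semantics: place `insert` at the position of the mask's lowest set
 * bit, keep only the bits inside the mask, and take all other bits from base.
 */
static uint32_t bfi(uint32_t mask, uint32_t insert, uint32_t base)
{
   if (mask == 0)
      return base;
   return (base & ~mask) | ((insert << find_lsb(mask)) & mask);
}

/* With base == 0 the (base & ~mask) term vanishes: a shift and an AND remain. */
static uint32_t bfi_zero_base(uint32_t mask, uint32_t insert)
{
   if (mask == 0)
      return 0;
   return (insert << find_lsb(mask)) & mask;
}
```

The zero-base form needs no read of a base register at all, which is the kind of simplification the subject line points at.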
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Ian Romanick
e419eefd34
intel/fs: Use nir_opt_reassociate_bfi
...
All Skylake and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19907072 -> 19907054 (<.01%)
instructions in affected programs: 8859 -> 8841 (-0.20%)
helped: 9 / HURT: 0
total cycles in shared programs: 855791238 -> 855779334 (<.01%)
cycles in affected programs: 3308294 -> 3296390 (-0.36%)
helped: 12 / HURT: 13
Broadwell
total instructions in shared programs: 17818231 -> 17817440 (<.01%)
instructions in affected programs: 9887 -> 9096 (-8.00%)
helped: 9 / HURT: 0
total cycles in shared programs: 902970035 -> 902941221 (<.01%)
cycles in affected programs: 2767243 -> 2738429 (-1.04%)
helped: 14 / HURT: 5
total spills in shared programs: 17784 -> 17718 (-0.37%)
spills in affected programs: 318 -> 252 (-20.75%)
helped: 1 / HURT: 0
total fills in shared programs: 25458 -> 24949 (-2.00%)
fills in affected programs: 1346 -> 837 (-37.82%)
helped: 1 / HURT: 0
Haswell
total instructions in shared programs: 16707799 -> 16707586 (<.01%)
instructions in affected programs: 24049 -> 23836 (-0.89%)
helped: 41 / HURT: 0
total cycles in shared programs: 882730648 -> 882723174 (<.01%)
cycles in affected programs: 5096737 -> 5089263 (-0.15%)
helped: 25 / HURT: 12
total spills in shared programs: 14937 -> 14909 (-0.19%)
spills in affected programs: 436 -> 408 (-6.42%)
helped: 4 / HURT: 0
total fills in shared programs: 17569 -> 17529 (-0.23%)
fills in affected programs: 444 -> 404 (-9.01%)
helped: 4 / HURT: 0
No shader-db changes on any older Intel platforms.
All Intel platforms had similar results. (Ice Lake shown)
Totals:
Instrs: 153118594 -> 153117340 (-0.00%); split: -0.00%, +0.00%
Cycles: 15011967556 -> 15011904351 (-0.00%); split: -0.00%, +0.00%
Fill count: 203692 -> 203684 (-0.00%)
Totals from 703 (0.11% of 662496) affected shaders:
Instrs: 192826 -> 191572 (-0.65%); split: -0.65%, +0.00%
Cycles: 29937640 -> 29874435 (-0.21%); split: -0.25%, +0.04%
Fill count: 4146 -> 4138 (-0.19%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19968>
2023-06-14 18:49:53 +00:00
Emma Anholt
10b94772d2
intel: Reduce cost of resetting last_grf_write.
...
In zink-on-anv fs-mod-dvec3-dvec3.shader_test, we were memsetting 2MB of
last_grf_write 2400 times, multiple times through the scheduler. Just
resetting for the processed instructions reduces runtime from 21s to 16s.
No change on steam shader-db runtime across several runs.
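The idea can be sketched in isolation as follows. This is a minimal model, not the scheduler's data structures: `struct sched_state`, the `-1` sentinel, and both function names are hypothetical. Instead of clearing the whole per-register table for every block, only the slots written by the instructions just processed are cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the scheduler's per-register tracking table. */
struct sched_state {
   int *last_write;   /* one slot per register, -1 means "no writer yet" */
   size_t reg_count;  /* can be very large */
};

/* Old approach: touch every slot, O(reg_count) work per block. */
static void reset_all(struct sched_state *s)
{
   for (size_t i = 0; i < s->reg_count; i++)
      s->last_write[i] = -1;
}

/* New approach: clear only the slots the processed instructions wrote,
 * O(number of writes) work per block.
 */
static void reset_touched(struct sched_state *s,
                          const size_t *written_regs, size_t num_written)
{
   for (size_t i = 0; i < num_written; i++) {
      assert(written_regs[i] < s->reg_count);
      s->last_write[written_regs[i]] = -1;
   }
}
```

Both leave the table in the same state, as long as it was clean before the block and every write is recorded.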
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
7d4769e802
intel: Allocate the last_grf_write once per scheduler.
...
No need to re-calloc it per block when we're going to use it again. Also,
this fixes the vec4 backend to avoid allocating giant grf_count-sized
arrays on the stack.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:56 +00:00
Emma Anholt
2ad865b219
intel: Count reads_remaining across all blocks.
...
We were zeroing it out per block, but it doesn't actually help to count
per block, since the question is "will scheduling this instruction free
the reg?". Saves some memsetting, which was showing up high in the
profile (but not from this source).
No change on iris SKL shader-db.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23635>
2023-06-14 16:16:55 +00:00
Lionel Landwerlin
6b9f838d62
intel/fs: handle load_global_constant_uniform_block_intel
...
Again, load the data just once in GRF, share it across lanes.
Shader-db on dg2:
total instructions in shared programs: 23214555 -> 23215400 (<.01%)
instructions in affected programs: 199977 -> 200822 (0.42%)
helped: 3
HURT: 38
helped stats (abs) min: 5 max: 670 x̄: 283.67 x̃: 176
helped stats (rel) min: 1.34% max: 49.41% x̄: 22.15% x̃: 15.70%
HURT stats (abs) min: 1 max: 185 x̄: 44.63 x̃: 32
HURT stats (rel) min: 0.13% max: 42.86% x̄: 10.25% x̃: 9.30%
95% mean confidence interval for instructions value: -18.65 59.87
95% mean confidence interval for instructions %-change: 3.29% 12.47%
Inconclusive result (value mean confidence interval includes 0).
total loops in shared programs: 5928 -> 5928 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 851137495 -> 851152449 (<.01%)
cycles in affected programs: 16406137 -> 16421091 (0.09%)
helped: 9
HURT: 32
helped stats (abs) min: 10 max: 13498 x̄: 6443.22 x̃: 5581
helped stats (rel) min: 0.11% max: 4.75% x̄: 1.45% x̃: 0.34%
HURT stats (abs) min: 3 max: 15056 x̄: 2279.47 x̃: 735
HURT stats (rel) min: 0.10% max: 23.71% x̄: 4.58% x̃: 4.65%
95% mean confidence interval for cycles value: -1315.40 2044.87
95% mean confidence interval for cycles %-change: 1.71% 4.80%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 11856 -> 11825 (-0.26%)
spills in affected programs: 2368 -> 2337 (-1.31%)
helped: 4
HURT: 0
total fills in shared programs: 16258 -> 16207 (-0.31%)
fills in affected programs: 2930 -> 2879 (-1.74%)
helped: 4
HURT: 0
total sends in shared programs: 1038194 -> 1038185 (<.01%)
sends in affected programs: 40 -> 31 (-22.50%)
helped: 4
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.25 x̃: 2
helped stats (rel) min: 10.00% max: 33.33% x̄: 21.46% x̃: 21.25%
95% mean confidence interval for sends value: -4.64 0.14
95% mean confidence interval for sends %-change: -40.41% -2.51%
Inconclusive result (value mean confidence interval includes 0).
LOST: 0
GAINED: 0
Results on some VK/DX titles (DG2 only): mostly higher instruction
counts, except for the unity spaceship demo, where a CS shader gets
additional SIMDness. The additional instructions come from the block
loads: in control flow we need to find the live channels to select a
single lane value that is valid.
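The lane selection mentioned above can be sketched like this. It is a simplified CPU-side model under assumptions: the 32-bit execution mask, the per-lane offset array, and the function names are illustrative and not the backend's representation.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest enabled lane in an execution mask (must be non-zero). */
static unsigned first_live_channel(uint32_t exec_mask)
{
   assert(exec_mask != 0);
   unsigned lane = 0;
   while (!((exec_mask >> lane) & 1u))
      lane++;
   return lane;
}

/* The offset is uniform across live lanes, but disabled lanes may hold
 * garbage, so read it from a lane that is known to be live.
 */
static uint32_t uniform_offset(const uint32_t *per_lane_offset,
                               uint32_t exec_mask)
{
   return per_lane_offset[first_live_channel(exec_mask)];
}
```

Outside control flow every lane is live and lane 0 can be read directly, which is why the extra instructions only show up inside control flow.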
aztec_ruins_high:
Totals from 3 (1.12% of 269) affected shaders:
Instrs: 17732 -> 17896 (+0.92%)
Cycles: 796518 -> 819302 (+2.86%)
cyberpunk_2077:
Totals from 17 (0.17% of 10301) affected shaders:
Instrs: 10848 -> 11658 (+7.47%)
Cycles: 248243 -> 259168 (+4.40%); split: -0.57%, +4.97%
fallout_4_dxvk_g2:
Totals from 2 (0.12% of 1638) affected shaders:
Instrs: 3157 -> 3368 (+6.68%)
Cycles: 487807 -> 490426 (+0.54%); split: -0.26%, +0.79%
Max live registers: 139 -> 141 (+1.44%)
red_dead_redemption2:
Totals from 68 (1.14% of 5970) affected shaders:
Instrs: 34871 -> 36486 (+4.63%)
Cycles: 551430 -> 565211 (+2.50%)
Send messages: 2074 -> 2072 (-0.10%)
Max live registers: 5078 -> 5077 (-0.02%)
total_war_warhammer2:
Totals from 5 (1.05% of 478) affected shaders:
Instrs: 6905 -> 6971 (+0.96%); split: -0.16%, +1.12%
Cycles: 97035 -> 97989 (+0.98%); split: -0.07%, +1.05%
unity spaceship demo (instruction count going up due to a CS shader
bump from SIMD8->16):
Totals from 53 (9.71% of 546) affected shaders:
Instrs: 223748 -> 233223 (+4.23%); split: -0.01%, +4.25%
Cycles: 23134697 -> 25207080 (+8.96%); split: -0.17%, +9.13%
Subgroup size: 480 -> 488 (+1.67%)
Spill count: 2156 -> 2242 (+3.99%); split: -0.19%, +4.17%
Fill count: 4617 -> 4845 (+4.94%); split: -0.09%, +5.02%
Max live registers: 5991 -> 6050 (+0.98%); split: -0.40%, +1.39%
Max dispatch width: 480 -> 488 (+1.67%)
witcher_3_dxvk_g2:
Totals from 27 (2.51% of 1074) affected shaders:
Instrs: 57067 -> 57677 (+1.07%); split: -0.03%, +1.10%
Cycles: 1397871 -> 1436704 (+2.78%); split: -0.35%, +3.13%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
5ae8a78d8c
intel/fs: make use of load_ubo_uniform_block_intel
...
The principle is the same as for load_ssbo_uniform_block_intel.
Whenever we see a uniform offset, load the data only once in GRFs to
reduce register pressure.
Iris shader-db run on DG2 :
total instructions in shared programs: 23001325 -> 23094969 (0.41%)
instructions in affected programs: 1775989 -> 1869633 (5.27%)
helped: 764
HURT: 2097
helped stats (abs) min: 1 max: 102 x̄: 6.96 x̃: 2
helped stats (rel) min: 0.03% max: 16.91% x̄: 1.36% x̃: 0.63%
HURT stats (abs) min: 1 max: 2461 x̄: 47.19 x̃: 7
HURT stats (rel) min: <.01% max: 199.34% x̄: 5.91% x̃: 2.60%
95% mean confidence interval for instructions value: 25.43 40.03
95% mean confidence interval for instructions %-change: 3.60% 4.33%
Instructions are HURT.
total loops in shared programs: 5847 -> 5847 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total cycles in shared programs: 839329852 -> 845491482 (0.73%)
cycles in affected programs: 130229434 -> 136391064 (4.73%)
helped: 1098
HURT: 2228
helped stats (abs) min: 1 max: 130102 x̄: 1340.64 x̃: 22
helped stats (rel) min: <.01% max: 64.25% x̄: 4.03% x̃: 0.71%
HURT stats (abs) min: 1 max: 185309 x̄: 3426.24 x̃: 87
HURT stats (rel) min: <.01% max: 92.85% x̄: 8.12% x̃: 3.82%
95% mean confidence interval for cycles value: 1342.16 2362.97
95% mean confidence interval for cycles %-change: 3.70% 4.52%
Cycles are HURT.
total spills in shared programs: 10768 -> 11856 (10.10%)
spills in affected programs: 9717 -> 10805 (11.20%)
helped: 25
HURT: 28
total fills in shared programs: 13720 -> 16258 (18.50%)
fills in affected programs: 12016 -> 14554 (21.12%)
helped: 25
HURT: 28
total sends in shared programs: 1034790 -> 1031266 (-0.34%)
sends in affected programs: 33416 -> 29892 (-10.55%)
helped: 1005
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.51 x̃: 3
helped stats (rel) min: 1.69% max: 60.00% x̄: 15.20% x̃: 14.08%
95% mean confidence interval for sends value: -3.72 -3.29
95% mean confidence interval for sends %-change: -15.82% -14.57%
Sends are helped.
LOST: 26
GAINED: 183
shader-db on a number of VK/DX titles on DG2 :
PERCENTAGE DELTAS    Shaders  Instrs   Cycles
age_of_wonders_III   1928     +0.02%   -0.19%

PERCENTAGE DELTAS        Shaders  Instrs   Cycles   Subgroup size  Send messages  Spill count  Fill count  Max live registers  Max dispatch width
assassins_creed_odyssey  2119     +1.12%   -0.42%   -0.03%         -0.29%         -9.10%       -4.26%      -0.64%              +0.65%

PERCENTAGE DELTAS  Shaders  Instrs   Cycles   Spill count  Fill count  Max live registers
aztec_ruins_high   269      -0.05%   -0.45%   -0.29%       -7.27%      -0.33%

PERCENTAGE DELTAS     Shaders  Instrs   Cycles   Max live registers  Max dispatch width
dark_souls_3_dxvk_g2  1420     +0.09%   +0.24%   +0.21%              +0.12%

(stats look bad, but it's just one shader affected)
PERCENTAGE DELTAS  Shaders  Instrs   Cycles   Spill count  Fill count  Scratch Memory Size  Max live registers
fallout_4_dxvk_g2  1638     +0.67%   +8.32%   +16.02%      +7.17%      +100.00%             +0.48%

PERCENTAGE DELTAS     Shaders  Instrs   Cycles   Send messages  Spill count  Fill count  Max live registers  Max dispatch width
red_dead_redemption2  5969     +0.16%   -0.04%   -0.04%         +0.01%       +0.05%      -0.20%              +0.04%

PERCENTAGE DELTAS           Shaders  Instrs   Cycles   Send messages  Max live registers  Max dispatch width
rise_of_the_tomb_raider_g2  12129    +2.19%   +1.36%   -1.23%         -0.36%              +2.04%

PERCENTAGE DELTAS  Shaders  Instrs   Cycles   Send messages  Max live registers
shooter-game       693      +0.07%   -0.89%   -0.09%         -0.09%

PERCENTAGE DELTAS  Shaders  Instrs   Cycles   Send messages  Max live registers  Max dispatch width
talos_g2           1140     +0.37%   +3.80%   -0.86%         -0.67%              +0.19%

PERCENTAGE DELTAS     Shaders  Instrs   Cycles   Max live registers  Max dispatch width
total_war_warhammer2  477      +0.25%   +0.66%   -0.17%              +0.10%

PERCENTAGE DELTAS  Shaders  Instrs   Cycles   Send messages  Max live registers  Max dispatch width
witcher_3_dxvk_g2  1074     +0.75%   -10.45%  -0.15%         -0.16%              -0.16%

PERCENTAGE DELTAS       Shaders  Instrs   Cycles   Send messages  Max live registers
wolfenstein_youngblood  1111     +0.52%   +0.66%   -0.59%         -0.03%
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
7eb1e2a690
intel/fs: avoid reusing the VGRF for uniform load_ubo
...
Only found 3 shaders affected in Red Dead Redemption :
Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%
This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
ff3494fce3
intel/fs: print indentation for control flow
...
INTEL_DEBUG=optimizer output changes from :
{ 10} 40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10} 41: (+f0.0) if(8) (null):UD,
{ 11} 42: txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12} 43: and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11} 44: cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10} 45: (+f0.0) if(8) (null):UD,
{ 11} 46: mov(8) vgrf270:D, -1082130432d
{ 12} 47: mov(8) vgrf271:D, 1082130432d
{ 14} 48: mov(8) vgrf274+0.0:D, 0d
{ 14} 49: mov(8) vgrf274+1.0:D, 0d
to :
{ 10} 40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10} 41: (+f0.0) if(8) (null):UD,
{ 11} 42:   txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12} 43:   and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11} 44:   cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10} 45:   (+f0.0) if(8) (null):UD,
{ 11} 46:     mov(8) vgrf270:D, -1082130432d
{ 12} 47:     mov(8) vgrf271:D, 1082130432d
{ 14} 48:     mov(8) vgrf274+0.0:D, 0d
{ 14} 49:     mov(8) vgrf274+1.0:D, 0d
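A minimal sketch of how a dump routine can derive the indentation from control flow, assuming a simplified opcode set. The `enum op` values, the two-space step, and the function names are hypothetical and not taken from the backend. Instructions that close a block dedent before they are printed, and instructions that open a block indent whatever follows.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, reduced opcode set for the example. */
enum op { OP_OTHER, OP_IF, OP_ELSE, OP_ENDIF, OP_DO, OP_WHILE };

/* Returns the depth to print this instruction at and updates *depth so the
 * next instruction sees the right nesting level.
 */
static int indent_for(enum op op, int *depth)
{
   if (op == OP_ELSE || op == OP_ENDIF || op == OP_WHILE) {
      assert(*depth > 0);
      (*depth)--;               /* closers line up with their opener */
   }
   int print_depth = *depth;
   if (op == OP_IF || op == OP_ELSE || op == OP_DO)
      (*depth)++;               /* openers indent what follows */
   return print_depth;
}

/* Print one instruction with two spaces of indentation per nesting level. */
static void print_instruction(FILE *f, enum op op, const char *text,
                              int *depth)
{
   fprintf(f, "%*s%s\n", 2 * indent_for(op, depth), "", text);
}
```

Note that ELSE both closes the "then" side and opens the "else" side, so it is handled in both branches.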
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
0cd9f0c3d3
intel/fs: fix bindless/shared surface mistake
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 068bf1378d ("intel/fs: enable SSBO accesses through the bindless heap")
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>
2023-06-14 07:42:57 +00:00
Alyssa Rosenzweig
1d4a59448c
treewide: Remove use_scoped_barrier
...
It is now set by all relevant drivers and not checked anywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
2023-06-13 16:36:10 +00:00
Jesse Natalie
082eba6165
nir_lower_mem_access_bit_sizes: Move options into a struct
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Jesse Natalie
4217353e2d
nir_lower_mem_access_bit_sizes: Add a bit_size input to the callback
...
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).
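A sketch of a callback written in that spirit, under stated assumptions: the hypothetical hardware below supports 8-, 16- and 32-bit accesses of up to four components, and `pick_access_bit_size` with its two-argument signature is invented for the example. It is not the actual nir_lower_mem_access_bit_sizes callback type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware limits used only for this example. */
#define MAX_COMPONENTS 4u

static bool hw_supports_bit_size(uint8_t bit_size)
{
   return bit_size == 8 || bit_size == 16 || bit_size == 32;
}

/* Keep the incoming bit size whenever it already works; only pick a
 * different one when it is unsupported or cannot cover `bytes` in a
 * single access.
 */
static uint8_t pick_access_bit_size(uint8_t input_bit_size, unsigned bytes)
{
   assert(bytes > 0);
   if (hw_supports_bit_size(input_bit_size) &&
       bytes <= MAX_COMPONENTS * (input_bit_size / 8u))
      return input_bit_size;   /* already fine, leave it alone */

   return 32;                  /* fall back to the widest supported size */
}
```

Passing the original bit size in is what lets the callback answer "no change needed" without any pack/unpack math.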
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Caio Oliveira
26f6ea5c30
intel/compiler: Remove unused functions and declarations
...
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23539>
2023-06-09 20:09:51 +00:00
Lionel Landwerlin
25de091753
intel/nir: switch ray query state tracking to local variables uint16_t
...
We should be able to use uint8_t but there appears to be a backend
bug.
Improvement on a Q2RTX compute shader using ray queries:
Totals:
Instrs: 102221 -> 101499 (-0.71%); split: -0.82%, +0.12%
Cycles: 4451260 -> 4396025 (-1.24%)
Send messages: 3587 -> 3585 (-0.06%)
Spill count: 717 -> 658 (-8.23%)
Fill count: 1248 -> 1214 (-2.72%); split: -3.21%, +0.48%
Scratch Memory Size: 21504 -> 16384 (-23.81%)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19982>
2023-06-09 08:29:43 +03:00
Caio Oliveira
2bb26cc01d
intel/compiler: Refactor dump_instruction(s)
...
Delete unnecessary virtual functions, we need just two. Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.
Rename the virtuals so overrides don't hide the common convenience
functions. Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
2023-06-08 22:00:21 +00:00
Alyssa Rosenzweig
99a00e2247
treewide: Use nir_trim_vector more
...
Via Coccinelle patches
@@
expression a, b, c;
@@
-nir_channels(b, a, (1 << c) - 1)
+nir_trim_vector(b, a, c)
@@
expression a, b, c;
@@
-nir_channels(b, a, BITFIELD_MASK(c))
+nir_trim_vector(b, a, c)
@@
expression a, b;
@@
-nir_channels(b, a, 3)
+nir_trim_vector(b, a, 2)
@@
expression a, b;
@@
-nir_channels(b, a, 7)
+nir_trim_vector(b, a, 3)
Plus a fixup for pointlessly trimming an immediate in RADV and radeonsi.
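The four patterns rely on one piece of arithmetic: a mask with the low c bits set selects exactly the first c channels, so the literals 3 and 7 correspond to counts 2 and 3. A standalone check of that arithmetic, with hypothetical helper names and no NIR dependency:

```c
#include <assert.h>
#include <stdint.h>

/* Channel mask selecting the first `count` channels, i.e. (1 << c) - 1. */
static uint32_t first_n_channels_mask(unsigned count)
{
   assert(count >= 1 && count <= 16);
   return (1u << count) - 1u;
}

/* Inverse direction: if `mask` selects exactly the first n channels, return
 * n; otherwise return 0, meaning the mask is not a plain "trim".
 */
static unsigned trim_count_for_mask(uint32_t mask)
{
   if (mask == 0 || (mask & (mask + 1u)) != 0)
      return 0;   /* empty, or not a contiguous run starting at bit 0 */

   unsigned n = 0;
   while (mask) {
      n++;
      mask >>= 1;
   }
   return n;
}
```

A mask such as 5 (channels x and z) has no trim count, which is why only contiguous low-bit masks are rewritten.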
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
2023-06-06 18:52:25 +00:00
Lionel Landwerlin
049c791a63
intel/fs: fix pull-constant-load prior to gfx7
...
In ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
we added a new source, so we need to fix up the source index for the
generator.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>
2023-06-06 14:47:41 +00:00
Ian Romanick
78dd15d8e8
intel/eu/validate: Add some validation of ADD3
...
v2: Remove spurious ALIGN_1 checks. Suggested by Matt.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
1c4c76032b
intel/eu/validate: Add Gfx12.5
...
This required updating the expected results in a number of tests. The
vast majority of these are cases where Gfx12.5 platforms don't allow
mixing F and HF sources.
In all honesty... I just updated the half_float_conversion expected
results until the test passed.
The next commit will add changes specific to Gfx12.5.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
a3cfec0690
intel/eu/validate: Use a single macro to define half_float_conversion cases
...
This is what other tests do. The next commit will add a third set of
possible results (for Gfx12.5+), and the multiple macro method does not
scale.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
7ef45e661f
intel/fs: Add constant propagation for ADD3
...
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.
v3: Remove redundant patterns. Noticed by Matt.
shader-db:
DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15
total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32
Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.
All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.
fossil-db:
DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210
Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922
Lost: 90
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
9a9a86013c
intel/fs: Allow HF const in MAD on Gfx12.5 if all sources are HF
...
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4f272bf001
intel/fs: Fix handling of W, UW, and HF constants in combine_constants
...
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.
For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.
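The representability test for a D or UD constant can be sketched as below. The helper names are hypothetical; W and UW are taken to mean signed and unsigned 16-bit, matching the int16_t / uint16_t wording used elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a signed 32-bit (D) constant be encoded as a signed 16-bit (W) one? */
static bool d_fits_w(int32_t v)
{
   return v >= INT16_MIN && v <= INT16_MAX;
}

/* Can a signed 32-bit (D) constant be encoded as an unsigned 16-bit (UW) one? */
static bool d_fits_uw(int32_t v)
{
   return v >= 0 && v <= UINT16_MAX;
}

/* Can an unsigned 32-bit (UD) constant be encoded as UW? */
static bool ud_fits_uw(uint32_t v)
{
   return v <= UINT16_MAX;
}

/* A D constant is usable as a 16-bit immediate if either encoding works. */
static bool d_const_fits_16bit(int32_t v)
{
   return d_fits_w(v) || d_fits_uw(v);
}

/* Narrowing is only meaningful after the range check has passed. */
static int16_t narrow_to_w(int32_t v)
{
   assert(d_fits_w(v));
   return (int16_t)v;
}
```

Values such as 40000 fail the W test but pass the UW one, so both checks are needed before giving up on the immediate.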
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4cc3206218
intel/fs: Don't munge source order of 3-src instructions in opt_algebraic
...
This only impacts ADD3, so at this point it should not have any
effect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.
The worst part is, the ADD3 instructions that are broken by the old code
aren't even "progress" on this pass.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00