Commit Graph

2553 Commits

Author SHA1 Message Date
Lionel Landwerlin
5ae8a78d8c intel/fs: make use of load_ubo_uniform_block_intel
The principle is the same as the load_ssbo_uniform_block_intel.
Whenever we see a uniform offset, load the data only once in GRFs to
reduce register pressure.

Iris shader-db run on DG2 :

total instructions in shared programs: 23001325 -> 23094969 (0.41%)
instructions in affected programs: 1775989 -> 1869633 (5.27%)
helped: 764
HURT: 2097
helped stats (abs) min: 1 max: 102 x̄: 6.96 x̃: 2
helped stats (rel) min: 0.03% max: 16.91% x̄: 1.36% x̃: 0.63%
HURT stats (abs)   min: 1 max: 2461 x̄: 47.19 x̃: 7
HURT stats (rel)   min: <.01% max: 199.34% x̄: 5.91% x̃: 2.60%
95% mean confidence interval for instructions value: 25.43 40.03
95% mean confidence interval for instructions %-change: 3.60% 4.33%
Instructions are HURT.

total loops in shared programs: 5847 -> 5847 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 839329852 -> 845491482 (0.73%)
cycles in affected programs: 130229434 -> 136391064 (4.73%)
helped: 1098
HURT: 2228
helped stats (abs) min: 1 max: 130102 x̄: 1340.64 x̃: 22
helped stats (rel) min: <.01% max: 64.25% x̄: 4.03% x̃: 0.71%
HURT stats (abs)   min: 1 max: 185309 x̄: 3426.24 x̃: 87
HURT stats (rel)   min: <.01% max: 92.85% x̄: 8.12% x̃: 3.82%
95% mean confidence interval for cycles value: 1342.16 2362.97
95% mean confidence interval for cycles %-change: 3.70% 4.52%
Cycles are HURT.

total spills in shared programs: 10768 -> 11856 (10.10%)
spills in affected programs: 9717 -> 10805 (11.20%)
helped: 25
HURT: 28

total fills in shared programs: 13720 -> 16258 (18.50%)
fills in affected programs: 12016 -> 14554 (21.12%)
helped: 25
HURT: 28

total sends in shared programs: 1034790 -> 1031266 (-0.34%)
sends in affected programs: 33416 -> 29892 (-10.55%)
helped: 1005
HURT: 0
helped stats (abs) min: 1 max: 22 x̄: 3.51 x̃: 3
helped stats (rel) min: 1.69% max: 60.00% x̄: 15.20% x̃: 14.08%
95% mean confidence interval for sends value: -3.72 -3.29
95% mean confidence interval for sends %-change: -15.82% -14.57%
Sends are helped.

LOST:   26
GAINED: 183

shader-db on a number of VK/DX titles on DG2 :

 PERCENTAGE DELTAS  Shaders   Instrs    Cycles
 age_of_wonders_III 1928      +0.02%    -0.19%

 PERCENTAGE DELTAS       Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Max live registers Max dispatch width
 assassins_creed_odyssey 2119      +1.12%    -0.42%      -0.03%        -0.29%       -9.10%     -4.26%         -0.64%             +0.65%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Max live registers
 aztec_ruins_high  269       -0.05%    -0.45%     -0.29%     -7.27%         -0.33%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 dark_souls_3_dxvk_g2 1420      +0.09%    +0.24%        +0.21%             +0.12%

(stats look bad, but it's just one shader affected)
 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers
 fallout_4_dxvk_g2 1638      +0.67%    +8.32%    +16.02%     +7.17%         +100.00%            +0.48%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Send messages Spill count Fill count Max live registers Max dispatch width
 red_dead_redemption2 5969      +0.16%    -0.04%      -0.04%       +0.01%     +0.05%         -0.20%             +0.04%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12129     +2.19%    +1.36%      -1.23%          -0.36%             +2.04%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers
 shooter-game      693       +0.07%    -0.89%      -0.09%          -0.09%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 talos_g2          1140      +0.37%    +3.80%      -0.86%          -0.67%             +0.19%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Max live registers Max dispatch width
 total_war_warhammer2 477       +0.25%    +0.66%        -0.17%             +0.10%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Send messages Max live registers Max dispatch width
 witcher_3_dxvk_g2 1074      +0.75%   -10.45%      -0.15%          -0.16%             -0.16%

 PERCENTAGE DELTAS      Shaders   Instrs    Cycles  Send messages Max live registers
 wolfenstein_youngblood 1111      +0.52%    +0.66%      -0.59%          -0.03%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
7eb1e2a690 intel/fs: avoid reusing the VGRF for uniform load_ubo
Only found 3 shaders affected in Red Dead Redemption :

Totals from 3 (0.05% of 5969) affected shaders:
Instrs: 2246 -> 2230 (-0.71%)
Cycles: 156506 -> 148402 (-5.18%); split: -5.23%, +0.05%

This will have a larger effect when we add the
load_ubo_uniform_block_intel intrinsic where we will have larger
blocks (vec8/vec16 vs vec4 only now).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
ff3494fce3 intel/fs: print identation for control flow
INTEL_DEBUG=optimizer output changes from :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42: txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43: and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44: cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45: (+f0.0) if(8) (null):UD,
{ 11}   46: mov(8) vgrf270:D, -1082130432d
{ 12}   47: mov(8) vgrf271:D, 1082130432d
{ 14}   48: mov(8) vgrf274+0.0:D, 0d
{ 14}   49: mov(8) vgrf274+1.0:D, 0d

to :

{ 10}   40: cmp.nz.f0.0(8) null:F, vgrf3470:F, 0f
{ 10}   41: (+f0.0) if(8) (null):UD,
{ 11}   42:   txf_logical(8) vgrf3473:UD, vgrf250:D(null):UD, 0d(null):UD(null):UD(null):UD(null):UD, 31u, 0u(null):UD(null):UD(null):UD, 3d, 0d
{ 12}   43:   and(8) vgrf262:UD, vgrf3473:UD, 2u
{ 11}   44:   cmp.nz.f0.0(8) null:D, vgrf262:D, 0d
{ 10}   45:   (+f0.0) if(8) (null):UD,
{ 11}   46:     mov(8) vgrf270:D, -1082130432d
{ 12}   47:     mov(8) vgrf271:D, 1082130432d
{ 14}   48:     mov(8) vgrf274+0.0:D, 0d
{ 14}   49:     mov(8) vgrf274+1.0:D, 0d

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23477>
2023-06-14 12:04:05 +00:00
Lionel Landwerlin
0cd9f0c3d3 intel/fs: fix bindless/shared surface mistake
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 068bf1378d ("intel/fs: enable SSBO accesses through the bindless heap")
Tested-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23536>
2023-06-14 07:42:57 +00:00
Alyssa Rosenzweig
1d4a59448c treewide: Remove use_scoped_barrier
It is now set by all relevant drivers and not checked anywhere.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
2023-06-13 16:36:10 +00:00
Jesse Natalie
082eba6165 nir_lower_mem_access_bit_sizes: Move options into a struct
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Jesse Natalie
4217353e2d nir_lower_mem_access_bit_sizes: Add a bit_size input to the callback
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Caio Oliveira
26f6ea5c30 intel/compiler: Remove unused functions and declarations
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23539>
2023-06-09 20:09:51 +00:00
Lionel Landwerlin
25de091753 intel/nir: switch ray query state tracking to local variables uint16_t
We should be able to use uint8_t but there appears to be a backend
bug.

Q2RTX shader compute shader improvement with ray queries :

Totals:
Instrs: 102221 -> 101499 (-0.71%); split: -0.82%, +0.12%
Cycles: 4451260 -> 4396025 (-1.24%)
Send messages: 3587 -> 3585 (-0.06%)
Spill count: 717 -> 658 (-8.23%)
Fill count: 1248 -> 1214 (-2.72%); split: -3.21%, +0.48%
Scratch Memory Size: 21504 -> 16384 (-23.81%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19982>
2023-06-09 08:29:43 +03:00
Caio Oliveira
2bb26cc01d intel/compiler: Refactor dump_instruction(s)
Delete unnecessary virtual functions, we need just two.  Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.

Rename the virtuals so overrides don't hide the common convenience
functions.  Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
2023-06-08 22:00:21 +00:00
Alyssa Rosenzweig
99a00e2247 treewide: Use nir_trim_vector more
Via Coccinelle patches

    @@
    expression a, b, c;
    @@

    -nir_channels(b, a, (1 << c) - 1)
    +nir_trim_vector(b, a, c)

    @@
    expression a, b, c;
    @@

    -nir_channels(b, a, BITFIELD_MASK(c))
    +nir_trim_vector(b, a, c)

    @@
    expression a, b;
    @@

    -nir_channels(b, a, 3)
    +nir_trim_vector(b, a, 2)

    @@
    expression a, b;
    @@

    -nir_channels(b, a, 7)
    +nir_trim_vector(b, a, 3)

Plus a fixup for pointless trimming an immediate in RADV and radeonsi.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23352>
2023-06-06 18:52:25 +00:00
Lionel Landwerlin
049c791a63 intel/fs: fix pull-constant-load prior to gfx7
In ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
we added a new source, we need to fixup the source index for the
generator.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ad9bc1ffb5 ("intel/fs: enable UBO accesses through bindless heap")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23405>
2023-06-06 14:47:41 +00:00
Ian Romanick
78dd15d8e8 intel/eu/validate: Add some validation of ADD3
v2: Remove spurious ALIGN_1 checks. Suggested by Matt.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
1c4c76032b intel/eu/validate: Add Gfx12.5
This required updating the expected results in a number of test. The
vast majority of these are cases where Gfx12.5 platforms don't allow
mixing F and HF sources.

In all honesty... I just updated the half_float_conversion expected
results until the test passed.

The next commit will add changes specific to Gfx12.5.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
a3cfec0690 intel/eu/validate: Use a single macro define half_float_conversion cases
This is what other tests do. The next commit will add a third set of
possible results (for Gfx12.5+), and the multiple macro method does not
scale.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
7ef45e661f intel/fs: Add constant propagation for ADD3
v2: Require that the constant value be representable as either uint16_t
or int16_t. Suggested by Matt.

v3: Remove redundant patterns. Noticed by Matt.

shader-db:

DG2
total instructions in shared programs: 23103767 -> 23103577 (<.01%)
instructions in affected programs: 51822 -> 51632 (-0.37%)
helped: 98 / HURT: 15

total cycles in shared programs: 842347714 -> 842380017 (<.01%)
cycles in affected programs: 1942595 -> 1974898 (1.66%)
helped: 97 / HURT: 32

Nearly all of the affected shaders (around 9,900) are shaders in
Cyberpunk 2077. It's about an even split between vertex and fragment
shaders. The majority of the remaining affected shaders (3,600) are
from Strange Brigade. This was also a nearly even split between
fragment and vertex.

All but two of the lost shaders are SIMD32 fragment shaders in
Cyberpunk 2077. The other two are SIMD32 fragment shaders in Dota2.

fossil-db:

DG2
Instructions in all programs: 196379107 -> 196248608 (-0.1%)
helped: 13467 / HURT: 1210

Cycles in all programs: 13931355281 -> 13929955971 (-0.0%)
helped: 11801 / HURT: 2922

Lost: 90

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
9a9a86013c intel/fs: Allow HF const in MAD on Gfx12.5 if all sources are HF
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4f272bf001 intel/fs: Fix handling of W, UW, and HF constants in combine_constants
Sources that are already W, UW, or HF can be represented as those types
by definition. Pass them through. Previously an HF source on a MAD would
have been marked as !can_promote. I'm pretty sure this means it would
get moved out to a register, but I did not verify this.

For ADD3, a constant source could be D or UD. In this case, the value
must be tested to determine whether it can be represented as W or
UW. The patterns in opt_algebraic won't generate an ADD3 with constant
source, so this problem cannot occur yet.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Ian Romanick
4cc3206218 intel/fs: Don't munge source order of 3-src instructions in opt_algebraic
This only impacts ADD3, so at this point it should not have any
affect. As soon as constants are propagated into ADD3 instructions, it
will be a problem.

The worst part is, the ADD3 instrutions that are broken by the old code
aren't even "progress" on this pass.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23262>
2023-06-06 06:10:53 +00:00
Erik Faye-Lund
6d142078bc nir: use generated immediate comparison helpers
This makes the code a bit less verbose, so let's use the helpers.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23393>
2023-06-05 13:40:08 +00:00
Erik Faye-Lund
28b1c5bca1 nir: use nir_i{ne,eq}_imm helpers
We already have these, so let's use them more.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23393>
2023-06-05 13:40:07 +00:00
Yonggang Luo
12256136e0 compiler: Rename shader_prim to mesa_prim and replace all usage of pipe_prim_type with mesa_prim
This is a prepare step to remove depends on p_defines.h in src/util/*

This is done by:
replace pipe_prim_type with mesa_prim
replace shader_prim with mesa_prim
replace PIPE_PRIM_MAX  with MESA_PRIM_COUNT
replace SHADER_PRIM_  with MESA_PRIM_
replace PIPE_PRIM_ with MESA_PRIM_

This patch only replace code only

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23369>
2023-06-03 03:29:03 +00:00
Mark Janes
a98f246857 isl: use generated workaround helpers for Wa_1806565034
This workaround was enabled for gen12+, but only applies to gen12.0.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21912>
2023-06-02 16:17:34 +00:00
Kenneth Graunke
2d9a3bb093 intel/compiler: Fix a fallthrough in components_read() for atomics
In commit 284f0c9a57 I refactored the
handling of the data source to just call a helper rather than special
casing opcodes with 0 or 2 sources.  Unfortunately, I also dropped the
"else return 1", creating a fallthrough for all sources other than
SURFACE_LOGICAL_SRC_ADDRESS and SURFACE_LOGICAL_SRC_DATA.

The case below happened to return the correct value for all cases except
SURFACE_LOGICAL_SRC_SURFACE, which has been returning 2 instead of 1
since that commit.

Restore the else case.  Thanks to Marcin Ślusarz for catching this.

Fixes: 284f0c9a57 ("intel/compiler: Add an lsc_op_num_data_values() helper")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23347>
2023-06-01 21:06:57 +00:00
Lionel Landwerlin
018e306b8e intel/fs: fix a couple of descriptor mistakes
I found those issues while testing DOOM eternal and Ian also ran into
it with other shaders.

We write the desc register in SIMD1 exec_all, so all the data is in
the first component. We need to make sure to pass that component in
the lower SEND instructions.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23354>
2023-06-01 19:53:41 +00:00
Rohan Garg
ef2b763d9c anv: fix incorrect asserts when combining CPS and per sample interpolation
CPS is dynamically turned off when per sample interpolation is active.
Update the asserts to reflect this.

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5644011f06 ("intel/compiler: Convert wm_prog_key::persample_interp to a tri-state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23103>
2023-05-31 19:26:59 +00:00
Mark Janes
d0669f3ede intel/dev: switch defect identifiers to use lineage numbers
Update existing workarounds when necessary to match changed
identifiers.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23226>
2023-05-30 22:13:41 +00:00
Alyssa Rosenzweig
ebf4eff7eb treewide: Use nir_replicate
Via coccinelle.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23259>
2023-05-30 16:24:21 -04:00
Lionel Landwerlin
3f1ff326e0 anv: reduce push constant size for descriptor sets
Now that descriptor sets are located a in a 1Gb area, we can avoid
storing the whole address to the descriptor and add the base address
of the area to a 32bit offset.

Replay a bunch of fossils with this and changes not really significant
one way or another :

Totals:
Instrs: 9278246 -> 9277148 (-0.01%); split: -0.01%, +0.00%
Cycles: 3547598421 -> 3547579435 (-0.00%); split: -0.00%, +0.00%

Totals from 353 (1.14% of 31021) affected shaders:
Instrs: 581546 -> 580448 (-0.19%); split: -0.23%, +0.04%
Cycles: 25885422 -> 25866436 (-0.07%); split: -0.31%, +0.24%

No difference on send messages or spills/fills.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:38 +00:00
Lionel Landwerlin
04777171e0 intel/fs: try to rematerialize surface computation code
This helps a lot with accessing surface handles in control flow. Our
resource_intel intrinsic has a non_uniform flag, in which case we
cannot apply this optimization. But in uniform cases, this is just a
massive win. We drop all kind of pipeline stalls due to
find_live_channel. We also reduce register pressure by doing the
surface handle computation in a single GRF (instead of 2 or 4).

There are some regressions in max dispatch width but those I think are
only on SIMD32 and due to the current heuristic disabling it after
throughput comparison with SIMD16. We know this heuristic is not
perfect, it should probably be updated in another change.

Here are some stats (all titles seem to have similar gains) :

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 red_dead_redemption2 5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected         4716     -37.29%    -5.67%      +0.95%        +0.07%      -81.26%     -79.16%        -70.62%             -9.15%             -8.47%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                5860     -36.80%    -5.67%      +0.77%        +0.06%      -81.26%     -79.16%        -70.62%             -8.63%             -6.93%

 PERCENTAGE DELTAS          Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 rise_of_the_tomb_raider_g2 12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected               11732    -37.27%   -22.14%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.67%             -5.11%
 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total                      12010    -37.19%   -22.12%      +0.01%        +0.00%      -99.01%     -99.14%        -98.65%             -7.62%             -4.96%

 PERCENTAGE DELTAS    Shaders   Instrs    Cycles  Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 total_war_warhammer2 462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%
 -----------------------------------------------------------------------------------------------------------------------------------
 All affected         335      -28.31%   -12.77%    -82.35%     -88.46%        -66.67%             -6.25%             -7.24%
 -----------------------------------------------------------------------------------------------------------------------------------
 Total                462      -27.45%   -12.42%    -82.35%     -88.46%        -66.67%             -5.52%             -5.62%

 PERCENTAGE DELTAS Shaders   Instrs    Cycles  Subgroup size Send messages Spill count Fill count Scratch Memory Size Max live registers Max dispatch width
 witcher_3_dxvk_g2 1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 All affected      693      -41.93%   -58.45%      +0.09%        +0.01%      -98.52%     -97.29%        -98.10%             -10.25%            -1.33%
 ------------------------------------------------------------------------------------------------------------------------------------------------------------
 Total             1049     -36.94%   -57.82%      +0.06%        +0.01%      -98.52%     -97.29%        -98.10%             -7.81%             -1.00%

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
b28609a756 intel/fs: enable uniform block accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
05089f305f intel/fs: enable bindless sampler state offsets
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
6d6877bf99 intel/fs: enable extended bindless surface offset
Gives use 4Gb of bindless surface state on Gfx12.5+ instead of 64Mb.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
01fc9a06bd intel/fs: enable get_buffer_size on bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
ad9bc1ffb5 intel/fs: enable UBO accesses through bindless heap
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
068bf1378d intel/fs: enable SSBO accesses through the bindless heap
Using the information coming from surface_index_intel, we can tell
whether we should use the BTI or bindless heap for a particular SSBO
access.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
3d0cc3f63b intel/fs: keep track of new resource_intel information
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
86e9943b00 intel/fs: teach ubo range analysis pass about resource_intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
12540dfb6b intel/fs: add a pass to move resource_intel closer to user
Non uniform lower can insert read_first_invocation on the result of
resource_intel. We want to keep that intrinsic directly in front of
the user (load_ubo/load_ssbo/load_image/etc...)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:37 +00:00
Lionel Landwerlin
e09cfda0de intel/fs: lower get_buffer_size like other logical sends
This will also enable the use of the bindless heap.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Lionel Landwerlin
a66944dfbc intel/fs: reuse descriptor helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21645>
2023-05-30 06:36:36 +00:00
Erik Faye-Lund
20d619cd84 nir: use more nir_fmul_imm
This simplifies things a bit. Note that in some cases, the arguments are
swapped, because multiplications are commutative, and nir_fmul_imm only
allows the second operand to be an immediate.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
2023-05-25 06:59:24 +00:00
Lionel Landwerlin
429ef02f83 intel/fs: make tcs input_vertices dynamic
We need to do 3 things to accomplish this :

   1. make all the register access consider the maximal case when
      unknown at compile time

   2. move the clamping of load_per_vertex_input prior to lowering
      nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the
      clamping will use the nir_intrinsic_load_patch_vertices_in to
      clamp), meaning clamping using derefs rather than lowered
      nir_intrinsic_load_per_vertex_input

   3. in the known cases, lower nir_intrinsic_load_patch_vertices_in
      in NIR (so that the clamped elements still be vectorized to the
      smallest number of URB read messages)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>
2023-05-24 18:32:07 +00:00
Lionel Landwerlin
21c7b55f6f intel/fs: fix size_read() for LOAD_PAYLOAD
With Anv/Zink, the piglit test :

  arb_shader_storage_buffer_object-max-ssbo-size -auto -fbo fsexceed

is failing validation after copy propagation :

load_payload(8) vgrf15:F, vgrf1+0.12<0>:F, vgrf1+0.0<0>:F, vgrf1+0.4<0>:F, vgrf1+0.8<0>:F, vgrf1+0.12<0>:F
../src/intel/compiler/brw_fs_validate.cpp:191: A <= B failed
  A = inst->src[i].offset / REG_SIZE + regs_read(inst, i) = 2
  B = alloc.sizes[inst->src[i].nr] = 1

In most cases it works because src[0] would be at offset 0 and so
reading a full reg passes validation, but Anv/Zink started emitting
slightly different code adding an offset maybe the size read 2 GRFs.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23126>
2023-05-23 12:39:08 +00:00
Kenneth Graunke
a2d384a5c0 intel/compiler: Fix 64-bit ufind_msb, find_lsb, and bit_count
We only support 32-bit versions of ufind_msb, find_lsb, and bit_count,
so we need to lower them via nir_lower_int64.

Previously, we were failing to do so on platforms older than Icelake
and let those operations fall through to nir_lower_bit_size, which
used a callback to determine it should lower them for bit_size != 32.
However, that pass only emulates small bit-size operations by promoting
them to supported, larger bit-sizes (i.e. 16-bit using 32-bit).  It
doesn't support emulating larger operations (i.e. 64-bit using 32-bit).

So nir_lower_bit_size would just u2u32 the 64-bit source, causing us to
flat ignore half of the bits.

Commit 78a195f252 (intel/compiler: Postpone most int64 lowering to
brw_postprocess_nir) provoked this bug on Icelake and later as well,
by moving the nir_lower_int64 handling for ufind_msb until late in
compilation, allowing it to reach nir_lower_bit_size which broke it.

To fix this, we always set int64 lowering for these opcodes, and also
correct the nir_lower_bit_size callback to ignore 64-bit operations.

Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
2023-05-19 22:44:37 +00:00
Erik Faye-Lund
185001a86f meson: remove needless c++17-overrides
C++17 is the project-wide default since f9057cea51 ("fix(FTBFS):
meson: raise C++ standard to C++17"), so let's drop these local
overrides.

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23048>
2023-05-19 12:45:31 +00:00
Rohan Garg
6b8fe32322 intel: infer scalar'ness locally for brw_vectorize_lower_mem_access
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
3a8f5c2783 intel: update comments about non-existent function parameter
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
a15cc833f9 intel: drop unused is_scalar function parameter in brw_nir_apply_key
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
212810ac8a intel: infer scalar'ness locally for brw_postprocess_nir
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00