
2497 Commits

Author SHA1 Message Date
Lionel Landwerlin
9c1c1888d9 intel/fs: put scratch surface in the surface state heap
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that they don't use slots in the application's expected 1M
descriptors (especially with vkd3d-proton).

But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).

The additional issue is that there are only 26 bits of surface offset
available in CS instructions (CFE_STATE, 3DSTATE_VS, etc.) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
2022-11-19 14:58:58 +00:00
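The 26-bit constraint described in 9c1c1888d9 can be illustrated with a minimal range check. This is only a sketch: `scratch_offset_fits()` and `SCRATCH_OFFSET_BITS` are hypothetical names, and it assumes the packet field holds the offset from SurfaceStateBaseAddress directly. It shows why scratch surface states cannot be placed at arbitrary positions in a large heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: width of the scratch surface offset field in
 * CFE_STATE / 3DSTATE_* packets, per the commit message above. */
#define SCRATCH_OFFSET_BITS 26

/* Returns true if a surface state offset (relative to
 * SurfaceStateBaseAddress) can be encoded in the packet field. */
static bool
scratch_offset_fits(uint64_t offset)
{
   return offset < (UINT64_C(1) << SCRATCH_OFFSET_BITS);
}
```

Any allocation beyond the first 64 MiB chunk of the heap would fail this check, which is consistent with the drivers reserving that first chunk for scratch surfaces.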
Michael Skorokhodov
a9602134a3 intel/compiler: Require C++17
Fixes: 6c194ddd18 ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas")

Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19833>
2022-11-19 04:37:51 +00:00
Caio Oliveira
fbe40720e0 intel/compiler: Remove redundant argument from brw_nir_create_passthrough_tcs
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19831>
2022-11-19 00:35:56 +00:00
Yonggang Luo
4b0409ff9a intel: fixes -Werror,-Wunused-but-set-variable for clang-15
One of the error messages:
../../src/intel/compiler/brw_vec4_cmod_propagation.cpp:53:8: error: variable 'ip' set but not used [-Werror,-Wunused-but-set-variable]
   int ip = block->end_ip + 1;

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19527>
2022-11-17 23:17:40 +00:00
Yonggang Luo
d6bd382352 intel: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in brw_nir_lower_shader_calls.c
error message:
error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19527>
2022-11-17 23:17:40 +00:00
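Clang's -Wbitwise-instead-of-logical warning fires on `bool | bool`. Replacing `|` with `||` is not always a drop-in fix, because `||` short-circuits and skips the second call when the first returns true. The following is a minimal sketch of the pattern. `pass_a()` and `pass_b()` are hypothetical passes that count how often they run, and the actual change in brw_nir_lower_shader_calls.c may differ.

```c
#include <assert.h>
#include <stdbool.h>

static int pass_runs; /* counts how many passes actually executed */

/* Hypothetical optimization passes: each reports whether it made progress. */
static bool pass_a(void) { pass_runs++; return true; }
static bool pass_b(void) { pass_runs++; return false; }

/* `return pass_a() | pass_b();` would trigger the warning. */

/* Warning-free and still runs both passes: evaluate each call into its own
 * variable first, then combine the results with the logical operator. */
static bool
run_passes(void)
{
   bool a = pass_a();
   bool b = pass_b();
   return a || b;
}

/* Incorrect "fix": '||' short-circuits, so pass_b() never runs
 * when pass_a() reports progress. */
static bool
run_passes_short_circuit(void)
{
   return pass_a() || pass_b();
}
```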
Caio Oliveira
eedbd1ddbf intel/compiler: Use SIMD selection helpers in compile_single_bs()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
6c194ddd18 intel/compiler: Prepare SIMD selection helpers to handle different prog_datas
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
6ffa597bcf intel/compiler: Keep track of compiled/spilled in brw_simd_selection_state
We still update the cs_prog_data, but don't rely on it for this state anymore.
This will allow using the SIMD selector with shaders that don't use cs_prog_data.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
3c52e2d04c intel/compiler: Add a SIMD_COUNT constant
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
a0580dadfd intel/compiler: Create a struct to hold SIMD selection state
This is a preparation to decouple the storage of what SIMDs
compiled/spilled from the cs_prog_data.  This will allow reuse
of SIMD selection code by Bindless Shaders.

And since we have a struct now, move the error array there to
reduce the boilerplate for its users.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
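The SIMD selection commits above (a0580dadfd, 3c52e2d04c, 6ffa597bcf) move the compiled/spilled bookkeeping and the error array into one struct, sized by SIMD_COUNT. Below is a minimal sketch of that idea. The field names and the selection policy are assumptions for illustration, not the actual brw_simd_selection_state layout or Mesa's selection rules.

```c
#include <assert.h>
#include <stdbool.h>

/* SIMD8, SIMD16, SIMD32 */
#define SIMD_COUNT 3

/* Hypothetical layout: the real brw_simd_selection_state may differ. */
struct simd_selection_state {
   bool compiled[SIMD_COUNT];
   bool spilled[SIMD_COUNT];
   const char *error[SIMD_COUNT]; /* why a width failed, or NULL */
};

/* Simplified policy (an assumption, not Mesa's exact rules): prefer the
 * widest width that compiled without spilling, otherwise the narrowest
 * width that compiled at all. Returns the SIMD index, or -1 if none. */
static int
simd_select(const struct simd_selection_state *s)
{
   for (int i = SIMD_COUNT - 1; i >= 0; i--) {
      if (s->compiled[i] && !s->spilled[i])
         return i;
   }
   for (int i = 0; i < SIMD_COUNT; i++) {
      if (s->compiled[i])
         return i;
   }
   return -1;
}
```

Because the state no longer lives in cs_prog_data, a selector written like this could in principle serve any shader stage, which matches the stated goal of reusing it for bindless shaders.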
Caio Oliveira
8cda6cd774 intel/compiler: Simplify usage of brw_simd_select_for_workgroup_size()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
a943dbf475 intel/compiler: Make brw_private.h and simd selector helpers C++
We don't intend to expose either of them to drivers, so it is fine for them to be C++.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
494e2edb90 intel/compiler: Fix missing tie-breaker in brw_nir_analyze_ubo_ranges() ordering code
Per Ken's suggestion, use ascending start offset as the tie-breaker.

Fixes: 6d28c6e52c ("i965: Select ranges of UBO data to be uploaded as push constants.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19731>
2022-11-14 19:41:35 +00:00
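A sort comparator without a tie-breaker leaves the relative order of equal-ranked elements up to the qsort() implementation, which is not required to be stable. The sketch below assumes that ranges are ranked by a descending score and breaks ties on ascending start offset, as the commit message describes. `struct ubo_range_sketch` and its fields are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a candidate push-constant range. */
struct ubo_range_sketch {
   int score; /* estimated benefit of pushing this range */
   int start; /* start offset, used as the tie-breaker */
};

static int
cmp_ranges(const void *va, const void *vb)
{
   const struct ubo_range_sketch *a = va, *b = vb;

   /* Primary key: higher score first. */
   if (a->score != b->score)
      return a->score < b->score ? 1 : -1;

   /* Tie-breaker: ascending start offset, so the result is deterministic
    * regardless of how qsort() orders equal elements. */
   if (a->start != b->start)
      return a->start < b->start ? -1 : 1;
   return 0;
}
```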
Caio Oliveira
9fd1d47aa0 intel/compiler: Fix dynarray usage in intel_clc
The code builds up a dynamic array of objects (spirv_objs) and
collects pointers to each of them into another dynamic
array (spirv_ptr_objs).

If the growth of the first array causes a reallocation, the
previously collected pointers may become invalid.

Fixes: 77e929a527 ("intel/clc: allow multiple CL files to be compiled together")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19730>
2022-11-14 19:15:05 +00:00
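The hazard fixed in 9fd1d47aa0 is the general realloc() pitfall: growing an array can move every element, leaving earlier pointers dangling. The following plain-C sketch shows the safe ordering, which is to finish growing the array first and only then take pointers into it. It does not use Mesa's util_dynarray, and `struct obj` stands in hypothetically for a SPIR-V object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object type standing in for a SPIR-V module. */
struct obj { int id; };

struct obj_array {
   struct obj *data;
   size_t len, cap;
};

/* Appends an object. May call realloc(), which can move every element
 * and invalidate any pointer previously taken into data[]. */
static struct obj *
obj_array_push(struct obj_array *arr, struct obj o)
{
   if (arr->len == arr->cap) {
      size_t cap = arr->cap ? arr->cap * 2 : 1;
      struct obj *data = realloc(arr->data, cap * sizeof(*data));
      if (!data)
         return NULL;
      arr->data = data;
      arr->cap = cap;
   }
   arr->data[arr->len] = o;
   return &arr->data[arr->len++];
}

/* Safe pattern: collect pointers only after the array has stopped growing. */
static struct obj **
collect_ptrs(const struct obj_array *arr)
{
   struct obj **ptrs = malloc(arr->len * sizeof(*ptrs));
   if (!ptrs)
      return NULL;
   for (size_t i = 0; i < arr->len; i++)
      ptrs[i] = &arr->data[i];
   return ptrs;
}
```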
Lionel Landwerlin
bdf680cd3f intel/fs: use nir_opt_ray_query_ranges
Results on DG2 q2rtx shaders:

Totals from 6 (12.24% of 49) affected shaders:
Instrs: 88927 -> 54088 (-39.18%)
Cycles: 4115088 -> 2536902 (-38.35%)
Send messages: 2639 -> 1609 (-39.03%)
Spill count: 1321 -> 613 (-53.60%)
Fill count: 3130 -> 1104 (-64.73%)
Scratch Memory Size: 22528 -> 18432 (-18.18%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>
2022-11-11 15:17:08 +00:00
Caio Oliveira
ecc2dfc503 intel/compiler: Use std::unique_ptr for tracking the fs_visitors
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19605>
2022-11-10 18:01:52 +00:00
Lionel Landwerlin
b499a27d74 nir: make ray query load values visible in NIR prints
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19641>
2022-11-10 14:40:08 +02:00
Ian Romanick
351b8c6aec intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7
Even though Intel's CI doesn't test these old platforms anymore, the
validation added in "intel/eu/validate: Validate integer multiplication
source size restrictions" combined with full shader-db runs gives me
confidence in the changes.

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902167 (<.01%)
instructions in affected programs: 30771 -> 30597 (-0.57%)
helped: 66 / HURT: 0

total cycles in shared programs: 741795500 -> 741791931 (<.01%)
cycles in affected programs: 987602 -> 984033 (-0.36%)
helped: 28 / HURT: 5

Iron Lake
total instructions in shared programs: 8365806 -> 8365754 (<.01%)
instructions in affected programs: 1766 -> 1714 (-2.94%)
helped: 10 / HURT: 0

total cycles in shared programs: 248542694 -> 248542378 (<.01%)
cycles in affected programs: 29836 -> 29520 (-1.06%)
helped: 9 / HURT: 0

GM45
total instructions in shared programs: 5187127 -> 5187101 (<.01%)
instructions in affected programs: 891 -> 865 (-2.92%)
helped: 5 / HURT: 0

total cycles in shared programs: 163643914 -> 163643750 (<.01%)
cycles in affected programs: 22206 -> 22042 (-0.74%)
helped: 5 / HURT: 0

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
293ad13e3f intel/fs: Slightly restructure emitting nir_op_imul_32x16 and nir_op_umul_32x16
There are no immediate values at this point, so all of this code was
bunk. :face_palm:

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
ee2a299661 intel/eu/validate: Validate integer multiplication source size restrictions
v2: Expect correct result on BDW in test_eu.

v3: Fix SNB type-size check. Noticed by Marcin.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
d668512f88 intel/compiler: Fix signed integer range analysis of imax and imin
Some review feedback of an earlier commit caused me to rearrange some
code quite a bit. I wasn't paying enough attention while applying the
later commits, and these breaks should have been returns. As it is, the
result of the imin or imax analysis is overwritten by the default case
handling... effectively the original commit does nothing. :(

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 19914090 -> 19904772 (-0.05%)
instructions in affected programs: 121258 -> 111940 (-7.68%)
helped: 445 / HURT: 0

total cycles in shared programs: 855291535 -> 855266659 (<.01%)
cycles in affected programs: 2737005 -> 2712129 (-0.91%)
helped: 426 / HURT: 17

LOST:   0
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total cycles in shared programs: 842395356 -> 842338259 (<.01%)
cycles in affected programs: 5460985 -> 5403888 (-1.05%)
helped: 458 / HURT: 0

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16710449 -> 16708449 (-0.01%)
instructions in affected programs: 44101 -> 42101 (-4.54%)
helped: 75 / HURT: 0

total cycles in shared programs: 882760230 -> 882727923 (<.01%)
cycles in affected programs: 2867797 -> 2835490 (-1.13%)
helped: 62 / HURT: 10

No shader-db change on any other Intel platform.

No fossil-db changes on any Intel platform.

Fixes: 5ec75ca10d ("intel/compiler: Teach signed integer range analysis about imax and imin")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Jason Ekstrand
25c180b509 intel: Don't cross DWORD boundaries with byte scratch load/store
The back-end swizzles dwords so that our indirect scratch messages match
the memory layout of spill/fill messages for better cache coherency.
The swizzle happens at a DWORD granularity.  If a read or write crosses
a DWORD boundary, the first part will be correctly swizzled, but whatever
piece lands in the next DWORD will not, because the scatter instructions
assume sequential addresses for all bytes.  For DWORD writes, this is
handled naturally as part of scalarizing.  For smaller writes, we need
to be sure that a single write never escapes a DWORD.

Fixes: fd04f858b0 ("intel/nir: Don't try to emit vector load_scratch instructions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7364
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
2022-11-09 19:45:10 +00:00
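The rule in 25c180b509, that a single byte scratch access must never cross a DWORD boundary, amounts to splitting an access wherever it would spill into the next DWORD. Below is a minimal sketch with hypothetical helper names. It does not additionally restrict chunk sizes to the widths the hardware messages support.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many bytes of an access starting at `offset` fit before
 * the end of the current DWORD, capped at `remaining`. */
static unsigned
bytes_until_dword_end(unsigned offset, unsigned remaining)
{
   unsigned room = 4 - (offset % 4);
   return remaining < room ? remaining : room;
}

/* Splits [offset, offset + size) into chunks that each stay inside one
 * DWORD, writing (offset, size) pairs to out[]. Returns the chunk count. */
static size_t
split_at_dwords(unsigned offset, unsigned size,
                unsigned out[][2], size_t max_chunks)
{
   size_t n = 0;
   while (size > 0 && n < max_chunks) {
      unsigned chunk = bytes_until_dword_end(offset, size);
      out[n][0] = offset;
      out[n][1] = chunk;
      n++;
      offset += chunk;
      size -= chunk;
   }
   return n;
}
```

For example, a 4-byte access at byte offset 2 becomes two 2-byte accesses at offsets 2 and 4.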
Jason Ekstrand
85685cf932 intel/lower_mem_access_bit_sizes: Compute alignments automatically
Because dup_mem_intrinsic() retains the SSA offset from the original
intrinsic and only modifies it by adding a constant, we can compute the
alignment based on the original alignment and the constant offset.  This
is both easier and more accurate.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
2022-11-09 19:45:10 +00:00
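In 85685cf932, the alignment of each split access is derived from the original alignment plus a constant offset. A minimal sketch follows, modeled on the align_mul/align_offset pair that NIR memory intrinsics carry. `offset_alignment()` is a hypothetical standalone function and does not call any NIR API. If the base address satisfies `base % align_mul == align_offset`, then adding a constant `c` leaves an address whose alignment is limited by the lowest set bit of `(align_offset + c) mod align_mul`.

```c
#include <assert.h>
#include <stdint.h>

/* Alignment of (base + c), where base % align_mul == align_offset and
 * align_mul is a power of two. Names are illustrative. */
static uint32_t
offset_alignment(uint32_t align_mul, uint32_t align_offset, uint32_t c)
{
   assert(align_mul != 0 && (align_mul & (align_mul - 1)) == 0);
   uint32_t new_offset = (align_offset + c) & (align_mul - 1);
   /* The lowest set bit of the offset limits the alignment; a zero
    * offset means the full align_mul is still guaranteed. */
   return new_offset ? (new_offset & -new_offset) : align_mul;
}
```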
Caio Oliveira
22d8ed84b8 intel/compiler: Remove unused fs_visitor::emit_percomp()
Since 7ef7738a61 ("i965: Write gl_FragCoord directly to the destination.") this
is not used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:09 +00:00
Caio Oliveira
90861e6fea intel/compiler: Remove various unused function declarations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:08 +00:00
Caio Oliveira
48506a9029 intel/compiler: Remove unused data members
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:08 +00:00
Ian Romanick
9abeb3d739 intel/fs: Optimize integer multiplication of large constants by factoring
Many Intel platforms can only perform 32x16 bit multiplication.  The
straightforward way to implement 32x32 bit multiplications is by
splitting one of the operands into high and low parts called H and L,
respectively.  The full multiplication can be implemented as:

         ((A * H) << 16) + (A * L)

On Intel platforms, special register accesses can be used to eliminate
the shift operation.  This results in three instructions and a temporary
register for most values.

If H or L is 1, then one (or both) of the multiplications will later be
eliminated.  On some platforms it may be possible to eliminate the
multiplication when H is 256.

If L is zero (note that H cannot be zero), one of the multiplications
will also be eliminated.

Instead of splitting the operand into high and low parts, it may be
possible to factor the operand into two 16-bit factors X and Y.  The
original multiplication can be replaced with (A * (X * Y)) = ((A * X) *
Y).  This requires two instructions without a temporary register.

I may have gone a bit overboard with optimizing the factorization
routine.  It was a fun brainteaser, and I couldn't put it down. :) On my
1.3GHz Ice Lake, a standalone test could chug through 1,000,000 randomly
selected values in about 5.7 seconds.  This is about 9x the performance
of the obvious, straightforward implementation that I started with.

v2: Drop an unnecessary return.  Rearrange logic slightly and rename
variables in factor_uint32 to better match the names used in the large
comment.  Both suggested by Caio. Rearrange logic to avoid possibly
using `a` uninitialized. Noticed by Marcin.

v3: Use DIV_ROUND_UP instead of open coding it. Noticed by Caio.

Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown)
total instructions in shared programs: 19912558 -> 19912526 (<.01%)
instructions in affected programs: 3432 -> 3400 (-0.93%)
helped: 10 / HURT: 0

total cycles in shared programs: 856413218 -> 856412810 (<.01%)
cycles in affected programs: 122032 -> 121624 (-0.33%)
helped: 9 / HURT: 0

No shader-db changes on any other Intel platforms.

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
Instructions in all programs: 141997227 -> 141996923 (-0.0%)
Instructions helped: 71

Cycles in all programs: 9162524757 -> 9162523886 (-0.0%)
Cycles helped: 63
Cycles hurt: 5

No fossil-db changes on any other Intel platforms.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
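The two strategies in 9abeb3d739 can be checked on the CPU with 32-bit wrapping arithmetic. The first strategy writes B as (H << 16) + L, so that A * B = ((A * H) << 16) + (A * L). The second finds 16-bit X and Y with X * Y = B, so that A * B = (A * X) * Y. The factor search below is a deliberately naive brute force. It is not the optimized factor_uint32 routine the commit describes, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 32x32 multiply (low 32 bits) built from two 32x16 multiplies. */
static uint32_t
mul_split(uint32_t a, uint32_t b)
{
   uint32_t h = b >> 16;
   uint32_t l = b & 0xffff;
   return ((a * h) << 16) + a * l;
}

/* Naive search for 16-bit factors with *x * *y == b exactly. */
static bool
factor_u32_16x16(uint32_t b, uint32_t *x, uint32_t *y)
{
   if (b == 0)
      return false;
   /* Y = b / X must fit in 16 bits, so X >= ceil(b / 0xffff). */
   uint32_t first = (uint32_t)(((uint64_t)b + 0xfffe) / 0xffff);
   for (uint32_t f = first; f <= 0xffff; f++) {
      if (b % f == 0 && b / f <= 0xffff) {
         *x = f;
         *y = b / f;
         return true;
      }
   }
   return false;
}
```

A prime above 0xffff, such as 65537, has no such factorization, so only the split form applies to it.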
Ian Romanick
5ec75ca10d intel/compiler: Teach signed integer range analysis about imax and imin
This is especially helpful for a*isign(a) generated by idiv_by_const
optimization.  On many GPUs, isign(a) is lowered to imax(imin(a, 1),
-1).

There are no changes on fossil-db because ANV uses a different
optimization path for idiv with a constant denominator.  A future MR
will change this.

NOTE: This commit used to help a few hundred shader-db shaders, but
now none are affected.  I suspect this is due to some change in the
idiv_by_const optimization.  This could possibly be dropped.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
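Signed range analysis of imin and imax (5ec75ca10d) reduces to interval arithmetic. The sketch below applies it to the lowered isign(a) = imax(imin(a, 1), -1) and shows that the result is bounded to [-1, 1] even when nothing is known about `a`. The opcode enum and `struct srange` are hypothetical, not Mesa's types. The comment in the switch mirrors the bug later fixed in d668512f88, where cases that used `break` let the default handling overwrite the computed result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical signed interval [lo, hi]. */
struct srange { int32_t lo, hi; };

enum op_sketch { OP_IMIN, OP_IMAX, OP_OTHER };

static int32_t min32(int32_t a, int32_t b) { return a < b ? a : b; }
static int32_t max32(int32_t a, int32_t b) { return a > b ? a : b; }

static struct srange
range_of(enum op_sketch op, struct srange a, struct srange b)
{
   switch (op) {
   case OP_IMIN:
      /* Each case returns. A `break` instead would exit the switch and
       * reach the conservative result below, discarding this bound. */
      return (struct srange){ min32(a.lo, b.lo), min32(a.hi, b.hi) };
   case OP_IMAX:
      return (struct srange){ max32(a.lo, b.lo), max32(a.hi, b.hi) };
   default:
      break;
   }
   /* Unknown opcode: nothing is known about the result. */
   return (struct srange){ INT32_MIN, INT32_MAX };
}
```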
Ian Romanick
1b0da3a765 intel/compiler: Signed integer range analysis for imul_32x16 generation
Only iabs and ineg are treated specially.  Everything else just uses
nir_unsigned_upper_bound.  The special treatment of source modifiers is
because they cause problems for nir_unsigned_upper_bound.  Once those
are peeled off, nir_unsigned_upper_bound can generally produce a
tighter bound.

Future commits will add more opcodes.  This mostly introduces the
basic framework.

v2: Add a bunch of comments to signed_integer_range_analysis. Re-arrange
the code a little to reduce duplication.  Both suggested by
Caio. Rearrange some logic to simplify things. Suggested by Marcin.

Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown)
total instructions in shared programs: 19912894 -> 19912558 (<.01%)
instructions in affected programs: 109275 -> 108939 (-0.31%)
helped: 74 / HURT: 0

total cycles in shared programs: 856422769 -> 856413218 (<.01%)
cycles in affected programs: 15268102 -> 15258551 (-0.06%)
helped: 65 / HURT: 4

total fills in shared programs: 8218 -> 8217 (-0.01%)
fills in affected programs: 1171 -> 1170 (-0.09%)
helped: 1 / HURT: 0

Skylake and Broadwell had similar results. (Skylake shown)
total cycles in shared programs: 845145547 -> 845142263 (<.01%)
cycles in affected programs: 15261465 -> 15258181 (-0.02%)
helped: 65 / HURT: 0

Tiger Lake
Instructions in all programs: 157580768 -> 157579730 (-0.0%)
Instructions helped: 312
Instructions hurt: 28

Cycles in all programs: 7566977172 -> 7566967746 (-0.0%)
Cycles helped: 288
Cycles hurt: 53

Spills in all programs: 19701 -> 19700 (-0.0%)
Spills helped: 2
Spills hurt: 4

Fills in all programs: 33311 -> 33335 (+0.1%)
Fills helped: 5
Fills hurt: 4

Ice Lake
Instructions in all programs: 141998667 -> 141997227 (-0.0%)
Instructions helped: 420
Instructions hurt: 3

Cycles in all programs: 9162565297 -> 9162524757 (-0.0%)
Cycles helped: 389
Cycles hurt: 29

Spills in all programs: 19918 -> 19916 (-0.0%)
Spills helped: 2
Spills hurt: 3

Fills in all programs: 32795 -> 32814 (+0.1%)
Fills helped: 6
Fills hurt: 3

Skylake
Instructions in all programs: 132567691 -> 132567745 (+0.0%)
Instructions hurt: 24

Cycles in all programs: 8828897462 -> 8828889517 (-0.0%)
Cycles helped: 405
Cycles hurt: 6

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
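1b0da3a765 treats iabs and ineg specially before falling back to nir_unsigned_upper_bound. The sketch below models only the interval rules for those two opcodes, using a hypothetical `struct srange`. The real pass's handling may differ. One caveat matters in two's-complement arithmetic: negating or taking the absolute value of INT32_MIN yields INT32_MIN, so the sketch gives up when the lower bound is INT32_MIN.

```c
#include <assert.h>
#include <stdint.h>

struct srange { int32_t lo, hi; };

static const struct srange full_range = { INT32_MIN, INT32_MAX };

/* ineg: [lo, hi] -> [-hi, -lo], unless -INT32_MIN would wrap. */
static struct srange
range_ineg(struct srange a)
{
   if (a.lo == INT32_MIN)
      return full_range;
   return (struct srange){ -a.hi, -a.lo };
}

/* iabs: same INT32_MIN caveat, since iabs(INT32_MIN) == INT32_MIN. */
static struct srange
range_iabs(struct srange a)
{
   if (a.lo == INT32_MIN)
      return full_range;
   if (a.lo >= 0)
      return a;
   if (a.hi <= 0)
      return (struct srange){ -a.hi, -a.lo };
   return (struct srange){ 0, -a.lo > a.hi ? -a.lo : a.hi };
}
```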
Ian Romanick
f90d71055b intel/compiler: Add and use a pass to generate imul_32x16 instructions
Gfx8 and Gfx9 platforms are helped for cycles because now many
instructions like

    mul(8)          g12<1>D         g10<8,8,1>D     6D

become

    mul(8)          g12<1>D         g10<8,8,1>D     6W

It is the same number of instructions, but the 32x16 multiply is a
little faster.

v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)".
Noticed by Caio.  Use nir_src_is_const instead of open coding it.
Suggested by Caio.

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 845748380 -> 845145547 (-0.07%)
cycles in affected programs: 446346348 -> 445743515 (-0.14%)
helped: 6017
HURT: 0
helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8
helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39%
95% mean confidence interval for cycles value: -113.37 -87.00
95% mean confidence interval for cycles %-change: -0.42% -0.41%
Cycles are helped.

Skylake
Cycles in all programs: 8844820715 -> 8828897462 (-0.2%)
Cycles helped: 47914
Cycles hurt: 1

No shader-db or fossil-db changes on any other Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
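The core check behind generating imul_32x16 in f90d71055b is whether a source's whole range fits in a 16-bit operand. The v2 note describes a transposed version of this test. A minimal sketch of the corrected comparison follows, using a hypothetical `struct srange`. The lower bound is checked against the type's minimum, and the upper bound against its maximum.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct srange { int32_t lo, hi; };

/* A 32-bit source can be narrowed to a W (signed 16-bit) operand only
 * if its whole range lies inside [INT16_MIN, INT16_MAX]. */
static bool
range_fits_int16(struct srange r)
{
   return r.lo >= INT16_MIN && r.hi <= INT16_MAX;
}

/* Same idea for a UW (unsigned 16-bit) operand. */
static bool
range_fits_uint16(struct srange r)
{
   return r.lo >= 0 && r.hi <= UINT16_MAX;
}
```

In the example above, the constant 6 trivially satisfies `range_fits_int16`, which is why `6D` can become `6W`.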
Ian Romanick
9479e3a19b intel/fs: Allow constant copy prop from DW to W
This enables copy propagation of

    mov(8)          g5<1>UD         0x00000180UD
    mul(8)          g10<1>D         g2.3<0,1,0>D    g5<16,8,2>W

into

    mul(8)          g10<1>D         g2.3<0,1,0>D    180W

This is necessary for any optimization passes that generate imul_32x16
instructions.

No fossil-db or shader-db changes on any Intel platform.

v2: Fix type size check to (src size != 2) || (dest size != 4).  It was
previously &&. :( This allowed copying constants into UB sources, and
that is invalid.

v3: Fix incorrect extraction of upper 16-bits of immediate value when
subnr=2. Noticed by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
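Copy propagation from a DW immediate into a W source (9479e3a19b) must read the correct 16-bit half of the constant. The v3 note describes a fix to how the upper half was extracted when subnr=2. The sketch below makes two assumptions: subnr is a byte offset within the DWORD, and the register layout is little-endian. `imm_half()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Reads the 16-bit half of a 32-bit immediate selected by a byte
 * sub-register offset (0 = low half, 2 = high half). */
static uint16_t
imm_half(uint32_t imm, unsigned subnr)
{
   assert(subnr == 0 || subnr == 2);
   return (uint16_t)(imm >> (subnr * 8));
}
```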
Ian Romanick
90d267b2d1 intel/fs: Fix bounds checking for integer multiplication lowering
The previous bounds checking would cause

    mul(8)          g121<1>D        g120<8,8,1>D    0xec4dD

to be lowered to

    mul(8)          g121<1>D        g120<8,8,1>D    0xec4dUW
    mul(8)          g41<1>D         g120<8,8,1>D    0x0000UW
    add(8)          g121.1<2>UW     g121.1<16,8,2>UW g41<16,8,2>UW

Instead of picking the bounds (and the new type) based on the old type,
pick the new type based on the value only.

This helps a few fossil-db shaders in Witcher 3 and Geekbench5.  No
changes on any other Intel platforms.

Tiger Lake
Instructions in all programs: 157581069 -> 157580768 (-0.0%)
Instructions helped: 24

Cycles in all programs: 7566979620 -> 7566977172 (-0.0%)
Cycles helped: 22
Cycles hurt: 4

Ice Lake
Instructions in all programs: 141998965 -> 141998667 (-0.0%)
Instructions helped: 26

Cycles in all programs: 9162568666 -> 9162565297 (-0.0%)
Cycles helped: 24
Cycles hurt: 2

Skylake
No changes.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
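90d267b2d1 picks the 16-bit immediate type from the constant's value rather than from its original 32-bit type. In the example above, 0xec4d (60493) exceeds INT16_MAX but fits in UW, so a single 32x16 multiply suffices and the three-instruction lowering is unnecessary. A minimal sketch follows. The enum, the function name, and the preference for W over UW when both fit are assumptions.

```c
#include <assert.h>
#include <stdint.h>

enum imm_type_sketch { IMM_W, IMM_UW, IMM_NEEDS_SPLIT };

/* Picks the 16-bit immediate type from the value alone. */
static enum imm_type_sketch
pick_16bit_imm_type(int64_t value)
{
   if (value >= INT16_MIN && value <= INT16_MAX)
      return IMM_W;
   if (value >= 0 && value <= UINT16_MAX)
      return IMM_UW;
   return IMM_NEEDS_SPLIT; /* needs the two-multiply lowering */
}
```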
Ian Romanick
db20412168 intel/fs: Fix constant propagation into 32x16 integer multiplication
Don't copy propagate the constant in situations like

    mov(8)          g8<1>D          0x7fffffffD
    mul(8)          g16<1>D         g8<8,8,1>D      g15<16,8,2>W

On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to

    mul(8)          g15<1>D         g14<8,8,1>D     0xffffUW
    mul(8)          g16<1>D         g14<8,8,1>D     0x7fffUW
    add(8)          g15.1<2>UW      g15.1<16,8,2>UW g16<16,8,2>UW

On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in

    mul(8)          g16<1>D         g15<16,8,2>W    0x7fffffffD

Volume 2a of the Skylake PRM says:

    When multiplying a DW and any lower precision integer, the
    DW operand must on src0.

See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.

Prior to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur.  I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16 bits.  The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.

No shader-db or fossil-db changes on any Intel platform.

Fixes: de6c0f8487 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Francisco Jerez
5d4df3ac23 intel/compiler: Run extra fp64 lowering pass on devices that don't support int64.
In some cases nir_lower_int64 will emit fp64 operations which aren't
natively supported on any Intel hardware (e.g. ftrunc, frem).  An
extra pass of nir_opt_algebraic (for frem) and nir_lower_doubles is
required in order to take care of them.  This fixes several int64
test-cases on MTL hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19390>
2022-11-07 07:35:22 +00:00
Illia Abernikhin
aa4ac5ff8b utils: Merge util/debug.* into util/u_debug.* and remove util/debug.*
Rename env_var_as_unsigned() -> debug_get_num_option(), since the two are duplicates.
Rename env_var_as_bool() -> debug_get_bool_option(), since the two are duplicates.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7177

Signed-off-by: Illia Abernikhin <illia.abernikhin@globallogic.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19336>
2022-11-02 07:25:39 +00:00
Kenneth Graunke
88756cee8d intel/compiler: Run nir_opt_large_constants before scalarizing consts
nir_opt_large_constants balks at seeing a store_deref of a variable
where the source is a vecN operation of multiple load_consts, and thinks
that isn't a constant, so it should not bother promoting it.

Unfortunately, we were running nir_lower_load_const_to_scalar before
nir_opt_large_constants, so this prevented a ton of constant promotion.

This commit /used to help/ some shaders in shader-db. Presumably since
!16770 landed, those shaders were already helped.  Currently there are
no shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141998227 -> 141421756 (-0.4%)
Instructions helped: 12515
Instructions hurt: 237

SENDs in all programs: 7437925 -> 7468033 (+0.4%)
SENDs hurt: 12806

Cycles in all programs: 9161655753 -> 9132869800 (-0.3%)
Cycles helped: 10163
Cycles hurt: 2637

Spills in all programs: 19977 -> 18678 (-6.5%)
Spills helped: 384
Spills hurt: 40

Fills in all programs: 32863 -> 31396 (-4.5%)
Fills helped: 385
Fills hurt: 42

Lost: 1

Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham
Origins vertex shaders were hurt for SENDs in this commit.  A couple
Aztec Ruins compute shaders and Spaceship shaders (multiple stages)
were also hurt.

All of the shaders hurt for spills or fills were Spaceship compute
shaders.  Nearly all of the shaders helped were Shadow of the Tomb
Raider fragment shaders.  One Spaceship shader was really, REALLY helped:

Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%)
Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%)

Overall this seems like an improvement, but we may want to actually
run these few benchmarks before landing.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>
2022-11-01 14:55:21 -07:00
Lionel Landwerlin
920aed2121 intel/compiler: don't allocate compaction arrays on the stack
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7569
Cc: mesa-stable
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19339>
2022-10-28 07:10:58 +00:00
Lionel Landwerlin
e59c4a912b intel/fs: use fs implementation of dump_instructions
This specialized version prints out the liveness count as well as the
maximum liveness count. It was eye-opening to see the max
liveness jump after lowering of packing instructions, which should not
have changed the count.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
e5dfff0946 intel/fs: reduce liveness of variables in lowering passes
When lowering a single instruction with a destination VGRF into 2 or
more instructions, the VGRF is now considered partially written by each
generated instruction, and that increases its liveness, especially in
loops. This can in turn increase the number of spills/fills due to
register allocation.

Putting an UNDEF instruction in front of the lowered instructions
allows the IR to limit the liveness of the VGRF, reducing register
pressure.

This has a pretty dramatic effect on spills/fills for RT shaders. Here
are the stats on Q2RTX shaders on DG2 (wiping out any spills/fills due to
register allocation):

Instructions in all programs: 26150 -> 24955 (-4.6%)
SENDs in all programs: 1148 -> 1148 (+0.0%)
Loops in all programs: 4 -> 4 (+0.0%)
Cycles in all programs: 392179 -> 332787 (-15.1%)
Spills in all programs: 132 -> 116 (-12.1%)
Fills in all programs: 262 -> 154 (-41.2%)

Shader-db results on TGL:

total instructions in shared programs: 21158140 -> 21158377 (<.01%)
instructions in affected programs: 76629 -> 76866 (0.31%)
helped: 18
HURT: 20
helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12
helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77%
HURT stats (abs)   min: 1 max: 79 x̄: 28.85 x̃: 18
HURT stats (rel)   min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79%
95% mean confidence interval for instructions value: -4.82 17.30
95% mean confidence interval for instructions %-change: -0.34% 0.57%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5753 -> 5753 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 798856834 -> 798870688 (<.01%)
cycles in affected programs: 6208395 -> 6222249 (0.22%)
helped: 22
HURT: 17
helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782
helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44%
HURT stats (abs)   min: 2 max: 19178 x̄: 2676.12 x̃: 1358
HURT stats (rel)   min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71%
95% mean confidence interval for cycles value: -952.19 1662.65
95% mean confidence interval for cycles %-change: -0.64% 1.90%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4078 -> 4066 (-0.29%)
spills in affected programs: 40 -> 28 (-30.00%)
helped: 2
HURT: 0

total fills in shared programs: 2856 -> 2832 (-0.84%)
fills in affected programs: 127 -> 103 (-18.90%)
helped: 2
HURT: 0

total sends in shared programs: 998554 -> 998554 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
dd6d40429b intel/fs: make split_virtual_grfs deal with partial undefs
v2: fix up UNDEF instructions (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
14b99df7d9 intel/fs: require UNDEFs register offsets to be aligned to REG_SIZE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Alyssa Rosenzweig
941c37c085 nir/lower_idiv: Remove imprecise_32bit_lowering
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: calling the
results of this path "imprecise" understates the problem, since they are
completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer division... but in practice, non-constant highp integer
division is pretty rare.
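The sketch below shows how a float-based shortcut can be outright wrong, not just slightly imprecise. It is not the removed NIR code. It is a deliberately naive float-reciprocal division, and the function name `udiv_via_float_rcp` is hypothetical. A 32-bit float carries only a 24-bit significand, so a 32-bit integer such as 2^24 + 1 cannot survive the round trip through it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, naive float-reciprocal division (illustration only,
 * not NIR's former imprecise_32bit_lowering sequence). */
static uint32_t udiv_via_float_rcp(uint32_t n, uint32_t d)
{
   float rcp = 1.0f / (float)d;       /* reciprocal rounded to 24 significant bits */
   return (uint32_t)((float)n * rcp); /* n is also rounded to 24 significant bits */
}
```

For example, `udiv_via_float_rcp(16777217u, 1u)` yields 16777216 because 2^24 + 1 rounds to 2^24 as a float. The exact quotient is 16777217.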

After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.

Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
2022-10-27 19:37:14 +00:00
Jordan Justen
c238699afa intel/compiler: Broadcast lower code should check 64-bit int support
This will affect MTL, which will have fp64 support without int64
support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>
2022-10-27 09:22:09 +00:00
Lionel Landwerlin
2da7ec0db9 intel/clc: assert when libclc shader is not found
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7483
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19091>
2022-10-27 08:53:55 +00:00
Tapani Pälli
1e51383258 intel/compiler: run nir_opt_idiv_const before nir_lower_idiv
Integer division lowering can potentially create a lot of code that is
not removed later on. Running the constant-divisor pass
(nir_opt_idiv_const) first can eliminate that code for divisions by
constants.
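A divide by a constant can usually become a multiply-high and a shift, which is far cheaper than a general division sequence. The sketch below assumes the divisor 3 and the magic constant 0xAAAAAAAB, which is ceil(2^33 / 3). The helper names `umul_high32` and `udiv3` are hypothetical. This shows the general technique; the exact instructions nir_opt_idiv_const emits may differ.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of a 32x32 -> 64-bit unsigned multiply (NIR's umul_high). */
static inline uint32_t umul_high32(uint32_t a, uint32_t b)
{
   return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* x / 3 for every uint32_t x: (x * ceil(2^33 / 3)) >> 33,
 * split into a multiply-high (>> 32) followed by >> 1. */
static inline uint32_t udiv3(uint32_t x)
{
   return umul_high32(x, 0xAAAAAAABu) >> 1;
}
```

The rounding error of the magic constant is small enough that the result is exact for the whole 32-bit range, including `UINT32_MAX`.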

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19157>
2022-10-20 15:35:48 +03:00
Luis Felipe Strano Moraes
eff1517cd7 anv: added proper handling for input argument in intel_clc
The input argument was previously listed in the getopt_long struct but
not actually used. This makes intel_clc argument processing easier, as
all of its arguments are now handled with getopt, and anything after the
special argument '--' is passed along to clang to form the final build
command.

Thanks to Dylan Baker for help with changes to the meson file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8de02ff980 anv: fixing typo on description of output flag for intel_clc
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
056d72c897 anv: add missing separator to help for intel_clc
intel_clc relies on the special argument '--' for getopt to be given so
it knows when to start expecting purely input files or clang arguments.
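The sketch below shows how '--' separates a tool's own options from pass-through compiler arguments when using getopt_long. It is not intel_clc's real code. The option names ("in", "out", "verbose"), `struct clc_args`, and `parse_args` are hypothetical, loosely modeled on the commit text.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical parsed arguments (not intel_clc's actual structure). */
struct clc_args {
   const char *input;
   const char *output;
   bool verbose;
   int passthrough_start; /* index of the first argument after "--" */
   int passthrough_count; /* number of arguments left for clang */
};

static int parse_args(int argc, char **argv, struct clc_args *args)
{
   static const struct option long_options[] = {
      {"in",      required_argument, NULL, 'i'},
      {"out",     required_argument, NULL, 'o'},
      {"verbose", no_argument,       NULL, 'v'},
      {NULL, 0, NULL, 0},
   };

   assert(argc >= 1 && argv != NULL);
   args->input = NULL;
   args->output = NULL;
   args->verbose = false;

   optind = 1; /* start scanning at argv[1] */
   opterr = 0; /* report errors through the return value instead */
   int ch;
   while ((ch = getopt_long(argc, argv, "i:o:v", long_options, NULL)) != -1) {
      switch (ch) {
      case 'i': args->input = optarg; break;
      case 'o': args->output = optarg; break;
      case 'v': args->verbose = true; break;
      default: return -1; /* unknown option or missing argument */
      }
   }

   /* getopt_long consumes "--" and stops; argv[optind..] is left untouched. */
   args->passthrough_start = optind;
   args->passthrough_count = argc - optind;
   return 0;
}
```

Consider the command line `intel_clc -i foo.cl -o foo.spv -v -- -cl-std=CL2.0 -DFOO`. Here `-cl-std=CL2.0` and `-DFOO` are left for clang. Without the '--', getopt_long would read `-cl-std=CL2.0` as a bundle of short options, and this sketch would reject it.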

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8e1f03ada0 anv: reword info flag in intel_clc's getopt to avoid clash
The info option was using the same short option that was listed for
input files in the long_options struct.

Renaming it to 'v' and 'verbose' is more in line with expectations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Iván Briano
d9747169b6 anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_*
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_AABBS_BIT_KHR and
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_TRIANGLES_BIT_KHR, when specified,
make TraceRay behave as if the corresponding shader flags were set, but
without affecting the value of IncomingRayFlags in shaders.
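The behavior described above can be sketched as keeping two values apart. One is the flags that traversal actually honors, which include the pipeline-level skip bits. The other is the value reported to shaders as IncomingRayFlags, which stays exactly what the shader passed to TraceRay. This is not anv's implementation. `struct rt_pipeline_skip`, `struct trace_flags`, and `resolve_trace_flags` are hypothetical. The bit values are assumed to match SPIR-V's SkipTrianglesKHR (0x100) and SkipAABBsKHR (0x200) ray flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the SPIR-V RayFlags bits SkipTrianglesKHR / SkipAABBsKHR. */
#define RAY_FLAG_SKIP_TRIANGLES 0x100u
#define RAY_FLAG_SKIP_AABBS     0x200u

/* Hypothetical per-pipeline state derived from the create flags. */
struct rt_pipeline_skip {
   bool skip_triangles; /* ..._SKIP_TRIANGLES_BIT_KHR was set */
   bool skip_aabbs;     /* ..._SKIP_AABBS_BIT_KHR was set */
};

struct trace_flags {
   uint32_t traversal; /* flags the traversal actually honors */
   uint32_t incoming;  /* value later reported as IncomingRayFlags */
};

static struct trace_flags
resolve_trace_flags(uint32_t shader_flags, struct rt_pipeline_skip p)
{
   struct trace_flags f;
   f.incoming = shader_flags; /* shaders see only what they passed */
   f.traversal = shader_flags;
   if (p.skip_triangles)
      f.traversal |= RAY_FLAG_SKIP_TRIANGLES;
   if (p.skip_aabbs)
      f.traversal |= RAY_FLAG_SKIP_AABBS;
   return f;
}
```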

v2 (Lionel): Improve comments

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19152>
2022-10-20 00:03:55 +00:00