Commit Graph

2366 Commits

Author SHA1 Message Date
Oleksii Bozhenko
d5d8bb1dbb brw: fix saturate propagation region overlap range
Fixes: 947c828d5c
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7691

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Oleksii Bozhenko <oleksii.bozhenko@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20206>
2022-12-09 00:39:05 +00:00
Tapani Pälli
bc4b7de0d0 intel/fs: implement Wa_14017989577
The first instruction of any kernel should have non-zero emask. This
restriction needs to be obeyed to avoid GPU hangs.

Patch adds a function to insert dummy mov as first instruction
to make sure this requirement is fulfilled.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20194>
2022-12-08 23:58:32 +00:00
Kenneth Graunke
bafbe7c23a intel/compiler: Set NoMask on cr0 access for float controls mode
This is trying to clear a bit in the control register.  However, it's
executing with whatever channel mask happens to be active.  Typically
this is the one at the start of the program, so at least some channels
will be active.  Typically the first channel will be active due to
packed dispatch, but that's not always guaranteed.  Without NoMask,
the float controls writes may randomly not happen.

Recent GPUs also seem to have a hang issue when the first instruction in
the shader doesn't have any active channels.  Having an instruction with
NoMask at the start of the program works around the issue.  See HSD bug
14017989577.  In our case, the float controls preamble was breaking that
restriction every time, causing us to run into this problem frequently.

Thanks to Tapani Pälli for finding this hang issue, and Francisco
Jerez and Lionel Landwerlin for helping pinpoint this issue during
review of a workaround patch in !20194.

Fixes GPU hangs in Elder Scrolls Online, Witcher 3, and likely more.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7639
Fixes: 9da56ffc52 ("i965/fs: add emit_shader_float_controls_execution_mode() and aux functions")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20214>
2022-12-08 09:54:09 +00:00
Lionel Landwerlin
e25e17dd0c intel/fs: clamp per vertex input accesses to patchControlPoints
In a tesselation control shader where an input array is accessed using
the index gl_InvocationID, we can end up accessing elements beyond the
number of input vertices specified in the shader key.

This happens because of the lowering in nir_lower_indirect_derefs().
This lowering will affect compact variables which happens in this
case :

  in gl_PerVertex {
      vec4  gl_Position;
      float gl_ClipDistance[1];
  } gl_in[gl_MaxPatchVertices];

The lowered code produced by NIR is somewhat ineffecient (implements a
binary seach) :

  if (gl_InvocationID < 16) {
     if (gl_InvocationID < 8) {
        if (gl_InvocationID < 4) {
          vec4 vals = load_at_offset(0);
          value = bcsel(vals, gl_InvocationID);
        } else {
          vec4 vals = load_at_offset(4);
          value = bcsel(vals, gl_InvocationID - 4);
        }
     } else {
        if (gl_InvocationID < 12) {
          vec4 vals = load_at_offset(8);
          value = bcsel(vals, gl_InvocationID - 8);
        } else {
          vec4 vals = load_at_offset(12);
          value = bcsel(vals, gl_InvocationID - 12);
        }
     }
  } else {
     if (gl_InvocationID < 24) {
        ...
     } else {
        ...
     }
  }

By default the gl_MaxPatchVertices must be set at 32 items and that's
what the lowering code will use to divide the access into chunks of 4.
But when running with 3 input vertices, this means we'll pull one more
item than what was delivered in the shader payload.

This triggers issues further down the register scheduling where the
g5UD (register for the 4th item) is overwritten by a previous SEND,
leading the URB read to use an invalid handle.

This pass clamps any access load_per_vertex_input intrinsic vertex
indice to (input_vertices - 1).

Fixes issues with tests like :
dEQP-VK.clipping.user_defined.clip_distance.vert_tess.*

Also fixes a hang with zink/anv on :
KHR-GL46.draw_elements_base_vertex_tests.AEP_shader_stages

v2: Don't replace source register

v3: Implement in NIR

v4: Clamp per vertex array sizes in NIR (Jason)

v5: Move the clamping on the intel compiler

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9749>
2022-12-07 08:16:03 +00:00
Marcin Ślusarz
7809f76fe8 intel/compiler/mesh: align payload size to the size of vec4
This reduces the number of instructions in task shaders when payload
size is not aligned to vec4 and payload_in_shared WA is enabled,
because nir_lower_task_shader will not need to handle the unaligned
size case.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20080>
2022-12-06 16:31:11 +00:00
Lionel Landwerlin
d4cd33630a intel: add missing restriction on fragment simd dispatch
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7755
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169>
2022-12-06 00:37:50 +02:00
Lionel Landwerlin
b9403b1c47 intel: factor out dispatch PS enabling logic
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20169>
2022-12-06 00:37:47 +02:00
Lionel Landwerlin
df38426072 intel/rt/nir: add support for RayCullMaskKHR
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20011>
2022-12-02 09:28:23 +00:00
Lionel Landwerlin
6202a2c6b4 intel/rt/nir: enable the trampoline shader to load the indirect ray shader bsr
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20011>
2022-12-02 09:28:23 +00:00
Lionel Landwerlin
a855bdbf47 intel/nir/rt: switch to workgroup_id_zero_base
RT don't use a base workgroup id so no reason of using workgroup_id.
Additionally the lowering introduced in b4dd3df227 requires something
provides base_workgroup_id which we don't have for RT as it's not
needed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b4dd3df227 ("intel/nir: Set has_base_workgroup_id for lower_compute_system_values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7812
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20115>
2022-12-02 05:25:22 +00:00
Marcin Ślusarz
db0e6f9a07 intel/compiler: user payload starts after TUE header & its padding
All data written by the user are offset by TUE header size.
Without this patch we copy the correct amount of user data, but both
"from" and "to" offsets are wrong.

Fixes: 37e78803d7 ("intel/compiler: use nir_lower_task_shader pass")

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
2022-12-01 11:19:47 +00:00
Marcin Ślusarz
7aaafaa8ae intel/compiler: adjust [store|load]_task_payload.base too
Base also needs to be converted from bytes to words.

Fixes: c36ae42e4c ("intel/compiler: Use nir_var_mem_task_payload")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19409>
2022-12-01 11:19:47 +00:00
Jason Ekstrand
b4dd3df227 intel/nir: Set has_base_workgroup_id for lower_compute_system_values
This option didn't exist half a decade ago when I first implemented base
workgroup support in ANV.  It's cleaner to just have split system values
like all the other zero_base+base things do.

We currently only do this for COMPUTE and not KERNEL because it lets us
avoid changing intel_clc for now.  We can add KERNEL later if needed.
We also don't do this lowering for task/mesh.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00
Jason Ekstrand
7d2e3f660c intel/fs: Support load_workgroup_id_zero_base
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00
Jason Ekstrand
d9a24632d3 nir/builder: Drop nir_i2i and nir_u2u in favor of nir_x2xN
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20067>
2022-12-01 01:10:12 +00:00
Lionel Landwerlin
6f2dbe6da1 anv: enable lower_shader_calls vectorizing
On Q2RTX RT shaders :

Totals from 7 (22.58% of 31) affected shaders:
Instrs: 15453 -> 14418 (-6.70%)
Cycles: 232647 -> 224959 (-3.30%)
Send messages: 574 -> 481 (-16.20%)
Spill count: 118 -> 106 (-10.17%)
Fill count: 156 -> 140 (-10.26%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>
2022-11-30 07:23:30 +00:00
LingMan
11f91505d9 intel/fs: Accept an unsigned int in fs_reg::fs_reg
The parameter `nr` is currenlty an `int` but it only gets assigned to an
`unsigned int`. Make it clear in the function signature what's actually
required.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19423>
2022-11-23 18:37:35 +00:00
LingMan
fc00314085 intel/fs: Preserve unsignedness in fs_visitor::split_virtual_grfs
GCC 12.2.0 warns:
../src/intel/compiler/brw_fs.cpp: In member function ‘bool fs_visitor::
split_virtual_grfs()’:
../src/intel/compiler/brw_fs.cpp:2199:10: warning: ‘void* memset(void*, int,
size_t)’ specified size between 18446744071562067968 and 18446744073709551615
exceeds maximum object size 9223372036854775807 [-Wstringop-overflow=]
 2199 |    memset(vgrf_has_split, 0, num_vars * sizeof(*vgrf_has_split));

`num_vars` is an `int` but gets assigned the value of `this->alloc.count`,
which is an `unsigned int`. Thus, `num_vars` will be negative if
`this->alloc.count` is larger than int max value. Converting that negative
`int` to a `size_t`, which `memset` expects, then blows it up to a huge
positive value.

Simply turning `num_vars` into an `unsigned int` would be enough to fix this
specific problem, but there are many other instances where an `unsigned int`
gets assigned to an `int` for no good reason in this function. Some of which
the compiler warns about now, some of which it doesn't warn about.

This turns all variables in `fs_visitor::split_virtual_grfs`, which should
reasonably be unsigned, into `unsigned int`s. While at it, a few now pointless
casts are removed.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19423>
2022-11-23 18:37:35 +00:00
Lionel Landwerlin
945637514e intel/fs: improve Wa_22013689345 workaround
The initial implementation is a pretty big hammer. Implement the HW
recommendation to minimize cases in which we need a fence.

This improves by 10FPS on some of the Sascha Willems RT demos.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6031ad4bf6 ("intel/fs: Add Wa_22013689345")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19322>
2022-11-23 06:54:04 +00:00
Lionel Landwerlin
9c1c1888d9 intel/fs: put scratch surface in the surface state heap
In 4ceaed7839 we made scratch surface state allocations part of the
internal heap (mapped to STATE_BASE_ADDRESS::SurfaceStateBaseAddress)
so that it doesn't uses slots in the application's expected 1M
descriptors (especially with vkd3d-proton).

But all our compiler code relies on BSS
(STATE_BASE_ADDRESS::BindlessSurfaceStateBaseAddress).

The additional issue is that there is only 26bits of surface offset
available in CS instruction (CFE_STATE, 3DSTATE_VS, etc...) for
scratch surfaces. So we need the drivers to put the scratch surfaces
in the first chunk of STATE_BASE_ADDRESS::SurfaceStateBaseAddress
(hence all the driver changes).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4ceaed7839 ("anv: split internal surface states from descriptors")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7687
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19727>
2022-11-19 14:58:58 +00:00
Michael Skorokhodov
a9602134a3 intel/compiler: Require C++17
Fixes: 6c194ddd18 ("intel/compiler: Prepare SIMD selection helpers to handle different prog_datas")

Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19833>
2022-11-19 04:37:51 +00:00
Caio Oliveira
fbe40720e0 intel/compiler: Remove redundant argument from brw_nir_create_passthrough_tcs
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19831>
2022-11-19 00:35:56 +00:00
Yonggang Luo
4b0409ff9a intel: fixes -Werror,-Wunused-but-set-variable for clang-15
one of those error message:
../../src/intel/compiler/brw_vec4_cmod_propagation.cpp:53:8: error: variable 'ip' set but not used [-Werror,-Wunused-but-set-variable]
   int ip = block->end_ip + 1;

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19527>
2022-11-17 23:17:40 +00:00
Yonggang Luo
d6bd382352 intel: Fixes -Werror,-Wbitwise-instead-of-logical for clang-15 in brw_nir_lower_shader_calls.c
error message:
error: use of bitwise '|' with boolean operands [-Werror,-Wbitwise-instead-of-logical]

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19527>
2022-11-17 23:17:40 +00:00
Caio Oliveira
eedbd1ddbf intel/compiler: Use SIMD selection helpers in compile_single_bs()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
6c194ddd18 intel/compiler: Prepare SIMD selection helpers to handle different prog_datas
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
6ffa597bcf intel/compiler: Keep track of compiled/spilled in brw_simd_selection_state
We still update the cs_prog_data, but don't rely on it for this state anymore.
This will allow use the SIMD selector with shaders that don't use cs_prog_data.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
3c52e2d04c intel/compiler: Add a SIMD_COUNT constant
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
a0580dadfd intel/compiler: Create a struct to hold SIMD selection state
This is a preparation to decouple the storage of what SIMDs
compiled/spilled from the cs_prog_data.  This will allow reuse
of SIMD selection code by Bindless Shaders.

And since we have a struct now, move the error array there so
reduce the boilerplate of the users.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
8cda6cd774 intel/compiler: Simplify usage of brw_simd_select_for_workgroup_size()
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
a943dbf475 intel/compiler: Make brw_private.h and simd selector helpers C++
We don't intend to expose neither to drivers, so it is fine to be C++.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19601>
2022-11-15 04:55:18 +00:00
Caio Oliveira
494e2edb90 intel/compiler: Fix missing tie-breaker in brw_nir_analyze_ubo_ranges() ordering code
Per Ken suggestion, use ascending order for the start offset.

Fixes: 6d28c6e52c ("i965: Select ranges of UBO data to be uploaded as push constants.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19731>
2022-11-14 19:41:35 +00:00
Caio Oliveira
9fd1d47aa0 intel/compiler: Fix dynarray usage in intel_clc
The code builds up the dynamic array of objects (spirv_objs) and
collect pointers to each of them into another dynamic
array (spirv_ptr_objs).

If the growth of the first array cause a reallocation, it is
possible that the previous pointers end up invalid.

Fixes: 77e929a527 ("intel/clc: allow multiple CL files to be compiled together")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19730>
2022-11-14 19:15:05 +00:00
Lionel Landwerlin
bdf680cd3f intel/fs: use nir_opt_ray_query_ranges
Results on DG2 q2rtx shaders:

Totals from 6 (12.24% of 49) affected shaders:
Instrs: 88927 -> 54088 (-39.18%)
Cycles: 4115088 -> 2536902 (-38.35%)
Send messages: 2639 -> 1609 (-39.03%)
Spill count: 1321 -> 613 (-53.60%)
Fill count: 3130 -> 1104 (-64.73%)
Scratch Memory Size: 22528 -> 18432 (-18.18%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>
2022-11-11 15:17:08 +00:00
Caio Oliveira
ecc2dfc503 intel/compiler: Use std::unique_ptr for tracking the fs_visitors
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19605>
2022-11-10 18:01:52 +00:00
Lionel Landwerlin
b499a27d74 nir: make ray query load values visible in NIR prints
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19641>
2022-11-10 14:40:08 +02:00
Ian Romanick
351b8c6aec intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7
Even though Intel's CI doesn't test these old platforms anymore, the
validation added in "intel/eu/validate: Validate integer multiplication
source size restrictions" combined with full shader-db runs gives me
confidence in the changes.

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902167 (<.01%)
instructions in affected programs: 30771 -> 30597 (-0.57%)
helped: 66 / HURT: 0

total cycles in shared programs: 741795500 -> 741791931 (<.01%)
cycles in affected programs: 987602 -> 984033 (-0.36%)
helped: 28 / HURT: 5

Iron Lake
total instructions in shared programs: 8365806 -> 8365754 (<.01%)
instructions in affected programs: 1766 -> 1714 (-2.94%)
helped: 10 / HURT: 0

total cycles in shared programs: 248542694 -> 248542378 (<.01%)
cycles in affected programs: 29836 -> 29520 (-1.06%)
helped: 9 / HURT: 0

GM45
total instructions in shared programs: 5187127 -> 5187101 (<.01%)
instructions in affected programs: 891 -> 865 (-2.92%)
helped: 5 / HURT: 0

total cycles in shared programs: 163643914 -> 163643750 (<.01%)
cycles in affected programs: 22206 -> 22042 (-0.74%)
helped: 5 / HURT: 0

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
293ad13e3f intel/fs: Slightly restructure emitting nir_op_imul_32x16 and nir_op_umul_32x16
There are no immediate values at this point, so all of this code was
bunk. :face_palm:

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
ee2a299661 intel/eu/validate: Validate integer multiplication source size restrictions
v2: Expect correct result on BDW in test_eu.

v3: Fix SNB type-size check. Noticed by Marcin.

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
d668512f88 intel/compiler: Fix signed integer range analysis of imax and imin
Some review feedback of an earlier commit caused me to rearrange some
code quite a bit. I wasn't paying enough attention while applying the
later commits, and these breaks should have been returns. As it is, the
result of the imin or imax analysis is overwritten by the default case
handling... effectively the original commit does nothing. :(

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
total instructions in shared programs: 19914090 -> 19904772 (-0.05%)
instructions in affected programs: 121258 -> 111940 (-7.68%)
helped: 445 / HURT: 0

total cycles in shared programs: 855291535 -> 855266659 (<.01%)
cycles in affected programs: 2737005 -> 2712129 (-0.91%)
helped: 426 / HURT: 17

LOST:   0
GAINED: 3

Skylake and Broadwell had similar results. (Skylake shown)
total cycles in shared programs: 842395356 -> 842338259 (<.01%)
cycles in affected programs: 5460985 -> 5403888 (-1.05%)
helped: 458 / HURT: 0

Haswell and Ivy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16710449 -> 16708449 (-0.01%)
instructions in affected programs: 44101 -> 42101 (-4.54%)
helped: 75 / HURT: 0

total cycles in shared programs: 882760230 -> 882727923 (<.01%)
cycles in affected programs: 2867797 -> 2835490 (-1.13%)
helped: 62 / HURT: 10

No shader-db change on any other Intel platform.

No fossil-db changes on any Intel platform.

Fixes: 5ec75ca10d ("intel/compiler: Teach signed integer range analysis about imax and imin")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Jason Ekstrand
25c180b509 intel: Don't cross DWORD boundaries with byte scratch load/store
The back-end swizzles dwords so that our indirect scratch messages match
the memory layout of spill/fill messages for better cache coherency.
The swizzle happens at a DWORD granularity.  If a read or write crosses
a DWORD boundary, the first bit will get correctly swizzled but whatever
piece lands in the next dword will not because the scatter instructions
assume sequential addresses for all bytes.  For DWORD writes, this is
handled naturally as part of scalarizing.  For smaller writes, we need
to be sure that a single write never escapes a dword.

Fixes: fd04f858b0 ("intel/nir: Don't try to emit vector load_scratch instructions")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7364
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
2022-11-09 19:45:10 +00:00
Jason Ekstrand
85685cf932 intel/lower_mem_access_bit_sizes: Compute alignments automatically
Because dup_mem_intrinsic() retains the SSA offset from the original
intrinsic and only modifies it by adding a constant, we can compute the
alignment based on the original alignment and the constant offset.  This
is both easier and more accurate.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19580>
2022-11-09 19:45:10 +00:00
Caio Oliveira
22d8ed84b8 intel/compiler: Remove unused fs_visitor::emit_percomp()
Since 7ef7738a61 ("i965: Write gl_FragCoord directly to the destination.") this
is not used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:09 +00:00
Caio Oliveira
90861e6fea intel/compiler: Remove various unused function declarations
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:08 +00:00
Caio Oliveira
48506a9029 intel/compiler: Remove unused data members
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>
2022-11-08 07:33:08 +00:00
Ian Romanick
9abeb3d739 intel/fs: Optimize integer multiplication of large constants by factoring
Many Intel platforms can only perform 32x16 bit multiplication.  The
straightforward way to implement 32x32 bit multiplications is by
splitting one of the operands into high and low parts called H and L,
repsectively.  The full multiplication can be implemented as:

         ((A * H) << 16) + (A * L)

On Intel platforms, special register accesses can be used to eliminate
the shift operation.  This results in three instructions and a temporary
register for most values.

If H or L is 1, then one (or both) of the multiplications will later be
eliminated.  On some platforms it may be possible to eliminate the
multiplication when H is 256.

If L is zero (note that H cannot be zero), one of the multiplications
will also be eliminated.

Instead of splitting the operand into high and low parts, it may
possible to factor the operand into two 16-bit factors X and Y.  The
original multiplication can be replaced with (A * (X * Y)) = ((A * X) *
Y).  This requires two instructions without a temporary register.

I may have gone a bit overboard with optimizing the factorization
routine.  It was a fun brainteaser, and I couldn't put it down. :) On my
1.3GHz Ice Lake, a standalone test could chug through 1,000,000 randomly
selected values in about 5.7 seconds.  This is about 9x the performance
of the obvious, straightforward implementation that I started with.

v2: Drop an unnecessary return.  Rearrange logic slightly and rename
variables in factor_uint32 to better match the names used in the large
comment.  Both suggested by Caio. Rearrange logic to avoid possibly
using `a` uninitialized. Noticed by Marcin.

v3: Use DIV_ROUND_UP instead of open coding it. Noticed by Caio.

Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown)
total instructions in shared programs: 19912558 -> 19912526 (<.01%)
instructions in affected programs: 3432 -> 3400 (-0.93%)
helped: 10 / HURT: 0

total cycles in shared programs: 856413218 -> 856412810 (<.01%)
cycles in affected programs: 122032 -> 121624 (-0.33%)
helped: 9 / HURT: 0

No shader-db changes on any other Intel platforms.

Tiger Lake and Ice Lake had similar results. (Ice Lake shown)
Instructions in all programs: 141997227 -> 141996923 (-0.0%)
Instructions helped: 71

Cycles in all programs: 9162524757 -> 9162523886 (-0.0%)
Cycles helped: 63
Cycles hurt: 5

No fossil-db changes on any other Intel platforms.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Ian Romanick
5ec75ca10d intel/compiler: Teach signed integer range analysis about imax and imin
This is especially helpful for a*isign(a) generated by idiv_by_const
optimization.  On many GPUs, isign(a) is lowered to imax(imin(a, 1),
-1).

There are no changes on fossil-db because ANV uses a different
optimization path for idiv with a constant denominator.  A future MR
will change this.

NOTE: This commit used to help a few hundred shader-db shaders, but
now none are affected.  I suspect this is due to some change in the
idiv_by_const optimization.  This could possibly be dropped.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Ian Romanick
1b0da3a765 intel/compiler: Signed integer range analysis for imul_32x16 generation
Only iabs and ineg are treated specially.  Everything else just uses
nir_unsigned_upper_bound.  The special treatment of source modifiers is
because they cause problems for nir_unsigned_upper_bound.  Once those
are peeled off, nir_unsigned_upper_bound can generally produce a
tighter bound.

Future commits will add more opcodes.  This mostly introduces the
basic framework.

v2: Add a bunch of comments to signed_integer_range_analysis. Re-arrange
the code a little to reduce duplication.  Both suggested by
Caio. Rearrange some logic to simplify things. Suggested by Marcin.

Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown)
total instructions in shared programs: 19912894 -> 19912558 (<.01%)
instructions in affected programs: 109275 -> 108939 (-0.31%)
helped: 74 / HURT: 0

total cycles in shared programs: 856422769 -> 856413218 (<.01%)
cycles in affected programs: 15268102 -> 15258551 (-0.06%)
helped: 65 / HURT: 4

total fills in shared programs: 8218 -> 8217 (-0.01%)
fills in affected programs: 1171 -> 1170 (-0.09%)
helped: 1 / HURT: 0

Skylake and Broadwell had similar results. (Skylake shown)
total cycles in shared programs: 845145547 -> 845142263 (<.01%)
cycles in affected programs: 15261465 -> 15258181 (-0.02%)
helped: 65 / HURT: 0

Tiger Lake
Tiger Lake
Instructions in all programs: 157580768 -> 157579730 (-0.0%)
Instructions helped: 312
Instructions hurt: 28

Cycles in all programs: 7566977172 -> 7566967746 (-0.0%)
Cycles helped: 288
Cycles hurt: 53

Spills in all programs: 19701 -> 19700 (-0.0%)
Spills helped: 2
Spills hurt: 4

Fills in all programs: 33311 -> 33335 (+0.1%)
Fills helped: 5
Fills hurt: 4

Ice Lake
Instructions in all programs: 141998667 -> 141997227 (-0.0%)
Instructions helped: 420
Instructions hurt: 3

Cycles in all programs: 9162565297 -> 9162524757 (-0.0%)
Cycles helped: 389
Cycles hurt: 29

Spills in all programs: 19918 -> 19916 (-0.0%)
Spills helped: 2
Spills hurt: 3

Fills in all programs: 32795 -> 32814 (+0.1%)
Fills helped: 6
Fills hurt: 3

Skylake
Instructions in all programs: 132567691 -> 132567745 (+0.0%)
Instructions hurt: 24

Cycles in all programs: 8828897462 -> 8828889517 (-0.0%)
Cycles helped: 405
Cycles hurt: 6

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Ian Romanick
f90d71055b intel/compiler: Add and use a pass to generate imul_32x16 instructions
Gfx8 and Gfx9 platforms are helped for cycles because now many
instructions like

    mul(8)          g12<1>D         g10<8,8,1>D     6D

become

    mul(8)          g12<1>D         g10<8,8,1>D     6W

It is the same number of instructions, but the 32x16 multiply is a
little faster.

v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)".
Noticed by Caio.  Use nir_src_is_const instead of open coding it.
Suggested by Caio.

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 845748380 -> 845145547 (-0.07%)
cycles in affected programs: 446346348 -> 445743515 (-0.14%)
helped: 6017
HURT: 0
helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8
helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39%
95% mean confidence interval for cycles value: -113.37 -87.00
95% mean confidence interval for cycles %-change: -0.42% -0.41%
Cycles are helped.

Skylake
Cycles in all programs: 8844820715 -> 8828897462 (-0.2%)
Cycles helped: 47914
Cycles hurt: 1

No shader-db or fossil-db changes on any other Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Ian Romanick
9479e3a19b intel/fs: Allow constant copy prop from DW to W
This enables copy propagation of

    mov(8)          g5<1>UD         0x00000180UD
    mul(8)          g10<1>D         g2.3<0,1,0>D    g5<16,8,2>W

into

    mul(8)          g10<1>D         g2.3<0,1,0>D    180W

This is necessary for any optimization passes that generate imul_32x16
instructions.

No fossil-db or shader-db changes on any Intel platform.

v2: Fix type size check to (src size != 2) || (dest size != 4).  It was
previously &&. :( This allowed copying constants into UB sources, and
that is invalid.

v3: Fix incorrect extraction of upper 16-bits of immediate value when
subnr=2. Noticed by Caio.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00