Commit Graph

291 Commits

Author SHA1 Message Date
Jesse Natalie
4217353e2d nir_lower_mem_access_bit_sizes: Add a bit_size input to the callback
We'd like to use this callback to adjust loads and stores from things
that are unsupported to things that are supported, but if the input
is already supported, we'd prefer not to change it. Rather than making
up a bit size that'd work and doing a bunch of pack/unpack bit math,
only return a different bit size if the input one doesn't work for us
(i.e. can't load enough memory or just an unsupported size entirely).

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23173>
2023-06-13 00:43:36 +00:00
Mark Janes
a98f246857 isl: use generated workaround helpers for Wa_1806565034
This workaround was enabled for gen12+, but only applies to gen12.0.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21912>
2023-06-02 16:17:34 +00:00
Mark Janes
d0669f3ede intel/dev: switch defect identifiers to use lineage numbers
Update existing workarounds when necessary to match changed
identifiers.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23226>
2023-05-30 22:13:41 +00:00
Erik Faye-Lund
20d619cd84 nir: use more nir_fmul_imm
This simplifies things a bit. Note that in some cases, the arguments are
swapped, because multiplications are commutative, and nir_fmul_imm only
allows the second operand to be an immediate.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23179>
2023-05-25 06:59:24 +00:00
Lionel Landwerlin
429ef02f83 intel/fs: make tcs input_vertices dynamic
We need to do 3 things to accomplish this :

   1. make all the register access consider the maximal case when
      unknown at compile time

   2. move the clamping of load_per_vertex_input prior to lowering
      nir_intrinsic_load_patch_vertices_in (in the dynamic cases, the
      clamping will use the nir_intrinsic_load_patch_vertices_in to
      clamp), meaning clamping using derefs rather than lowered
      nir_intrinsic_load_per_vertex_input

   3. in the known cases, lower nir_intrinsic_load_patch_vertices_in
      in NIR (so that the clamped elements still be vectorized to the
      smallest number of URB read messages)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22378>
2023-05-24 18:32:07 +00:00
Kenneth Graunke
a2d384a5c0 intel/compiler: Fix 64-bit ufind_msb, find_lsb, and bit_count
We only support 32-bit versions of ufind_msb, find_lsb, and bit_count,
so we need to lower them via nir_lower_int64.

Previously, we were failing to do so on platforms older than Icelake
and let those operations fall through to nir_lower_bit_size, which
used a callback to determine it should lower them for bit_size != 32.
However, that pass only emulates small bit-size operations by promoting
them to supported, larger bit-sizes (i.e. 16-bit using 32-bit).  It
doesn't support emulating larger operations (i.e. 64-bit using 32-bit).

So nir_lower_bit_size would just u2u32 the 64-bit source, causing us to
flat ignore half of the bits.

Commit 78a195f252 (intel/compiler: Postpone most int64 lowering to
brw_postprocess_nir) provoked this bug on Icelake and later as well,
by moving the nir_lower_int64 handling for ufind_msb until late in
compilation, allowing it to reach nir_lower_bit_size which broke it.

To fix this, we always set int64 lowering for these opcodes, and also
correct the nir_lower_bit_size callback to ignore 64-bit operations.

Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
2023-05-19 22:44:37 +00:00
Rohan Garg
6b8fe32322 intel: infer scalar'ness locally for brw_vectorize_lower_mem_access
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
3a8f5c2783 intel: update comments about non-existent function parameter
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
a15cc833f9 intel: drop unused is_scalar function parameter in brw_nir_apply_key
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Rohan Garg
212810ac8a intel: infer scalar'ness locally for brw_postprocess_nir
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23098>
2023-05-18 15:46:06 +02:00
Kenneth Graunke
78a195f252 intel/compiler: Postpone most int64 lowering to brw_postprocess_nir
Float conversions continue to be lowered early at the same time as
nir_lower_doubles, which we run early so we don't have to run it for
every shader key variant.  However, all other int64 lowering is now
done late, after nir_opt_load_store_vectorize(), allowing it to
comprehend basic arithmetic on 64-bit addresses.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23064>
2023-05-18 10:48:50 +00:00
Alyssa Rosenzweig
01e9ee79f7 nir: Drop unused name from nir_ssa_dest_init
Since 624e799cc3 ("nir: Drop nir_ssa_def::name and nir_register::name"), SSA
defs don't have names, making the name argument unused. Drop it from the
signature and fix the call sites. This was done with the help of the following
Coccinelle semantic patch:

    @@
    expression A, B, C, D, E;
    @@

    -nir_ssa_dest_init(A, B, C, D, E);
    +nir_ssa_dest_init(A, B, C, D);

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23078>
2023-05-17 23:46:16 +00:00
Alyssa Rosenzweig
c323762f9f treewide: Stop lowering legacy atomics
There are no more producers of legacy atomics so these calls are inert.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23036>
2023-05-16 22:36:21 +00:00
Lionel Landwerlin
952a523abb intel: switch over to unified atomics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23004>
2023-05-15 16:32:21 +00:00
Kenneth Graunke
f00143acc3 intel/compiler: Fold constants after distributing source modifiers
This can generate things like fneg! of load_const, which is silly.
Fold those away into an actual constant.  Only do so on the scalar
backend because there's a comment above that the vec4 backend doesn't
want any new constants this late, and I'm inclined to believe it.

fossil-db stats show a very minor improvement:

   Totals:
   Instrs: 203091223 -> 203091099 (-0.00%); split: -0.00%, +0.00%
   Cycles: 14410638075 -> 14410577067 (-0.00%); split: -0.00%, +0.00%

   Totals from 20 (0.00% of 665070) affected shaders:
   Instrs: 27067 -> 26943 (-0.46%); split: -0.47%, +0.01%
   Cycles: 2687958 -> 2626950 (-2.27%); split: -2.27%, +0.00%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22881>
2023-05-09 00:16:40 -07:00
Ian Romanick
d47f521ee4 intel/compiler: Use NIR_PASS instead of NIR_PASS_V
Reduce debug log spam by only logging the shader if a pass made some
changes. This can also elide some nir_validate calls in debug builds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Emma Anholt
f1ea6c1b40 intel: Always call nir_lower_frexp.
We have NIR lowering for Vulkan, and rely on GLSL's lowering in the
frontend, but this will let us drop the GLSL lowering.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22083>
2023-04-06 02:32:01 +00:00
Lionel Landwerlin
e25aee8e34 intel/fs: also allow vec8+ vectorization of load_global_const_block_intel
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Lionel Landwerlin
a358b97c58 intel/fs: optimize uniform SSBO & shared loads
Using divergence analysis, figure out when SSBO & shared memory loads
are uniform and carry the data only once in register space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21853>
2023-04-05 12:32:56 +00:00
Patrick Lerda
5d85966805 intel: fix memory leak related to brw_nir_create_passthrough_tcs()
Indeed, the parameter "mem_ctx" was not processed.

For instance, this issue is triggered with the crocus driver and
"piglit/bin/shader_runner tests/spec/arb_tessellation_shader/execution/compatibility/tes-clip-vertex-different-from-position.shader_test -auto -fbo":
SUMMARY: AddressSanitizer: 235216 byte(s) leaked in 48 allocation(s).

Fixes: 96ba0344db ("intel: Use common helpers for TCS passthrough shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22173>
2023-03-30 10:52:07 +00:00
Marcin Ślusarz
32107d8b5a intel/compiler: compactify locations of mesh outputs
Needed in support of anv code for Wa_14015590813.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17622>
2023-03-29 18:35:55 +00:00
Tapani Pälli
6538c5bcd4 intel/fs: restore message layout changes for cube array
This reverts commit bc04e2daca that handled the change as a WA while
this is about a new feature, change done in message layout. Patch also
changes the original comment to not refer to Wa but bspec page.

Fixes: bc04e2daca ("intel/fs: use generated helpers for Wa_1209978020 / Wa_18012201914")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22068>
2023-03-22 20:18:11 +00:00
Lionel Landwerlin
56474fae93 intel/fs: fix subgroup invocation read bounds checking
nir->info.subgroup_size can be set to an enum :
  SUBGROUP_SIZE_VARYING = 0
  SUBGROUP_SIZE_UNIFORM = 1
  SUBGROUP_SIZE_API_CONSTANT = 2
  SUBGROUP_SIZE_FULL_SUBGROUPS = 3

So compute the API subgroup size value and compare it to the dispatch
size to determine whether we need some bound checking.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ac192d79d ("intel/fs: bound subgroup invocation read to dispatch size")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21856>
2023-03-14 12:15:48 +00:00
Lionel Landwerlin
bf59cfcee1 intel/fs: prevent large vector ops generated by peephole_ffma
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>
2023-03-14 10:38:50 +00:00
Mark Janes
bc04e2daca intel/fs: use generated helpers for Wa_1209978020 / Wa_18012201914
Wa_1209978020 is a clone of Wa_18012201914.  Update references to
refer to the originating bug, and use generated helpers to ensure it
is applied to future platforms as needed.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21741>
2023-03-07 01:41:53 +00:00
Caio Oliveira
07de034791 intel/compiler: Drop brw_nir_lower_scoped_barriers
Now that we handle scoped barriers with execution scope during
NIR -> Backend IR translation, this lowering is not needed anymore.

Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>
2023-03-07 00:41:13 +00:00
Alyssa Rosenzweig
952bd63d6d nir/opt_barrier: Generalize to control barriers
For GLSL, we want to optimize code like

   memoryBarrierBuffer();
   controlBarrier();

into a single scoped_barrier intrinsic for the backend to consume. Now that
backends can get scoped_barriers everywhere, what's left is enabling backends to
combine these barriers together. We already have an Intel-specific pass for
combining memory barriers; it just needs a teensy bit of generalization to allow
combining all sorts of barriers together.

This avoids code quality regression on Asahi when switching to purely scoped
barriers. It's probably useful for other backends too.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>
2023-03-06 22:09:27 +00:00
Faith Ekstrand
83fd7a5ed1 intel: Use nir_lower_tex_options::lower_index_to_offset
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>
2023-03-06 21:38:32 +00:00
Faith Ekstrand
9a4641cf6b intel/nir: Limit unaligned loads to vec4
This probably doesn't affect Vulkan or GL because they can't have
anything bigger than a vec4 anyway unless it's a u64vec4 and those have
to be at least 8B aligned.  This may affect CL apps if they use
__attribute__((packed)) on something with big vectors, depending on how
LLVM decides to translate that.

Fixes: f8aa83f0c8 ("intel/nir: Use nir_lower_mem_access_bit_sizes()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
2023-03-03 02:00:39 +00:00
Faith Ekstrand
eb9a56b6ca nir: Rename nir_mem_access_size_align::align_mul to align
It's a simple alignment so calling it align_mul is a bit misleading.

Suggested-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
2023-03-03 02:00:39 +00:00
Faith Ekstrand
ca4d73ba36 nir: Add a combined alignment helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@colllabora.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
2023-03-03 02:00:39 +00:00
Faith Ekstrand
116a851264 nir: Add mode filtering to lower_mem_access_bit_sizes
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>
2023-03-03 02:00:39 +00:00
Kenneth Graunke
96ba0344db intel: Use common helpers for TCS passthrough shaders
Rob added these new helpers a while back, which freedreno and radeonsi
both share.  We should use them too.  The new helpers use variables and
system value intrinsics, so we can drop the explicit binding table
creation and just use the normal paths.

Because we have to rewrite the system value uploading anyway, we drop
the scrambling of the default tessellation levels on upload, and instead
let the compiler go ahead and remap components like any normal shader.
In theory, this results in more shuffling in the shader.  In practice,
we already do MOVs for message setup.  In the passthrough shaders I
looked at, this resulted in no extra instructions on Icelake (SIMD8
SINGLE_PATCH) and Tigerlake (8_PATCH).  On Haswell, one shader grew by
a single instruction for a pittance of cycles in a stage that isn't a
performance bottleneck anyway.  Avoiding remapping wasn't so much of an
optimization as just the way that I originally wrote it.  Not worth it.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20809>
2023-02-20 03:54:24 +00:00
Faith Ekstrand
f8aa83f0c8 intel/nir: Use nir_lower_mem_access_bit_sizes()
This drops the Intel-specific pass in favor of the new generic one.

No shader-db changes on Skylake or DG2.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>
2023-02-17 00:55:54 +00:00
Marcin Ślusarz
9d3e3c15f3 intel/compiler: replace gl_Layer & gl_ViewportIndex by 0 in fs if ms doesn't write it
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17620>
2023-02-14 08:24:51 +00:00
Alejandro Piñeiro
ba0bc7182d anv: use shader_info->var_copies_lowered
Instead of passing allow_copies as a parameter for brw_nir_optimize
(so manually doing that tracking).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>
2023-02-06 22:11:34 +00:00
Jason Ekstrand
949b42c4dc intel/compiler: Convert wm_prog_key::multisample_fbo to a tri-state
This allows us to communicate to the back-end that we don't actually
know if the framebuffer is multisampled or not.  No drivers set anything
but ALWAYS/NEVER and we still have a few ALWAYS/NEVER assumptions but
those should be asserted.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:18 +00:00
Jason Ekstrand
5644011f06 intel/compiler: Convert wm_prog_key::persample_interp to a tri-state
This allows for the possibility that we may not know at compile time if
sample shading is enabled through the API.  While we're here, also
document exactly what this bit means so we don't confuse ourselves.

v2: Fixup coarse pixel values (Lionel)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:18 +00:00
Jason Ekstrand
d25e5310bc intel/nir: Lower barycentrics to per-sample in a dedicated pass
This is more similar to what we do for single-sample and it should be
more clear going forward once our lowering gets more complex.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21094>
2023-02-06 09:12:17 +00:00
Kenneth Graunke
90a2137cd5 intel/compiler: Use LSC opcode enum rather than legacy BRW_AOPs
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines.  We have to translate one way or
another, and using the modern set makes sense going forward.

One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
2023-01-19 08:42:22 +00:00
Lionel Landwerlin
94bb4a13fa intel/fs: make Wa_1806565034 conditional to non robust access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20280>
2022-12-13 18:05:19 +00:00
Kenneth Graunke
8c2448d4e6 intel/compiler: Delete sampler key handling for planar format stuff
i965 used these, but Gallium drivers do this lowering via a separate
nir_lower_tex call from st/mesa.  Vulkan drivers don't use these at all.

Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20223>
2022-12-09 10:18:25 +00:00
Jason Ekstrand
b4dd3df227 intel/nir: Set has_base_workgroup_id for lower_compute_system_values
This option didn't exist half a decade ago when I first implemented base
workgroup support in ANV.  It's cleaner to just have split system values
like all the other zero_base+base things do.

We currently only do this for COMPUTE and not KERNEL because it lets us
avoid changing intel_clc for now.  We can add KERNEL later if needed.
We also don't do this lowering for task/mesh.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20068>
2022-12-01 04:56:48 +00:00
Lionel Landwerlin
6f2dbe6da1 anv: enable lower_shader_calls vectorizing
On Q2RTX RT shaders :

Totals from 7 (22.58% of 31) affected shaders:
Instrs: 15453 -> 14418 (-6.70%)
Cycles: 232647 -> 224959 (-3.30%)
Send messages: 574 -> 481 (-16.20%)
Spill count: 118 -> 106 (-10.17%)
Fill count: 156 -> 140 (-10.26%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20058>
2022-11-30 07:23:30 +00:00
Caio Oliveira
fbe40720e0 intel/compiler: Remove redundant argument from brw_nir_create_passthrough_tcs
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19831>
2022-11-19 00:35:56 +00:00
Lionel Landwerlin
bdf680cd3f intel/fs: use nir_opt_ray_query_ranges
Results on DG2 q2rtx shaders:

Totals from 6 (12.24% of 49) affected shaders:
Instrs: 88927 -> 54088 (-39.18%)
Cycles: 4115088 -> 2536902 (-38.35%)
Send messages: 2639 -> 1609 (-39.03%)
Spill count: 1321 -> 613 (-53.60%)
Fill count: 3130 -> 1104 (-64.73%)
Scratch Memory Size: 22528 -> 18432 (-18.18%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16593>
2022-11-11 15:17:08 +00:00
Ian Romanick
351b8c6aec intel/fs: Enable nir_op_imul_32x16 and nir_op_umul_32x16 on pre-Gfx7
Even though Intel's CI doesn't test these old platforms anymore, the
validation added in "intel/eu/validate: Validate integer multiplication
source size restrictions" combined with full shader-db runs gives me
confidence in the changes.

Sandy Bridge
total instructions in shared programs: 13902341 -> 13902167 (<.01%)
instructions in affected programs: 30771 -> 30597 (-0.57%)
helped: 66 / HURT: 0

total cycles in shared programs: 741795500 -> 741791931 (<.01%)
cycles in affected programs: 987602 -> 984033 (-0.36%)
helped: 28 / HURT: 5

Iron Lake
total instructions in shared programs: 8365806 -> 8365754 (<.01%)
instructions in affected programs: 1766 -> 1714 (-2.94%)
helped: 10 / HURT: 0

total cycles in shared programs: 248542694 -> 248542378 (<.01%)
cycles in affected programs: 29836 -> 29520 (-1.06%)
helped: 9 / HURT: 0

GM45
total instructions in shared programs: 5187127 -> 5187101 (<.01%)
instructions in affected programs: 891 -> 865 (-2.92%)
helped: 5 / HURT: 0

total cycles in shared programs: 163643914 -> 163643750 (<.01%)
cycles in affected programs: 22206 -> 22042 (-0.74%)
helped: 5 / HURT: 0

Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19602>
2022-11-09 21:34:26 +00:00
Ian Romanick
f90d71055b intel/compiler: Add and use a pass to generate imul_32x16 instructions
Gfx8 and Gfx9 platforms are helped for cycles because now many
instructions like

    mul(8)          g12<1>D         g10<8,8,1>D     6D

become

    mul(8)          g12<1>D         g10<8,8,1>D     6W

It is the same number of instructions, but the 32x16 multiply is a
little faster.

v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)".
Noticed by Caio.  Use nir_src_is_const instead of open coding it.
Suggested by Caio.

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 845748380 -> 845145547 (-0.07%)
cycles in affected programs: 446346348 -> 445743515 (-0.14%)
helped: 6017
HURT: 0
helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8
helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39%
95% mean confidence interval for cycles value: -113.37 -87.00
95% mean confidence interval for cycles %-change: -0.42% -0.41%
Cycles are helped.

Skylake
Cycles in all programs: 8844820715 -> 8828897462 (-0.2%)
Cycles helped: 47914
Cycles hurt: 1

No shader-db or fossil-db changes on any other Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Francisco Jerez
5d4df3ac23 intel/compiler: Run extra fp64 lowering pass on devices that don't support int64.
In some cases nir_lower_int64 will emit fp64 operations which aren't
natively supported on any Intel hardware (e.g. ftrunc, frem).  An
extra pass of nir_opt_algebraic (for frem) and nir_lower_doubles is
required in order to take care of them.  This fixes several int64
test-cases on MTL hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19390>
2022-11-07 07:35:22 +00:00
Kenneth Graunke
88756cee8d intel/compiler: Run nir_opt_large_constants before scalarizing consts
nir_opt_large_constants balks at seeing a store_deref of a variable
where the source is a vecN operation of multiple load_consts, and thinks
that isn't a constant, so it should not bother promoting it.

Unfortunately, we were running nir_lower_load_const_to_scalar before
nir_opt_large_constants, so this prevented a ton of constant promotion.

This commit /used to help/ some shaders in shader-db. Presumably since
!16770 landed, those shaders were already helped.  Currently ther are
no shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141998227 -> 141421756 (-0.4%)
Instructions helped: 12515
Instructions hurt: 237

SENDs in all programs: 7437925 -> 7468033 (+0.4%)
SENDs hurt: 12806

Cycles in all programs: 9161655753 -> 9132869800 (-0.3%)
Cycles helped: 10163
Cycles hurt: 2637

Spills in all programs: 19977 -> 18678 (-6.5%)
Spills helped: 384
Spills hurt: 40

Fills in all programs: 32863 -> 31396 (-4.5%)
Fills helped: 385
Fills hurt: 42

Lost: 1

Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham
Origins vertex shaders were hurt for SENDs in this commit.  A couple
Aztec Ruins compute shaders and Spaceship shaders (multiple stages)
were also hurt.

All of the shaders hurt for spills or fills were Spaceship compute
shaders.  Nearly all of the shaders helped were Shadow of the Tomb
Raider fragmenet shaders.  One Spaceship shader was reall, REALLY helped:

Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%)
Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%)

Overall this seems like an improvement, but we may want to actually
run these few benchmarks before landing.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>
2022-11-01 14:55:21 -07:00