Commit Graph

2625 Commits

Author SHA1 Message Date
Jason Ekstrand
739e21fa9a intel/fs: Add a parameter to speed up register spilling
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24299>
2023-07-28 14:51:42 +00:00
Mike Blumenkrantz
e68e612826 nir: add a helper for calculating variable slots
this will maybe avoid future bugs, but probably not

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24163>
2023-07-28 13:14:35 +00:00
Mike Blumenkrantz
59396eefe6 nir: fix slot calculations for compact variables with location_frac
a variable with a component offset may span multiple slots, and this cannot
be inferred from its type alone (e.g., compacted clip+cull distances)

cc: mesa-stable

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24163>
2023-07-28 13:14:35 +00:00
Lionel Landwerlin
fe81d40bff intel/nir: add lower for sparse images & textures
We have to lower images into image load + sampler residency.

There is also a restriction on sampler access with a compare, lower
those as 2 sampler instructions to meet the restriction.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:59 +03:00
Lionel Landwerlin
300cc829de intel/nir: handle image_sparse_load in storage format lowering
The last component of sparse load is the residency data. We don't want
to touch/convert that value with the format lowering.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:34 +03:00
Lionel Landwerlin
d33aff783d intel/fs: add support for sparse accesses
Purely from the backend point of view it's just an additional
parameter to sampler messages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23882>
2023-07-27 02:02:30 +03:00
Lionel Landwerlin
d099e47de0 intel/fs: add more UNDEFs around SEND messages
lower_find_live_channel() in particular is used a lot in control flow
to find the live channel for the surface/sampler handle. Adding UNDEFs
on the temporary registers used for finding the live channels helps
reduce the liveness of those temporary registers, especially in loops.

Some titles affected :

Rise Of The Tomb Raider:
Totals from 2780 (22.58% of 12311) affected shaders:
Instrs: 1294455 -> 1294592 (+0.01%); split: -0.15%, +0.16%
Cycles: 1473136441 -> 1471302617 (-0.12%); split: -1.52%, +1.40%
Max live registers: 144282 -> 143595 (-0.48%)
Max dispatch width: 22200 -> 22232 (+0.14%)

Red Dead Redemption 2:
Totals from 435 (7.28% of 5972) affected shaders:
Instrs: 488472 -> 487594 (-0.18%); split: -0.31%, +0.14%
Cycles: 11354732 -> 11384928 (+0.27%); split: -0.44%, +0.71%
Spill count: 1217 -> 1172 (-3.70%)
Fill count: 3521 -> 3447 (-2.10%)
Scratch Memory Size: 64512 -> 62464 (-3.17%)
Max live registers: 35997 -> 35798 (-0.55%)

Fallout 4:
Totals from 8 (0.49% of 1638) affected shaders:
Instrs: 41908 -> 40509 (-3.34%)
Cycles: 3638464 -> 3555680 (-2.28%); split: -2.67%, +0.39%
Spill count: 717 -> 665 (-7.25%)
Fill count: 2542 -> 2438 (-4.09%)
Scratch Memory Size: 32768 -> 16384 (-50.00%)
Max live registers: 567 -> 534 (-5.82%)

Cyberpunk 2077:
Totals from 2984 (28.97% of 10301) affected shaders:
Instrs: 3888874 -> 3891600 (+0.07%); split: -0.20%, +0.27%
Cycles: 67906489 -> 67767721 (-0.20%); split: -0.68%, +0.47%
Spill count: 200 -> 98 (-51.00%)
Fill count: 237 -> 90 (-62.03%)
Scratch Memory Size: 10240 -> 8192 (-20.00%)
Max live registers: 215715 -> 212727 (-1.39%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:33 +00:00
Lionel Landwerlin
5c72724819 intel/fs: consider UNDEF as non-partial write
A few titles show max live register reductions, but nothing
significant in instruction count or other stats.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24282>
2023-07-26 08:48:32 +00:00
Lionel Landwerlin
d62e494b37 intel/vec4: fix log_data pointer
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3384f029be ("intel/compiler: rework input parameters")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9421
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24307>
2023-07-26 06:36:18 +00:00
Iván Briano
377c2a045f intel/compiler: call brw_nir_adjust_payload from brw_postprocess_nir
Calling anything after nir_trivialize_registers() risks undoing some of
its work.
In this case, brw_nir_adjust_payload() will do a constant folding pass
if any payload adjusting happened, and that can turn a bunch of
@store_regs into basically noops.

Fixes dEQP-VK.subgroups.*task

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24325>
2023-07-25 22:48:09 +00:00
Ian Romanick
cb0de0a1d3 intel/fs: Constant fold OR and AND
The path taken in fs_visitor::swizzle_nir_scratch_addr for DG2 generates
some AND and OR instructions before the SHL. This commit folds those so
the whold calculation becomes a constant (like on older platforms).

v2: Fix return type of src_as_uint. Noticed by Marcin.

shader-db results:

DG2
total instructions in shared programs: 23190475 -> 23179540 (-0.05%)
instructions in affected programs: 36026 -> 25091 (-30.35%)
helped: 7 / HURT: 0

total cycles in shared programs: 841196807 -> 841142563 (<.01%)
cycles in affected programs: 1660670 -> 1606426 (-3.27%)
helped: 7 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780372 -> 197773966 (-0.00%)
Cycles: 14066410782 -> 14066399378 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 8438104 -> 8438112 (+0.00%)
Send messages: 8049445 -> 8049446 (+0.00%)
Scratch Memory Size: 14263296 -> 14264320 (+0.01%)

Totals from 9 (0.00% of 668055) affected shaders:
Instrs: 24547 -> 18141 (-26.10%)
Cycles: 1984791 -> 1973387 (-0.57%); split: -0.98%, +0.40%
Subgroup size: 88 -> 96 (+9.09%)
Send messages: 867 -> 868 (+0.12%)
Scratch Memory Size: 69632 -> 70656 (+1.47%)

No fossil-db changes on any older Intel platforms.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
61c786bad5 intel/fs: Constant fold SHL
This is a modified version of a commit originally in !7698. This version
add the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.

DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.

What follows is the commit change history from that older MR.

v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`.  However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`.  That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.

v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't matically come back.

v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").

v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.

shader-db results:

DG2
No changes.

Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0

total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0

Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0

total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0

total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0

total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0

No shader-db changes on any older Intel platforms.

fossil-db results:

DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%

Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%

Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)

Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Ian Romanick
56e6186dcf intel/fs: Always do opt_algebraic after opt_copy_propagation makes progress
opt_copy_propagation can create invalid instructions like

    shl(8) vgrf96:UD, 2d, 8u

These instructions will be cleaned up by opt_algebraic.  The irony is
opt_algebraic converts these to simple mov instructions that
opt_copy_propagation should clean up.  I don't think we want a loop like

   do {
      progress = false;
      if (OPT(opt_copy_propagation)) {
         OPT(opt_algebraic);
         OPT(dead_code_eliminate);
      }
   } while (progress);

But maybe we do?

Maybe this would be sufficient:

   while (OPT(opt_copy_propagation))
      OPT(opt_algebraic);
   OPT(dead_code_eliminate);

No shader-db or fossil-db changes (yet) on any Intel platform.  This is
expected.

v2: Do opt_algebraic immediately after every call to
opt_copy_propagation instead of being clever. Suggested by Lionel.

Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23884>
2023-07-25 22:11:21 +00:00
Faith Ekstrand
94f36cfaa3 intel/fs: Assume NIR is in SSA form
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:11 +00:00
Faith Ekstrand
965bbe5286 intel/fs: Rework the overlapping mov/vec case
Now that we're using load/store_reg intrinsics, the previous checks for
registers aren't what we want.  Instead, we need to be looking for a mov
or vec where both the destination and a source are load/store_reg with
matching decl_reg.

Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:11 +00:00
Faith Ekstrand
45ee952efb intel/fs: Use write masks from store_reg intrinsics
Fixes: b8209d69ff ("intel/fs: Add support for new-style registers")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24310>
2023-07-25 16:25:10 +00:00
Marcin Ślusarz
4f1125e4ae intel/compiler/test: fix crashes when TEST_DEBUG is set
Dumping instructions requires that ISA info is not empty.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24274>
2023-07-25 15:13:29 +00:00
Lionel Landwerlin
8cbf730145 intel/fs: don't try to rebuild sequences of non ssa values
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 04777171e0 ("intel/fs: try to rematerialize surface computation code")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9378
Reviewed-by: Illia Polishchuk <illia.a.polishchuk@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24228>
2023-07-24 20:04:24 +00:00
Faith Ekstrand
079e8a9674 anv,hasvk,iris: sampler_prog_key::swizzles is only used on crocus
The field is no longer consumed by brw_complie_* and is instead handled
directly by the crocus driver.  Therefore, it's safe to leave it zero
and not even bother setting it.  This removes our reliance on the
SWIZZLE_* macros in prog_instructions.h.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24288>
2023-07-24 15:40:40 +00:00
Marcin Ślusarz
48885c7fe3 intel/compiler: load debug mesh compaction options once
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Marcin Ślusarz
c1685f08dd intel/compiler,anv: put some vertex and primitive data in headers
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).

Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.

Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.

Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.

This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.

Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.

16/16 is better than 24/16, so packing makes sense.

Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.

Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.

With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.

24/24 is worse than 24/16, so packing doesn't make sense.

This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Marcin Ślusarz
a252123363 intel/compiler/mesh: compactify MUE layout
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
   float[N] requires N 1-dword slots, but
   i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
2023-07-24 07:55:29 +00:00
Alyssa Rosenzweig
1466014184 nir: Rename lower_locals_to_reg_intrinsics back
The short name is freed up.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
2023-07-21 11:25:49 +00:00
Alyssa Rosenzweig
a08286f993 intel/fs: Don't read reg.base_offset
It's not set in the new intrinsics path.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24253>
2023-07-21 11:25:48 +00:00
Marcin Ślusarz
3c83ac8002 intel/compiler: remove redundant code
has_lsc is checked few lines above, so this code doesn't matter.

Added in: a358b97c58 ("intel/fs: optimize uniform SSBO & shared loads")

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24260>
2023-07-21 07:22:22 +00:00
Felix DeGrood
d04be9770b intel/compiler: use shader source hash in shader dump code
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Lionel Landwerlin
3384f029be intel/compiler: rework input parameters
Use a struct for various common parameters rather than per stage
structure or arguments to stage specific entrypoints.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23942>
2023-07-20 09:08:08 +00:00
Lionel Landwerlin
46958bcb74 intel/fs: fix missing predicate on SEL instruction
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: d8dfd153c5 ("intel/fs: Make per-sample and coarse dispatch tri-state")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9381
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24236>
2023-07-19 21:57:25 +00:00
Faith Ekstrand
61a90c2968 intel/vec4: Drop support for nir_register
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Faith Ekstrand
39b5bb0809 intel/fs: Drop support for nir_register
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Faith Ekstrand
ce75c3c3fe intel: Switch to intrinsic-based registers
Results on HSW (vec4 only):

    total instructions in shared programs: 2978400 -> 2974135 (-0.14%)
    instructions in affected programs: 77870 -> 73605 (-5.48%)
    helped: 143
    HURT: 48
    helped stats (abs) min: 1 max: 100 x̄: 30.22 x̃: 9
    helped stats (rel) min: 0.03% max: 30.49% x̄: 8.02% x̃: 6.39%
    HURT stats (abs)   min: 1 max: 4 x̄: 1.19 x̃: 1
    HURT stats (rel)   min: 0.08% max: 16.67% x̄: 3.71% x̃: 3.23%
    95% mean confidence interval for instructions value: -26.69 -17.97
    95% mean confidence interval for instructions %-change: -6.24% -3.90%
    Instructions are helped.

    total cycles in shared programs: 45345924 -> 44742666 (-1.33%)
    cycles in affected programs: 29083466 -> 28480208 (-2.07%)
    helped: 4785
    HURT: 3879
    helped stats (abs) min: 2 max: 8072 x̄: 276.00 x̃: 24
    helped stats (rel) min: 0.02% max: 54.43% x̄: 7.78% x̃: 1.95%
    HURT stats (abs)   min: 2 max: 14736 x̄: 184.95 x̃: 20
    HURT stats (rel)   min: 0.02% max: 97.00% x̄: 7.69% x̃: 1.53%
    95% mean confidence interval for cycles value: -83.49 -55.77
    95% mean confidence interval for cycles %-change: -1.16% -0.55%
    Cycles are helped.

    total spills in shared programs: 1093 -> 539 (-50.69%)
    spills in affected programs: 772 -> 218 (-71.76%)
    helped: 74
    HURT: 0

    total fills in shared programs: 760 -> 757 (-0.39%)
    fills in affected programs: 66 -> 63 (-4.55%)
    helped: 3
    HURT: 0

Results on TGL (all stages):

    total instructions in shared programs: 21486982 -> 21488266 (<.01%)
    instructions in affected programs: 2245938 -> 2247222 (0.06%)
    helped: 1288
    HURT: 1385
    helped stats (abs) min: 1 max: 93 x̄: 4.05 x̃: 2
    helped stats (rel) min: 0.02% max: 3.82% x̄: 0.61% x̃: 0.46%
    HURT stats (abs)   min: 1 max: 134 x̄: 4.69 x̃: 2
    HURT stats (rel)   min: <.01% max: 5.59% x̄: 0.65% x̃: 0.44%
    95% mean confidence interval for instructions value: 0.13 0.83
    95% mean confidence interval for instructions %-change: <.01% 0.08%
    Instructions are HURT.

    total cycles in shared programs: 809326677 -> 809475669 (0.02%)
    cycles in affected programs: 447781659 -> 447930651 (0.03%)
    helped: 1924
    HURT: 1994
    helped stats (abs) min: 1 max: 74567 x̄: 1217.49 x̃: 10
    helped stats (rel) min: <.01% max: 38.44% x̄: 1.09% x̃: 0.17%
    HURT stats (abs)   min: 1 max: 76426 x̄: 1249.47 x̃: 8
    HURT stats (rel)   min: <.01% max: 137.11% x̄: 1.64% x̃: 0.17%
    95% mean confidence interval for cycles value: -125.61 201.67
    95% mean confidence interval for cycles %-change: 0.12% 0.48%
    Inconclusive result (value mean confidence interval includes 0).

    LOST:   4
    GAINED: 4

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Faith Ekstrand
abb6188a04 intel/vec4: Add support for new-style registers
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Faith Ekstrand
f783eb9ebd intel/vec4: Assume get_nir_dest() provides a sane write-mask
It should be providing a write mask that is all the channels.  Drop the
one case for load_input where we stomp this for no good reason.  Also,
make ALU write-masking AND with the existing mask.  This prepares us for
the next patch where we convert to new-style registers.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Faith Ekstrand
b8209d69ff intel/fs: Add support for new-style registers
The old ones still work for now.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24104>
2023-07-19 02:11:57 +00:00
Sagar Ghuge
efa6594536 intel/compiler: Look at 2 register worth of data instead of 4
Sampler always writes 4/8 register worth of data but for ld_mcs only
valid data is in first two register. So with 16-bit payload, we need to
split 2-32bit registers into 4-16-bit payload.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22802>
2023-07-14 21:17:19 +00:00
Marcin Ślusarz
36ff6c0004 intel/compiler: remove NV_mesh_shader support
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24071>
2023-07-14 08:27:14 +00:00
Christian Gmeiner
9383009809 nir: rename has_txs to has_texture_scaling
Convert it to an opt-in for backends to prefer and use nir_load_texture_scale
instead of txs for nir lowerings.

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Suggested-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24054>
2023-07-12 10:03:06 +00:00
Faith Ekstrand
73e191924c nir: Add a reg_intrinsics flag to nir_convert_from_ssa
It doesn't do anything yet. We leave that to the subsequent patches so we can
keep the tree-wide refactor as simple as possible.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23089>
2023-07-12 01:34:27 +00:00
Yonggang Luo
48a25ef700 treewide: Remove all usage of nir_builder_init with nir_builder_create and nir_builder_at
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24038>
2023-07-10 19:20:17 +00:00
Karol Herbst
1e655b2f25 clc: rework optional subgroup feature
OpenCL 3.0 core requires __opencl_c_subgroups to be set, the OpenCL
cl_khr_subgroups extenions can only be enabled if and only if the driver
guarentees independent forward progress between subgroups.

See CL_DEVICE_SUB_GROUP_INDEPENDENT_FORWARD_PROGRESS for more information.

Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: Nora Allen <blackcatgames@protonmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22893>
2023-07-07 12:27:35 +00:00
Lionel Landwerlin
c26c0a36d3 intel/fs: disable coarse pixel shader with interpolater messages at sample
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9292
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23962>
2023-07-06 12:48:52 +00:00
Marcin Ślusarz
7ed9ec70c0 intel/compiler: simplify reading of gl_NumWorkGroups in task/mesh
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Marcin Ślusarz
1ac1d5d62e anv,intel/compiler: enable shortcut in wg id to wg idx lowering on >= gfx12.5
This speeds up vk_meshlet_cadscene in "VK mesh ext" renderer by 1.4%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Marcin Ślusarz
7ec1ef75d3 intel/compiler: pass num_workgroups from task to mesh shaders
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22334>
2023-07-04 09:15:08 +00:00
Konstantin Seurer
05269047d3 intel: Use nir_builder_at
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23883>
2023-07-03 15:21:38 +00:00
Rohan Garg
c3110ef1e9 intel/compiler: reuse previously computed bitsize
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23933>
2023-06-30 09:19:57 +00:00
Rohan Garg
7f48c70bab intel/compiler: construct masks instead of using magic values
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23933>
2023-06-30 09:19:57 +00:00
Yonggang Luo
68b8aa788d intel/compiler: Switch to use nir_foreach_function_impl
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23920>
2023-06-29 11:29:54 +00:00
Erik Faye-Lund
c4b6b0d949 intel: use imm-helpers
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23855>
2023-06-29 07:08:19 +00:00
Alyssa Rosenzweig
173b9ee69a treewide: Use nir_builder_create more
perl -p0e 's/nir_builder_init\(&([^,]*), /\1 = nir_builder_create(/g' -i $(git grep -l nir_builder_init)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23860>
2023-06-27 18:13:02 +00:00