Commit Graph

2745 Commits

Author SHA1 Message Date
Francisco Jerez
37e280f28a intel/fs: Lower unsupported regioning with non-trivial 2D regions on FIXED_GRFs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
dd632bf527 intel/fs/xe2+: Update TASK/MESH payload setup for Xe2 reg size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
8944ac7d6c intel/fs/xe2+: Update BS payload setup for Xe2 reg size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
14e1b9ee69 intel/fs/xe2+: Update TES payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
4b3243104c intel/fs/xe2+: Update TCS payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
6195eac210 intel/fs/xe2+: Update GS payload setup for Xe2 reg size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
28744c8954 intel/compiler/xe2: Account for reg_unit() in TES intrinsics
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Caio Oliveira
9859f5b4d2 intel/compiler/xe2: Account for reg_unit() in TCS intrinsics
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
610daa3166 intel/fs/xe2+: Fix payload layout of sampler messages for Xe2 reg size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
c9f2857546 intel/compiler/xe2: TXD is lowered to SIMD16 in SIMD32 mode
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
ef817650c9 intel/compiler/xe2: Use SIMD16 for nir_intrinsic_image_size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Ian Romanick
0b23df3951 intel/compiler/xe2: Update fs_visitor::setup_vs_payload to account for Xe2 reg size
[ Francisco Jerez: Simplify. ]

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Rohan Garg
42b90f05f6 intel/compiler: Adjust barrier emission for Xe2+
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
8b1dc77521 intel/fs/xe2+: Scale BRW_MAX_MSG_LENGTH by native register size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Rohan Garg
4de065f6a2 intel/compiler: Adjust fence message lengths for new register width on Xe2+
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Rohan Garg
e1289d6135 intel/compiler: Adjust CS payload registers for new register width on Xe2+
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
150b3e87c8 intel/fs/xe2+: Round up fs_builder::vgrf() size calculation to HW register unit.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
24dcc3269b intel/fs/xe2+: Update encoding of FB write message payload.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
a573531785 intel/compiler/xe2+: Represent dispatch_grf_start_reg in native GRF units.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
17ef5e7ead intel/fs/xe2+: Allow increased SIMD width for various get_fpu_lowered_simd_width() restrictions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
6423cb9bfa intel/eu/xe2+: Update validation of GRF region size to account for Xe2 reg size
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
00b614a5a7 intel/fs/xe2+: Scale MAX_SAMPLER_MESSAGE_SIZE by native register size.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
421d43fe62 intel/fs/xe2+: Fixes for increased accumulator register width.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
80e9031b44 intel/fs/xe2+: Fix grf_count in post-RA scheduling for updated register file size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
571ddf8516 intel/fs/xe2+: Fix payload node live range calculations for change in register size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
2b7419d090 intel/fs: Fix signedness of payload_node_count argument of calculate_payload_ranges().
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
abf8111560 intel/eu/xe2+: Fix encoding of various message descriptors for change in register size.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
6d39b3d6ae intel/fs/ra/xe2: Scale up register allocation granularity by 2x on Xe2+ platforms.
v2: Fix spill register allocation.  Switch to brw_reg::nr
    representation in fake 256b units.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
bd98df5d8e intel/compiler: Make MAX_VGRF_SIZE macro depend on devinfo and update it for Xe2.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Francisco Jerez
a7d521e556 intel/vec4/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.
Rework:
 * Jordan: 16=>20 following d33aff783d ("intel/fs: add support for
   sparse accesses")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:35 -07:00
Francisco Jerez
5d87f41a54 intel/fs/ra: Define REG_CLASS_COUNT constant specifying the number of register classes.
Rework:
 * Jordan: 16=>20 following d33aff783d ("intel/fs: add support for
   sparse accesses")

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:35 -07:00
Connor Abbott
4282386311 nir/spirv: Add inverse_ballot intrinsic
This is actually a no-op on AMD, so we really don't want to lower it to
something more complicated.  There may be a more efficient way to do
this on Intel too. In addition, in the future we'll want to use this for
lowering boolean reduce operations, where the inverse ballot will
operate on the backend's "natural" ballot type as indicated by
options->ballot_bit_size, instead of uvec4 as produced by SPIR-V. In
total, there are now three possible lowerings we may have to perform:

- inverse_ballot with source type of uvec4 from SPIR-V to inverse_ballot
with natural source type, when the backend supports inverse_ballot
natively.
- inverse_ballot with source type of uvec4 from SPIR-V to arithmetic,
when the backend doesn't support inverse_ballot.
- inverse_ballot with natural source type from reduce operation, when
the backend doesn't support inverse_ballot.

Previously we just did the second lowering unconditionally in vtn, but
it's just a combination of the first and third. We add support here for
the first and third lowerings in nir_lower_subgroups, instead of simply
moving the second lowering, to avoid unnecessary churn.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25123>
2023-09-20 14:41:18 +00:00
Pavel Ondračka
1c72c71bdf nir/move_vec_src_uses_to_dest: allow to skip reuse of constant sources
And enable this for r300 and intel-vec4

crocus HSW (mostly helps few doplhin ubershaders):
total instructions in shared programs: 1576736 -> 1576589 (<.01%)
instructions in affected programs: 38235 -> 38088 (-0.38%)
helped: 12
HURT: 0
total cycles in shared programs: 111025838 -> 110944796 (-0.07%)
cycles in affected programs: 5646582 -> 5565540 (-1.44%)
helped: 15
HURT: 6
total spills in shared programs: 447 -> 432 (-3.36%)
spills in affected programs: 186 -> 171 (-8.06%)
helped: 12
HURT: 0
total fills in shared programs: 792 -> 774 (-2.27%)
fills in affected programs: 291 -> 273 (-6.19%)
helped: 12
HURT: 0

r300 RV530:
total instructions in shared programs: 96655 -> 96304 (-0.36%)
instructions in affected programs: 15020 -> 14669 (-2.34%)
helped: 79
HURT: 18
total temps in shared programs: 13027 -> 12952 (-0.58%)
temps in affected programs: 677 -> 602 (-11.08%)
helped: 41
HURT: 9
total cycles in shared programs: 147745 -> 147314 (-0.29%)
cycles in affected programs: 21831 -> 21400 (-1.97%)
helped: 84
HURT: 19

r300 RV370:
total instructions in shared programs: 63678 -> 63669 (-0.01%)
instructions in affected programs: 931 -> 922 (-0.97%)
helped: 12
HURT: 6
total temps in shared programs: 10028 -> 10013 (-0.15%)
temps in affected programs: 339 -> 324 (-4.42%)
helped: 33
HURT: 10
total cycles in shared programs: 101118 -> 101087 (-0.03%)
cycles in affected programs: 2659 -> 2628 (-1.17%)
helped: 22
HURT: 6

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24932>
2023-09-19 18:05:37 +02:00
Alyssa Rosenzweig
d1eb17e92e treewide: Drop nir_ssa_for_src users
Via Coccinelle patch:

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, *s, n)
    +s->ssa

    @@
    expression b, s, n;
    @@

    -nir_ssa_for_src(b, s, n)
    +s.ssa

Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25247>
2023-09-18 10:25:17 -04:00
Sviatoslav Peleshko
b1a63d5418 intel/fs: Check if the whole ubo load range is in the push const range
Before this, we were checking only the beginning of the ubo range, so
partially overlapping loads were trying to load undefined data.

Fixes: b2da1238 ("i965: Use pushed UBO data in the scalar backend.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9748
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25111>
2023-09-15 10:55:24 +00:00
Ian Romanick
92f5442489 intel/fs: Merge copy prop dataflow loops
This is kept as a separate commit because the change looks like a lot
more than it it. The order of the two loops is swapped, then the two
loops are merged.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
fa2757aa97 intel/fs: Use rb_tree for copy prop dataflow
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
35644bb483 intel/fs: Use rb_tree to store ACP entries by destination
Using a single data structure seems better. There's no appreciable
performance change. On batman_arkham_city_goty.foz, the difference
reported was 0.48%±0.36% (n=20). Several commits in the MR, including
some that should have no effect at all, reported similar changes. I
attribute this primarily changing of loop alignments and similar.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
c28bf1a249 intel/fs: Use rb_tree to store ACP entries by source
On batman_arkham_city_goty.foz, this improves fossil-db time by
-3.83%±0.24% (n=20). This fossil takes the longest time of any in my
database.

v2: Add some comments for cmp_entry_src_entry_src and
cmp_entry_src_nr. Suggested by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
06bdd3eac0 intel/fs: Encapsulate per-block ACP in a structure
This simplifies some later changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
c262752d74 intel/fs: Make opt_copy_propagation_local file private
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
0946108298 intel/fs: Simplify check in can_propagate_from
The larger predicate here already requires that inst->opcode must be
BRW_OPCODE_MOV, so it can't BRW_OPCODE_SEL. With that removed, the
other simplifications are pretty straight forward.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
1f15a0f8b2 intel/fs: Don't loop in try_constant_propagate
The caller already loops over the sources. This means that the caller
must loop over the sources in reverse because constant propagation
prefers to propagate into the last sources first.

The shader-db and fossil-db changes (below) are all due to SEL
instructions. Changing the order sources are visited changes whether a
SEL with two immediate sources is

    (+f0.0) sel     g12    IMM_A    IMM_B

or

    (-f0.0) sel     g12    IMM_B    IMM_A

The ordering of the sources affects the order the constant combining
encounters the values, and the determines which value is "combined"
and which value remains an immediate.

This affects the results by luck. If there are two instructions:

    (+f0.0) sel     g12    IMM_A    IMM_B
    (+f0.0) sel     g13    IMM_A    IMM_C

Picking IMM_A is advantageous over picking IMM_B and IMM_C. Since the
selection algorithm in constant combining is greedy, this case
requires the algorithm see the values in just the right order for the
right thing to happen.

v2: Rebase on many, many changes. Move instruction source fixup
reordering out or try_constant_propagate.

v3: Rebase on !7698.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
ab23d89ade intel/fs: Move src.file checks out of try_constant_propagate and try_copy_propagate
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:23 +00:00
Ian Romanick
b5b2338c5c intel/fs: Make try_constant_propagate and try_copy_propagate file private
This annoyed me durning development of this MR. Every time I changed the
parameters to this internal function, I had to modify a public header
file... and trigger a much large rebuild.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Ian Romanick
8665e37960 intel/fs: Don't try to copy propagate into a source again after progress is made
If the linked list structure used depended on the list head to know when
to terminate, this would be a pretty serious bug. If try_constant_propage
or try_copy_propagate make progress, inst->src[i].nr will change. This
results in the foreach_in_list using a different list header on later
iterations of the loop.

This causes two shaders in shader-db and 9 shaders in fossil-db to
change. Looking at the code changes, these are cases where there was a
copy of a copy that gets propagated. The part that confuses me is the
VGRF numbers involved should **not** hash to the same bucket, so it
should be impossible to find the original source from the intermediate
VGRF.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Ian Romanick
e488b46419 intel/fs: Don't continue fixed point iteration just because liveout changes
Unless the change in liveout also causes livein to change, updates to
liveout cannot have any global effect. Changes to livein already flag
additional interation.

I had additional changes in this area that didn't pan out. While working
on those change, I was a little confused about this bit of code. It's
unnecessary, so it's better to delete it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25091>
2023-09-14 22:31:22 +00:00
Caio Oliveira
3890c60584 compiler/types: Remove unused GLSL_TYPE_FUNCTION and related functions
GLSL doesn't use that type.  SPIR-V used for a while but later started
relying on its own data structures and stopped using it.
See ca62e849d3 ("nir/spirv: Stop using glsl_type for function types")

If we were ever to add this one again, would be better to have a way to
grab a key for lookup that did not require allocations, right now that's
needed to inject return type as the first element in params array.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25160>
2023-09-12 23:18:12 +00:00
Iván Briano
f1bc58cb7b intel/fs: use ffsll so we don't explode on 32 bits
Fixes: b200e5765c ("anv: use a simpler MUE layout for fast linked libraries")

Tested-by: Mark Janes <markjanes@swizzler.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25192>
2023-09-12 22:42:38 +00:00
Iván Briano
4eddeea7bf intel/fs: handle URB setup for fast linked mesh pipelines
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.

Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25047>
2023-09-12 02:51:31 +00:00