Commit Graph

62 Commits

Author SHA1 Message Date
Kenneth Graunke
9886615958 intel/compiler: Move spill/fill tracking to the register allocator
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends.  Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.

But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.

So counting stateless messages is not accurate.  Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-25 06:56:01 +00:00
Jason Ekstrand
c38812be1d intel/fs: Implement spilling on XeHP
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11582>
2021-06-25 00:18:29 +00:00
Jason Ekstrand
9e0fd49858 intel/fs/ra: Fix payload node setup for SIMD16 on Gen4-5
Since 40e1d798c6, we are now using physical register numbers for
everything which makes it all simpler.  In particular, we no longer need
the special case for setting up the payload for SIMD16 on Gen4-5.  This
fixes a pile of piglit tests on ILK and similar.

Fixes: 40e1d798c6 "intel/fs: Use ra_alloc_contig_reg_class()..."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11221>
2021-06-07 16:52:19 +00:00
Eric Anholt
40e1d798c6 intel/fs: Use ra_alloc_contig_reg_class() to speed up RA.
By using the new class type, we don't need to make 1928 different
registers to represent each contigous reg size starting from the actual
128 HW register, or have a mapping between RA regs and HW base regs.  With
the number of regs reduced, and the fast q computation when using the new
classes, we no longer need to compute our own q.

This drops the FS RA initialization time on my CFL system from about 1ms to
50us.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Eric Anholt
95d41a3525 ra: Use struct ra_class in the public API.
All these unsigned ints are awful to keep track of.  Use pointers so we
get some type checking.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
2021-06-04 19:08:57 +00:00
Jason Ekstrand
34c560ae95 intel/fs: Stop using brw_dp_read/write_desc in Gen7+ only code
Those helpers exist primarily to sort out some of the weirdness around
Gen4-6 dataport access.  On Gen5 and earlier, everything was called
"dataport" and, instead of the SFID we have today there was a "target
cache" parameter in the descriptor.  There are also some bits that moved
around on various gens depending on read vs. write.  Starting with Gen6,
most things which target one of the data cache SFIDs should use
brw_dp_desc() instead.

v2: Drop backward comment (Ken)

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin
64551610d1 intel/compiler: rework message descriptors for render targets
Render target message descriptors are slightly different from the
dataport ones. In particular the msg_type field is on bits 14:17 for
RT while bits 14:18 for DP.

v2: Drop unused send_commit_msg field in brw_fb_write_desc() (Ken)

v3: Rebase on top renaming (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Anuj Phogat
61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Anuj Phogat
e7e55af4d6 intel: Rename GENx keyword to GFXx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN\([[:digit:]]\+\)/GFX\1/g"

Exclude the changes to modifiers:
grep -E "I915_.*GFX" -rIl $SEARCH_PATH | xargs sed -ie "s/\(I915_.*\)GFX/\1GEN/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Jason Ekstrand
b9e9f92f73 intel/fs: Handle payload node interference in destinations
Starting with d0d039a4d3, we emit writes to the push constant chunk
of the payload to stomp out-of-bounds data to zero for Vulkan.  Then, in
369eab9420, we started emitting shader preamble code for emulated
push constants on Gen12.5 parts.  In either of these cases, we can run
into issues if we don't have a proper live range for some of the payload
registers where they get used for something and then smashed by our push
handling code.  We've not seen many issues with this yet because it only
happens when you have dead push constants.

Fixes: d0d039a4d3 "anv: Emit pushed UBO bounds checking code..."
Fixes: 369eab9420 "intel/fs: Emit code for Gen12-HP indirect..."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
2021-03-10 22:17:41 +00:00
Yevhenii Kolesnikov
5ad54d498c intel/fs: don't spill a register, set by undef
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3941
Signed-off-by: Yevhenii Kolesnikov <yevhenii.kolesnikov@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8185>
2020-12-21 21:18:01 +00:00
Jason Ekstrand
5abac85177 intel/fs: Rework scratch handling on Gen9+
The current scratch mechanism uses an MRF hack where we reserve a few
GRF registers to treat like the MRF and we collect the data into that
MRF region before doing a scratch write.  We also use that region for
the header for scratch reads.

This commit changes things and gets rid of the MRF hack.  Instead, we
reserve a single register (which RA is free to pick) for the scratch
header and uses split sends for scratch writes to avoid having to do
the copy.  This should provide RA with more freedom in the presence of
spilling as well as avoid some unnecessary data moves.  In future, the
new GEN9_SCRATCH_HEADER opcode gives us a place where we can do our own
per-thread scratch base address calculations rather than depending on
the scratch base address that gets pushed into g0.  Having an opcode for
this lets us do it once at the top of the shader rather than repeating
it at every read/write.

One other noticeable difference is the use of SHADER_OPCODE_SEND.  We
can get away with this thanks to the fact that we're now using a set to
track which instructions are generated by spills and don't rely on the
opcodes to find spill/fill instructions.  This allows us to avoid adding
more virtual opcodes and let the normal code paths handle things like
scoreboard dependencies between header setup and the SEND.  It also
means that post-RA scheduling may be able to space out the header setup
MOV and the SEND for better latency hiding.

Shader-db results on Skylake:

    total spills in shared programs: 12137 -> 10604 (-12.63%)
    spills in affected programs: 6685 -> 5152 (-22.93%)
    helped: 274
    HURT: 2

    total fills in shared programs: 13065 -> 11515 (-11.86%)
    fills in affected programs: 9007 -> 7457 (-17.21%)
    helped: 275
    HURT: 1

Shader-db results on Ice Lake:

    total spills in shared programs: 12482 -> 10953 (-12.25%)
    spills in affected programs: 6586 -> 5057 (-23.22%)
    helped: 275
    HURT: 0

    total fills in shared programs: 12819 -> 11234 (-12.36%)
    fills in affected programs: 7867 -> 6282 (-20.15%)
    helped: 274
    HURT: 0

Shader-db results on Tigerlake:

    total spills in shared programs: 11689 -> 10233 (-12.46%)
    spills in affected programs: 4740 -> 3284 (-30.72%)
    helped: 259
    HURT: 0

    total fills in shared programs: 10840 -> 9443 (-12.89%)
    fills in affected programs: 6244 -> 4847 (-22.37%)
    helped: 259
    HURT: 0

Fossil-db results on Ice Lake:

    Spills in all programs: 245249 -> 201633 (-17.8%)
    Fills in all programs: 366066 -> 314368 (-14.1%)

More practically, this seems to give about a 0.5-1% perf boost in
Witcher 3 (DXVK) and Shadow of the Tomb Raider (Vulkan native).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
e557af9781 intel/fs/ra: Use a set to track added spill/fill instructions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
f650c4c0c6 intel/fs/ra: Sanity-check our IP counts
Starting with e99081e76d, we don't re-construct liveness information
every time we spill a register.  Instead, we're very careful to track
which instructions are spill instructions and not contribute those to
the IP count so that we can continue to use the old liveness information
even though instructions have been added.  This commit adds an assert
that sanity-checks that we count the same number of instructions as our
liveness information is based on.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
d80d0a6ced intel/fs/ra: Store the last non-spill VGRF node
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
2af6528c33 intel/fs/ra: Refactor handling of Gen7 scratch reads
The attempt at de-duplication with the gen7_read Boolean wasn't actually
saving us anything.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
74a1843ca0 intel/fs/ra: Increment spill_offset as part of the emit_spill loop
This makes it consistent with our handling of src.offset and with our
handling of spill_offset in emit_unspill.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Marcin Ślusarz
5ea0b6a9c6 intel/compiler: initialize remaining fields of various classes
These variables seem to be initialized before being used, so this
patch is not fixing any bug, but leaving them unitialized may become
a bug after some refactoring.

These classes were affected: fs_reg_alloc, fs_visitor, fs_generator,
instruction_scheduler.

Found by Coverity.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Dylan Baker
f8e4542bad replace _mesa_logbase2 with util_logbase2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Matt Turner
75a33e268e intel/compiler: Mark some methods and parameters const
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:11 +00:00
Matt Turner
43019c6f2c intel/compiler: Remove unnecessary local variables
These are already provided in the fs_reg_alloc class.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:11 +00:00
Francisco Jerez
ea44de6d8c intel/compiler/fs: Switch liveness analysis to IR analysis framework
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward.  The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:57 -08:00
Francisco Jerez
ba73e606f6 intel/compiler: Move all live interval analysis results into fs_live_variables
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:

> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;

It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.

This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:43 -08:00
Francisco Jerez
ab6d792986 intel/compiler: Pass detailed dependency classes to invalidate_analysis()
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:39 -08:00
Francisco Jerez
d966a6b4c4 intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR.  For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.

This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:32 -08:00
Francisco Jerez
0dd18d70ae intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics.
This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently.  Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs.  This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode.  However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.

Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:22:34 -08:00
Francisco Jerez
369aef851d intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use.
Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7).  Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:22:20 -08:00
Francisco Jerez
c20dc9b836 intel/fs: Make implied_mrf_writes() an fs_inst method.
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.

This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:02:30 -08:00
Francisco Jerez
0703eab012 intel/fs/gen8+: Fix r127 dst/src overlap RA workaround for EOT message payload.
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message.  In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function.  This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.

Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.

The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:00:42 -08:00
Jason Ekstrand
a84de3fb7c intel/fs: Skip registers faster when setting spill costs
This might be slightly faster since we're doing one read rather than
two before we decide to skip.  The more important reason, however, is
because no_spill prevents us from re-spilling spill registers.  In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-04 14:37:56 +00:00
Jason Ekstrand
b2d274c677 intel/fs/ra: Choose a spill reg before throwing away the graph
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
c19acf321c intel/fs/ra: Add spill costs to the graph on-demand
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
2c14e2b5bf intel/fs/ra: Add a helper for discarding the interference graph
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
e99081e76d intel/fs/ra: Spill without destroying the interference graph
Instead of re-building the interference graph every time we spill, we
modify it in place so we can avoid recalculating liveness and the whole
O(n^2) interference graph building process.  We make a simplifying
assumption in order to do so which is that all spill/fill temporary
registers live for the entire duration of the instruction around which
we're spilling.  This isn't quite true because a spill into the source
of an instruction doesn't need to interfere with its destination, for
instance.  Not re-calculating liveness also means that we aren't
adjusting spill costs based on the new liveness.  The combination of
these things results in a bit of churn in spilling.  It takes a large
cut out of the run-time of shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311360 (<.01%)
    instructions in affected programs: 77027 -> 77163 (0.18%)
    helped: 11
    HURT: 18

    total cycles in shared programs: 355544739 -> 355830749 (0.08%)
    cycles in affected programs: 203273745 -> 203559755 (0.14%)
    helped: 234
    HURT: 190

    total spills in shared programs: 12049 -> 12042 (-0.06%)
    spills in affected programs: 2465 -> 2458 (-0.28%)
    helped: 9
    HURT: 16

    total fills in shared programs: 25112 -> 25165 (0.21%)
    fills in affected programs: 6819 -> 6872 (0.78%)
    helped: 11
    HURT: 16

    Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
147665d0a2 intel/fs/ra: Put the VGRFs at the end of the nodes
This is slightly less convenient in some places but it will make it much
easier when we want to start adding nodes dynamically.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e7b7d572b3 intel/fs/ra: Re-arrange interference setup
The old code was arranged by the type of interference being added.  It
would set up payload registers and then add payload interference for all
VGRFs.  It would set up MRFs and add MRF interference for all VGRFs.
This commit re-arranges things to be organized differently.  It first
creates and sets up all RA nodes and then groups interference into two
new categories:  live range and instruction interference.  Once all the
RA nodes have been set up, it walks the list of VGRFs and sets up their
live range interference and then walks the list of instructions and sets
up instruction interference.  This new arrangement will be advantageous
for a future patch but, at the moment, it cuts 2% off the run-time of
shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311224 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355544739 -> 355544739 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
0fd60e95fb intel/fs/ra: Do the spill loop inside RA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
47b1dcdcab intel/fs/ra: Only add MRF hack interference if we're spilling
The only use of the MRF hack these days is for spilling and there we
don't need the precise MRF usage information.  If we're spilling then we
know pretty well how many MRFs are going to be used.  It is possible if
the only things that are spilled have fewer SIMD channels than the
dispatch width of the shader that this may be more MRFs than needed.
That's a risk we're willing to takd.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311224 (<.01%)
    instructions in affected programs: 16664 -> 16788 (0.74%)
    helped: 1
    HURT: 5

    total cycles in shared programs: 355543197 -> 355544739 (<.01%)
    cycles in affected programs: 731864 -> 733406 (0.21%)
    helped: 3
    HURT: 6

The hurt shaders are all SIMD32 compute shaders where we reserve enough
space for a 32-wide spill/fill but don't need it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
69878a9bb0 intel/fs/ra: Pull the guts of RA into its own class
This accomplishes two things.  First, it makes interfaces which are
really private to RA private to RA.  Second, it gives us a place to
store some common stuff as we go through the algorithm.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9e00a251be intel/fs/ra: Move assign_regs further down in the file
It's the main function from which all the other functions are called.
It belongs at the bottom.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
5d9ac57c8c intel/fs/ra: Split building the interference graph into a helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
472ef2f98d intel/fs/ra: Initialize grf_used with first_non_payload_grf
There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up.  The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
096ad8a809 intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference.  No idea how we didn't catch this before.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355543197 (0.02%)
    cycles in affected programs: 2472492 -> 2547639 (3.04%)
    helped: 17
    HURT: 20

Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
88cac12230 intel/fs/ra: Only add dest interference to sources that exist
Fixes: 83dedb6354 "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Jason Ekstrand
cd4ffb376f intel/fs: Account for live range lengths in spill costs
The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill 9% spill.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 23:04:45 +00:00