Commit Graph

40 Commits

Author SHA1 Message Date
Dylan Baker
f8e4542bad replace _mesa_logbase2 with util_logbase2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Matt Turner
75a33e268e intel/compiler: Mark some methods and parameters const
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:11 +00:00
Matt Turner
43019c6f2c intel/compiler: Remove unnecessary local variables
These are already provided in the fs_reg_alloc class.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:11 +00:00
Francisco Jerez
ea44de6d8c intel/compiler/fs: Switch liveness analysis to IR analysis framework
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward.  The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:57 -08:00
Francisco Jerez
ba73e606f6 intel/compiler: Move all live interval analysis results into fs_live_variables
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:

> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;

It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.

This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:43 -08:00
Francisco Jerez
ab6d792986 intel/compiler: Pass detailed dependency classes to invalidate_analysis()
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:39 -08:00
Francisco Jerez
d966a6b4c4 intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR.  For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.

This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:32 -08:00
Francisco Jerez
0dd18d70ae intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics.
This is mainly meant to avoid shader-db regressions on SNB as we start
using VGRFs for barycentrics more frequently.  Currently the
aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16
mode barycentric vectors are typically 4 GRFs.  This is not a problem
on Gen4-5, because on those platforms all VGRF allocations are
pair-aligned in SIMD16 mode.  However on Gen6 we end up using either
the fast or the slow path of LINTERP rather non-deterministically
based on the behavior of the register allocator.

Fix it by repurposing aligned_pairs_class to hold PLN-aligned
registers of whatever the natural size of a barycentric vector is in
the current dispatch width.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:22:34 -08:00
Francisco Jerez
369aef851d intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use.
Previously we would hardcode fs_visitor::delta_xy barycentrics to be
allocated from aligned_pairs_class on hardware with PLN source
alignment restrictions (pre-Gen7).  Instead allocate any registers
consumed by LINTERP from aligned_pairs_class, even if some barycentric
vector had ended up in a temporary.

On SNB this prevents the following shader-db regressions (including
SIMD32 programs) in combination with the interpolation rework part of
this series:

   total instructions in shared programs: 13983257 -> 14527274 (3.89%)
   instructions in affected programs: 1766255 -> 2310272 (30.80%)
   helped: 0
   HURT: 11608

   LOST:   26
   GAINED: 13

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:22:20 -08:00
Francisco Jerez
c20dc9b836 intel/fs: Make implied_mrf_writes() an fs_inst method.
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.

This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:02:30 -08:00
Francisco Jerez
0703eab012 intel/fs/gen8+: Fix r127 dst/src overlap RA workaround for EOT message payload.
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message.  In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function.  This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.

Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.

The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:00:42 -08:00
Jason Ekstrand
a84de3fb7c intel/fs: Skip registers faster when setting spill costs
This might be slightly faster since we're doing one read rather than
two before we decide to skip.  The more important reason, however, is
because no_spill prevents us from re-spilling spill registers.  In the
new world in which we don't re-calculate liveness every spill, we may
not have valid liveness for spill registers so we shouldn't even look
their live ranges up.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110825
Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
2019-06-04 14:37:56 +00:00
Jason Ekstrand
b2d274c677 intel/fs/ra: Choose a spill reg before throwing away the graph
Otherwise, we get an effectively random spill reg because we no longer
have the information from RA to guide us.  Also, a completely clean
graph has undefined data in in_stack which is used for choosing the
spill reg so it really is non-deterministic.

Fixes: e99081e76d "intel/fs/ra: Spill without destroying the..."
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
c19acf321c intel/fs/ra: Add spill costs to the graph on-demand
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
2c14e2b5bf intel/fs/ra: Add a helper for discarding the interference graph
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-16 02:13:09 +00:00
Jason Ekstrand
e99081e76d intel/fs/ra: Spill without destroying the interference graph
Instead of re-building the interference graph every time we spill, we
modify it in place so we can avoid recalculating liveness and the whole
O(n^2) interference graph building process.  We make a simplifying
assumption in order to do so which is that all spill/fill temporary
registers live for the entire duration of the instruction around which
we're spilling.  This isn't quite true because a spill into the source
of an instruction doesn't need to interfere with its destination, for
instance.  Not re-calculating liveness also means that we aren't
adjusting spill costs based on the new liveness.  The combination of
these things results in a bit of churn in spilling.  It takes a large
cut out of the run-time of shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311360 (<.01%)
    instructions in affected programs: 77027 -> 77163 (0.18%)
    helped: 11
    HURT: 18

    total cycles in shared programs: 355544739 -> 355830749 (0.08%)
    cycles in affected programs: 203273745 -> 203559755 (0.14%)
    helped: 234
    HURT: 190

    total spills in shared programs: 12049 -> 12042 (-0.06%)
    spills in affected programs: 2465 -> 2458 (-0.28%)
    helped: 9
    HURT: 16

    total fills in shared programs: 25112 -> 25165 (0.21%)
    fills in affected programs: 6819 -> 6872 (0.78%)
    helped: 11
    HURT: 16

    Total CPU time (seconds): 2469.68 -> 2360.22 (-4.43%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
147665d0a2 intel/fs/ra: Put the VGRFs at the end of the nodes
This is slightly less convenient in some places but it will make it much
easier when we want to start adding nodes dynamically.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
e7b7d572b3 intel/fs/ra: Re-arrange interference setup
The old code was arranged by the type of interference being added.  It
would set up payload registers and then add payload interference for all
VGRFs.  It would set up MRFs and add MRF interference for all VGRFs.
This commit re-arranges things to be organized differently.  It first
creates and sets up all RA nodes and then groups interference into two
new categories:  live range and instruction interference.  Once all the
RA nodes have been set up, it walks the list of VGRFs and sets up their
live range interference and then walks the list of instructions and sets
up instruction interference.  This new arrangement will be advantageous
for a future patch but, at the moment, it cuts 2% off the run-time of
shader-db on my laptop.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311224 -> 15311224 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355544739 -> 355544739 (0.00%)
    cycles in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    Total CPU time (seconds): 2523.45 -> 2469.68 (-2.13%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
0fd60e95fb intel/fs/ra: Do the spill loop inside RA
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
47b1dcdcab intel/fs/ra: Only add MRF hack interference if we're spilling
The only use of the MRF hack these days is for spilling and there we
don't need the precise MRF usage information.  If we're spilling then we
know pretty well how many MRFs are going to be used.  It is possible if
the only things that are spilled have fewer SIMD channels than the
dispatch width of the shader that this may be more MRFs than needed.
That's a risk we're willing to takd.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311224 (<.01%)
    instructions in affected programs: 16664 -> 16788 (0.74%)
    helped: 1
    HURT: 5

    total cycles in shared programs: 355543197 -> 355544739 (<.01%)
    cycles in affected programs: 731864 -> 733406 (0.21%)
    helped: 3
    HURT: 6

The hurt shaders are all SIMD32 compute shaders where we reserve enough
space for a 32-wide spill/fill but don't need it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
69878a9bb0 intel/fs/ra: Pull the guts of RA into its own class
This accomplishes two things.  First, it makes interfaces which are
really private to RA private to RA.  Second, it gives us a place to
store some common stuff as we go through the algorithm.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
9e00a251be intel/fs/ra: Move assign_regs further down in the file
It's the main function from which all the other functions are called.
It belongs at the bottom.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
5d9ac57c8c intel/fs/ra: Split building the interference graph into a helper
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
472ef2f98d intel/fs/ra: Initialize grf_used with first_non_payload_grf
There's no reason why we need to use the calculated payload_node_count
value which is just first_non_payload_grf aligned up.  The grf_used
value will be aligned up to 16 anyway (which is a much bigger alignment)
before being handed off to hardware.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
096ad8a809 intel/fs/ra: Stop adding RA interference to too many SENDS nodes
We only have one node per VGRF so this was adding way too much
interference.  No idea how we didn't catch this before.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15311100 -> 15311100 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 355468050 -> 355543197 (0.02%)
    cycles in affected programs: 2472492 -> 2547639 (3.04%)
    helped: 17
    HURT: 20

Fixes: 014edff0d2 "intel/fs: Add interference between SENDS sources"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Jason Ekstrand
88cac12230 intel/fs/ra: Only add dest interference to sources that exist
Fixes: 83dedb6354 "i965: Add src/dst interference for certain"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-14 12:30:22 -05:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Jason Ekstrand
cd4ffb376f intel/fs: Account for live range lengths in spill costs
The current register allocator has a concept of "spill benefit" which is
based on the number of nodes with which a given node interferes.  The
idea is that you want to spill stuff with high interference because
those are the most likely registers to help when spilling.  However,
this fails to take into account the length of the live range so the
allocator frequently picks "cheap" (not many uses) registers which are
actually very short lived and so spilling them doesn't help with the
pressure situation.

This commit takes into account the length of the live range to make
long-lived registers more likely to get spilled than short-lived ones.
This encourages the spill chooser to choose slightly larger registers
which will affect a larger area of the program and hopefully we have to
spill fewer of them to get the same reduction in over-all register
pressure.

Shader-db results on Kaby Lake:

    total spills in shared programs: 23664 -> 12050 (-49.08%)
    spills in affected programs: 19243 -> 7629 (-60.35%)
    helped: 296
    HURT: 8

    total fills in shared programs: 32028 -> 25139 (-21.51%)
    fills in affected programs: 20378 -> 13489 (-33.81%)
    helped: 295
    HURT: 16

Of course, most of that is in Deus Ex...

Shader-db results on Kaby Lake (without Deus Ex):

    total spills in shared programs: 6479 -> 5834 (-9.96%)
    spills in affected programs: 3231 -> 2586 (-19.96%)
    helped: 40
    HURT: 4

    total fills in shared programs: 17165 -> 17099 (-0.38%)
    fills in affected programs: 6951 -> 6885 (-0.95%)
    helped: 40
    HURT: 7

Even without Deus Ex, the spill help is pretty respectable.  The worst
hurt shaders were one compute shader in Aztec Ruins and one fragment
shader in KSP that were each hurt by around 13% fill 9% spill.

VkPipeline-db results on Kaby Lake:

    total spills in shared programs: 9149 -> 8069 (-11.80%)
    spills in affected programs: 5197 -> 4117 (-20.78%)
    helped: 27
    HURT: 16

    total fills in shared programs: 26390 -> 25477 (-3.46%)
    fills in affected programs: 12662 -> 11749 (-7.21%)
    helped: 24
    HURT: 22

The Vulkan results were decidedly more mixed but we don't have nearly as
many apps in that database yet.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 23:04:45 +00:00
Iago Toral Quiroga
40b3abb4d1 intel/compiler: split is_partial_write() into two variants
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |
================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-18 11:05:18 +02:00
Jason Ekstrand
b4f0d062cd intel/fs: Do the grf127 hack on SIMD8 instructions in SIMD16 mode
Previously, we only applied the fix to shaders with a dispatch mode of
SIMD8 but the code it relies on for SIMD16 mode only applies to SIMD16
instructions.  If you have a SIMD8 instruction in a SIMD16 shader,
neither would trigger and the restriction could still be hit.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127..."
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-02-01 16:11:00 -06:00
Jason Ekstrand
014edff0d2 intel/fs: Add interference between SENDS sources
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
7f1cf046cd intel/fs: Add a generic SEND opcode
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Matt Turner
7e4e9da90d intel/compiler: Prevent warnings in the following patch
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.

Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Iago Toral Quiroga
6c418dfa42 intel/compiler: fix node interference of simd16 instructions
SIMD16 instructions need to have additional interferences to prevent
source / destination hazards when the source and destination registers
are off by one register.

While we already have code to handle this, it was only running for SIMD16
dispatches, however, we can have SIDM16 instructions in a SIMD8 dispatch.
An example of this are pull constant loads since commit b56fa830c6,
but there are more cases.

This fixes a number of CTS test failures found in work-in-progress
tests that were hitting this situation for 16-wide pull constants
in a SIMD8 program.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-09 08:22:08 +01:00
Jose Maria Casanova Crespo
62f37ee53d i965/fs: unspills shoudn't use grf127 as dest since Gen8+
At 232ed89802 "i965/fs: Register allocator
shoudn't use grf127 for sends dest" we didn't take into account the case
of SEND instructions that are not send_from_grf. But since Gen7+ although
the backend still uses MRFs internally for sends they are finally
assigned to a GRFs.

In the case of unspills the backend assigns directly as source its
destination because it is suppose to be available. So we always have a
source-destination overlap. If the reg_allocator assigns registers that
include the grf127 we fail the validation rule that affects Gen8+
"r127 must not be used for return address when there is a src and dest
overlap in send instruction."

So this patch activates the grf127_send_hack_node for Gen8+ and if we
have any register spilled we add interferences to the destination of
the unspill operations.

We also need to avoid that opt_bank_conflicts() optimization, that runs
after the register allocation, doesn't move things around, causing the
grf127 to be used in the condition we were avoiding.

Fixes piglit test tests/spec/arb_compute_shader/linker/bug-93840.shader_test
and some shader-db crashed because of the grf127 validation rule..

v2: make sure that opt_bank_conflicts() optimization doesn't change
the use of grf127. (Caio)

Found by Caio Marcelo de Oliveira Filho

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107193
Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Cc: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-12 18:02:26 +02:00
Jose Maria Casanova Crespo
232ed89802 i965/fs: Register allocator shoudn't use grf127 for sends dest
Since Gen8+ Intel PRM states that "r127 must not be used for return
address when there is a src and dest overlap in send instruction."

This patch implements this restriction creating new grf127_send_hack_node
at the register allocator. This node has a fixed assignation to grf127.

For vgrf that are used as destination of send messages we create node
interfereces with the grf127_send_hack_node. So the register allocator
will never assign to these vgrf a register that involves grf127.

If dispatch_width > 8 we don't create these interferences to the because
all instructions have node interferences between sources and destination.
That is enough to avoid the r127 restriction.

This fixes CTS tests that raised this issue as they were executed as SIMD8:

dEQP-VK.spirv_assembly.instruction.graphics.8bit_storage.8struct_to_32struct.storage_buffer_*int_geom

Shader-db results on Skylake:
   total instructions in shared programs: 7686798 -> 7686797 (<.01%)
   instructions in affected programs: 301 -> 300 (-0.33%)
   helped: 1
   HURT: 0

   total cycles in shared programs: 337092322 -> 337091919 (<.01%)
   cycles in affected programs: 22420415 -> 22420012 (<.01%)
   helped: 712
   HURT: 588

Shader-db results on Broadwell:

   total instructions in shared programs: 7658574 -> 7658625 (<.01%)
   instructions in affected programs: 19610 -> 19661 (0.26%)
   helped: 3
   HURT: 4

   total cycles in shared programs: 340694553 -> 340676378 (<.01%)
   cycles in affected programs: 24724915 -> 24706740 (-0.07%)
   helped: 998
   HURT: 916

   total spills in shared programs: 4300 -> 4311 (0.26%)
   spills in affected programs: 333 -> 344 (3.30%)
   helped: 1
   HURT: 3

   total fills in shared programs: 5370 -> 5378 (0.15%)
   fills in affected programs: 274 -> 282 (2.92%)
   helped: 1
   HURT: 3

v2: Avoid duplicating register classes without grf127. Let's use a node
    with a fixed assignation to grf127 and create interferences to send
    message vgrf destinations. (Eric Anholt)
v3: Update reference to CTS VK_KHR_8bit_storage failing tests.
    (Jose Maria Casanova)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: 18.1 <mesa-stable@lists.freedesktop.org>
2018-07-10 00:14:49 +02:00
Francisco Jerez
58324389be intel/fs: Take into account amount of data read in spilling cost heuristic.
Until now the spilling cost calculation was neglecting the amount of
data read from the register during the spilling cost calculation.
This caused it to make suboptimal decisions in some cases leading to
higher memory bandwidth usage than necessary.

Improves Unigine Heaven performance by ~4% on BDW, reversing an
unintended FPS regression from my previous commit
147e71242c with n=12 and statistical
significance 5%.  In addition SynMark2 OglCSDof performance is
improved by an additional ~5% on SKL, and a Kerbal Space Program
apitrace around the Moho planet I can provide on request improves by
~20%.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 11:01:40 -07:00
Francisco Jerez
ecc19e12dc intel/fs: Use regs_written() in spilling cost heuristic for improved accuracy.
This is what we use later on to compute the number of registers that
will actually get spilled to memory, so it's more likely to match
reality than the current open-coded approximation.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-04-24 10:59:56 -07:00
Francisco Jerez
147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00