It makes sense to keep the result of analysis passes independent from
the IR itself. Instead of representing the idom tree as a pointer in
each basic block pointing to its immediate dominator, the whole
dominator tree is represented separately from the IR as an array of
pointers inside the idom_tree object. This has the advantage that
it's no longer possible to use stale dominance results by accident
without having called require() beforehand, which makes sure that the
idom tree is recalculated if necessary.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This only does half of the work. The actual representation of the
idom tree is left untouched, but the computation algorithm is moved
into a separate analysis result class wrapped in a BRW_ANALYSIS
object, along with the intersect() and dump_domtree() auxiliary
functions in order to keep things tidy.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This involves wrapping vec4_live_variables in a BRW_ANALYSIS object
and hooking it up to invalidate_analysis() so it's properly
invalidated. Seems like a lot of churn but it's fairly
straightforward. The vec4_visitor invalidate_ and
calculate_live_intervals() methods are no longer necessary after this
change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward. The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This could be improved somewhat with additional validation of the
calculated live in/out sets and by checking that the calculated live
intervals are minimal (which isn't strictly necessary to guarantee the
correctness of the program). This should be good enough though to
catch accidental use of stale liveness results due to missing or
incorrect analysis invalidation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This could be improved somewhat with additional validation of the
calculated live in/out sets and by checking that the calculated live
intervals are minimal (which isn't strictly necessary to guarantee the
correctness of the program). This should be good enough though to
catch accidental use of stale liveness results due to missing or
incorrect analysis invalidation.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This removes the dependency of fs_live_variables on fs_visitor. The
IR analysis framework requires the analysis result to be constructible
with a single argument -- The second argument was redundant anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This makes the structure of the vec4 live intervals calculation more
similar to the FS back-end liveness analysis code. The non-CF-aware
start/end computation is moved into the same pass that calculates the
block-local def/use sets, which saves quite a bit of code, while the
CF-aware start/end computation is moved into a separate
compute_start_end() function as is done in the FS back-end.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
vec4_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_vec4_live_variables.cpp:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the vec4_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
The naming of the virtual_grf_start/end arrays was rather misleading,
they were indexed by variable rather than by vgrf, this renames them
start/end to match the FS liveness analysis pass. The churn in the
definition of var_range_start/end is just in order to avoid a
collision between the start/end arrays and local variables declared
with the same name.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
I've deliberately separated this from the general analysis pass
infrastructure in order to discuss it independently. The dependency
classes defined here refer to state changes of several objects of the
program IR, and are fully orthogonal and expected to change less often
than the set of analysis passes present in the compiler back-end.
The objective is to avoid unnecessary coupling between optimization
and analysis passes in the back-end. By doing things in this way the
set of flags to be passed to invalidate_analysis() can be determined
from knowledge of a single optimization pass and a small set of well
specified dependency classes alone -- IOW there is no need to audit
all analysis passes to find out which ones might be affected by
certain kind of program transformation performed by an optimization
pass, as well as the converse, there is no need to audit all
optimization passes when writing a new analysis pass to find out which
ones can potentially invalidate the result of the analysis.
The set of dependency classes defined here is rather conservative and
mainly based on the requirements of the few analysis passes already
part of the back-end. I've also used them without difficulty with a
few additional analysis passes I've written but haven't yet sent for
review.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Motivated in detail in the source code. The only piece missing here
from the analysis pass infrastructure is some sort of mechanism to
broadcast changes in the IR to all existing analysis passes, which
will be addressed by a future commit. The analysis_dependency_class
enum might seem a bit silly at this point, more interesting dependency
categories will be defined later on.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_vec4.h (in particular vec4_visitor) is logically a user of the
live variables analysis pass, not the other way around.
brw_vec4_live_variables.h requires the definition of some VEC4 IR data
structures to compile, but those can be obtained directly from
brw_ir_vec4.h without including brw_vec4.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_fs.h (in particular fs_visitor) is logically a user of the live
variables analysis pass, not the other way around.
brw_fs_live_variables.h requires the definition of some FS IR data
structures to compile, but those can be obtained directly from
brw_ir_fs.h without including brw_fs.h. The dependency of
fs_live_variables on fs_visitor is rather accidental and will be
removed in a future commit, a forward declaration is enough for the
moment.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
When this commit was originally written, these two structures had the
exact same name. Subsequently in commit 12a8f2616a (intel/compiler:
Fix C++ one definition rule violations) they were renamed.
Original commit message:
> These two structures have exactly the same name which prevents the two
> files from being included at the same time and could cause serious
> trouble in the future if it ever leads to a (silent) violation of the
> C++ one definition rule.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This pulls out the i965 IR definitions into a separate file and leaves
the top-level backend_shader structure and back-end compiler entry
points in brw_shader.h. The purpose is to keep things tidy and
prevent a nasty circular dependency between brw_cfg.h and
brw_shader.h. The logical dependency between these data structures
looks like:
backend_shader (brw_shader.h) -> cfg_t (brw_cfg.h)
-> bblock_t (brw_cfg.h) -> backend_instruction (brw_shader.h)
This circular header dependency is currently resolved by using forward
declarations of cfg_t/bblock_t in brw_shader.h and having brw_cfg.h
include brw_shader.h, which seems backwards and won't work at all when
the forward declarations of cfg_t/bblock_t are no longer sufficient in
a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
Our current GPGPU_WALKER code only supports up to 64 threads.
On HSW we could use up to 70 and TGL up to 112, but only if the walker
is adjusted so the width does not exceed 64. Work to support this is
in progress.
Previous to this change, we might try to downgrade to SIMD8 if the
SIMD16 shader spilled. Since HSW and TGL have the max number of
threads above 64, we would then try to emit an invalid GPGPU walker
command.
Fixes: 932045061b ("i965/cs: Emit compute shader code and upload programs")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Tested-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The other source of the multiply will be interpreted as a uint32_t in an
XOR instruction. Any source modifiers with either not be interpreted at
all or will be misinterpreted due to the differing types.
If the other operand of the multiplication has a source modifier, just
emit an extra move to resolve the source modifiers.
The negation source modifier problem is difficult to reproduce due to an
algebraic optimization that changes (-a*b) to -(a*b). However, changes
in MR !1359 push the negations back down.
On Gen7+ it might be possible to do slightly better for an abs() source
modifier by using BFI2 as a glorified copysign().
On Gen8+ it might be possible to do slightly better for a neg() source
modifier by emitting (~a ^ b).
There were no shader-db changes on any Intel platform, so I think we can
deal with that problem when it arises.
See also piglit!224.
Fixes: 06d2c11641 ("intel/fs: Add a scale factor to emit_fsign")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3780>
../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316:4: runtime error: null pointer passed as argument 1, which is declared to never be null
#0 0x7f78f5916611 in brw_nir_analyze_ubo_ranges ../src/intel/compiler/brw_nir_analyze_ubo_ranges.c:316
#1 0x7f78f255c189 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:97
#2 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#3 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#4 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#5 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#6 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#7 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#8 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#9 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#10 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#11 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#12 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
../src/intel/compiler/brw_nir.c:979:40: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
#0 0x7f78f590d10b in brw_nir_apply_sampler_key ../src/intel/compiler/brw_nir.c:979
#1 0x7f78f590e07b in brw_nir_apply_key ../src/intel/compiler/brw_nir.c:1057
#2 0x7f78f5754b45 in brw_compile_fs ../src/intel/compiler/brw_fs.cpp:8347
#3 0x7f78f255c8e4 in brw_codegen_wm_prog ../src/mesa/drivers/dri/i965/brw_wm.c:123
#4 0x7f78f2565571 in brw_fs_precompile ../src/mesa/drivers/dri/i965/brw_wm.c:608
#5 0x7f78f24edd2c in brw_shader_precompile ../src/mesa/drivers/dri/i965/brw_link.cpp:56
#6 0x7f78f24f3af8 in brw_link_shader ../src/mesa/drivers/dri/i965/brw_link.cpp:381
#7 0x7f78f39a302a in _mesa_glsl_link_shader ../src/mesa/program/ir_to_mesa.cpp:3119
#8 0x7f78f3a43826 in create_new_program ../src/mesa/main/ff_fragment_shader.cpp:1133
#9 0x7f78f3a43d00 in _mesa_get_fixed_func_fragment_program ../src/mesa/main/ff_fragment_shader.cpp:1163
#10 0x7f78f325ddcd in update_program ../src/mesa/main/state.c:134
#11 0x7f78f325fe64 in _mesa_update_state_locked ../src/mesa/main/state.c:360
#12 0x7f78f32600f1 in _mesa_update_state ../src/mesa/main/state.c:394
#13 0x7f78f2b3e587 in clear ../src/mesa/main/clear.c:169
#14 0x7f78f2b3e587 in _mesa_Clear ../src/mesa/main/clear.c:242
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3825>
The interpretation of the fields is different depending whether the
instruction is a SEND/MATH or not.
This fixes the disassembly output for non-SEND/MATH instructions that
have both in-order and out-of-order dependencies. Their dependencies
were wrongly represented as `@A $B` when the correct would be `@A
$B.dst`.
Fixes: 6154cdf924 ("intel/eu/gen12: Add auxiliary type to represent SWSB information during codegen.")
Fixes: 83612c0127 ("intel/disasm/gen12: Disassemble software scoreboard information.")
Acked-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3660>
At this point this simply involves fixing the initialization of the
sample mask flag register to take the right dispatch mask from the
thread payload, and fixing sample_mask_reg() to return f1.1 for the
second half of a SIMD32 thread. This improves Manhattan 3.1
performance by 2.4%±0.31% (N>40) on my ICL with SIMD32 enabled
relative to falling back to SIMD16 for the shaders that use discard.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In SIMD32 programs that don't use discard, the upper 16 bits of the UD
result of sample_mask_reg() don't contain the sample mask of the upper
16 channels as one would expect. Stop pretending we are returning a
valid 32-bit mask.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
FIND_LIVE_CHANNEL was using f1.0-f1.1 as temporary flag register on
Gen7, instead use f0.0-f0.1. In order to avoid collision with the
discard sample mask, move the latter to f1.0-f1.1. This makes room
for keeping track of the sample mask of the second half of SIMD32
programs that use discard.
Note that some MOVs of the sample mask into f1.0 become redundant now
in lower_surface_logical_send() and lower_a64_logical_send().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>x
Use it instead of hard-coding f0.1 for the sample mask of programs
that use discard. This will make the task easier when we replace f0.1
with another flag register location in order to support discard with
SIMD32 shaders.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It's only really useful there. This will avoid confusion with another
helper with a similar purpose I'm about to introduce that will be
useful in multiple files from the FS back-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The SIMD8 dual-source blending framebuffer write messages seem to have
trouble releasing the pixel scoreboard dependency in SIMD32 dispatch
mode, which leads to hangs. I have a better workaround for this which
doesn't involve disabling SIMD32 when dual-source blending is enabled,
but I'm still investigating some issues with it. Limit the dispatch
width to SIMD16 in such cases for the moment in order to make the CI
happy on ICL with SIMD32 fragment shaders enabled.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Currently the "Source0 Alpha Present to RenderTarget" bit of the RT
write message header is derived from brw_wm_prog_data::replicate_alpha.
However the src0_alpha payload is provided anytime it's specified to
the logical message. This could theoretically lead to an
inconsistency if somebody provided a src0_alpha value while
brw_wm_prog_data::replicate_alpha was false, as I'm planning to do in
a future commit in order to implement a hardware workaround.
Instead calculate the header bit based on whether a src0_alpha value
was provided to the logical message, which guarantees the same
behavior on pre-ICL and ICL+ (the latter used an extended descriptor
bit for this which didn't suffer from the same issue). Remove the
brw_wm_prog_data::replicate_alpha flag.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Together with the fixup_nomask_control_flow() pass introduced in a
previous patch, this implements a less invasive alternative to the
workaround documented in the hardware spec for GEN:BUG:1407528679,
which doesn't involve disabling structured control flow.
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This could break assumptions of the SWSB pass
if the data computed by a NoMask instruction is synchronized against
by using an SWSB annotation baked into a regular execution-masked
instruction, since the first (NoMask) instruction may be executed
redundantly by the hardware, even though the second will correctly be
shot down, potentially leading to a RaW or WaW hazard if a third
instruction subsequently accesses the destination register of the
first instruction.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. Existing code was trying to avoid assuming that
an SBID had been assigned to the virtual instruction, but
synchronizing the header setup with respect to the previous SIMD16
SEND by using SYNC.ALLRD doesn't really seem possible unless the SEND
instruction had been assigned an SBID. Assert-fail instead if no SBID
has been allocated.
Fixes: 15e3a0d9d2 "intel/eu/gen12: Set SWSB annotations in hand-crafted assembly."
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
This is a less invasive alternative to the workaround documented in
the hardware spec for GEN:BUG:1407528679, which doesn't involve
disabling structured control flow (it's unlikely that switching to
GOTO/JOIN would have actually fixed the problem anyway).
Under some conditions Gen12 hardware can end up executing a BB with
all channels disabled, which will lead to the execution of any NoMask
instructions in it, even though any execution-masked instructions will
be correctly shot down. This may break assumptions of some NoMask
SEND messages whose descriptor depends on data generated by live
invocations of the shader.
This avoids the problem by predicating certain instructions on an ANY
horizontal predicate that makes sure that their execution is omitted
when all channels of the program are disabled. The shader-db impact
of this patch seems to be minimal:
total instructions in shared programs: 17169833 -> 17169913 (0.00%)
instructions in affected programs: 30663 -> 30743 (0.26%)
helped: 0
HURT: 42
total cycles in shared programs: 336966176 -> 336968568 (0.00%)
cycles in affected programs: 2367290 -> 2369682 (0.10%)
helped: 0
HURT: 13
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
We need to pass a width of 32 since the opcode bashes the whole f1.0
register on IVB. This is unlikely to have caused problems since f1.0
is largely unused currently. That's likely to change soon though,
even on platforms other than Gen7.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
Found by inspection. This seems particularly likely to cause problems
with instructions dependent on the current execution mask like
SHADER_OPCODE_FIND_LIVE_CHANNEL or the FS_OPCODE_LOAD_LIVE_CHANNELS
instruction I'm about to introduce, but one could imagine it leading
to data corruption if CSE ever managed to combine two instructions
before and after the FS_OPCODE_PLACEHOLDER_HALT, since the one before
may not be executed for some channels.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: 20.0 <mesa-stable@lists.freedesktop.org>
This means you can directly use format utils on it without having to have
your own GL enum to number-of-components switch statement (or whatever) in
your vulkan backend.
Thanks to imirkin for fixing up the nouveau driver (and a couple of core
details).
This fixes the computed qualifiers for EXT_shader_image_load_store's
non-integer sizeNxM qualifiers, which we don't have tests for.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3d)
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>