Commit Graph

29 Commits

Author SHA1 Message Date
Francisco Jerez
37e280f28a intel/fs: Lower unsupported regioning with non-trivial 2D regions on FIXED_GRFs.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25020>
2023-09-20 17:19:36 -07:00
Jordan Justen
fcb72ffd0c intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops
For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.

For gfx12.5+ platforms that support int64, the register regions
used by cluster broadcast aren't supported by the 64-bit pipeline.

On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* were failing to validate the
compiled shader in debug mode, and reportedly gpu-hanging in release
mode.

With this change dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.

Rework:
 * Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
   Francisco)
 * Apply to verx10 >= 125.. (Suggested by Francisco)

Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
2023-04-20 11:41:10 -07:00
Paulo Zanoni
a099d6ae4d intel: add devinfo->has_64bit_float_via_math_pipe
Unusual hardware features that require special hanlding usually get a
devinfo field, so do this for MTL's unordered DF types. This will
guarantee that any platform based on MTL (thus inheriting from
MTL_FEATURES) will automatically be handled in these special cases.

v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
df50add27e intel/compiler: avoid 64bit SEL_EXEC on MTL
On MTL, instructions with DF type are unordered, executed in the math
pipe. This means that they require different SWSB dependency handling,
and also that in some cases such as MOVs it's generally faster to
simply use 2 smaller ordered moves than a single unordered MOV.

One problem we have with the current code is that generate_code() is
not setting the proper SWSB dependencies for the generated DF MOVs,
causing some tests to fail.

One solution would be to fix generate_code() by making it set the
appropriate dependencies. This was the first patch I wrote. Another
solution to this problem, pointed to us by Curro, is to change
required_exec_type() so we use UD instructions instead of DF, just
like we do with platforms that don't have 64 bit instructions, which
means there won't be anything to fix in generate_code(). The second
solution is what this patch implements.

This fixes at least:
 - dEQP-VK.subgroups.arithmetic.framebuffer.subgroupmin_double_vertex

Thanks to Francisco Jerez for all the major help provided with this
problem.

Credits-to: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Francisco Jerez
051887fbf3 intel/fs: Make the result of is_unordered() dependent on devinfo.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
José Roberto de Souza
daf0b67bc2 intel/compiler/fs: Fix compilation of shaders with SHADER_OPCODE_SHUFFLE of float64 type
During the lower_regioning() optimization, required_exec_type() is
returning BRW_REGISTER_TYPE_UQ type when processing
SHADER_OPCODE_SHUFFLE instructions of type BRW_REGISTER_TYPE_DF but
MTL has float64 support but lacks int64 support causing shader
compilation to fail.

To fix that we could make required_exec_type() return
BRW_REGISTER_TYPE_DF in such case but SHADER_OPCODE_SHUFFLE virtual
instruction runs in the integer pipeline(inferred_exec_pipe()).

So here replacing the has_64bit check by has_64bit_int, this will
properly handle older and newer cases making this function return
BRW_REGISTER_TYPE_UD.
Then lower_exec_type() will take care to generate 2 32bits operations
to accomplish the same.

While at it also dropping the 'devinfo->verx10 == 70' check as
GFX7_FEATURES fall into the same category as MTL, has float64 but no
int64 support.

Fixes at least this crucible tests:
func.uniform-subgroup.exclusive.fadd64.q0
func.uniform-subgroup.exclusive.fmin64.q0
func.uniform-subgroup.exclusive.fmax64.q0

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
2022-09-14 19:32:43 +00:00
Iván Briano
81f97905c3 intel/compiler: make CLUSTER_BROADCAST always deal with integers
This way we don't run afoul of regioning restrictions around floating
point types.

Cc: 22.0 <mesa-stable>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15039>
2022-02-16 21:36:42 +00:00
Francisco Jerez
79fb7f9de8 intel/fs: Perform 64-bit CLUSTER_BROADCAST lowering in the lower_regioning pass.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
6c8782c135 intel/fs: Perform 64-bit SEL_EXEC lowering in the lower_regioning pass.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
9449b71bdd intel/fs: Perform 64-bit SHUFFLE lowering in the lower_regioning pass.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
44e48751d2 intel/fs: Teach the lower_regioning pass how to split instructions of unsuported exec type.
This adds some generic infrastructure that allows splitting any
instruction into a number of instructions of a smaller legal execution
type.  This is meant to replace several instances of handcrafted 64bit
type lowering done manually in the code generator, which is rather
error-prone, prevents scheduling of the lowered instructions, and
makes them invisible to the SWSB pass on Gfx12+ platforms, which will
become especially problematic on Gfx12.5+ since the EUs introduce
multiple asynchronous execution pipelines which the SWSB pass needs to
be able to synchronize to one another, so it's critical for the real
execution type of the instruction to be visible to the SWSB pass.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Francisco Jerez
539c879a6b intel/fs: Move legal exec type calculation into helper function in lower_regioning pass.
Right now the execution type lowering functionality of this pass
assumes that an integer type of the original bit size is always
acceptable, however we'll want more complex behavior than that in
order to leverage this pass to automate the lowering of unsupported
64-bit operations into multiple 32-bit operations.

In order to do that calculate the closest legal execution type from a
new helper function, and take advantage of that function from the
has_invalid_exec_type() helper, along the lines of other
lower_regioning() helpers structured as a pair of has_invalid_foo() +
required_foo() functions.

This shouldn't have any functional changes.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14273>
2022-01-25 22:40:44 +00:00
Lionel Landwerlin
361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Ian Romanick
38807ceeae intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction.  This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation.  As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.

No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK.  That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below.  I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation.  This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(

I discovered this bug while working on a couple new optimization
passes.  One of the passes attempts to remove condition modifiers that
are never used.  The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case.  Suggested by Jason.

Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:20 -07:00
Lionel Landwerlin
474eaa25ad intel/fs: make sure shuffle is lowered to supported types
On XeHP there are restrictions on types of source and destinations
with float types. As shuffle is implemented using MOV we need to make
sure we lower it to supported types.

This fixes tests like :

    dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex
    dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>
2021-05-22 21:55:33 +00:00
Marcin Ślusarz
3340d5ee02 intel: simplify is_haswell checks, part 1
Generated with:

files=`git grep is_haswell | cut -d: -f1 | sort | uniq`
for file in $files; do
        cat $file | \
                sed "s/devinfo->ver <= 7 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/devinfo->ver >= 8 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->is_haswell || devinfo->ver >= 8/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo.is_haswell || devinfo.ver >= 8/devinfo.verx10 >= 75/g" | \
                sed "s/devinfo->ver > 7 || devinfo->is_haswell/devinfo->verx10 >= 75/g" | \
                sed "s/devinfo->ver == 7 && !devinfo->is_haswell/devinfo->verx10 == 70/g" | \
                sed "s/devinfo.ver == 7 && !devinfo.is_haswell/devinfo.verx10 == 70/g" | \
                sed "s/devinfo->ver < 8 && !devinfo->is_haswell/devinfo->verx10 <= 70/g" | \
                sed "s/device->info.ver == 7 && !device->info.is_haswell/device->info.verx10 == 70/g" \
                > tmpXXX
        mv tmpXXX $file
done

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10810>
2021-05-17 09:46:45 +00:00
Anuj Phogat
4c535cbf99 intel: Fix alignment and line wrapping due to gen_device renaming
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Anuj Phogat
61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Francisco Jerez
f3e5cd813a intel/fs: Handle regioning restrictions of split FP/DP pipelines.
The floating-point and double-precision FPU pipelines of XeHP
platforms don't support arbitrary regioning modes, corresponding
channels of sources and destination are required to be aligned to the
same sub-register offset, similar to the restriction FP64 instructions
had on CHV/BXT platforms.

Most violations of this restriction can be fixed easily by teaching
has_dst_aligned_region_restriction() about the change so the regioning
lowering pass gets rid of any unsupported regioning.  For cases where
this is not sufficient (e.g. because a virtual instruction internally
uses some regioning mode not supported by the floating-point pipeline)
the regioning lowering pass is extended with an additional
lower_exec_type() codepath that bit-casts sources and destination to
an integer type whenever the execution type is not supported by the
instruction.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Mark Janes
ee06e47c5b intel/fs: Assert if lower_source_modifiers converts 32x16 to 32x32 multiplication
Lowering source modifiers will convert a 16bit source to a 32bit
value.  In the case of integer multiplication, this will reverse
previous lowering performed by lower_mul_dword_inst.

Assert to prevent an illegal DxD operation (and GPU hang).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2020-08-10 13:29:56 -07:00
Francisco Jerez
ab6d792986 intel/compiler: Pass detailed dependency classes to invalidate_analysis()
Have fun reading through the whole back-end optimizer to verify
whether I've missed any dependency flags -- Or alternatively, just
trust that any mistake here will trigger an assertion failure during
analysis pass validation if it ever poses a problem for the
consistency of any of the analysis passes managed by the framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:39 -08:00
Francisco Jerez
d966a6b4c4 intel/compiler: Introduce backend_shader method to propagate IR changes to analysis passes
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR.  For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.

This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:32 -08:00
Jason Ekstrand
f4ef34f207 intel/fs: Add an UNDEF instruction to avoid excess live ranges
With 8 and 16-bit types and anything where we have to use non-trivial
strides registersto deal with restrictions, we end up with things that
look like partial writes even though we don't care about any values in
the register except those written by that instruction.  This is
particularly important when dealing with loops because liveness sees
is_partial_write and the fact that an old version from a previous loop
iteration may be valid at that point and extends all purely partially
written values to the entire loop.

This commit adds a new UNDEF instruction which does nothing (the
generator doesn't emit anything) but which does a fake write to the
register.  This informs liveness that we don't care about any values
before that point so it won't consider those registers to be falsely
live.  We can safely emit UNDEF instructions for all SSA values that
come in from NIR and nearly all temporaries generated by various stages
of the compiler.  In particular, we need to insert UNDEF instructions
when we handle region restrictions because the newly allocated registers
are almost guaranteed to be partially written.

No shader-db changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-04 14:27:30 -05:00
Iago Toral Quiroga
0986199b31 intel/compiler: workaround for SIMD8 half-float MAD in gen8
Empirical testing shows that gen8 has a bug where MAD instructions with
a half-float source starting at a non-zero offset fail to execute
properly.

This scenario usually happened in SIMD8 executions, where we used to
pack vector components Y and W in the second half of SIMD registers
(therefore, with a 16B offset). It looks like we are not currently doing
this any more but this would handle the situation properly if we ever
happen to produce code like this again.

v2 (Jason):
 - Move this workaround to the lower_regioning pass as an additional case
   to has_invalid_src_region()
 - Do not apply the workaround if the stride of the source operand is 0,
   testing suggests the problem doesn't exist in that case.

v3 (Jason):
 - We want offset % REG_SIZE > 0, not just offset > 0
 - Use a helper to compute the offset

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
2019-04-18 11:05:18 +02:00
Francisco Jerez
7f9f6263c1 intel/fs: Cap dst-aligned region stride to maximum representable hstride value.
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four are still illegal for the destination.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
c3c27762f7 intel/fs: Exclude control sources from execution type and region alignment calculations.
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Jason Ekstrand
eb32dad07c intel/fs: Don't touch accumulator destination while applying regioning alignment rule
In some shaders, you can end up with a stride in the source of a
SHADER_OPCODE_MULH.  One way this can happen is if the MULH is acting on
the top bits of a 64-bit value due to 64-bit integer lowering.  In this
case, the compiler will produce something like this:

mul(8)    acc0<1>UD   g5<8,4,2>UD   0x0004UW      { align1 1Q };
mach(8)   g6<1>UD     g5<8,4,2>UD   0x00000004UD  { align1 1Q AccWrEnable };

The new region fixup pass looks at the MUL and sees a strided source and
unstrided destination and determines that the sequence is illegal.  It
then attempts to fix the illegal stride by replacing the destination of
the MUL with a temporary and emitting a MOV into the accumulator:

mul(8)    g9<2>UD     g5<8,4,2>UD   0x0004UW      { align1 1Q };
mov(8)    acc0<1>UD   g9<8,4,2>UD                 { align1 1Q };
mach(8)   g6<1>UD     g5<8,4,2>UD   0x00000004UD  { align1 1Q AccWrEnable };

Unfortunately, this new sequence isn't correct because MOV accesses the
accumulator with a different precision to MUL and, instead of filling
the bottom 32 bits with the source and zeroing the top 32 bits, it
leaves the top 32 (or maybe 31) bits alone and full of garbage.  When
the MACH comes along and tries to complete the multiplication, the
result is correct in the bottom 32 bits (which we throw away) and
garbage in the top 32 bits which are actually returned by MACH.

This commit does two things:  First, it adds an assert to ensure that we
don't try to rewrite accumulator destinations of MUL instructions so we
can avoid this precision issue.  Second, it modifies
required_dst_byte_stride to require a tightly packed stride so that we
fix up the sources instead and the actual code which gets emitted is
this:

mov(8)    g9<1>UD     g5<8,4,2>UD                 { align1 1Q };
mul(8)    acc0<1>UD   g9<8,8,1>UD   0x0004UW      { align1 1Q };
mach(8)   g6<1>UD     g5<8,4,2>UD   0x00000004UD  { align1 1Q AccWrEnable };

Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass"
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-01-18 10:18:52 -06:00
Francisco Jerez
efa4e4bc5f intel/fs: Introduce regioning lowering pass.
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.

v2: Give conditional modifiers the same treatment as predicates for
    SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
    couple of other instructions with inconsistent conditional mod
    semantics in lower_dst_modifiers() (Curro).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00