Commit Graph

94 Commits

Matt Turner
46a3ea06be i965/fs: Print the scheduler mode.
Line wrap some awfully long lines while we are here.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-30 14:35:43 -07:00
Matt Turner
dabb5d4bee i965/fs: Add a shader_stats struct.
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.

v2 (idr): Rebase after 17 months.  Track a visitor instead of a cfg.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-30 14:35:43 -07:00
Caio Marcelo de Oliveira Filho
b390ff3517 intel/fs: Add support for SLM fence in Gen11
Gen11 SLM is not on L3 anymore, so now the hardware has two separate
fences.  Add a way to control which fence types to use.

At this time, we don't have enough information in NIR to control the
visibility of the memory being fenced, so for now be conservative and
assume that fences will need a stall.  With more information later
we'll be able to reduce those stalls.

Fixes Vulkan CTS tests in ICL:

    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_nonlocal.workgroup.guard_local.buffer.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.payload_local.image.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.buffer.guard_nonlocal.workgroup.comp
    dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.workgroup.payload_local.image.guard_nonlocal.workgroup.comp

The whole set of supported tests in dEQP-VK.memory_model.* group
should be passing in ICL now.

v2: Pass BTI around instead of having an enum.  (Jason)
    Emit two SHADER_OPCODE_MEMORY_FENCE instead of one that gets
    transformed into two.  (Jason)
    List tests fixed.  (Lionel)

v3: For clarity, split the decision of which fences to emit from the
    emission code.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-07-11 08:29:32 -07:00
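
The sketch below illustrates the "decide which fences to emit" split
described above.  It is a stand-alone C++ toy, not the Mesa code: the
enum, the helper name, and the exact conditions are assumptions made
for illustration only.

    // Illustrative only: choose which fence types to emit based on what
    // kinds of memory the shader touches.  On Gen11 SLM is no longer
    // backed by L3, so shared-local and global memory need separate
    // fences; on older gens a single L3 fence covers both.
    #include <cstdio>
    #include <vector>

    enum class FenceType { L3, SLM };

    std::vector<FenceType> fences_to_emit(int gen, bool touches_slm,
                                          bool touches_global)
    {
        std::vector<FenceType> fences;
        if (touches_global || gen < 11)
            fences.push_back(FenceType::L3);
        if (gen >= 11 && touches_slm)
            fences.push_back(FenceType::SLM);
        return fences;
    }

    int main()
    {
        for (FenceType f : fences_to_emit(11, /*slm*/ true, /*global*/ true))
            std::printf("emit fence: %s\n", f == FenceType::SLM ? "SLM" : "L3");
    }
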
Jason Ekstrand
fa869f45c8 intel/fs: Use nir_lower_interpolation on gen11+
On gen11, they removed the PLN instruction, so we have to emit a pile of
MADs to emulate it.  We may as well do that in NIR so we can optimize it
and schedule it later.

Shader-db results on Ice Lake:

    total instructions in shared programs: 17145644 -> 16556440 (-3.44%)
    instructions in affected programs: 11507454 -> 10918250 (-5.12%)
    helped: 35763
    HURT: 42085
    helped stats (abs) min: 1 max: 140 x̄: 19.09 x̃: 18
    helped stats (rel) min: 0.04% max: 37.93% x̄: 15.40% x̃: 14.49%
    HURT stats (abs)   min: 1 max: 248 x̄: 2.22 x̃: 2
    HURT stats (rel)   min: 0.05% max: 50.00% x̄: 5.00% x̃: 2.47%
    95% mean confidence interval for instructions value: -7.67 -7.47
    95% mean confidence interval for instructions %-change: -4.46% -4.29%
    Instructions are helped.

    total loops in shared programs: 4370 -> 4370 (0.00%)
    loops in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    total cycles in shared programs: 360624645 -> 368220857 (2.11%)
    cycles in affected programs: 269631244 -> 277227456 (2.82%)
    helped: 15583
    HURT: 65874
    helped stats (abs) min: 1 max: 28561 x̄: 78.45 x̃: 32
    helped stats (rel) min: <.01% max: 67.81% x̄: 5.38% x̃: 2.44%
    HURT stats (abs)   min: 1 max: 238638 x̄: 133.87 x̃: 20
    HURT stats (rel)   min: <.01% max: 306.25% x̄: 5.81% x̃: 3.97%
    95% mean confidence interval for cycles value: 67.42 119.09
    95% mean confidence interval for cycles %-change: 3.61% 3.73%
    Cycles are HURT.

    total spills in shared programs: 8943 -> 8981 (0.42%)
    spills in affected programs: 1925 -> 1963 (1.97%)
    helped: 44
    HURT: 14

    total fills in shared programs: 21815 -> 21925 (0.50%)
    fills in affected programs: 3511 -> 3621 (3.13%)
    helped: 41
    HURT: 18

    LOST:   70
    GAINED: 14

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
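
As a concrete picture of what "a pile of MADs" means here: PLN evaluates
the plane equation p0 + p1*x + p2*y in one instruction, and without it
the same value is built from two fused multiply-adds per channel.  The
snippet below is a plain C++ sketch of that algebra, not the NIR
lowering itself.

    #include <cmath>
    #include <cstdio>

    // What PLN computes directly on gen < 11.
    float pln(float p0, float p1, float p2, float x, float y)
    {
        return p0 + p1 * x + p2 * y;
    }

    // The same result built from two MAD-style fused multiply-adds.
    float pln_via_mad(float p0, float p1, float p2, float x, float y)
    {
        float tmp = std::fma(p1, x, p0);   // MAD #1: p1*x + p0
        return std::fma(p2, y, tmp);       // MAD #2: p2*y + tmp
    }

    int main()
    {
        std::printf("%f %f\n", pln(1, 2, 3, 0.25f, 0.5f),
                    pln_via_mad(1, 2, 3, 0.25f, 0.5f));
    }
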
Sagar Ghuge
83fdec0f0d intel/compiler: Enable the emission of ROR/ROL instructions
v2: 1) Drop changes for vec4 backend as on Gen11+ we don't support
       align16 mode (Matt Turner)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-01 10:14:22 -07:00
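
For reference, the pattern a hardware ROL/ROR replaces is a rotate
spelled out as two shifts and an OR.  The helpers below are a generic
C++ sketch of that identity, not compiler code.

    #include <cstdint>
    #include <cstdio>

    uint32_t rol32(uint32_t x, uint32_t n)
    {
        n &= 31;                                    // keep the count in range
        return n ? (x << n) | (x >> (32 - n)) : x;  // rotate left by n
    }

    uint32_t ror32(uint32_t x, uint32_t n)
    {
        n &= 31;
        return n ? (x >> n) | (x << (32 - n)) : x;  // rotate right by n
    }

    int main()
    {
        std::printf("0x%08x 0x%08x\n",
                    rol32(0x80000001u, 4), ror32(0x80000001u, 4));
    }
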
Lionel Landwerlin
836225840c intel/compiler: fix derivative on y axis implementation
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on, and also sets the channel group that each of
the four 4-wide operations is supposed to execute on.

Fixes the following CTS tests :

   dEQP-VK.glsl.derivate.dfdyfine.dynamic_*

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
2019-06-27 18:14:58 +00:00
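
A stand-alone sketch of the per-quad structure being described: with the
usual 2x2 quad layout of four adjacent channels (top-left, top-right,
bottom-left, bottom-right), a fine y-derivative is the bottom value minus
the top value of each column, computed quad by quad.  The layout and the
8-channel group size are assumptions for illustration; this is not the
backend code.

    #include <array>
    #include <cstdio>

    std::array<float, 8> ddy_fine(const std::array<float, 8> &v)
    {
        std::array<float, 8> out{};
        for (int q = 0; q < 8; q += 4) {       // one iteration per 2x2 quad
            float dy0 = v[q + 2] - v[q + 0];   // left column: bottom - top
            float dy1 = v[q + 3] - v[q + 1];   // right column: bottom - top
            out[q + 0] = out[q + 2] = dy0;
            out[q + 1] = out[q + 3] = dy1;
        }
        return out;
    }

    int main()
    {
        for (float d : ddy_fine({0, 1, 4, 9, 2, 3, 8, 15}))
            std::printf("%g ", d);
        std::printf("\n");
    }
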
Jason Ekstrand
f4ef34f207 intel/fs: Add an UNDEF instruction to avoid excess live ranges
With 8 and 16-bit types and anything where we have to use non-trivially
strided registers to deal with restrictions, we end up with things that
look like partial writes even though we don't care about any values in
the register except those written by that instruction.  This is
particularly important when dealing with loops, because liveness sees
is_partial_write, knows that an old version from a previous loop
iteration may still be valid at that point, and so extends all purely
partially written values to the entire loop.

This commit adds a new UNDEF instruction which does nothing (the
generator doesn't emit anything) but which does a fake write to the
register.  This informs liveness that we don't care about any values
before that point so it won't consider those registers to be falsely
live.  We can safely emit UNDEF instructions for all SSA values that
come in from NIR and nearly all temporaries generated by various stages
of the compiler.  In particular, we need to insert UNDEF instructions
when we handle region restrictions because the newly allocated registers
are almost guaranteed to be partially written.

No shader-db changes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110432
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-06-04 14:27:30 -05:00
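
The toy model below shows the liveness rule the commit is working
around: a register whose first write in a block is only partial keeps
its previous contents (for example, a value from an earlier loop
iteration) alive, whereas a preceding complete definition, which is all
UNDEF is, ends that dependency.  The structs and the check are invented
for illustration.

    #include <cstdio>
    #include <vector>

    struct ToyInst {
        bool defines_reg;   // does this instruction write the register?
        bool full_write;    // does it overwrite every byte of it?
    };

    // True if whatever was in the register before this block must be kept
    // alive, i.e. the first write we see is only a partial one.
    bool prior_contents_live(const std::vector<ToyInst> &insts)
    {
        for (const ToyInst &inst : insts)
            if (inst.defines_reg)
                return !inst.full_write;
        return false;       // the register is never written here
    }

    int main()
    {
        std::vector<ToyInst> without_undef = {{true, false}, {true, false}};
        std::vector<ToyInst> with_undef    = {{true, true},   // the UNDEF
                                              {true, false}, {true, false}};
        std::printf("%d %d\n", prior_contents_live(without_undef),
                    prior_contents_live(with_undef));   // prints "1 0"
    }
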
Jason Ekstrand
9e403dc56e intel/fs: Do a stalling MFENCE in endInvocationInterlock()
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Jason Ekstrand
859de4a748 intel/fs,vec4: Use g0 as the header for MFENCE
We set header_present but then pass it some random garbage.  Give it g0
instead.  I'm not actually sure this does anything, but g0 is the usual
header data and this is what the Windows driver does, so it seems like a
good idea.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-05-30 14:00:26 +00:00
Matt Turner
e8c74a1e16 intel/compiler: Unset flag reg when FB write is not predicated
In the FS IR we pretend that the instruction is predicated with (+f0.1)
just for flag dependency tracking purposes.  Since the instruction
doesn't support predication before Haswell, we unset the predicate; we
should also unset the flag register so that we can round-trip the
disassembly.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-07 14:33:48 -07:00
Rafael Antognolli
70e03e220c intel/fs: Remove fs_generator::generate_linterp from gen11+.
We now have a lowering pass that will do this at the fs_visitor level,
so we can remove this code from gen11+.

v2: Reduce size of the "i" array from 4 to 2 (Matt).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Rafael Antognolli
c0504569ea intel/fs: Move the scalar-region conversion to the generator.
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.

v2: Better commit message (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-22 16:54:00 -07:00
Iago Toral Quiroga
aaae24179f intel/compiler: fix ddy for half-float in Broadwell
Broadwell has restrictions on Align16 half-float that make the Align16
implementation of this invalid on that platform.  Use the gen11 path
instead, which uses Align1 mode.

The restriction is not present on Cherryview, gen9, or gen10, where the
Align16 implementation seems to work just fine.

v2:
 - Rework the comment in the code, move the PRM citation from the
   commit message to the comment in the code (Matt)
 - Cherryview isn't affected, only Broadwell (Matt)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
Iago Toral Quiroga
60c7c6d3ba intel/compiler: fix ddx and ddy for 16-bit float
We were assuming 32-bit elements.  Also, in SIMD8 we pack 2 vector
components into a single SIMD register, so, for example, component Y of
a 16-bit vec2 starts at byte offset 16B.  This means that when we compute
the offset of the elements to be differentiated, we should not stomp
whatever base offset we already have, but instead add to it.

v2:
 - Use byte_offset() helper (Jason)
 - Merge the fix for SIMD8: using byte_offset() fixes that too.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-04-18 11:05:18 +02:00
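
A small sketch of the offset arithmetic in question: for a 16-bit vec2
in SIMD8, component Y lives 8 lanes * 2 bytes = 16B past component X,
and that per-component delta has to be added to whatever base byte
offset the source already carries rather than replacing it.  The struct
and helper below only mimic the shape of a byte_offset()-style utility.

    #include <cstdio>

    struct ToyReg { int base_byte_offset; };

    // Advances the register's byte offset; never stomps the existing base.
    ToyReg byte_offset(ToyReg reg, int delta)
    {
        reg.base_byte_offset += delta;
        return reg;
    }

    int main()
    {
        const int simd_width = 8, type_size = 2;   // SIMD8, 16-bit elements
        ToyReg src{32};                            // some pre-existing base offset
        for (int comp = 0; comp < 2; comp++) {
            ToyReg c = byte_offset(src, comp * simd_width * type_size);
            std::printf("component %d at byte %d\n", comp, c.base_byte_offset);
        }
    }
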
Plamena Manolova
19ab082001 i965: Disable ARB_fragment_shader_interlock for platforms prior to GEN9
ARB_fragment_shader_interlock depends on memory fences to
ensure fragment ordering and this ordering guarantee is
only supported from GEN9 onwards.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109980
Fixes: 939312702e "i965: Add ARB_fragment_shader_interlock support."
Signed-off-by: Plamena Manolova <plamena.n.manolova@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-14 13:04:12 +00:00
Jason Ekstrand
c4fb6b0c81 intel/eu: Add an EOT parameter to send_indirect_[split]_message
For split indirect sends we have to put the EOT parameter in the
extended descriptor as well as the instruction itself so just calling
brw_inst_set_eot is insufficient.  Moving the EOT handling into the
send_indirect_[split]_message helper lets us handle it properly.
2019-02-25 11:35:12 -06:00
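
The point being made is that for a split send the end-of-thread
indication lives in two places, so one helper should set both.  The
sketch below uses made-up bit positions purely to show the "keep both
fields in sync" shape; it is not the real instruction encoding.

    #include <cstdint>
    #include <cstdio>

    struct ToySend {
        uint64_t inst_bits = 0;   // stands in for the instruction word
        uint32_t ex_desc   = 0;   // stands in for the extended descriptor
    };

    void set_eot(ToySend &send)
    {
        send.inst_bits |= 1ull << 34;   // hypothetical EOT bit in the instruction
        send.ex_desc   |= 1u << 5;      // hypothetical EOT bit in the ex. descriptor
    }

    int main()
    {
        ToySend s;
        set_eot(s);   // a single call keeps both encodings consistent
        std::printf("%llx %x\n", (unsigned long long)s.inst_bits, s.ex_desc);
    }
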
Francisco Jerez
e03be78252 intel/fs: Implement extended strides greater than 4 for IR source regions.
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region.  The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one of the 32-bit sources, which can blow up if
the stride of the original source was already the maximum value
allowed by the hardware.

An alternative would be to use the regioning legalization pass in
order to lower such strides into the composition of multiple legal
strides, but that would be somewhat less efficient.

This showed up as a regression from my commit cbea91eb57
in Vulkan 1.1 CTS tests on CHV/BXT platforms; however, it was really a
pre-existing problem that had affected conformance on other platforms
without native support for integer multiplication.  CHV/BXT were
getting around it because the code I removed in that commit had the
"fortunate" side effect of emitting narrower regions that didn't hit
the hardware stride limit after lowering.  Beyond fixing the
regression this fixes ~90 additional Vulkan 1.1 subgroup CTS tests on
ICL (that's why this patch is marked for inclusion in mesa-stable even
though the original regressing patch was not).

According to Jason, a nearly equivalent change had been committed
previously as e8c9e65185 and then (mistakenly?) reverted as
a31d038208.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
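
The regioning trick can be pictured like this: the horizontal-stride
field tops out at 4, but a larger uniform element stride can still be
expressed as an Align1 <VertStride; Width, HorzStride> region by using
one-element rows spaced by the vertical stride, i.e. <stride; 1, 0>.
The sketch below simplifies the units and the exact legality rules; it
only shows the idea.

    #include <cassert>
    #include <cstdio>

    struct ToyRegion { int vstride, width, hstride; };

    ToyRegion region_for_stride(int stride)
    {
        assert(stride >= 0 && stride <= 32);
        if (stride == 0)
            return {0, 1, 0};                // scalar: replicate one element
        if (stride <= 4)
            return {8 * stride, 8, stride};  // fits in the horizontal stride
        return {stride, 1, 0};               // wide stride: one element per row
    }

    int main()
    {
        ToyRegion r = region_for_stride(8);
        std::printf("<%d;%d,%d>\n", r.vstride, r.width, r.hstride);
    }
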
Jason Ekstrand
5064464931 intel/fs: Silence a compiler warning
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-14 16:04:47 -06:00
Jason Ekstrand
eab1c55590 intel/fs: Support SENDS in SHADER_OPCODE_SEND
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
b284d222db intel/fs: Use SHADER_OPCODE_SEND for varying UBO pulls on gen7+
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
8514eba693 intel/fs: Use SHADER_OPCODE_SEND for texturing on gen7+
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
d2d3e04501 intel/fs: Use SHADER_OPCODE_SEND for surface messages
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Jason Ekstrand
7f1cf046cd intel/fs: Add a generic SEND opcode
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Kenneth Graunke
04c2f12ab2 i965: Drop mark_surface_used mechanism.
The original idea was that the backend compiler could eliminate
surfaces, so we would have it mark which ones are actually used,
then shrink the binding table accordingly.  Unfortunately, it's a
pretty blunt mechanism - it can only prune things from the end,
not the middle - since we decide the layout before we even start
the backend compiler, and only limit the size.  It also basically
gives up if it sees indirect array access.

Besides, we do the vast majority of our surface elimination in NIR
anyway, not the backend - and I don't see that trend changing any
time soon.  Vulkan abandoned this plan a long time ago, and I don't
use it in Iris, but it's still been kicking around in i965.

I hacked shader-db to print the binding table size in bytes, and
observed no changes with this patch.  So, this code appears to do
nothing useful.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-13 09:35:32 -08:00
Matt Turner
8534742404 intel/compiler: Split 64-bit MOV-indirects if needed
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:40 -08:00
Francisco Jerez
230a8a541d intel/fs: Remove FS_OPCODE_UNPACK_HALF_2x16_SPLIT opcodes.
These are broken on a future platform, but it turns out we don't need
to fix them, since they're just type-converting moves with a strided
source.  Kill them.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Francisco Jerez
812ede088f intel/fs: Implement quad swizzles on ICL+.
Align16 is no longer a thing, so a new implementation is provided
using Align1 instead.  Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.

Fixes ~90 subgroup quad swap Vulkan CTS tests.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:08 -08:00
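
For orientation, a quad swizzle permutes each group of four adjacent
channels by the same 4-entry pattern; for example {1, 0, 3, 2} swaps
horizontal neighbours within every quad.  The fast paths mentioned above
are the patterns that happen to map onto a single Align1 region.  The
snippet below only shows the lane mapping itself, with generic code.

    #include <array>
    #include <cstdio>

    // Which source lane a given destination lane reads under a quad swizzle.
    int quad_swizzle_lane(int lane, const std::array<int, 4> &swz)
    {
        return (lane & ~3) | swz[lane & 3];   // permute within each quad of 4
    }

    int main()
    {
        const std::array<int, 4> swap_horizontal = {1, 0, 3, 2};
        for (int lane = 0; lane < 8; lane++)
            std::printf("lane %d reads lane %d\n",
                        lane, quad_swizzle_lane(lane, swap_horizontal));
    }
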
Sagar Ghuge
0a7664fe8c intel/compiler: Change src1 reg type to unsigned doubleword
To have uniform behavior while disassembling the send(c) instruction,
use a register type of unsigned doubleword for src1 when the message
descriptor is an immediate value.  The Bspec does not specify a default
type for an immediate src1.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2018-10-23 12:44:24 -07:00
Jason Ekstrand
3cbc02e469 intel: Use TXS for image_size when we have a typed surface
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:03 -05:00
Ian Romanick
d515c75463 intel/compiler: Implement untyped atomic float min, max, and compare-swap dataport messages
v2: Split changes to the message type field to another patch.  Suggested
by Caio.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-22 20:31:32 -07:00
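
For readers unfamiliar with the semantics, an untyped atomic float min
behaves like the portable compare-exchange loop below, which is the
usual way to express it without a dedicated operation; the dataport
message performs an equivalent atomic update in memory.  This is plain
C++ on a std::atomic, not driver code.

    #include <atomic>
    #include <cstdio>

    void atomic_fmin(std::atomic<float> &mem, float value)
    {
        float old = mem.load();
        // Retry while our value is still smaller and the swap keeps failing;
        // compare_exchange_weak refreshes 'old' on failure.
        while (value < old && !mem.compare_exchange_weak(old, value)) {
        }
    }

    int main()
    {
        std::atomic<float> m{3.0f};
        atomic_fmin(m, 1.5f);
        std::printf("%g\n", m.load());   // prints 1.5
    }
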
Francisco Jerez
48d6fc5eb6 intel/fs: Initialize mlen for gen7 varying pull constant load messages.
This makes the message length available at the IR level, which should
save some guesswork in a future commit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:58 -07:00
Francisco Jerez
2bac890bf5 intel/eu: Use descriptor constructors for dataport read messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Francisco Jerez
27c211e30f intel/eu: Use descriptor constructors for sampler messages.
v2: Use SET_BITS macro instead of left shift (Ken).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
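
A sketch of the descriptor-constructor shape referred to in the last two
commits: a small SET_BITS-style helper places a value into a named bit
range, and a constructor function assembles the whole message descriptor
from named fields instead of open-coded shifts at each call site.  The
helper definition and every field position below are invented for the
example, not the real encodings.

    #include <cstdint>
    #include <cstdio>

    // Place 'value' into bits [high:low] of a 32-bit word (illustrative).
    constexpr uint32_t set_bits(uint32_t value, unsigned high, unsigned low)
    {
        return (value << low) & (((1ull << (high - low + 1)) - 1) << low);
    }

    // A made-up "sampler message descriptor" built from named fields.
    constexpr uint32_t toy_sampler_desc(unsigned binding_table_index,
                                        unsigned sampler, unsigned msg_type)
    {
        return set_bits(binding_table_index, 7, 0) |
               set_bits(sampler, 11, 8) |
               set_bits(msg_type, 16, 12);
    }

    int main()
    {
        std::printf("0x%08x\n", toy_sampler_desc(3, 1, 2));
    }
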
Francisco Jerez
1c90ae5acc intel/eu: Provide desc immediate argument up front to brw_send_indirect_message().
The current approach of returning a setup instruction where additional
descriptor fields can be specified is still supported in order to keep
things working, but it will be removed later in this series.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-07-09 23:46:57 -07:00
Jason Ekstrand
e208bc3bb7 intel/fs: Get rid of MOV_DISPATCH_TO_FLAGS
We can just emit the MOV in the two places where we use this.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
2d7d652d5c intel/fs: Disable opt_sampler_eot() in 32-wide dispatch.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
db6ca13efc intel/fs: Emit LINE+MAC for LINTERP with unaligned coordinates
On g4x through Sandy Bridge, src1 (the coordinates) of the PLN
instruction is required to be an even register number.  When it's odd
(which can happen with SIMD32), we have to emit a LINE+MAC combination
instead.  Unfortunately, we can't just fall through to the gen4 case
because the input registers are still set up for PLN, which lays out the
four src1 registers differently in SIMD16 than LINE does.

v2 (Jason Ekstrand):
 - Take advantage of both accumulators and emit LINE LINE MAC MAC
   (Based on a patch from Francisco Jerez)
 - Unify the gen4 and gen4x-6 cases using a loop

v3 (Jason Ekstrand):
 - Don't unify gen4 with gen4x-6 as this turns out to be more fragile
   than first thought without reworking the gen4 barycentric coordinate
   layout.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
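
The LINE+MAC fallback can be read as a two-step evaluation of the same
plane equation PLN handles in one go: LINE computes p0 + p1*x (leaving
the result in the accumulator) and MAC then adds p2*y.  The functions
below just stand in for those instructions to show the algebra; they are
not generator code.

    #include <cstdio>

    struct Acc { float value; };   // stands in for the accumulator register

    Acc   line(float p0, float p1, float x) { return {p0 + p1 * x}; }
    float mac(Acc acc, float p2, float y)   { return acc.value + p2 * y; }

    int main()
    {
        float p0 = 0.5f, p1 = 2.0f, p2 = -1.0f, x = 0.25f, y = 0.75f;
        float via_pln      = p0 + p1 * x + p2 * y;     // what PLN computes
        float via_line_mac = mac(line(p0, p1, x), p2, y);
        std::printf("%g %g\n", via_pln, via_line_mac);
    }
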
Francisco Jerez
73d60455e9 intel/fs: Rework INTERPOLATE_AT_PER_SLOT_OFFSET
This reworks INTERPOLATE_AT_PER_SLOT_OFFSET to work more like an ALU
operation and less like a send.  This is less code overall and, as a
side-effect, it now properly handles execution groups and lowering so
SIMD32 support just falls out.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
74b477039d intel/fs: Add the group to the flag subreg number on SNB and older
We want consistent behavior in the meaning of the flag_subreg field
between SNB and IVB+.

v2 (Jason Ekstrand):
 - Add some extra commentary

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
ce370902d4 intel/fs: Fix FB write message control codegen for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
48241c780a intel/fs: Fix codegen of FS_OPCODE_SET_SAMPLE_ID for SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Francisco Jerez
5b6e91dd35 intel/fs: Remove program key argument from generator.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
a14fb0184a intel/fs: Set up FB write message headers in the visitor
Doing instruction header setup in the generator is awful for a number
of reasons.  For one, we can't schedule the header setup at all.  For
another, it means lots of implied writes which the instruction scheduler
and other passes can't properly reason about.  The second isn't a huge
problem for FB writes since they always happen at the end.  We made a
similar change to sampler handling in ff4726077d.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
c0a1c248b8 intel/fs: Pull FB write implied headers from src[0]
Now that we have the implied header in src[0] for tracking purposes, we
may as well use it in the generator.  This makes things a tiny bit more
general.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-06-28 13:19:38 -07:00
Jason Ekstrand
381fac2740 intel/eu: Add some brw_get_default_ helpers
This is much cleaner than everything that wants a default value poking
at the bits of p->current directly.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-06-04 14:03:03 -07:00
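
The shape of the cleanup, sketched with toy types (the real brw_codegen
state layout is not reproduced here): small getter helpers expose the
current defaults so callers stop reaching into the emitter's state
object directly.

    #include <cstdio>

    struct ToyInstState { unsigned exec_size; unsigned mask_control; };
    struct ToyCodegen   { ToyInstState current; };

    unsigned get_default_exec_size(const ToyCodegen *p)
    {
        return p->current.exec_size;
    }

    unsigned get_default_mask_control(const ToyCodegen *p)
    {
        return p->current.mask_control;
    }

    int main()
    {
        ToyCodegen p{{16, 0}};
        // Callers ask for the defaults instead of poking p.current themselves.
        std::printf("exec_size=%u mask=%u\n",
                    get_default_exec_size(&p), get_default_mask_control(&p));
    }
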
Plamena Manolova
939312702e i965: Add ARB_fragment_shader_interlock support.
Adds support for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2018-06-01 16:36:39 +01:00
Francisco Jerez
4bd2047dee intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT.  When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
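
A toy planner for the rule spelled out above: every message that targets
the last render target gets the "last RT" flag, while only the shader's
final write gets EOT.  The data structures and helper are illustrative
only.

    #include <cstdio>
    #include <vector>

    struct ToyFbWrite { int target; bool last_rt; bool eot; };

    std::vector<ToyFbWrite> plan_fb_writes(int num_rts, int writes_per_rt)
    {
        std::vector<ToyFbWrite> writes;
        for (int rt = 0; rt < num_rts; rt++)
            for (int i = 0; i < writes_per_rt; i++)
                writes.push_back({rt, rt == num_rts - 1, false});
        writes.back().eot = true;   // only the very last write ends the thread
        return writes;
    }

    int main()
    {
        // e.g. SIMD32 with two RTs: two write messages per render target.
        for (const ToyFbWrite &w : plan_fb_writes(2, 2))
            std::printf("rt=%d last_rt=%d eot=%d\n", w.target, w.last_rt, w.eot);
    }
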
Francisco Jerez
d3cd6b7215 intel/fs: Replace the CINTERP opcode with a simple MOV
The only reason it was its own opcode was so that we could detect it
and adjust the source register based on the payload setup.  Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.

v2 (Jason Ekstrand):
 - Break the bit which removes the CINTERP opcode into its own patch

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Jason Ekstrand
71a86d1fc6 intel/fs: Use groups for SIMD16 LINTERP on gen11+
This is better than compression control because it naturally extends to
SIMD32.

v2:
 - Push/pop instruction state around adjusted codegen (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Jason Ekstrand
a1a850cd34 intel/fs: Assert that the gen4-6 plane restrictions are followed
The fallback does not work correctly in SIMD16 mode, and the register
allocator should ensure that we never hit this case anyway.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00