Commit Graph

41 Commits

Author SHA1 Message Date
Ian Romanick
78ee74de4a intel/compiler: Micro optimize regions_overlap
On my Ice Lake laptop (using a locked CPU speed and other measures to
prevent thermal throttling, etc.) using a release build, improves
performance of compiling shaders from batman_arkham_city_goty.foz by
-1.09% ± 0.084% (n = 5, pooled s = 0.354471)

Reduces the size of a release build by 26k.

   text	   data	    bss	    dec	    hex	filename
23163641 400720	 231360	23795721	16b1809	before/lib64/dri/iris_dri.so
23137264 400720	 231360	23769344	16ab100	after/lib64/dri/iris_dri.so

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22299>
2023-04-06 19:07:50 +00:00
Paulo Zanoni
a099d6ae4d intel: add devinfo->has_64bit_float_via_math_pipe
Unusual hardware features that require special hanlding usually get a
devinfo field, so do this for MTL's unordered DF types. This will
guarantee that any platform based on MTL (thus inheriting from
MTL_FEATURES) will automatically be handled in these special cases.

v2: s/has_unordered_64bit_float/has_64bit_float_via_math_pipe/ (Curro).

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Paulo Zanoni
16b9f87104 intel/compiler: on MTL, DF instructions run in the math pipe
Adjust the scoreboard code to take that into account.

Fixes at least:
  - dEQP-VK.glsl.builtin.precision_double.refract.compute.vec3
  - dEQP-VK.glsl.builtin.precision_double.matrixcompmult.compute.mat4

v2: use intel_device_info_is_mtl()  (Curro, Jordan)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
Francisco Jerez
051887fbf3 intel/fs: Make the result of is_unordered() dependent on devinfo.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20072>
2022-12-10 03:59:19 +00:00
LingMan
11f91505d9 intel/fs: Accept an unsigned int in fs_reg::fs_reg
The parameter `nr` is currenlty an `int` but it only gets assigned to an
`unsigned int`. Make it clear in the function signature what's actually
required.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19423>
2022-11-23 18:37:35 +00:00
Caio Oliveira
0a2cfa14dd intel/compiler: Make component() work for FIXED_GRF/ARF
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18157>
2022-08-23 19:52:38 +00:00
Francisco Jerez
6f33b22495 intel/fs: Fix horiz_offset() to handle FIXED_GRFs with non-trivial 2D regions.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18157>
2022-08-23 19:52:38 +00:00
Kenneth Graunke
72e9843991 intel/compiler: Introduce a new brw_isa_info structure
This structure will contain the opcode mapping tables in the next
commit.  For now, this is the mechanical change to plumb it into all
the necessary places, and it continues simply holding devinfo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
2022-06-30 23:46:35 +00:00
Lionel Landwerlin
361b3fee3c intel: move away from booleans to identify platforms
v2: Drop changes around GFX_VERx10 == 75 (Luis)

v3: Replace
   (GFX_VERx10 < 75 && devinfo->platform != INTEL_PLATFORM_BYT)
   by
   (devinfo->platform == INTEL_PLATFORM_IVB)
   Replace
   (devinfo->ver >= 5 || devinfo->platform == INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 >= 45)
   Replace
   (devinfo->platform != INTEL_PLATFORM_G4X)
   by
   (devinfo->verx10 != 45)

v4: Fix crocus typo

v5: Rebase

v6: Add GFX3, ILK & I965 platforms (Jordan)
    Move ifdef to code expressions (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12981>
2021-11-08 16:48:06 +00:00
Ian Romanick
38807ceeae intel/fs: sel.cond writes the flags on Gfx4 and Gfx5
On Gfx4 and Gfx5, sel.l (for min) and sel.ge (for max) are implemented
using a separte cmpn and sel instruction.  This lowering occurs in
fs_vistor::lower_minmax which is called very, very late... a long, long
time after the first calls to opt_cmod_propagation.  As a result,
conditional modifiers can be incorrectly propagated across sel.cond on
those platforms.

No tests were affected by this change, and I find that quite shocking.
After just changing flags_written(), all of the atan tests started
failing on ILK.  That required the change in cmod_propagatin (and the
addition of the prop_across_into_sel_gfx5 unit test).

Shader-db results for ILK and GM45 are below.  I looked at a couple
before and after shaders... and every case that I looked at had
experienced incorrect cmod propagation.  This affected a LOT of apps!
Euro Truck Simulator 2, The Talos Principle, Serious Sam 3, Sanctum 2,
Gang Beasts, and on and on... :(

I discovered this bug while working on a couple new optimization
passes.  One of the passes attempts to remove condition modifiers that
are never used.  The pass made no progress except on ILK and GM45.
After investigating a couple of the affected shaders, I noticed that
the code in those shaders looked wrong... investigation led to this
cause.

v2: Trivial changes in the unit tests.

v3: Fix type in comment in unit tests.  Noticed by Jason and Priit.

v4: Tweak handling of BRW_OPCODE_SEL special case.  Suggested by Jason.

Fixes: df1aec763e ("i965/fs: Define methods to calculate the flag subset read or written by an fs_inst.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Dave Airlie <airlied@redhat.com>

Iron Lake
total instructions in shared programs: 8180493 -> 8181781 (0.02%)
instructions in affected programs: 541796 -> 543084 (0.24%)
helped: 28
HURT: 1158
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.86% x̄: 0.53% x̃: 0.50%
HURT stats (abs)   min: 1 max: 3 x̄: 1.14 x̃: 1
HURT stats (rel)   min: 0.12% max: 4.00% x̄: 0.37% x̃: 0.23%
95% mean confidence interval for instructions value: 1.06 1.11
95% mean confidence interval for instructions %-change: 0.31% 0.38%
Instructions are HURT.

total cycles in shared programs: 239420470 -> 239421690 (<.01%)
cycles in affected programs: 2925992 -> 2927212 (0.04%)
helped: 49
HURT: 157
helped stats (abs) min: 2 max: 284 x̄: 62.69 x̃: 70
helped stats (rel) min: 0.04% max: 6.20% x̄: 1.68% x̃: 1.96%
HURT stats (abs)   min: 2 max: 48 x̄: 27.34 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.91% x̄: 0.31% x̃: 0.20%
95% mean confidence interval for cycles value: -0.80 12.64
95% mean confidence interval for cycles %-change: -0.31% <.01%
Inconclusive result (value mean confidence interval includes 0).

GM45
total instructions in shared programs: 4985517 -> 4986207 (0.01%)
instructions in affected programs: 306935 -> 307625 (0.22%)
helped: 14
HURT: 625
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.35% max: 0.82% x̄: 0.52% x̃: 0.49%
HURT stats (abs)   min: 1 max: 3 x̄: 1.13 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.90% x̄: 0.34% x̃: 0.22%
95% mean confidence interval for instructions value: 1.04 1.12
95% mean confidence interval for instructions %-change: 0.29% 0.36%
Instructions are HURT.

total cycles in shared programs: 153827268 -> 153828052 (<.01%)
cycles in affected programs: 1669290 -> 1670074 (0.05%)
helped: 24
HURT: 84
helped stats (abs) min: 2 max: 232 x̄: 64.33 x̃: 67
helped stats (rel) min: 0.04% max: 4.62% x̄: 1.60% x̃: 1.94%
HURT stats (abs)   min: 2 max: 48 x̄: 27.71 x̃: 24
HURT stats (rel)   min: 0.02% max: 2.66% x̄: 0.34% x̃: 0.14%
95% mean confidence interval for cycles value: -1.94 16.46
95% mean confidence interval for cycles %-change: -0.29% 0.11%
Inconclusive result (value mean confidence interval includes 0).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12191>
2021-08-11 13:09:20 -07:00
Lionel Landwerlin
f46aa1b9d7 intel/fs: use the final destination type for regioning restrictions
This is most likely a rebase mistake :(

Fixes: f3e5cd813a ("intel/fs: Handle regioning restrictions of split FP/DP pipelines.")
Ref: aa53665fda ("intel/fs/copy_prop: check stride constraints with actual final type")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10764>
2021-05-12 21:19:11 +00:00
Anuj Phogat
61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Francisco Jerez
f3e5cd813a intel/fs: Handle regioning restrictions of split FP/DP pipelines.
The floating-point and double-precision FPU pipelines of XeHP
platforms don't support arbitrary regioning modes, corresponding
channels of sources and destination are required to be aligned to the
same sub-register offset, similar to the restriction FP64 instructions
had on CHV/BXT platforms.

Most violations of this restriction can be fixed easily by teaching
has_dst_aligned_region_restriction() about the change so the regioning
lowering pass gets rid of any unsupported regioning.  For cases where
this is not sufficient (e.g. because a virtual instruction internally
uses some regioning mode not supported by the floating-point pipeline)
the regioning lowering pass is extended with an additional
lower_exec_type() codepath that bit-casts sources and destination to
an integer type whenever the execution type is not supported by the
instruction.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Lionel Landwerlin
aa53665fda intel/fs/copy_prop: check stride constraints with actual final type
In some cases we will change the type of the destination register of
an instruction. This is the type we should use to verify that we're
allow to do the replacement.

Otherwise we can hit restrictions on CHV and upcoming Xe-Hp for
instance where the copy propagation transforms this :

send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(16) vgrf11:UW, vgrf10<2>:UW
mov(16) vgrf12:UW, vgrf10+0.2<2>:UW
mov(16) vgrf15:HF, |vgrf11|:HF
mov(16) vgrf16:HF, |vgrf12|:HF
mov(8) vgrf41<2>:UW, vgrf15+0.0:UW group0
mov(8) vgrf42<2>:UW, vgrf15+0.16:UW group8
mov(8) vgrf45<2>:UW, vgrf16+0.0:UW group0
mov(8) vgrf46<2>:UW, vgrf16+0.16:UW group8

into this :

send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(8) vgrf41<2>:HF, |vgrf10+0.0|<2>:HF group0
mov(8) vgrf42<2>:HF, |vgrf10+1.0|<2>:HF group8
mov(8) vgrf45<2>:HF, |vgrf10+0.2|<2>:HF group0
mov(8) vgrf46<2>:HF, |vgrf10+1.2|<2>:HF group8

Because of the floating point use, stride and offets should be the
same.

v2: Fix final destination type selection (Curro)

v3: constify (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9832>
2021-03-29 22:14:45 +00:00
Jason Ekstrand
69a3559efd intel/reg,fs: Handle immediates properly in subscript()
Just returning the original type isn't what we want in basically any
case.  Mask and shift the immediate as needed.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
2021-01-22 18:38:37 +00:00
Jason Ekstrand
4c8cbe9b13 intel/compiler: Return 1 for immediates in regs_read
Previously, we were returning 2 whenever the source was a Q type.  As
far as I can tell, the only reason why this hasn't blown up before is
that it was only ever used for VGRFs until the SWSB pass landed which
uses it for everything.  This wasn't a problem because Q types generally
aren't a thing on TGL.  However, they are for a small handful of
instructions.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7329>
2021-01-22 18:38:37 +00:00
Francisco Jerez
bda1d72dd9 intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.
This will be re-usable by the IR performance analysis pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:00:29 -07:00
Francisco Jerez
6310a05f68 intel/fs: Rename half() helpers to quarter(), allow index up to 3.
Makes more sense considering SIMD32.  Relaxing the assertion in
brw_ir_fs.h will be required in order to avoid assertion failures on
SNB with SIMD32 fragment shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:00:29 -07:00
Dylan Baker
f8e4542bad replace _mesa_logbase2 with util_logbase2
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:03 -07:00
Francisco Jerez
ab0d1b3b3d intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers.
This reworks the current fs_inst::is_copy_payload() method into a
number of classification helpers with well-defined semantics.  This
will be useful later on in order to optimize LOAD_PAYLOAD instructions
more aggressively in cases where we can determine it's safe to do so.

The closest equivalent of the present fs_inst::is_copy_payload()
method is the is_coalescing_payload() helper introduced here.

No functional nor shader-db changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-17 13:21:19 -08:00
Francisco Jerez
c20dc9b836 intel/fs: Make implied_mrf_writes() an fs_inst method.
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.

This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:02:30 -08:00
Francisco Jerez
e0b8d7953e intel/fs/gen12: Add scheduling information to the IR.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
6f275a863d intel/fs: Define is_send() convenience IR helper.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Francisco Jerez
f326d9d218 intel/fs: Define is_payload() method of the IR instruction class.
This is required because SEND message payload sources are fetched
asynchronously by the hardware, which can lead to WaR data corruption
on Gen12+ platforms if not handled specially by the compiler to
guarantee proper synchronization.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Juan A. Suarez Romero
b06ae53606 Revert "intel/compiler: split is_partial_write() into two variants"
This reverts commit 40b3abb4d1.

It is not clear that this commit was entirely correct, and unfortunately
it was pushed by error.

CC: Jason Ekstrand <jason@jlekstrand.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-25 09:19:10 +02:00
Iago Toral Quiroga
40b3abb4d1 intel/compiler: split is_partial_write() into two variants
This function is used in two different scenarios that for 32-bit
instructions are the same, but for 16-bit instructions are not.

One scenario is that in which we are working at a SIMD8 register
level and we need to know if a register is fully defined or written.
This is useful, for example, in the context of liveness analysis or
register allocation, where we work with units of registers.

The other scenario is that in which we want to know if an instruction
is writing a full scalar component or just some subset of it. This is
useful, for example, in the context of some optimization passes
like copy propagation.

For 32-bit instructions (or larger), a SIMD8 dispatch will always write
at least a full SIMD8 register (32B) if the write is not partial. The
function is_partial_write() checks this to determine if we have a partial
write. However, when we deal with 16-bit instructions, that logic disables
some optimizations that should be safe. For example, a SIMD8 16-bit MOV will
only update half of a SIMD register, but it is still a complete write of the
variable for a SIMD8 dispatch, so we should not prevent copy propagation in
this scenario because we don't write all 32 bytes in the SIMD register
or because the write starts at offset 16B (wehere we pack components Y or
W of 16-bit vectors).

This is a problem for SIMD8 executions (VS, TCS, TES, GS) of 16-bit
instructions, which lose a number of optimizations because of this, most
important of which is copy-propagation.

This patch splits is_partial_write() into is_partial_reg_write(), which
represents the current is_partial_write(), useful for things like
liveness analysis, and is_partial_var_write(), which considers
the dispatch size to check if we are writing a full variable (rather
than a full register) to decide if the write is partial or not, which
is what we really want in many optimization passes.

Then the patch goes on and rewrites all uses of is_partial_write() to use
one or the other version. Specifically, we use is_partial_var_write()
in the following places: copy propagation, cmod propagation, common
subexpression elimination, saturate propagation and sel peephole.

Notice that the semantics of is_partial_var_write() exactly match the
current implementation of is_partial_write() for anything that is
32-bit or larger, so no changes are expected for 32-bit instructions.

Tested against ~5000 tests involving 16-bit instructions in CTS produced
the following changes in instruction counts:

            Patched  |     Master    |    %    |
================================================
SIMD8  |    621,900  |    706,721    | -12.00% |
================================================
SIMD16 |     93,252  |     93,252    |   0.00% |
================================================

As expected, the change only affects SIMD8 dispatches.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2019-04-18 11:05:18 +02:00
Francisco Jerez
7272fe9c08 intel/fs: Rely on undocumented unrestricted regioning for 32x16-bit integer multiply.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such instructions as a micro-optimization.

Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Francisco Jerez
c3c27762f7 intel/fs: Exclude control sources from execution type and region alignment calculations.
Currently the execution type calculation will return a bogus value in
cases like:

  mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u

Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.

Similarly there's no need to apply the CHV/BXT double-precision region
alignment restrictions to such control sources, since they aren't
directly involved in the double-precision arithmetic operations
emitted by these virtual instructions.  Applying the CHV/BXT
restrictions to control sources was expected to be harmless if mildly
inefficient, but unfortunately it exposed problems at codegen level
for virtual instructions (namely the SHUFFLE instruction used for the
Vulkan 1.1 subgroup feature) that weren't prepared to accept control
sources with an arbitrary strided region.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109328
Reported-by: Mark Janes <mark.a.janes@intel.com>
Fixes: efa4e4bc5f "intel/fs: Introduce regioning lowering pass."
Tested-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-21 14:07:25 -08:00
Jason Ekstrand
077b9557a4 intel/fs: Get rid of fs_inst::equals
There are piles of fields that it doesn't check so using it is a lie.
The only reason why it's not causing problem is because it has exactly
one user which only uses it for MOV instructions (which aren't very
interesting) and only on Sandy Bridge and earlier hardware.  Just get
rid of it and inline it in the one place that it's actually used.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-29 18:43:55 +00:00
Francisco Jerez
c84ec70b3a intel/fs: Promote execution type to 32-bit when any half-float conversion is needed.
The docs are fairly incomplete and inconsistent about it, but this
seems to be the reason why half-float destinations are required to be
DWORD-aligned on BDW+ projects.  This way the regioning lowering pass
will make sure that the destination components of W to HF and HF to W
conversions are aligned like the corresponding conversion operation
with 32-bit execution data type.

Tested-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-18 16:09:39 -08:00
Francisco Jerez
efa4e4bc5f intel/fs: Introduce regioning lowering pass.
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.

v2: Give conditional modifiers the same treatment as predicates for
    SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
    couple of other instructions with inconsistent conditional mod
    semantics in lower_dst_modifiers() (Curro).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Francisco Jerez
b94519971a intel/fs: Constify fs_inst::can_do_source_mods().
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Francisco Jerez
c301f447ea intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time.  It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected instructions.  I've only reproduced this issue on a
future platform but it could affect CHV/BXT too under the right
conditions.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:08 -08:00
Jason Ekstrand
4ba445e011 intel: Don't propagate conditional modifiers if a UD source is negated
This fixes a bug uncovered by my NIR integer division by constant
optimization series.

Fixes: 19f9cb72c8 "i965/fs: Add pass to propagate conditional..."
Fixes: 627f94b72e "i965/vec4: adding vec4_cmod_propagation..."
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-10 13:13:12 -05:00
Francisco Jerez
4bd2047dee intel/fs: Add explicit last_rt flag to fb writes orthogonal to eot.
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT.  When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-05-29 15:44:50 -07:00
Ian Romanick
8f83eea71e i965: Add negative_equals methods
This method is similar to the existing ::equals methods.  Instead of
testing that two src_regs are equal to each other, it tests that one is
the negation of the other.

v2: Simplify various checks based on suggestions from Matt.  Use
src_reg::type instead of fixed_hw_reg.type in a check.  Also suggested
by Matt.

v3: Rebase on 3 years.  Fix some problems with negative_equals with VF
constants.  Add fs_reg::negative_equals.

v4: Replace the existing default case with BRW_REGISTER_TYPE_UB,
BRW_REGISTER_TYPE_B, and BRW_REGISTER_TYPE_NF.  Suggested by Matt.
Expand the FINISHME comment to better explain why it isn't already
finished.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3]
Reviewed-by: Matt Turner <mattst88@gmail.com>
2018-03-26 08:50:43 -07:00
Jason Ekstrand
4150920b95 intel/fs: Add a helper for emitting scan operations
This commit adds a helper to the builder for emitting "scan" operations.
Given a binary operation #, a scan takes the vector [a0, a1, ..., aN]
and returns the vector [a0, a0 # a1, ..., a0 # a1 # ... # aN] where each
channel contains the combination of all previous channels.  The sequence
of instructions to perform the scan is fairly optimal; a 16-wide scan on
a 32-bit type is only 6 instructions.  The subgroup scan and reduction
operations will be implemented in terms of this.

Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2018-03-07 12:13:47 -08:00
Alejandro Piñeiro
a05b6f25bf i965/fs: Remove BRW_REGISTER_TYPE_HF assert at get_exec_type
Note that we don't remove the assert at i965/vec4. At this point half
float support is only for the scalar backend.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-12-06 08:57:18 +01:00
Kenneth Graunke
3d112a7cd4 i965: Move fs_inst::has_side_effects()'s eot check to the parent class.
This eliminates a layer of wrapping, and makes a backend_instruction
sufficient.  The downside is that it exposes 'eot' to the vec4 backend,
which it doesn't need, but can basically happily ignore.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Pallavi G <pallavi.g@intel.com>
2017-10-19 10:19:20 -07:00
Juan A. Suarez Romero
79af256388 i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticable parts:
 - With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00