Commit Graph

1425 Commits

Author SHA1 Message Date
Rhys Perry
1ffacc3ce1 nir/lower_gs_intrinsics: add option for per-stream counts
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422>
2020-01-14 12:11:14 +00:00
Jason Ekstrand
d3737002ee nir/lower_atomics_to_ssbo: Also lower barriers
This is more correct for a pass which is supposed to completely lower
away atomic counters.  It also lets us stop supporting atomic counter
barriers in most of the drivers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
e40b11bbcb nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
bd3ab75aef intel/nir: Stop adding redundant barriers
Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers
(as appropreate) prior to the nir_intrinsic_barrier, we don't need to do
it ourselves in the back-end.  This reverts commit
26e950a5de01564e3b5f2148ae994454ae5205fe.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
60097cc840 nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone.  For most drivers,
it's a switch case in the NIR -> whatever which just breaks.  For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
ada49bae5e intel/vec4: Support scoped_memory_barrier
Fixes: 06aecb14c0 "anv: Implement VK_KHR_vulkan_memory_model"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:46 +00:00
Francisco Jerez
c20dc9b836 intel/fs: Make implied_mrf_writes() an fs_inst method.
This will be convenient in a later commit enabling SIMD32 fragment
shaders, and happens to fix the calculation for MATH instructions
which is currently inaccurate for SIMD-lowered instructions on Gen4-5
platforms (all of them on Gen4 in SIMD16 mode), since it was based on
the shader's dispatch width rather than on the actual execution size
of the instruction.

This causes some shader-db noise on Gen4 due to the more compact
register allocation interacting with the SEND dependency workarounds,
but otherwise no major changes.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:02:30 -08:00
Francisco Jerez
591f146fd2 intel/fs/cse: Fix non-deterministic behavior due to inaccurate liveness calculation.
The liveness calculation done by the local CSE pass in order to prune
AEB entries whose sources are no longer live is currently inaccurate,
because the live intervals are calculated once at the beginning of the
pass, so they don't take into account any of the copy instructions
inserted by the CSE pass as it makes progress.  However the IP counter
used in that calculation is based on the start_ip of the basic block,
which is updated automatically whenever any instructions are inserted
into the CFG.  This causes the IP counter and liveness intervals to
get out of sync in programs with multiple basic blocks, causing the
CSE pass to toss AEB entries prematurely, which can lead to missed
optimization opportunities rather non-deterministically.

On BDW this leads to the following shader-db changes:

 total instructions in shared programs: 14952488 -> 14951763 (-0.00%)
 instructions in affected programs: 45416 -> 44691 (-1.60%)
 helped: 40
 HURT: 4

 total spills in shared programs: 20989 -> 20970 (-0.09%)
 spills in affected programs: 103 -> 84 (-18.45%)
 helped: 3
 HURT: 0

 total fills in shared programs: 24981 -> 24926 (-0.22%)
 fills in affected programs: 127 -> 72 (-43.31%)
 helped: 3
 HURT: 0

In addition it avoids a number of regressions in combination with some
of the optimization changes I'm working on for SIMD32, which would
have made CSE more effective...  Causing it to be less effective
elsewhere in the program astonishingly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:02:06 -08:00
Francisco Jerez
cc0ea482ad intel/fs: Fix nir_intrinsic_load_barycentric_at_sample for SIMD32.
For uniform sample ID, only the first channel of msg_data will be
initialized.  We need to pass that component only to the SEND message
for SIMD lowering to unzip the descriptor source correctly.

Fixes several dozens of conformance test failures with SIMD32 fragment
shaders enabled, including:

dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:01:52 -08:00
Francisco Jerez
0703eab012 intel/fs/gen8+: Fix r127 dst/src overlap RA workaround for EOT message payload.
The problem occured when the return payload of a SIMD8 SEND
instruction was re-used as source payload of an EOT SEND message.  In
such cases the interference edge added by that workaround between the
payload and grf127_send_hack_node would have no effect, because the
payload would be allocated to a fixed range of registers containing
r127 by the special handling of EOT message payloads in the same
function.  This would cause things to blow up if the source payload of
the first SIMD8 message ended up being allocated to a range which
happened to overlap the destination.

Fix it by avoiding r127 altogether in the allocation of EOT message
payloads.

The problem can be reproduced on ICL with the fp-indirections2 Piglit
test-case in combination with the other optimizer changes of this
series.

Fixes: 232ed89802 "i965/fs: Register allocator shoudn't use grf127 for sends dest"
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:00:42 -08:00
Francisco Jerez
0a6e46d44d intel/fs/gen11+: Handle ROR/ROL in lower_simd_width().
Prevents invalid code from being emitted for ROR/ROL instructions in
SIMD32 shaders.

The problem can be reproduced with the following tests while forcing
SIMD32 to be used for fragment shaders:

 piglit.shaders.glsl-rotate-left
 piglit.shaders.glsl-rotate-right

However the issue could occur in production already with compute
shaders and a workgroup size large enough to trigger SIMD32 dispatch.

Fixes: 83fdec0f0d "intel/compiler: Enable the emission of ROR/ROL instructions"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-01-10 11:00:24 -08:00
Jason Ekstrand
b788cccfe2 intel/disasm: Fix decoding of src0 of SENDS
There is no instruction field for the register file for src0 because
it's always GRF.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3309>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3309>
2020-01-08 14:14:16 +00:00
Jason Ekstrand
803fad43c3 intel/nir: Add a memory barrier before barrier()
Our barrier instruction does not implicitly do a memory fence but the
GLSL barrier() intrinsic is supposed to.  The easiest back-portable
solution is to just add the NIR barriers.  We'll sort this out more
properly in later commits.

Cc: mesa-stable@lists.freedesktop.org
Closes: #2138
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2020-01-07 21:52:19 -06:00
Kenneth Graunke
d0d28c783d iris: Set nir_shader_compiler_options::unify_interfaces.
This is technically enabling the option in the common intel backend
code, but only the st/nir linker uses the option, so it's iris-only.

Fixes Piglit's spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out

Closes: #2274
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>
2020-01-03 00:41:50 +00:00
Caio Marcelo de Oliveira Filho
766fdeccf9 intel/vec4: Fix lowering of multiplication by 16-bit constant
Existing code was ignoring whether the type of the immediate source
was signed or not.  If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).

Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older
platforms that don't support MUL with 32x32 types and use vec4.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-17 10:45:22 -08:00
Caio Marcelo de Oliveira Filho
2137be22fa intel/fs: Fix lowering of dword multiplication by 16-bit constant
Existing code was ignoring whether the type of the immediate source
was signed or not.  If the source was signed, it would ignore small
negative values but it also would wrongly accept values between
INT16_MAX and UINT16_MAX, causing the atual value to later be
reinterpreted as a negative number (under 16-bits).

Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms
that don't support MUL with 32x32 types, including ICL and TGL.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-17 10:45:22 -08:00
Caio Marcelo de Oliveira Filho
c06ba83589 intel/fs: Lower 64-bit MOVs after lower_load_payload()
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>
2019-12-14 21:12:21 +00:00
Eric Engestrom
d600b19640 intel/compiler: replace 0 pointer with NULL
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-13 20:16:20 +00:00
Eric Engestrom
8074f68b3b intel/compiler: add ASSERTED annotation to avoid "unused variable" warning
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-12-13 20:16:20 +00:00
Tapani Pälli
32ebd4207a intel/compiler: force simd8 when dual src blending on gen8
Patch introduces option to force simd8 and uses it as a workaround for
dual source blending issues seen with skqp (skia testsuite) on gen8.

Fixes following Piglit test on gen8 platforms:
   arb_blend_func_extended-dual-src-blending-issue-1917

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1917
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
c: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 09:42:50 +02:00
Tapani Pälli
f6004bac1f intel/compiler: add newline to limit_dispatch_width message
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 08:13:58 +02:00
Ian Romanick
668635abd2 intel/compiler: Fix 'comparison is always true' warning
Without looking at the assembly or something, I'm not sure what the
compiler does here.  The brw_reg_type enum is marked packed, so I'm
guess that it gets represented as a uint8_t.  That's the only reason I
could think that comparing with -1 would be always true.

This patch adds the same cast that exists in brw_hw_type_to_reg_type.
It might be better to add a #define outside the enum for
BRW_REGISTER_TYPE_INVALID as (enum brw_reg_type)-1.

src/intel/compiler/brw_eu_compact.c: In function ‘has_immediate’:
src/intel/compiler/brw_eu_compact.c:1515:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1515 |       return *type != -1;
      |                    ^~
src/intel/compiler/brw_eu_compact.c:1518:20: warning: comparison is always true due to limited range of data type [-Wtype-limits]
 1518 |       return *type != -1;
      |                    ^~

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
CID: 1455194
Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Cc: @mattst88
2019-12-04 15:34:03 -08:00
Ian Romanick
d15344c0f5 intel/compiler: Increase nir_opt_peephole_select threshold
I tried 2, 4, 6, 8, and 10.  8 seemed to be the sweet spot across all
Intel platforms.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>

All Gen7+ platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14736141 -> 14661140 (-0.51%)
instructions in affected programs: 2272413 -> 2197412 (-3.30%)
helped: 8416
HURT: 140
helped stats (abs) min: 1 max: 1152 x̄: 8.99 x̃: 6
helped stats (rel) min: 0.13% max: 42.55% x̄: 4.15% x̃: 3.20%
HURT stats (abs)   min: 1 max: 140 x̄: 4.73 x̃: 1
HURT stats (rel)   min: 0.03% max: 3.44% x̄: 0.87% x̃: 0.60%
95% mean confidence interval for instructions value: -9.36 -8.17
95% mean confidence interval for instructions %-change: -4.14% -3.99%
Instructions are helped.

total cycles in shared programs: 231560416 -> 228585416 (-1.28%)
cycles in affected programs: 126536021 -> 123561021 (-2.35%)
helped: 7092
HURT: 1898
helped stats (abs) min: 1 max: 419320 x̄: 519.02 x̃: 159
helped stats (rel) min: <.01% max: 77.25% x̄: 13.52% x̃: 11.77%
HURT stats (abs)   min: 1 max: 14518 x̄: 371.91 x̃: 36
HURT stats (rel)   min: <.01% max: 103.23% x̄: 5.92% x̃: 2.55%
95% mean confidence interval for cycles value: -514.34 -147.50
95% mean confidence interval for cycles %-change: -9.69% -9.14%
Cycles are helped.

total spills in shared programs: 5763 -> 5848 (1.47%)
spills in affected programs: 1797 -> 1882 (4.73%)
helped: 13
HURT: 13

total fills in shared programs: 17163 -> 16931 (-1.35%)
fills in affected programs: 7214 -> 6982 (-3.22%)
helped: 22
HURT: 19

total sends in shared programs: 730410 -> 730246 (-0.02%)
sends in affected programs: 2705 -> 2541 (-6.06%)
helped: 114
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 1.44 x̃: 1
helped stats (rel) min: 0.60% max: 20.00% x̄: 7.26% x̃: 5.88%
95% mean confidence interval for sends value: -1.55 -1.33
95% mean confidence interval for sends %-change: -7.90% -6.62%
Sends are helped.

LOST:   4
GAINED: 0

Sandy Bridge
total instructions in shared programs: 10760511 -> 10724637 (-0.33%)
instructions in affected programs: 961305 -> 925431 (-3.73%)
helped: 3734
HURT: 110
helped stats (abs) min: 1 max: 151 x̄: 9.66 x̃: 8
helped stats (rel) min: 0.14% max: 41.21% x̄: 4.93% x̃: 3.95%
HURT stats (abs)   min: 1 max: 20 x̄: 1.68 x̃: 1
HURT stats (rel)   min: 0.12% max: 5.41% x̄: 0.88% x̃: 0.52%
95% mean confidence interval for instructions value: -9.76 -8.91
95% mean confidence interval for instructions %-change: -4.90% -4.63%
Instructions are helped.

total cycles in shared programs: 153359411 -> 152991077 (-0.24%)
cycles in affected programs: 11615401 -> 11247067 (-3.17%)
helped: 2725
HURT: 1138
helped stats (abs) min: 1 max: 2844 x̄: 164.27 x̃: 80
helped stats (rel) min: 0.02% max: 48.60% x̄: 7.47% x̃: 3.91%
HURT stats (abs)   min: 1 max: 4351 x̄: 69.69 x̃: 25
HURT stats (rel)   min: 0.02% max: 40.00% x̄: 3.39% x̃: 1.47%
95% mean confidence interval for cycles value: -103.18 -87.52
95% mean confidence interval for cycles %-change: -4.57% -3.97%
Cycles are helped.

total sends in shared programs: 584038 -> 583855 (-0.03%)
sends in affected programs: 3512 -> 3329 (-5.21%)
helped: 157
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 1.17 x̃: 1
helped stats (rel) min: 2.38% max: 25.00% x̄: 6.52% x̃: 6.06%
95% mean confidence interval for sends value: -1.26 -1.07
95% mean confidence interval for sends %-change: -7.17% -5.87%
Sends are helped.

LOST:   23
GAINED: 0

Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8122617 -> 8111592 (-0.14%)
instructions in affected programs: 380503 -> 369478 (-2.90%)
helped: 912
HURT: 86
helped stats (abs) min: 1 max: 129 x̄: 12.19 x̃: 9
helped stats (rel) min: 0.30% max: 39.21% x̄: 3.69% x̃: 2.57%
HURT stats (abs)   min: 1 max: 2 x̄: 1.05 x̃: 1
HURT stats (rel)   min: 0.12% max: 3.64% x̄: 0.54% x̃: 0.36%
95% mean confidence interval for instructions value: -12.00 -10.10
95% mean confidence interval for instructions %-change: -3.56% -3.10%
Instructions are helped.

total cycles in shared programs: 188509780 -> 188534398 (0.01%)
cycles in affected programs: 7211542 -> 7236160 (0.34%)
helped: 859
HURT: 132
helped stats (abs) min: 2 max: 690 x̄: 46.59 x̃: 16
helped stats (rel) min: 0.01% max: 26.76% x̄: 1.53% x̃: 0.33%
HURT stats (abs)   min: 2 max: 1592 x̄: 489.67 x̃: 618
HURT stats (rel)   min: 0.03% max: 185.92% x̄: 23.35% x̃: 6.26%
95% mean confidence interval for cycles value: 9.58 40.10
95% mean confidence interval for cycles %-change: 0.65% 2.93%
Cycles are HURT.
2019-12-02 16:46:20 -08:00
Ian Romanick
e51eda99df intel/fs: Disable conditional discard optimization on Gen4 and Gen5
The CMP instruction on Gen4 and Gen5 generates one bit (the LSB) of
valid data and 31 bits of junk.  Results of comparisons that are used as
Boolean values need to have a fixup applied to generate the proper 0/~0
values.

Calling fs_visitor::nir_emit_alu with need_dest=false prevents the fixup
code from being generated.  This results in a sequence like:

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
(+f0.1) or.z.f0.1(16) null<1>UD g4<8,8,1>UD     g8<8,8,1>UD

instead of

        cmp.l.f0.0(16)  g8<1>F          g14<8,8,1>F     0x0F  /* 0F */
        ...
        cmp.l.f0.0(16)  g4<1>F          g6<8,8,1>F      0x0F  /* 0F */
        or(16) g4<1>UD g4<8,8,1>UD     g8<8,8,1>UD
(+f0.1) and.z.f0.1(16) null<1>UD g4<8,8,1>UD     1UD

I examined a couple of the shaders hurt by this change, and ALL of them
would have been affected by this bug. :(

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1836
Fixes: 0ba9497e66 ("intel/fs: Improve discard_if code generation")

Iron Lake
total instructions in shared programs: 8122757 -> 8122957 (<.01%)
instructions in affected programs: 8307 -> 8507 (2.41%)
helped: 0
HURT: 100
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.67% x̄: 2.81% x̃: 2.76%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.58% 3.03%
Instructions are HURT.

total cycles in shared programs: 188510100 -> 188510376 (<.01%)
cycles in affected programs: 76018 -> 76294 (0.36%)
helped: 0
HURT: 55
HURT stats (abs)   min: 2 max: 12 x̄: 5.02 x̃: 4
HURT stats (rel)   min: 0.07% max: 3.75% x̄: 0.86% x̃: 0.56%
95% mean confidence interval for cycles value: 4.33 5.71
95% mean confidence interval for cycles %-change: 0.60% 1.12%
Cycles are HURT.

GM45
total instructions in shared programs: 4994403 -> 4994503 (<.01%)
instructions in affected programs: 4212 -> 4312 (2.37%)
helped: 0
HURT: 50
HURT stats (abs)   min: 2 max: 2 x̄: 2.00 x̃: 2
HURT stats (rel)   min: 0.84% max: 6.25% x̄: 2.76% x̃: 2.72%
95% mean confidence interval for instructions value: 2.00 2.00
95% mean confidence interval for instructions %-change: 2.45% 3.07%
Instructions are HURT.

total cycles in shared programs: 128928750 -> 128928982 (<.01%)
cycles in affected programs: 67442 -> 67674 (0.34%)
helped: 0
HURT: 47
HURT stats (abs)   min: 2 max: 12 x̄: 4.94 x̃: 4
HURT stats (rel)   min: 0.09% max: 3.75% x̄: 0.75% x̃: 0.53%
95% mean confidence interval for cycles value: 4.19 5.68
95% mean confidence interval for cycles %-change: 0.50% 1.00%
Cycles are HURT.
2019-11-21 16:40:50 -08:00
Jason Ekstrand
2fca325ea6 Revert "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
This reverts commit 52c7df1643.  The pass,
while clearly useful for some shaders, has at least three bugs that I
was able to find fairly quickly:

 1. It doesn't work for type-converting MOVs because f > 0 is not the
    same as f2i(f) > 0

 2. CSEL is a 3src instruction and only supports one source type; it
    doesn't take this into account and tries to create instructions
    which do a F compare and a D select.  This is especially nasty to
    debug because you don't see that in the dumped assembly because we
    don't properly assert that types are the same in codegen.

 3. While you can handle 2, in theory, by reinterpreting types, you
    can't do that in the presence of source modifiers.  This pass
    doesn't even attempt to detect that.

Those are just the ones I found with the one almost trival shader I was
debugging.  There very likely may be more and.  Best thing to do for now
is just shut it off until someone has the time to figure out how to do
this properly and write tests to ensure it's correct.

Fixes: 3cb085e6d61a "i965/fs: Merge CMP and SEL into CSEL on Gen8+"
Reviewed-by: Brian Paul <brianp@vmware.com>
2019-11-20 20:47:32 +00:00
Marek Olšák
ebe7579655 nir: move data.image.access to data.access
The size of the data structure doesn't change.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-11-19 18:20:05 -05:00
Iván Briano
ca94717035 intel/compiler: Don't change hstride if not needed
Alignment requirements may have changed the horizontal stride already,
so don't set it if not required to avoid breaking said requirements.

Fixes several tests such as
dEQP-VK.subgroups.vote.graphics.subgroupallequal_int8_t

Signed-off-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-18 14:19:41 -08:00
Jason Ekstrand
d1c4e64a69 intel/compiler: Add a flag to avoid compacting push constants
In vec4, we can just not run the pass.  In fs, things are a bit more
deeply intertwined.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-11-18 18:35:14 +00:00
Italo Nicola
59623f211b intel/compiler: remove old comment
This comment was correct some time ago, but since commit
d3c10ad427, it isn't true anymore.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-18 10:20:34 -08:00
Danylo Piliaiev
0904ee0c60 intel/fs: Do not lower large local arrays to scratch on gen7
On gen7 and earlier the scratch space size is limited to 12kB.
By enabling this optimization we may easily exceed this limit
without having any fallback.

arb_compute_shader/linker/bug-93840.shader_test crashes with
this lowering on IVB due to exceeding scratch size limit.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2092
Fixes: 69244fc7
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-11-14 20:08:30 +00:00
Paulo Zanoni
eb6352162d intel/compiler: fix nir_op_{i,u}*32 on ICL
On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.

This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
    dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

This can also be verified by simply changing fix_byte_src() to apply
on all platforms.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-13 22:13:52 +00:00
Jason Ekstrand
69244fc72a intel/fs: Lower large local arrays to scratch
Shader-db results on Kaby Lake:

    total instructions in shared programs: 14929212 -> 14880028 (-0.33%)
    instructions in affected programs: 72428 -> 23244 (-67.91%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 2165 max: 15981 x̄: 8590.00 x̃: 7624
    helped stats (rel) min: 56.06% max: 74.52% x̄: 67.55% x̃: 72.08%
    HURT stats (abs)   min: 1178 max: 1178 x̄: 1178.00 x̃: 1178
    HURT stats (rel)   min: 350.60% max: 361.35% x̄: 355.97% x̃: 355.97%
    95% mean confidence interval for instructions value: -11947.03 -348.97
    95% mean confidence interval for instructions %-change: -125.72% 202.37%
    Inconclusive result (%-change mean confidence interval includes 0).

    total cycles in shared programs: 368585300 -> 342557344 (-7.06%)
    cycles in affected programs: 28144921 -> 2116965 (-92.48%)
    helped: 6
    HURT: 2
    helped stats (abs) min: 1404978 max: 7766106 x̄: 4353922.00 x̃: 3890682
    helped stats (rel) min: 82.01% max: 95.57% x̄: 89.95% x̃: 92.28%
    HURT stats (abs)   min: 47778 max: 47798 x̄: 47788.00 x̃: 47788
    HURT stats (rel)   min: 278.20% max: 282.98% x̄: 280.59% x̃: 280.59%
    95% mean confidence interval for cycles value: -5900438.73 -606550.27
    95% mean confidence interval for cycles %-change: -140.79% 146.16%
    Inconclusive result (%-change mean confidence interval includes 0).

    total spills in shared programs: 9243 -> 8901 (-3.70%)
    spills in affected programs: 2718 -> 2376 (-12.58%)
    helped: 4
    HURT: 4

    total fills in shared programs: 21831 -> 10141 (-53.55%)
    fills in affected programs: 11804 -> 114 (-99.03%)
    helped: 6
    HURT: 2

    total sends in shared programs: 815912 -> 815912 (0.00%)
    sends in affected programs: 0 -> 0
    helped: 0
    HURT: 0

    LOST:   1
    GAINED: 3

The helped shaders are all compute shaders in Aztec Ruins.  There is
also a compute shader in synmark2 OglCSDof that's helped but it doesn't
show up in above shader-db results because it went from SIMD8 to SIMD16.
That shader improves enough to yield an 15-20% performance boost to the
benchmark as a whole on my KBL laptop.  The hurt shaders are a couple
shaders in Kerbal Space Program and a couple in Aztec Ruins.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
53bfcdeecf intel/fs: Implement the new load/store_scratch intrinsics
This commit fills in a number of different pieces:

 1. We add support to brw_nir_lower_mem_access_bit_sizes to handle the
    new intrinsics.  This involves simple plumbing work as well as a
    tiny bit of extra logic to always scalarize scratch intrinsics

 2. Add code to brw_fs_nir.cpp to turn nir_load/store_scratch intrinsics
    into byte/dword scattered read/write messages which use the A32
    stateless model.

 3. Add code to lower_surface_logical_send to handle dword scattered
    messages and the A32 stateless model.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
e2297699de intel/nir: Plumb devinfo through lower_mem_access_bit_sizes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
1dff48af05 intel/fs: refactor surface header setup
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
a0999bc049 intel/fs: Add DWord scattered read/write opcodes
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Jason Ekstrand
83f04d80b0 intel/nir: Use nir_extract_bits in lower_mem_access_bit_sizes
The new helper solves most of the annoying problems with data wrangling
in brw_nir_lower_mem_access_bit_sizes.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-11-11 17:17:02 +00:00
Paulo Zanoni
b57383a944 intel/compiler: remove the operand restriction for src1 on GLK
Commit 5847de6e9a implemented a restriction that applies to ICL, but
wrongly marked it as also applying to GLK. Reviewers or MR !1125
pointed this, and the commit history shows removal of GLK to parts of
the patch, but it turns there was still a left-over GLK check in the
code.

This code was breaking some of the i8vec2 tests on GLK, for example:
  dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2

Removing the GLK check solves the issue for GLK. I don't see a reason
on why implementing this restriction would actually break GLK, so
there's still more to investigate here since this bug may be affecting
ICL+, but let's apply the real GLK fix while we analyze and discuss
the other possible issues.

Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1
on ICL")
BSpec: 3017
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
2019-11-05 00:08:34 +00:00
Ian Romanick
7b3f38ef69 intel/compiler: Report the number of non-spill/fill SEND messages on vec4 too
This make shader-db's report.py work on Haswell and earlier platforms.
The problem is that the script would detect the "sends" output for
scalar shaders and expect in in vec4 shaders too.  When it didn't find
it, the script would fail with:

    Traceback (most recent call last):
      File "./report.py", line 351, in <module>
        main()
      File "./report.py", line 182, in main
        before_count = before[p][m]
    KeyError: 'sends'

Fixes: f192741ddd ("intel/compiler: Report the number of non-spill/fill SEND messages")

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 21:27:03 -07:00
Jordan Justen
2b186264cc intel/eu/validate/gen12: Add TGL to eu_validate tests.
These reworks were combined into this patch:

 * Matt Turner: i965: Disable NoDDChk/NoDDClr test on Gen12+
 * Francisco Jerez: intel/eu/validate/gen12: Disable
   qword_low_power_no_depctrl eu_validate test.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:08:51 -07:00
Matt Turner
12d3b11908 intel/compiler: Add instruction compaction support on Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
c8fbc8823f intel/compiler: Make separate src0/src1 index tables
TGL uses different data (and even a different format!) for each source.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
cde73625f8 intel/compiler: Inline get_src_index()
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
d0eff8a539 intel/compiler: Restructure instruction compaction in preparation for Gen12
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-30 11:11:50 -07:00
Matt Turner
ded9fb2b18 intel/compiler: Remove unreachable() from brw_reg_type.c
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.

enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.

Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
2019-10-30 11:11:50 -07:00
Jason Ekstrand
24c0545b2d intel/vec4: Set brw_stage_prog_data::has_ubo_pull
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur.  Unfortunately, he neglected to set
the bit in the vec4 back-end.  This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable.  We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.

Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 16:05:57 +00:00
Timothy Arceri
7f106a2b5d util: rename list_empty() to list_is_empty()
This makes it clear that it's a boolean test and not an action
(eg. "empty the list").

Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-10-28 11:24:38 +00:00
Danylo Piliaiev
12a8f2616a intel/compiler: Fix C++ one definition rule violations
When building with "-flto" brw::block_data definitions
were colliding.

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-28 12:02:40 +02:00
Caio Marcelo de Oliveira Filho
e142061399 intel/fs: Implement scoped_memory_barrier
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-24 11:39:56 -07:00
Michel Dänzer
2b1b56cb3a intel/fs: Check for NULL key in fs_visitor constructor
Flagged by UBSan:

../src/intel/compiler/brw_fs_visitor.cpp:986:20: runtime error: member access within null pointer of type 'const struct brw_base_prog_key'
    #0 0x559fadb48556 in fs_visitor::init() ../src/intel/compiler/brw_fs_visitor.cpp:986
    #1 0x559fadb46db3 in fs_visitor::fs_visitor(brw_compiler const*, void*, void*, brw_base_prog_key const*, brw_stage_prog_data*, nir_shader const*, unsigned int, int, brw_vue_map const*) ../src/intel/compiler/brw_fs_visitor.cpp:962
    #2 0x559fad9c7cd8 in saturate_propagation_fs_visitor::saturate_propagation_fs_visitor(brw_compiler*, brw_wm_prog_data*, nir_shader*) (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x61bcd8)
    #3 0x559fad9960a1 in saturate_propagation_test::SetUp() ../src/intel/compiler/test_fs_saturate_propagation.cpp:65
    #4 0x559fadd7a32d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #5 0x559fadd65c3b in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #6 0x559fadd0af75 in testing::Test::Run() ../src/gtest/src/gtest.cc:2470
    #7 0x559fadd0d8a4 in testing::TestInfo::Run() ../src/gtest/src/gtest.cc:2656
    #8 0x559fadd10032 in testing::TestCase::Run() ../src/gtest/src/gtest.cc:2774
    #9 0x559fadd2ba0c in testing::internal::UnitTestImpl::RunAllTests() ../src/gtest/src/gtest.cc:4649
    #10 0x559fadd7df46 in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2402
    #11 0x559fadd69613 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) ../src/gtest/src/gtest.cc:2438
    #12 0x559fadd2302e in testing::UnitTest::Run() ../src/gtest/src/gtest.cc:4257
    #13 0x559fadda2d61 in RUN_ALL_TESTS() ../src/gtest/include/gtest/gtest.h:2233
    #14 0x559fadda2c21 in main ../src/gtest/src/gtest_main.cc:37
    #15 0x7fe8f6748bba in __libc_start_main ../csu/libc-start.c:308
    #16 0x559fad9950f9 in _start (/home/daenzer/src/mesa-git/mesa/build-amd64-sanitize/src/intel/compiler/fs_saturate_propagation+0x5e90f9)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
2019-10-24 16:20:04 +02:00