Commit Graph

291 Commits

Author SHA1 Message Date
Ian Romanick
7832fb7889 intel/compiler: Use partial redundancy elimination for compares
Almost all of the hurt shaders are repeated instances of the same shader
in synmark's compilation speed tests.

shader-db results:

All Gen6+ platforms had similar results. (Skylake shown)
total instructions in shared programs: 15256840 -> 15256389 (<.01%)
instructions in affected programs: 54137 -> 53686 (-0.83%)
helped: 288
HURT: 0
helped stats (abs) min: 1 max: 15 x̄: 1.57 x̃: 1
helped stats (rel) min: 0.06% max: 26.67% x̄: 1.99% x̃: 0.74%
95% mean confidence interval for instructions value: -1.76 -1.38
95% mean confidence interval for instructions %-change: -2.47% -1.50%
Instructions are helped.

total cycles in shared programs: 372286583 -> 372283851 (<.01%)
cycles in affected programs: 833829 -> 831097 (-0.33%)
helped: 265
HURT: 16
helped stats (abs) min: 2 max: 74 x̄: 11.81 x̃: 4
helped stats (rel) min: 0.04% max: 9.07% x̄: 0.99% x̃: 0.35%
HURT stats (abs)   min: 2 max: 130 x̄: 24.88 x̃: 8
HURT stats (rel)   min: <.01% max: 12.31% x̄: 1.44% x̃: 0.27%
95% mean confidence interval for cycles value: -12.30 -7.15
95% mean confidence interval for cycles %-change: -1.06% -0.64%
Cycles are helped.

Iron Lake and GM45 had similar results. (GM45 shown)
total instructions in shared programs: 5038653 -> 5038495 (<.01%)
instructions in affected programs: 13939 -> 13781 (-1.13%)
helped: 50
HURT: 1
helped stats (abs) min: 1 max: 15 x̄: 3.18 x̃: 4
helped stats (rel) min: 0.33% max: 13.33% x̄: 2.24% x̃: 1.09%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.83% max: 0.83% x̄: 0.83% x̃: 0.83%
95% mean confidence interval for instructions value: -3.73 -2.47
95% mean confidence interval for instructions %-change: -3.16% -1.21%
Instructions are helped.

total cycles in shared programs: 128118922 -> 128118228 (<.01%)
cycles in affected programs: 134906 -> 134212 (-0.51%)
helped: 50
HURT: 0
helped stats (abs) min: 2 max: 60 x̄: 13.88 x̃: 18
helped stats (rel) min: 0.06% max: 3.19% x̄: 0.74% x̃: 0.70%
95% mean confidence interval for cycles value: -16.54 -11.22
95% mean confidence interval for cycles %-change: -0.95% -0.53%
Cycles are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-28 15:35:53 -07:00
Jason Ekstrand
08f804ec0c anv,radv,turnip: Lower TG4 offsets with nir_lower_tex
v2: turn on for turnip as well (Karol Herbst)

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-03-21 02:58:41 +00:00
Jason Ekstrand
d3386e73c5 intel/nir: Lower array-deref-of-vector UBO and SSBO loads
This fixes a serious performance issue with DXVK:

https://github.com/doitsujin/dxvk/issues/937

This was caused by a recent change that to improve performance on RADV
which back-fired on ANV and killed performance for some apps:

e5a06d3f4a

Throwing in this bit of lowering lets us come along and CSE those UBO
loads (or copy-prop for SSBO load) and get one load where we previously
would have gotten several.

VkPipeline-db results on Kaby Lake:

    total instructions in shared programs: 5115361 -> 5073185 (-0.82%)
    instructions in affected programs: 1754333 -> 1712157 (-2.40%)
    helped: 5331
    HURT: 63

    total cycles in shared programs: 2544501169 -> 2481144545 (-2.49%)
    cycles in affected programs: 2531058653 -> 2467702029 (-2.50%)
    helped: 9202
    HURT: 4323

    total loops in shared programs: 3340 -> 3331 (-0.27%)
    loops in affected programs: 9 -> 0
    helped: 9
    HURT: 0

    total spills in shared programs: 3246 -> 3053 (-5.95%)
    spills in affected programs: 384 -> 191 (-50.26%)
    helped: 10
    HURT: 5

    total fills in shared programs: 4626 -> 4452 (-3.76%)
    fills in affected programs: 439 -> 265 (-39.64%)
    helped: 10
    HURT: 5

All of the shaders with hurt spilling were in Rise of the Tomb Raider
which also had shaders solidly helped in the spilling department.  Not
shown in those results (because I've not had success dumping the
shaders) is Witcher 3 where this reduces spilling and improves over-all
perf by around 20-25%.  There were no shader-db changes.  Apparently,
this just isn't a pattern that happens in OpenGL.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: "19.0" mesa-stable@lists.freedesktop.org
2019-03-15 23:10:27 -05:00
Caio Marcelo de Oliveira Filho
65e8761474 intel/nir: Combine store_derefs to improve code from SPIR-V
Due to lack of write mask in SPIR-V store, generators may produce
multiple stores to the same vector but using different array derefs.
Use the combining store pass to clean this up.  For example,

    layout(binding = 3) buffer block {
        vec4 v;
    };

    void main() {
        v.x = 11;
        v.y = 22;
    }

after going to SPIR-V and NIR, ends up with in two store_derefs to
v[0] and v[1]

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec2 32 ssa_6 = deref_array &(*ssa_4)[0] (ssbo float) /* &((block *)ssa_2)->field0[0] */
    intrinsic store_deref (ssa_6, ssa_7) (1, 0) /* wrmask=x */ /* access=0 */
    vec1 32 ssa_13 = load_const (0x00000001 /* 0.000000 */)
    vec2 32 ssa_14 = deref_array &(*ssa_4)[1] (ssbo float) /* &((block *)ssa_2)->field0[1] */
    intrinsic store_deref (ssa_14, ssa_15) (1, 0) /* wrmask=x */ /* access=0 */

producing two different sends instructions in skl.  The combining pass
transform the snippet above into

    vec2 32 ssa_4 = deref_struct &ssa_3->field0 (ssbo vec4) /* &((block *)ssa_2)->field0 */
    vec4 32 ssa_18 = vec4 ssa_7, ssa_15, ssa_16, ssa_17
    intrinsic store_deref (ssa_4, ssa_18) (3, 0) /* wrmask=xy */ /* access=0 */

producing a single sends instruction.

v2: Move this from spirv_to_nir into the general optimization pass for
    intel compiler.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho
10dfb0011e intel/nir: Combine store_derefs after vectorizing IO
Shader-db results for skl:

    total instructions in shared programs: 15232903 -> 15224781 (-0.05%)
    instructions in affected programs: 61246 -> 53124 (-13.26%)
    helped: 221
    HURT: 0

    total cycles in shared programs: 371440470 -> 371398018 (-0.01%)
    cycles in affected programs: 281363 -> 238911 (-15.09%)
    helped: 221
    HURT: 0

Results for bdw are very similar.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Caio Marcelo de Oliveira Filho
822a8865e4 nir: Add a pass to combine store_derefs to same vector
v2: (all from Jason)
    Reuse existing function for the end of the block combinations.
    Check the SSA values are coming from the right place in tests.
    Document the case when the store to array_deref is reused.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-13 08:39:16 -07:00
Jason Ekstrand
6d5d89d25a intel/nir: Vectorize all IO
The IO scalarization pass that we run to help with linking end up
turning some shader I/O such as that for tessellation and geometry
shaders into many scalar URB operations rather than one vector one.  To
alleviate this, we now vectorize the I/O once again.  This fixes a 10%
performance regression in the GfxBench tessellation test that was caused
by scalarizing.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15224023 -> 15220871 (-0.02%)
    instructions in affected programs: 342009 -> 338857 (-0.92%)
    helped: 1236
    HURT: 443

    total spills in shared programs: 23471 -> 23465 (-0.03%)
    spills in affected programs: 6 -> 0
    helped: 1
    HURT: 0

    total fills in shared programs: 31770 -> 31766 (-0.01%)
    fills in affected programs: 4 -> 0
    helped: 1
    HURT: 0

Cycles was just a lot of churn do to moves being different places.  Most
of the pure churn in instructions was +/- one or two instructions in
fragment shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Fixes: 8d8222461f "intel/nir: Enable nir_opt_find_array_copies"
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-03-12 15:34:06 +00:00
Jason Ekstrand
179d254cba intel/nir: Move lower_mem_access_bit_sizes to postprocess_nir
It doesn't really matter where this pass goes as long as it's after we
call nir_lower_explicit_io and before we go into the back-end.  Putting
it brw_postprocess_nir lets us move nir_lower_explicit_io significantly
later in the pipeline.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-03-08 22:03:14 -06:00
Jason Ekstrand
656ace3dd8 intel/nir: Move 64-bit lowering later
Now that we have a loop unrolling cost function and loop unrolling isn't
going to kill us the moment we have a 64-bit op in a loop, we can go
ahead and move 64-bit lowering later.  This gives us the opportunity to
do more optimizations and actually let the full optimizer run even on
64-bit ops rather than hoping one round of opt_algebraic will fix
everything.  This substantially reduces both fp64 shader compile times
and the resulting code size.  On the vs-isnan-dvec test from piglit:

Before this commit:

    1684.63s user 17.29s system 99% cpu 28:28.24 total
    101479 instructions. 0 loops. 802452 cycles. 79:369 spills:fills.
    Peak memory usage (according to massif): 1.435 GB

After this commit:

    179.64s user 7.75s system 99% cpu 3:07.92 total
    57316 instructions. 0 loops. 459287 cycles. 0:0 spills:fills.
    Peak memory usage (according to massif): 531.0 MB

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
e02959f442 nir/lower_doubles: Inline functions directly in lower_doubles
Instead of trusting the caller to already have created a softfp64
function shader and added all its functions to our shader, we simply
take the softfp64 shader as an argument and do the function inlining
ouselves.  This means that there's no more nasty functions lying around
that the caller needs to worry about cleaning up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
8993e0973f intel/nir: Drop an unneeded lower_constant_initializers call
Even though this is technically a step in the function inlining process
as laid out in nir_inline_functions.c, it's not really needed.  We
already have constant initializers lowered here and no new ones are
added by appending the softfp64 functions.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-06 17:24:57 +00:00
Jason Ekstrand
5c96120b5c intel,nir: Lower TXD with min_lod when the sampler index is not < 16
When we have a larger sampler index, we get into the "high sampler"
scenario and need an instruction header.  Even in SIMD8, this pushes the
instruction over the sampler message size maximum of 11 registers.
Instead, we have to lower TXD to TXL.

Fixes: cb98e0755f "intel/fs: Support min_lod parameters on texture..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-03-04 23:56:39 +00:00
Jordan Justen
10c5579921 intel/compiler: Move int64/doubles lowering options
Instead of calculating the int64 and doubles lowering options each
time a shader is preprocessed, save and use the values in
nir_shader_compiler_options.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-03-02 14:33:44 -08:00
Kasireddy, Vivek
7cab8d3661 i965: Add support for sampling from XYUV images
Add support to the i965 DRI driver to sample from XYUV8888 buffers.

Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-02-26 13:08:52 +00:00
Tapani Pälli
3da858a6b9 intel/compiler: add scale_factors to sampler_prog_key_data
Patch propagates given scale_factors to lowering options.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-02-12 08:42:25 +02:00
Karol Herbst
b9fec2b38c nir: replace more nir_load_system_value calls with builder functions
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-21 00:16:51 +01:00
Karol Herbst
9b24028426 nir: rename nir_var_function to nir_var_function_temp
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-19 20:01:41 +01:00
Jason Ekstrand
24c8108ea6 intel/nir: Call nir_opt_deref in brw_nir_optimize
It's an optimization so we should probably be calling it in the
optimization loop.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-01-12 17:55:49 -06:00
Matt Turner
613ac3aaa2 i965: Compile fp64 software routines and lower double-ops
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:41 -08:00
Karol Herbst
d0c6ef2793 nir: rename global/local to private/function memory
the naming is a bit confusing no matter how you look at it. Within SPIR-V
"global" memory is memory accessible from all threads. glsl "global" memory
normally refers to shader thread private memory declared at global scope. As
we already use "shared" for memory shared across all thrads of a work group
the solution where everybody could be happy with is to rename "global" to
"private" and use "global" later for memory usually stored within system
accessible memory (be it VRAM or system RAM if keeping SVM in mind).
glsl "local" memory is memory only accessible within a function, while SPIR-V
"local" memory is memory accessible within the same workgroup.

v2: rename local to function as well
v3: rename vtn_variable_mode_local as well

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-01-08 18:51:46 +01:00
Timothy Arceri
50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Iago Toral Quiroga
d6110d4d54 intel/compiler: move nir_lower_bool_to_int32 before nir_lower_locals_to_regs
The former expects to see SSA-only things, but the latter injects registers.

The assertions in the lowering where not seeing this because they asserted
on the bit_size values only, not on the is_ssa field, so add that assertion
too.

Fixes: 11dc130779 "nir: Add a bool to int32 lowering pass"
CC: mesa-stable@lists.freedesktop.org
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-20 08:02:44 +01:00
Ian Romanick
af07141b33 intel/compiler: More peephole_select for pre-Gen6
No shader-db changes on any Gen6+ platform.

All of the shaders with cycles hurt by more than ~2% are from Master of
Orion.  All of the shaders have instructions helped.  It looks like the
pass enables some control flow to be converted to bcsels, then the
scheduler does dumb things.  These are new shaders (just added before
doing this shader-db run), so there's probably some low-hanging fruit.

Iron Lake
total instructions in shared programs: 8214327 -> 8213684 (<.01%)
instructions in affected programs: 84469 -> 83826 (-0.76%)
helped: 114
HURT: 26
helped stats (abs) min: 2 max: 18 x̄: 7.75 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.52% x̃: 1.05%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.70% max: 2.48% x̄: 1.66% x̃: 1.61%
95% mean confidence interval for instructions value: -5.87 -3.32
95% mean confidence interval for instructions %-change: -2.32% -1.17%
Instructions are helped.

total cycles in shared programs: 187736850 -> 187749314 (<.01%)
cycles in affected programs: 506750 -> 519214 (2.46%)
helped: 104
HURT: 36
helped stats (abs) min: 2 max: 72 x̄: 21.96 x̃: 16
helped stats (rel) min: 0.02% max: 6.16% x̄: 0.97% x̃: 0.63%
HURT stats (abs)   min: 4 max: 1402 x̄: 409.67 x̃: 40
HURT stats (rel)   min: 0.33% max: 23.12% x̄: 5.79% x̃: 1.39%
95% mean confidence interval for cycles value: 28.32 149.74
95% mean confidence interval for cycles %-change: -0.07% 1.61%
Inconclusive result (%-change mean confidence interval includes 0).

GM45
total instructions in shared programs: 5044014 -> 5043652 (<.01%)
instructions in affected programs: 46751 -> 46389 (-0.77%)
helped: 63
HURT: 13
helped stats (abs) min: 2 max: 29 x̄: 7.65 x̃: 9
helped stats (rel) min: 0.17% max: 13.73% x̄: 2.93% x̃: 1.04%
HURT stats (abs)   min: 2 max: 20 x̄: 9.23 x̃: 8
HURT stats (rel)   min: 0.66% max: 2.35% x̄: 1.58% x̃: 1.52%
95% mean confidence interval for instructions value: -6.54 -2.99
95% mean confidence interval for instructions %-change: -3.04% -1.28%
Instructions are helped.

total cycles in shared programs: 128143042 -> 128150188 (<.01%)
cycles in affected programs: 324564 -> 331710 (2.20%)
helped: 57
HURT: 19
helped stats (abs) min: 6 max: 74 x̄: 30.70 x̃: 32
helped stats (rel) min: 0.08% max: 4.74% x̄: 1.22% x̃: 0.81%
HURT stats (abs)   min: 10 max: 1400 x̄: 468.21 x̃: 60
HURT stats (rel)   min: 0.56% max: 19.94% x̄: 5.80% x̃: 1.70%
95% mean confidence interval for cycles value: 6.90 181.15
95% mean confidence interval for cycles %-change: -0.52% 1.59%
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
378f996771 nir/opt_peephole_select: Don't peephole_select expensive math instructions
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive.  On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.

This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).

v2: Remove stray #if block.  Noticed by Thomas.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
8fb8ebfbb0 intel/compiler: More peephole select
Shader-db results:

The one shader hurt for instructions is a compute shader that had both
spills and fills hurt.

v2: Fix typo in comment noticed by Caio.

v3: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Skylake, Broadwell, and Haswell had similar results. (Skylake shown)
total instructions in shared programs: 15072761 -> 15047884 (-0.17%)
instructions in affected programs: 895539 -> 870662 (-2.78%)
helped: 3623
HURT: 1
helped stats (abs) min: 1 max: 181 x̄: 6.89 x̃: 4
helped stats (rel) min: 0.10% max: 25.00% x̄: 3.93% x̃: 3.20%
HURT stats (abs)   min: 92 max: 92 x̄: 92.00 x̃: 92
HURT stats (rel)   min: 1.92% max: 1.92% x̄: 1.92% x̃: 1.92%
95% mean confidence interval for instructions value: -7.10 -6.63
95% mean confidence interval for instructions %-change: -4.03% -3.82%
Instructions are helped.

total cycles in shared programs: 369738930 -> 369535732 (-0.05%)
cycles in affected programs: 68027851 -> 67824653 (-0.30%)
helped: 2609
HURT: 1035
helped stats (abs) min: 1 max: 4508 x̄: 181.44 x̃: 77
helped stats (rel) min: <.01% max: 71.31% x̄: 9.14% x̃: 5.47%
HURT stats (abs)   min: 1 max: 33336 x̄: 261.04 x̃: 20
HURT stats (rel)   min: <.01% max: 47.61% x̄: 2.93% x̃: 1.47%
95% mean confidence interval for cycles value: -96.43 -15.09
95% mean confidence interval for cycles %-change: -6.07% -5.36%
Cycles are helped.

total spills in shared programs: 10158 -> 10159 (<.01%)
spills in affected programs: 166 -> 167 (0.60%)
helped: 1
HURT: 1

total fills in shared programs: 22105 -> 22116 (0.05%)
fills in affected programs: 837 -> 848 (1.31%)
helped: 4
HURT: 1

Ivy Bridge
total instructions in shared programs: 12021190 -> 11990256 (-0.26%)
instructions in affected programs: 910561 -> 879627 (-3.40%)
helped: 3344
HURT: 18
helped stats (abs) min: 1 max: 99 x̄: 9.29 x̃: 6
helped stats (rel) min: 0.11% max: 31.18% x̄: 5.19% x̃: 3.31%
HURT stats (abs)   min: 2 max: 20 x̄: 7.89 x̃: 6
HURT stats (rel)   min: 0.70% max: 2.59% x̄: 1.63% x̃: 1.70%
95% mean confidence interval for instructions value: -9.49 -8.91
95% mean confidence interval for instructions %-change: -5.32% -4.98%
Instructions are helped.

total cycles in shared programs: 179077826 -> 178570196 (-0.28%)
cycles in affected programs: 63205667 -> 62698037 (-0.80%)
helped: 2767
HURT: 620
helped stats (abs) min: 1 max: 7531 x̄: 217.58 x̃: 88
helped stats (rel) min: <.01% max: 75.86% x̄: 9.59% x̃: 6.09%
HURT stats (abs)   min: 1 max: 31255 x̄: 152.27 x̃: 11
HURT stats (rel)   min: <.01% max: 36.36% x̄: 2.77% x̃: 0.58%
95% mean confidence interval for cycles value: -173.94 -125.81
95% mean confidence interval for cycles %-change: -7.68% -6.97%
Cycles are helped.

Sandy Bridge
total instructions in shared programs: 10852569 -> 10843758 (-0.08%)
instructions in affected programs: 235803 -> 226992 (-3.74%)
helped: 800
HURT: 0
helped stats (abs) min: 1 max: 88 x̄: 11.01 x̃: 8
helped stats (rel) min: 0.11% max: 23.08% x̄: 4.69% x̃: 3.36%
95% mean confidence interval for instructions value: -11.93 -10.10
95% mean confidence interval for instructions %-change: -4.99% -4.39%
Instructions are helped.

total cycles in shared programs: 154732047 -> 154608941 (-0.08%)
cycles in affected programs: 4063110 -> 3940004 (-3.03%)
helped: 606
HURT: 253
helped stats (abs) min: 1 max: 2524 x̄: 227.93 x̃: 62
helped stats (rel) min: 0.02% max: 39.24% x̄: 4.36% x̃: 1.81%
HURT stats (abs)   min: 1 max: 1966 x̄: 59.36 x̃: 11
HURT stats (rel)   min: 0.02% max: 67.10% x̄: 3.22% x̃: 0.67%
95% mean confidence interval for cycles value: -170.49 -116.13
95% mean confidence interval for cycles %-change: -2.61% -1.65%
Cycles are helped.

No change on Iron Lake or GM45.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Ian Romanick
09b7e1d8e4 nir/opt_peephole_select: Don't try to remove flow control around indirect loads
That flow control may be trying to avoid invalid loads.  On at least
some platforms, those loads can also be expensive.

No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").

v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select.  Suggested
by Rob.  See also the big comment in src/intel/compiler/brw_nir.c.

v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).

v4: Fix inverted condition in brw_nir.c.  Noticed by Lionel.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2018-12-17 13:47:06 -08:00
Jason Ekstrand
11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Jason Ekstrand
9ebc00f32e i965: Enable nir_opt_idiv_const for 32 and 64-bit integers
The pass should work for all bit sizes but it's less clear that the
extra instructions are worth it on small integers.  Also, the hardware
doesn't do mul_high on anything other than 32-bit integers and, absent
any decent mechanism for testing the pass on 8 and 16-bit types, it's
probably best to just leave it disabled for now.

Shader-db results on Sky Lake:

    total instructions in shared programs: 15105795 -> 15111403 (0.04%)
    instructions in affected programs: 72774 -> 78382 (7.71%)
    helped: 0
    HURT: 265

Note that hurt here actually means helped because we're getting rid of
integer quotient operations (which are a send on some platforms!) and
replacing them with fairly cheap ALU ops.

Reviewed-by: Ian Romanick ian.d.romanick@intel.com
2018-12-13 17:49:48 +00:00
Jason Ekstrand
cb98e0755f intel/fs: Support min_lod parameters on texture instructions
We have to lower some shadow instructions because they don't exist in
hardware and we have to lower txb+offset+clamp because the message gets
too big and we run into the sampler message length limit of 11 regs.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
2018-12-11 21:26:23 -06:00
Jason Ekstrand
6339aba775 intel/compiler: Lower SSBO and shared loads/stores in NIR
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally.  It also means we can easily share the code between the vec4
and FS back-ends if we wish.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-15 19:59:49 -06:00
Gert Wollny
4bba280937 nir: Allow to skip integer ops in nir_lower_to_source_mods
Some hardware supports source mods only for float operations. Make it
possible to skip lowering to source mods in these cases.

v2: use option flags instead of a boolean (Jason Ekstrand)

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-11-14 08:59:26 +01:00
Timothy Arceri
3561108de0 anv/i965: make use of nir_link_constant_varyings()
shader-db results for SLK:

total instructions in shared programs: 13106498 -> 13091573 (-0.11%)
instructions in affected programs: 1186244 -> 1171319 (-1.26%)
helped: 6186
HURT: 0

total cycles in shared programs: 332062633 -> 331961653 (-0.03%)
cycles in affected programs: 8537165 -> 8436185 (-1.18%)
helped: 5371
HURT: 862

LOST:   6
GAINED: 14

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-11-13 14:06:32 +11:00
Lionel Landwerlin
89785e2d56 i965: add support for sampling from AYUV
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-11-12 13:22:54 +00:00
Jason Ekstrand
5cdeefe057 intel/nir: Use the OPT macro for more passes
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-10-26 11:45:29 -05:00
Jason Ekstrand
28bb6abd1d nir/validate: Print when the validation failed
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2018-10-26 11:45:29 -05:00
Caio Marcelo de Oliveira Filho
c20dd1f77c intel/nir, freedreno/ir3: Use the separated dead write vars pass
No changes to shader-db for intel.
No changes to shader-db expected for freedreno.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-10-15 17:29:46 -07:00
Caio Marcelo de Oliveira Filho
3cf07361ac intel/compiler: Export TCS passthrough creation
Move create_passthrough_tcs() from i965 so can be used in other
contexts.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-25 09:16:31 -07:00
Dylan Baker
8396043f30 Replace uses of _mesa_bitcount with util_bitcount
and _mesa_bitcount_64 with util_bitcount_64. This fixes a build problem
in nir for platforms that don't have popcount or popcountll, such as
32bit msvc.

v2: - Fix additional uses of _mesa_bitcount added after this was
      originally written

Acked-by: Eric Engestrom <eric.engestrom@intel.com> (v1)
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2018-09-07 10:21:26 -07:00
Jason Ekstrand
67571ae796 intel/compiler: Remove redundant nir_remove_dead_variables call
As of 07a2098a70, brw_nir_optimize calls nir_remove_dead_variables as
the last optimization.  Doing it again is just pointless.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
2018-09-04 09:03:16 -05:00
Lionel Landwerlin
07a2098a70 intel: compiler: remove dead local variables at optimization pass
We're hitting an assert in gfxbench because one of the local variable
is a sampler (according to Jason this isn't valid) :

testfw_app: ../src/compiler/nir_types.cpp:551: void glsl_get_natural_size_align_bytes(const glsl_type*, unsigned int*, unsigned int*): Assertion `!"type does not have a natural size"' failed.

Since this particular variable isn't used, it can be eliminated by
removing unused local variables at the end of the optimization loop.
This makes sense also for valid local variables.

v2: Move additional local variable removal out of optimization loop,
    but before large constant removal (Jason/Lionel)

v3: Move the removal at the end of brw_nir_optimize()

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107806
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-09-03 17:24:19 +01:00
Jason Ekstrand
8d8222461f intel/nir: Enable nir_opt_find_array_copies
We have to be a bit careful with this one because we want it to run in
the optimization loop but only in the first brw_nir_optimize call.
Later calls assume that we've lowered away copy_deref instructions and
we don't want to introduce any more.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15176942 -> 15176942 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

In spite of the lack of any shader-db improvement, this patch completely
eliminates spilling in the Batman: Arkham City tessellation shaders.
This is because we are now able to detect that the temporary array
created by DXVK for storing TCS inputs is a copy of the input arrays and
use indirect URB reads instead of making a copy of 4.5 KiB of input data
and then indirecting on it with if-ladders.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:47:51 -05:00
Jason Ekstrand
a4a9c07549 intel/nir: Use nir_shrink_vec_array_vars
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15176765 (<.01%)
    instructions in affected programs: 4259 -> 3419 (-19.72%)
    helped: 1
    HURT: 0

    total spills in shared programs: 10954 -> 10855 (-0.90%)
    spills in affected programs: 295 -> 196 (-33.56%)
    helped: 1
    HURT: 0

    total fills in shared programs: 22222 -> 22117 (-0.47%)
    fills in affected programs: 417 -> 312 (-25.18%)
    helped: 1
    HURT: 0

The helped shader is from the OglCSDof synmark test.  On my Kaby Lake
laptop, the actual framerate of the benchmark didn't appear to improve
beyond the noise.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:46:56 -05:00
Jason Ekstrand
02a5442dd7 intel/nir: Use the new structure and array splitting passes
We call structure splitting once because it is guaranteed to split all
the structures in the entire shader in one go.  We call array splitting
in the loop in case future optimizations turn indirects into direct
dereferences and we can split more arrays.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177605 -> 15177605 (0.00%)
    instructions in affected programs: 0 -> 0
    helped: 0
    HURT: 0

This is unsurprising because nir_lower_vars_to_ssa already effectively
does structure and array splitting internally.  It doesn't actually
split the variables but it's ability to reason about aliasing in the
presence of arrays and structures and pick out scalars or vectors to be
lowered to SSA values is fairly advanced.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-08-23 21:44:14 -05:00
Jason Ekstrand
10f44da775 Revert "intel/nir: Call nir_lower_io_to_scalar_early"
Commit 4434591bf5 caused substantially more URB messages in
geometry and tessellation shaders.  Before we can really enable this
sort of optimization,  We either need some way of combining them back
together into vectors or we need to do cross-stage vector element
elimination without splitting everything into scalars.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107510
Fixes: 4434591bf5 "intel/nir: Call nir_lower_io_to_scalar_early"
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Mark Janes <mark.a.janes@intel.com>
2018-08-15 17:56:50 -05:00
Jason Ekstrand
4434591bf5 intel/nir: Call nir_lower_io_to_scalar_early
Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166953 -> 15073611 (-0.62%)
    instructions in affected programs: 2390284 -> 2296942 (-3.91%)
    helped: 16469
    HURT: 505

    total loops in shared programs: 4954 -> 4951 (-0.06%)
    loops in affected programs: 3 -> 0
    helped: 3
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
b0bb547f78 intel/nir: Split IO arrays into elements
The NIR nir_lower_io_arrays_to_elements pass attempts to split I/O
variables which are arrays or matrices into a sequence of separate
variables.  This can help link-time optimization by allowing us to
remove varyings at a more granular level.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15177645 -> 15168494 (-0.06%)
    instructions in affected programs: 79857 -> 70706 (-11.46%)
    helped: 392
    HURT: 0

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-01 18:02:28 -07:00
Jason Ekstrand
4e060385e9 intel/nir: Use the correct scalar stage for consumers when linking
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-08-01 18:02:28 -07:00
Jose Maria Casanova Crespo
030472c1f0 i965: Support for 8-bit base types in helper functions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-10 00:14:49 +02:00
Ian Romanick
a4d4787327 intel/compiler: More DCE after lowering
Some of the lowering passes, nir_lower_locals_to_regs for example, can
cause some previously live code to be dead.  This pass in particular
leaves a bunch of nir_instr_type_deref instructions floating around.
This causes shader-db runs on Gen5 through Haswell to spew tons of
messages like:

    VS instruction not yet implemented by NIR->vec4

UnrealEngine4/EffectsCaveDemo/239.shader_test is one shader that
generates these messages.  Cleaning up the dead code fixes that.

To verify, I did a shader-db before and after.  Even though all the
messages are gone, the results make my brain hurt. :(

Haswell
total cycles in shared programs: 411890163 -> 411891145 (<.01%)
cycles in affected programs: 57016 -> 57998 (1.72%)
helped: 3
HURT: 11
helped stats (abs) min: 2 max: 154 x̄: 96.67 x̃: 134
helped stats (rel) min: 0.08% max: 2.23% x̄: 1.42% x̃: 1.96%
HURT stats (abs)   min: 18 max: 686 x̄: 115.64 x̃: 20
HURT stats (rel)   min: 0.81% max: 7.12% x̄: 1.87% x̃: 0.93%
95% mean confidence interval for cycles value: -51.39 191.67
95% mean confidence interval for cycles %-change: -0.14% 2.46%
Inconclusive result (value mean confidence interval includes 0).

Ivy Bridge
total cycles in shared programs: 259114802 -> 259115032 (<.01%)
cycles in affected programs: 24034 -> 24264 (0.96%)
helped: 1
HURT: 9
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.08% max: 0.08% x̄: 0.08% x̃: 0.08%
HURT stats (abs)   min: 18 max: 48 x̄: 25.78 x̃: 20
HURT stats (rel)   min: 0.80% max: 1.94% x̄: 1.08% x̃: 0.80%
95% mean confidence interval for cycles value: 12.42 33.58
95% mean confidence interval for cycles %-change: 0.54% 1.38%
Cycles are HURT.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Fixes: 5a02ffb733 nir: Rework lower_locals_to_regs to use deref instructions
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-07-05 21:13:21 -07:00
Ian Romanick
fb6dc8e894 intel/compiler: Silence unused parameter warnings brw_nir.c
src/intel/compiler/brw_nir.c: In function ‘brw_nir_lower_vue_outputs’:
src/intel/compiler/brw_nir.c:464:32: warning: unused parameter ‘is_scalar’ [-Wunused-parameter]
                           bool is_scalar)
                                ^~~~~~~~~
src/intel/compiler/brw_nir.c: In function ‘lower_bit_size_callback’:
src/intel/compiler/brw_nir.c:610:57: warning: unused parameter ‘data’ [-Wunused-parameter]
 lower_bit_size_callback(const nir_alu_instr *alu, void *data)
                                                         ^~~~

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2018-07-02 16:17:19 -07:00