Commit Graph

141 Commits

Author SHA1 Message Date
Jason Ekstrand
8379993223 intel/fs: Drop fs_visitor::emit_alpha_to_coverage_workaround()
It no longer exists.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
2022-05-27 14:33:53 +00:00
Kenneth Graunke
9886615958 intel/compiler: Move spill/fill tracking to the register allocator
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends.  Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.

But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.

So counting stateless messages is not accurate.  Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-25 06:56:01 +00:00
Caio Oliveira
f82731d0d7 intel/fs: Fix IsHelperInvocation for the case no discard/demote are used
Use emit_predicate_on_sample_mask() helper that does check where to
get the correct mask depending on whether discard/demote was used or
not.

Fixes: 45f5db5a84 ("intel/fs: Implement "demote to helper invocation"")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15400>
2022-03-25 08:20:27 +00:00
Sagar Ghuge
6031ad4bf6 intel/fs: Add Wa_22013689345
v2: Use a simpler framework (Lionel)

v3: Rebase, add task/mesh (Lionel)

v4: Fixup fence exec size (SIMDX -> SIMD1)

v5: Fix invalidate_analysis, add finishme comment (Curro)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: 22.0 <mesa-stable>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14947>
2022-03-17 14:18:02 +00:00
Dave Airlie
e12b0d0d60 intel/compiler: remove gfx6 gather wa from backend.
Crocus lowers this in the frontend, they key member is still used
but reset prior to backend.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14202>
2021-12-22 21:37:55 +00:00
Jason Ekstrand
cf98a3cc19 intel/fs: Use OPT() for split_virtual_grfs
Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro.  This way, we get
nice INTEL_DEBUG=optimizer dumps for it.  While we're here, fix the
header comment which is massively out-of-date.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
2021-12-18 01:46:19 +00:00
Jason Ekstrand
a580fd55e1 intel/fs: Rework emit_samplepos_setup()
This rolls compute_sample_position into emit_samplepos_setup, its only
caller, by using a loop instead of calling it twice.  We also
early-return for the !persample_dispatch case instead of doing it as
part of the sample calculation.  This means that we don't call
fetch_payload_reg() to get sample_pos_reg unless we're actually going to
use it so the function is safe to call even if we haven't set up
sample_pos_reg.  This will be important for the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
ac7255ed1e intel/fs: Return fs_reg directly from builtin setup helpers
There's no good reason why we're allocating them on the heap and
returning a pointer.  Return the fs_reg directly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14198>
2021-12-17 16:02:16 +00:00
Jason Ekstrand
4fa58d27a5 intel/fs,vec4: Drop support for shader time
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Jason Ekstrand
8f3c100d61 intel/fs,vec4: Drop uniform compaction and pull constant support
The only driver using these was i965 and it's gone now.  This is all
dead code.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14056>
2021-12-10 21:20:47 +00:00
Caio Oliveira
70ace2bbcd intel/compiler: Implement Task Output and Mesh Input
Implement the output written by the task *workgroup* and available to
all the mesh *workgroups* dispatched from that task.  We currently
ignore any layout annotations (since they are not really testable) and
produce a (packed) layout ourselves.

The URB messages are only SIMD8, so for larger SIMDs, the functions
will produce multiple messages.  Making this lowering here instead of
the generic lower_simd_width() since it is not just a matter of
zip/unzip, e.g. the offset must be adjusted.

Indirect writes/reads are implemented by handling one component at a
time and using the PER_SLOT variant of the messages.

Note that VK_NV_mesh_shader allows reading outputs, so add support for
that as well.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
db23c41537 intel/compiler: Add backend compiler basics for Task/Mesh
Task/Mesh stages are CS-like stages, and include many
builtins (e.g. workgroup ID/index) and intrinsics (e.g. workgroup
memory primitives) originally present only in CS.

This commit add two new stages (task and mesh) that 'inherit' from CS
by embedding a brw_cs_prog_data in their own prog_data structure, so
that CS functionality can be easily reused.  They also currently use
the same helpers to select the SIMD variant to use -- that was
recently added for CS.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
827cf65a26 intel/compiler: Export brw_nir_lower_simd
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
be89ea3231 intel/compiler: Handle per-primitive inputs in FS
In Fragment Shader, regular inputs are laid out in the thread payload
in a one dword per each half-GRF, that gives room for having the two
delta dwords needed for interpolation.

Per-primitive inputs are laid out before the regular inputs, and since
there's no need to have delta information, they are packed.  So
half-GRF will be fully filled with 4 dwords of input.

When num_per_primitive_inputs is zero (the default case), behavior
should be the same as before.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Caio Oliveira
7938c38778 intel/compiler: Properly lower WorkgroupId for Task/Mesh
Task/Mesh currently only support a single dimension (both in NV API
and HW), so make Y and Z be zero.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13661>
2021-12-04 00:41:46 +00:00
Marcin Ślusarz
2cf189cc88 intel/fs: use stack for temporary array
"regs" is an array of 2 ->
  "m" must be <= 2 ->
  "components" array can be allocated on the stack

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11575>
2021-06-28 09:44:40 +00:00
Jason Ekstrand
705395344d intel/fs: Add support for compiling bindless shaders with resume shaders
Instead of depending on the driver to compile each resume shader
separately, we compile them all in one go in the back-end and build an
SBT as part of the shader program.  Shader relocs are used to make the
entries in the SBT point point to the correct resume shader.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8637>
2021-06-22 21:09:25 +00:00
Jason Ekstrand
ebba3cad81 intel/vec4: Add support for UBO pushing
Shader-db results on Haswell (vec4 only):

    total instructions in shared programs: 2853928 -> 2726576 (-4.46%)
    instructions in affected programs: 855840 -> 728488 (-14.88%)
    helped: 9500
    HURT: 18
    helped stats (abs) min: 1 max: 359 x̄: 13.54 x̃: 11
    helped stats (rel) min: 0.44% max: 53.33% x̄: 19.13% x̃: 17.44%
    HURT stats (abs)   min: 4 max: 124 x̄: 71.00 x̃: 92
    HURT stats (rel)   min: 3.64% max: 77.86% x̄: 46.43% x̃: 52.12%
    95% mean confidence interval for instructions value: -13.78 -12.98
    95% mean confidence interval for instructions %-change: -19.21% -18.81%
    Instructions are helped.

    total cycles in shared programs: 101822616 -> 60245580 (-40.83%)
    cycles in affected programs: 93312382 -> 51735346 (-44.56%)
    helped: 13292
    HURT: 4506
    helped stats (abs) min: 2 max: 1229260 x̄: 3370.82 x̃: 776
    helped stats (rel) min: 0.04% max: 96.70% x̄: 47.56% x̃: 43.76%
    HURT stats (abs)   min: 2 max: 17644 x̄: 716.37 x̃: 82
    HURT stats (rel)   min: 0.02% max: 491.80% x̄: 41.00% x̃: 11.11%
    95% mean confidence interval for cycles value: -3037.07 -1635.03
    95% mean confidence interval for cycles %-change: -26.03% -24.25%
    Cycles are helped.

    total spills in shared programs: 1080 -> 1314 (21.67%)
    spills in affected programs: 74 -> 308 (316.22%)
    helped: 0
    HURT: 47

    total fills in shared programs: 310 -> 497 (60.32%)
    fills in affected programs: 71 -> 258 (263.38%)
    helped: 0
    HURT: 47

    total sends in shared programs: 239884 -> 151799 (-36.72%)
    sends in affected programs: 129302 -> 41217 (-68.12%)
    helped: 9547
    HURT: 0
    helped stats (abs) min: 1 max: 226 x̄: 9.23 x̃: 8
    helped stats (rel) min: 3.12% max: 98.15% x̄: 72.38% x̃: 80.00%
    95% mean confidence interval for sends value: -9.48 -8.98
    95% mean confidence interval for sends %-change: -72.80% -71.97%
    Sends are helped.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
2021-05-19 14:38:13 +00:00
Lionel Landwerlin
6d4070f3dd intel/compiler: add support for fragment coordinate with coarse pixels
v2: Drop new internal opcodes (Jason)
    Simplify code (Jason)

v3: Add Z computation for coarse pixels

v4: Document things a little

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Lionel Landwerlin
a297061524 intel/compiler: add support for fragment shading rate variable
v2: Drop old register type initializers (Jason)
    Simplify instruction snippet (Jason)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7455>
2021-05-02 20:20:06 +00:00
Anuj Phogat
61e8636557 intel: Rename gen_device prefix to intel_device
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen_device" -rIl $SEARCH_PATH | xargs sed -ie "s/gen_device/intel_device/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10241>
2021-04-20 20:06:33 +00:00
Francisco Jerez
a0e0dfe174 intel/fs: Introduce lowering pass to implement derivatives in terms of quad swizzles.
Unfortunately the funky Align1 regions used by the code generator in
order to implement derivatives efficiently aren't available to the
floating-point pipeline on XeHP.  We need to lower them into a number
of pipelined integer shuffle instructions followed by the
floating-point difference computation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10000>
2021-04-16 08:27:35 +00:00
Yevhenii Kharchenko
edd12acbec intel/compiler: remove unused member 'input_vue_map'
v2: Instead of fixing unitialized member 'fs_visitor::input_vue_map'
(as reported by Coverity Scan in defect CID 1474559),
remove unused members 'vec4_tcs_visitor::input_vue_map' and
'fs_visitor::input_vue_map'.

Also fixed 'debug_enabled' argument skipped in a fs_visitor constructor
call from brw_compile_tes().

Signed-off-by: Yevhenii Kharchenko <yevhenii.kharchenko@globallogic.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10040>
2021-04-08 18:20:10 +00:00
Anuj Phogat
1d296484b4 intel: Rename Genx keyword to Gfxx
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "Gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/Gen\([[:digit:]]\+\)/Gfx\1/g"

Exclude changes in src/intel/perf/oa-*.xml:
find src/intel/perf -type f \( -name "*.xml" \) | xargs sed -ie "s/Gfx/Gen/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
b75f095bc7 intel: Rename genx keyword to gfxx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "gen[[:digit:]]+" -rIl $SEARCH_PATH | xargs sed -ie "s/gen\([[:digit:]]\+\)/gfx\1/g"

Exclude pack.h and xml changes in this patch:
grep -E "gfx[[:digit:]]+_pack\.h" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+_pack\.h\)/gen\1/g"
grep -E "gfx[[:digit:]]+\.xml" -rIl $SEARCH_PATH | xargs sed -ie "s/gfx\([[:digit:]]\+\.xml\)/gen\1/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
c1f3a778de intel: Rename GENx prefix in macros to GFXx in source files
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN" -rIl src/intel/genxml | grep -E ".*py" |  xargs sed -ie "s/GEN\([%{]\)/GFX\1/g"
grep -E "[^_]GEN[[:digit:]]+" -rIl $SEARCH_PATH | grep -E ".*(\.c|\.h|\.y|\.l)" | xargs sed -ie "s/\([^_]\)GEN\([[:digit:]]\+\)/\1GFX\2/g"

Leave out renaming GFX12_CCS_E macros. They fall under renaming pattern like "_GEN[[:digit:]]+":
grep -E "GFX12_CCS_E" -rIl $SEARCH_PATH | xargs sed -ie "s/GFX12_CCS_E/GEN12_CCS_E/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Anuj Phogat
abe9a71a09 intel: Rename gen field in gen_device_info struct to ver
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "info\)*(.|->)gen" -rIl $SEARCH_PATH | xargs sed -ie "s/info\()*\)\(\.\|->\)gen/info\1\2ver/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:07 +00:00
Caio Marcelo de Oliveira Filho
7fb1e58651 intel/compiler: Make visitors take debug_enabled as a parameter
The callers already have this value, and we would like to make it
follow different rules other than stage that might not be visible to
the helper function, so just pass explicitly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9779>
2021-03-24 23:18:46 +00:00
Jason Ekstrand
f9d549b2bf intel/fs: Use BRW_OPCODE_HALT for discards
We're about to start using it to implement nir_jump_halt which has
nothing inherently to do with fragment shaders or discards.  May as well
name it for the HW instruction it generates.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:19:08 -06:00
Jason Ekstrand
e76e359007 intel/fs: Rename PLACEHOLDER_HALT to HALT_TARGET
It's a bit more explicit and will play more nicely with what we're about
to do.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5071>
2020-12-01 16:18:50 -06:00
Jason Ekstrand
7280b0911d intel/compiler: Add support for bindless shaders
The Intel bindless thread dispatch model is very simple.  When a compute
shader is to be used for bindless dispatch, it can request a set of
stack IDs.  These are allocated per-dual-subslice by the hardware and
recycled automatically when the stack ID is returned.  Passed to the
bindless dispatch are a global argument address, a stack ID, and an
address of the BINDLESS_SHADER_RECORD to invoke.  When the bindless
shader is dispatched, it is passed its stack ID as well as the global
and local argument pointers.  The local argument pointer is the address
of the BINDLESS_SHADER_RECORD plus some offset which is specified as
part of the BINDLESS_SHADER_RECORD.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7356>
2020-11-25 05:37:09 +00:00
Marcin Ślusarz
06764e0e5d intel/compiler: use C++ template instead of preprocessor
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7382>
2020-11-03 10:42:29 +00:00
Caio Marcelo de Oliveira Filho
e7e24d5039 intel/fs: Handle nir_intrinsic_terminate
For terminate operation, jump the invocation without predicating on
the rest of the quad being disabled -- which is what is done for
demote and discard.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7150>
2020-10-15 21:40:09 +00:00
Jason Ekstrand
06ebf23283 intel/fs: Add a SCRATCH_HEADER opcode
This opcode is responsible for setting up the buffer base address and
per-thread scratch space fields of a scratch message header.  For the
most part, it's a copy of g0 but some messages need us to zero out g0.2
and the bottom bits of g0.5.

This may actually fix a bug when nir_load/store_scratch is used.  The
docs say that the DWORD scattered messages respect the per-thread
scratch size specified in gN.3[3:0] in the message header but we've been
leaving it zero.  This may mean that we've been ignoring any scratch
reads/writes from a load/store_scratch intrinsic above the 1KB mark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7084>
2020-10-13 21:59:27 +00:00
Jason Ekstrand
0d462dbee5 intel/fs: Add an alignment to VARYING_PULL_CONSTANT_LOAD_LOGICAL
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3932>
2020-10-08 01:14:46 -05:00
Marcin Ślusarz
40b964dc8f intel/compiler: remove unused fs_validator::param_size
Found by Coverity as unitialized variable.

Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6667>
2020-09-10 12:16:58 +00:00
Jason Ekstrand
90b6745bc8 intel/fs,vec4: Stuff the constant data from NIR in the end of the program
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6244>
2020-09-02 19:48:44 +00:00
Matt Turner
66111bc95a intel/compiler: Drop opt_sampler_eot()
Gen9 and Cherryview have the ability to mark texture instructions with
the End-of-thread bit under some conditions, which allows the texture
result to be written to the render target directly, rather than
returning to the EU.

In order to handle overlapping primitives correctly, we have to use the
'sendc' instruction which stalls until other threads potentially writing
to the same locations in the render target are retired. Unfortunately,
this stall happens before the texture is sampled (rather than in
parallel with stall), so for some literal edge cases (like the diagonal
edge between two triangles forming a rectangle) there can be a
performance penalty. As a result, it's probably not a good idea to use
this optimization in general.

I had planned to leave it enabled only for BLORP, where we use rectangle
primitives and are typically clearing/blitting an entire render target
without any overlapping primitives, but I noticed that the optimization
wasn't applied in some normal cases anyway. For example, in the piglit
test tests/shaders/glsl-fs-texture2d-bias.shader_test it is applied to
one BLORP-blit shader but not another due to some kind of mishandling of
register types (the destination register type of the texture operation
is UD while the color source of the render target write is F).

Additionally the instruction scheduler assumed that the combined texture
and render target write operation took 0 cycles, leading to cycle
estimates that are wildly inaccurate. Since the optimization was not
implemented for SIMD32 and our decision whether to use the SIMD32
program is made by comparing the estimated performance with that of the
SIMD16 shader, we wrongly threw out a bunch of SIMD32 programs that are
likely profitable.

   total cycles in shared programs: 472807891 -> 473784245 (0.21%)
   cycles in affected programs: 108277 -> 1084631 (901.72%)
   helped: 0
   HURT: 1290

   total sends in shared programs: 998955 -> 1000245 (0.13%)
   sends in affected programs: 1400 -> 2690 (92.14%)
   helped: 0
   HURT: 1290

   LOST:   0
   GAINED: 33

This patch shows no performance changes in Intel's Mesa performance CI.

Given the problems, the lack of evidence that the pass improves
performance, and the fact that the hardware feature was removed from
subsequent GPU generations, I think that the pass is not valuable and
should be removed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5412>
2020-06-12 19:01:26 +00:00
Caio Marcelo de Oliveira Filho
10d0f39beb intel/fs: Remove min_dispatch_width spilling decision from RA
Move the decision one level up, let brw_compile_*() functions use the
spilling information to decide whether or not a certain width
compilation can spill (passed via run_*() functions).

The min_dispatch_width was used to compare with the dispatch_width and
decide whether "a previous shader is already available, so don't
accept spill".

This is replaced by:

- Not calling run_*() functions if it is know beforehand a smaller width
  already spilled -- since the larger width will spill and fail;

- Explicitly passing whether or not a shader is allowed to spill.  For
  the cases where the smaller width is available and haven't spilled,
  the larger width will be compiled but is only useful if it won't
  spill.

Moving the decision to this level will be useful later for variable
group size, which is a case where we want all the widths to be allowed
to spill.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5142>
2020-05-27 18:16:31 -07:00
Francisco Jerez
6579f562c3 intel/ir: Use brw::performance object instead of CFG cycle counts for codegen stats.
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL.  In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:27 -07:00
Francisco Jerez
188a3659ae intel/ir: Import shader performance analysis pass.
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling.  It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.

The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts.  In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.

But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:

 - Implement a similar SIMD16 heuristic allowing the identification of
   inefficient SIMD16 fragment shaders.

 - Implement similar SIMD16 and SIMD32 heuristics for the compute
   shader stage -- Currently compute shader builds always use the
   SIMD16 shader if available and never use the SIMD32 shader unless
   strictly necessary, which is suboptimal under certain conditions.

 - Hook up to the instruction scheduler in order to improve the
   accuracy of its timing information.

 - Use as heuristic in order to drive the selection of scheduling
   modes (Matt was experimenting with that).

 - Plug to the TGL software scoreboard pass in order to implement a
   more effective SBID token allocation algorithm, since in general
   the optimal token allocation depends on the timings of all
   instructions in the program.

 - Use its bottleneck detection functionality in order to implement a
   heuristic computing a more optimal bound for the number of fragment
   shader threads executed in parallel (by adjusting the
   MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).

As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available.  The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:01:03 -07:00
Francisco Jerez
bda1d72dd9 intel/fs: Replace fs_visitor::bank_conflict_cycles() with stand-alone function.
This will be re-usable by the IR performance analysis pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2020-04-28 23:00:29 -07:00
Plamena Manolova
c77dc51203 intel/compiler: Add support for variable workgroup size
Add new builtin parameters that are used to keep track of the group
size.  This will be used to implement ARB_compute_variable_group_size.

The compiler will use the maximum group size supported to pick a
suitable SIMD variant.  A later improvement will be to keep all SIMD
variants (like FS) so the driver can select the best one at dispatch
time.

When variable workgroup size is used, the small workgroup optimization
is disabled as it we can't prove at compile time that the barriers
won't be needed.

Extracted from original i965 patch with additional changes by
Caio Marcelo de Oliveira Filho.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4504>
2020-04-09 19:23:12 -07:00
Mathias Fröhlich
630154e77b i965: Move down genX_upload_sbe in profiles.
Avoid looping over all VARYING_SLOT_MAX urb_setup array
entries from genX_upload_sbe. Prepare an array indirection
to the active entries of urb_setup already in the compile
step. On upload only walk the active arrays.

v2: Use uint8_t to store the attribute numbers.
v3: Change loop to build up the array indirection.
v4: Rebase.
v5: Style fix.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Mathias Fröhlich <Mathias.Froehlich@web.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/308>
2020-03-10 14:28:36 +00:00
Matt Turner
bb3e7b0fe3 intel/compiler: Pass shader_stats for each SIMD mode
Passing shader_stats to the fs_generator constructor means that the
SIMD8 shader stats from the visitor (such as the scheduler mode) will be
reported out for the SIMD16/SIMD32 versions as well.

As you can see, we are now passing 'shader_stats' and 'stats' to
generate_code(), which is obviously odd looking. Ian rebased and
committed an old patch of mine which added the shader_stats struct on
July 30 in commit dabb5d4bee (i965/fs: Add a shader_stats struct.) and
shortly after on August 12 Jason added the brw_compile_stats struct in
commit 134607760a (intel/compiler: Fill a compiler statistics struct).

I'd like to combine the two, but I'm not sure how. shader_stats is an
input to generate_code() while brw_compile_stats is an output and is
only used by the Vulkan driver. Leave it as is for now...

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:12 +00:00
Matt Turner
75a33e268e intel/compiler: Mark some methods and parameters const
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4093>
2020-03-09 04:44:11 +00:00
Francisco Jerez
e5e4d016b9 intel/compiler: Move register pressure calculation into IR analysis object
This defines a new BRW_ANALYSIS object which wraps the register
pressure computation code along with its result.  For the rationale
see the previous commits converting the liveness and dominance
analysis passes to the IR analysis framework.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:21:10 -08:00
Francisco Jerez
2878817197 intel/compiler: Drop invalidate_live_intervals()
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:21:01 -08:00
Francisco Jerez
ea44de6d8c intel/compiler/fs: Switch liveness analysis to IR analysis framework
This involves wrapping fs_live_variables in a BRW_ANALYSIS object and
hooking it up to invalidate_analysis() so it's properly invalidated.
Seems like a lot of churn but it's fairly straightforward.  The
fs_visitor invalidate_ and calculate_live_intervals() methods are no
longer necessary after this change.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:57 -08:00
Francisco Jerez
ba73e606f6 intel/compiler: Move all live interval analysis results into fs_live_variables
This moves the following methods that are currently defined in
fs_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_fs_live_variables.cpp:

> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;

It makes sense for them to be part of the fs_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.

This shortens the virtual_grf prefix in order to compensate for the
slightly increased lexical overhead from the live_intervals pointer
dereference.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
2020-03-06 10:20:43 -08:00