Commit Graph

2101 Commits

Author SHA1 Message Date
Konstantin Seurer
85da294bfe intel: Use nir_test_mask instead of i2b(iand)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17242>
2022-06-30 18:00:32 +00:00
Lionel Landwerlin
9d7d1c0637 intel/clc: enable fp16 & subgroups for GRL
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17253>
2022-06-27 15:31:49 +00:00
Marcin Ślusarz
42b551fe7f intel/compiler: adjust task payload offsets as late as possible
Otherwise passes which expect offsets to be in bytes (like
brw_nir_lower_mem_access_bit_sizes, called from brw_postprocess_nir)
may produce incorrect results.

Fixes 64-bit load/stores in task/mesh shaders.

Fixes: c36ae42e4c ("intel/compiler: Use nir_var_mem_task_payload")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16196>
2022-06-27 14:14:41 +00:00
Marcin Ślusarz
f4386b81e6 intel: fix typos found by codespell
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17191>
2022-06-27 10:20:55 +00:00
Marcin Ślusarz
f871aa10a1 intel/compiler: assert that base is 0 for [load|store]_shared intrins
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17143>
2022-06-22 10:32:13 +00:00
Marcin Ślusarz
008163f382 intel/compiler: vectorize task payload loads/stores
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17000>
2022-06-20 17:38:20 +00:00
Ian Romanick
676acfe956 intel/fs: Add missing synchronization for WaW dependency
v2: Do the synchronization in the correct place.  Noticed by Curro.

Fixes: b5fa43952a ("intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Tested-by: Felix DeGrood <felix.j.degrood@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17037>
2022-06-17 17:05:43 +00:00
Lionel Landwerlin
03e543a422 intel/validator: validate dst/src types against devinfo support
v2: deal with src3_a1/src3_a16 instruction types (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16985>
2022-06-17 15:43:05 +00:00
Yonggang Luo
0f3064ee44 intel: using C++11 keyword thread_local
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15087>
2022-06-15 17:37:16 +00:00
Jason Ekstrand
844a70f439 intel/compiler: Use NIR_PASS(_, ...)
I don't know when this was added but it's really neat and we should use
it instead of NIR_PASS_V since NIR_DEBUG=print and a few validation
things will work better.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17014>
2022-06-13 22:31:25 +00:00
Francisco Jerez
96e7e92f0d intel/fs/xehp+: Emit scheduling fence for all NIR barriers on platforms with LSC.
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
2022-06-12 12:56:47 +03:00
Tapani Pälli
47773a5d7c intel/fs: setup SEND message descriptor from nir scope
This fixes many tests in following groups on DG2:
   dEQP-VK.memory_model.*
   dEQP-VK.fragment_shader_interlock.*

v2: use memory scope and setup descriptor also
    for barriers without defined scope (Curro),
    use local scope and flush type none with
    NIR_SCOPE_NONE scope, cleanups (Lionel)

v3: use LSC_FENCE_THREADGROUP for NIR_SCOPE_WORKGROUP,
    remove default case (Curro), use eviction if scope
    was not defined, use LSC_FENCE_GPU scope for vertex
    stage

v4: use LSC_FENCE_TILE independent of stage for device
    scope (Curro)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15743>
2022-06-12 12:29:47 +03:00
Kenneth Graunke
a8e718c7e5 intel/compiler: Fix A64 header construction with a uniform address
fs_visitor::assign_curb_setup() maps UNIFORM registers to HW regs,
and contains the following assert:

            assert(inst->src[i].stride == 0);

emit_a64_oword_block_header's striding tricks run afoul of this
restriction, by producing stride 1 values on a 64-bit UNIFORM source.

Work around this by copying the UNIFORM value to a VGRF first.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16938>
2022-06-10 02:14:57 +00:00
Emma Anholt
464b32c030 glsl: Drop the div-to-mul-rcp lowering for floats.
NIR has fdiv, and all the NIR backends have to have lower_fdiv set
appropriately already since various passes (format conversions,
tgsi_to_nir, nir_fast_normalize(), etc.) might generate one.

This causes softpipe and llvmpipe to now do actual divides, since
lower_fdiv is not set there.  Note that llvmpipe's rcp implementation is a
divide of 1.0 by x, so now we're going to be just doing div(x, y) instead
of mul(x, div(1.0, y)).

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16823>
2022-06-07 02:38:42 +00:00
Erik Faye-Lund
2a134347cb intel/compiler: use macro for power-of-two check
This will allow the use of static_assert here instead of our
compiler-specific implementation.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16670>
2022-06-03 07:14:43 +00:00
Paulo Zanoni
72a7d7d7a8 intel/compiler: call ordered_unit() only once at update_inst_scoreboard()
Call it once instead of calling the very same function for each source
and destination. This should make those ternary operators a little
easier to read, IMHO.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
2022-06-02 23:04:39 +00:00
Paulo Zanoni
2256314b08 intel/compiler: split handling of 64 bit floats and ints
In opt_algebraic(), handle TYPE_DF in a different check than TYPE_Q. We have a
separate flag for each type, use separate checks so platforms where one is true
and the other is not can work properly.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
2022-06-02 23:04:39 +00:00
Paulo Zanoni
8f02e6cb19 intel/compiler: compute int64_options based on devinfo->has_64bit_int
Don't compute it based on devinfo->has_64bit_float. Othwerwise we may
end up emitting 64bit-int (Q) instructions on platforms with 64bit
floats but not 64bit integers.

Right now, the only platforms where has_64bit_int is different from
has_64bit_float are the platforms that use GFX7_FEATURES.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15835>
2022-06-02 23:04:39 +00:00
Kenneth Graunke
26bb81f3f6 intel/compiler: Fix uncompaction of signed word immediates on Tigerlake
This expression accidentally performs a 32-bit sign-extension when
processing the second half of the expression (the low 16 bits).

Consider -7W, which is represented as 0xfff9fff9 in our encoding (the
16-bit word is replicated to both halves of the 32-bit dword).

Tigerlake's compaction stores the low 11-bits of an immediate as-is,
and replicates the 12th bit.  So here, compacted_imm will be 0xff9.

   (  (int)(0xff9 << 20) >> 4) |
   ((short)(0xff9 <<  4) >> 4))

   0xfff90000 | (0xff90 >> 4)
   0xfff90000 | 0xfffffff9 ...oops...
   0xfffffff9

By casting the second line of the expression to unsigned short, we
prevent the sign-extension when it combines both parts, so we get:

   0xfff90000 | 0x0000fff9
   0xfff9fff9

Fixes: 12d3b11908 ("intel/compiler: Add instruction compaction support on Gen12")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16833>
2022-06-02 13:59:38 -07:00
Jason Ekstrand
dfedeccc13 intel: Only set VectorMaskEnable when needed
For cases with lots of very small primitives, this may improve
performance because we're not executing those dead channels all the
time.

Shader-db reports no instruction or cycle-count changes.  However, by
hacking up the driver to report when this optimization triggers, it
appears to affect about 10% of shader-db.

v2 (Kenneth Graunke): Always enable VMask prior to XeHP for now,
because using VMask on those platforms allows us to perform the
eliminate_find_live_channel() optimization.  However, XeHP doesn't
seem to have packed fragment shader dispatch, so we lose that
optimization regardless, and there's no reason not to avoid vmask.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1054>
2022-05-27 21:52:48 +00:00
Jason Ekstrand
1b9248e761 intel/fs: Copy color_outputs_valid into wm_prog_data
Fixes: 36ee2fd61c ("anv: Implement the basic form of VK_EXT_transform_feedback")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
2022-05-27 14:33:53 +00:00
Jason Ekstrand
8379993223 intel/fs: Drop fs_visitor::emit_alpha_to_coverage_workaround()
It no longer exists.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16506>
2022-05-27 14:33:53 +00:00
Lionel Landwerlin
e666089082 intel/disasm: add missing handling of <1;1,0>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7cd9adeb41 ("intel/compiler: In XeHP prefer <1;1,0> regions before compacting")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16704>
2022-05-26 06:42:16 +00:00
Kenneth Graunke
9886615958 intel/compiler: Move spill/fill tracking to the register allocator
Originally, we had virtual opcodes for scratch access, and let the
generator count spills/fills separately from other sends.  Later, we
started using the generic SHADER_OPCODE_SEND for spills/fills on some
generations of hardware, and simply detected stateless messages there.

But then we started using stateless messages for other things:
- anv uses stateless messages for the buffer device address feature.
- nir_opt_large_constants generates stateless messages.
- XeHP curbe setup can generate stateless messages.

So counting stateless messages is not accurate.  Instead, we move the
spill/fill accounting to the register allocator, as it generates such
things, as well as the load/store_scratch intrinsic handling, as those
are basically spill/fills, just at a higher level.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16691>
2022-05-25 06:56:01 +00:00
Kenneth Graunke
59bfc9c6cb intel: Fix analysis invalidation in eliminate_find_live_channel
If we saw a HALT instruction, we would forget to invalidate our analysis
pass information before returning progress.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16677>
2022-05-24 22:36:39 +00:00
Timothy Arceri
d7a071a28f gallium/drivers: set force_indirect_unrolling_sampler for all required drivers
This is set to true for all drivers that have a GLSL level
of support lower than 4.00. This matches the rule for setting the
GLSL IR option EmitNoIndirectSampler.

Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
2022-05-17 02:12:21 +00:00
Marcin Ślusarz
9acb30c8c4 intel/compiler: implement primitive shading rate for mesh
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16030>
2022-05-13 13:05:51 +00:00
Marcin Ślusarz
29a778fa6b intel/compiler: print name of the unhandled intrinsic
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>
2022-05-13 09:43:02 +00:00
Marcin Ślusarz
65ff6932dc intel/compiler: handle gl_Viewport and gl_Layer in FS URB setup
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>
2022-05-13 09:43:02 +00:00
Marcin Ślusarz
040062df41 intel/compiler: handle VARYING_SLOT_CULL_PRIMITIVE in mesh
It's needed for gl_MeshPerPrimitiveNV[].gl_ViewportMask

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16493>
2022-05-13 09:43:02 +00:00
Vadym Shovkoplias
55c71217ec driconf: Add a limit_trig_input_range option
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.

-v2: Add limit_trig_input_range option to prog_key to update shader
     cache (Lionel)

Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
2022-05-13 06:47:53 +00:00
Jason Ekstrand
352e32e5ba nir/builder: Add a nir_trim_vector helper
This pattern pops up a bunch and the semantics of nir_channels() aren't
very convenient much of the time.  Let's add a nir_trim_vector() which
matches nir_pad_vector().

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16309>
2022-05-11 14:47:33 +00:00
Karol Herbst
9c5fd100cc nir: add a nir_remove_non_entrypoints helper
This code just got duplicated a lot. There is still more, but the
remaining instances do a bit more than just removing other functions.

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16348>
2022-05-10 03:37:44 +00:00
Emma Anholt
3a42e92a4f glsl: Drop the dead MOD_TO_FLOOR path.
It's now called lower_fmod in NIR.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8044>
2022-05-05 22:25:03 +00:00
Caio Oliveira
7cd9adeb41 intel/compiler: In XeHP prefer <1;1,0> regions before compacting
Ken performed some tests with shader-db to evaluate the effects

```
Across all 145,848 shaders generated, the results were:

Total bytes compacted before: 3,326,224
Total bytes compacted after: 60,963,280
```

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15399>
2022-05-02 18:03:01 +00:00
Emma Anholt
536c8ee96d nir/lower_tex: Make the adding a 0 LOD to nir_op_tex in the VS optional.
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.

All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
  shadow2darray with NIR, since the shading languages don't expose txl of
  those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
2022-04-28 21:26:08 +00:00
Lionel Landwerlin
8ef8e72aac intel/fs: tidy up lower of ray queries
We already expect a single function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15946>
2022-04-19 12:56:06 +00:00
Marcin Ślusarz
5dace41c10 intel/compiler: invalidate metadata in brw_nir_initialize_mue
New "if" blocks may have been inserted.

Fixes: bc4f8c073a ("intel/compiler: inject MUE initialization")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>
2022-04-19 11:43:55 +00:00
Marcin Ślusarz
4fddef33d5 intel/compiler: invalidate all metadata in brw_nir_lower_intersection_shader
New "if" blocks were inserted.

Fixes: 303378e1dd ("intel/rt: Add lowering for combined intersection/any-hit shaders")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15924>
2022-04-19 11:43:55 +00:00
Alexey Bozhenko
2d7d907ad1 intel/compiler: fix singleton pointer coverity warning
fix brw_kernel::stats member that was declared as a variable
but used as a pointer to array of 3 elements

CID: 1503279

Signed-off-by: Bozhenko Alexey <oleksii.bozhenko@globallogic.com>

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15975>
2022-04-19 12:36:10 +03:00
Lionel Landwerlin
04bd007757 intel/fs: require memory fence commit bit on Gfx9
Fixes a hang on Gfx9 GT1 : dEQP-VK.compute.zero_initialize_workgroup_memory.max_workgroup_memory.128

Tested-by: Mark Janes <markjanes@swizzler.org>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15596>
2022-04-17 21:24:17 +00:00
Jason Ekstrand
ad0dc8e4ab intel/compiler: Set lower_fisnormal
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15985>
2022-04-16 00:26:43 +00:00
Lionel Landwerlin
e11bedb9f5 intel/fs: add a note on possible optimization of root node address
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
2022-04-13 11:24:49 +00:00
Lionel Landwerlin
9c0805ef91 intel/fs: fix metadata preserve on trace_ray intrinsic
c78be5da30 ("intel/fs: lower ray query intrinsics") introduced a
helper function using nir_(push|pop)_if which invalidated dominance &
block_index for the replacement of nir_intrinsic_rt_trace_ray.

We can still keep dominance/block_index metadata for the lowering of
nir_intrinsic_rt_execute_callable though.

This change uses 2 different lowering function with correct metadata
preservation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15910>
2022-04-13 11:24:49 +00:00
Jason Ekstrand
69b5424ea4 intel/nir: Lower 8 and 16-bit bitwise unops
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>
2022-04-12 23:19:38 +00:00
Jason Ekstrand
a482877c70 intel/fs: Implement 16-bit [ui]mul_high
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15829>
2022-04-12 23:19:38 +00:00
Mykhailo Skorokhodov
9c7e750ffe intel/fs: Enable b2f(inot(a)) and b2i(inot(a)) optimization for Gfx12+
The commit enables the optimization for Intel Gfx12+ graphics.

Tigerlake
```
total instructions in shared programs: 1289326 -> 1289015 (-0.02%)
instructions in affected programs: 37841 -> 37530 (-0.82%)
helped: 78
HURT: 9
helped stats (abs) min: 1 max: 26 x̄: 4.69 x̃: 3
helped stats (rel) min: 0.10% max: 12.50% x̄: 2.07% x̃: 1.21%
HURT stats (abs)   min: 1 max: 18 x̄: 6.11 x̃: 4
HURT stats (rel)   min: 0.16% max: 1.95% x̄: 0.94% x̃: 0.61%
95% mean confidence interval for instructions value: -4.95 -2.20
95% mean confidence interval for instructions %-change: -2.34% -1.18%
Instructions are helped.

total cycles in shared programs: 105606388 -> 105606442 (<.01%)
cycles in affected programs: 620119 -> 620173 (<.01%)
helped: 49
HURT: 28
helped stats (abs) min: 2 max: 3618 x̄: 228.63 x̃: 12
helped stats (rel) min: 0.02% max: 23.31% x̄: 4.60% x̃: 1.11%
HURT stats (abs)   min: 1 max: 2142 x̄: 402.04 x̃: 29
HURT stats (rel)   min: 0.01% max: 36.42% x̄: 5.01% x̃: 0.46%
95% mean confidence interval for cycles value: -151.80 153.20
95% mean confidence interval for cycles %-change: -3.00% 0.79%
Inconclusive result (value mean confidence interval includes 0).
```

Related-to: 7725d60938
Signed-off-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14017>
2022-04-12 10:55:05 +00:00
Ian Romanick
b5fa43952a intel/fs: Better handle constant sources of FS_OPCODE_PACK_HALF_2x16_SPLIT
I noticed that a *LOT* of fragment shaders in Shadow of the Tomb Raider,
for instance, end up with a sequence of NIR like:

    vec1 32 ssa_2 = load_const (0x00000000 = 0.000000)
    ...
    vec1 32 ssa_191 = pack_half_2x16_split ssa_188, ssa_2
    vec1 32 ssa_192 = pack_half_2x16_split ssa_189, ssa_2
    vec1 32 ssa_193 = pack_half_2x16_split ssa_190, ssa_2

This results in an assembly sequence like:

    mov(8)          g28<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g28<8,8,1>F
    shl(8)          g21<1>UD        g21<8,8,1>UD    0x00000010UD
    mov(8)          g21<2>HF        g25<8,8,1>F
    mov(8)          g19<2>HF        g28<8,8,1>F
    shl(8)          g19<1>UD        g19<8,8,1>UD    0x00000010UD
    mov(8)          g19<2>HF        g23<8,8,1>F
    mov(8)          g20<2>HF        g28<8,8,1>F
    shl(8)          g20<1>UD        g20<8,8,1>UD    0x00000010UD
    mov(8)          g20<2>HF        g24<8,8,1>F

After this commit, the generated assembly is:

    mov(8)          g21<1>UD        0x00000000UD
    mov(8)          g21<2>HF        g23<8,8,1>F
    mov(8)          g19<1>UD        0x00000000UD
    mov(8)          g19<2>HF        g17<8,8,1>F
    mov(8)          g20<1>UD        0x00000000UD
    mov(8)          g20<2>HF        g18<8,8,1>F

Tiger Lake, Ice Lake, Skylake, and Haswell had similar results. (Ice Lake shown)
total instructions in shared programs: 20119086 -> 20119034 (<.01%)
instructions in affected programs: 9056 -> 9004 (-0.57%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 16 x̄: 6.50 x̃: 4
helped stats (rel) min: 0.29% max: 1.75% x̄: 1.00% x̃: 0.98%
95% mean confidence interval for instructions value: -11.01 -1.99
95% mean confidence interval for instructions %-change: -1.56% -0.44%
Instructions are helped.

total cycles in shared programs: 861019414 -> 861021044 (<.01%)
cycles in affected programs: 279862 -> 281492 (0.58%)
helped: 4
HURT: 2
helped stats (abs) min: 6 max: 936 x̄: 239.00 x̃: 7
helped stats (rel) min: 0.03% max: 8.13% x̄: 2.09% x̃: 0.09%
HURT stats (abs)   min: 18 max: 2568 x̄: 1293.00 x̃: 1293
HURT stats (rel)   min: 0.36% max: 1.14% x̄: 0.75% x̃: 0.75%
95% mean confidence interval for cycles value: -972.56 1515.89
95% mean confidence interval for cycles %-change: -4.77% 2.49%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 17812327 -> 17812263 (<.01%)
instructions in affected programs: 9867 -> 9803 (-0.65%)
helped: 8
HURT: 0
helped stats (abs) min: 2 max: 28 x̄: 8.00 x̃: 4
helped stats (rel) min: 0.32% max: 1.80% x̄: 1.00% x̃: 0.95%
95% mean confidence interval for instructions value: -15.46 -0.54
95% mean confidence interval for instructions %-change: -1.54% -0.47%
Instructions are helped.

total cycles in shared programs: 904768620 -> 904773291 (<.01%)
cycles in affected programs: 454799 -> 459470 (1.03%)
helped: 4
HURT: 4
helped stats (abs) min: 36 max: 586 x̄: 344.50 x̃: 378
helped stats (rel) min: 0.47% max: 4.04% x̄: 2.01% x̃: 1.77%
HURT stats (abs)   min: 1 max: 5572 x̄: 1512.25 x̃: 238
HURT stats (rel)   min: <.01% max: 2.77% x̄: 1.46% x̃: 1.53%
95% mean confidence interval for cycles value: -1122.40 2290.15
95% mean confidence interval for cycles %-change: -2.26% 1.71%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 18581 -> 18579 (-0.01%)
spills in affected programs: 323 -> 321 (-0.62%)
helped: 1
HURT: 0

total fills in shared programs: 24985 -> 24981 (-0.02%)
fills in affected programs: 1348 -> 1344 (-0.30%)
helped: 1
HURT: 0

Tiger Lake, Ice Lake, and Skylake had similar results. (Ice Lake shown)
Instructions in all programs: 143585431 -> 143513657 (-0.0%)
Instructions helped: 14403

Cycles in all programs: 8439312778 -> 8439371578 (+0.0%)
Cycles helped: 10570
Cycles hurt: 3290

Gained: 146
Lost: 74

All of the lost and gained fossil-db shaders are SIMD32 fragment
shaders.  14,247 of the affected shaders are from Shadow of the Tomb
Raider.  154 are from Batman Arkham Origins, and the remaining two are
from Octopath Traveler.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15089>
2022-04-07 18:26:23 +00:00
Ian Romanick
c08302670b intel/compiler: Fix sample_d messages on DG2
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces.  The
maximum number of gradient components and coordinate components should
be 2.  In spite of this limitation, the Bspec lists a mysterious R
component before the min_lod, so the maximum coordinate components is 3.

Fixes the following Vulkan CTS failures on DG2:

    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_fixed_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2d_float_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1d_fragment
    dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2d_fragment

The Fixes: tag below is a bit misleading. This commit fixes some test
cases similar to ones fixed by the Fixes: commit.  I just want to make
sure this commit gets applied everywhere that commit was also applied.

Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15781>
2022-04-07 17:09:28 +00:00
Lionel Landwerlin
b5031bd6f7 intel/nir: don't report progress on rayqueries if no queries
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15769>
2022-04-07 08:24:19 +00:00