Commit Graph

2265 Commits

Author SHA1 Message Date
Ian Romanick
db20412168 intel/fs: Fix constant propagation into 32x16 integer multiplication
Don't copy propagate the constant in situations like

    mov(8)          g8<1>D          0x7fffffffD
    mul(8)          g16<1>D         g8<8,8,1>D      g15<16,8,2>W

On platforms that only have a 32x16 multiplier, this will result in
lowering the multiply to

    mul(8)          g15<1>D         g14<8,8,1>D     0xffffUW
    mul(8)          g16<1>D         g14<8,8,1>D     0x7fffUW
    add(8)          g15.1<2>UW      g15.1<16,8,2>UW g16<16,8,2>UW

On Gfx8 and Gfx9, which have the full 32x32 multiplier, it results in

    mul(8)          g16<1>D         g15<16,8,2>W    0x7fffffffD

Volume 2a of the Skylake PRM says:

    When multiplying a DW and any lower precision integer, the
    DW operand must on src0.

See also https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/104.

Previous to INTEL_shader_integer_functions2 (in Vulkan or OpenGL), I
don't think it would be possible to create a situation where this could
occur.  I discovered this via some optimizations that can determine that
the non-constant source must be able to fit in 16-bits.  The case listed
above came from piglit's "ext_transform_feedback-order arrays points"
with those optimizations in place.

No shader-db or fossil-db changes on any Intel platform.

Fixes: de6c0f8487 ("intel/fs: Implement support for NIR opcodes for INTEL_shader_integer_functions2")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
2022-11-08 00:02:16 +00:00
Francisco Jerez
5d4df3ac23 intel/compiler: Run extra fp64 lowering pass on devices that don't support int64.
In some cases nir_lower_int64 will emit fp64 operations which aren't
natively supported on any Intel hardware (e.g. ftrunc, frem).  An
extra pass of nir_opt_algebraic (for frem) and nir_lower_doubles is
required in order to take care of them.  This fixes several int64
test-cases on MTL hardware.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mykhailo Skorokhodov <mykhailo.skorokhodov@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19390>
2022-11-07 07:35:22 +00:00
Illia Abernikhin
aa4ac5ff8b utils: Merge util/debug.* into util/u_debug.* and remove util/debug.*
Rename env_var_as_unsigned() -> debug_get_num_option(), because duplicate
Rename env_var_as_bool() -> debug_get_bool_option(), because duplicate

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7177

Signed-off-by: Illia Abernikhin <illia.abernikhin@globallogic.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19336>
2022-11-02 07:25:39 +00:00
Kenneth Graunke
88756cee8d intel/compiler: Run nir_opt_large_constants before scalarizing consts
nir_opt_large_constants balks at seeing a store_deref of a variable
where the source is a vecN operation of multiple load_consts, and thinks
that isn't a constant, so it should not bother promoting it.

Unfortunately, we were running nir_lower_load_const_to_scalar before
nir_opt_large_constants, so this prevented a ton of constant promotion.

This commit /used to help/ some shaders in shader-db. Presumably since
!16770 landed, those shaders were already helped.  Currently ther are
no shader-db changes on any Intel platform.

Fossil-db results:

All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141998227 -> 141421756 (-0.4%)
Instructions helped: 12515
Instructions hurt: 237

SENDs in all programs: 7437925 -> 7468033 (+0.4%)
SENDs hurt: 12806

Cycles in all programs: 9161655753 -> 9132869800 (-0.3%)
Cycles helped: 10163
Cycles hurt: 2637

Spills in all programs: 19977 -> 18678 (-6.5%)
Spills helped: 384
Spills hurt: 40

Fills in all programs: 32863 -> 31396 (-4.5%)
Fills helped: 385
Fills hurt: 42

Lost: 1

Lots of Shadow of the Tomb Raider fragment shaders and Batman Arkham
Origins vertex shaders were hurt for SENDs in this commit.  A couple
Aztec Ruins compute shaders and Spaceship shaders (multiple stages)
were also hurt.

All of the shaders hurt for spills or fills were Spaceship compute
shaders.  Nearly all of the shaders helped were Shadow of the Tomb
Raider fragmenet shaders.  One Spaceship shader was reall, REALLY helped:

Spills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 321 -> 13 (-96.0%)
Fills helped fossils/fossil-db/Spaceship.run.9f90a2a226fcc57f.1.foz/0b507d3abe2e3c28/compute: 279 -> 21 (-92.5%)

Overall this seems like an improvement, but we may want to actually
run these few benchmarks before landing.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16539>
2022-11-01 14:55:21 -07:00
Lionel Landwerlin
920aed2121 intel/compiler: don't allocate compaction arrays on the stack
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7569
Cc: mesa-stable
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19339>
2022-10-28 07:10:58 +00:00
Lionel Landwerlin
e59c4a912b intel/fs: use fs implementation of dump_instructions
This specialized version prints out the liveness count as well as the
maximum liveness count. It was eye opening when seeing the max
liveness jump after lowering of packing instructions which should not
have changed the count.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
e5dfff0946 intel/fs: reduce liveness of variables in lowering passes
When lowering a single instruction with a destination VGRF to 2 or
more, the VGRF is now considered partially written by each generated
instruction and that increases its liveness especially in loops. Thus
potentially increasing the number of spills/fills due to register
allocation.

Putting an UNDEF instruction in front of the lowered instructions
allows the IR to limit the liveness of the VGRF, reducing register
pressure.

This has a pretty dramatic effect on spills/fills for RT shaders. Here
the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to
register allocation) :

Instructions in all programs: 26150 -> 24955 (-4.6%)
SENDs in all programs: 1148 -> 1148 (+0.0%)
Loops in all programs: 4 -> 4 (+0.0%)
Cycles in all programs: 392179 -> 332787 (-15.1%)
Spills in all programs: 132 -> 116 (-12.1%)
Fills in all programs: 262 -> 154 (-41.2%)

Shader-db results on TGL :

total instructions in shared programs: 21158140 -> 21158377 (<.01%)
instructions in affected programs: 76629 -> 76866 (0.31%)
helped: 18
HURT: 20
helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12
helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77%
HURT stats (abs)   min: 1 max: 79 x̄: 28.85 x̃: 18
HURT stats (rel)   min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79%
95% mean confidence interval for instructions value: -4.82 17.30
95% mean confidence interval for instructions %-change: -0.34% 0.57%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5753 -> 5753 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 798856834 -> 798870688 (<.01%)
cycles in affected programs: 6208395 -> 6222249 (0.22%)
helped: 22
HURT: 17
helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782
helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44%
HURT stats (abs)   min: 2 max: 19178 x̄: 2676.12 x̃: 1358
HURT stats (rel)   min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71%
95% mean confidence interval for cycles value: -952.19 1662.65
95% mean confidence interval for cycles %-change: -0.64% 1.90%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4078 -> 4066 (-0.29%)
spills in affected programs: 40 -> 28 (-30.00%)
helped: 2
HURT: 0

total fills in shared programs: 2856 -> 2832 (-0.84%)
fills in affected programs: 127 -> 103 (-18.90%)
helped: 2
HURT: 0

total sends in shared programs: 998554 -> 998554 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
dd6d40429b intel/fs: make split_virtual_grfs deal with partial undefs
v2: fix up UNDEFs instructions (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
14b99df7d9 intel/fs: require UNDEFs register offsets to be aligned to REG_SIZE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Alyssa Rosenzweig
941c37c085 nir/lower_idiv: Remove imprecise_32bit_lowering
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: the results when
setting this flag "imprecise", they're completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer divison... but in practice, non-constant highp integer
division is pretty rare.

After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.

Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
2022-10-27 19:37:14 +00:00
Jordan Justen
c238699afa intel/compiler: Broadcast lower code should check 64-bit int support
This will affect MTL which will have fp64 support without int64
support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>
2022-10-27 09:22:09 +00:00
Lionel Landwerlin
2da7ec0db9 intel/clc: assert when libclc shader is not found
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7483
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19091>
2022-10-27 08:53:55 +00:00
Tapani Pälli
1e51383258 intel/compiler: run nir_opt_idiv_const before nir_lower_idiv
Integer div lowering can potentially create a lot of code that is
not removed later on. Running const lowering pass first can be used
to eliminate that code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19157>
2022-10-20 15:35:48 +03:00
Luis Felipe Strano Moraes
eff1517cd7 anv: added proper handling for input argument in intel_clc
That was previously listed on the getopt_long struct but not actually
being used. This makes intel_clc argument processing easier as now
all of its arguments are handled with getopt and anything after the
special argument '--' is passed along to clang to form the final build
command.

Thanks to Dylan Baker for help with changes to the meson file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8de02ff980 anv: fixing typo on description of output flag for intel_clc
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
056d72c897 anv: add missing separator to help for intel_clc
intel_clc relies on the special argument '--' for getopt to be given so
it knows when to start expecting purely input files or clang arguments.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8e1f03ada0 anv: reword info flag in intel_clc's getopt to avoid clash
The info keyword was using the same short description that
was listed for input files on the struct for long_options.

Rewording it to 'v' and 'verbose' to be more in line with
expectations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Iván Briano
d9747169b6 anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_*
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_AABBS_BIT_KHR and
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_TRIANGLES_BIT_KHR, when specified,
make TraceRay behave as if the corresponding shader flags were set, but
without affecting the value of IncomingRayFlags in shaders.

v2 (Lionel): Improve comments

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19152>
2022-10-20 00:03:55 +00:00
Rohan Garg
43169dbbe5 intel/compiler: Support 16 bit float ops
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>
2022-10-17 15:56:28 +02:00
Kenneth Graunke
2dfab687ec intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes [v2]
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.

The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling.  Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.

We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.

One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains).  That
said, it's still better than before.

For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).

shader-db results on Icelake:

   total instructions in shared programs: 19600314 -> 19597528 (-0.01%)
   instructions in affected programs: 65338 -> 62552 (-4.26%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
   helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
   95% mean confidence interval for instructions value: -10.71 -9.85
   95% mean confidence interval for instructions %-change: -6.17% -5.43%
   Instructions are helped.

   total cycles in shared programs: 851842332 -> 851808165 (<.01%)
   cycles in affected programs: 618577 -> 584410 (-5.52%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 64 max: 540 x̄: 126.08 x̃: 111
   helped stats (rel) min: 2.57% max: 37.97% x̄: 6.12% x̃: 5.06%
   95% mean confidence interval for cycles value: -135.35 -116.80
   95% mean confidence interval for cycles %-change: -6.67% -5.57%
   Cycles are helped.

   total sends in shared programs: 1025238 -> 1024308 (-0.09%)
   sends in affected programs: 6454 -> 5524 (-14.41%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
   helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
   95% mean confidence interval for sends value: -3.57 -3.29
   95% mean confidence interval for sends %-change: -15.42% -14.54%
   Sends are helped.

According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.

v2: Fix assertions about number of components and add more of them.
    Combine the quads and triangles handling as it's nearly identical.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19061>
2022-10-13 11:38:21 -07:00
Tapani Pälli
c9c9a5b78d intel/fs: mark debug variables with ASSERTED
To clean up compilation warnings about unused variables
when asserts are disabled.

v2: UNUSED -> ASSERTED (Eric Engestrom)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>
2022-10-11 04:14:30 +00:00
Caio Oliveira
6cda887ac6 intel/compiler: Explicitly include build-id when linking intel_clc
Ensure that the program will have a build-id.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18924>
2022-10-01 16:42:07 +00:00
Kenneth Graunke
b61b1d5a4c Revert "intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes"
This reverts commit abba55382f.
The assertions I added late in the process broke shader-db, and my
quick fix broke CI, so let's just revert it for now and I'll resubmit
this later when it's working better.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7385
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18895>
2022-09-29 17:39:18 -07:00
Marcin Ślusarz
9bac88856d intel/compiler: fix loading of draw_id from task & mesh payload
Previously both destination and source were floats, so no casting was
performed, but with 7664c85b1d source register was reinterpreted as
unsigned integer, so MOV started casting that integer to float.

Fixes: 7664c85b1d ("intel/compiler: Create and use struct for TASK and MESH thread payloads")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18886>
2022-09-29 17:17:25 +00:00
Lionel Landwerlin
23c7142cd6 anv: disable SIMD16 for RT shaders
Since divergence is a lot more likely in RT than compute, it makes
sense to limit ourselves to SIMD8.

The trampoline shader defaults to SIMD16 since this one is uniform.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:37 +00:00
Lionel Landwerlin
8fc7a98e31 intel/fs: disable split_array_vars on opencl kernels
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
57593c5395 intel/nir: disable assert on async stack id
This can be accessed from :
   - RT shaders
   - CS trampoline shader

We missed the second part here.

Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
8d580de4a9 intel/nir: fix potential invalid function impl ptr usage
We keep the nir_builder::impl value around, but we've run some passes
that might have change the main function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 96fde5518b ("intel/rt: Add a helper to create the raygen trampoline shader")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
1ffd28149f intel/nir: fixup preserved metadata in rayquery lowering
Another case of not clearing the metadata correctly.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
9dba8d8aa1 intel/fs: take a builder arg for resolve_source_modifiers()
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
649cdc617f intel/nir: reuse rt helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
57f1e95102 intel/rt: fix procedural primitive ID access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Jason Ekstrand
aea88f16df intel/fs: SEL_EXEC uses the integer pipe for 64-bit stuff
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Jason Ekstrand
c80c0ed943 intel/fs: Always use integer types for indirect MOVs
There's a new Gen12.5 restriction which forbids using the VxH or Vx1 on
the floating-point pipe.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
c6a7f4b34e intel/devinfo: Rename & implement num_dual_subslices
v2: Use the upper bound of dual subslices as the ID is not remapped
with fused off parts and this is what we'll use for a bunch of
computation in RT.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Kenneth Graunke
abba55382f intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.

The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling.  Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.

We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.

One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains).  That
said, it's still better than before.

For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).

shader-db results on Icelake:

   total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
   instructions in affected programs: 65338 -> 62552 (-4.26%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
   helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
   95% mean confidence interval for instructions value: -10.71 -9.85
   95% mean confidence interval for instructions %-change: -6.17% -5.43%
   Instructions are helped.

   total cycles in shared programs: 851854659 -> 851820320 (<.01%)
   cycles in affected programs: 618749 -> 584410 (-5.55%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
   helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
   95% mean confidence interval for cycles value: -135.89 -117.54
   95% mean confidence interval for cycles %-change: -6.72% -5.63%
   Cycles are helped.

   total sends in shared programs: 1025285 -> 1024355 (-0.09%)
   sends in affected programs: 6454 -> 5524 (-14.41%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
   helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
   95% mean confidence interval for sends value: -3.57 -3.29
   95% mean confidence interval for sends %-change: -15.42% -14.54%
   Sends are helped.

According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:56 -07:00
Kenneth Graunke
be21d54aca intel/compiler: Use an existing URB write to end TCS threads when viable
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set.  For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.

In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders.  We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow.  So we can't guarantee that there's
a URB write being at the end of the shader.

Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so.  But on other platforms, there's no such
field, and this write is purely wasteful.

Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.

No changes in shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:42 -07:00
Lionel Landwerlin
e76e3d9cea intel/nir/rt: fixup alignment of memcpy iterations
Not sure if fixes anything because it's always 16 at least, but this
is more correct.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
139e8f4635 intel/fs: fixup a64 messages
And run algebraic when either int64 for float64 are not supported so
those don't end up in the generated code.

Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
838bbdcf2e intel/nir/rt: store ray query state in scratch
Initially I tried to store ray query state in the RT scratch space but
got the offset wrong. In the end putting this in the scratch surface
makes more sense, especially for non RT stages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
f7fab09a07 intel/nir/rt: change scratch check validation
It's very unfortunate that we have the RT scratch being conflated with
the usual scratch. In our implementation those are 2 different buffers.

The usual scratch access are done through the scratch surface state
(delivered through thread payload), while RT scratch (which outlives
thread dispatch with shader calls) is its own buffer.

So checking the NIR scratch size makes no sense as we can have normal
scratch accesses completely unrelated to RT scratch accesses.

This change switches the validation by looking at whether the scratch
base pointer intrinsic is being used (which is what we use/abuse to
implement RT scratch).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
259b1647e6 intel/nir/rt: fix ray query proceed level
Initially the level is world (top level), then it's whatever level the
potential hit is.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
3f01071c79 intel/nir/rt: remove ray query mem hit writes at initialization
This will not even be read by HW.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
f843bec7de intel/nir/rt: spill/fill the entire ray query data
We need the traversal stack to saved/restored along with mem hits.
Total spill/fill is 256bytes.

We can potentially optimize this but we have to be very careful about
what state the query is in.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
a88f725eea intel/nir/rt: fixup generate hit
This function copies the potential hit from its memory location to the
committed hit location. A couple of fields got their bit offset wrong.

Fixes some CTS tests in dEQP-VK.ray_query.*

v2: Copy primitive/instance leaf pointers

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Marcin Ślusarz
ac8020ebfd intel/compiler: add support for 8/16 bits task payload loads
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
ac581b30ec intel/compiler: refactor brw_nir_lower_mem_access_bit_sizes
Change dup_mem_intrinsic return type.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
a31b8fa38b intel/compiler/task: use shared memory for small task payload loads & stores
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
37e78803d7 intel/compiler: use nir_lower_task_shader pass
This implements task payload atomics in ANV.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16852>
2022-09-20 18:04:29 +00:00
Marcin Ślusarz
3c96959bbc intel/compiler: print shader after successful brw_nir_lower_shading_rate_output
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>
2022-09-20 17:23:45 +00:00