Commit Graph

2461 Commits

Author SHA1 Message Date
Lionel Landwerlin
920aed2121 intel/compiler: don't allocate compaction arrays on the stack
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7569
Cc: mesa-stable
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19339>
2022-10-28 07:10:58 +00:00
Lionel Landwerlin
e59c4a912b intel/fs: use fs implementation of dump_instructions
This specialized version prints out the liveness count as well as the
maximum liveness count. It was eye opening when seeing the max
liveness jump after lowering of packing instructions which should not
have changed the count.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
e5dfff0946 intel/fs: reduce liveness of variables in lowering passes
When lowering a single instruction with a destination VGRF to 2 or
more, the VGRF is now considered partially written by each generated
instruction and that increases its liveness especially in loops. Thus
potentially increasing the number of spills/fills due to register
allocation.

Putting an UNDEF instruction in front of the lowered instructions
allows the IR to limit the liveness of the VGRF, reducing register
pressure.

This has a pretty dramatic effect on spills/fills for RT shaders. Here
the stats on Q2RTX shaders on DG2 (wipping out any spills/fills due to
register allocation) :

Instructions in all programs: 26150 -> 24955 (-4.6%)
SENDs in all programs: 1148 -> 1148 (+0.0%)
Loops in all programs: 4 -> 4 (+0.0%)
Cycles in all programs: 392179 -> 332787 (-15.1%)
Spills in all programs: 132 -> 116 (-12.1%)
Fills in all programs: 262 -> 154 (-41.2%)

Shader-db results on TGL :

total instructions in shared programs: 21158140 -> 21158377 (<.01%)
instructions in affected programs: 76629 -> 76866 (0.31%)
helped: 18
HURT: 20
helped stats (abs) min: 1 max: 60 x̄: 18.89 x̃: 12
helped stats (rel) min: 0.21% max: 3.61% x̄: 1.02% x̃: 0.77%
HURT stats (abs)   min: 1 max: 79 x̄: 28.85 x̃: 18
HURT stats (rel)   min: 0.04% max: 2.81% x̄: 1.13% x̃: 0.79%
95% mean confidence interval for instructions value: -4.82 17.30
95% mean confidence interval for instructions %-change: -0.34% 0.57%
Inconclusive result (value mean confidence interval includes 0).

total loops in shared programs: 5753 -> 5753 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0

total cycles in shared programs: 798856834 -> 798870688 (<.01%)
cycles in affected programs: 6208395 -> 6222249 (0.22%)
helped: 22
HURT: 17
helped stats (abs) min: 2 max: 8794 x̄: 1438.18 x̃: 782
helped stats (rel) min: 0.05% max: 2.28% x̄: 0.63% x̃: 0.44%
HURT stats (abs)   min: 2 max: 19178 x̄: 2676.12 x̃: 1358
HURT stats (rel)   min: 0.04% max: 23.49% x̄: 2.25% x̃: 0.71%
95% mean confidence interval for cycles value: -952.19 1662.65
95% mean confidence interval for cycles %-change: -0.64% 1.90%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 4078 -> 4066 (-0.29%)
spills in affected programs: 40 -> 28 (-30.00%)
helped: 2
HURT: 0

total fills in shared programs: 2856 -> 2832 (-0.84%)
fills in affected programs: 127 -> 103 (-18.90%)
helped: 2
HURT: 0

total sends in shared programs: 998554 -> 998554 (0.00%)
sends in affected programs: 0 -> 0
helped: 0
HURT: 0

LOST:   0
GAINED: 0

Total CPU time (seconds): 2346.06 -> 2304.80 (-1.76%)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
dd6d40429b intel/fs: make split_virtual_grfs deal with partial undefs
v2: fix up UNDEFs instructions (Curro)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Lionel Landwerlin
14b99df7d9 intel/fs: require UNDEFs register offsets to be aligned to REG_SIZE
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18657>
2022-10-27 21:05:00 +00:00
Alyssa Rosenzweig
941c37c085 nir/lower_idiv: Remove imprecise_32bit_lowering
NIR has two implementations of lower_idiv, keyed on the
imprecise_32bit_lowering flag. This flag is misleading: the results when
setting this flag "imprecise", they're completely wrong for some values.
If a backend has a native implementation of umul_high, the correct path
isn't that much more expensive. If it doesn't, it's substantially slower
for highp integer divison... but in practice, non-constant highp integer
division is pretty rare.

After a painful migration of the tree, this code path has no more users.
Remove it so nobody else gets the bright idea of using it again.

Closes: #6555
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19303>
2022-10-27 19:37:14 +00:00
Jordan Justen
c238699afa intel/compiler: Broadcast lower code should check 64-bit int support
This will affect MTL which will have fp64 support without int64
support.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19284>
2022-10-27 09:22:09 +00:00
Lionel Landwerlin
2da7ec0db9 intel/clc: assert when libclc shader is not found
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7483
Reviewed-by: Luis Felipe Strano Moraes <luis.strano@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19091>
2022-10-27 08:53:55 +00:00
Tapani Pälli
1e51383258 intel/compiler: run nir_opt_idiv_const before nir_lower_idiv
Integer div lowering can potentially create a lot of code that is
not removed later on. Running const lowering pass first can be used
to eliminate that code.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19157>
2022-10-20 15:35:48 +03:00
Luis Felipe Strano Moraes
eff1517cd7 anv: added proper handling for input argument in intel_clc
That was previously listed on the getopt_long struct but not actually
being used. This makes intel_clc argument processing easier as now
all of its arguments are handled with getopt and anything after the
special argument '--' is passed along to clang to form the final build
command.

Thanks to Dylan Baker for help with changes to the meson file.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8de02ff980 anv: fixing typo on description of output flag for intel_clc
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
056d72c897 anv: add missing separator to help for intel_clc
intel_clc relies on the special argument '--' for getopt to be given so
it knows when to start expecting purely input files or clang arguments.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Luis Felipe Strano Moraes
8e1f03ada0 anv: reword info flag in intel_clc's getopt to avoid clash
The info keyword was using the same short description that
was listed for input files on the struct for long_options.

Rewording it to 'v' and 'verbose' to be more in line with
expectations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19153>
2022-10-20 02:24:39 +00:00
Iván Briano
d9747169b6 anv: support VK_PIPELINE_CREATE_RAY_TRACING_SKIP_*
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_AABBS_BIT_KHR and
VK_PIPELINE_CREATE_RAY_TRACING_SKIP_TRIANGLES_BIT_KHR, when specified,
make TraceRay behave as if the corresponding shader flags were set, but
without affecting the value of IncomingRayFlags in shaders.

v2 (Lionel): Improve comments

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19152>
2022-10-20 00:03:55 +00:00
Rohan Garg
43169dbbe5 intel/compiler: Support 16 bit float ops
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17988>
2022-10-17 15:56:28 +02:00
Kenneth Graunke
2dfab687ec intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes [v2]
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.

The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling.  Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.

We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.

One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains).  That
said, it's still better than before.

For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).

shader-db results on Icelake:

   total instructions in shared programs: 19600314 -> 19597528 (-0.01%)
   instructions in affected programs: 65338 -> 62552 (-4.26%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
   helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
   95% mean confidence interval for instructions value: -10.71 -9.85
   95% mean confidence interval for instructions %-change: -6.17% -5.43%
   Instructions are helped.

   total cycles in shared programs: 851842332 -> 851808165 (<.01%)
   cycles in affected programs: 618577 -> 584410 (-5.52%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 64 max: 540 x̄: 126.08 x̃: 111
   helped stats (rel) min: 2.57% max: 37.97% x̄: 6.12% x̃: 5.06%
   95% mean confidence interval for cycles value: -135.35 -116.80
   95% mean confidence interval for cycles %-change: -6.67% -5.57%
   Cycles are helped.

   total sends in shared programs: 1025238 -> 1024308 (-0.09%)
   sends in affected programs: 6454 -> 5524 (-14.41%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
   helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
   95% mean confidence interval for sends value: -3.57 -3.29
   95% mean confidence interval for sends %-change: -15.42% -14.54%
   Sends are helped.

According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.

v2: Fix assertions about number of components and add more of them.
    Combine the quads and triangles handling as it's nearly identical.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19061>
2022-10-13 11:38:21 -07:00
Tapani Pälli
c9c9a5b78d intel/fs: mark debug variables with ASSERTED
To clean up compilation warnings about unused variables
when asserts are disabled.

v2: UNUSED -> ASSERTED (Eric Engestrom)

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19016>
2022-10-11 04:14:30 +00:00
Caio Oliveira
6cda887ac6 intel/compiler: Explicitly include build-id when linking intel_clc
Ensure that the program will have a build-id.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18924>
2022-10-01 16:42:07 +00:00
Kenneth Graunke
b61b1d5a4c Revert "intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes"
This reverts commit abba55382f.
The assertions I added late in the process broke shader-db, and my
quick fix broke CI, so let's just revert it for now and I'll resubmit
this later when it's working better.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7385
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18895>
2022-09-29 17:39:18 -07:00
Marcin Ślusarz
9bac88856d intel/compiler: fix loading of draw_id from task & mesh payload
Previously both destination and source were floats, so no casting was
performed, but with 7664c85b1d source register was reinterpreted as
unsigned integer, so MOV started casting that integer to float.

Fixes: 7664c85b1d ("intel/compiler: Create and use struct for TASK and MESH thread payloads")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18886>
2022-09-29 17:17:25 +00:00
Lionel Landwerlin
23c7142cd6 anv: disable SIMD16 for RT shaders
Since divergence is a lot more likely in RT than compute, it makes
sense to limit ourselves to SIMD8.

The trampoline shader defaults to SIMD16 since this one is uniform.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:37 +00:00
Lionel Landwerlin
8fc7a98e31 intel/fs: disable split_array_vars on opencl kernels
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
57593c5395 intel/nir: disable assert on async stack id
This can be accessed from :
   - RT shaders
   - CS trampoline shader

We missed the second part here.

Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
8d580de4a9 intel/nir: fix potential invalid function impl ptr usage
We keep the nir_builder::impl value around, but we've run some passes
that might have change the main function.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 96fde5518b ("intel/rt: Add a helper to create the raygen trampoline shader")
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
1ffd28149f intel/nir: fixup preserved metadata in rayquery lowering
Another case of not clearing the metadata correctly.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
9dba8d8aa1 intel/fs: take a builder arg for resolve_source_modifiers()
There will be situations where we will want to use a local builder
rather than the one associated with NIR->backend translation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
649cdc617f intel/nir: reuse rt helper
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
57f1e95102 intel/rt: fix procedural primitive ID access
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Jason Ekstrand
aea88f16df intel/fs: SEL_EXEC uses the integer pipe for 64-bit stuff
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Jason Ekstrand
c80c0ed943 intel/fs: Always use integer types for indirect MOVs
There's a new Gen12.5 restriction which forbids using the VxH or Vx1 on
the floating-point pipe.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Lionel Landwerlin
c6a7f4b34e intel/devinfo: Rename & implement num_dual_subslices
v2: Use the upper bound of dual subslices as the ID is not remapped
with fused off parts and this is what we'll use for a bunch of
computation in RT.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16970>
2022-09-28 05:38:36 +00:00
Kenneth Graunke
abba55382f intel/compiler: Vectorize gl_TessLevelInner/Outer[] writes
Setting the NIR options takes care of iris thanks to the common st/mesa
linking code, and updating brw_nir_link_shaders should handle anv.

The main effort here is updating remap_tess_levels, which needs to
handle vector stores, writemasking, and swizzling.  Unfortunately,
we also need to continue handling the existing single-component
access because it's used for TES inputs, which we don't vectorize.

We could try to vectorize TES inputs too, but they're all pushed
anyway, so it wouldn't buy us much other than deleting this code.
Also, we do have opt_combine_stores, but not one for loads.

One limitation of using nir_vectorize_tess_levels is that it works
on variables, and so isn't able to combine outer/inner writes that
happen to live in the same vec4 slot (for triangle domains).  That
said, it's still better than before.

For writes, we allow the intrinsics to supply up to the full size
of the variable (vec4 for outer, vec2 for inner) even if the domain
only requires a subset of those components (i.e. triangles needs 3).

shader-db results on Icelake:

   total instructions in shared programs: 19605070 -> 19602284 (-0.01%)
   instructions in affected programs: 65338 -> 62552 (-4.26%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 6 max: 24 x̄: 10.28 x̃: 12
   helped stats (rel) min: 1.30% max: 18.18% x̄: 5.80% x̃: 7.59%
   95% mean confidence interval for instructions value: -10.71 -9.85
   95% mean confidence interval for instructions %-change: -6.17% -5.43%
   Instructions are helped.

   total cycles in shared programs: 851854659 -> 851820320 (<.01%)
   cycles in affected programs: 618749 -> 584410 (-5.55%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 69 max: 540 x̄: 126.71 x̃: 108
   helped stats (rel) min: 2.57% max: 37.97% x̄: 6.17% x̃: 5.06%
   95% mean confidence interval for cycles value: -135.89 -117.54
   95% mean confidence interval for cycles %-change: -6.72% -5.63%
   Cycles are helped.

   total sends in shared programs: 1025285 -> 1024355 (-0.09%)
   sends in affected programs: 6454 -> 5524 (-14.41%)
   helped: 271 / HURT: 0
   helped stats (abs) min: 2 max: 8 x̄: 3.43 x̃: 4
   helped stats (rel) min: 5.71% max: 25.00% x̄: 14.98% x̃: 17.39%
   95% mean confidence interval for sends value: -3.57 -3.29
   95% mean confidence interval for sends %-change: -15.42% -14.54%
   Sends are helped.

According to Felix DeGrood, this results in a 10% improvement in
the draw call time for certain draw calls from Strange Brigade.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:56 -07:00
Kenneth Graunke
be21d54aca intel/compiler: Use an existing URB write to end TCS threads when viable
VS, TCS, TES, and GS threads must end with a URB write message with the
EOT (end of thread) bit set.  For VS and TES, we shadow output variables
with temporaries and perform all stores at the end of the shader, giving
us an existing message to do the EOT.

In tessellation control shaders, we don't defer output stores until the
end of the thread like we do for vertex or evaluation shaders.  We just
process store_output and store_per_vertex_output intrinsics where they
occur, which may be in control flow.  So we can't guarantee that there's
a URB write being at the end of the shader.

Traditionally, we've just emitted a separate URB write to finish TCS
threads, doing a writemasked write to an single patch header DWord.
On Broadwell, we need to set a "TR DS Cache Disable" bit, so this is
a convenient spot to do so.  But on other platforms, there's no such
field, and this write is purely wasteful.

Insetad of emitting a separate write, we can just look for an existing
URB write at the end of the program and tag that with EOT, if possible.
We already had code to do this for geometry shaders, so just lift it
into a helper function and reuse it.

No changes in shader-db.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17944>
2022-09-27 18:17:42 -07:00
Lionel Landwerlin
e76e3d9cea intel/nir/rt: fixup alignment of memcpy iterations
Not sure if fixes anything because it's always 16 at least, but this
is more correct.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
139e8f4635 intel/fs: fixup a64 messages
And run algebraic when either int64 for float64 are not supported so
those don't end up in the generated code.

Cc: mesa-stable
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
838bbdcf2e intel/nir/rt: store ray query state in scratch
Initially I tried to store ray query state in the RT scratch space but
got the offset wrong. In the end putting this in the scratch surface
makes more sense, especially for non RT stages.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
f7fab09a07 intel/nir/rt: change scratch check validation
It's very unfortunate that we have the RT scratch being conflated with
the usual scratch. In our implementation those are 2 different buffers.

The usual scratch access are done through the scratch surface state
(delivered through thread payload), while RT scratch (which outlives
thread dispatch with shader calls) is its own buffer.

So checking the NIR scratch size makes no sense as we can have normal
scratch accesses completely unrelated to RT scratch accesses.

This change switches the validation by looking at whether the scratch
base pointer intrinsic is being used (which is what we use/abuse to
implement RT scratch).

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
259b1647e6 intel/nir/rt: fix ray query proceed level
Initially the level is world (top level), then it's whatever level the
potential hit is.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
3f01071c79 intel/nir/rt: remove ray query mem hit writes at initialization
This will not even be read by HW.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
f843bec7de intel/nir/rt: spill/fill the entire ray query data
We need the traversal stack to saved/restored along with mem hits.
Total spill/fill is 256bytes.

We can potentially optimize this but we have to be very careful about
what state the query is in.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c78be5da30 ("intel/fs: lower ray query intrinsics")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Lionel Landwerlin
a88f725eea intel/nir/rt: fixup generate hit
This function copies the potential hit from its memory location to the
committed hit location. A couple of fields got their bit offset wrong.

Fixes some CTS tests in dEQP-VK.ray_query.*

v2: Copy primitive/instance leaf pointers

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0465714790 ("intel/nir/rt: add more helpers for ray queries")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17396>
2022-09-23 08:29:17 +00:00
Marcin Ślusarz
ac8020ebfd intel/compiler: add support for 8/16 bits task payload loads
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
ac581b30ec intel/compiler: refactor brw_nir_lower_mem_access_bit_sizes
Change dup_mem_intrinsic return type.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
a31b8fa38b intel/compiler/task: use shared memory for small task payload loads & stores
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18501>
2022-09-21 09:16:20 +00:00
Marcin Ślusarz
37e78803d7 intel/compiler: use nir_lower_task_shader pass
This implements task payload atomics in ANV.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16852>
2022-09-20 18:04:29 +00:00
Marcin Ślusarz
3c96959bbc intel/compiler: print shader after successful brw_nir_lower_shading_rate_output
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>
2022-09-20 17:23:45 +00:00
Marcin Ślusarz
cfd1e5a91e intel/compiler: remove second shading rate lowering for mesh
It's already called in brw_postprocess_nir and calling it the second time
actually breaks shading rate.

Initially, when I added this call here in 9acb30c8c4, I was testing it
on an internal tree, which didn't have brw_nir_lower_shading_rate_output call
in brw_postprocess_nir.

Fixes: 9acb30c8c4 ("intel/compiler: implement primitive shading rate for mesh")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18702>
2022-09-20 17:23:45 +00:00
José Roberto de Souza
f4857591e1 intel/compiler/fs: Use DF to load constants when has_64bit_int is not supported
This was already been done to gen7 platforms, so now extending to all
platforms without has_64bit_int.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
2022-09-14 19:32:43 +00:00
José Roberto de Souza
daf0b67bc2 intel/compiler/fs: Fix compilation of shaders with SHADER_OPCODE_SHUFFLE of float64 type
During the lower_regioning() optimization, required_exec_type() is
returning BRW_REGISTER_TYPE_UQ type when processing
SHADER_OPCODE_SHUFFLE instructions of type BRW_REGISTER_TYPE_DF but
MTL has float64 support but lacks int64 support causing shader
compilation to fail.

To fix that we could make required_exec_type() return
BRW_REGISTER_TYPE_DF in such case but SHADER_OPCODE_SHUFFLE virtual
instruction runs in the integer pipeline(inferred_exec_pipe()).

So here replacing the has_64bit check by has_64bit_int, this will
properly handle older and newer cases making this function return
BRW_REGISTER_TYPE_UD.
Then lower_exec_type() will take care to generate 2 32bits operations
to accomplish the same.

While at it also dropping the 'devinfo->verx10 == 70' check as
GFX7_FEATURES fall into the same category as MTL, has float64 but no
int64 support.

Fixes at least this crucible tests:
func.uniform-subgroup.exclusive.fadd64.q0
func.uniform-subgroup.exclusive.fmin64.q0
func.uniform-subgroup.exclusive.fmax64.q0

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18577>
2022-09-14 19:32:43 +00:00
Caio Oliveira
e612f32e1a intel/compiler: Use brw_ud* helpers in thread payload code
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18176>
2022-09-13 01:44:24 +00:00