Commit Graph

137 Commits

Author SHA1 Message Date
Caio Marcelo de Oliveira Filho
2663759af0 intel/fs: Add and use a new load_simd_width_intel intrinsic
Intrinsic to get the SIMD width, which not always the same as subgroup
size.  Starting with a small scope (Intel), but we can rename it later
to generalize if this turns out useful for other drivers.

Change brw_nir_lower_cs_intrinsics() to use this intrinsic instead of
a width will be passed as argument.  The pass also used to optimized
load_subgroup_id for the case that the workgroup fitted into a single
thread (it will be constant zero).  This optimization moved together
with lowering of the SIMD.

This is a preparation for letting the drivers call it before the
brw_compile_cs() step.

No shader-db changes in BDW, SKL, ICL and TGL.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4794>
2020-05-01 12:50:37 -07:00
Gert Wollny
42aa348dad nir: Add r600 specific intrinsics for tesselation shader IO
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
2020-04-23 18:23:04 +00:00
Jason Ekstrand
33eb43349e nir: Add an alignment to nir_intrinsic_load_constant
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris.  However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices.  It's easy
enough to add them.  We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there.  It's possible we could do better by walking the whole deref
chain but this should be good enough.

Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
2020-04-16 17:00:13 +00:00
Connor Abbott
abcfb64370 ir3: Fix LDC offset units
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
2020-04-15 22:38:20 +00:00
Connor Abbott
274f3815a5 ir3: Plumb through bindless support
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
2020-04-09 15:56:55 +00:00
Alyssa Rosenzweig
7ab4e4dd96 nir: Add SSBO->global lowering pass
To facilitate lowering SSBOs to globals, we need a load_ssbo_address
intrinsic. This intrinsic takes an SSBO index and loads the address in
global memory of the SSBO (likely implemented via a uniform in the
driver). In the future, we'll support bounds checking, but at the moment
this is not supported (this pass should only be used for trusted
contexts at the moment, i.e. contexts without robustness extensions).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2753>
2020-02-21 13:06:22 +00:00
Gert Wollny
37125b7cc2 r600/sfn: Add lowering UBO access to r600 specific codes
r600 reads vec4 from the UBO, but the offsets in nir are evaluated to the component.
If the offsets are not literal then all non-vec4 reads must resolve the component
after reading a vec4 component (TODO: figure out whether there is a consistent way
to deduct the component that is actually read).

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3225>
2020-02-10 19:09:08 +00:00
Boris Brezillon
f5619f5073 pan/midgard: Turn Z/S stores into zs_output_pan intrinsics
Midgard can't write depth and stencil separately. It has to happen in
a single store operation containing both. Let's add a panfrost specific
intrinsic and turn all depth/stencil stores into a packed depth+stencil
one.

Note that this intrinsic is not yet handled in emit_intrinsic(), but
we'll address that later.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3697>
2020-02-05 15:41:55 +00:00
Samuel Pitoiset
cf6cae832c nir: lower interp_deref_at_vertex to load_input_vertex
This introduces a new NIR intrinsic for loading inputs at a specific
vertex index.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
2020-01-29 09:49:50 +00:00
Samuel Pitoiset
d29f10a7ca nir: add nir_intrinsic_interp_deref_at_vertex
From the SPV_AMD_shader_explicit_vertex_parameter extension:
   "Returns the value of the input <interpolant> without any
    interpolation, i.e. the raw output value of previous shader
    stage."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
2020-01-29 09:49:50 +00:00
Samuel Pitoiset
9021b45b35 nir: add nir_intrinsic_load_barycentric_model
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3578>
2020-01-29 09:49:50 +00:00
Jason Ekstrand
e40b11bbcb nir: Rename nir_intrinsic_barrier to control_barrier
This is a more explicit name now that we don't want it to be doing any
memory barrier stuff for us.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Jason Ekstrand
60097cc840 nir: Add a new memory_barrier_tcs_patch intrinsic
Right now, it's implemented as a no-op for everyone.  For most drivers,
it's a switch case in the NIR -> whatever which just breaks.  For ir3,
they already have code to delete tessellation barriers so we just add a
case to also delete memory_barrier_tcs_patch.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
2020-01-13 17:23:47 +00:00
Samuel Pitoiset
1b808d208f spirv,nir: add new lod parameter to image_{load,store} intrinsics
SPV_AMD_shader_image_load_store_lod allows to use a lod parameter
with OpImageRead, OpImageWrite and OpImageSparseRead.

According to the specification, this parameter should be a 32-bit
integer. It is initialized to 0 when no lod parameter is found
during SPIR-V->NIR translation.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2020-01-09 07:58:33 +01:00
Iago Toral Quiroga
6c7a2b69f8 v3d: handle writes to gl_Layer from geometry shaders
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.

For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Alyssa Rosenzweig
deaebc82a7 nir: Add load_sampler_lod_paramaters_pan intrinsic
This loads in the <min_lod, max_lod, lod_bias> settings for a given
sampler, which is necessary for lowering clamps/biases on certain
Midgard chips.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-22 05:07:19 +00:00
Alyssa Rosenzweig
03f73c7fc6 nir: Add load_output_u8_as_fp16_pan intrinsic
This is a single opcode, at least on newer Midgard chips. It's easier to
have this represented in NIR rather than trying to optimize out the
conversions, so let's add the intrinsic.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
2019-11-11 15:23:44 +00:00
Kristian H. Kristensen
e28fbbd861 freedreno/ir3: Implement TCS synchronization intrinsics
We add two new IR3 specific nir intrinsics that map to the new condend
and endpatch instructions.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:40:27 -08:00
Kristian H. Kristensen
41984c8422 freedreno/ir3: Add ir3 intrinsics for tessellation
These provide the iovas for system memory buffers used for
tessellation as well as a new HW specific system value.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:50 -08:00
Kristian H. Kristensen
fe450ef4cf freedreno/ir3: Add load and store intrinsics for global io
These intrinsics take a ivec2 for the 64 bit base address and a
integer offset.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-11-07 16:36:44 -08:00
Caio Marcelo de Oliveira Filho
73572abc2a nir: Add scoped_memory_barrier intrinsic
Add a NIR instrinsic that represent a memory barrier in SPIR-V /
Vulkan Memory Model, with extra attributes that describe the barrier:

- Ordering: whether is an Acquire or Release;
- "Cache control": availability ("ensure this gets written in the memory")
  and visibility ("ensure my cache is up to date when I'm reading");
- Variable modes: which memory types this barrier applies to;
- Scope: how far this barrier applies.

Note that unlike in SPIR-V, the "Storage Semantics" and the "Memory
Semantics" are split into two different attributes so we can use
variable modes for the former.

NIR passes that took barriers in consideration were also changed

- nir_opt_copy_prop_vars: clean up the values for the mode of an
  ACQUIRE barrier.  Copy propagation effect is to "pull up a load" (by
  not performing it), which is what ACQUIRE restricts.

- nir_opt_dead_write_vars and nir_opt_combine_writes: clean up the
  pending writes for the modes of an RELEASE barrier.  Dead writes
  effect is to "push down a store", which is what RELEASE restricts.

- nir_opt_access: treat the ACQUIRE and RELEASE as a full barrier for
  the modes.  This is conservative, but since this is a GL-specific
  pass, doesn't make a difference for now.

v2: Fix the scoped barrier handling in copy propagation.  (Jason)
    Add scoped barrier handling to nir_opt_access and
    nir_opt_combine_writes.  (Rhys)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-24 11:39:55 -07:00
Erik Faye-Lund
beb6639a9d Revert "nir: drop unused alpha_ref_float"
This reverts commit e8095f2af0.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jose Maria Casanova <jmcasanova@igalia.com>
2019-10-23 13:03:38 +02:00
Kristian H. Kristensen
8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Kristian H. Kristensen
0324706764 freedreno/ir3: Add intrinsics that map to LDLW/STLW
These intrinsics will let us do all the offset calculations in nir,
which is nicer to work with and lets nir_opt_algebraic eat it all up.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Erik Faye-Lund
e8095f2af0 nir: drop unused alpha_ref_float
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Jason Ekstrand
951cf94521 nir: Add explicit signs to image min/max intrinsics
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode.  Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle.  In SPIR-V, signed min/max are separate
opcodes from unsigned.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-21 17:19:55 +00:00
Marek Olšák
9c7746ceae compiler: add SYSTEM_VALUE_TESS_LEVEL_OUTER/INNER_DEFAULT
TCS system values for internal passthru TCS, needed by radeonsi NIR support

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-12 14:52:17 -04:00
Marek Olšák
1b881852bc compiler: add SYSTEM_VALUE_USER_DATA_AMD
for internal radeonsi shaders
2019-08-12 14:52:17 -04:00
Pierre-Eric Pelloux-Prayer
a9ec718652 nir: add atomic_inc_wrap/atomic_dec_wrap image intrinsics
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-08-06 17:41:02 -04:00
Daniel Schürmann
e272fdd508 nir,intel: lower if (cond) demote() to new intrinsic demote_if(cond)
This will effectively enable the optimization in anv.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-24 13:02:18 -05:00
Andreas Baierl
f5804f1768 nir: Add gl_PointCoord system value
gl_PointCoord handling needs some special bits set in lima/ppir code
generation. Treating gl_PointCoord as a system value makes it easier
to distinguish from a regular varying.

Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 13:20:39 +00:00
Iago Toral Quiroga
50016d7718 nir: add a V3D-specific intrinsic for per-sample color writes
For per-sample color writes we need the output intrinsic to pack the
sample index, which is not provided with regular store_output intrinsics
unless we figured out a way to encode it into the base or the offset.

v2:
 - Drop the writemask (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-18 08:59:35 +02:00
Iago Toral Quiroga
b0eec9e27d nir: add a new v3d-specific intrinsic for tile buffer color reads
This is intended to be used, for example, with OpenGL logic operations. It
takes a render target as source and a sample index in the base index for
MSAA color reads.

v2: drop the CAN_ELIMINATE and CAN_REORDER flags (Eric).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-12 09:16:38 +02:00
Alyssa Rosenzweig
15000c79da nir: Add Panfrost-specific blending intrinsic
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
2019-07-09 14:07:23 -07:00
Caio Marcelo de Oliveira Filho
a42e8f0ed1 nir: Add demote and is_helper_invocation intrinsics
From SPV_EXT_demote_to_helper_invocation.  Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.

v2: Add CAN_ELIMINATE flag to the new intrinsic.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-08 08:57:25 -07:00
Connor Abbott
e5536aa584 compiler: Add color system value
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-07-08 14:18:34 +02:00
Rob Clark
5787a2dfe3 nir: add pass to lower load_interpolated_input
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-02 16:15:25 +00:00
Connor Abbott
6f20643b47 nir: Allow qualifiers on copy_deref and image instructions
In the next commit, we'll properly handle access qualifiers on struct
members by propagating them to load/store instructions, but these
instructions had no way to specify the qualifier.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-06-19 14:08:27 +02:00
Daniel Schürmann
ea51275e07 nir: add intrinsics for AMD_shader_ballot
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Jonathan Marek
c12750527b nir: add type information to load uniform/input and store output intrinsics
This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Alyssa Rosenzweig
006cafc243 nir: Add blend_const_color_rgba sysval
This represents a float vec4 constant color, as passed to glBlendColor.
While the existing 4 shader sysvals are retained to minimize code churn,
a single vectorized intrinsic is required for efficient blending on
vector architectures. (This may also apply to archictectures like
Bifrost where ALU is scalar but load/store is vector; it largely depends
on how blending is implemented per-driver.)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-05-10 15:49:28 +00:00
Rob Clark
2f0b9d2249 freedreno/ir3: lower load_barycentric_at_offset
Calculates i,j at specified offset within a pixel.  A new load_size_ir3
intrinsic is used in conjunction with fddx/fddy to translate the offset
into primitive space and adjust the i,j from load_barycentric_pixel
accordingly.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Rob Clark
c4f423aa36 freedreno/ir3: lower load_barycentric_at_sample
This lowers load_barycentric_at_sample to load_sample_pos_from_id plus
load_barycentric_at_offset.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-04-25 14:13:31 -07:00
Jason Ekstrand
18ed82b084 nir: Add a pass for selectively lowering variables to scratch space
This commit adds new nir_load/store_scratch opcodes which read and write
a virtual scratch space.  It's up to the back-end to figure out what to
do with it and where to put the actual scratch data.

v2: Drop const_index comments (by anholt)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-12 15:59:31 -07:00
Eric Anholt
b88ef3bd76 nir: Add a comment about how intrinsic definitions work.
I was thinking about a refactor, and needed to read this first.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:12 -07:00
Eric Anholt
35355b4860 nir: Drop remaining references to const_index in favor of the call to use.
Please don't make me read a const_index[] expression ever again.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:56:04 -07:00
Eric Anholt
6e4d3d0a2f nir: Drop comments about the constant_index slots for load/stores.
The constant_index slots are named right there in the intrinsic
definition, and the comment is just a chance to get out of sync.  Noticed
while reviewing the lower_to_scratch changes that copy-and-pasted wrong
comments, and load_ubo and load_per_vertex_output had incorrect comments
currently.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-12 15:55:55 -07:00
Bas Nieuwenhuizen
282bacab4a nir: Add access qualifiers on load_ubo intrinsic.
Otherwise nir_lower_non_uniform_access crashes when it tries
to get the access of a load_ubo.

Fixes: 8ed583fe52 "spirv: Handle the NonUniformEXT decoration"
Fixes: e50ab2c0f2 "nir: Add access flags to deref and SSBO atomics"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:04 +02:00
Alyssa Rosenzweig
a83862754e nir: Add "viewport vector" system values
While a partial set of viewport system values exist, these are scalar
values, which is a poor fit for viewport transformations on vector ISAs
like Midgard (where the vec3 values for scale and offset each need to be
coherent in a vec4 uniform slot to take advantage of vectorized
transform math). This patch adds vec3 scale/offset fields corresponding
to the 3D Gallium viewport / glViewport+depth

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-04-04 03:44:09 +00:00
Jason Ekstrand
e50ab2c0f2 nir: Add access flags to deref and SSBO atomics
We will need them for a new ACCESS_NON_UNIFORM flag that's about to be
added in the next commit.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-03-25 16:12:09 -05:00