This peephole optimization looks for a series of load/store_deref or
copy_deref instructions that copy an array from one variable to another
and turns it into a copy_deref that copies the entire array. The
pattern it looks for is extremely specific but it's good enough to pick
up on the input array copies in DXVK and should also be able to pick up
the sequence generated by spirv_to_nir for an OpLoad of a large composite
followed by OpStore. It can always be improved later if needed.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for variables with vector or array-of-vector types and
narrows the type to only the components used.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass looks for array variables where at least one level of the
array is never indirected and splits it into multiple smaller variables.
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through arrays of arrays and can detect indirects on just
one level or even see that arr[i][0][5] does not alias arr[i][1][j].
This pass exists to help other passes more easily see through arrays of
arrays. If a back-end does implement arrays using scratch or indirects
on registers, having more, smaller arrays is likely to give better
memory efficiency.
v2 (Jason Ekstrand):
- Better comments and naming (some from Caio)
- Rework to use one hash map instead of two
v2.1 (Jason Ekstrand):
- Fix a couple of bugs that were added in the rework including one
which basically prevented it from running
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
This pass doesn't really do much now because nir_lower_vars_to_ssa can
already see through structures and considers them to be "split". This
pass exists to help other passes more easily see through structure
variables. If a back-end does implement arrays using scratch or
indirects on registers, having more, smaller arrays is likely to give
better memory efficiency.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Equivalent to the existing how_declared in GLSL IR. The only
difference is that we are not adding every declaration_type
available in GLSL IR, only the one that we will use in the short term.
More modes can be added in the future if needed.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
These are copied from the corresponding values in
ir_variable. The intention is to eventually use them in a pure-NIR
linker.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases. Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward. This simplified the
passes, but made life painful for the callers.
This patch combines both into a single pass. If you give it a non-zero
static count, it uses that. If you give it Mesa state slots, it turns
it back into a built-in uniform. Otherwise, it does nothing.
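The three-way choice described above can be sketched as a small decision
function. This is only an illustration: the enum, the function name, and
the parameter names are hypothetical and are not the pass's actual
interface.

```c
#include <stddef.h>

/* Hypothetical names; the real pass operates on NIR shaders. */
typedef enum {
    PATCH_VERTICES_CONSTANT,  /* replace with the static count */
    PATCH_VERTICES_UNIFORM,   /* replace with a built-in uniform load */
    PATCH_VERTICES_UNCHANGED  /* leave the system value alone */
} patch_vertices_lowering;

static patch_vertices_lowering
choose_patch_vertices_lowering(unsigned static_count,
                               const void *state_tokens)
{
    /* A non-zero static count always wins. */
    if (static_count != 0)
        return PATCH_VERTICES_CONSTANT;

    /* Otherwise fall back to a uniform if state slots were provided. */
    if (state_tokens != NULL)
        return PATCH_VERTICES_UNIFORM;

    return PATCH_VERTICES_UNCHANGED;
}
```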
This also moves the i965 uniform lowering out to shared code.
v2: Make token arrays const.
Reviewed-by: Eric Anholt <eric@anholt.net>
This is controlled by a new nir_shader_compiler_options flag, and fixes
dEQP-GLES3.functional.shaders.builtin_variable.pointcoord on V3D.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: reword comment about lower_helper_invocations to be more clear
that it might not work on all hardware
v3: add special variant of load_sample_id which does not imply per-
sample shading
Signed-off-by: Rob Clark <robdclark@gmail.com>
OpenCL knows vectors of size 8 and 16.
v2: rebased on master (nir_swizzle rework)
rework more declarations with nir_component_mask_t
adjust print_var_decl
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Karol Herbst <kherbst@redhat.com>
This pass searches for reasonably large local variables which can be
statically proven to be constant and moves them into shader constant
data. This is especially useful when large tables are baked into the
shader source code because they can be moved into a UBO by the driver to
reduce register pressure and make indirect access cheaper.
v2 (Jason Ekstrand):
- Use a size/align function to ensure we get the right alignments
- Use the newly added deref offset helpers
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a concept to NIR of having a blob of constant data
associated with a shader. Instead of being a UBO or uniform that can be
manipulated by the client, this constant data is considered part of the
shader and remains constant across all invocations of the given shader
until the end of time. To access this constant data from the shader, we
add a new load_constant intrinsic. The intention is that drivers will
eventually lower load_constant intrinsics to load_ubo, load_uniform, or
something similar. Constant data will be used by the optimization pass
in the next commit but this concept may also be useful for OpenCL.
v2 (Jason Ekstrand):
- Rename num_constants to constant_data_size (anholt)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Now that SSA values can be derefs and they have special rules, we have
to be a bit more careful about our LCSSA phis. In particular, we need
to clean up in case LCSSA ended up creating a phi node for a deref.
This fixes validation issues with some Vulkan CTS tests with the new
deref instructions.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
This commit completely reworks function calls in NIR. Instead of having
a set of variables for the parameters and return value, nir_call_instr
now simply has a number of sources which get mapped to load_param
intrinsics inside the functions. It's up to the client API to build an
ABI on top of that. In SPIR-V, out parameters are handled by passing
the result of a deref through as an SSA value and storing to it.
The virtue of this approach can be seen by how much it allows us to
delete from core NIR. In particular, nir_inline_functions gets halved
and goes from a fairly difficult pass to understand in detail to almost
trivial. It also simplifies spirv_to_nir somewhat because NIR functions
never were a good fit for SPIR-V.
Unfortunately, there is no good way to do this without a mega-commit.
Core NIR and SPIR-V have to be changed at the same time. This also
requires changes to anv and radv because nir_inline_functions couldn't
handle deref instructions before this change and can't work without them
after this change.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This adds a concept of "members" to a variable with an interface type.
It allows you to specify the full variable data for each member of the
interface instead of once for the variable. We also add a lowering pass
to lower those variables to a sequence of variables and rewrite all the
derefs accordingly.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This will be removed at the end of the transition, but add some tracking
plus asserts to help ensure that lowering passes are called at the
correct point (pre or post deref instruction lowering) as passes are
converted and the point where lower_deref_instrs() is called is moved.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit adds a new instruction type to NIR for handling derefs.
Nothing uses it yet but this adds the data structure as well as all of
the code to validate, print, clone, and [de]serialize them.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is copied from the corresponding value in ir_variable. The
intention is to eventually use it in a pure-NIR linker.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Run this pass late (after opt loop) to move load_const instructions back
into the basic blocks which use the result, in cases where a load_const
is only consumed in a single block.
This helps reduce register usage in cases where the backend driver
cannot lower the load_const to a uniform.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
nir_ssa_def::parent_instr and nir_src::parent_instr have the same name,
but they mean really different things. I choose to save the next person
the hour+ that I just spent figuring that out. Even now that I know, I
doubt I'd notice in code review that someone typed foo->parent_instr
when they actually meant foo->ssa->parent_instr.
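To make the distinction concrete, here is a heavily simplified sketch.
The my_* types are hypothetical stand-ins, not the real NIR structures;
they only model the two fields being discussed: the parent_instr of a
definition is the instruction that produces the value, while the
parent_instr of a source is the instruction that consumes it.

```c
#include <stddef.h>

/* Hypothetical, stripped-down stand-ins for the NIR types. */
typedef struct my_instr {
    const char *name;
} my_instr;

typedef struct my_ssa_def {
    my_instr *parent_instr;   /* instruction that PRODUCES this value */
} my_ssa_def;

typedef struct my_src {
    my_instr *parent_instr;   /* instruction that CONSUMES this source */
    my_ssa_def *ssa;          /* the value being read */
} my_src;

/* The instruction that uses the source. */
static my_instr *src_consumer(const my_src *src)
{
    return src->parent_instr;
}

/* The instruction that defined the value the source reads. */
static my_instr *src_producer(const my_src *src)
{
    return src->ssa->parent_instr;
}
```

With a multiply feeding an add, the add's source has the add as its
parent_instr, and only src->ssa->parent_instr leads back to the multiply.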
v2: Minor wording tweak in nir_ssa_def::parent_instr. Suggested by
Jason.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is basically the same as the GLSL lowering path.
v2: Fix typo in the link
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This is based on the glsl/lower_instructions.cpp implementation, but
should be much more readable.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
ufind_msb is easily expressed in terms of clz, and we can reduce ifind_msb
to that.
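A minimal sketch of the idea in plain C, assuming 32-bit values and a clz
that returns 32 for zero. The function names are made up for
illustration, clz32 is a portable stand-in for a hardware instruction,
and the actual lowering in the patch may be expressed differently.

```c
#include <stdint.h>

/* Count leading zeros of a 32-bit value; returns 32 for zero. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Index of the most significant set bit, or -1 if no bit is set. */
static int ufind_msb_via_clz(uint32_t x)
{
    return 31 - (int)clz32(x);
}

/* Index of the most significant bit that differs from the sign bit,
 * or -1 for 0 and -1.  Negative inputs are inverted so the problem
 * reduces to the unsigned case. */
static int ifind_msb_via_ufind_msb(int32_t x)
{
    uint32_t u = (uint32_t)x;

    if (x < 0)
        u = ~u;
    return ufind_msb_via_clz(u);
}
```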
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V3D doesn't have opcodes for ibfe/ubfe, so we need to lower similarly to
glsl/lower_instructions.cpp.
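One common shift-based formulation of this lowering, sketched in plain C.
It assumes 32-bit values, bits in [0, 32] and offset + bits <= 32; the
function names are illustrative and the exact expression used by the
patch may differ.

```c
#include <stdint.h>

/* Unsigned bitfield extract using only shifts. */
static uint32_t ubfe_lowered(uint32_t value, unsigned offset, unsigned bits)
{
    if (bits == 0)
        return 0;   /* also avoids a shift by 32, which is undefined in C */

    /* Shift the field up to the top of the word, then back down to bit 0. */
    return (value << (32 - bits - offset)) >> (32 - bits);
}

/* Signed bitfield extract: extract, then sign-extend from the top bit
 * of the field.  The xor/subtract trick avoids relying on arithmetic
 * right shifts of negative values. */
static int32_t ibfe_lowered(uint32_t value, unsigned offset, unsigned bits)
{
    uint32_t field, sign;

    if (bits == 0)
        return 0;

    field = ubfe_lowered(value, offset, bits);
    sign = 1u << (bits - 1);
    /* The final cast assumes the usual two's complement conversion. */
    return (int32_t)((field ^ sign) - sign);
}
```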
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
If you don't have HW to do bfi, then lowering bitfieldInsert to bfi makes
things harder than keeping the "bits" argument around.
This still uses bfm, but I've added the obvious lowering of bfm if you
need it.
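For reference, a sketch of what a mask-based bitfield insert and the
"obvious" bfm lowering could look like in plain C. It assumes 32-bit
values, bits in [0, 32], offset in [0, 31] and offset + bits <= 32. The
function names are illustrative, not the opcodes' real definitions.

```c
#include <stdint.h>

/* Build a mask of `bits` ones starting at bit `offset`. */
static uint32_t bfm_sketch(unsigned bits, unsigned offset)
{
    /* (1 << 32) is undefined in C, so handle the full-width case. */
    uint32_t ones = bits >= 32 ? 0xffffffffu : (1u << bits) - 1u;

    return ones << offset;
}

/* Replace `bits` bits of `base` at `offset` with the low bits of `insert`. */
static uint32_t bitfield_insert_sketch(uint32_t base, uint32_t insert,
                                       unsigned offset, unsigned bits)
{
    uint32_t mask = bfm_sketch(bits, offset);

    return (base & ~mask) | ((insert << offset) & mask);
}
```

Keeping bits as an explicit argument means the mask is built once from
bits and offset and never has to be decoded again.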
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Rename and change the prototype for consistency with
nir_tex_instr_is_query(). This function will be used in the
following patch.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This pass is required by the Midgard compiler; our instruction set uses
NIR-style booleans (~0 for true) but lacks a dedicated b2f instruction.
Normally, this lowering pass would be implemented in a backend-specific
algebraic pass, but this conflicts with the existing iand->b2f pass in
nir_opt_algebraic.py, hanging the compiler. This patch thus makes the
existing pass optional (default on -- all other backends should remain
unaffected), adding an optional pass for lowering the opposite
direction.
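The b2f-to-iand direction relies on the fact that a NIR-style boolean is
either 0 or ~0, so masking it with the bit pattern of 1.0f yields 0.0f or
1.0f directly. A sketch in plain C, assuming float is IEEE-754 binary32;
the function name is illustrative only.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 0 / ~0 boolean to 0.0f / 1.0f with a single AND. */
static float b2f_via_iand(uint32_t b)
{
    const uint32_t one_bits = 0x3f800000u;  /* bit pattern of 1.0f */
    uint32_t bits = b & one_bits;
    float f;

    /* Reinterpret the bits as a float without violating aliasing rules. */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```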
v2: Defer lowering until late algebraic optimisations to allow
optimising the b2f instruction itself.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Not all bit-sizes may be supported natively in hardware for all operations.
This pass allows drivers to lower such operations to a bit-size that is
actually supported and then converts the result back to the original
bit-size.
Compiler backends control which operations and which bit-sizes require
the lowering through a callback function.
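The shape of the idea can be modelled in plain C. Everything here is a
hypothetical toy (the enum, the callback type, the evaluator); it is not
the NIR interface. It only shows a callback choosing a wider bit size,
the operation running at that size, and the result being truncated back.

```c
#include <stdint.h>

typedef enum { OP_IADD, OP_IMUL } alu_op;   /* hypothetical opcodes */

/* Hypothetical callback: return 0 to leave the operation alone,
 * otherwise the bit size it should be executed at. */
typedef unsigned (*lower_bit_size_cb)(alu_op op, unsigned bit_size);

static uint64_t mask_to(uint64_t v, unsigned bits)
{
    return bits >= 64 ? v : v & ((UINT64_C(1) << bits) - 1);
}

/* Evaluate an operation as if the hardware ran it at `bit_size`. */
static uint64_t eval_alu(alu_op op, uint64_t a, uint64_t b, unsigned bit_size)
{
    uint64_t r = (op == OP_IADD) ? a + b : a * b;

    return mask_to(r, bit_size);
}

/* Example backend policy: multiplies narrower than 32 bits run at 32. */
static unsigned lower_small_imul(alu_op op, unsigned bit_size)
{
    return (op == OP_IMUL && bit_size < 32) ? 32 : 0;
}

static uint64_t eval_with_lowering(alu_op op, uint64_t a, uint64_t b,
                                   unsigned bit_size, lower_bit_size_cb cb,
                                   unsigned *executed_bit_size)
{
    unsigned lowered = cb(op, bit_size);
    unsigned exec = lowered ? lowered : bit_size;

    /* Zero-extend the sources, operate at the supported size, then
     * convert the result back to the original bit size. */
    uint64_t wide = eval_alu(op, mask_to(a, bit_size), mask_to(b, bit_size),
                             exec);

    *executed_bit_size = exec;
    return mask_to(wide, bit_size);
}
```

For wrapping integer add and multiply the truncated wide result matches
the narrow one, which is why the conversion back is safe in this toy.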
v2: generalize this pass and make it available in NIR core (Rob, Jason)
v3: remove some temporaries and reduce nesting in instruction loop using
a continue statement (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With this we should have no passes in src/compiler/nir with any
dependencies on headers from core GL Mesa.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Because nir_instr_remove is an inline wrapper around nir_instr_remove_v,
the compiler should be able to tell that the return value is unused and
not emit the extra code in most cases.
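The pattern looks roughly like the following sketch, which uses a
hypothetical doubly-linked list with a sentinel head rather than the real
NIR types. The out-of-line _v function does the work and returns nothing;
the inline wrapper computes a return value that the compiler can drop
when the caller ignores it.

```c
/* Hypothetical circular list node; not a NIR type. */
typedef struct node {
    struct node *prev;
    struct node *next;
} node;

/* Worker: unlink the node and return nothing. */
static void node_remove_v(node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n;
    n->next = n;
}

/* Inline wrapper: additionally report where to continue.  If the
 * result is unused, the extra load can be optimized away. */
static inline node *node_remove(node *n)
{
    node *following = n->next;

    node_remove_v(n);
    return following;
}
```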
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>