Commit Graph

514 Commits

Author SHA1 Message Date
Timothy Arceri
c412ff426b nir: fix nir_variable_data packing
Before:

/* size: 60, cachelines: 1, members: 29 */

After:

/* size: 56, cachelines: 1, members: 29 */

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-10-24 13:22:59 +11:00
Rhys Perry
8b98d0954e nir/lower_idiv: add new llvm-based path
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Rob Clark
5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Rob Clark
ad8167c1e0 nir/search: fix the PoT helpers
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstsrand <jason@jleksrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Eduardo Lima Mitev
f1d4fadf1b nir: Add new texop nir_texop_tex_prefetch
This is like nir_texop_tex, but signals that the sampling coordinates
are immutable during the shader stage, in a way that allows the HW
that supports pre-dispatching sampling operations to pre-fetch
the result prior to scheduling the shader stage.

This is introduced to support the feature in Freedreno. Adreno HW
from a4xx supports it.

A NIR pass introduced later in this series will detect sampling
operations that are eligible for pre-dispatch, and replace
nir_texop_tex by this new op, to tell the backend to enable
pre-fetch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Kristian H. Kristensen
8e16fb1528 freedreno/ir3: Implement lowering passes for VS and GS
This introduces two new lowering passes. One to lower VS to explicit
outputs using STLW and one to lower GS to load input using LDLW and
implement the GS specific functionality.

Signed-off-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-17 13:43:53 -07:00
Erik Faye-Lund
71c0dcf266 nir: support feeding state to nir_lower_clip_[vg]s
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Erik Faye-Lund
eb3047c094 nir: support lowering clipdist to arrays
This allows us to make sure clipdist is emitted as a scalar array rather
than two vec4s. This matches SPIR-V semantics, and will be useful for
Zink.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Erik Faye-Lund
878c94288a nir: add lowering-pass for point-size mov
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Erik Faye-Lund
6d7e02e37d nir: allow passing alpha-ref state to lowering-code
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Dave Airlie
dc91a02a72 nir: add a pass to lower flat shading.
This takes any color or backcolor that has unspecified
shading and converts it to flat shading.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-17 10:41:36 +02:00
Marek Olšák
cebc38ff60 nir: add nir_shader_compiler_options::lower_to_scalar
This will replace PIPE_SHADER_CAP_SCALAR_ISA.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Marek Olšák
3340c066a1 nir: move gl_nir_opt_access from glsl directory
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-10 15:49:18 -04:00
Samuel Iglesias Gonsálvez
45668a8be1 nir: add auxiliary functions to detect if a mode is enabled
v2:
- Added more functions.

v3:
- Simplify most of the functions (Caio).

v4:
- Updated to renamed enum values (Andres).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com> [v2]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> [v3]
2019-09-17 23:39:18 +03:00
Jason Ekstrand
f81a2623d8 nir: Add a block_is_unreachable helper
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-09-06 23:39:01 +00:00
Timur Kristóf
610cc3089c nir: Carve out nir_lower_samplers from GLSL code.
Lowering samplers is needed to produce NIR that can actually be
consumed by some gallium drivers, so it doesn't make sense to
to keep it only in the GLSL code.

This commit introduces nir_lower_samplers to compiler/nir,
while maintains the GL-specific function too.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-09-06 12:20:20 +03:00
Vasily Khoruzhick
9367d2ca37 nir: allow specifying filter callback in lower_alu_to_scalar
Set of opcodes doesn't have enough flexibility in certain cases. E.g.
Utgard PP has vector conditional select operation, but condition is always
scalar. Lowering all the vector selects to scalar increases instruction
number, so we need a way to filter only those ops that can't be handled
in hardware.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-09-06 01:51:28 +00:00
Alyssa Rosenzweig
a8f86fcb51 nir: Remove nir_const_load_to_arr
There are no remaining users in-tree.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-08-22 12:24:13 -07:00
Daniel Schürmann
df86c5ffb3 nir: add divergence analysis pass.
This pass expects the shader to be in LCSSA form.
The algorithm is based on 'The Simple Divergence Analysis' from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis. ACM Transactions on Programming Languages and Systems (TOPLAS)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-08-20 17:40:13 +02:00
Rhys Perry
911a1dfad2 nir/lcssa: allow to create LCSSA phis for loop-invariant booleans
ACO depends on LCSSA phis for divergent booleans to work correctly.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:40:05 +02:00
Daniel Schürmann
9c40ad49d5 nir/lcssa: Skip loop invariant variables when converting to LCSSA.
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:40:01 +02:00
Rhys Perry
8a6cfaa15a nir: make nir_to_lcssa() a general NIR pass.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-20 17:39:54 +02:00
Jason Ekstrand
5167e94f23 nir: Add more source types to nir_tex_instr_src_type
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-08-19 17:03:34 +00:00
Iago Toral Quiroga
48f5c34301 nir: add a pass to clamp gl_PointSize to a range
The OpenGL and OpenGL ES specs require that implementations clamp the
value of gl_PointSize to an implementation-depedent range. This pass
is useful for any GPU hardware that doesn't do this automatically
for either one or both sides of the range, such as V3D.

v2:
 - Turn into a generic NIR pass (Eric).
 - Make the pass work before lower I/O so we can use the deref variable
   to inspect if we are writing to gl_PointSize (Eric).
 - Make the pass take the range to clamp as parameter and allow it
   to clamp to both sides of the range or just one side.
 - Make the pass report progress.

v3:
 - Fix copyright header (Eric)
 - use fmin/fmax instead of bcsel to clamp (Eric)

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-13 09:44:12 +02:00
Rhys Perry
7740149852 nir: merge and extend nir_opt_move_comparisons and nir_opt_move_load_ubo
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry
da8ed68aca nir: replace nir_move_load_const() with nir_opt_sink()
This is mostly the same as nir_move_load_const() but can also move
undef instructions, comparisons and some intrinsics (being careful with
loops).

v2: actually delete nir_move_load_const.c
v3: fix nir_opt_sink() usage in freedreno
v3: update Makefile.sources
v4: replace get_move_def with nir_can_move_instr and nir_instr_ssa_def
v4: handle if uses
v4: fix handling of nested loops
v5: re-write adjust_block_for_loops
v5: re-write setting of use_block for if uses

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-08-12 22:01:30 +00:00
Rhys Perry
fd73ed1bd7 nir: add nir_lower_to_explicit()
v2: use glsl_type_size_align_func
v2: move get_explicit_type() to glsl_types.cpp/nir_types.cpp
v2: use align() instead of util_align_npot()
v2: pack arrays a bit tighter
v2: rename mem_* to field_*
v2: don't attempt to handle when struct offsets are already set
v2: use column_type() instead of recreating it
v2: use a branch instead of |= in nir_lower_to_explicit_impl()
v2: assign locations to variables and update shared_size and num_shared
v2: allow the pass to be used with nir_var_{shader_temp,function_temp}
v4: rebase
v5: add TODO
v5: small formatting changes
v5: remove incorrect assert in get_explicit_type()
v5: rename to nir_lower_vars_to_explicit_types
v5: correctly update progress when only variables are updated
v5: rename get_explicit_type() to get_explicit_shared_type()
v5: add comment explaining how get_explicit_shared_type() is different
v5: update cast strides
v6: update progress when lowering nir_var_function_temp variables
v6: formatting changes
v6: add more detailed documentation comment for get_explicit_shared_type
v6: rename get_explicit_shared_type to get_explicit_type_for_size_align
v7: fix comment in nir_lower_vars_to_explicit_types_impl()

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> (v5)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-08-08 12:10:39 -05:00
Jason Ekstrand
078dcb7ccd nir/lower_io: Add an option to lower 64-bit varyings
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-31 18:14:09 -05:00
Eric Engestrom
5d7bcac4e7 nir: remove explicit nir_intrinsic_index_flag values
These were left after a rebase and happen to make
NIR_INTRINSIC_SWIZZLE_MASK == NIR_INTRINSIC_SRC_ACCESS, which is how it
was noticed.

Fixes: 6f20643b47 ("nir: Allow qualifiers on copy_deref and image instructions")
Cc: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-31 23:28:20 +01:00
Erico Nunes
b3676a6548 nir/algebraic: rename lower_bitshift to lower_bitops
Optimizations that insert bitshift or bitwise operations should not be
applied on GPUs that don't support integer operations.
The .lower_bitshift could be used to control the bitshift related ones,
but there was also one bitwise optimization uncovered.
Since only lima and freedreno use this option and the use case is that
no bit operations are wanted, let's rename it to .lower_bitops and use
it to control all bitops related optimizations.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-07-31 23:06:04 +02:00
Erico Nunes
4a407df682 nir/algebraic: add new fsum ops and fdot lowering
The Mali400 pp doesn't implement fdot but has fsum3 and fsum4, which can
be used to optimize fdot lowering. fsum2 is not implemented and can be
further lowered to an add with the vector components.
Currently lima ppir handles this lowering internally, however this
happens in a very late stage and requires a big chunk of code compared
to a nir_opt_algebraic lowering.
By having fsum in nir, we can reduce ppir complexity and enable the
lowered ops to be part of other nir optimizations in the optimization
loop.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-31 21:35:58 +02:00
Connor Abbott
156306e5e6 nir/find_array_copies: Handle wildcards and overlapping copies
This commit rewrites opt_find_array_copies to be able to handle
an array copy sequence with other intervening operations in between. In
particular, this handles the case where we OpLoad an array of structs
and then OpStore it, which generates code like:

foo[0].a = bar[0].a
foo[0].b = bar[0].b
foo[1].a = bar[1].a
foo[1].b = bar[1].b
...

that wasn't recognized by the previous pass.

In order to correctly handle copying arrays of arrays, and in particular
to correctly handle copies involving wildcards, we need to use a tree
structure similar to lower_vars_to_ssa so that we can walk all the
partial array copies invalidated by a particular write, including
ones where one of the common indices is a wildcard. I actually think
that when factoring in the needed hashing/comparing code, a hash table
based approach wouldn't be a lot smaller anyways.

All of the changes come from tessellation control shaders in Strange
Brigade, where we're able to remove the DXVK-inserted copy at the
beginning of the shader. These are the result for radv:

Totals from affected shaders:
SGPRS: 4576 -> 4576 (0.00 %)
VGPRS: 13784 -> 5560 (-59.66 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 8696 -> 6876 (-20.93 %) dwords per thread
Code Size: 329940 -> 263268 (-20.21 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 330 -> 898 (172.12 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-29 11:36:25 +02:00
Jonathan Marek
9be902097c nir/algebraic: add option to lower fall_equalN/fany_nequalN
Add generic lowerings for fall_equalN/fany_nequalN. These should be optimal
for vec4 backends that doesn't have any special instructions for it, as
long as they support saturate.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
1e089d0575 nir/algebraic: add option to lower fdph
For backends that don't have a 'fdph' instructions

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jonathan Marek
bc3b6168ba nir: replace lower_sincos with algebraic opt
This version has less ops for the same precision.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2019-07-24 17:36:21 -04:00
Jason Ekstrand
0e6cb481fa nir: Add a nir_tex_instr_has_implicit_derivatives helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Jason Ekstrand
7a98c7804c nir: Move nir_alu_instr_is_comparison to the ALU section
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-23 13:40:41 -05:00
Timothy Arceri
30038dd5ec nir/lower_clip: add support for geometry shaders
This will be used to enabled compat profile support for geometry
shaders.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-07-19 09:25:47 +10:00
Eric Anholt
251c64a53d nir: Allow internal changes to the instr in nir_shader_lower_instructions().
v3d's NIR txf_ms lowering wants to swizzle around the input coordinates in
NIR, but doesn't generate a new txf_ms instructions as replacement.  It's
pretty easy to allow that in nir_shader_lower_instructions, and it may be
common in lowering passes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-07-18 11:28:56 -07:00
Jason Ekstrand
548da20b22 nir/lower_doubles: Handle fdiv and fsub directly
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
758fdce9fe nir: Add some generic helpers for writing lowering passes
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
c74b98486a nir: Add a helper for fetching the SSA def from an instruction
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-07-16 16:05:16 +00:00
Jason Ekstrand
0ba508d7a3 nir,intel: Add support for lowering 64-bit nir_opt_extract_*
We need this when doing full software 64-bit emulation.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110309
Fixes: cbad201c2b "nir/algebraic: Add missing 64-bit extract_[iu]8..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2019-07-15 16:08:37 -05:00
Jason Ekstrand
7a19e05e8c nir/opt_if: Clean up single-src phis in opt_if_loop_terminator
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111071
Fixes: 2a74296f24 "nir: add opt_if_loop_terminator()"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-15 19:58:51 +00:00
Ian Romanick
1259f6d802 nir: intel/vec4: Add flag to disable some algebraic optimizations
A couple patches later in this series use the flag to avoid a few
thousand shader-db regresions on all vec4 platforms.

I'm not particularly enamored with the name of this flag.  However, I
suspect the Intel vec4 backend is the only backend that will benefit
from it.  Specifically, the cases where this helps are all cases where
we want to prevent nir_opt_algebraic from rearranging instructions to
create 3-source instructions, such as ffma and flrp, with additional
immediate value or uniform sources.

The earlier commit "intel/vec4: Try to emit a single load for multiple
3-src instruction operands" solves most of the problems caused by
additional immediate values, but the restrictions on register strides
that cause problems for uniforms and shader inputs persist.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-11 10:20:03 -07:00
Jason Ekstrand
8f7405ed9d nir: Add some helpers for chasing SSA values properly
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code.  In all these cases, it can be very tricky to do properly because
of swizzles.  This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
6e984bcb92 nir/instr_set: Expose nir_instrs_equal()
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
3acddc733f nir: Refactor nir_src_as_* constant functions
Now that we have the nir_const_value_as_* helpers, every one of these
functions is effectively the same except for the suffix they use so we
can easily define them with a repeated macro.  This also means that
they're inline and the fact that the nir_src is being passed by-value
should no longer really hurt anything.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Jason Ekstrand
ce5581e23e nir: Add more helpers for working with const values
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-07-10 00:20:59 +00:00
Ian Romanick
5450fd7a36 nir: Allow nir_ssa_alu_instr_src_components to operate on non-SSA destinations
Existing users only operate on instructions with SSA destinations.  Some
later patches add new direct calls and indirect calls (via existing NIR
functions) on instructions after going out of SSA.  At the very least,
these calls are added by:

intel/vec4: Try to emit a VF source in try_immediate_source
intel/vec4: Try to emit a single load for multiple 3-src instruction operands

The first commit adds direct calls, and the second adds calls via
nir_alu_srcs_equal and nir_alu_srcs_negative_equal.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-07-08 11:30:11 -07:00