Commit Graph

12 Commits

Author SHA1 Message Date
Rob Clark
c148dbe07e v3d: don't use intr->num_components for non-vectorized intrinsics
Squashed-in-fix-from: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5371>
2020-06-16 02:48:18 +00:00
Iago Toral Quiroga
6c7a2b69f8 v3d: handle writes to gl_Layer from geometry shaders
When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.

For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga
a07d70c54b v3d: we always have at least one output segment
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga
5d578c27ce v3d: add initial compiler plumbing for geometry shaders
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.

The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.

When vertex shaders are paired with geometry shaders we also need
to consider the following:
  - Only geometry shaders emit fixed function outputs.
  - The coordinate shader used for the vertex stage during binning must
    not drop varyings other than those used by transform feedback, since
    these may be read by the binning GS.

v2:
 - Use MAX3 instead of a chain of MAX2 (Alejandro).
 - Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
 - Update comment in IO owering so it includes the GS stage (Alejandro)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga
d6b0786a38 v3d: add debug assert
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2019-12-16 08:42:37 +01:00
Iago Toral Quiroga
e7e501efce v3d: rename vertex shader key (num)_fs_inputs fields
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.

v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-10-31 08:46:35 +00:00
Eric Anholt
2780a99ff8 v3d: Move the stores for fixed function VS output reads into NIR.
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them.  Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.

total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)

Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
2019-03-05 10:59:40 -08:00
Eric Anholt
81b9361b68 v3d: Stop scalarizing our uniform loads.
We can pull a whole vector in a single indirect load.  This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.

instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)

Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
2019-01-04 15:41:23 -08:00
Eric Anholt
b0e0086257 v3d: Remove dead switch cases and comments from v3d_nir_lower_io.
Moving things to NIR left this mess around.  All we lower now is uniforms.
2019-01-04 15:41:23 -08:00
Eric Anholt
2977c77758 v3d: Use the original bit size when scalarizing uniform loads.
Prevents a regression in jekstrand's 1-bit series.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-12-16 21:03:01 +00:00
Eric Anholt
cc54e1acf9 v3d: Use nir_remove_unused_io_vars to handle binner shader output DCE
We were doing this late after nir_lower_io, but we can just reuse the core
code.  By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
2018-10-30 10:46:52 -07:00
Eric Anholt
ade416d023 broadcom: Add VC5 NIR compiler.
This is a pretty straightforward fork of VC4's NIR compiler to VC5.  The
condition codes, registers, and I/O have all changed, making the backend
hard to share, though their heritage is still recognizable.

v2: Move to src/broadcom/compiler to match intel's layout, rename more
    "vc5" to "v3d", rename QIR to VIR ("V3D IR") to avoid symbol conflicts
    with vc4, use new v3d_debug header, add compiler init/free functions,
    do texture swizzling in NIR to allow optimization.
2017-10-10 11:42:04 -07:00