When geometry shaders write a value to gl_Layer that doesn't correspond to
an existing layer in the target framebuffer the rendering behavior is
undefined according to the spec, however, there are CTS tests that trigger
this scenario on purpose, probably to ensure that nothing terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out of bounds writes. The simulator
has an assert to catch this, although we haven't observed issues in actual
hardware it is probably best to play safe.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
If we program an output size of 0 the simulator asserts. This was
not a problem until now because our VS would always have to
emit fixed function outputs, however, now that it can be paired
with a GS we can end up with a VS shader that no longer emits
any outputs.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.
v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets us emit the VPM_WRITEs directly from
nir_intrinsic_store_output() (useful once NIR scheduling is in place so
that we can reduce register pressure), and lets future NIR scheduling
schedule the math to generate them. Even in the meantime, it looks like
this lets NIR DCE some more code and make better decisions.
total instructions in shared programs: 6429246 -> 6412976 (-0.25%)
total threads in shared programs: 153924 -> 153934 (<.01%)
total loops in shared programs: 486 -> 483 (-0.62%)
total uniforms in shared programs: 2385436 -> 2388195 (0.12%)
Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)
We can pull a whole vector in a single indirect load. This saves a bunch
of round-trips to the TMU, instructions for setting up multiple loads,
references to the UBO base in the uniforms, and apparently manages to
reduce register pressure as well.
instructions in affected programs: 3086665 -> 2454967 (-20.47%)
uniforms in affected programs: 919581 -> 721039 (-21.59%)
threads in affected programs: 1710 -> 3420 (100.00%)
spills in affected programs: 596 -> 522 (-12.42%)
fills in affected programs: 680 -> 562 (-17.35%)
Improves 3dmmes performance by 2.29312% +/- 0.139825% (n=5)
We were doing this late after nir_lower_io, but we can just reuse the core
code. By doing it at this stage, we won't even set up the VS attributes
as inputs, reducing our VPM size.
This is a pretty straightforward fork of VC4's NIR compiler to VC5. The
condition codes, registers, and I/O have all changed, making the backend
hard to share, though their heritage is still recognizable.
v2: Move to src/broadcom/compiler to match intel's layout, rename more
"vc5" to "v3d", rename QIR to VIR ("V3D IR") to avoid symbol conflicts
with vc4, use new v3d_debug header, add compiler init/free functions,
do texture swizzling in NIR to allow optimization.