broadcom/compiler: implement pipelining for general TMU operations

This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU operations. Follow-up patches will extend it
to texture and image load/store operations.

TMU pipelining means that we don't end TMU sequences immediately;
instead, we postpone the thread switch and the LDTMU (for loads) or
TMUWT (for stores) until they are actually needed.
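A minimal sketch of the deferral idea, assuming a toy model in which
each pending TMU operation only records whether it is a load (ended by
one LDTMU per returned component) or a store (ended by a TMUWT). All
names here (tmu_pipeline, tmu_emit_op(), tmu_flush()) are hypothetical
and are not the v3d compiler's data structures. The sketch also assumes
that a single TMUWT is enough to wait for every pending store.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of deferred TMU sequences, for illustration only. */
#define MAX_PENDING_TMU 16

struct pending_tmu_op {
        bool is_load;          /* loads end with LDTMU, stores with TMUWT */
        unsigned num_results;  /* LDTMUs needed to read a load's results */
};

struct tmu_pipeline {
        struct pending_tmu_op ops[MAX_PENDING_TMU];
        unsigned count;            /* operations whose sequence is still open */
        unsigned thrsw_emitted;    /* counters standing in for emitted code */
        unsigned ldtmu_emitted;
        unsigned tmuwt_emitted;
};

/* Emit the TMU register writes for an operation (not modeled here) but
 * postpone the thread switch and the LDTMU/TMUWT that end the sequence. */
static void
tmu_emit_op(struct tmu_pipeline *p, bool is_load, unsigned num_results)
{
        assert(p->count < MAX_PENDING_TMU);
        p->ops[p->count++] = (struct pending_tmu_op) {
                .is_load = is_load,
                .num_results = is_load ? num_results : 0,
        };
}

/* End every open sequence: one thread switch, then the LDTMUs for each
 * pending load in issue order, plus one TMUWT if any store is pending
 * (assumption: a single TMUWT waits for all of them). */
static void
tmu_flush(struct tmu_pipeline *p)
{
        if (p->count == 0)
                return;

        p->thrsw_emitted++;

        bool has_store = false;
        for (unsigned i = 0; i < p->count; i++) {
                if (p->ops[i].is_load)
                        p->ldtmu_emitted += p->ops[i].num_results;
                else
                        has_store = true;
        }
        if (has_store)
                p->tmuwt_emitted++;

        p->count = 0;
}
```

In this model, three operations issued back to back cost a single
thread switch at flush time instead of one per operation.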

For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.
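The flush-and-retry lookup can be sketched with a toy table of SSA
definitions. compile_state, define_value(), emit_tmu_load(),
flush_tmu() and get_src() are hypothetical stand-ins, with get_src()
playing the role described above for ntq_get_src(). NIR registers are
not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SSA 32
#define MAX_PENDING 8

/* Hypothetical compiler state: which SSA values have a definition, and
 * which SSA values are waiting on a postponed LDTMU. */
struct compile_state {
        bool defined[MAX_SSA];
        unsigned def_reg[MAX_SSA];   /* register holding each SSA value */
        unsigned next_reg;
        unsigned pending_dest[MAX_PENDING];
        unsigned pending_count;
        unsigned flush_count;        /* flushes that ended pending loads */
};

static void
define_value(struct compile_state *c, unsigned ssa)
{
        assert(ssa < MAX_SSA);
        c->defined[ssa] = true;
        c->def_reg[ssa] = c->next_reg++;
}

/* Issue a load but leave its destination undefined until the flush. */
static void
emit_tmu_load(struct compile_state *c, unsigned dest)
{
        assert(dest < MAX_SSA && c->pending_count < MAX_PENDING);
        c->pending_dest[c->pending_count++] = dest;
}

/* Emitting the postponed LDTMUs is what finally defines the results. */
static void
flush_tmu(struct compile_state *c)
{
        if (c->pending_count == 0)
                return;
        for (unsigned i = 0; i < c->pending_count; i++)
                define_value(c, c->pending_dest[i]);
        c->pending_count = 0;
        c->flush_count++;
}

/* A missing definition means the value comes from a load whose LDTMU
 * has not been emitted yet: flush everything and look it up again. */
static unsigned
get_src(struct compile_state *c, unsigned ssa)
{
        assert(ssa < MAX_SSA);
        if (!c->defined[ssa]) {
                flush_tmu(c);
                assert(c->defined[ssa]);
        }
        return c->def_reg[ssa];
}
```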

We also need to flush pending TMU operations when we reach the end
of a control flow block, to avoid emitting a TMU operation in one
block and then reading its result in another block, possibly under
control flow.

It is also required to flush across barriers and discards to honor
their semantics.

Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.

Finally, the TMU has three FIFOs where it can queue TMU operations.
These FIFOs have limited capacity, which depends on the number of
threads used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests, flushing pending TMU
operations if a new TMU operation would overflow any of these FIFOs.
While overflowing the Input and Config FIFOs only leads to stalls
(which we want to avoid anyway), overflowing the Output FIFO is
incorrect and produces a broken shader. This means that we need to
know how many TMU register writes a TMU operation requires, and use
that information to decide whether to flush pending TMU operations
before we emit any register writes for the new operation.
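A sketch of the overflow check, assuming each new operation reports
how many entries it needs in each FIFO (its TMU register writes for
the Input FIFO, its returned components for the Output FIFO). The
structure and function names are hypothetical, capacity is modeled
simply as a base size divided by the thread count, and the sizes used
to exercise it are placeholders, not the hardware's real FIFO sizes.

```c
#include <assert.h>
#include <stdbool.h>

/* Entries per FIFO, used both for capacities and for the needs of one
 * operation. Hypothetical model, not the real hardware description. */
struct tmu_fifo_sizes {
        unsigned input;   /* TMU register writes */
        unsigned config;  /* configuration entries */
        unsigned output;  /* returned result components */
};

struct tmu_fifo_state {
        struct tmu_fifo_sizes cap;   /* capacity for this thread count */
        struct tmu_fifo_sizes used;  /* entries held by pending operations */
};

/* Assumption: capacity shrinks as base size / number of threads. */
static void
tmu_fifo_init(struct tmu_fifo_state *s, struct tmu_fifo_sizes base,
              unsigned num_threads)
{
        assert(num_threads > 0);
        s->cap = (struct tmu_fifo_sizes) {
                .input = base.input / num_threads,
                .config = base.config / num_threads,
                .output = base.output / num_threads,
        };
        s->used = (struct tmu_fifo_sizes) { 0, 0, 0 };
}

static bool
tmu_fifo_overflow(const struct tmu_fifo_state *s, struct tmu_fifo_sizes req)
{
        return s->used.input + req.input > s->cap.input ||
               s->used.config + req.config > s->cap.config ||
               s->used.output + req.output > s->cap.output;
}

/* Called before emitting any register write of the new operation.
 * Returns true if pending operations had to be flushed first. */
static bool
tmu_emit_checked(struct tmu_fifo_state *s, struct tmu_fifo_sizes req)
{
        bool flushed = false;
        if (tmu_fifo_overflow(s, req)) {
                /* Ending all pending sequences drains the FIFOs. */
                s->used = (struct tmu_fifo_sizes) { 0, 0, 0 };
                flushed = true;
        }
        /* Assumes a single operation always fits in empty FIFOs. */
        assert(!tmu_fifo_overflow(s, req));

        s->used.input += req.input;
        s->used.config += req.config;
        s->used.output += req.output;
        return flushed;
}
```

In this toy model, raising the thread count from 1 to 4 means a
four-component load already fills the Output FIFO, so every such load
forces a flush of the previous one.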

v2: fix TMU flushing for NIR register reads (jasuarez)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
Iago Toral Quiroga
2021-01-26 12:18:43 +01:00
committed by Marge Bot
parent 0e96f0f8cd
commit 197090a3fc
5 changed files with 390 additions and 126 deletions


@@ -61,6 +61,9 @@ static const struct V3D41_TMU_CONFIG_PARAMETER_2 p2_unpacked_default = {
 void
 v3d40_vir_emit_tex(struct v3d_compile *c, nir_tex_instr *instr)
 {
+        /* FIXME: allow tex pipelining */
+        ntq_flush_tmu(c);
         unsigned texture_idx = instr->texture_index;
         unsigned sampler_idx = instr->sampler_index;
@@ -343,6 +346,9 @@ void
 v3d40_vir_emit_image_load_store(struct v3d_compile *c,
                                 nir_intrinsic_instr *instr)
 {
+        /* FIXME: allow image load/store pipelining */
+        ntq_flush_tmu(c);
         unsigned format = nir_intrinsic_format(instr);
         unsigned unit = nir_src_as_uint(instr->src[0]);
         int tmu_writes = 0;