broadcom/compiler: implement pipelining for general TMU operations

This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU operations. Follow-up patches will extend
this to texture and image load/store operations.

TMU pipelining means that we don't immediately end TMU sequences,
and instead, we postpone the thread switch and LDTMU (for loads)
or TMUWT (for stores) until we really need to do them.

For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.
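
The flush-and-retry lookup described above can be sketched as follows.
This is only an illustration under simplifying assumptions: the toy_*
names and the toy_compile struct are hypothetical stand-ins, not the
real v3d_compile state or the actual ntq_get_src() implementation.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_DEFS 16
#define TOY_MAX_PENDING 8

/* Hypothetical, simplified stand-in for the compiler state. */
struct toy_compile {
        bool defined[TOY_MAX_DEFS];   /* true once a value has a definition */
        int pending[TOY_MAX_PENDING]; /* values whose LDTMU is postponed */
        int pending_count;
        int flushes;                  /* number of flushes that did work */
};

/* Queue a load: its result stays undefined until the next flush. */
static void
toy_queue_load(struct toy_compile *c, int def)
{
        assert(def >= 0 && def < TOY_MAX_DEFS);
        assert(c->pending_count < TOY_MAX_PENDING);
        c->pending[c->pending_count++] = def;
}

/* Emit all postponed loads: every pending value becomes defined. */
static void
toy_flush_tmu(struct toy_compile *c)
{
        if (c->pending_count == 0)
                return;

        for (int i = 0; i < c->pending_count; i++)
                c->defined[c->pending[i]] = true;

        c->pending_count = 0;
        c->flushes++;
}

/* Look up a source. If it has no definition yet, flush the pending
 * TMU operations and look again. Returns whether it is now defined.
 */
static bool
toy_get_src(struct toy_compile *c, int def)
{
        assert(def >= 0 && def < TOY_MAX_DEFS);

        if (!c->defined[def])
                toy_flush_tmu(c);

        return c->defined[def];
}
```

In this sketch a second read of the same value does not flush again,
because the first flush already produced its definition.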

We also need to flush pending TMU operations when we reach the end
of a control flow block, to prevent the case where we emit a TMU
operation in a block, but then we read the result in another block
possibly under control flow.

It is also required to flush across barriers and discards to honor
their semantics.

Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.

Finally, the TMU has 3 fifos where it can queue TMU operations.
These fifos have limited capacity, depending on the number of threads
used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests and flush pending
TMU operations if a new TMU operation would overflow any of these
fifos. While overflowing the Input and Config fifos only leads
to stalls (which we want to avoid anyway), overflowing the Output
fifo is incorrect and would end up with a broken shader. This means
that we need to know how many TMU register writes are required
to emit a TMU operation and use that information to decide if we need
to flush pending TMU operations before we emit any register
writes for the new TMU operation.
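
A minimal sketch of that check, assuming we track the current usage of
each fifo and know in advance how many entries a new operation needs.
All toy_* names are hypothetical, and the capacities are parameters of
the sketch rather than values taken from the hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one fifo: entries in use and total capacity. */
struct toy_fifo {
        uint32_t used;
        uint32_t capacity;
};

/* Hypothetical model of the three TMU fifos plus a flush counter. */
struct toy_tmu {
        struct toy_fifo input;
        struct toy_fifo config;
        struct toy_fifo output;
        uint32_t flushes;
};

/* True if n more entries fit in the fifo without overflowing it. */
static bool
toy_fifo_fits(const struct toy_fifo *f, uint32_t n)
{
        return f->used + n <= f->capacity;
}

/* Flushing drains every fifo. */
static void
toy_tmu_flush(struct toy_tmu *t)
{
        t->input.used = 0;
        t->config.used = 0;
        t->output.used = 0;
        t->flushes++;
}

/* Decide whether to flush before emitting any register write for the
 * new operation, then account for the entries it consumes.
 */
static void
toy_tmu_emit(struct toy_tmu *t, uint32_t input_writes,
             uint32_t config_writes, uint32_t output_components)
{
        if (!toy_fifo_fits(&t->input, input_writes) ||
            !toy_fifo_fits(&t->config, config_writes) ||
            !toy_fifo_fits(&t->output, output_components))
                toy_tmu_flush(t);

        t->input.used += input_writes;
        t->config.used += config_writes;
        t->output.used += output_components;
}
```

The decision is made once, before the first register write, so an
operation is never split across a flush.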

v2: fix TMU flushing for NIR register reads (jasuarez)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
Iago Toral Quiroga
2021-01-26 12:18:43 +01:00
committed by Marge Bot
parent 0e96f0f8cd
commit 197090a3fc
5 changed files with 390 additions and 126 deletions

@@ -566,6 +566,24 @@ struct v3d_compile {
         struct qinst **defs;
         uint32_t defs_array_size;
 
+        /* TMU pipelining tracking */
+        struct {
+                /* NIR registers that have been updated with a TMU operation
+                 * that has not been flushed yet.
+                 */
+                struct set *outstanding_regs;
+
+                uint32_t input_fifo_size;
+                uint32_t config_fifo_size;
+                uint32_t output_fifo_size;
+
+                struct {
+                        nir_dest *dest;
+                        uint32_t num_components;
+                } flush[8]; /* 16 entries / 2 threads for input/output fifos */
+                uint32_t flush_count;
+        } tmu;
+
         /**
          * Inputs to the shader, arranged by TGSI declaration order.
          *
@@ -918,6 +936,7 @@ uint8_t vir_channels_written(struct qinst *inst);
 struct qreg ntq_get_src(struct v3d_compile *c, nir_src src, int i);
 void ntq_store_dest(struct v3d_compile *c, nir_dest *dest, int chan,
                     struct qreg result);
+void ntq_flush_tmu(struct v3d_compile *c);
 void vir_emit_thrsw(struct v3d_compile *c);
 void vir_dump(struct v3d_compile *c);