broadcom/compiler: don't sort nodes for register allocation

Nodes are assigned to registers in order, so sorting was originally
used to ensure that nodes with shorter live ranges would be
assigned first and would therefore be more likely to get
accumulators.

However, since d81a6e5f1d we no longer rely on order to make
decisions about accumulators; instead we make policy decisions
based on actual liveness, so sorting is no longer strictly
relevant to this decision.

Furthermore, we are not re-sorting nodes after each spill either,
since that would probably require that we rebuild the interference
graph after each spill (the graph identifies nodes by their index).
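
As a rough illustration of why the node index matters, the sketch below
contrasts the sorted node/temp map with an identity map. The struct and
function names (node_entry, build_sorted_map, build_identity_map) are
hypothetical, simplified stand-ins and not the actual v3d compiler
types; live_range[] stands in for temp_end[i] - temp_start[i].

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the node side of the map. */
struct node_entry {
        int temp;
        int priority;
};

static int
cmp_priority(const void *in_a, const void *in_b)
{
        const struct node_entry *a = in_a;
        const struct node_entry *b = in_b;
        return a->priority - b->priority;
}

/* Sorted variant: node indices are a permutation of temp indices, so the
 * inverse (temp -> node) map has to be rebuilt after every sort.
 */
static void
build_sorted_map(const int *live_range, int n,
                 struct node_entry *node, int *temp_to_node)
{
        for (int i = 0; i < n; i++) {
                node[i].temp = i;
                node[i].priority = live_range[i];
        }
        qsort(node, n, sizeof(node[0]), cmp_priority);
        for (int i = 0; i < n; i++)
                temp_to_node[node[i].temp] = i;
}

/* Unsorted variant: node i is always temp i, so the map is the identity. */
static void
build_identity_map(const int *live_range, int n,
                   struct node_entry *node, int *temp_to_node)
{
        for (int i = 0; i < n; i++) {
                node[i].temp = i;
                node[i].priority = live_range[i];
                temp_to_node[i] = i;
        }
}
```

With live ranges {5, 1, 3}, the sorted variant places temp 0 at node 2,
while the identity variant keeps it at node 0. Any structure keyed by
node index would go stale if the sort were redone with different
priorities.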

Shader-db results show a significant improvement in instruction
counts, due to better accumulator assignments. The reason for
this is that we use a round-robin policy for choosing the next
accumulator to assign. The idea behind this is to prevent nearby
temps from being assigned to the same accumulator, so that QPU
scheduling is more flexible. If we sort our nodes, we are no longer
assigning temps in program order and the round-robin policy
becomes less effective:

total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987

total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4

total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173

total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195

total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12

total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17
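
The effect of visit order on a round-robin policy can be seen in a toy
model. This is only a sketch, not the driver's allocator: NUM_ACC, the
function names, and the idea of counting "collisions" between
neighbouring temps are assumptions made for illustration.

```c
#include <stddef.h>

#define NUM_ACC 4 /* hypothetical accumulator count for this sketch */

/* Assign accumulators round-robin, visiting temps in the given order.
 * order[k] is the temp visited at step k; acc_of[t] receives the
 * accumulator chosen for temp t.
 */
static void
assign_round_robin(const int *order, size_t n, int *acc_of)
{
        int next_acc = 0;
        for (size_t k = 0; k < n; k++) {
                acc_of[order[k]] = next_acc;
                next_acc = (next_acc + 1) % NUM_ACC;
        }
}

/* Count pairs of temps adjacent in program order (t, t + 1) that ended
 * up in the same accumulator.
 */
static int
count_adjacent_collisions(const int *acc_of, size_t n)
{
        int collisions = 0;
        for (size_t t = 0; t + 1 < n; t++) {
                if (acc_of[t] == acc_of[t + 1])
                        collisions++;
        }
        return collisions;
}
```

Visiting eight temps in program order (0, 1, ..., 7) gives neighbouring
temps different accumulators. Visiting them in a permuted order such as
0, 2, 4, 6, 1, 3, 5, 7 (as a sort might produce) puts temps 0 and 1,
2 and 3, 4 and 5, and 6 and 7 in the same accumulator.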

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
Iago Toral Quiroga
2022-02-25 09:18:34 +01:00
committed by Marge Bot
parent 4483cd24af
commit 871b0a7f6a

@@ -785,15 +785,6 @@ vir_init_reg_sets(struct v3d_compiler *compiler)
         return true;
 }
 
-static int
-node_to_temp_priority(const void *in_a, const void *in_b)
-{
-        const struct node_to_temp_map *a = in_a;
-        const struct node_to_temp_map *b = in_b;
-
-        return a->priority - b->priority;
-}
-
 static inline bool
 tmu_spilling_allowed(struct v3d_compile *c)
 {
@@ -970,24 +961,18 @@ v3d_register_allocate(struct v3d_compile *c)
                 ra_set_node_reg(c->g, acc_nodes[i], ACC_INDEX + i);
         }
 
+        /* Initialize our node/temp map */
         for (uint32_t i = 0; i < c->num_temps; i++) {
                 c->ra_map.node[i].temp = i;
                 c->ra_map.node[i].priority =
                         c->temp_end[i] - c->temp_start[i];
+                c->ra_map.temp[i].node = i;
+                c->ra_map.temp[i].class_bits = CLASS_BITS_ANY;
         }
 
-        qsort(c->ra_map.node, c->num_temps, sizeof(c->ra_map.node[0]),
-              node_to_temp_priority);
-
-        for (uint32_t i = 0; i < c->num_temps; i++)
-                c->ra_map.temp[c->ra_map.node[i].temp].node = i;
-
         /* Walk the instructions adding register class restrictions and
          * interferences.
          */
-        for (uint32_t i = 0; i < c->num_temps; i++)
-                c->ra_map.temp[i].class_bits = CLASS_BITS_ANY;
-
         int ip = 0;
         vir_for_each_inst_inorder(inst, c) {
                 inst->ip = ip++;