broadcom/compiler: don't sort nodes for register allocation

Nodes are assigned to registers in order, so sorting was initially used to ensure that nodes with shorter live ranges would be assigned first and would therefore be more likely to get accumulators. However, since commit d81a6e5f1d we no longer rely on order to make decisions about accumulators; instead, we make policy decisions based on actual liveness, so sorting is no longer strictly relevant to this decision.

Furthermore, we are not re-sorting nodes after each spill either, since that would probably require rebuilding the interference graph after each spill (the graph identifies nodes by their index).

Shader-db results show a significant improvement in instruction counts, due to better accumulator assignments. The reason is that we use a round-robin policy to choose the next accumulator to assign. The idea behind this is to prevent nearby temps from being assigned to the same accumulator, so that QPU scheduling is more flexible. However, if we sort our nodes, we are effectively no longer assigning temps in program order, and the round-robin policy becomes less effective:

total instructions in shared programs: 13000420 -> 12663189 (-2.59%)
instructions in affected programs: 11791267 -> 11454036 (-2.86%)
helped: 62890
HURT: 19987

total threads in shared programs: 415874 -> 415870 (<.01%)
threads in affected programs: 20 -> 16 (-20.00%)
helped: 2
HURT: 4

total uniforms in shared programs: 3711652 -> 3711624 (<.01%)
uniforms in affected programs: 43430 -> 43402 (-0.06%)
helped: 134
HURT: 173

total max-temps in shared programs: 2144876 -> 2138822 (-0.28%)
max-temps in affected programs: 123334 -> 117280 (-4.91%)
helped: 4112
HURT: 1195

total spills in shared programs: 3870 -> 3860 (-0.26%)
spills in affected programs: 1013 -> 1003 (-0.99%)
helped: 14
HURT: 12

total fills in shared programs: 5560 -> 5573 (0.23%)
fills in affected programs: 1765 -> 1778 (0.74%)
helped: 14
HURT: 17

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
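The round-robin argument above can be illustrated with a minimal sketch. This is a hypothetical, heavily simplified model and not the v3d allocator: `NUM_ACC`, `struct temp_range`, `build_order()`, `assign_round_robin()` and `count_neighbor_collisions()` are illustrative names, temps are assumed to be indexed in program order, and interference is approximated by overlapping half-open live ranges. The `priority` field mirrors the live-range length that the removed `node_to_temp_priority()` comparator sorted on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical simplified model, not the v3d allocator. */
#define NUM_ACC 4
#define MAX_TEMPS 64
#define NO_ACC (-1)

/* Live range of a temp as a half-open interval of instruction indices. */
struct temp_range {
        int start;
        int end;
};

/* Mirrors the removed node_to_temp_map: a temp plus its sort priority. */
struct order_entry {
        int temp;
        int priority; /* live-range length, as in the removed code */
};

static bool
ranges_interfere(const struct temp_range *a, const struct temp_range *b)
{
        return a->start < b->end && b->start < a->end;
}

/* Shorter live ranges first; ties broken by temp index so the result is
 * deterministic (qsort is not stable). */
static int
cmp_priority(const void *in_a, const void *in_b)
{
        const struct order_entry *a = in_a;
        const struct order_entry *b = in_b;
        if (a->priority != b->priority)
                return a->priority - b->priority;
        return a->temp - b->temp;
}

/* Fill `order` with temp indices, either in program order or sorted by
 * live-range length like the old qsort() call. */
static void
build_order(const struct temp_range *temps, int n, bool sort_by_length,
            int *order)
{
        struct order_entry entries[MAX_TEMPS];
        assert(n <= MAX_TEMPS);
        for (int i = 0; i < n; i++) {
                entries[i].temp = i;
                entries[i].priority = temps[i].end - temps[i].start;
        }
        if (sort_by_length)
                qsort(entries, n, sizeof(entries[0]), cmp_priority);
        for (int i = 0; i < n; i++)
                order[i] = entries[i].temp;
}

/* Visit temps in `order`. Each takes the first accumulator, searching from
 * the one after the last accumulator handed out (round-robin), that is not
 * held by an interfering temp. Temps that find none keep NO_ACC. */
static void
assign_round_robin(const struct temp_range *temps, int n, const int *order,
                   int *acc)
{
        int next_acc = 0;
        for (int i = 0; i < n; i++)
                acc[i] = NO_ACC;
        for (int k = 0; k < n; k++) {
                int t = order[k];
                for (int tries = 0; tries < NUM_ACC; tries++) {
                        int cand = (next_acc + tries) % NUM_ACC;
                        bool busy = false;
                        for (int u = 0; u < n && !busy; u++)
                                busy = acc[u] == cand &&
                                       ranges_interfere(&temps[t], &temps[u]);
                        if (!busy) {
                                acc[t] = cand;
                                next_acc = (cand + 1) % NUM_ACC;
                                break;
                        }
                }
        }
}

/* Count program-order neighbors (temps i and i + 1) sharing an accumulator. */
static int
count_neighbor_collisions(const int *acc, int n)
{
        int count = 0;
        for (int i = 0; i + 1 < n; i++)
                if (acc[i] != NO_ACC && acc[i] == acc[i + 1])
                        count++;
        return count;
}
```

For example, with eight non-overlapping temps whose live ranges alternate between one and two instructions ({0,1}, {1,3}, {3,4}, {4,6}, ...), visiting them in program order yields accumulators 0, 1, 2, 3, 0, 1, 2, 3, so no adjacent temps share one. Visiting them sorted by live-range length yields 0, 0, 1, 1, 2, 2, 3, 3, so every adjacent pair lands in the same accumulator even though the policy itself is unchanged.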
2022-02-25 09:18:34 +01:00
parent 4483cd24af
commit 871b0a7f6a
1 changed file with 3 additions and 18 deletions
--- a/src/broadcom/compiler/vir_register_allocate.c
+++ b/src/broadcom/compiler/vir_register_allocate.c
@@ -785,15 +785,6 @@ vir_init_reg_sets(struct v3d_compiler *compiler)
        return true;
 }
-static int
-node_to_temp_priority(const void *in_a, const void *in_b)
-{
-       const struct node_to_temp_map *a = in_a;
-       const struct node_to_temp_map *b = in_b;
-       return a->priority - b->priority;
-}
 static inline bool
 tmu_spilling_allowed(struct v3d_compile *c)
 {
@@ -970,24 +961,18 @@ v3d_register_allocate(struct v3d_compile *c)
                ra_set_node_reg(c->g, acc_nodes[i], ACC_INDEX + i);
        }
        /* Initialize our node/temp map */
        for (uint32_t i = 0; i < c->num_temps; i++) {
                c->ra_map.node[i].temp = i;
                c->ra_map.node[i].priority =
                        c->temp_end[i] - c->temp_start[i];
+               c->ra_map.temp[i].node = i;
+               c->ra_map.temp[i].class_bits = CLASS_BITS_ANY;
        }
-       qsort(c->ra_map.node, c->num_temps, sizeof(c->ra_map.node[0]),
-             node_to_temp_priority);
-       for (uint32_t i = 0; i < c->num_temps; i++)
-               c->ra_map.temp[c->ra_map.node[i].temp].node = i;
        /* Walk the instructions adding register class restrictions and
         * interferences.
         */
-       for (uint32_t i = 0; i < c->num_temps; i++)
-               c->ra_map.temp[i].class_bits = CLASS_BITS_ANY;
        int ip = 0;
        vir_for_each_inst_inorder(inst, c) {
                inst->ip = ip++;