agx: Implement vector live range splitting

The SSA killer feature is that, under an "optimal" allocator, the number of registers used (register demand) is *equal* to the number of registers required (register pressure, the maximum number of variables simultaneously live at any point in the program). I put "optimal" in scare quotes, because we don't need to use the exact minimum number of registers as long as we don't sacrifice thread count or introduce spilling, and using a few extra registers when possible can help coalesce moves. Details-shmetails.

The problem is that, prior to this commit, our register allocator was not well-behaved in certain circumstances, and would require an arbitrarily large number of registers. In particular, since different variables have different sizes and require contiguous allocation, in large programs the register file may become fragmented, causing the RA to use arbitrarily many registers despite having lots of registers free.

The solution is vector live range splitting. First, we calculate the register pressure (the minimum number of registers that it is theoretically possible to allocate successfully), and round up to the maximum number of registers we will actually use (to give some wiggle room to coalesce moves). Then, we will treat this maximum as a *bound*, requiring that we don't use more registers than chosen. In the event that register file fragmentation prevents us from finding a contiguous sequence of registers to allocate a variable, rather than giving up or using registers we don't have, we shuffle the register file around (defragmenting it) to make room for the new variable. That lets us use a few moves to avoid sacrificing thread count or introducing spilling, which is usually a great choice.

Android GLES3.1 shader-db results are as expected: some noise / small regressions for instruction count, but a bunch of shaders with improved thread count.
The massive increase in register demand may seem weird, but this is the RA doing exactly what it's supposed to: using more registers if and only if they would not hurt thread count. Notice that no programs whatsoever are hurt for thread count, which is the salient part.

total instructions in shared programs: 1781473 -> 1781574 (<.01%)
instructions in affected programs: 276268 -> 276369 (0.04%)
helped: 1074
HURT: 463
Inconclusive result (value mean confidence interval includes 0).

total bytes in shared programs: 12196640 -> 12201670 (0.04%)
bytes in affected programs: 1987322 -> 1992352 (0.25%)
helped: 1060
HURT: 513
Bytes are HURT.

total halfregs in shared programs: 488755 -> 529651 (8.37%)
halfregs in affected programs: 295651 -> 336547 (13.83%)
helped: 358
HURT: 9737
Halfregs are HURT.

total threads in shared programs: 18875008 -> 18885440 (0.06%)
threads in affected programs: 64576 -> 75008 (16.15%)
helped: 82
HURT: 0
Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
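
To make the bounded allocation described in the message concrete, here is a minimal toy sketch in C. It is not the code from this commit: the names `regfile`, `find_run`, `defragment`, `alloc` and `release` are hypothetical, the register file is a tiny 16-entry array, the bound is simply passed in rather than derived from register pressure and thread count, and alignment is ignored. When no contiguous run is free below the bound, it compacts everything toward register 0 and counts one move per relocated register. A real allocator would presumably move far less than a full compaction.

```c
#include <assert.h>

#define NUM_REGS 16 /* toy register file size, not the real AGX_NUM_REGS */
#define FREE     (-1)

/* Hypothetical toy register file, for illustration only. */
struct regfile {
   int owner[NUM_REGS]; /* variable occupying each register, or FREE */
   int bound;           /* only registers [0, bound) may be used */
   int moves;           /* register copies inserted by defragmentation */
};

static void
regfile_init(struct regfile *rf, int bound)
{
   for (int r = 0; r < NUM_REGS; ++r)
      rf->owner[r] = FREE;

   rf->bound = bound;
   rf->moves = 0;
}

/* First-fit search for `size` contiguous free registers below the bound.
 * Returns the base register, or -1 if no such run exists.
 */
static int
find_run(const struct regfile *rf, int size)
{
   int run = 0;

   for (int r = 0; r < rf->bound; ++r) {
      run = (rf->owner[r] == FREE) ? run + 1 : 0;

      if (run == size)
         return r - size + 1;
   }

   return -1;
}

/* Slide every live register down to the lowest free slot, preserving order.
 * Because order is preserved, each variable's registers stay contiguous.
 */
static void
defragment(struct regfile *rf)
{
   int dst = 0;

   for (int src = 0; src < rf->bound; ++src) {
      if (rf->owner[src] == FREE)
         continue;

      if (dst != src) {
         rf->owner[dst] = rf->owner[src];
         rf->owner[src] = FREE;
         rf->moves++;
      }

      dst++;
   }
}

/* Allocate `size` contiguous registers for `var` without exceeding the bound.
 * If fragmentation blocks the request, shuffle the file and retry. Returns
 * the base register, or -1 if fewer than `size` registers are free in total.
 */
static int
alloc(struct regfile *rf, int var, int size)
{
   assert(size > 0 && size <= rf->bound);

   int base = find_run(rf, size);

   if (base < 0) {
      defragment(rf);
      base = find_run(rf, size);
   }

   if (base < 0)
      return -1;

   for (int i = 0; i < size; ++i)
      rf->owner[base + i] = var;

   return base;
}

/* Mark every register owned by `var` as free again. */
static void
release(struct regfile *rf, int var)
{
   for (int r = 0; r < rf->bound; ++r) {
      if (rf->owner[r] == var)
         rf->owner[r] = FREE;
   }
}
```

For example, with a bound of 8, allocating four 2-register variables and then releasing the first and third leaves four registers free but no free run longer than two. Requesting a 4-register vector at that point triggers the compaction (four single-register moves in this toy model) and then succeeds, instead of spilling or growing past the bound.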

committed by Marge Bot
parent 72e6b683f3
commit 766535c867

@@ -355,6 +355,11 @@ typedef struct agx_block {
    BITSET_WORD *live_in;
    BITSET_WORD *live_out;
 
+   /* For visited blocks during register assignment and live-out registers, the
+    * mapping of SSA names to registers at the end of the block.
+    */
+   uint8_t *ssa_to_reg_out;
+
    /* Register allocation */
    BITSET_DECLARE(regs_out, AGX_NUM_REGS);
 