intel/fs: Use OPT() for split_virtual_grfs

Now that we're being conservative in the pass, it's easy to tell when it
makes progress and we can put it in the OPT() macro.  This way, we get
nice INTEL_DEBUG=optimizer dumps for it.  While we're here, fix the
header comment which is massively out-of-date.

Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13734>
Commit:    cf98a3cc19
Parent:    38fa18a7a3
Author:    Jason Ekstrand
Date:      2021-11-09 14:38:48 -06:00
Committed: Marge Bot
2 changed files with 22 additions and 21 deletions
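
To make the mechanics concrete, here is a minimal, self-contained sketch of the pattern this commit adopts: each pass returns true only when it actually changes the IR, and an OPT()-style macro folds that result into the loop's progress flag while also evaluating to it. Everything below (the toy ir struct, remove_zeros, the simplified macro) is invented for illustration; Mesa's real macro in fs_visitor::optimize() additionally numbers passes, validates the IR, and emits the INTEL_DEBUG=optimizer dumps the commit message mentions.

// Build with: g++ -std=c++20 opt_sketch.cpp (statement expressions are a
// GCC/Clang extension, the same one the real OPT() macro relies on).
#include <cstdio>
#include <vector>

struct ir {
   std::vector<int> insts;   // stand-in for a shader's instruction list
};

// A "conservative" pass: it reports progress only when it mutates the IR.
// An unconditional `return true` here would spin the loop below forever,
// which is exactly why split_virtual_grfs had to become conservative
// before it could move inside the loop.
static bool remove_zeros(ir &sh)
{
   size_t before = sh.insts.size();
   std::erase(sh.insts, 0);   // C++20: drop the "dead" instructions
   return sh.insts.size() != before;
}

// Simplified OPT(): records progress for the surrounding loop and
// evaluates to the pass's own result so callers can write if (OPT(...)).
#define OPT(pass, ...) ({                        \
      bool this_progress = pass(__VA_ARGS__);    \
      progress = progress || this_progress;      \
      this_progress;                             \
   })

int main()
{
   ir sh = {{1, 0, 2, 0, 3}};
   bool progress;
   int iteration = 0;

   do {
      progress = false;
      if (OPT(remove_zeros, sh))
         printf("iteration %d: remove_zeros made progress\n", iteration);
      iteration++;
   } while (progress);

   printf("fixed point after %d iterations, %zu insts left\n",
          iteration, sh.insts.size());
   return 0;
}

The loop terminates precisely because every pass is honest about progress: iteration 1 removes the two zeros and reports true, iteration 2 changes nothing, and the do/while exits.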

src/intel/compiler/brw_fs.cpp

@@ -2049,22 +2049,17 @@ fs_visitor::assign_gs_urb_setup()
 /**
  * Split large virtual GRFs into separate components if we can.
  *
- * This is mostly duplicated with what brw_fs_vector_splitting does,
- * but that's really conservative because it's afraid of doing
- * splitting that doesn't result in real progress after the rest of
- * the optimization phases, which would cause infinite looping in
- * optimization.  We can do it once here, safely.  This also has the
- * opportunity to split interpolated values, or maybe even uniforms,
- * which we don't have at the IR level.
- *
- * We want to split, because virtual GRFs are what we register
- * allocate and spill (due to contiguousness requirements for some
- * instructions), and they're what we naturally generate in the
- * codegen process, but most virtual GRFs don't actually need to be
- * contiguous sets of GRFs.  If we split, we'll end up with reduced
- * live intervals and better dead code elimination and coalescing.
+ * This pass aggressively splits VGRFs into as small a chunks as possible,
+ * down to single registers if it can.  If no VGRFs can be split, we return
+ * false so this pass can safely be used inside an optimization loop.  We
+ * want to split, because virtual GRFs are what we register allocate and
+ * spill (due to contiguousness requirements for some instructions), and
+ * they're what we naturally generate in the codegen process, but most
+ * virtual GRFs don't actually need to be contiguous sets of GRFs.  If we
+ * split, we'll end up with reduced live intervals and better dead code
+ * elimination and coalescing.
  */
-void
+bool
 fs_visitor::split_virtual_grfs()
 {
    /* Compact the register file so we eliminate dead vgrfs.  This
@@ -2180,8 +2175,11 @@ fs_visitor::split_virtual_grfs()
    }
    assert(reg == reg_count);
 
-   if (!has_splits)
+   bool progress;
+   if (!has_splits) {
+      progress = false;
       goto cleanup;
+   }
 
    foreach_block_and_inst_safe(block, fs_inst, inst, cfg) {
       if (inst->opcode == SHADER_OPCODE_UNDEF) {
@@ -2236,11 +2234,15 @@ fs_visitor::split_virtual_grfs()
    }
 
    invalidate_analysis(DEPENDENCY_INSTRUCTION_DETAIL | DEPENDENCY_VARIABLES);
 
+   progress = true;
+
 cleanup:
    delete[] split_points;
    delete[] vgrf_has_split;
    delete[] new_virtual_grf;
    delete[] new_reg_offset;
+
+   return progress;
 }
 
 /**
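
A note on the control flow in the two hunks above: the pass allocates several scratch arrays up front, so every exit funnels through the cleanup label, and progress must be assigned on each path before the jump. A minimal sketch of that idiom, with invented names standing in for the real analysis:

#include <cstdio>

static bool split_things(int n)
{
   bool progress;
   int *scratch = new int[n];   // stands in for split_points and friends

   bool has_splits = (n > 1);   // pretend analysis result
   if (!has_splits) {
      progress = false;
      goto cleanup;   // early out: nothing to do, but still must free
   }

   // ... rewrite the IR here ...
   progress = true;

cleanup:
   delete[] scratch;
   return progress;
}

int main()
{
   printf("%d %d\n", split_things(1), split_things(4));   // prints: 0 1
   return 0;
}

Declaring progress uninitialized and assigning it on both paths keeps the goto legal in C++ (no initialization is jumped over), and a new exit path that forgets to set it is likely to trip a -Wmaybe-uninitialized-style warning.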
@@ -8280,9 +8282,6 @@ fs_visitor::optimize()
 
    validate();
 
-   split_virtual_grfs();
-   validate();
-
 #define OPT(pass, args...) ({                                          \
       pass_num++;                                                      \
       bool this_progress = pass(args);                                 \
@@ -8313,6 +8312,8 @@ fs_visitor::optimize()
    int iteration = 0;
    int pass_num = 0;
 
+   OPT(split_virtual_grfs);
+
    /* Before anything else, eliminate dead code.  The results of some NIR
     * instructions may effectively be calculated twice.  Once when the
     * instruction is encountered, and again when the user of that result is
@@ -8385,7 +8386,7 @@ fs_visitor::optimize()
    OPT(opt_redundant_halt);
 
    if (OPT(lower_load_payload)) {
-      split_virtual_grfs();
+      OPT(split_virtual_grfs);
 
       /* Lower 64 bit MOVs generated by payload lowering. */
       if (!devinfo->has_64bit_float && !devinfo->has_64bit_int)
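
Finally, why it matters that OPT() evaluates to the pass's own result rather than the accumulated flag: the last hunk above gates the re-split on lower_load_payload having actually changed something. A small self-contained sketch (pass names invented, return values hard-coded purely to show the control flow, macro simplified as in the first sketch):

#include <cstdio>

static bool progress = false;

#define OPT(pass, ...) ({                        \
      bool this_progress = pass(__VA_ARGS__);    \
      progress = progress || this_progress;      \
      this_progress;                             \
   })

// Stand-ins for the real passes.
static bool lower_load_payload_sketch() { return true;  }
static bool split_virtual_grfs_sketch() { return false; }

int main()
{
   if (OPT(lower_load_payload_sketch)) {
      // Payload lowering tends to create large VGRFs, so try to re-split.
      // Wrapped in OPT(), the attempt is now counted and dumped like any
      // other pass instead of running silently.
      OPT(split_virtual_grfs_sketch);
   }
   printf("progress = %d\n", progress);   // 1: payload lowering changed the IR
   return 0;
}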

src/intel/compiler/brw_fs.h

@@ -148,7 +148,7 @@ public:
    void assign_regs_trivial();
    void calculate_payload_ranges(int payload_node_count,
                                  int *payload_last_use_ip) const;
-   void split_virtual_grfs();
+   bool split_virtual_grfs();
    bool compact_virtual_grfs();
    void assign_constant_locations();
    bool get_pull_locs(const fs_reg &src, unsigned *out_surf_index,