intel/compiler/mesh: compactify MUE layout
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
  float[N] requires N 1-dword slots, but
  i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible) and by reduction
in single MUE size it allows spawning more threads at the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
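The dword arithmetic behind the multi-slot and vec4-boundary points in the message can be sketched as follows. This is only an illustration of the counting, not code from the patch: `split_into_vec4_chunks` and `vec4_slots_touched` are hypothetical helpers, and using "slots touched" as a stand-in for the number of writes is an assumption.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of Mesa): split a variable that occupies
// `dwords` consecutive dwords, starting at a vec4 boundary, into chunks of
// at most 4 dwords, one chunk per vec4 slot.
static std::vector<unsigned> split_into_vec4_chunks(unsigned dwords)
{
   std::vector<unsigned> chunks;
   while (dwords > 0) {
      const unsigned n = dwords < 4 ? dwords : 4;
      chunks.push_back(n);
      dwords -= n;
   }
   return chunks;
}

// Hypothetical helper: number of vec4 slots touched by `dwords` dwords
// placed at dword offset `start`. Assumption: each touched slot costs a
// separate write, so fewer touched slots is better.
static unsigned vec4_slots_touched(unsigned start, unsigned dwords)
{
   assert(dwords > 0);
   const unsigned first = start / 4;
   const unsigned last = (start + dwords - 1) / 4;
   return last - first + 1;
}
```

For example, an i64vec3 is 3 components of 2 dwords each, so 6 dwords, which splits into a full 4-dword chunk followed by a 2-dword chunk. A 2-dword variable placed at dword offset 3 touches two vec4 slots, while the same variable at offset 0 touches one.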
committed by Marge Bot
parent fb765a65c8
commit a252123363
@@ -126,6 +126,7 @@ fs_visitor::interp_reg(int location, int channel)
 
    assert(prog_data->urb_setup[location] >= 0);
    unsigned nr = prog_data->urb_setup[location];
+   channel += prog_data->urb_setup_channel[location];
 
    /* Adjust so we start counting from the first per_vertex input. */
    assert(nr >= prog_data->num_per_primitive_inputs);
@@ -142,19 +143,22 @@ fs_visitor::interp_reg(int location, int channel)
  * generate_code() time.
  */
 fs_reg
-fs_visitor::per_primitive_reg(int location)
+fs_visitor::per_primitive_reg(int location, unsigned comp)
 {
    assert(stage == MESA_SHADER_FRAGMENT);
    assert(BITFIELD64_BIT(location) & nir->info.per_primitive_inputs);
 
    const struct brw_wm_prog_data *prog_data = brw_wm_prog_data(this->prog_data);
 
+   comp += prog_data->urb_setup_channel[location];
+
    assert(prog_data->urb_setup[location] >= 0);
 
-   const unsigned regnr = prog_data->urb_setup[location];
+   const unsigned regnr = prog_data->urb_setup[location] + comp / 4;
 
    assert(regnr < prog_data->num_per_primitive_inputs);
 
-   return fs_reg(ATTR, regnr, BRW_REGISTER_TYPE_F);
+   return component(fs_reg(ATTR, regnr, BRW_REGISTER_TYPE_F), comp % 4);
 }
 
 /** Emits the interpolation for the varying inputs. */
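The addressing change in per_primitive_reg() comes down to a divide and a modulo: the per-location channel offset is added to the requested component, the quotient by 4 selects the register, and the remainder selects the channel inside it. A minimal standalone sketch of that arithmetic follows. `AttrLocation` and `locate_component` are hypothetical names used only for illustration; they mirror the index math in the hunk above, not the behaviour of fs_reg or component().

```cpp
#include <cassert>

// Hypothetical stand-in for a (register, channel) pair; not a Mesa type.
struct AttrLocation {
   unsigned regnr;    // index of the vec4 register
   unsigned channel;  // channel within that register, 0..3
};

// base_reg plays the role of urb_setup[location] and base_channel the role
// of urb_setup_channel[location]; comp is the component requested by the
// caller, counted from the start of the variable.
static AttrLocation locate_component(unsigned base_reg, unsigned base_channel,
                                     unsigned comp)
{
   comp += base_channel;
   return AttrLocation{ base_reg + comp / 4, comp % 4 };
}
```

With a variable that starts in channel 3 of register 2, component 2 gives 3 + 2 = 5, so it lands in register 3, channel 1. When the channel offset and component are both 0, the result is the base register and channel 0, which matches the register index the old code computed.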