intel/compiler/mesh: compactify MUE layout

Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
   float[N] requires N 1-dword slots, but
   i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
This commit is contained in:
Marcin Ślusarz
2022-12-21 15:40:07 +01:00
committed by Marge Bot
parent fb765a65c8
commit a252123363
8 changed files with 478 additions and 118 deletions

View File

@@ -126,6 +126,7 @@ fs_visitor::interp_reg(int location, int channel)
assert(prog_data->urb_setup[location] >= 0);
unsigned nr = prog_data->urb_setup[location];
channel += prog_data->urb_setup_channel[location];
/* Adjust so we start counting from the first per_vertex input. */
assert(nr >= prog_data->num_per_primitive_inputs);
@@ -142,19 +143,22 @@ fs_visitor::interp_reg(int location, int channel)
* generate_code() time.
*/
fs_reg
fs_visitor::per_primitive_reg(int location)
fs_visitor::per_primitive_reg(int location, unsigned comp)
{
assert(stage == MESA_SHADER_FRAGMENT);
assert(BITFIELD64_BIT(location) & nir->info.per_primitive_inputs);
const struct brw_wm_prog_data *prog_data = brw_wm_prog_data(this->prog_data);
comp += prog_data->urb_setup_channel[location];
assert(prog_data->urb_setup[location] >= 0);
const unsigned regnr = prog_data->urb_setup[location];
const unsigned regnr = prog_data->urb_setup[location] + comp / 4;
assert(regnr < prog_data->num_per_primitive_inputs);
return fs_reg(ATTR, regnr, BRW_REGISTER_TYPE_F);
return component(fs_reg(ATTR, regnr, BRW_REGISTER_TYPE_F), comp % 4);
}
/** Emits the interpolation for the varying inputs. */