intel/compiler/mesh: compactify MUE layout

Instead of using 4 dwords for each output slot, use only the amount of
memory actually needed by each variable.

There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
  because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
  float[N] requires N 1-dword slots, but
  i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
  and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
  if possible is beneficial

This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.

Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20407>
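The multi-slot case in the list above can be illustrated with a small standalone sketch. It is not code from this patch: `var_total_dwords` and `slot_dwords` are hypothetical helpers, and the sketch assumes only 32-bit and 64-bit component types, with each vec4 slot holding at most 4 dwords. It only shows the arithmetic behind "i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot".

```c
#include <assert.h>

/* Hypothetical helper, not part of Mesa: total dwords needed by one
 * vector value with the given component count and bit size.
 * Assumption: only 32-bit and 64-bit components are modelled. */
static unsigned
var_total_dwords(unsigned components, unsigned bit_size)
{
   assert(bit_size == 32 || bit_size == 64);
   return components * (bit_size / 32);
}

/* Hypothetical helper, not part of Mesa: dwords actually occupied in
 * the given vec4 slot (0-based) of that value. A slot holds at most
 * 4 dwords; slots past the end of the value occupy nothing. */
static unsigned
slot_dwords(unsigned components, unsigned bit_size, unsigned slot)
{
   unsigned total = var_total_dwords(components, bit_size);
   unsigned start = slot * 4;

   if (start >= total)
      return 0;

   unsigned left = total - start;
   return left < 4 ? left : 4;
}
```

Under these assumptions, an i64vec3 (3 components of 64 bits) fills slot 0 with 4 dwords and slot 1 with 2 dwords, while each element of a float[N] array is a separate slot that needs only 1 dword.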
committed by Marge Bot
parent fb765a65c8
commit a252123363
@@ -2085,6 +2085,21 @@ brw_nir_load_global_const(nir_builder *b, nir_intrinsic_instr *load_uniform,
    return sysval;
 }
 
+const struct glsl_type *
+brw_nir_get_var_type(const struct nir_shader *nir, nir_variable *var)
+{
+   const struct glsl_type *type = var->interface_type;
+   if (!type) {
+      type = var->type;
+      if (nir_is_arrayed_io(var, nir->info.stage) || var->data.per_view) {
+         assert(glsl_type_is_array(type));
+         type = glsl_get_array_element(type);
+      }
+   }
+
+   return type;
+}
+
 bool
 brw_nir_pulls_at_sample(nir_shader *shader)
 {