intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.

Similarly, dvec4s would suffer from the same problem.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
@@ -141,10 +141,16 @@ analyze_ubos_block(struct ubo_analysis_state *state, nir_block *block)
          if (offset >= 64)
             continue;
 
+         /* The value might span multiple 32-byte chunks. */
+         const int bytes = nir_intrinsic_dest_components(intrin) *
+                           (nir_dest_bit_size(intrin->dest) / 8);
+         const int end = DIV_ROUND_UP(offset_const->u32[0] + bytes, 32);
+         const int regs = end - offset + 1;
+
          /* TODO: should we count uses in loops as higher benefit? */
 
          struct ubo_block_info *info = get_block_info(state, block);
-         info->offsets |= 1ull << offset;
+         info->offsets |= ((1ull << regs) - 1) << offset;
          info->uses[offset]++;
       }
    }
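
The chunk arithmetic from the commit message can be checked in isolation with a small standalone sketch. This is not the Mesa code: `ubo_chunk_mask` and `CHUNK_SIZE` are hypothetical names introduced only for illustration, and the sketch assumes the bitfield holds one bit per 32-byte chunk with at most 64 chunks tracked. It treats the rounded-up end as an exclusive chunk index, so the chunk count is `end - first`.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SIZE 32u /* bytes per chunk, as described in the commit message */

/* Hypothetical helper, not part of Mesa: returns a bitfield with one bit set
 * for every 32-byte chunk touched by the byte range [start, start + bytes).
 */
static uint64_t
ubo_chunk_mask(uint32_t start, uint32_t bytes)
{
   assert(bytes > 0);

   /* First chunk touched (inclusive) and one past the last chunk (exclusive). */
   const uint32_t first = start / CHUNK_SIZE;
   const uint32_t end = (start + bytes + CHUNK_SIZE - 1) / CHUNK_SIZE;
   const uint32_t chunks = end - first;

   /* Assumption: only 64 chunks are tracked; larger shifts are undefined. */
   assert(end <= 64);

   /* Build a run of `chunks` set bits, avoiding a shift by 64. */
   const uint64_t run = chunks == 64 ? UINT64_MAX : (1ull << chunks) - 1;
   return run << first;
}
```

For the layout in the commit message, `ubo_chunk_mask(20, 16)` yields `0x3` (chunks 0 and 1) for `c`, while `a` and `b` each yield `0x1`. Assuming `offset` in the hunk is the starting chunk index (the byte offset divided by 32, which the hunk does not show), `end - offset + 1` evaluates to 3 for `c`, one more than the two chunks the message describes, because `DIV_ROUND_UP` already produces an exclusive end.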