broadcom/compiler: update how we compute return_words_of_texture_data on non-ssa

For the non-ssa case, we were trying to use reg->num_components. But
this is not the same that nir_ssa_def_components_read. It is the
number of components of the destination register. And in the 16bit
case, even if nir_lower_tex packs the outcome, it doesn't update the
number of components, as nir_tex_instr_dest_size would still return
4. And nir validate would check that those values are the same.

So this change focuses on the last part of this comment at
nir_lower_tex:

 * Note that we don't change the destination num_components, because
 * nir_tex_instr_dest_size() will still return 4.  The driver is just
 * expected to not store the other channels, given that nothing at the
 * NIR level will read them.

We just limit how many channels we would use for the f16 case.

It is also worth to note, based on the CTS and different applications
we test, that this is a corner case.

This was detected when we experimented to enable nir_opt_gcm for v3d,
that lead to raise an assertion slightly below with some shaderdb
tests, but technically it could happen without it.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
This commit is contained in:
Alejandro Piñeiro
2022-06-16 23:26:22 +02:00
committed by Marge Bot
parent ec10a37a52
commit 9cbc3ab239

View File

@@ -229,16 +229,37 @@ v3d40_vir_emit_tex(struct v3d_compile *c, nir_tex_instr *instr)
unsigned texture_idx = instr->texture_index;
unsigned sampler_idx = instr->sampler_index;
/* Even if the texture operation doesn't need a sampler by
* itself, we still need to add the sampler configuration
* parameter if the output is 32 bit
*/
bool output_type_32_bit =
c->key->sampler[sampler_idx].return_size == 32 &&
!instr->is_shadow;
struct V3D41_TMU_CONFIG_PARAMETER_0 p0_unpacked = {
};
/* Limit the number of channels returned to both how many the NIR
* instruction writes and how many the instruction could produce.
*/
p0_unpacked.return_words_of_texture_data =
instr->dest.is_ssa ?
nir_ssa_def_components_read(&instr->dest.ssa) :
(1 << instr->dest.reg.reg->num_components) - 1;
if (instr->dest.is_ssa) {
p0_unpacked.return_words_of_texture_data =
nir_ssa_def_components_read(&instr->dest.ssa);
} else {
/* For the non-ssa case we don't have a full equivalent to
* nir_ssa_def_components_read. This is a problem for the 16
* bit case. nir_lower_tex will not change the destination as
* nir_tex_instr_dest_size will still return 4. The driver is
* just expected to not store on other channels, so we
* manually ensure that here.
*/
uint32_t num_components = output_type_32_bit ?
MIN2(instr->dest.reg.reg->num_components, 4) :
MIN2(instr->dest.reg.reg->num_components, 2);
p0_unpacked.return_words_of_texture_data = (1 << num_components) - 1;
}
assert(p0_unpacked.return_words_of_texture_data != 0);
struct V3D41_TMU_CONFIG_PARAMETER_2 p2_unpacked = {
@@ -294,14 +315,6 @@ v3d40_vir_emit_tex(struct v3d_compile *c, nir_tex_instr *instr)
vir_WRTMUC(c, QUNIFORM_TMU_CONFIG_P0, p0_packed);
/* Even if the texture operation doesn't need a sampler by
* itself, we still need to add the sampler configuration
* parameter if the output is 32 bit
*/
bool output_type_32_bit =
c->key->sampler[sampler_idx].return_size == 32 &&
!instr->is_shadow;
/* p1 is optional, but we can skip it only if p2 can be skipped too */
bool needs_p2_config =
(instr->op == nir_texop_lod ||