If the surface height during a fast clear is 16k, the bspec says the height
programmed should be "value - 1", i.e. 0x3FFF. However, HW adds 1 to it and
ignores the overflow bit[14]. HW then performs the OOB check based on
bit[13:0], which is 0, and drops the failed transactions.
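The arithmetic above can be checked with a minimal sketch. It assumes the HW behaves exactly as described (add 1, keep only bit[13:0]); the helper name hw_effective_height is hypothetical and is not a driver or HW function.

```python
# Toy model of the behavior described above, not actual driver code.
# Assumption: the OOB check only sees bits [13:0] of (programmed + 1).
HEIGHT_FIELD_BITS = 14


def hw_effective_height(programmed: int) -> int:
    """Add 1 to the programmed value and drop the overflow bit[14]."""
    mask = (1 << HEIGHT_FIELD_BITS) - 1  # 0x3FFF
    return (programmed + 1) & mask


programmed = 16384 - 1  # "value - 1" for a 16k surface, i.e. 0x3FFF
print(hex(programmed), hw_effective_height(programmed))  # 0x3fff 0
print(hw_effective_height(8192 - 1))  # an 8k surface is unaffected: 8192
```

With an effective height of 0, every row fails the bounds check, which matches the dropped transactions described above.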
This patch fixes the following failing test on LNL:
"PIGLIT_PLATFORM=gbm PIGLIT_DEFAULT_SIZE=16384x16384
shader_runner fast-slow-clear-interaction.shader_test -auto -fbo"
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29182>
16-bit SIMD8 sampler writeback messages come with a bit of padding in
them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into
the padding-free format expected by NIR. Additionally, we may reduce
the response length on the sampler messages based on which components
of the (always vec4) NIR destination are actually in use. When we do
that, dest_size > read_size, and the trailing components are all empty
BAD_FILE registers, indicating the contents are undefined.
Unfortunately, we can't ignore those trailing components entirely.
In the past, we left them default-initialized, giving us a BAD_FILE
register with UD type (which didn't matter, since all sampler returns
were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD.
For example, writing RGB and skipping A (without sparse) would produce
read_size = 3 and dest_size = 4 and nir_dest[5] containing:
nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud>
We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see
3 HF's and a UD, and try to copy the full 32-bit value at the end,
instead of 16 bits of pad like we intended. This meant it would
overflow the destination register's size, triggering validation errors.
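The size mismatch in the example above can be sketched per channel. This is a simplified model that only sums type sizes; the names TYPE_SIZE_BYTES and payload_bytes_per_channel are hypothetical and do not correspond to backend functions.

```python
# Simplified per-channel size model, not the actual LOAD_PAYLOAD logic.
TYPE_SIZE_BYTES = {"hf": 2, "ud": 4}


def payload_bytes_per_channel(src_types):
    """Sum the bytes copied per channel for a list of source types."""
    return sum(TYPE_SIZE_BYTES[t] for t in src_types)


# R, G, B, blank-A, blank-sparse, as in the example above.
nir_dest = ["hf", "hf", "hf", "ud", "ud"]
dest_size = 4

intended = dest_size * TYPE_SIZE_BYTES["hf"]              # four 16-bit slots
copied = payload_bytes_per_channel(nir_dest[:dest_size])  # 3 HF + 1 UD

print(intended, copied)  # 8 10
```

The two extra bytes per channel come from the UD-typed blank component being copied as a full 32-bit value.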
Thanks to Ian Romanick for noticing this, writing a test, and also
coming up with a nearly identical fix.
Fixes: 0116430d39 ("intel/brw: Handle 16-bit sampler return payloads")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617
References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>
We can achieve most of what brw_fs_opt_predicated_break() does with
simple peepholes at NIR -> BRW conversion time.
For predicated break and continue, we can simply look at an IF ... ENDIF
sequence after emitting it. If there's a single instruction between the
two, and it's a BREAK or CONTINUE, then we can move the predicate from
the IF onto the jump, and delete the IF/ENDIF. Because we haven't built
the CFG at this stage, we only need to remove them from the linked list
of instructions, which is trivial to do.
For the predicated while optimization, we can rely on the fact that we
already did the predicated break optimization, and simply look for a
predicated BREAK just before the WHILE. If so, we move the predicate
onto the WHILE, invert it, and remove the BREAK.
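The two peepholes described above can be sketched on a plain list of instructions. This is a toy model under stated assumptions: the Inst class and both function names are hypothetical, and the real code operates on the backend's linked list of instructions rather than a Python list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Inst:
    opcode: str
    predicate: Optional[str] = None  # flag name, None if unpredicated
    inverted: bool = False


def peephole_predicated_jump(insts: List[Inst]) -> None:
    """Run right after emitting ENDIF: fold IF; BREAK|CONTINUE; ENDIF."""
    if len(insts) < 3:
        return
    if_inst, jump, endif = insts[-3:]
    if (if_inst.opcode == "IF" and endif.opcode == "ENDIF"
            and jump.opcode in ("BREAK", "CONTINUE")):
        # Move the predicate from the IF onto the jump, drop IF/ENDIF.
        jump.predicate = if_inst.predicate
        jump.inverted = if_inst.inverted
        insts[-3:] = [jump]


def peephole_predicated_while(insts: List[Inst]) -> None:
    """Run right after emitting WHILE: fold a preceding predicated BREAK."""
    if len(insts) < 2:
        return
    brk, whl = insts[-2], insts[-1]
    if (brk.opcode == "BREAK" and brk.predicate is not None
            and whl.opcode == "WHILE"):
        # Loop back exactly when the BREAK would not have been taken.
        whl.predicate = brk.predicate
        whl.inverted = not brk.inverted
        del insts[-2]


insts = [Inst("DO"), Inst("ADD"),
         Inst("IF", "f0"), Inst("BREAK"), Inst("ENDIF")]
peephole_predicated_jump(insts)
insts.append(Inst("WHILE"))
peephole_predicated_while(insts)
print([(i.opcode, i.predicate, i.inverted) for i in insts])
```

In this model the five-instruction loop tail collapses to a single WHILE predicated on the inverse of f0.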
There are a few cases where this approach does a worse job than the old
one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks
containing break, and nir_trivialize_registers may decide it needs to
insert movs into those blocks. So, at NIR -> BRW time, we'll actually
emit some MOVs there, which might have been possible to copy propagate
out after later optimizations.
However, the fossil-db results show that it's still pretty competitive.
For instructions, 1017 shaders were helped (average -1.87 instructions),
while only 62 were hurt (average +2.19 instructions). Across affected
shaders, the instruction count changed by -0.08%.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
UBO loads with a non-indirect buffer index should be safe to perform
speculatively. With a direct offset, we may sometimes turn them into
push constants, at which point it's just reading a register with no
cost at all. Otherwise, we access them via messages that use surface
state, and automatically perform bounds checking. So we shouldn't have
any issues with reading out of bounds and page faulting, for example.
This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics,
so we can turn simple ifs with loads on both sides into bcsels. In some
cases this can collapse a surprising amount of control flow, allowing
other optimizations to work better.
The i965 OpenGL driver used load_uniform intrinsics, which are allowed
in NIR's peephole select pass. But iris uses the Gallium NIR pass that
translates uniforms to loads from UBO 0, so we haven't been able to take
advantage of NIR's peephole select pass there. The backend pass was
still able to handle this to some extent, however.
fossil-db results on Alchemist:
Totals:
Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00%
Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00%
Send messages: 7416330 -> 7416261 (-0.00%)
Spill count: 52471 -> 52473 (+0.00%)
Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00%
Scratch Memory Size: 3197952 -> 3198976 (+0.03%)
Totals from 1848 (0.29% of 630003) affected shaders:
Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02%
Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03%
Send messages: 59829 -> 59760 (-0.12%)
Spill count: 3870 -> 3872 (+0.05%)
Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02%
Scratch Memory Size: 174080 -> 175104 (+0.59%)
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>
If TraceRay() is called with the TerminateOnFirstHit flag, we need to
terminate the ray on the first confirmed intersection. This is handled
by the lowering of accept_ray_intersection and it's working fine for the
case of multiple instances of the intersection shader being called.
But if the shader calls reportIntersection() more than once, we were
handling them all and accepting the closest one regardless of the flag.
Check for the flag on every confirmed intersection and, if set, accept
it right there. The subsequent lowering will take care of handling the
ray termination if necessary.
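The intended behavior for a single intersection shader invocation that reports several hits can be modeled roughly as follows. This is only an illustrative sketch: process_reported_hits is a hypothetical name, every reported hit is assumed to be confirmed, and the real change lives in the NIR lowering rather than in a loop like this.

```python
from typing import Iterable, Optional


def process_reported_hits(hit_distances: Iterable[float],
                          terminate_on_first_hit: bool) -> Optional[float]:
    """Return the accepted hit distance, or None if nothing was reported."""
    closest = None
    for t in hit_distances:
        if terminate_on_first_hit:
            # Flag set: accept the first confirmed hit right away.
            return t
        if closest is None or t < closest:
            closest = t  # flag clear: keep tracking the closest hit
    return closest


hits = [5.0, 2.0, 3.0]
print(process_reported_hits(hits, terminate_on_first_hit=False))  # 2.0
print(process_reported_hits(hits, terminate_on_first_hit=True))   # 5.0
```

Before this change, the second call would have behaved like the first, since all reported hits were processed regardless of the flag.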
Fixes new test dEQP-VK.ray_tracing_pipeline.amber.flags-accept-first
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30418>