The ALU-based implementation of the barycentric interpolation
intrinsics introduced by a subsequent commit will require some
primitive setup information not delivered in the PS thread payload
unless explicitly requested:
- "Source Depth and/or W Attribute Vertex Deltas" if a
perspective-correct interpolation mode is used -- Note that this is
already requested for CPS interpolation, we just need to enable it
in more cases.
- "Perspective Bary Planes" if a perspective-correct interpolation
mode is used.
- "Non-Perspective Bary Planes" if a non-perspective-corrected
interpolation mode is used.
- "Sample offsets" if any at_sample interpolation is used so the
coordinate offsets of the sample can be calculated.
This ALU implementation of barycentric interpolation will only be
needed for *_at_offset and *_at_sample interpolation, since the fixed
function hardware still computes barycentrics for us at the current
sample coordinates, only the cases that previously relied on the Pixel
Interpolator shared function need to be re-implemented with ALU
instructions, since that shared function will no longer exist on Xe2
hardware.
Thanks to Rohan for a bugfix of the uses_sample_offsets calculation,
this patch includes his fix squashed in.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
Like NIR, we print SSA defs as %1, %2, and so on. The number here is
the VGRF number. VGRFs that don't correspond to a SSA def remain
printed as vgrf1, vgrf2, and so on.
This makes it much easier to see what values are SSA and which aren't.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Our code to initialize gl_SubgroupInvocation uses multiple instructions
some of which are partial writes. This makes it difficult to analyze
expressions involving gl_SubgroupInvocation, which appear very
frequently in compute shaders.
To make this easier, we add a new virtual opcode which initializes
a full VGRF to the value of gl_SubgroupInvocation. (We also expand
it to UD for SIMD8 so there are not partial write issues.) We then
lower it to the original code later on in compilation, after we've
done the bulk of our optimizations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
In 2c65d90bc8 I forgot to add the new SHADER_OPCODE_READ_MASK_REG
opcode to the list of barrier instruction in the scheduler. Let's just
use a single opcode for all ARF registers that need special
scoreboarding and put the register as source (nicer for the debug
output).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 2c65d90bc8 ("intel/brw: ensure find_live_channel don't access arch register without sync")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29446>
Some commas were being skipped, according to history as an attempt
to elide BAD_FILEs, but we still print them, so be consistent. Also
for instructions without any sources, the trailing comma was always
being printed. Fix that too.
Example of instruction output before the change
halt_target(8) (null):UD,
send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD(null):UD NoMask
and after it
halt_target(8) (null):UD
send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD, (null):UD NoMask
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29114>
With the previous commit, we now have new builder helpers that will
allocate a temporary destination for us. So we can eliminate a lot
of the temporary naming and declarations, and build up expressions.
In a number of cases here, the code was confusingly mixing D-type
addresses with UD-immediates, or expecting a UD destination. But the
underlying values should always be positive anyway. To accomodate the
type inference restriction that the base types much match, we switch
these over to be purely UD calculations. It's cleaner to do so anyway.
Compared to the old code, this may in some cases allocate additional
temporary registers for subexpressions.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28957>
Moves the lowering of VGRFs into FIXED_GRFs from the code generation
to (almost) right after the register allocation.
This will allow: (1) later passes not worry about VGRFs (and what they
mean in a post reg alloc phase) and (2) make easier to add certain
types of validation post reg alloc phase using the backend IR.
Note that a couple of passes still take advantage of seeing "allocated
VGRFs", so perform lowering after they run.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28604>
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN. The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
We cannot rely on the immediate message descriptor having accurate
values for mlen and rlen at the IR level, since they are updated at
codegen time via 'inst->mlen' and 'inst->size_written', which could
end up with values inconsistent with the message descriptor if
e.g. the split sends optimization had an effect. Instead, define
helpers that do the computation without relying on the message
descriptor, and use the pre-existing
brw_message_desc_mlen()/brw_message_desc_rlen() helpers (fully
equivalent to the lsc helpers deleted here) during disassembly.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28484>
In the common case, fs_inst will have up to 4 sources (the HW
instructions have up to 3, and our representation of SENDs have 4).
Embed such array into the fs_inst, and use it whenever applicable
instead of allocating a new array.
Also change the code to reuse the allocated src array when resizing to
a smaller length.
Between the changes above and the reduced amount of initializing
fs_regs, this reduces fossil-db time by around 2% for Borderlands 3
and Rise of the Tomb Raider, and around 1.5% for Total War Warhammer 3.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28379>
Folks, there's more than one accumulator. In general, when the
register file is ARF, the upper 4 bits of the register number specify
which ARF, and the lower 4 bits specify which one of that ARF. This
can be further partitioned by the subregister number.
This is already mostly handled correctly for flags register, but lots
of places wanted to check the register number for equality with
BRW_ARF_ACCUMULATOR. If acc1 is ever specified, that won't work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28281>
It's a logical opcode which is lowered to a send-from-GRF later. That
lowering code is responsible for ensuring the sources are set up in a
proper SEND payload.
This was preventing copy propagation of surface handles which started
out as scalars, were splatted out to full-SIMD values with NoMask, then
actually consumed as only component 0 (scalar again), because we thought
that scalar values were not allowed.
fossil-db on Alchemist shows improvements in q2rtx but no other titles:
Totals:
Instrs: 161310436 -> 161310152 (-0.00%)
Cycles: 14370605159 -> 14370601066 (-0.00%)
Totals from 17 (0.00% of 652298) affected shaders:
Instrs: 16097 -> 15813 (-1.76%)
Cycles: 185508 -> 181415 (-2.21%)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28286>