This (combined with a pass to actually set the corresponding NIR flags)
should help fix a lot of the regressions from the SMEM addition combining
change.
fossil-db (Navi):
Totals from 12 (0.01% of 135946) affected shaders:
CodeSize: 12376 -> 12304 (-0.58%)
Instrs: 2436 -> 2422 (-0.57%)
VMEM: 1105 -> 1096 (-0.81%)
SClause: 133 -> 130 (-2.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
SMEM does the addition with 64-bits, not 32. So if the original code
relied on wrapping around (for example, for subtraction), it would break.
Apparently swizzled MUBUF accesses also have issues with combining
additions that could overflow. Normal MUBUF accesses seem fine.
fossil-db (Navi):
Totals from 27219 (20.02% of 135946) affected shaders:
CodeSize: 128303256 -> 131062756 (+2.15%); split: -0.00%, +2.15%
Instrs: 24818911 -> 25280558 (+1.86%); split: -0.01%, +1.87%
VMEM: 162311926 -> 177226874 (+9.19%); split: +9.36%, -0.17%
SMEM: 18182559 -> 20218734 (+11.20%); split: +11.53%, -0.34%
VClause: 423635 -> 424398 (+0.18%); split: -0.02%, +0.20%
SClause: 865384 -> 1104986 (+27.69%); split: -0.00%, +27.69%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2748
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
For all VMEM instructions, the resource constant is now
in operands[0]. For MIMG instructions, the sampler shares
operands[1] with write data in case this instruction writes memory.
Moving the VADDR to be the last operand for MIMG is the first step to
support Navi NSA encoding.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3602>
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.
This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
The multiplication reduction is larger than it could be, but it should be
easier to implement this way.
No failures with dEQP-VK.subgroups.*int64* except those caused by LLVM
being used for other stages.
v2: don't call setFixed() for v_add carry-out, since setHint sets physReg
v3: add and use emit_vadd32() helper
v4: use num_opcodes instead of last_opcode
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> (v3)
Converting to 'Conventional SSA Form' ensures correctness w.r.t. spilling of phi nodes.
Previously, it was possible that phi operands have intersecting live-ranges, and thus,
couldn't get spilled to the same spilling slot. For this reason, ACO tried to avoid to
spill phis, even if it was beneficial.
This patch implements a conversion pass which is currently only called if spilling is necessary.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously subgroup shuffle was implemented using the bpermute
instruction, which only works accross half-waves, so by itself it's
not suitable for implementing subgroup shuffle when the shader is
running in wave64 mode.
This commit adds a trick using shared VGPRs that allows to implement
subgroup shuffle still relatively effectively in this mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>