Looking at binding_layout->desc_type is sketchy in the face of mutable
descriptors. It's safer for load_descriptor() to just return the
descriptor. load_descriptor_for_idx_intrin() knows how the descriptor
is actually used by the shader, so we can do the optimization there
instead.
This isn't actually a bug fix. The optimization just didn't happen in
the presence of mutable descriptors.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
The ldc_nv and ldcx_nv intrinsics correspond to the index and bindless
forms of NVIDIA's LDC instruction, respectively. ldc_nv is pretty much
load_ubo without some of the unnecessary constant bits while ldcx_nv
takes a 64-bit bindless handle instead of an index. The other two give
us a little control over register allocation at the NIR level to ensure
that LDCX handles are placed in uniform registers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
We can propagate within a non-uniform block just fine, but not across
block boundaries, because that might change which registers are live in
unpredictable ways. The real constraint is that we can't propagate
across an OpPin, but that's a lot harder to express.
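A toy model of the block-boundary rule, not NAK's actual copy-prop pass. The `Instr`, `Block`, and `CopyInfo` types are hypothetical. The sketch also assumes that copies defined in uniform blocks may still be propagated into later uniform blocks. A copy is looked through only when it was defined in the same block as the use, so a copy never leaks out of a non-uniform block.

```rust
use std::collections::HashMap;

/// Hypothetical SSA value index (illustration only, not NAK's SSAValue).
type Ssa = u32;

/// A toy instruction set with just enough to show the idea.
#[derive(Clone, Debug, PartialEq)]
enum Instr {
    /// dst = src; copy-prop is allowed to look through this.
    Copy { dst: Ssa, src: Ssa },
    /// Any other instruction that reads `srcs`.
    Use { srcs: Vec<Ssa> },
}

struct Block {
    uniform: bool,
    instrs: Vec<Instr>,
}

/// Where a copy was defined and what it copies.
struct CopyInfo {
    block: usize,
    uniform: bool,
    src: Ssa,
}

/// Rewrites sources to skip over copies. A copy is only looked through if
/// it was defined in the same block as the use, or if both the copy and
/// the use are in uniform blocks (an assumption of this sketch).
fn copy_prop(blocks: &mut [Block]) {
    let mut copies: HashMap<Ssa, CopyInfo> = HashMap::new();
    for (b, block) in blocks.iter_mut().enumerate() {
        let uniform = block.uniform;
        let resolve = |copies: &HashMap<Ssa, CopyInfo>, v: Ssa| -> Ssa {
            match copies.get(&v) {
                Some(c) if c.block == b || (c.uniform && uniform) => c.src,
                _ => v,
            }
        };
        for instr in block.instrs.iter_mut() {
            match instr {
                Instr::Copy { dst, src } => {
                    // Collapse copy chains as we go, under the same rule.
                    *src = resolve(&copies, *src);
                    copies.insert(*dst, CopyInfo { block: b, uniform, src: *src });
                }
                Instr::Use { srcs } => {
                    for s in srcs.iter_mut() {
                        *s = resolve(&copies, *s);
                    }
                }
            }
        }
    }
}
```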
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
These act as a vector OpCopy, except that copy-prop can't see through
them and the destination of OpPin gets pinned in the register file and
is not allowed to move. Of course, we have to be careful with these
because the spiller can't spill them either. If too many pinned values
are live at the same time, spilling or RA may fail.
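To illustrate why too many live pinned values is a problem, here is a sketch that sweeps over hypothetical live ranges of pinned vector values and finds the peak number of registers they hold at once. `PinnedRange` and `pins_fit` are made up for this example. Because pinned values can't be spilled, a peak above the register file size means RA can't succeed. The check is necessary but not sufficient, since other values need registers too.

```rust
/// Hypothetical live range of one pinned (possibly vector) value, as
/// half-open instruction indices [start, end).
struct PinnedRange {
    start: usize,
    end: usize,
    /// Number of 32-bit components, i.e. registers it occupies.
    comps: u32,
}

/// Maximum number of registers held by pinned values at any one point.
fn max_live_pinned(ranges: &[PinnedRange]) -> u32 {
    let mut events: Vec<(usize, i64)> = Vec::new();
    for r in ranges {
        events.push((r.start, i64::from(r.comps)));
        events.push((r.end, -i64::from(r.comps)));
    }
    // Sorting on (position, delta) puts ends (negative deltas) before
    // starts at the same position, matching the half-open ranges.
    events.sort();
    let mut live = 0i64;
    let mut max = 0i64;
    for (_, delta) in events {
        live += delta;
        max = max.max(live);
    }
    max as u32
}

/// Pinned values can't be spilled, so this is a necessary (not
/// sufficient) condition for RA to succeed with `num_regs` registers.
fn pins_fit(ranges: &[PinnedRange], num_regs: u32) -> bool {
    max_live_pinned(ranges) <= num_regs
}
```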
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
Unlike the pinned set in VecRegAllocator, which exists only for the
duration of a single instruction, registers which are pinned in the
main allocator stay pinned until the register is freed. The pinned set
in VecRegAllocator is initialized to a copy of the one in the main
register allocator.
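A simplified model of the two pin lifetimes, assuming a register file of at most 64 registers tracked as a bitset. `RegSet`, `alloc`, `pin`, `free`, and `can_move` are hypothetical names, not NAK's real API. Pins in the main allocator survive until `free()`. The per-instruction allocator starts from a copy of those pins, and any extra pins it adds disappear when it is dropped.

```rust
/// Hypothetical register set for a file of up to 64 registers.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct RegSet(u64);

impl RegSet {
    fn insert(&mut self, r: u8) { self.0 |= 1u64 << r; }
    fn remove(&mut self, r: u8) { self.0 &= !(1u64 << r); }
    fn contains(&self, r: u8) -> bool { (self.0 & (1u64 << r)) != 0 }
}

/// Main allocator: pins here last until the register is freed.
#[derive(Default)]
struct RegAllocator {
    used: RegSet,
    pinned: RegSet,
}

impl RegAllocator {
    fn alloc(&mut self, r: u8) { self.used.insert(r); }
    fn pin(&mut self, r: u8) { self.pinned.insert(r); }
    /// Freeing is the only way a register becomes unpinned.
    fn free(&mut self, r: u8) {
        self.used.remove(r);
        self.pinned.remove(r);
    }
}

/// Per-instruction allocator: its own pins vanish when it is dropped.
struct VecRegAllocator {
    pinned: RegSet,
}

impl VecRegAllocator {
    fn new(ra: &RegAllocator) -> Self {
        // Start from a copy of the main allocator's long-lived pins.
        VecRegAllocator { pinned: ra.pinned }
    }
    fn pin(&mut self, r: u8) { self.pinned.insert(r); }
    fn can_move(&self, r: u8) -> bool { !self.pinned.contains(r) }
}
```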
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
The really tricky case here is phis, which may have a uniform def even
though some of the srcs are non-uniform. This happens because of the
restriction elsewhere that requires UGPRs and UPreds to only ever be
written in uniform control-flow.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
Because we go in and out of SSA, all the phis get re-created and the new
phis will default to divergent. This little pass attempts to prove as
many of the phis convergent as possible.
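One common way to prove as many phis convergent as possible is an optimistic fixed-point iteration. The sketch below assumes that is roughly what the pass does, which may not be exactly how the real one works. Every phi starts out convergent, except those at a join of divergent control flow (the `divergent_join` flag is a hypothetical stand-in for that analysis). Phis are then demoted until nothing changes. Starting optimistic lets a cycle of phis that only feed each other stay convergent, which a pessimistic start would never prove. `Phi` and `PhiSrc` are illustrative types.

```rust
#[derive(Clone, Copy, Debug)]
enum PhiSrc {
    /// Another phi, by index into the phi list.
    Phi(usize),
    /// A non-phi value whose uniformity is already known.
    Value { convergent: bool },
}

struct Phi {
    /// True if control flow reaching this phi's block may have diverged.
    divergent_join: bool,
    srcs: Vec<PhiSrc>,
}

/// Returns, for each phi, whether it was proven convergent.
fn prove_phis_convergent(phis: &[Phi]) -> Vec<bool> {
    // Optimistic start: everything not at a divergent join is convergent.
    let mut convergent: Vec<bool> = phis.iter().map(|p| !p.divergent_join).collect();
    let mut progress = true;
    while progress {
        progress = false;
        for (i, phi) in phis.iter().enumerate() {
            if !convergent[i] {
                continue;
            }
            // One divergent source is enough to make the phi divergent.
            let any_divergent = phi.srcs.iter().any(|s| match *s {
                PhiSrc::Phi(j) => !convergent[j],
                PhiSrc::Value { convergent: c } => !c,
            });
            if any_divergent {
                convergent[i] = false;
                progress = true;
            }
        }
    }
    convergent
}
```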
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
We know this is wrong. In many cases, they're faster than warp
instructions, sometimes with a latency as low as 2. However, there seem
to be a bunch of exceptions we don't understand, and it's better to be
more conservative and have correct shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
This time we take into account WaR and WaW dependencies and not just RaW
dependencies. The NVIDIA ISA is actually quite dynamic, and not
everything is nicely pipelined such that writes always happen at
consistent cycles. There are exact rules, of course, but we don't know
what those are so we need to make some worst-case assumptions.
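Here is one way to express those worst-case assumptions, as a sketch only. It assumes each instruction has hypothetical earliest/latest cycles (relative to issue) at which it may read its sources and write its destinations. It also assumes a value written at cycle t can be read at cycle t. The minimum issue distance from an earlier instruction `a` to a later one `b` is then the maximum over the RaW, WaR, and WaW constraints that apply, each computed with the least favorable bound. None of the field names or numbers come from NAK.

```rust
/// Hypothetical instruction summary: registers plus cycle bounds
/// (relative to issue) for when sources are read and dests are written.
struct Instr {
    srcs: Vec<u8>,
    dsts: Vec<u8>,
    read_early: i32,
    read_late: i32,
    write_early: i32,
    write_late: i32,
}

impl Instr {
    fn new(srcs: &[u8], dsts: &[u8], read: (i32, i32), write: (i32, i32)) -> Self {
        Instr {
            srcs: srcs.to_vec(),
            dsts: dsts.to_vec(),
            read_early: read.0,
            read_late: read.1,
            write_early: write.0,
            write_late: write.1,
        }
    }
}

fn overlaps(a: &[u8], b: &[u8]) -> bool {
    a.iter().any(|r| b.contains(r))
}

/// Minimum cycles between issuing `a` and issuing the later `b`.
fn min_delay(a: &Instr, b: &Instr) -> i32 {
    let mut delay = 1; // instructions issue at least one cycle apart
    // RaW: a's write may land late; b may read early.
    if overlaps(&a.dsts, &b.srcs) {
        delay = delay.max(a.write_late - b.read_early);
    }
    // WaR: a may read late; b's write must not land before that read.
    if overlaps(&a.srcs, &b.dsts) {
        delay = delay.max(a.read_late - b.write_early + 1);
    }
    // WaW: b's write must land strictly after a's write.
    if overlaps(&a.dsts, &b.dsts) {
        delay = delay.max(a.write_late - b.write_early + 1);
    }
    delay
}
```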
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>
UGPRs in warp instructions are treated more like cbufs than GPRs.
You're only allowed to have one and it has to share space with the
possible cbuf or immediate. This means we need to treat them as a "not
a register" case for warp instructions but as a register for uniform
instructions.
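A sketch of that encoding rule as a source-legality check. It assumes that each instruction gets at most one "not a register" source slot, and that a UGPR counts as a register only when the instruction itself is uniform. It also assumes that uniform instructions can't read warp GPRs at all. The `Src` enum and `srcs_legal` are illustrative only, not NAK's real types.

```rust
/// Hypothetical source operand kinds.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Src {
    Gpr(u8),
    UGpr(u8),
    CBuf { idx: u8, offset: u16 },
    Imm(u32),
}

/// Whether `src` occupies a normal register slot for this instruction kind.
fn is_reg(src: &Src, uniform_instr: bool) -> bool {
    match src {
        Src::Gpr(_) => true,
        // In a warp instruction, a UGPR is treated like a cbuf/immediate.
        Src::UGpr(_) => uniform_instr,
        Src::CBuf { .. } | Src::Imm(_) => false,
    }
}

/// At most one non-register source (UGPR, cbuf or immediate) per warp
/// instruction; uniform instructions are assumed to never read GPRs.
fn srcs_legal(srcs: &[Src], uniform_instr: bool) -> bool {
    if uniform_instr && srcs.iter().any(|s| matches!(s, Src::Gpr(_))) {
        return false;
    }
    srcs.iter().filter(|s| !is_reg(s, uniform_instr)).count() <= 1
}
```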
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29591>