Flag day change to replace the previous hardcoded background/end-of-tile shaders
and the API-style load/store_output in fragment shaders with the generated
shaders and lowered *_agx intrinsics. This gets us working non-UNORM8 render
targets and working MRT. It's also a step toward working MSAA, but that needs a
lot more work, since the multisampling programming model on AGX is quite
different from any of the APIs (including Metal).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
Multisampled texture fetch (txf_ms) is encoded like regular txf. However, we now
need to pack the multisample index in the right place, which we do by extending
our existing NIR texture source lowering pass. 2D MS arrays use a new value of
dim, which requires tweaking the encoding slightly. Otherwise, everything is
bog-standard.
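Roughly, the extension to the source lowering pass amounts to folding the
sample index into the packed word alongside the LOD. This is a sketch only:
the helper name and the bit position used below are assumptions for
illustration, not the real AGX layout.

```c
#include "nir_builder.h"

/* Hedged sketch: fold the sample index of a txf_ms into the packed
 * LOD/sample word built by the texture source lowering pass.  The bit
 * position (16) is illustrative.  The real pass also rewrites/removes
 * the nir_tex_src_ms_index source once it has been packed. */
static nir_ssa_def *
pack_sample_index(nir_builder *b, nir_tex_instr *tex, nir_ssa_def *lod_word)
{
   int idx = nir_tex_instr_src_index(tex, nir_tex_src_ms_index);
   if (idx < 0)
      return lod_word;

   nir_ssa_def *sample = tex->src[idx].src.ssa;
   return nir_ior(b, lod_word, nir_ishl_imm(b, sample, 16));
}
```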
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
This pass creates preamble shaders. The heavy lifting is done by
nir_opt_preamble. We do need to define the cost model for nir_opt_preamble, set
up 16-bit units for the register file, and scalarize the resulting
load/store_preamble intrinsics.
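For a sense of what the cost model looks like, here is a minimal sketch of the
kind of callbacks nir_opt_preamble takes. The constants are purely
illustrative, not the driver's actual tuning.

```c
#include "nir.h"
#include "util/macros.h"

/* Hedged sketch of cost callbacks for nir_opt_preamble. */
static float
instr_cost(nir_instr *instr, const void *data)
{
   switch (instr->type) {
   case nir_instr_type_tex:
      return 10.0f; /* expensive, almost always worth hoisting */
   case nir_instr_type_alu:
      return 2.0f;
   default:
      return 1.0f;
   }
}

static float
rewrite_cost(nir_ssa_def *def, const void *data)
{
   /* Replacing a def with load_preamble costs roughly one move per
    * 16-bit register-file unit it occupies. */
   return def->num_components * DIV_ROUND_UP(def->bit_size, 16);
}
```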
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
lower_resinfo generates some 64-bit math, so we need to handle it. Even
though we don't have native 64-bit moves, it's convenient to pretend we
do to avoid special cases in the IR. In particular, modelling 64-bit
mov_imm in the IR means our existing small constant propagation code
works, with zero-extension from 8->64.
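The zero-extension rule boils down to a simple check. A minimal sketch (the
helper name is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hedged sketch: a 64-bit mov_imm can go through the existing small
 * constant propagation path as long as truncating to the small encoding
 * and zero-extending back reproduces the value. */
static bool
fits_small_imm(uint64_t value, unsigned imm_bits)
{
   return (value >> imm_bits) == 0;
}

/* fits_small_imm(0x2A, 8) is true: 0x2A zero-extends from 8 to 64 bits
 * unchanged.  fits_small_imm(0x1FF, 8) is false. */
```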
Fixes dEQP-GLES3.functional.texture.units.2_units.only_2d_array.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18813>
This means we don't reserve the registers, which improves RA
considerably. Using a special preload pseudo-op instead of a regular
move allows us to constrain semantics and guarantee coalescing.
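As a sketch of the idea (types and the helper are hypothetical, not the actual
agx IR): rather than reserving the preloaded hardware registers for the whole
program, RA can simply pin each preload's destination to the register the
hardware already wrote, so the "copy" coalesces away for free.

```c
/* Hedged sketch: preload handling in RA.  Structures are illustrative. */
struct preload_instr {
   unsigned dest;          /* SSA index of the destination        */
   unsigned preloaded_reg; /* hardware register holding the value */
};

static void
assign_preload(const struct preload_instr *I, unsigned *reg_of_ssa)
{
   reg_of_ssa[I->dest] = I->preloaded_reg;
}
```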
shader-db on glmark2 subset:
total instructions in shared programs: 6448 -> 6442 (-0.09%)
instructions in affected programs: 230 -> 224 (-2.61%)
helped: 4
HURT: 0
total bytes in shared programs: 42232 -> 42196 (-0.09%)
bytes in affected programs: 1530 -> 1494 (-2.35%)
helped: 4
HURT: 0
total halfregs in shared programs: 2291 -> 1926 (-15.93%)
halfregs in affected programs: 2185 -> 1820 (-16.70%)
helped: 75
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18804>
We know that phi nodes are always at the start (this is asserted in
agx_validate and a fundamental invariant of SSA form). That means we can
cheaply iterate all n phi nodes forward (or n non-phi nodes backwards)
in O(n) time. We already open-code this idiom in a few places; use common
iterators instead so we don't need to justify the pattern in random places.
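A minimal sketch of the iterator idiom (names are illustrative, not the real
agx_* macros):

```c
/* Hedged sketch: phis are contiguous at the top of a block, so a forward
 * walk can stop at the first non-phi and still visit all n phis in O(n). */
typedef struct instr {
   enum { OP_PHI, OP_OTHER } op;
   struct instr *next;
} instr_t;

#define foreach_phi_in_block(blk_first, I)          \
   for (instr_t *I = (blk_first);                   \
        I != NULL && I->op == OP_PHI;               \
        I = I->next)
```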
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18804>
...Rather than at backend IR translation time. This is considerably
simpler because we can use the txs lowering instead of special-casing
array sizes. Unfortunately it generates worse code, but that gap should
close once nir_opt_preamble is wired in.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18652>
There's no native txs instruction... but we can emulate one :-) This is
heavy on shader ALU, but in the production driver, it'll all be hoisted
up to the preamble shader and so it shouldn't matter much. This
keeps the driver itself simple and low overhead, with a completely
obvious generalization to bindless.
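The arithmetic being emulated is just per-LOD minification of sizes read out of
the texture descriptor. A rough sketch of that math (the descriptor field
layout assumed here is for illustration, not the documented AGX format):

```c
#include <stdint.h>

#define MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Hedged sketch of the math the emulated txs performs (done in NIR/ALU in
 * the real pass and hoisted to the preamble shader in the production
 * driver).  Assumes width/height are stored minus one in the descriptor. */
static void
emulated_txs(uint32_t w_minus_1, uint32_t h_minus_1, uint32_t layers,
             uint32_t lod, uint32_t out[3])
{
   out[0] = MAX2((w_minus_1 + 1) >> lod, 1); /* minified width  */
   out[1] = MAX2((h_minus_1 + 1) >> lod, 1); /* minified height */
   out[2] = layers;                          /* layers are not minified */
}
```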
Passes dEQP-GLES3.functional.shaders.texture_functions.texturesize.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18525>
Texture offsets and shadow comparison values get grouped into a vector
passed by register. Comparison values are provided as-is (fp32). Texture
offsets are packed into nibbles, but we can do this on the CPU, as
non-constant offsets are forbidden in GLSL, at least. They're also
forbidden in Vulkan/SPIR-V without ImageGatherExtended/
shaderImageGatherExtended. I'm happy kicking the NIR lowering can down
the line; this commit is complicated enough already.
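The CPU-side packing is the straightforward part; a minimal sketch (nibble
order and field width are assumptions for illustration):

```c
#include <stdint.h>

/* Hedged sketch: pack constant texel offsets into 4-bit signed nibbles,
 * one per coordinate component.  The real layout is whatever the
 * hardware expects. */
static uint32_t
pack_texel_offsets(const int8_t offsets[3], unsigned num_components)
{
   uint32_t packed = 0;

   for (unsigned i = 0; i < num_components; ++i)
      packed |= (uint32_t)(offsets[i] & 0xf) << (4 * i);

   return packed;
}

/* e.g. pack_texel_offsets((int8_t[]){1, -2, 0}, 2) == 0xe1 */
```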
Passes dEQP-GLES3.functional.shaders.texture_functions.texture.* and
dEQP-GLES3.functional.shaders.texture_functions.textureoffset.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18525>
vb_mask can include garbage vbufs, so we can't rely on it. This will
prevent a regression when switching to u_blitter-based clears. This is
also simpler and shrinks the VS shader key, so all in all it's a good thing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
Instead of using driver_location magic and hoping things work, make the
linkage between vertex and fragment shaders explicit. Thanks to the
coefficient register mechanism reverse-engineered and documented earlier
in this series, this does not require any shader keys to support
separable shaders. It just requires that we regenerate the coefficient
register binding tables at draw time, based on the varying layouts
decided by the compiler independently for the VS and FS. This is more
robust in the face of separate shaders.
This also gets us glProvokingVertex() support without shader keys.
After that, we don't need any of the remapping prepasses. For fragment
shaders, any old mapping will do, so we can assign coefficient registers
as we go (based on what the program actually uses, not nir_variable
information that might be stale by this point). We do want to cache
coefficient registers, particularly for fragcoord.w, which is used for
perspective interpolation everywhere.
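A rough sketch of what the draw-time linkage amounts to (struct and field names
are illustrative, not the actual driver data structures):

```c
/* Hedged sketch: rebuild the coefficient register binding table at draw
 * time by matching each varying the FS reads against where the linked VS
 * wrote it.  All types here are illustrative. */
struct fs_varying {
   int      slot;    /* gl_varying_slot the FS reads       */
   unsigned cf_base; /* coefficient registers chosen by FS */
   unsigned count;   /* number of components               */
};

struct vs_output {
   int      slot;  /* gl_varying_slot the VS writes */
   unsigned index; /* output index chosen by the VS */
};

struct cf_binding {
   unsigned cf_base;
   unsigned vs_index;
   unsigned count;
};

static unsigned
link_varyings(const struct fs_varying *fs, unsigned fs_count,
              const struct vs_output *vs, unsigned vs_count,
              struct cf_binding *table)
{
   unsigned n = 0;

   for (unsigned i = 0; i < fs_count; ++i) {
      for (unsigned j = 0; j < vs_count; ++j) {
         if (vs[j].slot != fs[i].slot)
            continue;

         table[n++] = (struct cf_binding){
            .cf_base  = fs[i].cf_base,
            .vs_index = vs[j].index,
            .count    = fs[i].count,
         };
         break;
      }
   }

   return n;
}
```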
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Lifted from ir3. The algorithm is the same; the data structures and interface
are lightly modified to decouple from ir3's IR.
Sequentializing parallel copies after RA is tricky. ir3's implementation works
well enough, so I use that one.
Original implementation by Connor Abbott.
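For reference, the textbook shape of the problem: emit copies whose destination
is no longer needed as a source, and break the remaining cycles through a
scratch register. This is a generic sketch, not ir3's actual implementation.

```c
#include <stdbool.h>
#include <stdio.h>

/* One entry of a parallel copy set produced by RA. */
struct copy {
   unsigned dst, src;
   bool done;
};

/* Stand-in for emitting a real move instruction. */
static void
emit_mov(unsigned dst, unsigned src)
{
   printf("mov r%u, r%u\n", dst, src);
}

/* Hedged, generic sketch of parallel copy sequentialization. */
static void
sequentialize(struct copy *copies, unsigned n, unsigned scratch)
{
   unsigned remaining = 0;

   for (unsigned i = 0; i < n; ++i) {
      copies[i].done = (copies[i].dst == copies[i].src); /* drop no-ops */
      if (!copies[i].done)
         remaining++;
   }

   while (remaining) {
      bool progress = false;

      for (unsigned i = 0; i < n; ++i) {
         if (copies[i].done)
            continue;

         /* Blocked if another pending copy still reads our destination. */
         bool blocked = false;
         for (unsigned j = 0; j < n; ++j) {
            if (j != i && !copies[j].done && copies[j].src == copies[i].dst)
               blocked = true;
         }

         if (!blocked) {
            emit_mov(copies[i].dst, copies[i].src);
            copies[i].done = true;
            remaining--;
            progress = true;
         }
      }

      if (!progress) {
         /* Only cycles remain: save one source in the scratch register
          * and redirect its readers there, which unblocks the cycle. */
         for (unsigned i = 0; i < n; ++i) {
            if (copies[i].done)
               continue;

            emit_mov(scratch, copies[i].src);
            for (unsigned j = 0; j < n; ++j) {
               if (!copies[j].done && copies[j].src == copies[i].src)
                  copies[j].src = scratch;
            }
            break;
         }
      }
   }
}
```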
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>
Rather than using builder magic (implicitly lowered on emit), add actual pseudo
operations (explicitly lowered before encoding). In theory this is slower, but I
doubt it matters. This makes the instruction aliases first-class for IR printing
and machine inspection, which will make optimization passes easier to write.
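Conceptually, the late lowering is just a rewrite of each alias into a real
hardware instruction. The opcode names and the concrete mappings below are
illustrative, not the actual agx lowering.

```c
/* Hedged sketch of lowering pseudo ops right before encoding. */
enum op { OP_PSEUDO_MOV, OP_PSEUDO_NOT, OP_BITOP /* ... */ };

struct instr {
   enum op  op;
   unsigned truth_table; /* selects the bitwise function for OP_BITOP */
};

static void
lower_pseudo(struct instr *I)
{
   switch (I->op) {
   case OP_PSEUDO_MOV:
      I->op = OP_BITOP;
      I->truth_table = 0xa; /* out = A (pass-through) */
      break;
   case OP_PSEUDO_NOT:
      I->op = OP_BITOP;
      I->truth_table = 0x5; /* out = ~A */
      break;
   default:
      break; /* already a real hardware instruction */
   }
}
```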
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16268>