This is similar to old gens which couldn't support loading from GMEM
automatically. It will be needed for loads with a fragment density map,
because we need to scale the image when loading to GMEM.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
Different uses in various registers and the texture descriptor have
different shifts, and we already had a few ugly workarounds to handle
this. Remove the foot-gun by specifying it in bytes and letting users
handle the shift themselves using the correct macro.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
Otherwise accesses to non-0 views of input attachments may be considered
out-of-bounds and return 0. This should've been removed when enabling
multiview for GMEM, not sure how it was missed.
Fixes: def56b531c ("tu: Support GMEM with layered rendering and multiview")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20304>
to comply with ES2+ line clipping rules, guardband clipping should be
used so that the rasterizer will clip lines without using clip planes
fixes (llvmpipe):
dEQP-GLES*.functional.clipping.line.wide_line_clip_viewport_center
dEQP-GLES*.functional.clipping.line.wide_line_clip_viewport_corner
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17284>
Below is the summary from the power validation.. "it looks like the only
workload where I see savings from DCC is PLT and it is only about 65mW
which is just run to run variation. For Idle I am seeing ~280mW increase
in power, ~200mW increase for power_VideoCall, and ~80mW increase for VP"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22771>
When an ACTIVE batch takes over the active writer role from a SUBMITTED
batch, the written BO has the syncobj from the latter even though the
writer is the former. This is correct and an intended state, but it
means that then we can't gate the syncobj cleanup in agx_batch_cleanup
on being the active writer, since the SUBMITTED batch won't be.
Fixes: asahi/mesa#18
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This code was originally based on the Panfrost implementation, but has been
improved in a number of ways.
1. Transform feedback programs are dispatched generically with Gallium calls,
rather than emitting something hardware-specific. This is cleaner and
portable to future GPUs.
2. Transform feedback with indexed draws is now fixed, by lowering to an index
buffer pull.
3. Transform feedback with buffer overflows is now fixed, by correctly
bounds checking in transform feedback programs.
4. Transform feedback with strips/fans/loops are fixed, by correctly
tessellating to the underlying primitives as required by OpenGL.
5. Transform feedback with QUADS is fixed, by tessellating to triangles as
required by OpenGL.
That said, the code is still not in its final form.
1. It still does not support indirect draws. This will require a substantial
overhaul to do tracking on the GPU instead of the CPU. Currently we force
unroll indirect draws (slow but kosher in GL, treif in Vulkan). This isn't
hard to solve but I'm not going to duplicate the code until the algorithms
are otherwise complete because it's a lot easier to hack on the CPU versions
than the GPU versions.
2. It still does not support primitive restart. This has especially nasty
interactions with transform feedback. Again we force unroll to non-primitive
restart forms, again slow but kosher in GL but treif in Vulkan. This is a lot
harder to deal with. I sketched out something really nasty in my notebook
(hinging on efficient GPU prefix sums) but I'm not in a hurry to type this
out.
3. There will be interactions with geometry and tessellation shaders and I don't
think I can get the core code here future-proofed without actually bringing
up the new shader stages.
As such, this is a hard fork of the panfrost code for now, I'm not trying to
share the code (although it *would* clear out almost all of panfrost's transform
feedback related piglit failures).
Passes dEQP-GLES3.functional.transform_feedback.* and most of the relevant
piglits.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This shortcuts all headaches about how big this should be. It does increase
memory usage a bit if there are lots of shader variants compiled, but this
should be tolerable, and can be optimized later if so required. Thanks to the
previous commit, the disk cache size should be unaffected.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
We were being sloppy with the sizes before. It mostly worked out, but there were
some corner cases where we would end up with mixed sized collects and that won't
end well for us. Let's rework the logic to make all the sizes explicit in NIR --
32-bit for depth and 16-bit stencil -- and then do the needed promotions to make
it happen in the AGX IR side.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
In case .x isn't read, it'll be null which has the wrong size and will fail
the validation added later in this series. We fix this by padding with sized
undefs (something that exists of defined size but undefined value) rather than
nothingness (of undefined size).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Sometimes a shader might index with a non-power-of-two stride. For example, if
it's indexing into an array of structures where the structure size is not a
power of two, we'll get a multiply with a constant as opposed to a shift. We
want to handle these cases, too. To do so, we generalize our pattern matching to
look for any kind of multiply (with our new helper), rather than hardcoding
logic for ishl. This eliminates right-shifts in a pile of compute shaders, which
makes me happy from a "I read lots of shader assembly when debugging"
perspective.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Currently, we hardcode logic in the addressing chasing code to look for ishl
instructions that shift by constants. We can generalize this to looking for
integer multiplies by constants to optimize more addressing patterns. Add a
helper to do so.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This is totally pointless. This saves some waits at the ends of compute kernels
(waiting for stores to complete before terminating the thread). I don't know
how much this would matter for performance, since the hardware may have to do
these waits internally, but it makes the generated code less silly which is
always nice.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
This lets us shadow textures updated in the middle of rendering in Quake3.
They're big memcpys, but as long as the texture memory is cached it's ok. We use
a heuristic to avoid too many memcpys from uncached memory, which would cause
slideshow performance in quake.
We need to be careful to avoid shadowing shared resources, though, that's
invalid and would break WSI pretty hard.
It would be better to blit on the GPU for large shadowing, but that's more
involved and left for future work.
Reduces stuttering in Quake3.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Comparison with PowerVR's XML shows that this is the actual name... And it needs
to be set a bit more carefully than "no colour output" in order to get correct
behaviour for depth-only passes that use sample mask / discard. Fix the name
first, the extra conditions will come when they're needed for multisampling.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
When possible. Only occassionally possible because the loads are pretty limited
in the addressing arithmetic. This probably doesn't matter for performance but
it saves some noise in dEQP tests which makes for nicer debugging, plenty of
optimizations end up worth it for that alone.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>