There seems to be a problem with running firefox by using Xwayland that
results in a shared resources being not always tagged as using staging.
As a result one process tries to map the resource that was allocated as
one that uses staging without actually using the staging resource, and
hence the mapped range only accounts for the small region that we have
to allocated because a zero-allocation doesn't work, but the application
mapping the resource assumes that a properly sized range is mapped, and
consequently this results in invalid memory access.
To work around this issue disable creating staging for resources that
are created by using shared binding. It is not clear to me whether this
is the best fix, but it seems to quell the issue.
Fixes: c9d99b7eec
virgl: Fix texture transfers by using a staging resource
Related: https://gitlab.freedesktop.org/virgl/virglrenderer/-/issues/291
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19655>
R300/R400 GPUs can't do it in hardware and all the lowering should have
happened in NIR already, there is no point in wasting CPU time, just to
abort later when emitting.
Reduces CPU time for dEQP run by ~25% for RV370. The wallclock time is
now just slighly above 1 minute at 10 threads, mostly determined by the
long-running dEQP-GLES2.functional.flush_finish.* tests.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip@gawin.net>
Tested-by: Filip Gawin <filip@gawin.net>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19766>
Previously only 'decode_vs_state' would dump the referenced shader,
which meant we completely ignored every other shader when decoding
the '3DSTATE_PIPELINED_POINTERS' command.
Move the program disassembly logic from 'decode_vs_state' into a
common 'disasm_program_from_group' helper and call it from every
other decode_*_state function, too.
Signed-off-by: Daniil Tatianin <99danilt@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19880>
* load_uniform for sand8_stride is uint32 instead of int32 and its range
is 4 instead of 1 as it is counted in bytes.
* Now we save and restore constant buffer 1 properly for the ubo used
in the blit. We need to take into account that in V3D the first UBO
with index 0 is stored on constant buffer 1, because gallium uses
internally contant buffer 0 (See for reference commit c8212731e7)
* Removed not needed return.
* Added shader information about uniforms, ubos, inputs and outputs.
* Fixed typos in the comments.
Fixes: 95c4f0f910 "v3d: Enables DRM_FORMAT_MOD_BROADCOM_SAND128 support"
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19639>
Implements the support to blit SAND modifier with columns 128-bytes-wide
support for P030 format to P010 with UIF layout. This allows sampling
from H265 10-bit frames exported by the video decoder on the Raspberry
Pi 4 devices.
When a DRM_FORMAT_MOD_BROADCOM_SAND128 is enabled with an imported P030
texture. The sand30 blit converts the Luma and Chroma planes to
a tiled P010 format that can be sampled using gallium YUV lowerings
without the interleaved 128-bytes-wide-columns.
This patch follows a similar approach to SAND8 blit but extracting luma
and chroma components from the DRM_FORMAT_P030 format. P030 is a two
plane YCbCr420 format where 3 10 bit components with 2 padding bits are
packed in 4 bytes.
index 0 = Y plane, [31:0] x:Y2:Y1:Y0 2:10:10:10 little endian
index 1 = Cr:Cb plane, [63:0] x:Cr2:Cb2:Cr1:x:Cb1:Cr0:Cb0
[2:10:10:10:2:10:10:10] little endian
After the sand30_blit is done, the shadow texture is an UIF tiled texture
with an R16_UNORM format for luma and R16G16_UNORM for chroma.
To reduce the number of texture-fetch operations during the blit, we
read pairs of 32-bit dwords. They include 6 10-bit unorm components.
And then we write 4 UNORM16 components from an uvec4 because our render
targets do not support writing to UNORM16 formats.
As sampling will be done using 16bpp (luma) and 32bpp (chroma), the
sand30_blit writes consider the different microtile layouts of UIF
format between 64, 32 and 16 bpp.
v2: Fixes save and recovery of constant buffers (Iago)
Typo corrections. (Iago)
Removed not needed return. (Iago)
Added shader information about uniforms, ubos, inputs and outputs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19639>
The address itself was already stable assuming that the memory itself was
allocated with capture/replay. Enable the feature flag and add an equality
check to return VK_ERROR_INVALID_OPAQUE_CAPTURE_ADDRESS_KHR on mismatch.
Tested with:
- dEQP-VK.ray_tracing_pipeline.capture_replay.*
- q2rtx gfxrecon replays correctly without major errors.
* There are debug logs about VkBuffers missing opaque address
for unknown reason, however the AS part is confirmed to be correctly
captured.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19841>
Found when 16bit vec3 varying with llvm14 (not found
when llvm15), one 32bit vec4 slot is filled like this:
vec3[0] | undef
vec3[1] | undef
vec3[2] | undef
undef | undef
LLVM error is for the elements with undef:
%287 = insertelement float %280, half %279, i64 0
After this change, we get:
%287 = insertelement <2 x half> %280, half %279, i64 0
Fixes: 279eea5bda ("amd/llvm: Transition to LLVM "opaque pointers"")
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19848>
Flag day change to replace the previous hardcoded background/end-of-tile shaders
and the API-style load/store_output in fragment shaders with the generated
shaders and lowered *_agx intrinsics. This gets us working non-UNORM8 render
targets and working MRT. It's also a step in the direction of working MSAA but
that needs a lot more work, since the multisampling programming model on AGX is
quite different from any of the APIs (including Metal).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
With multiple render targets, it's not practical to generate all
variants of the background and end-of-tile programs at start up. Rather
than trying, add a hash table of meta program keys to background
programs, and compile variants as they're needed.
With the new infrastructure, it's sensible to handle clears with the
same code path as reloads. In addition to getting us closer to multiple
render target support, this gets us support for non-RGBA8 render
targets, as the u8norm tilebuffer format was baked into the hardcoded
clear shader and store shaders used before.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>
Multisampling uses different values of the dimension enum in tandem with a new
samples field. Handle this in agx_pack_texture (split off here) so we can use
the new functionality for texture descriptors in reloads too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19871>