On a7xx DI_SRC_SEL_AUTO_INDEX is used instead of DI_SRC_SEL_AUTO_XFB.
On a7xx the counter value and offset are shifted right by 2, so
the vertexStride should also be in units of dwords.
CTS doesn't test this though.
Fixes:
dEQP-VK.transform_feedback.simple.draw_indirect_*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
As r3d_ops emits sysmem draws directly, it needs to be manually
updated to emit the A7XX commands instead of A6XX.
VK-CTS tests success on A630 + A740:
dEQP-VK.api.copy_and_blit.core.blit_image.*
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
_Presumably_ invalidates workgroup-wide cache for image/buffer data access.
so while "fence" is enough to synchronize data access inside a workgroup,
for cross-workgroup synchronization we have to invalidate that cache.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
Though blob is not seen to even use mode1 on a740, it uses
S2EN variant instead.
Fixes:
dEQP-VK.binding_model.descriptor_buffer.multiple.*
dEQP-VK.binding_model.descriptor_buffer.embedded_imm_samplers.*
dEQP-VK.pipeline.monolithic.descriptor_limits.compute_shader.*
Adapted from Jonathan Marek's changes.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
The set of magic regs is different between generations and even
sub-gens. Adding a new one and/or emitting one on specific generation
takes much more code than necessary. Doing this in a single place is
much nicer.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23217>
Otherwise, if a secondary CS is grown and then executed without IB2,
the INDIRECT_BUFFER packet would have been copied but it shouldn't.
This fixes a regression that introduced GPU hangs with
gl_vk_meshlet_cadscene on RDNA2.
Fixes: df0c742543 ("radv/amdgpu: rework growing a CS with the chained IB path slightly")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
If a secondary cmdbuf has been grown and is executed without IB2
(eg. on compute queue or when it's not allowed), the ib size ptr
contains chaining info, which means the IB size was wrong.
This fixes CPU crashes when running gl_vk_meshlet_cadscene.
Fixes: 277b2afd70 ("radv/amdgpu: add support for executing DGC cmdbuf with RADV_DEBUG=noibs")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
Previous commit decoupled EGL_ANDROID_blob_cache from the disk cache
and haven't updated the SHADER_CACHE_DISABLE_BY_DEFAULT test-case that
is failing because now cache is always created even if disk cache is
disabled, such cache is NO-OP in this case. Fix the failing test.
Fixes: 39f26642 ("util: Decouple disk cache from EGL_ANDROID_blob_cache")
Reviewed-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24985>
Dolphin ubershaders have a pattern:
if (x && y) {
} else {
discard;
}
The current code to simplify if's will bail on this pattern, since the condition
is not a comparison. However, if that check is dropped and we allow NIR to
invert this, we get:
if (!(x && y)) {
discard;
} else {
}
which is now in a form for nir_opt_conditional_discard to turn into it
discard_if(!(x && y))
which may be substantially cheaper than the original code.
In general, I see no reason to restrict to conditionals. Assuming the backend is
clever enough to delete empty else blocks (I think most are), then this patch is
a strict win as long as inot instructions are cheaper than empty else blocks.
This matches my intuition for typical GPUs, where simple ALU instructions are
cheaper than control flow. Furthermore, it may be possible in practice for
backends to fold the inot into a richer set of instructions. For example, most
GPUs have a NAND instructions which would fold in the inot in the above code.
So just drop the check, simplify the pass, get the win.
---
Also, to avoid inflating register pressure, make sure we put the inot right
before the if. Android shader-db on is uninspiring due to terrible
coalescing decisions in the current RA. But it does fix the Dolphin smell.
total instructions in shared programs: 1756571 -> 1756568 (<.01%)
instructions in affected programs: 1600 -> 1597 (-0.19%)
helped: 1
HURT: 4
Inconclusive result (value mean confidence interval includes 0).
total bytes in shared programs: 11521172 -> 11521156 (<.01%)
bytes in affected programs: 10080 -> 10064 (-0.16%)
helped: 1
HURT: 4
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24965>
the vbo index is used to set the stride, so it needs to be updated
affects dEQP-VK.pipeline.pipeline_library.bind_buffers_2.single.stride_0_4_offset_1_0.count_2
Fixes: 7672545223 ("gallium: move vertex stride to CSO")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24954>
Issue: For texture with multiple planes, the planes will point to the
same BO with the total size, so current vcn dt_size is incorrect.
(gdb) p/x *((struct si_resource *)(((struct vl_video_buffer *)out_surf)->resources[0]))
...
buf = 0x5555558daa30,
gpu_address = 0xffff800101000000,
bo_size = 0xa2000,
...
}
(gdb) p/x *((struct si_resource *)(((struct vl_video_buffer *)out_surf)->resources[1]))
...
buf = 0x5555558daa30,
gpu_address = 0xffff800101000000,
bo_size = 0xa2000,
...
}
This is because: in function static struct si_texture *si_texture_create_object(),
if (plane0) {
/* The buffer is shared with the first plane. */
resource->bo_size = plane0->buffer.bo_size;
...
radeon_bo_reference(sscreen->ws, &resource->buf, plane0->buffer.buf);
resource->gpu_address = plane0->buffer.gpu_address;
}
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9728
Cc: mesa-stable
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25013>