internal_format is the "real" format of a resource, and the "real" format
of imported resources is the external-facing format, not the pipe format
this ensures the correct format is available for internal ops, such as nplanes queries
Fixes: 2e2775c11b ("zink: fix PIPE_RESOURCE_PARAM_NPLANES with format modifier")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20753>
When the base array layer of the image view is > 0, addrlib computes
the offset (in HwlComputeSubResourceOffsetForSwizzlePattern) which is
then added to the base VA in RADV. But if the driver doesn't reset
the base array layer, the hw will compute incorrect addressing
(ie. base array will be added twice). This also matches AMDVLK.
This fixes a VM fault followed by a GPU hang on RDNA2 when trying
to join a multiplayer game with medium settings in Halo Infinite.
Fixes: 98ba1e0d81 ("radv: Fix mipmap views on GFX10+")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20761>
The ideal case for performance is to have a single buffer for
all display list. The caveat is that large buffers are less
likely to be freed because they're refcounted: it only takes
1 user (diplay list) to keep it in VRAM.
This lowers VRAM usage when replaying the trace attached
of the trace attached to !6140 from 5.5 GB to about 1.8 GB.
Viewperf snx performance isn't affected.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6140
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20748>
grow_vertex_storage may call wrap_filled_vertex, which will
trigger the assert incorrectly because the new size will be
smaller than 'new_size' but it's correct because
'vertex_store->used' has been reset to 0.
Fixes: a08baaff97 ("vbo/dlist: fix indentation in vbo_save_api.c")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20748>
These are handled identically in almost all cases. There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine. If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one
It's not worth copy and pasting the entire function for this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are now basically identical to their non-float counterparts. The
only thing that differed was the opcode checking to determine which
operands existed. Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation. Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor. No need for a separate opcode.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines. We have to translate one way or
another, and using the modern set makes sense going forward.
One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.
The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:
x <- load.ugm // Atomic read sequenced-before the barrier
y <- fence.ugm
z <- fence.tgm
wait(y, z)
w <- load.tgm // Read sequenced-after the barrier
In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.
v2: Apply the workaround regardless of whether the NIR barrier
intrinsic specifies multiple storage classes or a single one,
since an acquire barrier is required to order subsequent requests
relative to previous atomic requests of unknown storage class not
necessarily specified by the memory scope information of the
intrinsic.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
In this usual case, do a quick check to avoid generating 5 useless instructions
(mov/vec4 instructions). They'll get copypropped but that creates more work for
the optimizer and nir/lower_blend runs in a hot variant path on both Asahi and
Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
nir/lower_blend asserts:
assert(nir_intrinsic_write_mask(store) ==
nir_component_mask(store->num_components));
For the special blend shaders used in Panfrost, this holds. But for arbitrary
shaders coming out of GLSL-to-NIR (as used with Asahi), this does not hold. In
particular, after nir_opt_undef runs, undefined components can be trimmed.
Concretely, if we have the shader:
gl_FragColor.xyz = foo;
Then this becomes in NIR
gl_FragColor = vec4(foo.xyz, undef);
and then opt_undef will give the store_deref a wrmask of xyz but 4 components.
Then lower_blend asserts out.
Found in a gfxbench shader on asahi.
Closes: #6982
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Per the spec.
Fixes arb_color_buffer_float-render on both Panfrost and Asahi (before/after
reproduced on Mali-T860 and AGX G13 respectively). Without that patch, that test
fails the assertion:
arb_color_buffer_float-render: ../src/compiler/nir/nir_lower_blend.c:259: nir_blend_logicop: Assertion `util_format_is_pure_integer(format)' failed.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
The upper bits start correctly, there's no need to clear them as long as we keep
them zero'ed by using ixor with a valid bit mask instead of inot.
Makes the code generated for logic op slightly less ridiculous. I'm joking. It's
still ridiculous but I'm not in the mood to fix up the Midgard compiler and it's
just a little ALU for a feature almost nothing uses.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
Particularly constant colours, but also (more obscurely) SNORM.
Fixes arb_color_buffer_float-render with SNORM framebuffers. Issue affects both
Asahi and Panfrost (the latter after we start advertising EXT_render_snorm).
v2: Check the blend factor to avoid unnecessary clamps. This avoids regressing
blend shader code quality on Panfrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com> [v1]
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>
In this case we have 4 components but the value of the fourth component
is undefined. Apply the fixup we already have.
Fixes
dEQP-GLES3.functional.draw_buffers_indexed.random.max_implementation_draw_buffers.0
on Asahi. That test blend with DST_ALPHA with its RGB565 attachment,
which is fine if RGB565 is preserved, but Asahi is demoting that
format to RGBX8 which means -- after lowering the tilebuffer access --
we blend with an ssa_undef.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20016>