Images with modifiers come with restrictions:
1. They have to be simple 2D images right now
2. They need to have a sensible format (not compressed, multi-plane, or
non-power-of-two)
3. If a CCS modifier is being requested, they have to actually support
CCS_E and be CCS-compatible with any other formats the client may
wish to use for image views.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>
Ensure the hardware cursor is disabled when we set the mode for a
VkDisplayKHR object. The extension doesn't expose any mechanisms to
program the hardware cursor, so we need to ensure it is hidden.
Currently, it seems like X is responsible for disabling the cursor
before handing over the lease. But that seems a little frail, and we
should be disabling the cursor ourselves so it works correctly
independently of how the lease was prepared for us.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1922>
When swr driver is in use it print detected architecture
message to std::err. It can be harmfull when swr is using
in multinodes environments.
It can be enabled setting env var SWR_PRINT_INFO to 1.
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
The current logic considers that the nir_intrinsic_component(store_intr)
encodes the source components start, but it actually encodes the
destination one. Source component offset adjustment is taken care of in
install_registers_instr(), when offset_swizzle() is called.
This fixes dEQP-GLES2.functional.shaders.random.all_features.fragment.45
when PAN_MESA_DEBUG=deqp (looks like exposing GLES3 features has an
impact on the varyings layout).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3429>
The pre-tag right before is a block-level tag, which means it implicitly
terminates the paragraph. So there's no paragraph to close after this.
Instead, move the paragraph-closing before the pre-tag, to explicitly
close the paragraph.
Fixes: 41b3eb08d9 "docs: update meson docs for windows"
Acked-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3431>
Fixes the following valgrind error:
Invalid read of size 16
at 0x28F458A1: si_set_sampler_view_desc (in radeonsi_drv_video.so)
by 0x28F4657E: si_set_sampler_views (in radeonsi_drv_video.so)
by 0x28D62BF5: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Address 0x18142a10 is 0 bytes inside a block of size 48 free'd
at 0x48369AB: free (vg_replace_malloc.c:540)
by 0x28D62D51: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Block was alloc'd at
at 0x4837B65: calloc (vg_replace_malloc.c:762)
by 0x28EFB2EC: si_create_sampler_state (in radeonsi_drv_video.so)
by 0x28D62C30: util_compute_blit (in radeonsi_drv_video.so)
by 0x28D3A944: vlVaHandleVAProcPipelineParameterBufferType (in radeonsi_drv_video.so)
by 0x28D34EE1: vlVaRenderPicture (in radeonsi_drv_video.so)
by 0x4B2582B: vaRenderPicture (in libva.so.2.500.0)
Fixes: 69430d7e59 ("va: use a compute shader for the blit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2321
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3428>
Introduce separate helper functions to set the blendfactor bits.
Lima uses bits 0-2 for the type, bit 3 sets the inverted function
and bit 4 is set if alpha is used.
alpha_src_factor and alpha_dst_factor don't need the alpha bit, so
they are masked with 0xf. There is only place for 4 bits anyway.
If alpha_src_factor is PIPE_BLENDFACTOR_SRC_ALPHA_SATURATE, we need
to change it to PIPE_BLENDFACTOR_ONE first.
This is exactly what the blob does and we pass all
dEQP-GLES2.functional.fragment_ops.blend.* tests now.
Better than the blob btw...
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3411>
This patch changes lower_to_cssa to be much more conservative
about assumptions which phi operands might interfere.
Previously, this pass wasn't exhaustive and could miss some corner cases.
v2: remove optimizations to find better insertion points as it's hard
to guarantee that they are always correct and have overall no benefit.
Fixes: 0b8216b2cd ('aco: Lower to CSSA')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3385>
Some applications explicitly call glTex[ture]Parameteri[v] to set
GL_TEXTURE_MAX_LEVEL and GL_TEXTURE_BASE_LEVEL before uploading any
texture data. Core Mesa initializes MaxLevel to 1000, so if it isn't
that, we know they've set it. (We check for < TEXTURE_MAX_LEVELS to
avoid hardcoding that value, however.)
If MaxLevel - BaseLevel > 0, then the app is trying to tell us that
this texture is going to have multiple miplevels. In that case, go
ahead and allocate the space for it.
Avoids many resource_copy_region calls at texture finalization time
in the Civilization VI benchmark.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3401>
No stride / no attributes means that nothing is being written to the
buffer. However it might still prevent primitives from being written out
to the other buffers. Disabling it entirely seems to fix it.
Fixes GTF-GL45.gtf30.GL3Tests.transform_feedback.transform_feedback_overflow
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The existing liveness analysis in ppir still ultimately relies on a
single continuous live_in and live_out range per register and was
observed to be the bottleneck for register allocation on complicated
examples with several control flow blocks.
The use of live_in and live_out ranges was fine before ppir got control
flow, but now it ends up creating unnecessary interferences as live_in
and live_out ranges may span across entire blocks after blocks get
placed sequentially.
This new liveness analysis implementation generates a set of live
variables at each program point; before and after each instruction and
beginning and end of each block.
This is a global analysis and propagates the sets of live registers
across blocks independently of their sequence.
The resulting sets optimally represent all variables that cannot share a
register at each program point, so can be directly translated as
interferences to the register allocator.
Special care has to be taken with non-ssa registers. In order to
properly define their live range, their alive components also need to be
tracked. Therefore ppir can't use simple bitsets to keep track of live
registers.
The algorithm uses an auxiliary set data structure to keep track of the
live registers. The initial implementation used only trivial arrays,
however regalloc execution time was then prohibitive (>1minute on
Cortex-A53) on extreme benchmarks with hundreds of instructions,
hundreds of registers and several spilling iterations, mostly due to the
n^2 complexity to generate the interferences from the live sets. Since
the live registers set are only a very sparse subset of all registers at
each instruction, iterating only over this subset allows it to run very
fast again (a couple of seconds for the same benchmark).
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
There are some cases in shades using control flow where the varying load
is cloned to every block, and then the original node is left orphan.
This is not harmful for program execution, but it complicates analysis
for register allocation as there is now a case of writing to a register
that is never read.
While ppir doesn't have a dead code elimination pass for its own
optimizations and it is not hard to detect when we cloned the last load,
let's remove it early.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3358>
After flushing batches, iris_fence_flush() asks the kernel whether
each batch's last_syncpt has already signalled or not. (The idea is
that either the compute or render batch may not have actually had any
work queued up, so last_syncpt there might have been signalled a long
time ago.) If it's already completed, we don't bother to record it.
A strange corner is the case of repeated flushes. For example, we
might flush for some reason, and hit a glFlush(), and hit SwapBuffers.
It's possible for all the batches to have been flushed previously, -and-
for them to have actually completed. In this case, we'll see that there
are no syncobj's to wait on, and record fence->count == 0.
This works fine internally - fence_finish can see count == 0 and realize
that it doesn't need to wait, for example. But when working with native
FDs, we may be asked to export a fence with count == 0. So we need an
actual synchronization primitive we can hand off. Because all of the
relevant batches had been signalled when creating the fence, we want the
new dummy fence to be signalled as well.
So we just make a signalled syncobj and export it.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Currently the schedule_program implementation being used is picked
at compile time, which on the Android platform means that the
bifrost compiler & scheduler is used for all targets, including
midgard based hardware.
This commit disambiguates between the two schedule_program functions.
Signed-off-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>