We need to perform cross-batch flushing if any batch writes to a BO
while others refer to it. We checked this case when recording a new
BO in the list which we'd never seen before. However, we neglected to
handle the case when we already read from a BO, but then began writing
to it. That new write may provoke a conflict between existing reads
in other batches, so we need to re-check the cross-batch flushing.
Caught by Piglit's copyteximage when forcing blits and copies to use
a new IRIS_BATCH_BLITTER that isn't upstream yet. But this bug could
be provoked by render/compute work today...we just hadn't noticed it.
Fixes: b21e916a62 ("iris: Combine iris_use_pinned_bo and add_exec_bo")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13808>
We are using the same definitions for both OpenGL and Vulkan, so let's
move it to common.
As we are here we are also adding versioning on the TFU register
definition. Those are basically register bit places, so really likely
to change between versions.
Adding 33 as it is the first version they got defined.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13832>
llvmpipe's fragment shaders are always run sequentially and
in API order for a single tile, so it's impossible to have
out of order render target writes requiring fetch barriers.
Issues fixed in previous commits were actually breaking most
piglit/deqp tests for coherent extension variant.
Signed-off-by: Pavel Asyutchenko <sventeam@yandex.ru>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13252>
Using 1 bit per wrap mode looked very suspicious and after some
experiments it turns out it's 3-bit enum.
Border color is also here, it sits right after depth field. For
some reason it uses 16 bit per channel just like for clear color in RSW
GL_CLAMP mode is broken for nearest filter just as on Midgard, so add
the same workaround - use GL_CLAMP_TO_EDGE for nearest filter.
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13213>
It looks like MBS format used by blob doesn't distinguish sampler2D from
sampler3D, so load texture instruction is the same for 2D and 3D
textures.
So all we need to RE is texture descriptor for 3D textures, but blob
doesn't implement it, so we need to do some guesswork:
- unknown_3_1 looks like a depth since it sits after height/width and
always set to 1
- unknown_2_2 is exactly 3 bits and it follows wrap_t, so it must be
wrap_r
- missing part is texture type for 3D textures. By trial and error it
seems to be 4. First bit is only set for cubemap, so it's likely a
separate flag, and rest 2 bits look like number of tex dimensions akin
to midgard and later (thanks, panfrost!) with 0 for 1D, 1 for 2D
and 2 for 3D.
Put it all together and we have working 3D textures on lima!
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13213>
I'd assumed that since insert didn't take a deleter, it was
find-or-insert, not insert-or-replace. This caused a bo reference
leak if the same bo was used more than once in a batch.
Fixes: fde36d7992 ("d3d12: Don't wait for GPU reads to do CPU reads")
Reviewed By: Bill Kristiansen <billkris@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13819>
Currently lima uses generic TXP lowering that results in downgrading
coords precision to FP16 since we have to do some calculations with
coords instead of loading them directly from varying.
Mali4x0 has native TXP support, however coords and projector have to
come from a single source.
Add NIR lowering pass that combines coords and projector into a single
backend-specific source and use it instead of generic lowering.
Unfortunately this change regresses one test, but it also fails in blob and
disassembly is now identical.
shader-db diff:
total instructions in shared programs: 15623 -> 15603 (-0.13%)
instructions in affected programs: 877 -> 857 (-2.28%)
helped: 7
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 2.86 x̃: 2
helped stats (rel) min: 0.87% max: 10.53% x̄: 4.93% x̃: 1.85%
95% mean confidence interval for instructions value: -4.95 -0.76
95% mean confidence interval for instructions %-change: -9.31% -0.55%
Instructions are helped.
total loops in shared programs: 3 -> 3 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 136 -> 137 (0.74%)
spills in affected programs: 0 -> 1
helped: 0
HURT: 1
total fills in shared programs: 598 -> 602 (0.67%)
fills in affected programs: 0 -> 4
helped: 0
HURT: 1
Tested-by: Denis Pauk <pauk.denis@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13111>
Instead of having a bunch of const vk_sync_type for each permutation of
vk_drm_syncobj capabilities, have a vk_drm_syncobj_get_type helper which
auto-detects features. If a driver can't support a feature for some
reason (i915 got timeline support very late, for instance), they can
always mask off feature bits they don't want.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is, unfortunately, a large flag-day mega-commit. However, any
other approach would likely be fragile and involve a lot more churn as
we try to plumb the new vk_fence and vk_semaphore primitives into ANV's
submit code before we delete it all. Instead, we do it all in one go
and accept the consequences.
While this should be mostly functionally equivalent to the previous
code, there is one potential perf-affecting change. The command buffer
chaining optimization no longer works across VkSubmitInfo structs.
Within a single VkSubmitInfo, we will attempt to chain all the command
buffers together but we no longer try to chain across a VkSubmitInfo
boundary. Hopefully, this isn't a significant perf problem. If it ever
is, we'll have to teach the core runtime code how to combine two or more
VkSubmitInfos into a single vk_queue_submit.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
Get rid of most of the guts of the base class and just leave it as a
vtable. We can also drop some of wsi_display_fence. One functional
change here is that we're now using VK_SYSTEM_ALLOCATION_SCOPE_INSTANCE
which is more correct anyway because, thanks to the funky reference
counting we do with destroyed and event_received, its lifetime is tied
to the physical device, at best.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This adds a new vk_queue_submit object which contains a list of command
buffers as well as wait and signal operations along with a driver hook
which takes a vk_queue and a vk_queue_submit and does the actual submit.
The common code then handles spawning a submit thread if needed, waiting
for timeline points to materialize, dealing with timeline semaphore
emulation via vk_timeline, etc. All the driver sees are vk_queue.submit
calls with fully materialized vk_sync objects which it can wait on
unconditionally.
This implementation takes a page from RADV's book and only ever spawns
the submit thread if it sees a timeline wait on a time point that has
not yet materialized. If this never happens, it calls vk_queue.submit
directly from vkQueueSubmit() and the thread is never spawned.
One other nicety of the new framework is that there is no longer a
distinction, from the driver's PoV, between fences and semaphores. The
fence, if any, is included as just one more signal operation on the
final vk_queue_submit in the batch.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is built on the new vk_sync primitives. In the vk_physical_device,
the driver provides a null-terminated array of vk_sync_type pointers in
priority order. The semaphore implementation then selects the first
type that meets the necessary criterion. In particular, semaphores may
or may not be timelines depending on the VkSemaphoreType. It also
auto-selects the semaphore type based on the external handle types
provided and can down-grade as needed to support a particular external
handle.
The implementation itself is mostly copy+pasted from ANV. The primary
difference is the fact that anv_semaphore_impl has been replaced with
vk_sync. The permanent vk_sync is still embedded (like ANV) but the
temporary one is a pointer. This makes stealing the temporary state as
part of VkQueueSubmit a bit easier.
All of the interesting stuff around waits, signals, etc. is implemented
by the vk_sync interface. All this code does is wrap it all in the
annoyingly detailed VkFence rules so we can provide the correct Vulkan
entrypoint behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This is built on the new vk_sync primitives. In the vk_physical_device,
the driver provides a null-terminated array of vk_sync_type pointers in
priority order. The fence implementation then selects the first type
that meets the necessary criterion. In particular, fences can't be
timelines and need to support reset and CPU wait. It also auto-selects
the fence type based on the external handle types provided and can
down-grade as needed to support a particular external handle.
The implementation itself is mostly copy+pasted from ANV. The primary
difference is the fact that anv_fence_impl has been replaced with
vk_sync. The permanent vk_sync is still embedded (like ANV) but the
temporary one is a pointer. This makes stealing the temporary state as
part of VkQueueSubmit a bit easier.
All of the interesting stuff around waits, resets, etc. is implemented
by the vk_sync interface. All this code does is wrap it all in the
annoyingly detailed VkFence rules so we can provide the correct Vulkan
entrypoint behavior.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>
This doesn't map directly to any particular Vulkan object but is,
instead, a base class for the internal implementations of both VkFence
and VkSemaphore. Its utility will become evident in later patches.
The design of vk_sync will look familiar to anyone with significant
experience in DRM. The base object itself is just a pointer to a vfunc
table with function pointers providing the implementation of the various
operations. Depending on how the vk_sync will be used some of of those
vfuncs are optional. If it's only going to be used for VkSemaphore, for
instance, there's no need for reset().
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13427>