A7XX introduces some changes to the CCU, such as having different
amounts of memory per CCU for depth and color, and splitting CCU
control into two registers, A7XX_RB_CCU_CNTL and A7XX_RB_CCU_CNTL2,
where CNTL2 no longer requires a complete flush to be updated. We
currently don't take advantage of this, as any CCU update sets both
registers, but it's a potential optimization we can add in the future.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
The SS6_DIRECT path is broken on a750, so we should use either UBO
lowering or the SS6_INDIRECT path.
It is implemented as an INDIRECT load even on a750+ because with UBO
lowering it would be tricky to get the const offset to use in multidraw;
we would also need to ensure the offset is not 0.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
A750 expects driver params to be loaded through the preamble. The old
path works, but has issues when the same LOAD_STATE is used between
several draw calls (it seems that LOAD_STATE is executed only for
the first draw call).
To solve this we now lower driver params to UBOs and let NIR deal with
them.
Notes:
- VS params are loaded via the old path since the blob does the same
and no issues have been observed.
- FDM is not supported at the moment.
- For now driver params data is emitted via CP_NOP because it's tricky
to allocate space for the data (it is emitted when we are already in
sub_cs).
Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
SP_FS_VGPR_CONFIG was found to be correlated with the blob using
avgs/uvgs. The other SP_*_VGPR_CONFIG registers were undefined per-stage
regs, and it was tested via rddecompiler that they "fix" hangs in the
respective shader stage when that stage uses the following instruction
pattern:
avgs.s.1.tex.0
(ss) avgs.e;
uvgs.s.tex.0;
uvgs.e
The exact meaning of SP_*_VGPR_CONFIG is to be investigated.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
We have to push the lowering of texture operations a bit further in
the pipeline since nir_lower_tex gets invoked twice and, if there is no LOD
source present, nir_lower_tex adds that as a source. Once that's all
done we can easily combine the LOD and array index into a single 32-bit
value.
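
As an illustration of that last step, the sketch below packs a
floating-point LOD and an array index into one 32-bit word. The bit
layout (LOD float bits kept in the upper 23 bits, array index clamped
into the low 9 bits) and the function name are assumptions made for this
example; this message does not spell out the actual layout.

```python
import struct

LOD_MASK = 0xFFFFFE00    # assumed: upper 23 bits carry the LOD float bits
MAX_ARRAY_INDEX = 0x1FF  # assumed: low 9 bits carry the array index


def pack_lod_and_array_index(lod, array_index):
    """Pack an LOD (float) and an array index into one 32-bit value.

    Hypothetical layout, for illustration only.
    """
    # Reinterpret the 32-bit float LOD as an unsigned integer.
    lod_bits = struct.unpack("<I", struct.pack("<f", lod))[0]
    # Round the index to the nearest integer and clamp it to 9 bits.
    index = min(max(int(round(array_index)), 0), MAX_ARRAY_INDEX)
    # Drop the low mantissa bits of the LOD to make room for the index.
    return (lod_bits & LOD_MASK) | index
```

Dropping the low mantissa bits of the LOD costs a little precision, which
is the trade-off of this kind of packing.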
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27458>
On Gfx12.0, CCS allocations have to be allocated per image because the
format of the image goes into the AUX-TT PTEs. The effect on memory
allocations is limited since the main surface granularity in the
AUX-TT PTE is 64KB.
On Gfx12.5, the granularity of the AUX-TT PTE is 1MB. This creates a
lot of waste in the application memory allocations. Fortunately the HW
doesn't care about the format put into the PTEs anymore. So it becomes
possible to have 2 images share the same PTE.
To implement this we bring back an earlier version of AUX-TT mappings
where we used to allocate additional CCS space at the end of the
VkDeviceMemory objects. On Gfx12.5, if the BO has additional CCS
space, we will now map the main surface to that space.
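
A rough sketch of why the larger granularity hurts when every image needs
its own PTE: the main surface has to be padded up to the PTE granularity.
The 100KB image size and the helper names are hypothetical, and the model
assumes padding is the only source of waste.

```python
KB = 1024
MB = 1024 * KB


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def padding_waste(image_size, granularity):
    """Bytes lost to padding when an image cannot share an AUX-TT PTE."""
    return align_up(image_size, granularity) - image_size


image_size = 100 * KB  # hypothetical image size

waste_gfx120 = padding_waste(image_size, 64 * KB)  # 64KB granularity
waste_gfx125 = padding_waste(image_size, 1 * MB)   # 1MB granularity

print(waste_gfx120 // KB, "KB wasted at 64KB granularity")
print(waste_gfx125 // KB, "KB wasted at 1MB granularity")
```

Letting several images share one PTE removes most of this padding, which
is what mapping the main surface to the extra CCS space at the end of the
BO is meant to achieve.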
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26822>
Support should be the same as AMDVLK, except for these formats:
- VK_FORMAT_R4G4_UNORM_PACK8
- VK_FORMAT_A4R4G4B4_UNORM_PACK16_EXT
- VK_FORMAT_A4B4G4R4_UNORM_PACK16_EXT
- VK_FORMAT_A1B5G5R5_UNORM_PACK16_KHR
- VK_FORMAT_A8_UNORM_KHR
- VK_FORMAT_X8_D24_UNORM_PACK32
- VK_FORMAT_D24_UNORM_S8_UINT
And the various emulated compressed formats.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27551>
Instead of making it part of every BO, just reserve a bit of space at
the end of the top buffer as part of setting up our vma_heap. This
reduces our memory allocation by nvk_heap::overalloc per BO and means
that the over-allocation is taken into account when sparse binding heap
BOs in the contiguous case.
Fixes: e162c2e78e ("nvk: Use VM_BIND for contiguous heaps instead of copying")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27565>
When a bitrate or fps change is detected, only update the rate control
parameters instead of completely reinitializing the encode session.
This fixes an issue where, if the application changed bitrate or fps often,
the output bitrate would significantly overshoot the target bitrate in some
cases. In other cases, the output bitrate would be extremely low instead.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27548>
It is possible to free memory backing images before images are
destroyed:
vkFreeMemory:
"Memory can be freed whilst still bound to resources, but those
resources must not be used afterwards."
The spec leaves us the option to keep a reference on the associated
memory and free it only when all the bound resources have been
destroyed. Here we choose to free memory immediately.
One particular test in the CTS
(dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics)
does the following:
imgA = vkCreateImage()
imgB = vkCreateImage()
memA = vkAllocateMemory()
vkBindImageMemory(imgA, memA) # Aux mapping with ref count = 1
vkFreeMemory(memA) # Aux mapping removed, ref count = 0
memB = vkAllocateMemory() # Same address as memA
vkBindImageMemory(imgB, memB)
vkDestroyImage(imgA) # Removes the mapping of imgB-memB
vkQueueSubmit() # hang with pagefault in AUX-TT
The solution implemented in this change is to not do anything AUX-TT
related in vkFreeMemory(). This solution has some consequences,
because a virtual memory address range freed and reallocated cannot be
rebound in the AUX-TT until all the associated resources have released
their AUX-TT mapping (to bring back the AUX-TT refcount of the range
to 0). This should still be better than keeping the memory allocated
through refcounting of the anv_bo.
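
The sequence above can be replayed with a toy model of a refcounted
AUX-TT. Everything here (the AuxTable class, its methods, the run()
helper) is hypothetical and only mirrors the behaviour described in this
message; it is not the driver code. The model assumes a range that still
has references cannot be rebound, and that an image whose bind is refused
simply goes without an AUX-TT mapping.

```python
class AuxTable:
    """Toy AUX-TT: base address -> reference count (illustrative only)."""

    def __init__(self):
        self.refcount = {}

    def add(self, addr):
        # Refuse to rebind a range that still has live references.
        if self.refcount.get(addr, 0) > 0:
            return False
        self.refcount[addr] = 1
        return True

    def release(self, addr):
        # Drop one reference held by an image.
        if self.refcount.get(addr, 0) > 0:
            self.refcount[addr] -= 1

    def drop(self, addr):
        # Remove the mapping regardless of outstanding references.
        self.refcount[addr] = 0

    def is_mapped(self, addr):
        return self.refcount.get(addr, 0) > 0


def run(free_touches_aux_tt):
    """Replay the CTS sequence; return (imgB mapped, imgB mapping dangling)."""
    table = AuxTable()
    addr = 0x100000                    # memA and memB get the same address
    img_a_mapped = table.add(addr)     # vkBindImageMemory(imgA, memA)
    if free_touches_aux_tt:
        table.drop(addr)               # vkFreeMemory(memA), old behaviour
    img_b_mapped = table.add(addr)     # vkBindImageMemory(imgB, memB)
    if img_a_mapped:
        table.release(addr)            # vkDestroyImage(imgA)
    # imgB is in trouble if it believes it has a mapping that is gone.
    dangling = img_b_mapped and not table.is_mapped(addr)
    return img_b_mapped, dangling
```

In this model run(True) leaves imgB with a dangling mapping, matching the
hang, while run(False) leaves imgB without an AUX-TT mapping but also
without a stale one.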
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7b87e1afbc ("anv: track & unbind image aux-tt binding")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10528
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27566>