Commit Graph

190251 Commits

Author SHA1 Message Date
Sviatoslav Peleshko
94989b45a5 anv,driconf: Add fake non device local memory WA for Total War: Warhammer 3
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8721
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29127>
2024-06-07 04:14:10 +00:00
Erik Faye-Lund
df17f2b89a meson: bump test-timeout
This tests usually takes around 15 seconds on CI, according to logs. It
recently timed out under load, causing a job to fail spuriously.

Let's bump the timeout here to 60. That's in line with the glsl compiler
warnings test, which usually takes around the same time.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29547>
2024-06-07 03:38:59 +00:00
Lucas Fryzek
db38a4913e llvmpipe: query winsys support for dmabuf mapping
Fixes #11257 by ensuring winsys mapping functions is only called
if its supported by the winsys, which should prevent llvmpipe from
crashing with kmswast.

If the winsys is kms_swrast then this method will be null, but on
drisw it will be available.

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29546>
2024-06-07 02:42:20 +00:00
Erik Faye-Lund
d0d5fedbab docs: wrap long words instead of overflowing
This fixes rendering on mobile for the 24.1.1 release notes, where we'd
otherwise end up with horizontal scrolling.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29573>
2024-06-07 02:32:48 +00:00
Yonggang Luo
85ff3f525c util: Rename DETECT_OS_UNIX to DETECT_OS_POSIX
Looking at each usage of DETECT_OS_UNIX, it's more about the POSIX API usage, not the
Unix-like OS, so let's rename it

And for POSIX it's a standard to claim which API present, but for UNIX there is no such thing

Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29555>
2024-06-07 01:56:28 +00:00
Eric Engestrom
73cc6c6738 venus/ci: add flake that's been blocking MRs
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29590>
2024-06-07 01:24:26 +00:00
Nanley Chery
de22e20294 anv: Rely more on ISL_SURF_USAGE_DISABLE_AUX_BIT
In order to support CCS, ISL may upgrade a main surface from Tile4 to
Tile64 with miptails disabled. To avoid using this space consuming
layout when not needed, inform ISL as soon as possible that compression
won't be used.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
fc57991b66 anv: Support multiple aspects in anv_formats_ccs_e_compatible
Prevents the next patch from causing the following assert failure:

Test case 'dEQP-VK.ycbcr.copy.g8_b8_r8_3plane_420_unorm.g8_b8_r8_3plane_444_unorm.linear_linear_disjoint'..
deqp-vk: ../../src/intel/vulkan/anv_private.h:4962: anv_aspect_to_plane: Assertion `!(aspect & ~all_aspects)' failed.

We still disable CCS for multiplane formats elsewhere. I've attempted
enabling CCS for those cases but end up with failures in CI that I
cannot reproduce locally. Hopefully this change gets the next person a
step closer towards enabling this feature.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
14a0f7391d anv,hasvk: Drop anv_get_isl_format_with_usage
Since 3beaaa9ae8 ("anv: drop lowered storage images code"), this
function has not used the VkImageUsageFlags parameter. So, we can drop
it and simplify its callers.

This function isn't used in hasvk.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
3e9dc450a6 anv: Rely on the primary surf usage to disable aux
Instead of passing isl_extra_usage_flags to
add_aux_surface_if_supported, use the isl_surf::usage field of the
primary surface to check for ISL_SURF_USAGE_DISABLE_AUX_BIT.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
8e96b516ca intel/isl: Assert alignments of surface addresses
In the import paths in iris, there are several cases where surface VMAs
are created without relying on the calculated surface alignment.
Asserting the alignments of surface addresses, should help catch any
cases where we end up with the wrong alignment.

This found a couple issues during development. One which required a
change to existing code is that when creating uncompressed surfaces from
compressed ones, ISL will sometimes increase the image alignment as a
result of the new format supporting CCS. This patch adds the usage flag
to disable that behavior.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
31560d82ad iris: Simplify bo import in memobj_create_from_handle
Looking at the caller, we only import FDs without modifiers. By
asserting this behavior and dropping the unused cases, we gain some
clarity on the alignment of the imported BO's VMA.

Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Nanley Chery
6b969a4b43 intel/isl: Add and use multi-engine surf usage bits
Add and use two new surf usage bits:

* ISL_SURF_USAGE_MULTI_ENGINE_SEQ_BIT: the surface may be accessed by
  multiple engines, but not in parallel.

* ISL_SURF_USAGE_MULTI_ENGINE_PAR_BIT: the surface may be accessed by
  multiple engines in parallel.

Both usages are not concerned with read-after-read access patterns.

Using these bits allows ISL to conditionally use Tile64 or a 64KB
alignment to account for the gfx12.5 CCS WA from HSD 22015614752. Apart
from the potential space savings, there are three benefits of this
approach:

1) CCS can now be used with miptails (though nothing makes use of this
   today).

2) CCS can now be used with 3D depth/stencil surfaces in GL.

3) CCS can now be used with 3D depth/stencil surfaces in Vulkan when
   apps only use a single queue.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11111
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11117
Tested-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>
2024-06-07 00:58:41 +00:00
Erik Faye-Lund
3053268fd0 mesa/main: updates for EXT_texture_format_BGRA8888
The spec is about to change, so let's prepare for the new and brighter
future.

Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27726>
2024-06-07 00:28:25 +00:00
Eric Engestrom
f81e38e5a9 docs: add sha256sum for 24.0.9
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29588>
2024-06-07 00:21:41 +00:00
Eric Engestrom
15627f9203 docs: update calendar for 24.0.9
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29588>
2024-06-07 00:21:41 +00:00
Eric Engestrom
92a44d3907 docs: add release notes for 24.0.9
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29588>
2024-06-07 00:21:41 +00:00
Nanley Chery
53440554c4 intel/isl: Add and use ISL_MAIN_TO_CCS_SIZE_RATIO_XE
In iris, use the CCS scale down factor to calculate the impact of CCS on
TBIMR tile sizes. Even though we fall back to a seemingly less accurate
method to calculate the impact of CCS, it ends up giving the same
answer, 1bpp. Anv already uses this factor, so this patch replaces the
constant with this macro.

There are two benefits to doing this:

1) Consistency between anv and iris.

2) Preparation for a future where we no longer use ISL surfaces to
   describe CCS on Xe+. In fact, in iris, we already don't create such
   surfaces on ACM.

I considered using INTEL_AUX_MAP_MAIN_SIZE_SCALEDOWN for the calculation
in both drivers, but the naming is aux-map specific and the scaledown
actually exists on flat-ccs platforms as well.

So, we introduce a new macro for all Xe platforms, currently only used
for the specific use case of TBIMR calculations. We can add more such
macros for future platforms, as needed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Nanley Chery
26655a137f intel/aux_map: Add and use INTEL_AUX_MAP_MAIN_SIZE_SCALEDOWN
Introduce a macro so that drivers don't need to rely on the isl_surf
struct to determine the size of the CCS buffer on gfx12.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Nanley Chery
4ae50eaf70 intel/aux_map: Add and use INTEL_AUX_MAP_META_ALIGNMENT_B
Introduce a macro defining the alignment which aux data start addresses
should have. This alignment is for the worst case of the CCS buffer
being included in a dmabuf. Although a smaller alignment is possible for
non-dmabuf cases on TGL, no drivers would make use of that today as they
place CCS surfaces directly after tiled surfaces.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Nanley Chery
e27d951527 intel/aux_map: Add and use INTEL_AUX_MAP_MAIN_PITCH_SCALEDOWN
Introduce a macro so that drivers don't need to rely on the isl_surf
struct to determine the pitch of the CCS buffer on gfx12. This is useful
during layout queries of dmabufs.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Nanley Chery
e9653b5833 anv: Refactor modifier plane layout queries
Before this patch, we special-cased the clear color plane for layout
queries. This was because that plane lacks an ISL surface whereas all
others have one. We plan to drop the ISL surface for CCS buffers on
gfx12 in a future commit. So, in preparation, generalize the clear color
plane code to work for every plane queried on a surface that uses
modifiers.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Nanley Chery
0194290bb5 intel/isl: Add and use ISL_DRM_CC_PLANE_PITCH_B
At the interfaces which query the pitch of the clear color plane in GL
and Vulkan, we've been returning 64B for various reasons. Unify the
rationale under a macro.

The documentation for the macro is picked from anv, which reflects the
most recently synchronized copy of drm_fourcc.h. See the notable changes
at 8cd8f3d697.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28942>
2024-06-06 23:57:52 +00:00
Friedrich Vock
f1742d36f3 radv/rt: Fix memory leak when compiling libraries
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29579>
2024-06-06 21:56:11 +00:00
Daniel Schürmann
c452a4d1cc aco/ra: use round robin register allocation
Totals from 74681 (94.06% of 79395) affected shaders: (GFX11)

MaxWaves: 2265668 -> 2263546 (-0.09%); split: +0.01%, -0.10%
Instrs: 44941647 -> 44412809 (-1.18%); split: -1.23%, +0.05%
CodeSize: 234173852 -> 232009132 (-0.92%); split: -0.97%, +0.05%
VGPRs: 3033208 -> 3403000 (+12.19%); split: -0.02%, +12.22%
Latency: 305575738 -> 301100302 (-1.46%); split: -1.70%, +0.23%
InvThroughput: 49366070 -> 49020000 (-0.70%); split: -0.91%, +0.21%
VClause: 875748 -> 854930 (-2.38%); split: -2.65%, +0.27%
SClause: 1369614 -> 1327212 (-3.10%); split: -3.43%, +0.33%
Copies: 2887932 -> 2883061 (-0.17%); split: -1.93%, +1.76%
Branches: 885041 -> 885101 (+0.01%); split: -0.01%, +0.02%
VALU: 25218078 -> 25215170 (-0.01%); split: -0.20%, +0.19%
SALU: 4328640 -> 4326052 (-0.06%); split: -0.20%, +0.14%
VOPD: 9129 -> 9611 (+5.28%); split: +7.48%, -2.20%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
197943ae27 aco/ra: change heuristic to first fit
Totals from 73175 (92.17% of 79395) affected shaders: (GFX11)

MaxWaves: 2217690 -> 2217930 (+0.01%); split: +0.02%, -0.01%
Instrs: 44780731 -> 44784895 (+0.01%); split: -0.14%, +0.15%
CodeSize: 233238960 -> 233255604 (+0.01%); split: -0.11%, +0.12%
VGPRs: 3009116 -> 3007684 (-0.05%); split: -0.29%, +0.24%
Latency: 304320163 -> 304286592 (-0.01%); split: -0.31%, +0.30%
InvThroughput: 49121992 -> 49145025 (+0.05%); split: -0.20%, +0.25%
VClause: 872566 -> 873242 (+0.08%); split: -0.25%, +0.33%
SClause: 1359666 -> 1361640 (+0.15%); split: -0.11%, +0.26%
Copies: 2879649 -> 2881646 (+0.07%); split: -1.13%, +1.20%
Branches: 887102 -> 887093 (-0.00%); split: -0.01%, +0.01%
VALU: 25128240 -> 25128572 (+0.00%); split: -0.12%, +0.12%
SALU: 4328852 -> 4330559 (+0.04%); split: -0.07%, +0.11%
VOPD: 8861 -> 8992 (+1.48%); split: +2.63%, -1.15%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
d76fc005b6 aco/ra: re-use registers from killed operands
Totals from 77283 (97.34% of 79395) affected shaders: (GFX11)

MaxWaves: 2348498 -> 2348250 (-0.01%); split: +0.01%, -0.02%
Instrs: 45304558 -> 45097367 (-0.46%); split: -0.57%, +0.11%
CodeSize: 235719656 -> 234957768 (-0.32%); split: -0.43%, +0.11%
VGPRs: 3065984 -> 3073244 (+0.24%); split: -0.41%, +0.65%
Latency: 308010576 -> 307008565 (-0.33%); split: -0.85%, +0.52%
InvThroughput: 49560307 -> 49464214 (-0.19%); split: -0.54%, +0.34%
VClause: 881895 -> 879739 (-0.24%); split: -0.78%, +0.53%
SClause: 1388139 -> 1374634 (-0.97%); split: -1.12%, +0.14%
Copies: 2918583 -> 2910434 (-0.28%); split: -1.92%, +1.64%
Branches: 893947 -> 893712 (-0.03%); split: -0.06%, +0.03%
VALU: 25260728 -> 25256766 (-0.02%); split: -0.20%, +0.19%
SALU: 4377750 -> 4373595 (-0.09%); split: -0.17%, +0.07%
VOPD: 8603 -> 9163 (+6.51%); split: +8.54%, -2.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
b054cfe704 aco/ra: move can_write_m0() check into get_reg_specified()
This way, affinities are also covered.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
8e817cf52b aco/ra: refactor get_reg_simple() with increased stride.
This should avoid some redundant calls.

Totals from 153 (0.19% of 79395) affected shaders: (GFX11)

Instrs: 301717 -> 301687 (-0.01%); split: -0.06%, +0.05%
CodeSize: 1583080 -> 1582988 (-0.01%); split: -0.06%, +0.05%
VGPRs: 10068 -> 10348 (+2.78%)
Latency: 6685446 -> 6685475 (+0.00%); split: -0.11%, +0.11%
InvThroughput: 999241 -> 999316 (+0.01%); split: -0.01%, +0.02%
VClause: 3868 -> 3870 (+0.05%)
Copies: 23752 -> 23769 (+0.07%); split: -0.27%, +0.34%
Branches: 6479 -> 6480 (+0.02%)
VALU: 179290 -> 179307 (+0.01%); split: -0.04%, +0.04%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
1b0edf3f33 aco/ra: Fix array access when finding register for subdword variables
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:15 +00:00
Daniel Schürmann
5326e033ff aco/ra: fix handling of killed operands in compact_relocate_vars()
Found by inspection.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29235>
2024-06-06 21:02:14 +00:00
Samuel Pitoiset
afa2070c99 radv: initialize compute preambles with the common helper
The PM4 mechanism can emit paired packets on GFX11+ when possible.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29452>
2024-06-06 20:26:47 +00:00
Samuel Pitoiset
3c8b48e310 ac,radeonsi: add a function to initialize compute preambles
Preambles are very similar between RADV and RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29452>
2024-06-06 20:26:47 +00:00
Samuel Pitoiset
428601095c ac,radeonsi import PM4 state from RadeonSI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29452>
2024-06-06 20:26:47 +00:00
Lionel Landwerlin
62c52fb59d anv: expose VK_MESA_image_alignment_control
Our implementation is a no-op for the following reasons :

  - ISL always tries to go for the smallest tiling mode (see
    isl_surf_choose_tiling())

  - In the few cases where we need to use Tile64 for compression
    workarounds, VK_MESA_image_alignment_control doesn't require use
    to disable compression

  - vkd3d-proton has the ability to disable compression using
    VK_EXT_image_compression_control, disabling Tile64 requirements
    and ensuring ISL can select a 4k tiling mode

So vkd3d-proton should always be able to get a 4k tiling mode if it
wants to.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29175>
2024-06-06 19:00:47 +00:00
Eric Engestrom
3e7a82968d nvk+zink/ci: add another flake seen in nightly
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29574>
2024-06-06 18:49:11 +00:00
Samuel Pitoiset
15fe733703 radv: add a helper to get image VA
Similar to buffer, and less error prone.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29428>
2024-06-06 18:21:33 +00:00
Rhys Perry
4cfb7a0c17 aco: remove support for sub-dword push constants
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29480>
2024-06-06 17:52:05 +00:00
Rhys Perry
e21312018e ac/llvm: remove support for sub-dword push constants
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29480>
2024-06-06 17:52:05 +00:00
Rhys Perry
41c5f71343 radv: lower sub-dword push constants
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29480>
2024-06-06 17:52:05 +00:00
Rhys Perry
69b7fcd775 ac/nir: support lowering of sub-dword push constants
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29480>
2024-06-06 17:52:04 +00:00
Yusuf Khan
e7a2127f0e aux/draw: Use the draw info we get passed in instead of our own
Signed-off-by: Yusuf Khan <yusisamerican@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28641>
2024-06-06 17:00:18 +00:00
Yusuf Khan
377600b9df nv50/vbo: wrap draw_vbo to avoid ovehead from multidraw
Same as the nvc0 patch pretty much, similar improvement.

Signed-off-by: Yusuf Khan <yusisamerican@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
---
v2: remove tmp_info as per Karol Herbst suggestion
v3: nv50_draw_vbo -> nv50_draw_single_vbo per Karol's suggestion
v4: mutex assertion and remove num_draws

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28641>
2024-06-06 17:00:18 +00:00
Yusuf Khan
225f2aac96 nvc0/vbo: wrap draw_vbo for multidraw performance
This patch is to avoid the high overhead that exists when trying to
kick ever single draw during multidraw.

glMultiDrawArrays performance profiling:

342.5 thousand draws/second -> 40 million draws/second

Special thanks to Arthur Huillet for helping getting this profiled
in irc.

Signed-off-by: Yusuf Khan <yusisamerican@gmail.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
---
v2: fix typos pointed out by Arthur
v3: nvc0_draw_vbo -> nvc0_draw_single_vbo, intialize count
v4: remove num_draws from wrapped function and add mutex assert

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28641>
2024-06-06 17:00:18 +00:00
Georg Lehmann
3fb1a64918 aco: move s_add_u32 -> s_addk_i32 optimization fully to ra
Having this in one place is better.
When I wrote the old I wasn't aware that checking the kill flag on definitions
is the same as checking zero uses.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00
Georg Lehmann
60f3f0fdbb aco/ra: use a switch to check vop2acc instruction support
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00
Georg Lehmann
fdc2fb6835 aco: move literal unswizzle opt to RA
Much simpler.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00
Georg Lehmann
c63c750380 aco/gfx11+: fix inline constants for v_pk_fmac_f16
On newer hardware, the hi operation reads the lo half of the inline constant.
On older hardware, it reads the hi half (zero).
I tested this on Navi31 for gfx11 and Raphael for gfx10.

Foz-DB Navi31:
Totals from 4 (0.01% of 79395) affected shaders:
CodeSize: 36832 -> 36448 (-1.04%)
Latency: 20362 -> 20334 (-0.14%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00
Georg Lehmann
39380d475a aco: add affinities for possible sopk optimizations
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00
Georg Lehmann
fac475bc25 aco: rework how affinities for acc operands are determined
Improve accuracy by adding a helper that's also used by
the optimization function.

Foz-DB Navi31:
Totals from 50 (0.06% of 79206) affected shaders:
CodeSize: 126148 -> 126128 (-0.02%); split: -0.05%, +0.04%
Latency: 334049 -> 334060 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 59203 -> 59205 (+0.00%)
Copies: 2011 -> 1998 (-0.65%); split: -0.75%, +0.10%
VALU: 14221 -> 14208 (-0.09%); split: -0.11%, +0.01%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29512>
2024-06-06 16:28:23 +00:00