third_party_mesa3d

Author	SHA1	Message	Date
Paulo Zanoni	0e38b794e2	intel: fix compute SLM sizes on Xe2 and newer Before the patch, intel_device_info_get_max_preferred_slm_size() returns values in kilobytes, but then intel_device_info_get_max_slm_size() is multiplying it by 1024. As a result, LNL is reporting maxComputeSharedMemorySize to be 134217728, which is 128mb. Fix this by making intel_device_info_get_max_slm_size() not multiply it by 1024. This should fix at least the following dEQP tests: dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64 Some tests were failing with: deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed. while other tests were triggering the OOM. v2: - Make everybody return sizes in bytes (José). v3: - Rename variable to bytes (José, Jordan). Fixes: `fd368f5521` ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>	2024-08-07 16:14:02 +00:00
Sil Vilerino	a0f1a708c4	Revert "d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported" This reverts commit `d6bb4ddc63`. Fixes: `d6bb4ddc63` ("d3d12: Video Encode - Remove PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE as not supported") PIPE_VIDEO_PROFILE_MPEG4_AVC_BASELINE is necessary for some scenarios like the example below described in https://github.com/microsoft/WSL/issues/11838 gst-launch-1.0 -v videotestsrc num-buffers=250 ! video/x-raw,width=1920,height=1200 ! vaapipostproc ! vaapih264enc ! filesink location=~/wsl_test.h264 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30548>	2024-08-07 15:51:19 +00:00
Nanley Chery	54631ebc68	anv: Batch MCS and CCS aux-op flushes The PRMs suggest that certain classes of auxiliary surface operations will automatically synchronize when performed back-to-back: Any transition from any value in {Clear, Render, Resolve} to a different value in {Clear, Render, Resolve} requires end of pipe synchronization. Make use of this functionality by batching CCS and MCS flushes when compatible auxiliary surface operations are performed within a command buffer. Ref: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11325 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	f854161928	anv,iris: Use WriteImmediate instead of Z flush for WA According to the HSD, this is an alternative option for Wa_14016712196. Taking this option allows us to combine this workaround with a couple other depth workarounds. Make sure to execute these workarounds before the workaround for the depth register mode, so that the stalling flush is not impacted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	db6ae41c65	intel/blorp: Use WA helpers for depth pipecontrol Instead of unconditionally emitting a pipe control on gfx11+, use the workaround helpers for workarounds 1408224581 and 14014097488. Also, add a check for workaround 14016712196, which is also impacted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	77e4f9690d	anv: Drop flush from unused depth workaround This flush was introduced with the following commits: `8949d27bb8` ("anv: implement gen9 post sync pipe control workaround") `bcb611361b` ("anv: implement gen12 post sync pipe control workaround") The flush became unused with the following commit: `e79e1ca304` ("intel: Drop Tigerlake revision 0 workarounds") This prevents some extra pipecontrols caused by a following patch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Zan Dobersek	f58e1ef7ec	tu: enable shaderInt8 support Enable the shaderInt8 Vulkan feature for Turnip. As final necessary changes, an assert for nir_op_imul is tweaked to also allow 8-bit multiplication, and nir_op_bcsel's conversion of the conditional value from 8 to 32 bits is applied through masking, like in the general conversion case. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10675 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	e30c329026	ir3: improve validation, display for ldp instructions During validation, an ldp instruction should have all its three source registers validated. For display, the half-type register name should be displayed when applicable. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	55ac28954e	ir3: indicate possible dword straddle for any multi-component pvtmem access When filling out ir3_info, any multi-component stp or ldp instruction should indicate possible straddle across dword boundaries. This indirectly prevents setting the PERWAVEMEMLAYOUT flag on the SP_VS_PVT_MEM_SIZE register, enabling proper functioning of three-component 8-bit accesses with natural alignment. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	9e0b77d5c3	ir3: use fully-functional dp4acc when available a750 improves dp4acc to have support for all dot product variants. The main difference with dp4acc of previous generations is that the signedness and packed instruction fields have to be instead interpreted as signedness of either side of the dot product. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	8aa2cad5df	ir3: lower relevant 8-bit ALU ops in nir_lower_bit_size The nir_lower_bit_size pass is used to properly adapt specific 8-bit ALU operations for correct behavior. In those cases inputs are converted to 16 bits and the result is converted back down to 8 bits. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	7fd5f76393	nir/lower_vars_to_scratch: calculate threshold-limited variable size separately ir3's lowering of variables to scratch memory has to treat 8-bit values as 16-bit ones when comparing such value's size against the given threshold since those values are handled through 16-bit half-registers. But those values can still use natural 8-bit size and alignment for storing inside scratch memory. nir_lower_vars_to_scratch now accepts two size-and-alignment functions, one used for calculating the variable size and the other for calculating the size and alignment needed for storing inside scratch memory. Non-ir3 uses of this pass can just duplicate the currently-used function. ir3 provides a separate variable-size function that special-cases 8-bit types. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Zan Dobersek	f8602612ed	ir3: some 8-bit subgroup intrinsics must execute as 16-bit instructions ir3 8-bit quad-broadcast, quad-swap, scan and reduce instructions only work correctly when done in 16-bit space. A nir_lower_bit_size pass is used to upcast the source value and then downcast the result back to 8 bits. Signed-off-by: Zan Dobersek <zdobersek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29875>	2024-08-07 14:32:28 +00:00
Danylo Piliaiev	8b7beca572	tu: Enable UBWC for D24S8 with USAGE_SAMPLED and formatless border color DXVK and VKD3D-Proton use customBorderColorWithoutFormat and have most of D24S8 images with USAGE_SAMPLED, in such case we disable UBWC for correctness. However, games don't use border color for depth-stencil images. So we elect to ignore this edge case and force UBWC to be enabled. See also https://github.com/doitsujin/dxvk/issues/4191 Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30545>	2024-08-07 13:51:20 +00:00
Karol Herbst	012323a1d1	rusticl/image: properly sync mappings content for 1Dbuffer images This fixes clFillImage 1Dbuffer use_pitches CL CTS tests. Fixes: `7b22bc617b` ("rusticl/memory: complete rework on how mapping is implemented") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>	2024-08-07 13:38:06 +00:00
Karol Herbst	2484331e82	rusticl/image: take pitches into account when allocating memory for maps This is more correct than the previous code and the CL CTS relies on edge case behavior here, e.g. for 1Dbuffer images. I think part of that is not actually required by the spec, but whatever. Fixes: `7b22bc617b` ("rusticl/memory: complete rework on how mapping is implemented") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>	2024-08-07 13:38:06 +00:00
Karol Herbst	1fa288b224	rusticl/memory: Fix memory unmaps after rework An application could map and unmap a host ptr allocation multiple times, but because of how the refcounting works, we might never end up syncing the written data to the mapped region. This moves the refcounting out of the event processing. Fixes: `7b22bc617b` ("rusticl/memory: complete rework on how mapping is implemented") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30528>	2024-08-07 13:38:05 +00:00
Eric Engestrom	b6d8459e3a	ci: pass MESA_SPIRV_LOG_LEVEL from job to the test Fixes: `4b8735cd4e` ("ci: raise the log level threshold of spirv logs") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30546>	2024-08-07 11:43:25 +02:00
Mike Blumenkrantz	ef88af8467	dril: always take the egl init path using EGL_DEFAULT_DISPLAY will cover the swrast case, which fixes generating all the correct configs Fixes: `ec7afd2c24` ("dril: rework config creation") Reviewed-by: Eric Engestrom <eric@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30426>	2024-08-07 08:54:40 +00:00
Iago Toral Quiroga	086ed1e54b	broadcom/compiler: emit instructions producing flags earlier We usually emit flags right before consuming them but this is suboptimal from the point of view of register pressure: if an instruction is only used to generate flags then waiting to emit it right before reading the flags extends the liveness of the sources used to generate the flags for no gain. This pass will check for such instructions and try to move them as early as possible. Shader-db results below show this is effective to reduce register pressure, allowing a few shaders to increase thread counts and/or reduce spilling: total instructions in shared programs: 11057173 -> 11057076 (<.01%) instructions in affected programs: 1955543 -> 1955446 (<.01%) helped: 4214 HURT: 3905 Inconclusive result (value mean confidence interval includes 0). total threads in shared programs: 425096 -> 425170 (0.02%) threads in affected programs: 74 -> 148 (100.00%) helped: 37 HURT: 0 Threads are helped. total uniforms in shared programs: 3846275 -> 3845674 (-0.02%) uniforms in affected programs: 23574 -> 22973 (-2.55%) helped: 217 HURT: 30 Uniforms are helped. total max-temps in shared programs: 2222910 -> 2220488 (-0.11%) max-temps in affected programs: 61904 -> 59482 (-3.91%) helped: 2145 HURT: 113 Max-temps are helped. total spills in shared programs: 4294 -> 4280 (-0.33%) spills in affected programs: 148 -> 134 (-9.46%) helped: 8 HURT: 0 total fills in shared programs: 6497 -> 6468 (-0.45%) fills in affected programs: 291 -> 262 (-9.97%) helped: 8 HURT: 0 total sfu-stalls in shared programs: 14344 -> 14611 (1.86%) sfu-stalls in affected programs: 1308 -> 1575 (20.41%) helped: 217 HURT: 335 Inconclusive result (%-change mean confidence interval includes 0). total inst-and-stalls in shared programs: 11071517 -> 11071687 (<.01%) inst-and-stalls in affected programs: 1946767 -> 1946937 (<.01%) helped: 4191 HURT: 3909 Inconclusive result (value mean confidence interval includes 0). total nops in shared programs: 270628 -> 269829 (-0.30%) nops in affected programs: 22032 -> 21233 (-3.63%) helped: 1213 HURT: 571 Inconclusive result (%-change mean confidence interval includes 0). Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30511>	2024-08-07 09:28:39 +02:00
Georg Lehmann	d9849ac466	aco: test xor swap16 path Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30515>	2024-08-06 20:40:12 +00:00
Georg Lehmann	e0818cb87b	aco/gfx11+: don't use VOP3 v_swap_b16 v_swap_b16 is not officially supported as VOP3, so it can't be used with v128-255. Tests show that VOP3 appears to work correctly, but according to AMD that should not be relied on. https://github.com/llvm/llvm-project/pull/100442#discussion_r1703929676 Foz-DB Navi31: Totals from 6 (0.01% of 79395) affected shaders: Instrs: 64799 -> 65932 (+1.75%) CodeSize: 360180 -> 368440 (+2.29%) Latency: 1364648 -> 1365922 (+0.09%) InvThroughput: 635843 -> 636475 (+0.10%) Copies: 14766 -> 15698 (+6.31%) VALU: 38743 -> 39675 (+2.41%) Fixes: `80b8bbf0c5` ("aco/gfx11: use v_swap_b16") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30515>	2024-08-06 20:40:12 +00:00
Alyssa Rosenzweig	796b3ab23d	nir/opt_peephole_select: allow speculatable load constant this is useful on AGX when soft fault is enabled. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30501>	2024-08-06 20:01:37 +00:00
Aditya Swarup	ae85f59645	anv: Disable fast clear when surface height is 16k As suggested in WA_16021232440: Disable fast clear when surface height equals 16k. Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29182>	2024-08-06 19:14:04 +00:00
Aditya Swarup	0f821c1e2f	iris: Disable fast clear when surface height is 16k If surface height during fast clear is 16k, as per bspec the height programmed should be "value - 1" i.e. 0x3FFF. However, HW adds "1" to it but ignores overflow bit[14]. HW performs OOB check based on bit[13:0] which is 0 and drops failed transactions. This patch passes the following failing test on LNL: "PIGLIT_PLATFORM=gbm PIGLIT_DEFAULT_SIZE=16384x16384 shader_runner fast-slow-clear-interaction.shader_test -auto -fbo" Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29182>	2024-08-06 19:14:04 +00:00
Lionel Landwerlin	6145798022	intel/mi_builder: enable control flow API on Gfx9+ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:19 +00:00
Lionel Landwerlin	8cc492cb26	genxml: unify some bits between Gfx8/Gfx11/Gfx12.5 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	343e569ab7	anv: ensure max_plane_count is at least 1 This simplifies a bunch of checks throughout the driver. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	4f093b2e2b	anv: add missing MEDIA_STATE_FLUSH for internal shaders Replicating what we do in genX_cmd_compute.c Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7ca5c84804` ("anv: add support for simple internal compute shaders") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	0bd96e868c	intel-clc: missing printf lowering Useful for printf() debugging in our opencl shader snippets. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	398e6cf38b	anv: reuse cs_prog_data pointer Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	f4a812a229	anv: remove some unused includes Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	cde72181b7	anv: prevent asserts with debug printf in internal shaders Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Kenneth Graunke	32cce2f397	intel/brw: Set appropriate types for 16-bit sampler trailing components 16-bit SIMD8 sampler writeback messages come with a bit of padding in them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into the padding-free format expected by NIR. Additionally, we may reduce the response length on the sampler messages based on which components of the (always vec4) NIR destination are actually in use. When we do that, dest_size > read_size, and the trailing components are all empty BAD_FILE registers, indicating the contents are undefined. Unfortunately, we can't ignore those trailing components entirely. In the past, we left them default-initialized, giving us a BAD_FILE register with UD type (which didn't matter, since all sampler returns were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD. For example, writing RGB and skipping A (without sparse) would produce read_size = 3 and dest_size = 4 and nir_dest[5] containing: nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud> We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see 3 HF's and a UD, and try to copy the full 32-bit value at the end, instead of 16-bits of pad like we intended. This meant it would overflow the destination register's size, triggering validation errors. Thanks to Ian Romanick for noticing this, writing a test, and also coming up with a nearly identical fix. Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617 References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>	2024-08-06 17:26:05 +00:00
Tatsuyuki Ishi	947a333ec3	util/u_queue: Replace relative time wait hack with u_cnd_monotonic Remove the gross hack. The hack was broken too, because it incorrectly added abs_time (a timestamp) to the now (another timestamp). Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30491>	2024-08-06 16:37:59 +00:00
Alyssa Rosenzweig	c40c723336	agx: use opt_uniform_atomics Apple does something similar. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:18 -04:00
Alyssa Rosenzweig	39e7d06eea	agx: add some SRs the subgroup one seen in metal uniform atomic code, the quad one is by symmetry. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:18 -04:00
Alyssa Rosenzweig	340831dbcc	nir/divergence_analysis: handle AGX stuff bunch of vendor intrinsics, plus some standard intrinsics used in weird shader stages. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:18 -04:00
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Alyssa Rosenzweig	fbbdc965aa	asahi: don't count helper invs in pipeline stats query Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:04 -04:00
Alyssa Rosenzweig	75d07cc3d0	agx: fix ballot extend packing hit with uniform atomic ops with tessellation. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:03 -04:00
Rhys Perry	810808b778	nir/opt_uniform_atomics: require block index metadata is_atomic_already_optimized() uses this. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30518>	2024-08-06 15:04:21 +00:00
Rhys Perry	373851e7ee	docs: update ACO_DEBUG documentation for perfwarn Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `cc404d45ff` ("aco: remove perfwarn") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30519>	2024-08-06 14:58:44 +00:00
Rhys Perry	e45035c83a	docs: update ACO_DEBUG documentation for scheduler options Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Fixes: `48461c0d9e` ("aco: enable VOPD scheduler") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30519>	2024-08-06 14:58:44 +00:00
David Rosca	0c024bbe64	radeonsi/vcn: Add decode DPB buffers as CS dependency This is needed to ensure correct synchronization in kernel eg. when it moves the buffers between VRAM and GTT. Backport-to: 24.2 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3437 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11624 Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30510>	2024-08-06 14:09:50 +00:00
Surafel Assefa	979dc41558	vulkan: MESA_VK_ENABLE_SUBMIT_THREAD=0 disables threaded submit Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30492>	2024-08-06 13:19:40 +00:00
Iago Toral Quiroga	d58f7a24d1	v3d: do not expose EXT_float_blend This extension is all about exposing blending with 32-bit floating point formats, which V3D doesn't support at all so we should not be exposing it. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30512>	2024-08-06 10:33:10 +00:00
Alvin Wong	0413e1f7dc	hasvk: Conditionally expose VK_KHR_present_wait Gate it behind driconf query for now. Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no> Acked-by: Hans-Kristian Arntzen <post@arntzen-software.no> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30480>	2024-08-06 11:39:38 +08:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
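
The unit mix-up fixed in 0e38b794e2 can be sketched in a few lines of C. The reported value 134217728 equals 128 × 1024 × 1024, which is consistent with a 128 KiB limit being scaled by 1024 one time too many. The helper names below are hypothetical, and the 128 KiB figure is an assumption inferred from that arithmetic; this is an illustration of the bug class, not the code from the merge request.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the intended limit is 128 KiB (134217728 / 1024 / 1024). */
#define SLM_LIMIT_KIB 128u

/* Hypothetical helper: converts a KiB count to bytes exactly once. */
static uint32_t slm_kib_to_bytes(uint32_t kib)
{
    return kib * 1024u;
}

/* Hypothetical buggy caller: receives a byte count, treats it as KiB,
 * and scales it by 1024 a second time. */
static uint32_t slm_max_size_buggy(void)
{
    uint32_t already_bytes = slm_kib_to_bytes(SLM_LIMIT_KIB);
    return already_bytes * 1024u;
}

/* Hypothetical fixed caller: every helper agrees on bytes, so the
 * value is passed through unchanged. */
static uint32_t slm_max_size_fixed(void)
{
    uint32_t bytes = slm_kib_to_bytes(SLM_LIMIT_KIB);
    return bytes;
}
```

Under these assumptions the fixed path yields 131072 bytes (128 KiB), while the double-scaled path yields 134217728 bytes (128 MiB), the value the commit message reports. Naming the variable `bytes`, as the v3 note in the commit message suggests, makes the unit visible at each call site.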

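The height wrap-around described in 0f821c1e2f can be modeled with plain integer arithmetic. The sketch below follows only the commit message's description (the field is programmed as "value - 1", the hardware adds 1 and keeps bits [13:0]); the function names are hypothetical and the model is not taken from the driver or the hardware documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value written to the height field: "value - 1" per the commit message.
 * Only meaningful for height >= 1. */
static uint32_t programmed_height(uint32_t height)
{
    return height - 1u;
}

/* Model of the described hardware behavior: add 1 back, but drop the
 * overflow bit [14] and keep only bits [13:0]. */
static uint32_t hw_effective_height(uint32_t programmed)
{
    return (programmed + 1u) & 0x3FFFu;
}

/* Hypothetical predicate: true when the modeled hardware would see a
 * height of 0 and therefore fail its out-of-bounds check. */
static bool height_hits_overflow(uint32_t height)
{
    return hw_effective_height(programmed_height(height)) == 0u;
}
```

In this model a height of 16384 is programmed as 0x3FFF and wraps to 0 after the increment, while smaller heights such as 8192 survive intact, which matches the commit's choice to disable fast clear only at 16k.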

193113 Commits