On chordal graphs, a greedy coloring can be done in a way that never uses
more colors than are required for the largest clique. However, since we
have vector values and force phi resources into the same spill slots, the
interference graphs are not chordal, and thus, this assumption doesn't hold.
Use twice as many spill slots as the upper bound.
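For intuition, here is a tiny standalone illustration (not ACO code): the
5-cycle is the smallest non-chordal graph, its largest clique has only two
vertices, yet three colors are needed, so the clique-based bound already
fails there.

  /* Illustration only, not ACO code: greedy coloring of a 5-cycle uses
   * 3 colors even though the largest clique has size 2. */
  #include <stdbool.h>
  #include <stdio.h>

  #define N 5

  int main(void)
  {
     /* Adjacency of the cycle 0-1-2-3-4-0. */
     const bool adj[N][N] = {
        {0, 1, 0, 0, 1},
        {1, 0, 1, 0, 0},
        {0, 1, 0, 1, 0},
        {0, 0, 1, 0, 1},
        {1, 0, 0, 1, 0},
     };
     int color[N];
     int max_color = -1;

     for (int v = 0; v < N; v++) {
        /* Pick the smallest color not used by an already-colored neighbor. */
        bool used[N] = {false};
        for (int u = 0; u < v; u++) {
           if (adj[v][u])
              used[color[u]] = true;
        }
        int c = 0;
        while (used[c])
           c++;
        color[v] = c;
        if (c > max_color)
           max_color = c;
     }

     printf("colors used: %d\n", max_color + 1); /* prints 3 */
     return 0;
  }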
Totals from 10 (0.01% of 79242) affected shaders: (GFX11)
MaxWaves: 52 -> 54 (+3.85%)
Instrs: 271386 -> 271779 (+0.14%)
CodeSize: 1362544 -> 1365432 (+0.21%)
VGPRs: 2536 -> 2532 (-0.16%)
SpillVGPRs: 778 -> 818 (+5.14%)
Scratch: 73472 -> 76800 (+4.53%)
Latency: 3331718 -> 3328798 (-0.09%); split: -0.14%, +0.05%
InvThroughput: 1665860 -> 1643350 (-1.35%); split: -1.40%, +0.05%
VClause: 3292 -> 3329 (+1.12%); split: -0.06%, +1.18%
Copies: 46082 -> 46257 (+0.38%)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27011>
We can combine the previously introduced mechanisms with some bespoke
late RSD-patching to implement line-smoothing, forcing the correct
multisampling and rasterization states.
The reason we need to do this patching late is that we need to know the
primitive-type before we can determine this state, and we don't know
that until draw-time. Luckily, we have a convenient spot to do this.
This bespoke patching only happens for gen7 and earlier. Luckily, the
bits here remain the same for all of these gens. On later gens, we can
just emit this normally, without any bespoke patching.
We also need to use a fresh batch when changing the effective smoothing
state, because that's framebuffer-state. This applies to all gens.
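Roughly, the draw-time flow ends up like the sketch below; every name in
it (select_new_batch, patch_rsd_multisample, the context fields) is a
made-up stand-in rather than the real panfrost code, so read it purely as
an illustration of the ordering described above.

  /* Illustration only: all names are hypothetical stand-ins. */
  #include <stdbool.h>

  enum prim_type { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

  struct ctx {
     unsigned gen;
     bool line_smooth_requested;  /* from the rasterizer state */
     bool effective_line_smooth;  /* what we actually program */
  };

  /* Stubs standing in for "start a fresh batch" and "patch the
   * already-emitted RSD words" in the real driver. */
  static void select_new_batch(struct ctx *ctx) { (void)ctx; }
  static void patch_rsd_multisample(struct ctx *ctx, bool smooth)
  { (void)ctx; (void)smooth; }

  static void
  draw_sketch(struct ctx *ctx, enum prim_type prim)
  {
     /* Only at draw-time do we know the primitive type, so only now can
      * we decide whether smoothing is effectively enabled. */
     bool smooth = ctx->line_smooth_requested && prim == PRIM_LINES;

     /* The forced multisampling state is framebuffer-state, so changing
      * it means switching to a fresh batch (all gens). */
     if (smooth != ctx->effective_line_smooth) {
        ctx->effective_line_smooth = smooth;
        select_new_batch(ctx);
     }

     /* On gen7 and earlier the relevant bits live in the already-emitted
      * RSD, so patch them late here; later gens emit the state normally. */
     if (ctx->gen <= 7)
        patch_rsd_multisample(ctx, smooth);
  }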
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10281
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
How we're updating active_prim is currently a bit convoluted, because we
update it from two different places: First, we update it right before
updating the shader-variant in the case of point-sprite changes, and
then later on we update it unconditionally.
But we're about to need to change this logic, because we also need to
deal with line-smooth lowering. So let's clean this up a tad first, by
moving both updates to the same function, and renaming it a bit. This
lets us cleanly update it in one place, and then react to the change.
While we're at it, let's replace the bitwise xor with a logical
not-equals, so that the whole expression uses logical operations.
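As a hypothetical before/after (not the actual panfrost code), that last
point is just:

  #include <stdbool.h>

  enum prim { PRIM_POINTS, PRIM_LINES, PRIM_TRIANGLES };

  /* Before: a bitwise operator buried in otherwise-boolean logic. */
  static bool prim_changed_xor(enum prim old, enum prim new)
  {
     return (old ^ new) != 0;
  }

  /* After: a plain comparison keeps everything logical. */
  static bool prim_changed_neq(enum prim old, enum prim new)
  {
     return old != new;
  }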
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
Unfortunately, the lowering-pass has gained a condition that we don't
really want. So we need to lower the
load_poly_line_smooth_enabled-intrinsic to always return true to get rid
of it again.
It's not great, but line-smoothing isn't the most important feature to
have max performance on, so... meh?
An alternative here could be to make the condition in
nir_lower_poly_line_smooth optional, but that seems kinda hairy
code-wise. Dunno.
We also need to run nir_lower_alu in order to lower away bit_count on
midgard.
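A rough sketch of what the intrinsic lowering can look like, assuming
recent NIR builder helpers; the pass name is invented and details like
nir_def_rewrite_uses vs. the older nir_ssa_def_rewrite_uses differ
between branches, so treat this as illustrative rather than the actual
panfrost pass:

  /* Illustrative sketch: fold load_poly_line_smooth_enabled to true. */
  #include "nir.h"
  #include "nir_builder.h"

  static bool
  lower_line_smooth_enabled(nir_builder *b, nir_instr *instr, void *data)
  {
     if (instr->type != nir_instr_type_intrinsic)
        return false;

     nir_intrinsic_instr *intr = nir_instr_as_intrinsic(instr);
     if (intr->intrinsic != nir_intrinsic_load_poly_line_smooth_enabled)
        return false;

     /* Replace the load with a constant true and drop the intrinsic, so
      * the unwanted condition folds away. */
     b->cursor = nir_before_instr(instr);
     nir_def_rewrite_uses(&intr->def, nir_imm_true(b));
     nir_instr_remove(instr);
     return true;
  }

  bool
  pan_lower_line_smooth_enabled(nir_shader *shader)
  {
     return nir_shader_instructions_pass(shader, lower_line_smooth_enabled,
                                         nir_metadata_block_index |
                                         nir_metadata_dominance, NULL);
  }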
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
This implements a D3D11-style forcing of the sample-counts, which will
be used by the line-smoothing code we're about to add.
Even though we don't actually use the single-sample mode, I left it in
for completeness, as it's documented in the TRM.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26904>
When a device loss occurred, we were not ensuring that
vkGetQueryPoolResults with VK_QUERY_RESULT_WAIT_BIT would return.
The caller could get stuck in our while(...) loop polling the query
atomics forever, which violates the finite-time guarantee of this
function.
This solves that by adding a timeout to this call, set to double the
default TDR timeout for the potential queues relating to this query.
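Conceptually the bounded wait looks something like this sketch; apart
from os_time_get_nano() from Mesa's util, the names here
(tdr_timeout_ns, the availability word) are hypothetical:

  #include <stdint.h>
  #include <vulkan/vulkan_core.h>
  #include "util/os_time.h" /* os_time_get_nano() */

  struct device_sketch {
     int64_t tdr_timeout_ns; /* hypothetical: TDR timeout of the queues */
  };

  static VkResult
  wait_for_query_sketch(struct device_sketch *dev,
                        const volatile uint64_t *avail)
  {
     /* Bound the poll at twice the TDR timeout of the queues that could
      * write this query, so a lost device can't stall us forever. */
     const int64_t abs_timeout = os_time_get_nano() +
                                 2 * dev->tdr_timeout_ns;

     while (*avail == 0) {
        /* With WAIT_BIT we may only return VK_SUCCESS or
         * VK_ERROR_DEVICE_LOST; hitting the bound is treated as loss. */
        if (os_time_get_nano() >= abs_timeout)
           return VK_ERROR_DEVICE_LOST;
     }
     return VK_SUCCESS;
  }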
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
Following discussion on the kernel mailing list [1], we are not gaining
anything from this right now, and it does not handle soft recovery.
We will hear about the context loss and rationale when we vkQueueSubmit
next.
We can come back to this if there is ever a Vulkan extension for
figuring out innocent vs. guilty contexts, like GL_EXT_robustness.
This does mean, however, that we return VK_SUCCESS for cancelled semaphore
and fence waits, but this is legal per the Vulkan spec:
"Commands that wait indefinitely for device execution (namely
vkDeviceWaitIdle, vkQueueWaitIdle, vkWaitForFences with a maximum
timeout, and vkGetQueryPoolResults with the VK_QUERY_RESULT_WAIT_BIT
bit set in flags) must return in finite time even in the case of a lost
device, and return either VK_SUCCESS or VK_ERROR_DEVICE_LOST."
"If device loss occurs (see Lost Device) before the timeout has expired,
vkWaitSemaphores must return in finite time with either VK_SUCCESS or
VK_ERROR_DEVICE_LOST."
[1]: https://lists.freedesktop.org/archives/amd-gfx/2024-January/103337.html
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27091>
The Xe KMD reports the SMEM size as half of RAM while i915 returns the
whole RAM size, so to keep things consistent we adjust the values
returned by the i915 KMD.
The free i915 SMEM also needs to be adjusted. Because the KMD uAPIs only
report free memory to applications running with elevated privileges,
this is needed by both KMDs, so it was moved to
intel_device_info_ajust_memory() to be shared by both KMD backends.
sram.mappable.size asserts had to be removed from i915 code paths
because of this adjustment.
anv_compute_sys_heap_size() was dropped in ANV and reduced in HASVK
because the adjustments are now done at the intel/dev level.
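For illustration (simplified stand-in fields, not the real
intel_device_info layout), the shared total-size adjustment boils down to
something like:

  #include <stdint.h>

  struct sram_sketch {
     uint64_t size; /* system memory usable by the GPU */
  };

  static void
  adjust_sram_size_sketch(struct sram_sketch *sram, uint64_t total_ram)
  {
     /* Xe reports SMEM as half of RAM; make the i915-reported total match
      * so both KMD backends hand the same value up to the heaps. */
     sram->size = total_ram / 2;
  }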
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26567>
This improves vkpeak perf by 40-50% on the Ada GPU in my laptop. There
are probably more optimal things we can be doing (Intel's pass is pretty
clever) but this is already an improvement.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27157>
When building the CFG, the instructions are taken off the list in
fs_visitor and added to the lists inside each block. The single
"exec_node" in the instruction is used for those memberships.
When the pass later rebuilt the CFG, that original list had no
instructions, so calculate_cfg() had nothing to work with. For now, fix
the bug by pulling all the instructions back to the original list.
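To make the ownership problem concrete, here is a small self-contained
sketch with simplified stand-ins for exec_node/exec_list and the lists
involved (the real types and the actual fix in the pass differ):

  #include <stdbool.h>

  struct node { struct node *prev, *next; };

  static bool list_is_empty(const struct node *list)
  {
     return list->next == list;
  }

  static void list_move_tail(struct node *list, struct node *n)
  {
     /* Unlink n from whichever list currently owns it... */
     n->prev->next = n->next;
     n->next->prev = n->prev;
     /* ...and append it to the new owner. */
     n->prev = list->prev;
     n->next = list;
     list->prev->next = n;
     list->prev = n;
  }

  /* The fix, in spirit: before rebuilding the CFG, pull every instruction
   * out of the per-block lists back into the flat list so calculate_cfg()
   * has something to walk again. */
  static void
  restore_flat_list(struct node *flat, struct node *block_lists,
                    unsigned num_blocks)
  {
     for (unsigned i = 0; i < num_blocks; i++) {
        while (!list_is_empty(&block_lists[i]))
           list_move_tail(flat, block_lists[i].next);
     }
  }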
We can do better here, but punting until upcoming work on
CFG itself.
Issue found in an unpublished CTS test. A small reproduction is now
enabled in our unit tests.
Fixes: 65237f8bbc ("intel/fs: Don't add MOV instructions to DO blocks in combine constants")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>