It's split into ac_nir_lower_ps_early ac_nir_lower_ps_late.
ac_nir_lower_ps_early doesn't generate any AMD specific intrinsics except
some system values and is mainly an optimization pass with some lowering.
The new change here is that it also eliminates output components not needed
by spi_shader_col_format.
ac_nir_lower_ps_late lowers output stores to exports and does the bc_optimize
thing.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
Add support for getting time elapsed values via glBeginQuery/glEndQuery.
When recording query start & end time, we ensure that all pending jobs have
been completed by using v3d cpu_queue & the multisync extension.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
Add support for getting timestamp values via
glGet(GL_TIMESTAMP) and glQueryCounter(GL_TIMESTAMP). For the case of
glQueryCounter, we make use of v3d cpu jobs via
DRM_IOCTL_V3D_SUBMIT_CPU and DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32547>
The spec says
vkCmdCopyQueryPoolResults is considered to be a transfer operation,
and its writes to buffer memory must be synchronized using
VK_PIPELINE_STAGE_TRANSFER_BIT and VK_ACCESS_TRANSFER_WRITE_BIT before
using the results.
While STORE_MULTIPLE is not exactly VK_PIPELINE_STAGE_TRANSFER_BIT /
VK_ACCESS_TRANSFER_WRITE_BIT, we can still rely on user barriers to do
the right thing (e.g., flush caches for host access).
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32697>
On gfx12+, the pre-amble and post-amble flushes contain the stalls
necessary to ensure the prior operation is complete. Remove the extra
uses of ANV_PIPE_END_OF_PIPE_SYNC_BIT in post-amble flushes. Also do
this for the pre-amble flushes, but this doesn't have any impact. The
flush application function will implicitly add the bit.
For A750, this improves the TWWH3 trace in the performance CI by 0.52%
(n=2).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31600>
Each reg may store a list of conflict regs. This was handled by
util_dynarray, however each of those hold an extra pointer for
the ra_regs (which serves as mem_ctx for that). Since the usage
here is very simple, we just handle the array growth manually.
The initial size remains the same as before.
The mem_ctx of each ra_reg was being used to identify the case
in which the list wasn't used. Change to use a bool in the
ra_regs struct instead.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Each node stores a list of adjacent nodes. This was handled by
util_dynarray, however each of those hold an extra pointer for
the ra_graph (which serves as mem_ctx for that). Since the usage
here is very simple, we just handle the array growth manually.
For now keep using the same initial size as was being used by dynarray.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25744>
Fast-clears require expensive flushes beforehand and afterwards. The
cost of flushes are decreased in a series of back-to-back fast-clears as
no extra fast-clear flushes are required in between them. If the ratio
of a command buffer's recorded back-to-back fast clears over independent
fast-clears falls below 1/2, prevent that command buffer from recording
any further fast-clears.
Averaging two runs of our Factorio trace on an A750 shows a +14.37%
improvement in FPS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32984>
RBBM_PRIMCTR registers are used for different pipeline statistics that can
be queried, but current usage was wrong in some cases. Comments in the
register file are updated, and the per-statistic register index mapping is
updated accordingly.
Fixes on a750:
test_query_pipeline_statistics in vkd3d-proton
arb_query_buffer_object failures in piglit (zink)
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32900>
The `sad` instruction (used for iadd3) doesn't support the scalar ALU.
This means we might fall back to non-earlypreamble whenever we use it in
the preamble. Prevent this by emitting it as two adds instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32943>
If a polygon offset is set via glPolygonOffset, but the functionality
isn't enabled via glEnable(GL_POLYGON_OFFSET_FILL) the offset must not
be taken into account when computing the sample depth. As the Vivante
hardware does not have a separate enable state, the offset units and
scale must both be set to 0 to keep the sample depth unchanged.
Fixes dEQP-GLES2.functional.polygon_offset.default_enable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32982>
Through the spec it's required that cl_kernel doesn't hold references to
its bound kernel arguments.
There is a CL CTS test verifying this, but because the arguments were not
used in the test kernel, a reference was never taken. This will change
with SVM and BDA as we need to know all bound memory objects even if they
aren't directly used in kernels.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32961>