Culling is traditionally done by the rasterizer, but that
can be a bottleneck when an app creates a large number
of primitives. E.g. a lot of tiny triangles reduce the
rasterization efficiency.
NGG makes it possible for the shader to check primitives
and delete those that it can prove are not needed.
After this is done, we have to repack the surviving invocations
so they remain compact. This also saves bandwidth, because
some memory loads are only executed by those vertices that
survived the culling.
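
The repacking step can be illustrated with a minimal sketch. This is not the NGG lowering code; it only assumes that each invocation carries a "survived" flag and that survivors are moved to the front in their original order using an exclusive prefix sum. The names `repack_survivors`, `vertex_ids` and `survived` are hypothetical.

```python
from itertools import accumulate


def repack_survivors(vertex_ids, survived):
    """Compact surviving invocations to the front, preserving order.

    vertex_ids: per-invocation payload (hypothetical stand-in for vertex data)
    survived:   per-invocation booleans produced by a culling test
    Returns (packed, new_index), where new_index[i] is the slot that
    invocation i occupies after repacking, or None if it was culled.
    """
    flags = [1 if s else 0 for s in survived]
    inclusive = list(accumulate(flags))
    # Exclusive prefix sum: number of survivors before each invocation.
    exclusive = [inc - f for inc, f in zip(inclusive, flags)]

    packed = [None] * (inclusive[-1] if inclusive else 0)
    new_index = []
    for i, alive in enumerate(survived):
        if alive:
            packed[exclusive[i]] = vertex_ids[i]
            new_index.append(exclusive[i])
        else:
            new_index.append(None)
    return packed, new_index
```

After repacking, only the first `len(packed)` invocations need to run the remaining loads, which is where the bandwidth saving described above would come from.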
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
The new intrinsics fall into the following categories:
1. New viewport intrinsics:
For missing components that we need.
RADV will emit new SGPR arguments which will contain the
viewport information for culling shaders. These are used to
compute the screen space coordinates for small primitive culling.
2. load_cull_xxx:
Load the culling settings at runtime.
These will be a new SGPR argument in RADV.
3. overwrite_xxx:
These are needed because system values such as vertex and
instance ID are not writeable, but we need to change them
after repacking shader invocations of VS and TES.
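
To illustrate why the viewport information is needed, here is a simplified sketch of a small primitive test. It assumes one sample per pixel located at the pixel center, ignores fill rules and sample positions, and uses hypothetical names (`vp_scale`, `vp_translate`, `is_small_primitive`); it is not the code the intrinsics are lowered to.

```python
import math


def ndc_to_screen(ndc_xy, vp_scale, vp_translate):
    """Apply the viewport transform to an (x, y) position in NDC."""
    return tuple(c * s + t for c, s, t in zip(ndc_xy, vp_scale, vp_translate))


def round_half_up(coord):
    """Round to the nearest integer, with ties going up."""
    return math.floor(coord + 0.5)


def is_small_primitive(ndc_positions, vp_scale, vp_translate):
    """Return True if the primitive's bounding box cannot cover a pixel center.

    Pixel centers sit at k + 0.5. If both ends of the bounding box round to
    the same integer along an axis, no pixel center lies strictly between
    them on that axis, so the primitive is treated as invisible.
    """
    pts = [ndc_to_screen(p, vp_scale, vp_translate) for p in ndc_positions]
    for axis in (0, 1):
        lo = min(p[axis] for p in pts)
        hi = max(p[axis] for p in pts)
        if round_half_up(lo) == round_half_up(hi):
            return True
    return False
```

For a 100x100 viewport (scale and translate both `(50.0, 50.0)`), a triangle spanning a fraction of a pixel is reported as small, while one spanning half the screen is not.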
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10525>
No need to include the same BO multiple times in the long-lived ringbuffer
object's list of relocs to be added to the submit.
Improves non-TC drawoverhead -test 9 (8 tex updates) throughput by 1.4901%
+/- 0.8705% (n=20)
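
The idea can be sketched as follows. This is only an illustration with hypothetical names (`RingbufferObject`, `attach_bo`), assuming a side set is used to detect BOs that are already in the list; the actual change may track this differently.

```python
class RingbufferObject:
    """Hypothetical long-lived object that records the BOs it references."""

    def __init__(self):
        self.reloc_bos = []  # BOs to add to the submit, in first-seen order
        self._seen = set()   # identities of BOs already in reloc_bos

    def attach_bo(self, bo):
        """Record bo once; return True if it was newly added."""
        key = id(bo)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.reloc_bos.append(bo)
        return True
```

Emitting eight relocs that all point at the same BO then leaves a single entry in `reloc_bos`, so the submit path has less work to do.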
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11697>
On drawoverhead -test 9 (8 texture changes), this saves us 172kb of
memory. That's only ~1% of the GEM memory while the test is running, but
more importantly it saves us 29% of the gem BO allocations.
non-TC drawoverhead -test 9 (8 texture change) throughput 0.449019% +/-
0.336296% (n=100), but this gets better as we get better suballocation
density.
Note that this means that all fd_ringbuffer_new_object calls can now
return data aligned to 64 bytes, instead of 4k. We may find that we need
to increase it if some of our objects (tex consts, sampler consts, etc.)
require more alignment than that. But, this may help non-drawoverhead
perf if any of our RB objects have a cache in front of them (indirect
consts?) and we don't have most of our data in the same cache set any
more.
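
A minimal sketch of suballocation with 64-byte alignment, to show where the BO savings come from. The class name, the bump-allocation strategy and the 32k backing BO size are assumptions made for illustration only.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of a power-of-two alignment."""
    return (value + alignment - 1) & ~(alignment - 1)


class BumpSuballocator:
    """Hypothetical bump allocator carving small objects out of larger BOs."""

    def __init__(self, bo_size=0x8000, alignment=64):
        assert alignment > 0 and alignment & (alignment - 1) == 0
        self.bo_size = bo_size
        self.alignment = alignment
        self.bo_count = 0  # number of backing BOs "allocated" so far
        self._offset = 0   # next free byte in the current BO

    def alloc(self, size):
        """Return (bo_index, offset) for an object of the given size."""
        if size > self.bo_size:
            raise ValueError("object larger than a backing BO")
        start = align_up(self._offset, self.alignment)
        if self.bo_count == 0 or start + size > self.bo_size:
            # Current BO is missing or full: start a fresh one.
            self.bo_count += 1
            start = 0
        self._offset = start + size
        return self.bo_count - 1, start
```

With this scheme, three small objects share one BO at offsets 0, 128 and 192, where one BO per object would otherwise be needed.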
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11697>
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)
Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).
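
The benefit of named constructors can be shown with a small model. The real code is C++; the Python class below only mimics the factory names listed above, and its fields and masking behaviour are assumptions made for illustration.

```python
class Operand:
    """Hypothetical model of a constant operand with an explicit byte size."""

    def __init__(self, value, num_bytes):
        self.num_bytes = num_bytes
        # Keep only the bits that fit in the requested size.
        self.value = value & ((1 << (8 * num_bytes)) - 1)

    @classmethod
    def c8(cls, value):
        return cls(value, 1)

    @classmethod
    def c16(cls, value):
        return cls(value, 2)

    @classmethod
    def c32(cls, value):
        return cls(value, 4)

    @classmethod
    def c64(cls, value):
        return cls(value, 8)

    @classmethod
    def c32_or_c64(cls, value, is_64bit):
        return cls(value, 8 if is_64bit else 4)

    @classmethod
    def zero(cls, num_bytes):
        return cls(0, num_bytes)
```

With a single overloaded constructor, the size of `Operand(1)` depends on the type of the literal, which is why casts were needed; `Operand.c16(1)` states the size at the call site.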
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
Emit all the state layout config (such as push-const CONSTLEN) first,
before emitting anything that depends on that state. This fixes an
issue that was showing up when FLUT is enabled in ir3 (which results
in a higher probability of not having any immediates lowered to
push-consts).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8705>
These are the only frontends which may be used by gallium drivers in CI,
so stop triggering all driver jobs when other frontends are changed, since
those changes can never affect CI.
<MrCooper> Not that simple unfortunately. E.g. the llvmpipe-piglit-cl job hits
src/gallium/frontends/clover & possibly src/gallium/targets/opencl,
many jobs hit src/gallium/{frontends,targets}/dri and probably
src/gallium/targets/pipe-loader, lavapipe jobs hit src/gallium/{frontends,targets}/lavapipe.
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11832>
Otherwise there would be no clause with the dependencies needed for
ATEST set, so the GPU would get stuck.
Not needed on v7, where shader_wait_dependency in the RSD will wait
for the dependencies before the shader starts.
Explicitly create a NOP instruction, as it is assumed that clauses
have a non-zero count of instructions in various places.
Fixes GPU timeouts in many applications, such as SuperTuxKart and
GZDoom.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11842>
Doing *both* of these ends up rewriting the previous mapping. Since this
doesn't seem to have led to issues, it seems like the new mapping works
just as well.
Fixes: a22a1c0324 ("zink: Fix VK_FORMAT_A8B8G8R8_SRGB_PACK32 mapping on big-endian")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11417>
I factored out the chunk of loader code that dlopen()s
libraries from the rest of the DRI driver loader function
in this commit:
  commit bc343154f8
  Author: James Jones <jajones@nvidia.com>
  Date:   Thu Apr 22 23:17:08 2021 -0700
      loader: Factor out driver library loading code
However, I failed to adjust the DRI loader function that
now uses the new helper function to handle the case where
the requested DRI library is not found.
This change restores the prior behavior, and also ensures
loader_open_driver() consistently returns NULL in the
out_driver_handle parameter on failure.
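
The intended contract can be sketched in Python with ctypes. The real loader is C and uses dlopen(); the function names, the `_dri.so` suffix and the `symbol` parameter below are assumptions for illustration only.

```python
import ctypes
import os


def open_driver_lib(driver_name, search_paths, suffix="_dri.so"):
    """Try each search path; return a library handle or None if none loads."""
    for directory in search_paths:
        path = os.path.join(directory, driver_name + suffix)
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue  # not found or not loadable here, try the next path
    return None


def open_driver(driver_name, search_paths, symbol):
    """Return (entry_point, handle); both are None on any failure."""
    handle = open_driver_lib(driver_name, search_paths)
    if handle is None:
        # Library not found: report failure instead of using a bad handle.
        return None, None
    entry_point = getattr(handle, symbol, None)
    if entry_point is None:
        # Symbol missing: still a failure, so do not hand back the handle.
        return None, None
    return entry_point, handle
```

Callers can then rely on a single check: if the first element is None, the second is None too, and nothing needs to be released.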
Fixes: bc343154f8 ("loader: Factor out driver library loading code")
Signed-off-by: James Jones <jajones@nvidia.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11807>