We are losing the optimization of converting a single quad to
a triangle fan reusing the same four vertex in the buffer.
Maybe this optimization can be ported to primconvert, QUADS are
always converted to TRIANGLES with primconvert. And i965, crocus
and svga have similar optimizations.
V3D doesn't implement this optimization and according to when it was
introduced on 230e646a40 ("broadcom/vc4: Decompose single QUADs to
a TRIANGLE_FAN."):
"No significant difference in the minetest replay, but it should reduce
overhead by not requiring that we write quad indices to index buffers
that we repeatedly re-upload (and making the draw packet smaller, as
well)."
v2: Commit log includes more detail about the removed optimization.
(Alejandro Piñeiro)
Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5277
Suggested-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12669>
This lets the driver use pipe_debug_message() for GL_ARB_debug_output.
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
With the syncobj support in place, lets use it to implement the
EGL_ANDROID_native_fence_sync extension. This mostly follows previous
implementations in freedreno and etnaviv.
v2: Drop the flags (Eric)
Handle in_fence_fd already in job_submit (Eric)
Drop extra vc4_fence_context_init (Eric)
Dup fds with CLOEXEC (Eric)
Mention exact extension name (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This gives us access to the fence created for the render job.
v2: Drop flag (Eric)
Signed-off-by: Stefan Schake <stschake@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Notes:
- make sure the default size is large enough to handle all state trackers
- pipe wrappers don't receive transfer calls from stream_uploader, because
pipe_context::stream_uploader points directly to the underlying driver's
stream_uploader (to keep it simple for now)
v2: add error handling to nv50, nvc0, noop
v3: set const_uploader
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com> (v1)
Tested-by: Charmaine Lee <charmainel@vmware.com>
Track rendering to each FBO independently and flush rendering only when
necessary. This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.
Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
by 17%
While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
This is done in vc4_flush currently, but I'm going to make the job always
track the surfaces it might be rendering to instead of putting in the
destinations at flush time.
Any update here should have been the same as in
vc4_set_framebuffer_state(), except for the point where vc4_blit.c
temporarily sets different state for its different buffers.
This is apparently a weirdness of gallium -- nr_samples==1 is occasionally
used and means the same thing as nr_samples==0. Fixes a bunch of
ARB_framebuffer_srgb blit cases in piglit.
Use NULL tests of the form `if (ptr)' or `if (!ptr)'.
They do not depend on the definition of the symbol NULL.
Further, they provide the opportunity for the accidental
assignment, are clear and succinct.
Signed-off-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
I needed to rewrite this a bit for safety checking in the next commit.
Despite being a static inline of the same thing that was being done, we
lose 36 bytes of code for some reason.
Some, but not all, state trackers will explicitly unref (and set to
NULL) the previous *fence before calling pipe->flush(). So driver
should use fence_ref() which will unref the old fence if not NULL.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Acked-by: Eric Anholt <eric@anholt.net>
This avoids a security issue where userspace could have written the tile
state/tile alloc behind the GPU's back, and will apparently be necessary
for fixing stability bugs (tile state buffers are missing some top bits
for the tile alloc's address).
The idea I had when I wrote the original shadow code was that you'd see a
set_index_buffer to the IB, then a bunch of draws out of it. What's
actually happening in openarena is that set_index_buffer occurs at every
draw, so we end up making a new shadow BO every time, and converting more
of the BO than is actually used in the draw.
While I could maybe come up with a better caching scheme, for now just
do the simple thing that doesn't result in a new shadow IB allocation
per draw.
Improves performance of isosurf in drawelements mode by 58.7967% +/-
3.86152% (n=8).
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
New BO create and mmap ioctls are added. The submit ABI gains a flags
argument, and the pointers are fixed at 64-bit. Shaders are now fixed at
the start of their BOs.
It turns out the simulator was not treating this bit the same as the RPi,
and I'd forgotten to remove it when turning on early Z. The result was
that you'd get big chunks of your rendering missing.
The optimizer obviously doesn't have the ability to rewrite these to skip
the size checks per call, so we have to do it manually.
Improves a norast benchmark on simulation by 0.779706% +/- 0.405838%
(n=6087).