This has proven to be incredibly useful for debugging CMA allocation
failures and driving memory management improvements. However, we don't
want to burden entry and exit from the BO cache with the labeling ioctl's
overhead on release builds.
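
A minimal sketch of gating the labeling on the build type, assuming a hypothetical SKETCH_DEBUG macro and a stand-in BO struct (neither is the driver's real name):

```c
#include <stdio.h>

/* Hypothetical stand-in for a buffer object; not the driver's struct. */
struct sketch_bo {
    char name[32];
    unsigned label_calls; /* counts how often the pretend ioctl ran */
};

/* Labels the BO only when SKETCH_DEBUG (an assumed macro) is defined,
 * so release builds pay nothing when BOs enter or leave the cache. */
static void sketch_bo_label(struct sketch_bo *bo, const char *name)
{
#ifdef SKETCH_DEBUG
    snprintf(bo->name, sizeof(bo->name), "%s", name);
    bo->label_calls++; /* a real driver would issue the ioctl here */
#else
    (void)bo;
    (void)name;
#endif
}
```

Built without SKETCH_DEBUG, the call compiles down to nothing and the BO is left untouched.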
I was overwriting view->texture with the shadow resource when we need to
do shadow copies (retiling or baselevel rebase), but that tripped up some
critical new sanity checking in state_tracker (making sure that stObj->pt
hasn't changed from view->texture through TexImage-related paths).
To avoid that, move the shadow resource to the vc4_sampler_view struct.
Fixes: f0ecd36ef8 ("st/mesa: add an entirely separate codepath for setting up buffer views")
The current way of importing the resource from renderonly after allocation
is opaque and takes control away from the driver, which it needs in
order to implement more advanced scenarios than the simple linear scanout
with matching stride alignments.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
X11 and GL compositor performance on VC4 has been terrible because of our
SHARED-usage buffers all being forced to linear. This swaps SHARED &&
!LINEAR buffers over to being tiled.
This is an expected win for all GL compositors during rendering (saving a
full copy of each shared texture per draw call), allows X11 to be used with
decent performance without a GL compositor, and improves X11 windowed
swapbuffers performance as well. It also halves the memory usage of
shared buffers that get textured from. The only cost should be idle
systems with a scanout-only buffer that isn't flagged as LINEAR, in which
case the memory bandwidth cost of scanout goes up ~25%.
This implements the EGL_EXT_image_dma_buf_import_modifiers extension,
supporting the VC4 T_TILED modifier.
v2: Added modifier support to resource creation/import, and
advertisement (by daniels).
v3: Fix old-kernel fallback path, fix compiler error and warnings, and
comment touchups (by anholt).
Reviewed-by: Daniel Stone <daniels@collabora.com>
Rather than open-coding populating the first slice inside resource
import, use vc4_setup_slices to do it for us.
v2: Rebase on VC4_DEBUG=surf change
Reviewed-by: Daniel Stone <daniels@collabora.com>
This follows the model of imx (display) and etnaviv (render): pl111 is a
display-only device, so when asked to do GL for it, we see if we have a
vc4 renderer, make the vc4 screen, and have vc4 call back to pl111 to do
scanout allocations.
The difference from etnaviv is that we share the same BO between vc4 and
pl111, rather than having a vc4 bo and a pl111 bo and copies between the
two. The only mismatch between their requirements is that vc4 requires
4-pixel (at 32bpp) stride alignment, while pl111 requires that stride
match width. The kernel will reject any modesets to an incorrect stride,
so the 3D driver doesn't need to worry about that.
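
The stride mismatch can be illustrated with a small sketch. The helper names are hypothetical, and it assumes "stride match width" means the stride in bytes equals width times 4 at 32bpp:

```c
#include <stdbool.h>
#include <stdint.h>

/* Round v up to a multiple of a (a must be a power of two). */
static uint32_t sketch_align(uint32_t v, uint32_t a)
{
    return (v + a - 1) & ~(a - 1);
}

/* Stride in bytes vc4 would pick at 32bpp: width aligned to 4 pixels. */
static uint32_t sketch_vc4_stride_32bpp(uint32_t width)
{
    return sketch_align(width, 4) * 4;
}

/* Assumed pl111 rule: stride in bytes must equal width * 4 at 32bpp. */
static bool sketch_pl111_can_scan_out(uint32_t width)
{
    return sketch_vc4_stride_32bpp(width) == width * 4;
}
```

Widths that are already a multiple of 4 pixels satisfy both devices; for anything else the shared BO's stride no longer matches what pl111 wants, which is the case the kernel rejects at modeset time.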
v2: Rebase on Android rework, drop unused include.
v3: Fix another Android bug, from Rob Herring's build-testing.
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Coverity caught dead copy-pasted code that used found_colors[] and
num_found_colors.
CID: 1341850
Signed-off-by: Rhys Kidd <rhyskidd@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If X11 did a software fallback to the entire screen, we would throw out
the BO the screen is scanning out from and allocate a new one.
Cc: mesa-stable@lists.freedesktop.org
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
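
As a rough sketch of how a driver might consume these fields: the struct below is a reduced, hypothetical mirror of pipe_draw_info (only the members named above), and it assumes "start" counts indices rather than bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource with a CPU-visible mapping. */
struct sketch_resource {
    const void *map;
};

/* Reduced mirror of the fields described above; not the real struct. */
struct sketch_draw_info {
    unsigned index_size;   /* 0 means non-indexed, else bytes per index */
    bool has_user_indices; /* selects which union member is valid */
    unsigned start;        /* assumed to be in units of indices */
    union {
        struct sketch_resource *resource;
        const void *user;
    } index;
};

/* Returns a pointer to the first index of the draw, or NULL if the
 * draw is non-indexed. */
static const void *sketch_index_data(const struct sketch_draw_info *info)
{
    if (info->index_size == 0)
        return NULL;

    const unsigned char *base = info->has_user_indices
        ? info->index.user
        : info->index.resource->map;

    return base + (size_t)info->start * info->index_size;
}
```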
v2: fixes for nine, svga
Previously, the plan was "if the width/height we have to load/store isn't
the size the user is planning on writing, then we need to load the old
contents out beforehand to prevent writing back undefined".
However, when we're doing glTexImage() we often end up aligning the
width/height into the padding of the texture, and we don't actually
need to read out that padding.
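
One way to express the relaxed check, as a sketch only: clip the aligned box to the real level size before comparing it with the box the user writes. The struct and function names are hypothetical, and the user box is assumed to lie inside the level.

```c
#include <stdbool.h>

struct sketch_box {
    unsigned x, y, width, height;
};

static unsigned sketch_min(unsigned a, unsigned b)
{
    return a < b ? a : b;
}

/* True if storing the aligned box would write back texels that the
 * user is not overwriting. Area beyond the level size is padding and
 * is ignored. */
static bool sketch_needs_preload(const struct sketch_box *user,
                                 const struct sketch_box *aligned,
                                 unsigned level_width,
                                 unsigned level_height)
{
    /* Clip the far edges of the aligned box to the real texture size. */
    unsigned x1 = sketch_min(aligned->x + aligned->width, level_width);
    unsigned y1 = sketch_min(aligned->y + aligned->height, level_height);

    return aligned->x != user->x || aligned->y != user->y ||
           x1 != user->x + user->width ||
           y1 != user->y + user->height;
}
```

A full-level upload of a 10x10 texture aligned out to 16x16 needs no preload under this check, while a 5x5 update aligned to 8x8 still does.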
Improves x11perf -aatrapezoid100 performance from ~460/sec to
~700/sec.
Track rendering to each FBO independently and flush rendering only when
necessary. This lets us avoid the overhead of storing and loading the
frame when an application momentarily switches to rendering to some other
texture in order to continue rendering the main scene.
Improves glmark -b desktop:effect=shadow:windows=4 by 27%
Improves glmark -b
desktop:blur-radius=5:effect=blur:passes=1:separable=true:windows=4
by 17%
While I haven't tested other apps, this should help X rendering a lot, and
I've heard GLBenchmark needed it too.
It's really just an upgrade to attempting WHOLE_RESOURCE. Pulling the
logic out caught two bugs in it: We would try to do so on cubemaps (even
though we're only mapping 1 of the 6 slices), and we would break
persistent coherent mappings by trying to reallocate when we shouldn't.
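
The pulled-out logic might look roughly like the following. The flag names and the request struct are hypothetical placeholders, not the gallium definitions; only the two exclusions (cubemaps and persistent coherent mappings) come from the description above.

```c
#include <stdbool.h>

/* Hypothetical usage bits for illustration only. */
enum {
    SKETCH_MAP_DISCARD_RANGE = 1 << 0,
    SKETCH_MAP_DISCARD_WHOLE = 1 << 1,
    SKETCH_MAP_PERSISTENT_COHERENT = 1 << 2,
};

struct sketch_map_request {
    unsigned usage;
    bool is_cubemap;         /* only 1 of the 6 slices is being mapped */
    bool covers_whole_level; /* the box spans the entire resource level */
};

/* Upgrades a range discard to a whole-resource discard when that is
 * safe, and returns the (possibly unchanged) usage flags. */
static unsigned sketch_upgrade_usage(const struct sketch_map_request *req)
{
    unsigned usage = req->usage;

    if ((usage & SKETCH_MAP_DISCARD_RANGE) &&
        req->covers_whole_level &&
        !req->is_cubemap &&
        !(usage & SKETCH_MAP_PERSISTENT_COHERENT))
        usage |= SKETCH_MAP_DISCARD_WHOLE;

    return usage;
}
```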
to reduce the call indirections with u_resource_vtbl.
The worst call tree you could get was:
- u_transfer_inline_write_vtbl
- u_default_transfer_inline_write
- u_transfer_map_vtbl
- driver_transfer_map
- u_transfer_unmap_vtbl
- driver_transfer_unmap
That's 6 indirect calls. Some drivers only had 5. The goal is to have
1 indirect call for drivers that care. The resource type can be determined
statically at most call sites.
The new interface is:
pipe_context::buffer_subdata(ctx, resource, usage, offset, size, data)
pipe_context::texture_subdata(ctx, resource, level, usage, box, data,
stride, layer_stride)
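
A simplified sketch of the single-indirection idea, using hypothetical stand-in types rather than the real pipe_context: the driver installs its hook once, and each caller makes exactly one indirect call.

```c
#include <string.h>

struct sketch_context;

/* Hypothetical buffer resource backed by plain memory. */
struct sketch_resource {
    unsigned char storage[64];
};

/* Stand-in context with one hook, shaped like the buffer_subdata
 * signature listed above. */
struct sketch_context {
    void (*buffer_subdata)(struct sketch_context *ctx,
                           struct sketch_resource *resource,
                           unsigned usage, unsigned offset,
                           unsigned size, const void *data);
};

/* Driver implementation: copies straight into the buffer, with no
 * further vtbl lookups or map/unmap round trip. */
static void sketch_driver_buffer_subdata(struct sketch_context *ctx,
                                         struct sketch_resource *resource,
                                         unsigned usage, unsigned offset,
                                         unsigned size, const void *data)
{
    (void)ctx;
    (void)usage;
    if (offset > sizeof(resource->storage) ||
        size > sizeof(resource->storage) - offset)
        return; /* out of range: ignore in this sketch */
    memcpy(resource->storage + offset, data, size);
}

static void sketch_context_init(struct sketch_context *ctx)
{
    ctx->buffer_subdata = sketch_driver_buffer_subdata;
}
```

A caller that statically knows it has a buffer would then call ctx->buffer_subdata(ctx, res, usage, offset, size, data) directly.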
v2: fix whitespace, correct ilo's behavior
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Acked-by: Roland Scheidegger <sroland@vmware.com>
To support general GL_TEXTURE_BASE_LEVEL we have to copy to a temporary
miptree. However, if a single level is being selected, we can use the
existing miptree and force all the sampling to be from that particular
level.
This avoids a ton of software fallbacks in glGenerateMipmap(), which uses
base levels in the blit implementation in gallium. Improves "glmark2 -b
terrain" from 2 fps to 3 (perhaps some more precision would be useful?),
and cuts its CPU usage during the benchmarking from ~30% to ~10% (total
CPU time from 8.8s to 7.6s).
The allocation has succeeded by that point, so it needs to be freed.
CovID: 1358929
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
The expected stride calculation is completely wrong. It should
ultimately be multiplying cpp and width rather than dividing. The width
also needs to be aligned to the tiling width first before converting to
stride bytes.
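
The corrected calculation, as a sketch with a hypothetical helper name; the tile width is taken as a parameter in pixels because the actual value depends on the tiling format:

```c
#include <assert.h>
#include <stdint.h>

/* Expected stride in bytes: align the width to the tiling width
 * first, then multiply by bytes per pixel (cpp). */
static uint32_t sketch_expected_stride(uint32_t width, uint32_t cpp,
                                       uint32_t tile_width)
{
    assert(cpp > 0 && tile_width > 0);

    uint32_t aligned_width =
        ((width + tile_width - 1) / tile_width) * tile_width;

    return aligned_width * cpp;
}
```

For example, a 100-pixel-wide surface at 4 bytes per pixel with a 32-pixel tile width aligns to 128 pixels, giving a 512-byte stride.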
The whole stride check here is possibly pointless. Any buffers which
were allocated outside of vc4 may have strides with larger alignment
requirements.
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
We don't really support reading/writing of 3D textures since the hardware
doesn't do 3D, but we do need to make sure that a pipe_transfer for them
has enough space to store the image. This was previously not a problem
because the state tracker only mapped a slice at a time until
fb9fe352ea. Fixes the glean glsl1 tests, which all set up a 3D texture at
the start.
We need to fix up the offset to point at the face of the cube. Fixes
piglit fbo-cubemap, copyteximage CUBE, and glean's fbo test.
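
A sketch of the offset fixup, assuming the faces are laid out at a constant byte stride from one another; the struct and field names here are illustrative, not taken from the driver:

```c
#include <stdint.h>

#define SKETCH_MAX_LEVELS 4

/* Hypothetical layout: per-level offsets within one face, plus the
 * byte distance between consecutive faces. */
struct sketch_cube_layout {
    uint32_t level_offset[SKETCH_MAX_LEVELS];
    uint32_t cube_map_stride;
};

/* Byte offset of a given miplevel of a given cube face. */
static uint32_t sketch_face_offset(const struct sketch_cube_layout *rsc,
                                   unsigned level, unsigned face)
{
    return rsc->level_offset[level] + face * rsc->cube_map_stride;
}
```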
Cc: "11.1 11.2" <mesa-stable@lists.freedesktop.org>
If we're going to sample from or render to them at some particular size,
we'd better make sure that they actually are that size. Causes some tests
under simulation to generate appropriate error messages instead of
failures.
This will allow drivers to make better decisions about texture sharing
for DRI2, DRI3, Wayland, and OpenCL.
v2: add read/write flags, take advantage of __DRI_IMAGE_USE_BACKBUFFER
Reviewed-by: Axel Davy <axel.davy@ens.fr>
The fixed alignment of u_upload_mgr will go away.
This is the first step.
The motivation is that one u_upload_mgr can have multiple users,
each allocating from the same buffer, but requiring a different alignment.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This is apparently a weirdness of gallium -- nr_samples==1 is occasionally
used and means the same thing as nr_samples==0. Fixes a bunch of
ARB_framebuffer_srgb blit cases in piglit.
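
One way to handle the equivalence is to normalize the sample count before comparing. The helper names below are hypothetical:

```c
#include <stdbool.h>

/* Treat nr_samples == 1 the same as nr_samples == 0 (single-sampled). */
static unsigned sketch_effective_samples(unsigned nr_samples)
{
    return nr_samples <= 1 ? 0 : nr_samples;
}

/* Compare two sample counts after normalization, e.g. when deciding
 * whether a blit's source and destination have matching layouts. */
static bool sketch_sample_counts_match(unsigned a, unsigned b)
{
    return sketch_effective_samples(a) == sketch_effective_samples(b);
}
```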
For MSAA, we store full resolution tile buffer contents, which have their
own tiling format. Since they're full resolution buffers, we have to
align their size to full tiles.
I was afraid our callers weren't prepared for this, but it looks like
at least for resource creation, mesa/st throws an error appropriately.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
It's a bunch of work for us to emit it (and its uniforms), more work for
the kernel to validate it, and additional work for the CLE to read
it. Improves es2gears framerate by about 50%.
Signed-off-by: Eric Anholt <eric@anholt.net>
We need to make sure that when we store the aligned box, we've got
initialized contents in the border. We could potentially just load the
border area, but for now let's get text rendering working in X (and fix
the GL_TEXTURE_2D errors in piglit's texsubimage test and
gl-2.1-pbo/test_tex_image)
The idea I had when I wrote the original shadow code was that you'd see a
set_index_buffer to the IB, then a bunch of draws out of it. What's
actually happening in openarena is that set_index_buffer occurs at every
draw, so we end up making a new shadow BO every time, and converting more
of the BO than is actually used in the draw.
While I could maybe come up with a better caching scheme, for now just
do the simple thing that doesn't result in a new shadow IB allocation
per draw.
Improves performance of isosurf in drawelements mode by 58.7967% +/-
3.86152% (n=8).