It's faster. Not only is the memcpy more efficiently performed in the
kernel (making up for the system call overhead), but by not using mmap
we remove the greater overhead of tracking the vma of every batch.
And it means we can read back from the batch buffer without incurring
the cost of a uncached read through the GTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Shaves 60k off the driver from removing the broken spans code. This
means we now require 2.6.29, which seems fair given that it's a year
old and we've removed support for non-KMS already in the last release
of 2D.
This moves the logic for how to align pitches, heights, and sizes of
objects to one central location. Fixes rendering with texture tiling
on i915. Note that current libdrm is required for the change for
I915_TILING_NONE pitch alignment.
This uses a stamp mechanisms to mark the DRI drawable as invalid.
Instead of immediately updating the buffers we just bump the drawable
stamp and call out to DRI2GetBuffers "later".
"Later" used to be at LOCK_HARDWARE time, and this patch brings back
callouts at the points where we used to call LOCK_HARDWARE. A new function,
intel_prepare_render(), is called where we used to call LOCK_HARDWARE,
and if the buffers are invalid, we call out to DRI2GetBuffers there.
This lets us invalidate buffers only when notified instead of on
every glViewport() call. If the loader calls the DRI invalidate
entrypoint, we disable viewport triggered buffer invalidation.
Additionally, we can clean up the old viewport mechanism a bit,
since we can just invalidate the buffers and not worry about
reentrancy and whatnot.
Note that when detaching the PBO from the region and making a new BO
for the region, we don't make it tiled even if the region originally
was.
Fixes piglit pbo-teximage-tiling.
gen2/3/4 are easier to say than "8xx, 915-945/g33/pineview, 965/g45/misc",
and compares on generation are often easier than stringing together a bunch
of chipset checks.
This is somewhat nasty, but we need to do Y-tiled depth for FBO support.
May help with corruption and hangs since enabling texture tiling, and
since switching depth textures to Y tiled.
Fixes piglit depthtex.c on 965.
This fixes a regression in region read performance that came in with the
texture tiling changes. Ideally we'd have an access flag coming in so we
could also use bo_map_gtt for writing, like we do for buffer objects.
Bug #22190
pbo might be system buffer based or attached to another region. Call
intel_bufferobj_buffer to make sure pbo has a buffer of its own.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
This is about a 30% performance win in OA with high settings on my GM45,
and experiments with 915GM indicate that it'll be around a 20% win there.
Currently, 915-class hardware is seriously hurt by the fact that we use
fence regs to control the tiling even for 3D instructions that could live
without them, so we spend a bunch of time waiting on previous rendering in
order to pull fences off. Thus, the texture_tiling driconf option defaults
off there for now.
Looking for memory leaks that were causing crashes in my environment
in a situation where valgrind would not work, I ended up improving
the i965 debug traces so I could better see where the memory was
being allocated and where it was going, in the regions and miptrees
code, and in the state caches. These traces were specific enough
that external scripts could determine what elements were not being
released, and where the memory leaks were.
I also ended up creating my own backtrace code in intel_regions.c,
to determine exactly where regions were being allocated and for what,
since valgrind wasn't working. Because it was useful, I left it in,
but disabled and compiled out. It can be activated by changing a flag
at the top of the file.
This lets us avoid allocing new buffers for renderbuffers, finalized miptrees,
and PBO-uploaded textures when there's an unreferenced but still active one
cached, while also avoiding CPU waits for batchbuffers and CPU-uploaded
textures. The size of BOs allocated for a desktop running current GL
cairogears on i915 is cut in half with this.
Note that this means we require libdrm 2.4.5.
Previously, we were trying to pass a name to the GEM GET_TILING_IOCTL,
which needs a handle, and failing. None of our buffers were tiled yet, but
they will be at some point with DRI2 and UXA.