This lets us avoid allocing new buffers for renderbuffers, finalized miptrees,
and PBO-uploaded textures when there's an unreferenced but still active one
cached, while also avoiding CPU waits for batchbuffers and CPU-uploaded
textures. The size of BOs allocated for a desktop running current GL
cairogears on i915 is cut in half with this.
Note that this means we require libdrm 2.4.5.
Previously, we were trying to pass a name to the GEM GET_TILING_IOCTL,
which needs a handle, and failing. None of our buffers were tiled yet, but
they will be at some point with DRI2 and UXA.
Most of these were to ensure that caches got synchronized between 2d (or meta)
rendering and later use of the target as a source, such as for texture
miptree setup. Those are replaced with intel_batchbuffer_emit_mi_flush(),
which just drops an MI_FLUSH. Most of the remainder were to ensure that
REFERENCES_CLIPRECTS batchbuffers got flushed before the lock was dropped.
Those are now replaced by automatically flushing those when dropping the lock.
Fencing was used in two places: ensuring that we didn't get too many frames
ahead of ourselves, and glFinish. glFinish will be satisfied by waiting on
buffers like we would do for CPU access on them. The "don't get too far ahead"
is now the responsibility of the execution manager (kernel).
In addition to potentially binding when it was about to be mapped anyway,
failure to use CACHED_MAPPED means eating a full wbinvd on validate. Thanks to
airlied for catching this.
The screen wide info such as pitch and cpp are obsoleted by the FBO
changes, so clean up the last few references to those, except for
setting up the legacy screen regions.
Putting the bufmgr in the screen is not thread-safe since the emit_reloc
changes. It also led to a significant performance hit from pthread usage
for the attempted thread-safety (up to 12% of a cpu spent on refcounting
protection in single-threaded 965). The motivation had been to allow
multi-context bufmgr sharing in classic mode, but it wasn't worth the cost.