This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264
fixeskwg/mesa#47
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We create two new helpers, iris_flush_bits_for_history, and
iris_dirty_for_history, then use them in the existing function.
The first accumulates flush bits based on res->bind_history, but doesn't
actually perform a flush. This allows us to accumulate flush bits by
looping over multiple resources, but ultimately emit a single flush for
all of them.
The latter flags dirty bits without flushing, which again allows us to
handle multiple resources, but also is more convenient when writing from
the CPU where we don't need a flush (as in commit 4d12236072).
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data. If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization. At
the very least, it would let us measure the benefit of threading
independently from this optimization. And it's not a lot of code.
Removes most stall avoidance blits in "Total War: WARHAMMER."
On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):
----------------------------------------------
| DiRT Rally | +2% (avg) | + 2% (max) |
| Bioshock Infinite | +3% (avg) | + 9% (max) |
| Shadow of Mordor | +7% (avg) | +20% (max) |
----------------------------------------------
We want to skip some types of aux usages (for instance,
ISL_AUX_USAGE_HIZ when the hardware doesn't support it, or when we have
multisampling) when sampling from the surface.
Instead of checking for those cases while filling the surface state and
leaving it blank, let's have a version of aux.possible_usages for
sampling. This way we can also avoid allocating surface state for the
cases we don't use.
Fixes: a8b5ea8ef0 "iris: Add function to update clear color in surface state."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2: Update tracked clear color when we update the surface state.
v3: Update all aux surface states when updating the clear color.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also store clear color in the iris_resource.
Always allocate clear color state buffer.
v2:
- Make clear_color_offset be 64 bits (Ken).
- Simplify the logic to decide when to memset the aux buffer (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This is similar to intel_miptree_map_blit and intel_buffer_object.c's
temporary blits in i965.
Improves performance of DiRT Rally by 20-25% by eliminating stalls.
Breaks piglit's spec/arb_shader_image_load_store/host-mem-barrier,
by using the GPU to do uploads, exposing a st/mesa issue where it
doesn't give us memory_barrier() calls. This is a pre-existing issue
and will be fixed by a later patch (currently out for review).
If we change the aux state for a given resource, we need to re-emit the
binding table pointers for any stage that has such resource bound. Since
we don't track that, flag IRIS_ALL_DIRTY_BINDINGS and emit all of them.
(cleaned up by Ken - make sure a bunch of things were more obviously
not using res->surf, do allow checking res->surf.tiling == LINEAR,
drop format cpp checks that aren't needed, drop memzone handling for
images, assume buffers / non-buffers in a few places...)
When we blit, transfer, or copy_resource to a buffer, we need to flush
to ensure any stale data for that buffer is invalidated in the caches.
bind_history will inform us which caches need to be flushed.
Also, for any push constant buffers, we need to flag those dirty so
that we re-emit 3DSTATE_CONSTANT_*, causing the data to be re-pushed.
This commit introduces a new Gallium driver for Intel Gen8+ GPUs,
named 'iris_dri.so' after the hardware.
Developed by:
- Kenneth Graunke (overall driver)
- Dave Airlie (shaders, conditional render, overflow query, Gen8 port)
- Chris Wilson (fencing, pinned memory, ...)
- Jordan Justen (compute shaders)
- Jason Ekstrand (image load store)
- Caio Marcelo de Oliveira Filho (tessellation control passthrough)
- Rafael Antognolli (auxiliary buffer fixes)
- The rest of the i965 contributors and the Mesa community