I copy and pasted some of the boilerplate but never the implementation.
For now, ASTC 5x5 is disabled and faked via uncompressed RGBA; let's
delete these remnants until such a time when we implement it properly.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We have to resolve destination surfaces if we are bliting to and from
the same surface.
v2: Revert unrelated change (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Add case for MCS_CCS so that we get the correct aux usage while copy
operation.
v2: Fix commit subject (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
u_upload_mgr sets it, so that util_range_add can skip the lock.
The time spent in tc_transfer_flush_region decreases from 0.8% to 0.2%
in torcs on radeonsi.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Lionel found actual documentation for this at long last. Apparently
it actually is a sampler cache limitation that was mostly fixed on
Icelake. Unfortunately, it seems there are still issues with ASTC
and non-ASTC sampler views. Still, we can lessen the flush condition
from "format mismatch" to "ASTC mismatch", which eliminates most of
the flushing here.
We also update the documentation to refer to the workaround name.
The DRI interface for modifiers with aux data treats the aux data as a
separate plane of the main surface.
When the dri layer requests the plane associated with the aux data, we
save the required information into the dri aux plane image.
Later when the image is used, the dri plane image will be available in
the pipe_resource structure's `next` field. Therefore in iris, we
reconstruct the aux setup from this separate dri plane image when the
image is used.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If our resource_copy_region size is a small number of DWords, then
instead of firing up BLORP, we can simply use MI_COPY_MEM_MEM (after
a CS stall). We also try and select the optimal batch.
Improves performance in Shadow of Mordor on Low settings at 1920x1080
on Skylake GT4e by 0.689096% +/- 0.473968% (n=4). It tries to copy
4 bytes of data to a buffer which was most recently used as a writable
compute shader SSBO. Previously we were switching from compute to the
render pipeline, then firing up all of blorp_buffer_copy...for 4 bytes.
I arbitrarily decided to support 4/8/12/16 bytes. Jason thinks this
is about the right threshold where it's cheaper to use MI_COPY_MEM_MEM.
My intention was to have iris_copy_region not do flushing, and leave
that up to the callers. iris_resource_copy_region needs to do this,
but iris_transfer_flush_region was already doing it. The net result
was that we were doing it twice for transfers.
So, move the flushing from iris_copy_region to iris_resource_copy_region
so that it only happens in the callers as I intended.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
This should ensure the TC invalidate happens after the stall.
Fixes KHR-GL43.copy_image.functional which does a CopyImage (blorp_copy)
from a buffer (using R8G8B8A8_UINT), then GetTexImage to read back the
original image (using R10G10B10A2_UNORM).
When copying/blitting with format reinterpretation, we invalidate the
texture cache before/after. Before is so the source of the copy works,
and after is to get rid of our new data in the "wrong" format to protect
future attempts to sample.
When I ported these hacks to iris, I tried to be cautious by only
bothering with the hacks if the batch referenced the BO. This makes
some sense for the before case. If it isn't referenced, the texture
cache can't really have any data for the BO (since it's also invalidated
between batches). But we still need to do the after case regardless,
as we've just polluted the cache with hazardous entries.
this adds support for imports where the image data begins at an offset
from the start of the buffer, as used in h/x264
fixeskwg/mesa#47
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Applications frequently call glBufferSubData() to consecutive regions
of a VBO to append new vertex data. If no data exists there yet, we
can promote these to unsynchronized writes, even if the buffer is busy,
since the GPU can't be doing anything useful with undefined content.
This can avoid a bunch of unnecessary blitting on the GPU.
u_threaded_context would do this for us, and in fact prohibits us from
doing so (see TC_TRANSFER_MAP_NO_INFER_UNSYNCHRONIZED). But we haven't
hooked that up yet, and it may be useful to disable u_threaded_context
when debugging...at which point we'd still want this optimization. At
the very least, it would let us measure the benefit of threading
independently from this optimization. And it's not a lot of code.
Removes most stall avoidance blits in "Total War: WARHAMMER."
On my Skylake GT4e at 1920x1080, this appears to improve performance
in games by the following (but I did not do many runs for proper
statistics gathering):
----------------------------------------------
| DiRT Rally | +2% (avg) | + 2% (max) |
| Bioshock Infinite | +3% (avg) | + 9% (max) |
| Shadow of Mordor | +7% (avg) | +20% (max) |
----------------------------------------------
This is a port of Jason's 8379bff6c4
from i965 to iris. We can't find anything relevant in the documentation
and no one we've talked to has been able to help us pin down a solution.
Unfortunately, we have to put the hack in both iris_blit() and
iris_copy_region(). st/mesa's CopyImage() implementation sometimes
chooses to use pipe->blit() instead of pipe->resource_copy_region().
For blits, we only do the hack if the blit source format doesn't match
the underlying resource (i.e. it's reinterpreting the bits). Hopefully
this should not be too common.
For depth and stencil blits, we always want the main mask to be Z, and
the secondary pass mask to be S. If asked to blit Z+S to S, we should
handle the blit in the second pass which properly gets the stencil
resources.
Before, we were trying to handle S as the main mask, and accidentally
blitting a Z source to a S destination, which doesn't work out well.
Fixes Piglit's "framebuffer-blit-levels {draw,read} stencil" tests.
Take the clear depth into account when IRIS_DIRTY_DEPTH_BUFFER is marked
as dirty.
Also update the blorp surface clear color.
v2: Use a single if (zres && zres->aux.bo) (Ken).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'll want to use this for transfer maps, which already do their own
flushing. This lets us avoid a double flush, and also gives us more
control over the batch which is selected.
For texturing, we map alpha formats to the corresponding red format,
as many alpha formats are outright missing, and red is more efficient
when sampling anyway.
When rendering to A8_UNORM, we use that format directly, so the image
gets the shader output's .a/.w channel, rather than the .r/.x channel.
All other A* formats are non-renderable, so we can't do much and just
mark them as unsupported for rendering. Fortunately, GL only requires
rendering to A8_UNORM, so that works out.
According to Andre Heider and Timur Kristóf, this fixes font rendering
in Witcher 1 (via nine). Andre also reported that it fixes Unigine
Heaven (presumably via nine).
v2: Use the same swizzle for both sampler views and "render targets".
BLORP expects the read swizzle, and will take the inverse when
setting up the destination swizzle (and actually applying it in
the shaders). We ignore the format swizzle when setting up normal
rendering SURFACE_STATEs, which is necessary because it would be
an illegal shader channel select combination. Thanks to Jason
Ekstrand for pointing out that BLORP took an inverse swizzle.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I don't know if this is required - surprisingly, I haven't seen it
matter - but I'd like to use it for multi-slice transfer maps. We may
as well do the right thing.
When we blit, transfer, or copy_resource to a buffer, we need to flush
to ensure any stale data for that buffer is invalidated in the caches.
bind_history will inform us which caches need to be flushed.
Also, for any push constant buffers, we need to flag those dirty so
that we re-emit 3DSTATE_CONSTANT_*, causing the data to be re-pushed.
Size can be too large for a surf, blorp_buffer_copy chops things up
into segments we can actually handle
Fixes map_buffer_range_test and copy_buffer_coherency
some of them had typos, didn't say 'authors or copyright holders',
or other mistakes. This is now https://opensource.org/licenses/MIT
text, formatted consistently.