This implements PIPE_CAP_INVALIDATE_BUFFER and invalidate_resource(),
as well as the PIPE_TRANSFER_DISCARD_WHOLE_RESOURCE flag. When either
of these happens, we swap out the backing storage of the buffer for a
new idle BO, allowing us to write to it immediately without stalling
or queueing a blit.
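The swap described above can be sketched roughly as follows. This is a
minimal illustration, not the driver's code: toy_bo, toy_resource and
toy_invalidate_buffer are hypothetical names, and the busy flag stands in
for whatever idle check the real BO manager provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's buffer object and resource. */
struct toy_bo {
   size_t size;
   bool busy;      /* true while the GPU may still be using the BO */
   int refcount;
};

struct toy_resource {
   struct toy_bo *bo;   /* current backing storage */
};

/* Swap the backing storage for an idle BO of at least the same size.
 * Returns false and leaves the resource untouched if the replacement
 * is unusable, so the caller can fall back to stalling or a blit.
 */
static bool
toy_invalidate_buffer(struct toy_resource *res, struct toy_bo *fresh)
{
   assert(res != NULL && res->bo != NULL);

   if (fresh == NULL || fresh->busy || fresh->size < res->bo->size)
      return false;

   struct toy_bo *old = res->bo;
   fresh->refcount++;
   res->bo = fresh;
   old->refcount--;   /* assumed: in-flight work holds its own reference */
   return true;
}
```

The old contents are simply abandoned, which is only valid because the
caller has declared that it no longer cares about them.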
On my Skylake GT4e at 1920x1080, this improves performance in games:
-----------------------------------------------
| DiRT Rally        | +25% (avg) | +17% (max) |
| Bioshock Infinite | +22% (avg) | +11% (max) |
| Shadow of Mordor  | +27% (avg) | +83% (max) |
-----------------------------------------------
This unifies a bunch of the UBO and SSBO code to use common structures.
Beyond iris_state_ref, pipe_shader_buffer also gives us a buffer size,
which can be useful when filling out the surface state.
Marek recently extended pipe->set_shader_buffers() to take an extra
writable_bitmask parameter, indicating which SSBOs are writable (some
may be bound read-only). We can use this to decide whether to set
EXEC_OBJECT_WRITE when pinning. Avoiding the write flag can save us
some cross-batch flushing if the SSBO is used for reading in both the
render and compute engines.
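A rough sketch of how the bitmask could be tracked and consulted when
pinning. The toy_* names are hypothetical, and the sketch assumes that
bit i of writable_bitmask describes slot start + i:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SSBOS 16

/* Hypothetical per-stage SSBO binding state. */
struct toy_ssbo_state {
   uint32_t bound;      /* bit i set: slot i has a buffer bound */
   uint32_t writable;   /* bit i set: the shader may write slot i */
};

/* Update slots [start, start + count). Bit i of writable_bitmask is
 * assumed to describe slot start + i.
 */
static void
toy_set_shader_buffers(struct toy_ssbo_state *s, unsigned start,
                       unsigned count, uint32_t writable_bitmask)
{
   assert(count >= 1 && start + count <= TOY_MAX_SSBOS);

   uint32_t range = ((1u << count) - 1u) << start;

   s->bound |= range;
   s->writable = (s->writable & ~range) |
                 ((writable_bitmask << start) & range);
}

/* Only request write access when pinning a slot the shader may write. */
static bool
toy_pin_needs_write(const struct toy_ssbo_state *s, unsigned slot)
{
   assert(slot < TOY_MAX_SSBOS);
   return (s->bound & s->writable & (1u << slot)) != 0;
}
```

In the real driver the result of the last helper would decide whether
EXEC_OBJECT_WRITE is set for the BO; here it is just a boolean.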
Pipeline statistics queries should not count BLORP's rectangles.
(23) How do operations like Clear, TexSubImage, etc. affect the
results of the newly introduced queries?
DISCUSSION: Implementations might require "helper" rendering
commands be issued to implement certain operations like Clear,
TexSubImage, etc.
RESOLVED: They don't. Only application submitted rendering
commands should have an effect on the results of the queries.
Piglit's arb_pipeline_statistics_query-vert_adj exposes this bug when
the driver is hacked to always perform glBufferData via a GPU staging
copy (for debugging purposes).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
libintel_common depends on libintel_compiler, but it contains debug
functionality that is needed by libintel_compiler. Break the circular
dependency by moving gen_debug files to libintel_dev.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
MI_PREDICATE_DATA is an intermediate storage for the MI_PREDICATE
command's calculations - it holds the result of the subtraction when
the compare operation is SRCS_EQUAL or DELTAS_EQUAL. But the actual
result of the predication is MI_PREDICATE_RESULT, which is what we
want to copy from the render context to the compute context.
This function can be used to stall on the CPU and resolve the predicate
for the conditional render. It will convert ice->state.predicate from
IRIS_PREDICATE_STATE_USE_BIT to either IRIS_PREDICATE_STATE_RENDER or
IRIS_PREDICATE_STATE_DONT_RENDER, depending on the result of the query.
v2:
- return void (Ken)
- update the stored condition (Ken)
- simplify the code leading up to resolving the predicate (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I'll want to use this for transfer maps, which already do their own
flushing. This lets us avoid a double flush, and also gives us more
control over the batch which is selected.
Gallium might call us multiple times to bind subsets of the samplers,
at which point we'd recreate the table a bunch of times. It doesn't
really buy us anything to do it here - even if we defer to draw time,
the dirty tracking ensures we'll only do it on the first draw after a
bind_sampler_states() call.
We now use the number of samplers specified by the shader instead of
the binding count. If this number changes, we flag sampler state as
dirty so we re-upload a table with the right number of entries.
This also fixes a bug where ice->state.need_border_colors was never
unset, so once something needed border colors, the pool would always
be pinned in all future batches.
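A simplified sketch of the deferred upload and its dirty tracking. All
toy_* names and the dirty bit are hypothetical; the real table upload is
replaced by a counter so the behaviour is easy to observe:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DIRTY_SAMPLER_STATES (1u << 0)

struct toy_shader {
   unsigned num_samplers;   /* sampler count the shader actually uses */
};

struct toy_state {
   unsigned dirty;
   unsigned table_entries;   /* entries in the last uploaded table */
   unsigned uploads;         /* how many times the table was rebuilt */
   bool need_border_colors;
};

/* Binding samplers only flags the state; no table is built here. */
static void
toy_bind_sampler_states(struct toy_state *st)
{
   st->dirty |= TOY_DIRTY_SAMPLER_STATES;
}

/* Explicitly flag the table when the shader's sampler count changes. */
static void
toy_bind_shader(struct toy_state *st, const struct toy_shader *prev,
                const struct toy_shader *next)
{
   unsigned prev_count = prev != NULL ? prev->num_samplers : 0;

   assert(next != NULL);
   if (next->num_samplers != prev_count)
      st->dirty |= TOY_DIRTY_SAMPLER_STATES;
}

/* At draw time, rebuild the table at most once per dirty flag. */
static void
toy_draw(struct toy_state *st, const struct toy_shader *sh,
         const bool *uses_border_color)
{
   if (!(st->dirty & TOY_DIRTY_SAMPLER_STATES))
      return;

   /* Recompute from scratch so a stale "true" cannot linger. */
   st->need_border_colors = false;
   for (unsigned i = 0; i < sh->num_samplers; i++) {
      if (uses_border_color[i])
         st->need_border_colors = true;
   }

   st->table_entries = sh->num_samplers;
   st->uploads++;
   st->dirty &= ~TOY_DIRTY_SAMPLER_STATES;
}
```

Several binds followed by several draws result in a single rebuild, and
need_border_colors drops back to false once no sampler needs it.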
v2: Explicitly flag sampler states as dirty, rather than assuming that
bind_sampler_states() will be called if the program texture count
changes. While this may be true for st/mesa, it isn't the case for
Gallium HUD.
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Andre Heider <a.heider@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If the vertex shader uses EdgeFlag, the hardware requires that it be set
up as the last VERTEX_ELEMENT_STATE. If SGVS are added at draw time, we
also need to reconfigure the last 3DSTATE_VF_INSTANCING so that its
VertexElementIndex points to the new vertex element that contains
the EdgeFlag.
So if neither draw parameters nor the edge flag are used, the CSO
generated in iris_create_vertex_element is sent directly in the batches.
But if the edge flag is used, we adjust the last VERTEX_ELEMENT_STATE and
the last 3DSTATE_VF_INSTANCING using the alternative edge flag versions
that we generate in iris_create_vertex_element and store in the CSO.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Additional VERTEX_ELEMENT_STATE entries are used to store basevertex,
baseinstance, and drawid, and the DWordLength of the
3DSTATE_VERTEX_ELEMENTS command is updated to match.
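The length arithmetic can be sketched as below. The helper names are
hypothetical, and the split into one extra element for
basevertex/baseinstance plus one for drawid is an assumption of the
sketch; the driver's actual packing may differ. The sketch relies on each
VERTEX_ELEMENT_STATE being two dwords and on the DWordLength field being
the total command length minus two.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_VE_STATE_DWORDS 2   /* each VERTEX_ELEMENT_STATE is 2 dwords */

/* Elements from the CSO plus any draw-parameter elements added at draw
 * time (assumed: one for basevertex/baseinstance, one for drawid).
 */
static unsigned
toy_vertex_element_count(unsigned cso_count, bool uses_draw_params,
                         bool uses_drawid)
{
   return cso_count + (uses_draw_params ? 1 : 0) + (uses_drawid ? 1 : 0);
}

/* One header dword plus the elements, minus the length bias of two. */
static unsigned
toy_vertex_elements_dword_length(unsigned element_count)
{
   assert(element_count >= 1);
   return 1 + TOY_VE_STATE_DWORDS * element_count - 2;
}
```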
This passes all Piglit spec.*draw_parameters.* tests
and the VK-GL-CTS KHR-GL45.shader_draw_parameters_tests.* tests.
Now we only mark a dirty_update when parameters are changed or
when we have an indirect draw.
We enable PIPE_CAP_DRAW_PARAMETERS on Iris.
There is no edge flag support in the Vertex Elements setup.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of allocating a 4K BO per query object, we can create a large blob
of memory and split it into pieces as required.
With one BO shared by multiple query objects, we don't want to wait on all
of them. Instead, when we write the last snapshot, we create a sync point,
and check sync points while waiting on a particular object.
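A minimal sketch of the idea, with hypothetical toy_* names. The slab and
slot sizes are placeholders chosen for the example, and the sync point is
reduced to a flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLAB_SIZE 4096u   /* placeholder size of the shared BO */
#define TOY_SLOT_SIZE 64u     /* placeholder size of one query's data */
#define TOY_SLOTS     (TOY_SLAB_SIZE / TOY_SLOT_SIZE)

/* Stand-in for a real sync point; only records whether it signaled. */
struct toy_syncpt {
   bool signaled;
};

struct toy_query_slab {
   unsigned next_slot;   /* next unused slot in the shared BO */
};

struct toy_query {
   unsigned offset;             /* byte offset inside the shared BO */
   struct toy_syncpt *syncpt;   /* set when the last snapshot is written */
};

/* Carve one slot out of the shared BO; false if the slab is full. */
static bool
toy_query_alloc(struct toy_query_slab *slab, struct toy_query *q)
{
   if (slab->next_slot == TOY_SLOTS)
      return false;

   q->offset = slab->next_slot++ * TOY_SLOT_SIZE;
   q->syncpt = NULL;
   return true;
}

/* Called when the query's last snapshot has been queued. */
static void
toy_query_end(struct toy_query *q, struct toy_syncpt *syncpt)
{
   assert(syncpt != NULL);
   q->syncpt = syncpt;
}

/* Readiness depends only on this query's sync point, not on the BO. */
static bool
toy_query_ready(const struct toy_query *q)
{
   return q->syncpt != NULL && q->syncpt->signaled;
}
```

Waiting on the BO itself would block until every query sharing it had
finished; the per-query sync point avoids that.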
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
In st_nir_lower_uniforms_to_ubo(), every UBO access in the shader has
its index incremented to make room for uniforms in constbuf0. So if
we use UBOs, we always need to include the extra binding entry in the
table.
To avoid doing these checks both when compiling the shader and when
assigning binding tables, store num_cbufs in iris_compiled_shader.
Fixes a bunch of tests from Piglit and CTS that use UBOs but don't use
uniforms or system values. Note that some tests fitting these criteria
were passing because the UBOs were moved to be push
constants (avoiding the problem).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We were relying on CSE/GVN/etc to coalesce all intrinsics that load the
same value, but that's a bad idea. We might have a couple intrinsics
that reload the same value. If so, we only want to set up the uniform
on the first one we see.
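One way to make the setup idempotent is to look the value up before
allocating a slot for it. The table and its names are hypothetical, and
system values are reduced to plain integer ids for the example:

```c
#include <assert.h>

#define TOY_MAX_SYSVALS 32

/* Hypothetical table of system values already assigned a uniform slot. */
struct toy_sysval_table {
   unsigned ids[TOY_MAX_SYSVALS];
   unsigned count;
};

/* Return the uniform slot for a system value, allocating a new slot
 * only the first time that value is seen.
 */
static unsigned
toy_sysval_slot(struct toy_sysval_table *t, unsigned id)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (t->ids[i] == id)
         return i;
   }

   assert(t->count < TOY_MAX_SYSVALS);
   t->ids[t->count] = id;
   return t->count++;
}
```

Every intrinsic that loads the same value then maps to the same slot,
whether or not an optimization pass merged the loads beforehand.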
I was using the Gallium API wrong. set_* functions with start_slot
and count parameters are supposed to update a subrange of the items.
I had been trashing all bound vertex buffers and starting over.
This should hopefully also make it easier to slot in additional
VERTEX_BUFFER_STATEs at draw time, say, for shader draw parameters.
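A sketch of the subrange semantics, using hypothetical toy_* types.
Treating a NULL array, or a NULL buffer pointer in an entry, as an unbind
request for that slot is an assumption of the sketch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_VBS 16

struct toy_vb {
   const void *buffer;
   unsigned stride;
};

struct toy_vb_state {
   struct toy_vb vb[TOY_MAX_VBS];
   uint32_t bound;   /* bit i set: slot i has a buffer bound */
};

/* Update only slots [start_slot, start_slot + count). Slots outside
 * that range keep whatever was bound before.
 */
static void
toy_set_vertex_buffers(struct toy_vb_state *s, unsigned start_slot,
                       unsigned count, const struct toy_vb *buffers)
{
   assert(start_slot + count <= TOY_MAX_VBS);

   for (unsigned i = 0; i < count; i++) {
      unsigned slot = start_slot + i;

      if (buffers != NULL && buffers[i].buffer != NULL) {
         s->vb[slot] = buffers[i];
         s->bound |= 1u << slot;
      } else {
         s->vb[slot].buffer = NULL;
         s->vb[slot].stride = 0;
         s->bound &= ~(1u << slot);
      }
   }
}
```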
This exposes iris_upload_shader() without having to bind it, which will
be useful for precompiles. It also lets us examine the old programs and
flag dirty bits at a higher level, rather than cramming all that
knowledge into the cache layer.