Stop using brw_compiler to lower the final binding table indices for
surface access. This is done by simply not setting the
'prog_data->binding_table.*_start' fields. Then make the driver
perform this lowering.
This is a better place to perform the binding table assignments, since
the driver has more information and will also later consume those
assignments to upload resources.
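
As a rough illustration of what the driver-side assignment looks like
(the struct and function names below are hypothetical stand-ins, not
the actual iris/brw code):

    /* Sketch: hand out contiguous final binding table indices per
     * surface class, now that the compiler leaves the *_start fields
     * alone. */
    struct sketch_binding_table {
       uint32_t texture_start;
       uint32_t ubo_start;
       uint32_t ssbo_start;
       uint32_t size_bytes;
    };

    static void
    assign_binding_table_offsets(struct sketch_binding_table *bt,
                                 unsigned num_textures,
                                 unsigned num_ubos,
                                 unsigned num_ssbos)
    {
       uint32_t next = 0;
       bt->texture_start = next;  next += num_textures;
       bt->ubo_start     = next;  next += num_ubos;
       bt->ssbo_start    = next;  next += num_ssbos;
       bt->size_bytes = next * 4; /* one 32-bit entry per surface */
    }
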
This also prepares us for two changes: using ibc without having to
implement binding table logic there, and removing unused entries from
the binding table.
Since the `block` field in brw_ubo_range now refers to the final
binding table index, we need to adjust it before using it to index
shs->constbuf.
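
Roughly speaking (the names below are assumed, not copied from the
actual change), the adjustment amounts to subtracting the UBO group's
start index to get back to a driver-local slot:

    /* Sketch: range->block is now a final binding table index, so
     * translate it back to a constbuf slot; 'ubo_start' stands in for
     * wherever the driver keeps its UBO group offset. */
    unsigned slot = range->block - ubo_start;
    upload_ubo_range(ice, &shs->constbuf[slot], range); /* hypothetical helper */
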
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Felix noticed a crash when using INTEL_DEBUG=bat decoding. It turned
out that we were sometimes placing variable-length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing. So
this change does more than improve the decoder output.
Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer. Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size. We don't need to catch every single place, however, since
it's only important to record variable-length packets (like viewports
and binding tables).
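
As a sketch of the approach (the helper name here is illustrative,
though it does assume Mesa's util hash_table_u64), those call sites can
record their sizes in a map keyed by state offset:

    /* Sketch: remember how large each variable-length state object is,
     * keyed by its offset from the relevant state base address. */
    static void
    iris_record_state_size(struct hash_table_u64 *sizes,
                           uint32_t offset, uint32_t size)
    {
       if (!sizes)
          return;
       _mesa_hash_table_u64_insert(sizes, offset,
                                   (void *)(uintptr_t)size);
    }
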
State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either. (We also don't have a callback when an upload buffer is
released.) So, this tracking may space leak over time. That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.
The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris. However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.
We can always improve this, but for now it's better than what we had.
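
A decoder-side lookup along these lines (callback name and signature
assumed) then just keys on the offset from whichever base address the
decoder hands us, returning 0 when no size was recorded:

    /* Sketch of the size lookup the batch decoder would call. */
    static unsigned
    decode_get_state_size(void *opaque, uint64_t address,
                          uint64_t base_address)
    {
       struct hash_table_u64 *sizes = opaque;
       void *entry =
          _mesa_hash_table_u64_search(sizes, address - base_address);
       return (unsigned)(uintptr_t)entry;
    }
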
iris_bufmgr allocates addresses across the entire screen, since buffers
may be shared between multiple contexts. There used to be a single
special address, IRIS_BINDER_ADDRESS, that was per-context - and all
contexts used the same address. When I moved to the multi-binder
system, I made a separate memory zone for them. I wanted there to be
2-3 binders per context, so we could cycle them to avoid the stalls
inherent in pinning two buffers to the same address in back-to-back
batches. But I figured I'd allow 100 binders just to be wildly
excessive/cautious.
What I didn't realize was that we need 2-3 binders per *context*,
and what I did was allocate 100 binders per *screen*. Web browsers,
for example, might have 1-2 contexts per tab, leading to hundreds of
contexts, and thus hundreds of binders.
To fix this, we stop allocating VMA for binders in bufmgr, and let
the binder handle it itself. Binders are per-context, and they can
assign context-local addresses for the buffers using a simple
ring-buffer-style approach. We only hold on to one binder BO at a
time, so we won't ever have a conflicting address.
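
A minimal sketch of that ring-buffer assignment, assuming a dedicated
per-context binder zone (the constants and fields below are
illustrative, not the real iris_binder layout):

    #define BINDER_ZONE_START  (1ull << 30)        /* illustrative */
    #define BINDER_ZONE_SIZE   (64ull * 1024 * 1024)
    #define BINDER_SIZE        (64ull * 1024)

    /* Only one binder BO is live at a time, so wrapping around the
     * zone can't hand out an address that's still in use. */
    static uint64_t
    binder_assign_address(struct iris_binder *binder)
    {
       if (binder->next_address < BINDER_ZONE_START ||
           binder->next_address + BINDER_SIZE >
           BINDER_ZONE_START + BINDER_ZONE_SIZE)
          binder->next_address = BINDER_ZONE_START;

       uint64_t addr = binder->next_address;
       binder->next_address += BINDER_SIZE;
       return addr;
    }
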
This fixes dEQP-EGL.functional.multicontext.non_shared_clear.
Huge thanks to Tapani Pälli for debugging this whole mess and
figuring out what was going wrong.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Some of them had typos, didn't say 'authors or copyright holders', or
had other mistakes. This is now the https://opensource.org/licenses/MIT
text, formatted consistently.