The shader won't be available for deserialized variants, so we need to
include all the info we need for compiling variants to be in the
variant. Most of the things we dug out of the shader were various bits
from nir_shader_info which we move into ir3_shader_variant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16147>
Saves per-batch allocations, avoids reallocation for various vertex
counts, and avoids needing the indirect tess addrs constobj so that we
could emit the relocs to the tess BO after we'd emitted all the draws.
Also apparently it fixes one of our CTS fails.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13851>
Use clangs thread-safety annotations to implement a virtual lock
protecting context fields that should only be accessed from driver-
thread. This should let the compiler help us detect problems where
ctx is used unsafely from things that could be called by the fe/st
thread.
This does end up sprinkled far and wide, it would be nice if the
compiler could be a bit smarter about understanding call-graphs
(at least with static fxns), but at least it makes it clear where
things are called from which thread.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9061>
Right now if the shader indirects on some large constant array, we see NIR
load_consts (usually from the const file) of its contents into general
registers, then indirection on the GPRs. This often results in register
allocation failures, as it's easy to go beyond the ~256 dwords of
registers per invocation.
By moving the large constants to a UBO, we can load an arbitrary number of
them. They also can be theoretically moved to the constant reg file (~2k
dwords), though you're unlikely to hit this path without an indirect load
on your large constant, and we don't yet let UBO indirect loads get moved
to constant regs.
This possibly won't work out right if we have 16-bit load_constants, but
without other MRs in flight we won't see 16-bit temps to be lowered to
this.
This allows 2 kerbal-space-program shaders to compile that previously
would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
dynamically index a 40-element temporary array of float/vec2/vec3/vec4
with constant element initializers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
They're almost entirely split by whether you're uploading user buffer or
from a BO. While I'm rewriting the API, drop the emit_const ->
fdN_emit_const wrapper in favor of a #define before the header and a
little helper for the asserts.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5990>
The only difference between this and `const_state->num_ubos` was that
the latter is counting # of ubos loaded via `ldg` (based on UBO addrs
in push-consts). But turns out there isn't really any reason to care.
Instead just add an early return in the one code-path that cares about
the number of `ldg` UBOs.
This gets rid of one more thing we need to move from `ir3_shader` to
`ir3_shader_variant`.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
We are going to want to move this back to the variant, and come up with
a different strategy for binning/nonbinning to share the same constant
layout, in order to implement shader-cache support. (Since then we
can have a mix of dynamically compiled variants and cache hits, so there
is no good place to serialize the const-state.)
To reduce the churn as we re-arrange things, move direct access to the
const-state to a helper fxn. This patch is the boring churny part.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5508>
It turns out the GL uniforms file is larger than the hardware constant
file, so we need to limit how many UBOs we lower to constbuf loads. To do
actual UBO loads, we'll need to be able to upload UBO 0's pointer or
descriptor.
No difference on nohw 1 UBO update drawoverhead case (n=35).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
For now we never ask to set up UBO 0 as a real UBO, so this doesn't
trigger, but it gets us ready for handling the case where UBO 0 is too big
to be push constants in the HW.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5273>
It saves addressing math, but may cause multiple loads to be done and
bcseled due to NIR not giving us good address alignment information
currently. I don't have any workloads I know of using non-const-uploaded
UBOs, so I don't have perf numbers for it
This makes us match the GLES blob's behavior, and turnip (other than being
bindful).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4858>
Drop vfunc callbacks for per-gen packet emit, and instead have a header
that is #include'd once per gen.
We'll end up with multiple copies of some of this, but since we never
have multiple gen's of adreno on a single device, only one copy will be
paged in (and hopefully in the I-cache for hot-paths)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>
In order to inline the const emit and drop the per-gen vfuncs to emit
the correct sort of packet, we should consolidate all of the entry-
points to const emit in one object file, otherwise we'll end up with
multiple copies per gen.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4813>