Prior to this change, agx_opt_cse is our most expensive backend pass, due to the
time spent hashing instructions. hash_instr was calling into XXH32 a massive
number of times, often to hash only a single bit. It's much faster to hash
entire blocks of memory at a time. Optimize to do just that.
With this change, agx_opt_cse is now cheaper than instruction selection as
it should be.
No shader-db changes (except CPU time decrease).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
If a render target isn't written to, we don't use the sample mask. Avoid
generating the intermediate instructions, common with gl_FragColor. It will get
DCE'd, but this means less work for DCE, which should help for shader jank since
this pass gets called per-variant.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
We can end up logging both the buffer that the toplevel cmdstream is
allocated, as well as the sub-allocated part of that buffer. Possibly
the kernel could do better about this, but to avoid undecodeable
cmdstream dumps and devcores, detect this case and deal with it.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20496>
Some CTS tests enable all extensions ... , which combined with having
no shader cache on some platforms results in some CTS tests timing
out (in particular tests recreating the device all the time).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20422>
Nothing significant in shader-db on RV530:
total instructions in shared programs: 134963 -> 134957 (<.01%)
instructions in affected programs: 1108 -> 1102 (-0.54%)
helped: 7
HURT: 1
total temps in shared programs: 17153 -> 17154 (<.01%)
temps in affected programs: 38 -> 39 (2.63%)
helped: 2
HURT: 3
Just some fluctuations from pair scheduling due to different code order.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
Negligible amount of instructions saved on RV530:
total instructions in shared programs: 134970 -> 134963 (<.01%)
instructions in affected programs: 2273 -> 2266 (-0.31%)
helped: 9
HURT: 1
The one hurt shader is when we fail to recognize the x - ffract(x)
pattern and skip the don't emit ftrunc optimization as implemented
in the previous patch due to some non-trivial swizzles going on.
Signed-off-by: Pavel Ondračka <pave.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Filip Gawin <filip@gawin.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
We already skip emitting ftrunc in nir_lower_int_to_float when there is
ffloor, fround or any other integer-making opcode preceding f2i32. However
if lower_ffloor is set for driver that doesn't support integers, the lowered
x - ffract(x) patterns would not be recognized and extra ftruct would be
emitted, doing unnecessary rounding.
This optimization only works if there is no non-trivial swizzling used for
the fadd, fneg and ffract involved, which seems to be 99% of the cases according
to my testing.
This is needed to enable nir ffloor lowering on r300 driver without regressions.
I'm not sure if this helps anybody else, the only hardware which sets
lower_ffloor and converts ints to floats (and can't do trunc) are some old
etnaviv cards, so maybe it will help there a bit.
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20208>
dangling_attr_ref=true can be set when the following happens:
glBegin(GL_TRIANGLES)
glVertex(...)
glVertex(...)
glColor4(...)
glVertex(...)
When glColor4 is hit, the first 2 vertices are copied to the vertex_store
by upgrade_vertex, but since this is done before glColor4 new values are
copied, we make a note to fixup these attribute laters using dangling_attr_ref.
This causes very slow rendering. What this commit does instead, is in this
situation, the new attribute value are backported to the vertex store for the
copied vertices after upgrade_vertex is done updating the layout.
This avoids the slow corner case.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7912
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20495>