While a resource might be shared across different contexts all synchronization
of commands is the resposibility of the user (OpenGL spec Chapter 5 "Shared
Objects and Multiple Contexts").
Currently etnaviv tries to be extremely helpful by flushing foreign contexts
when using a resource that is still pending there. This introduces a lot of
issues, as context flushes can now happen at basically any time and also
introduces a lot of overhead due to the needed tracking and locking. Get rid
of all this cross-context tracking and flushing.
The only real requirement here is that we need to track pending resources
without mutating the state of the etna_resource, as this might be shared
across multiple contexts and thus be used by multiple threads concurrently.
Introduce a hash table to track the current pending resources and their
states in the local context.
A side-effect of this change is that we don't need to keep all used
resources referenced until context flush time, as we don't need to mutate
them anymore to track the status. This allows to free some of them a bit
earlier. Note that this introduces a small possibility of a new resource
matching the key of a already destroyed resource still in the
pending_resources hashtable, but even if we get such a collision the worst
outcome is a not strictly necessary flush of the local context being
performed, which is acceptable if it doesn't happen very often.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
This might be called from multiple threads at the same time. To avoid
taking a global lock just to guard against the fairly low chance of
multiple threads calling this on the same BO at the same time, we allow
for the threads to race. All threads will set up a mapping, but only
the first thread is able to set the map member of the etna_bo, all other
threads just roll back and use the mapping set up by the winning thread.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
The mmap offset is the only information we currently get from
DRM_ETNAVIV_GEM_INFO and there is no point in storing this
offset after the mapping has been established. Reduce the
shared mutable state on the etna_bo by inlining fetching the
offset into etna_bo_map.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
Currently the buffer index hash is only used if the BO is used in
multiple streams and the current index is cached on the BO. This
introduces some shared state on the BO, which necessitates the use
of a lock to keep this state consistent across threads, which
negates some of the benefits of caching the index.
Always use the hash to keep track of the submit BOs, to get rid
of the shared state and simplify the code.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14466>
The LLVM backend is not officially supported by the RADV developers,
but it has been useful early during bring-up, or later when users are
experiencing what looks like a compiler bug. It is thus beneficial to
keep it working.
However, maintaining the vkcts expectations for every platform requires
more work and machine time than what we would like to commit to. This
is why we agreed that we would only keep LLVM tested on the latest
family of Radeon GPUs.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16841>
Most CI tests are currently running on LLVM 11 (released over 2 years
ago), which predates some of the GPUs we have in CI and prevents
testing RADV's LLVM backend.
LLVM 13 is known to work for RADV, released almost 8 months ago, and
is already available in most distributions. Fedora 36 is even already
on LLVM 14.
So this commit updates x86 testing on llvm 13.
v2:
- store the llvm apt repo key locally (Michel Dänzer)
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17188>
This avoids using a VGPR and uses two SGPRs
instead since we only need to store 2 bits.
Quake II RTX:
Totals from 7 (0.46% of 1513) affected shaders:
CodeSize: 229364 -> 229148 (-0.09%); split: -0.12%, +0.02%
Instrs: 41937 -> 41879 (-0.14%)
Latency: 977374 -> 976723 (-0.07%)
InvThroughput: 651582 -> 651148 (-0.07%)
Copies: 5064 -> 5033 (-0.61%)
PreSGPRs: 430 -> 433 (+0.70%)
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17293>
It was previously done dzn_cmd_buffer_flush_transition_barriers(),
leaving the queue+flush case unhandled. Let's fix that by moving
this piece of code to dzn_cmd_buffer_exec_transition_barriers().
Fixes: 35356b1173 ("dzn: Cache and pack transition barriers")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17314>
While we've taken advantage of split-sends in select situations, there
are many other cases (such as sampler messages, framebuffer writes, and
URB writes) that have never received that treatment, and continued to
use monolithic send payloads.
This commit introduces a new optimization pass which detects SEND
messages with a single payload, finds an adjacent LOAD_PAYLOAD that
produces that payload, splits it two, and updates the SEND to use both
of the new smaller payloads.
In places where we manually used split SENDS, we rely on underlying
knowledge of the message to determine a natural split point. For
example, header and data, or address and value.
In this pass, we instead infer a natural split point by looking at the
source registers. Often times, consecutive LOAD_PAYLOAD sources may
already be grouped together in a contiguous block, such as a texture
coordinate. Then, there is another bit of data, such as a LOD, that
may come from elsewhere. We look for the point where the source list
switches VGRFs, and split it there. (If there is a message header, we
choose to split there, as it will naturally come from elsewhere.)
This not only reduces the payload sizes, alleviating register pressure,
but it means that we may be able to eliminate some payload construction
altogether, if we have a contiguous block already and some extra data
being tacked on to one side or the other.
shader-db results for Icelake are:
total instructions in shared programs: 19602513 -> 19369255 (-1.19%)
instructions in affected programs: 6085404 -> 5852146 (-3.83%)
helped: 23650 / HURT: 15
helped stats (abs) min: 1 max: 1344 x̄: 9.87 x̃: 3
helped stats (rel) min: 0.03% max: 35.71% x̄: 3.78% x̃: 2.15%
HURT stats (abs) min: 1 max: 44 x̄: 7.20 x̃: 2
HURT stats (rel) min: 1.04% max: 20.00% x̄: 4.13% x̃: 2.00%
95% mean confidence interval for instructions value: -10.16 -9.55
95% mean confidence interval for instructions %-change: -3.84% -3.72%
Instructions are helped.
total cycles in shared programs: 848180368 -> 842208063 (-0.70%)
cycles in affected programs: 599931746 -> 593959441 (-1.00%)
helped: 22114 / HURT: 13053
helped stats (abs) min: 1 max: 482486 x̄: 580.94 x̃: 22
helped stats (rel) min: <.01% max: 78.92% x̄: 4.76% x̃: 0.75%
HURT stats (abs) min: 1 max: 94022 x̄: 526.67 x̃: 22
HURT stats (rel) min: <.01% max: 188.99% x̄: 4.52% x̃: 0.61%
95% mean confidence interval for cycles value: -222.87 -116.79
95% mean confidence interval for cycles %-change: -1.44% -1.20%
Cycles are helped.
total spills in shared programs: 8387 -> 6569 (-21.68%)
spills in affected programs: 5110 -> 3292 (-35.58%)
helped: 359 / HURT: 3
total fills in shared programs: 11833 -> 8218 (-30.55%)
fills in affected programs: 8635 -> 5020 (-41.86%)
helped: 358 / HURT: 3
LOST: 1 SIMD16 shader, 659 SIMD32 shaders
GAINED: 65 SIMD16 shaders, 959 SIMD32 shaders
Total CPU time (seconds): 1505.48 -> 1474.08 (-2.09%)
Examining these results: the few shaders where spills/fills increased
were already spilling significantly, and were only slightly hurt. The
applications affected were also helped in countless other shaders, and
other shaders stopped spilling altogether or had 50% reductions. Many
SIMD16 shaders were gained, and overall we gain more SIMD32, though many
close to the register pressure line go back and forth.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
SEND messages with EOT need to use g112-g127 for their sources so that
the hardware is able to launch new threads while old ones are finishing
without worrying about register overlap when pushing payloads. For the
newer split-send messages, this applies to both source registers.
Our special case for this in the register allocator was only considering
the first source. This wasn't a problem because we hadn't ever tried to
use split-sends with EOT before. However, my new optimization pass is
going to introduce some shortly, so we'll need to handle them properly.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17018>
We had been using thread_local index -> opcode_desc tables to avoid
plumbing through a storage location throughout all the code. But now
we have done so with the new brw_isa_info structure. So we can just
store the tables there, and initialize it with the compiler.
This fixes crashes in gtk4-demo on iris, and should help with some
programs on zink as well. Something was going wrong with the
thread_local variables not being set up correctly. While we might be
able to work around that issue, there's really no advantage to storing
these lookup tables in TLS (beyond it being simpler to do originally).
So let's simply stop doing so.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6728
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6229
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>
This patch creates a new header file, brw_isa_info.h, which will
contains all the functions related to opcode encoding on various
generations. Opcode numbers may have different meanings on different
hardware, so we remap them between an enum we can easily work with
and the hardware encoding.
We move the brw_inst setters and getters to brw_inst.h.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17309>