This patch has multiple goals:
1. Off-load the writing of records in 'always' mode to another thread
for performance.
2. Allow using ddebug with threaded contexts. This really forces us to
move some of the "after_draw" handling into another thread.
3. Simplify the different modes of ddebug, both in the code and in
the user interface, i.e. GALLIUM_DDEBUG. In particular, there's
no 'pipelined' anymore, since we're always pipelined; and 'noflush'
is replaced by 'flush', since we no longer flush by default.
4. Fix the fences in pipelining mode. They previously relied on writes
via pipe_context::clear_buffer. However, on radeonsi, those could
(quite reasonably) end up in the SDMA buffer. So we use the newly
added PIPE_FLUSH_{TOP,BOTTOM}_OF_PIPE fences instead.
5. Improve pipelined mode overall, using the finer grained information
provided by the new fences.
Overall, the result is that pipelined mode should be more useful, and
using ddebug in default mode is much less invasive, in the sense that
it changes the overall driver behavior less (which is kind of crucial
for a driver debugging tool).
An example of the new hang debug output:
Gallium debugger active.
Hang detection timeout is 1000ms.
GPU hang detected, collecting information...
Draw # driver prev BOP TOP BOP dump file
-------------------------------------------------------------
2 YES YES YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000000
3 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000001
4 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000002
5 YES NO YES NO /home/nha/ddebug_dumps/shader_runner_19919_00000003
Done.
We can see that there were almost certainly 4 draws in flight when
the hang happened: the top-of-pipe fence was signaled for all 4 draws,
the bottom-of-pipe fence for none of them. In virtually all cases,
we'd expect the first draw in the list to be at fault, but due to the
GPU parallelism, it's possible (though highly unlikely) that one of
the later draws causes a component to get stuck in a way that prevents
the earlier draws from making progress as well.
(In the above example, there were actually only 3 draws truly in flight:
the last draw is a blit that waits for the earlier draws; however, its
top-of-pipe fence is emitted before the cache flush and wait, and so
the fact that the draw hasn't truly started yet can only be seen from a
closer inspection of GPU state.)
Acked-by: Marek Olšák <marek.olsak@amd.com>
If the last operation happens to be a non-draw, such as a
transfer_map that triggers a decompress blit, there may be
interesting messages left in the driver log.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is mostly mechanical search-and-replace, plus touching up the
macros in u_dump_defines.c manually a bit.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
NIR shaders are not captured properly in pipelined mode currently. This
would require shader cloning, which requires linking all the Gallium
drivers against NIR. We can always do that later.
v2: avoid immediate crashes in pipelined mode
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
pipe_draw_info::indexed is replaced with index_size. index_size == 0 means
non-indexed.
Instead of pipe_index_buffer::offset, pipe_draw_info::start is used.
For indexed indirect draws, pipe_draw_info::start is added to the indirect
start. This is the only case when "start" affects indirect draws.
pipe_draw_info::index is a union. Use either index::resource or
index::user depending on the value of pipe_draw_info::has_user_indices.
v2: fixes for nine, svga
This was made unnecessary with fd33a6bcd7.
This was mostly done with:
find ./src -type f -exec sed -i -- \
's:PIPE_THREAD_ROUTINE(\([^,]*\), \([^)]*\)):int\n\1(void \*\2):g' {} \;
With some small manual tidy ups.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
pipe_mutex_unlock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_unlock(\([^)]*\)):mtx_unlock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
replace pipe_mutex_lock() was made unnecessary with fd33a6bcd7.
Replaced using:
find ./src -type f -exec sed -i -- \
's:pipe_mutex_lock(\([^)]*\)):mtx_lock(\&\1):g' {} \;
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
required by glClientWaitSync (GL 4.5 Core spec) that can optionally flush
the context
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
For good performance while being able to generate decent hang reports.
The report doesn't contain the parsed IB and the buffer list, but it
isolates the draw call and dumps shaders while not having to flush
the context.
This is for GPU hangs that are harder to reproduce and require interactive
playing for minutes or even hours.
dd_pipe.h explains some implementation details. Initializing, copying
(recording) and clearing states is most of the code.
The performance should be at least 50% of the normal performance depending
on the circumstances. (i.e. 50% is expected to be the worst case scenario,
not the best case) The majority of time is spent in
dump_debug_state(PIPE_DUMP_CURRENT_SHADERS) and that's after all
the optimizations in later patches. There is no obvious way to optimize
that further.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The pipelined hang detection mode will not want to dump everything.
(and it's also time consuming) It will only dump shaders after a draw call
and then dump the status registers separately if a hang is detected.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This currently just writes out the name of dump files, which can be useful
to easily correlate those files with other log outputs (driver debug output,
apitrace calls, etc.)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This changes the default behavior of 'always' mode to be consistent with
hang detection mode.
I have used this to more easily compare dumped command streams using diff.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This helps in the use of GALLIUM_DDEBUG_SKIP: first run a target application
with skip set to a very large number and note how many draw calls happen
before the bug. Then re-run, skipping the corresponding number of calls.
Despite the additional run, this can still be much faster than not skipping
anything.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
When we know that hangs occur only very late in a reproducible run (e.g.
apitrace), we can save a lot of debugging time by skipping the flush and hang
detection for earlier draw calls.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
v2: lots of improvements
This is like identity or trace, but simpler. It doesn't wrap most states.
Run with:
GALLIUM_DDEBUG=1000 [executable]
where "executable" is the app and "1000" is in miliseconds, meaning that
the context will be considered hung if a fence fails to signal in 1000 ms.
If that happens, all shaders, context states, bound resources, draw
parameters, and driver debug information (if any) will be dumped into:
/home/$username/dd_dumps/$processname_$pid_$index.
Note that the context is flushed after every draw/clear/copy/blit operation
and then waited for to find the exact call that hangs.
You can also do:
GALLIUM_DDEBUG=always
to do the dumping after every draw/clear/copy/blit operation without
flushing and waiting.
Examples of driver states that can be dumped are:
- Hardware status registers saying which hw block is busy (hung).
- Disassembled shaders in a human-readable form.
- The last submitted command buffer in a human-readable form.
v2: drop pipe-loader changes, drop SConscript
rename dd.h -> dd_pipe.h
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>