This fixes#9807 but I don't understand why.
Emitting cache flushes before VGT_PRIMITIVE_TYPE is what makes
the problem go away but changing the order in si_draw() is clearer.
The only cases where sctx->flags is modified in si_emit_draw_registers
is handled using si_emit_cache_flush_direct so we can move cache
flushing up without any addtional conditionals.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9807
Fixes: 1e4b539042 ("radeonsi: handle deferred cache flushes as a state (si_atom)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27095>
(cherry picked from commit 0e16da89fe)