Split the write desc helpers in two halves, one taking a descriptor
offset directly, and the other one taking a descriptor set pointer.
This will allow us to pre-calculate descriptor offsets when creating
a descriptor_update template and speed up a bit the write step in that
case.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15691>
Let's be consistent with other helpers taking a dzn_descriptor_set_ptr
object and prefix all such functions with dzn_descriptor_set_ptr_.
We also rename dzn_descriptor_set_ptr_get_desc_vk_type() into
dzn_descriptor_set_ptr_get_vk_type() to shorten it a bit.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15691>
GL spec states that the stride for indirect multidraws:
* cannot be negative
* can be zero
* must be a multiple of 4
some drivers can't support strides which are not a multiple of the
size of the indirect struct being used, however, so rewrite those to
direct draws
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15963>
Towards the renderer, venus better uses VK_EXT_image_drm_format_modifier
to force linear with tiling modifier and mod_linear. Doing so won't make
any difference on the mesa implementations we care about given we have
required VK_EXT_image_drm_format_modifier for wsi support.
A lucky side effect of this is to allow common wsi to work with host
implementations not supporting dma_buf export.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15993>
this was an attempt to minimize the number of xfb barriers being emitted,
but really xfb barriers need to always be emitted in order for xfb to work
cc: mesa-stable
fixes (nv):
KHR-GL46.texture_view.reference_counting
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-multiple-buffers-per-stream
KHR-GL46.transform_feedback_overflow_query_ARB.multiple-streams-one-buffer-per-stream
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
this was well-documented, but ultimately wrong: the synchronization
being used was for binding streamout buffers (not counter buffers) as
vertex buffers, which was already handled just fine in the normal
vertex buffer binding
drawing from streamout ONLY uses the counter buffer, which means
the counter buffer needs to be synchronized for reading
cc: mesa-stable
fixes (nv):
KHR-GL46.transform_feedback.draw_xfb_feedbackk_test
KHR-GL46.transform_feedback.draw_xfb_instanced_test
KHR-GL46.transform_feedback.draw_xfb_stream_instanced_test
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16065>
I had some workarounds in ALU op emits trying to fix up when we were asked
to store to unsupported channels when the ALU op had 64bit srcs (so only
vec2 supported) but a 32-bit dest with a >vec2 writemask.
Those workarounds had some bugs breaking 64-bit uniform initializer tests
on virgl, and also set up too wide of a writemask such that they triggered
assertion failures on nvc0. We can avoid the need for those workarounds
at emit time by just having nir_lower_vec_to_movs not generate unsupported
writemasks in the first place.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15934>
Marge jobs are failing at their 1 hour timeout regularly because windows
CI lacks capacity. In the job I looked at, this test took 18 minutes,
which is surely contributing to the load. Cut it down to get us some hope
of getting MRs through that run windows jobs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16062>
There's no hardware instructions for them until then. These chips don't
expose the extension provinding the GLSL builtins for operations like
bfrev, but NIR can recognize the construct and optimize it to
bitfield_reverse, which pre-nvc0 would then fail to codegen. Prevents a
regression when moving to nir-to-tgsi. Other lower_bitfield flags are set
as well for when someone comes along and adds optimizations for them too.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
Also adjust the lowering pass to handle wide SSBO loads that we now emit
for the nir case.
This improves generated code quality since memoryopt can't
merge SSBO loads that end up predicated on a bounds check.
This also happens to fix a few test cases, only because the simpler generated
IR is less likely to trigger other compiler bugs. Eg on kepler with
NV50_PROG_USE_NIR=1, this fixes
arb_gpu_shader_fp64-fs-non-uniform-control-flow-ubo
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16063>
For !8044 I'm working on getting all drivers to accept NIR. The NIR
compiler in the driver is apparently not quite ready, so use NIR-to-TGSI
instead. This is a net win in testcases working on my RV770 and Turks
cards (especially in some important piglit tests involving YUV dma-buf
decode), though it's not regression-free.
shader-db (R600):
total dw in shared programs: 8553412 -> 8358918 (-2.27%)
dw in affected programs: 7476702 -> 7282208 (-2.60%)
total gprs in shared programs: 217286 -> 213217 (-1.87%)
gprs in affected programs: 72747 -> 68678 (-5.59%)
total loops in shared programs: 398 -> 330 (-17.09%)
loops in affected programs: 68 -> 0
total cf in shared programs: 558835 -> 332768 (-40.45%)
cf in affected programs: 420475 -> 194408 (-53.76%)
shader-db (Turks):
total dw in shared programs: 14104598 -> 13556782 (-3.88%)
dw in affected programs: 12161972 -> 11614156 (-4.50%)
total gprs in shared programs: 321068 -> 313690 (-2.30%)
gprs in affected programs: 114899 -> 107521 (-6.42%)
total loops in shared programs: 736 -> 651 (-11.55%)
loops in affected programs: 111 -> 26 (-76.58%)
total cf in shared programs: 925771 -> 581226 (-37.22%)
cf in affected programs: 678600 -> 334055 (-50.77%)
total stack in shared programs: 27853 -> 27855 (<.01%)
stack in affected programs: 5 -> 7 (40.00%)
glmark2 terrain: 0.137649% +/- 0.0511938% (n=6)
glmark2 jellyfish: no change (n=8)
unigine valley (extreme) 5.36 -> 5.45 (n=1 it takes so long to run)
unigine heaven (basic) 16.13 -> 16.15 (n=1)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
KHR-GL33.shaders.indexing.tmp_array.vertexid regressed with the switch to
NIR-to-TGSI because the shader got optimized enough to emit a read just
after writing to the array (the kind of situation where a non-rel write
would have been followed by a PV/PS read). The R600 and EG docs say you
always need to do this, but apparently some hardware gives you the right
answer anyway so we don't flag it on all of them.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14319>
We are underprovisioned for Windows, almost certainly not running wide
enough on the insufficient number of slots we do have, and there are
also indications that the machine itself is having physical issues.
Disable it until it's fixed.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16055>