Alyssa Rosenzweig
acb10043cb
nvk: add instruction count exec property
...
Useful for shader-db. The instruction count isn't as simple as dividing the
code size, so it's worth reporting.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: M Henning <drawoc@darkrefraction.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30136 >
2024-07-16 15:13:40 +00:00
Faith Ekstrand
4030447dab
nak: gather instr count explicitly
...
The instruction count isn't as simple as dividing the code size, so we want a
real shader info property for nvk to consume. Plumb one through.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: M Henning <drawoc@darkrefraction.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30136 >
2024-07-16 15:13:40 +00:00
Alyssa Rosenzweig
67e3b3fbfd
nouveau/drm-shim: set ram_user
...
this is required for nvk to create a heap. fixes zink+nvk+drm-shim:
run: ../src/gallium/drivers/zink/zink_screen.c:3371: zink_internal_create_screen: Assertion `i == ZINK_HEAP_HOST_VISIBLE_COHERENT_CACHED || i == ZINK_HEAP_DEVICE_LOCAL_LAZY || i == ZINK_HEAP_DEVICE_LOCAL_VISIBLE' failed.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: M Henning <drawoc@darkrefraction.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30136 >
2024-07-16 15:13:40 +00:00
Daniel Schürmann
6723128e94
aco/spill: Don't add phi definitions to live-in variables
...
Changes are because we don't add artificial uses to the phi definitions anymore.
Totals from 13 (0.02% of 79395) affected shaders: (GFX10.3)
Instrs: 230510 -> 230285 (-0.10%); split: -0.10%, +0.00%
CodeSize: 1269916 -> 1268760 (-0.09%); split: -0.10%, +0.01%
SpillSGPRs: 2057 -> 2058 (+0.05%)
Latency: 2729731 -> 2723103 (-0.24%)
InvThroughput: 696888 -> 695286 (-0.23%)
VClause: 5795 -> 5768 (-0.47%)
SClause: 6855 -> 6858 (+0.04%)
Copies: 32336 -> 32275 (-0.19%); split: -0.22%, +0.03%
VALU: 151782 -> 151731 (-0.03%); split: -0.04%, +0.01%
SALU: 30766 -> 30758 (-0.03%); split: -0.03%, +0.01%
VMEM: 12157 -> 12078 (-0.65%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
bb5af6bede
aco: remove live-out variables from IR
...
Since we changed all passes to use the live-in variables,
these are not needed anymore.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
f86816ca85
aco/print_ir: print live-in instead of live-out variables
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
043ec096c1
aco/validate: use live-in variables for RA validation
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
976dd71942
aco/cssa: use live-in variables instead of live-out variables
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
c146d4b6b6
aco/spill: use live-in variables directly rather than computing them
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
162876c875
aco/ra: use live-in variables directly rather than computing them
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Daniel Schürmann
29262f8cf3
aco: compute live-in variables in addition to live-out variables
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30120 >
2024-07-16 14:00:49 +00:00
Mary Guillemard
9a4a03ec1f
ci/panfrost: Update t760 fails
...
Add a new failure found while running gles job manually
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30150 >
2024-07-16 13:10:56 +00:00
Mary Guillemard
32a4596d17
panfrost: Handle gracefully resource BO alloc failures
...
This makes panfrost_bo_alloc propagate the failure again instead of
asserting.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30150 >
2024-07-16 13:10:56 +00:00
Mary Guillemard
71a24a0c5e
panfrost: Handle context_init errors correctly
...
This fixes OpenCL CTS "multiple_device_context/test_multiples" failures.
Also improve create_context/destroy error management a bit.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30150 >
2024-07-16 13:10:56 +00:00
Mary Guillemard
668bde4421
pan/kmod: Avoid deadlock on VA allocation failure on panthor
...
Fixes: 97f6a62f7e ("pan/kmod: Add a backend for panthor")
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30150 >
2024-07-16 13:10:55 +00:00
Daniel Schürmann
ffef3d1709
nir/opt_sink: ignore loops without backedge
...
Loops without backedge should not be considered loops.
For RADV, 2069 (2.61% of 79395) affected shaders.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783 >
2024-07-16 12:29:08 +00:00
Daniel Schürmann
79875737cc
radv: use NIR loop invariant code motion pass
...
Totals from 3469 (4.37% of 79395) affected shaders: (GFX11)
MaxWaves: 78690 -> 78622 (-0.09%); split: +0.03%, -0.11%
Instrs: 11093592 -> 11092346 (-0.01%); split: -0.09%, +0.07%
CodeSize: 57979444 -> 58077232 (+0.17%); split: -0.12%, +0.29%
VGPRs: 257892 -> 258336 (+0.17%); split: -0.08%, +0.25%
SpillSGPRs: 2958 -> 2521 (-14.77%); split: -32.83%, +18.05%
Latency: 135247583 -> 134446992 (-0.59%); split: -0.61%, +0.02%
InvThroughput: 25654328 -> 25478620 (-0.68%); split: -0.73%, +0.05%
VClause: 244799 -> 244499 (-0.12%); split: -0.17%, +0.05%
SClause: 313323 -> 315081 (+0.56%); split: -0.40%, +0.96%
Copies: 835953 -> 842457 (+0.78%); split: -0.38%, +1.15%
Branches: 330136 -> 330210 (+0.02%); split: -0.03%, +0.05%
PreSGPRs: 193374 -> 200277 (+3.57%); split: -0.38%, +3.95%
PreVGPRs: 223947 -> 224227 (+0.13%); split: -0.02%, +0.15%
VALU: 6312413 -> 6314841 (+0.04%); split: -0.02%, +0.06%
SALU: 1222275 -> 1227329 (+0.41%); split: -0.26%, +0.67%
VMEM: 408421 -> 408412 (-0.00%)
SMEM: 430966 -> 430399 (-0.13%)
VOPD: 2482 -> 2440 (-1.69%); split: +0.44%, -2.14%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783 >
2024-07-16 12:29:08 +00:00
Daniel Schürmann
540ee1c81a
nir: implement loop invariant code motion (LICM) pass
...
This simple LICM pass hoists all loop-invariant instructions
from the loops' top-level control flow, skipping any nested CF.
The hoisted instructions are placed right before the loop.
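As a rough illustration of the behaviour described above, here is a minimal Python sketch of top-level-only hoisting. It is not the NIR implementation: the `Instr` and `Nested` types, the `PURE_OPS` set, and `hoist_invariants` are hypothetical names invented for this example, and it assumes SSA form (each name is defined exactly once).

```python
from dataclasses import dataclass

# Hypothetical set of side-effect-free ops that are safe to move (assumption).
PURE_OPS = {"add", "mul"}

@dataclass(frozen=True)
class Instr:
    dest: str
    op: str
    srcs: tuple

@dataclass
class Nested:
    # Stands in for nested control flow (an if or an inner loop).
    body: list

def hoist_invariants(preheader, loop_body):
    """Move invariant top-level instructions in front of the loop."""
    # Collect every name defined anywhere inside the loop, nested CF included.
    loop_defs = set()

    def collect(items):
        for item in items:
            if isinstance(item, Nested):
                collect(item.body)
            else:
                loop_defs.add(item.dest)

    collect(loop_body)

    hoisted, kept = [], []
    for item in loop_body:
        invariant = (
            isinstance(item, Instr)          # nested CF is skipped entirely
            and item.op in PURE_OPS          # phis and impure ops stay put
            and not any(s in loop_defs for s in item.srcs)
        )
        if invariant:
            hoisted.append(item)
            # Once hoisted, the result counts as defined outside the loop.
            loop_defs.discard(item.dest)
        else:
            kept.append(item)
    # Hoisted instructions land right before the loop, in original order.
    return preheader + hoisted, kept

loop = [
    Instr("i", "phi", ("i0", "i_next")),
    Instr("a", "mul", ("x", "y")),           # invariant
    Instr("b", "add", ("a", "c")),           # invariant once "a" is hoisted
    Nested([Instr("d", "mul", ("x", "x"))]), # invariant, but nested: skipped
    Instr("i_next", "add", ("i", "one")),    # depends on the phi
]
before_loop, body = hoist_invariants([], loop)
print([ins.dest for ins in before_loop])
```

In this sketch `d` stays inside the loop even though its sources are invariant, mirroring the "skipping any nested CF" restriction in the commit message.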
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28783 >
2024-07-16 12:29:08 +00:00
Alejandro Piñeiro
e18b54fa5d
drm-shim: stub syncobj_timeline_wait and query ioctl
...
Needed to avoid unhandled code DRM ioctl errors on some platforms when
using shader-db.
Reviewed-by: Eric Engestrom <eric@igalia.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30184 >
2024-07-16 11:17:59 +02:00
Dave Airlie
814a2da2f4
radv/video: advertise mutable/extended for dst video images.
...
This allows zink video to create planar image views if needed.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30203 >
2024-07-16 07:04:15 +00:00
Samuel Pitoiset
8863704c6b
radv/meta: add a helper to create descriptor set layout
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
3d322b787e
radv/meta: add a helper to create pipeline layout
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
c6a626e000
radv/meta: add a helper to create compute pipeline
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
bf3b2d2912
radv/meta: remove useless checks for NULL handles before destroying
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
4deb138e7d
radv/meta: remove unused number of rectangles for internal operations
...
It was always 1.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
ecd3bbf826
radv/meta: remove redundant check for hw resolve pipelines
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
76e4edefbf
radv/meta: remove unnecessary blit2d_dst_temps struct
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Samuel Pitoiset
e739d0e5bb
radv/meta: remove non-valuable comments
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30187 >
2024-07-16 06:17:07 +00:00
Yukari Chiba
6f02ec5ed1
llvmpipe: add an implementation with llvm orcjit
...
Reviewed-by: Dave Airlie <airlied@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26018 >
2024-07-16 12:22:29 +10:00
Yukari Chiba
0b69b8d0db
llvmpipe/tests: add a new test for multiple symbols for orc jit testing
...
Reviewed-by: Dave Airlie <airlied@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26018 >
2024-07-16 11:31:24 +10:00
Yukari Chiba
ba283c0d84
llvmpipe: add function name to gallivm_jit_function
...
Reviewed-by: Dave Airlie <airlied@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26018 >
2024-07-16 11:31:24 +10:00
Yukari Chiba
28530c3eaa
gallivm: add riscv support to the mattrs setting code
...
Reviewed-by: Dave Airlie <airlied@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26018 >
2024-07-16 09:41:41 +10:00
Yukari Chiba
465510a211
util: detect RISC-V architecture
...
Reviewed-by: Dave Airlie <airlied@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26018 >
2024-07-16 09:41:28 +10:00
Doug Brown
60488d6213
xa: add missing stride setup in renderer_draw_yuv
...
This fixes a problem observed in VMware VMs where Xv playback results in
a black screen instead of the actual video.
Signed-off-by: Doug Brown <doug@schmorgal.com >
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11490
Fixes: 7672545223 ("gallium: move vertex stride to CSO")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30116 >
2024-07-15 20:54:43 +00:00
Josh Simmons
1ced840632
radv: Add RADV_PROFILE_PSTATE envvar
...
Enable selecting the specific pstate to enter when using thread tracing
and when acquiring the profiling lock for performance queries.
Signed-off-by: Josh Simmons <josh@nega.tv >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30139 >
2024-07-15 20:32:01 +00:00
Alyssa Rosenzweig
bda1de89db
asahi: eliminate load_num_workgroups from TCS unrolled ID
...
honeykrisp doesn't want to implement this sysval, we don't need it here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
ae769727d8
libagx: handle VS/IA pipeline stats on GPU
...
This was an obnoxious bit of cheating we had in the gl4.6 driver that I added
literally the morning I passed gl4.6 cts, just to fix my last gl4.6 cts test.
It had an expiration date.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
1fbf2002e3
asahi: handle CS pipeline stat with indirect dispatch
...
no more stall here.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
bc4d38d4ed
libagx: add kernel for incrementing CS counter
...
for indirect dispatch
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
d26ae4f455
asahi,libagx: tessellate on device
...
Add OpenCL kernels implementing the tessellation algorithm on device. This is an
OpenCL C port of the D3D11 reference tessellator, originally written by
Microsoft in C++. There are significant differences compared to the CPU based
reference implementation:
* significant simplifications and clean up. The reference code did a lot of
  things in weird ways that would be inefficient on the GPU. I did a *lot* of
  work here to get good AGX assembly generated for the tessellation kernels ...
  the first attempts were quite bad! Notably, everything is carefully written to
  ensure that all private memory access is optimized out in NIR; the resulting
  kernels do not use scratch and do not spill on G13.
* prefix sum variants. To implement geom+tess efficiently, we need to first
  calculate the count of indices generated by the tessellator, then prefix sum
  that, then tessellate using the prefix sum results writing into 1 large index
  buffer for a single indirect draw. This isn't too bad, we already have most of
  the logic and the guts of the prefix sum kernel is shared with geometry
  shaders.
* VDM generation variant. To implement tess alone, it's fastest to generate a
  hardware Index List word for each patch, adding an appropriate 32-bit index
  bias to the dynamically allocated U16 index buffers. Then from the CPU, we
  have the illusion of a single draw to Stream Link with Return to. This
  requires packing hardware control words from the tessellator kernel.
  Fortunately, we have GenXML available so we just use agx_pack like we would in
  the driver.
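The count, prefix sum, then write flow from the second bullet can be sketched on the CPU as follows. This is only an illustration of the idea, not the libagx kernels: `index_offsets`, `tessellate_into_one_buffer`, and the `emit_indices` callback are hypothetical names, and the real work happens in parallel on the GPU rather than in a sequential loop.

```python
from itertools import accumulate

def index_offsets(counts):
    """Exclusive prefix sum over per-patch index counts.

    Returns (offsets, total): offsets[i] is where patch i starts writing in
    the shared index buffer, and total is the size of that buffer.
    """
    running = [0] + list(accumulate(counts))
    return running[:-1], running[-1]

def tessellate_into_one_buffer(counts, emit_indices):
    """Write every patch's indices into one buffer for a single draw.

    emit_indices(patch, n) is a stand-in for the tessellator and must return
    exactly n indices for the given patch.
    """
    offsets, total = index_offsets(counts)
    buffer = [None] * total
    for patch, (offset, n) in enumerate(zip(offsets, counts)):
        buffer[offset:offset + n] = emit_indices(patch, n)
    return buffer

# Three patches producing 3, 0 and 6 indices respectively.
offsets, total = index_offsets([3, 0, 6])
print(offsets, total)
```

Because each patch knows its own offset up front, patches can write without coordinating with one another, which is what makes a single large index buffer workable.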
Along the way, we pick up indirect tess support (this follows on naturally),
which gets rid of the other bit of tessellation-related cheating. Implementing
this requires reworking our internal agx_launch data structures, but that has
the nice side effect of speeding up GS invocations too (by fixing the workgroup
size).
Don't get me wrong. tessellator.cl is the single most unhinged file of my
career, featuring GenXML-based pack macros fed by dynamic memory allocation fed
by the inscrutable tessellation algorithm.
But it works *really* well.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
cc9b815efa
libagx: specify heap size explicitly
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
a82c0211e7
asahi: tuck in null query check
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
bce466586e
asahi: make agx_pack opencl compatible
...
we don't want generic pointers here to keep things happy. also rename
CONSTANT to avoid collisions
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:09:00 +00:00
Alyssa Rosenzweig
9624b86af0
asahi: drop stale comment
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:08:59 +00:00
Alyssa Rosenzweig
1d4f0d3002
asahi: drop old comment
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051 >
2024-07-15 20:08:59 +00:00
Alyssa Rosenzweig
e8b673a109
agx: do not flush denorms for fp16 fmin/fmax
...
total instructions in shared programs: 2164639 -> 2158940 (-0.26%)
instructions in affected programs: 319475 -> 313776 (-1.78%)
helped: 1200
HURT: 6
Instructions are helped.
total alu in shared programs: 1690198 -> 1684653 (-0.33%)
alu in affected programs: 272173 -> 266628 (-2.04%)
helped: 1181
HURT: 6
Alu are helped.
total fscib in shared programs: 1686497 -> 1680797 (-0.34%)
fscib in affected programs: 272922 -> 267222 (-2.09%)
helped: 1200
HURT: 6
Fscib are helped.
total bytes in shared programs: 14334550 -> 14300314 (-0.24%)
bytes in affected programs: 2075546 -> 2041310 (-1.65%)
helped: 1200
HURT: 6
Bytes are helped.
total regs in shared programs: 662332 -> 662302 (<.01%)
regs in affected programs: 1103 -> 1073 (-2.72%)
helped: 14
HURT: 15
Inconclusive result (value mean confidence interval includes 0).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075 >
2024-07-15 19:29:00 +00:00
Alyssa Rosenzweig
6ac289dade
agx: set lower_fminmax_signed_zero
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075 >
2024-07-15 19:29:00 +00:00
Alyssa Rosenzweig
d238d766c6
nir: add lower_fminmax_signed_zero
...
This implements IEEE-754-2019 signed zero semantics for fmin/fmax, as now
required by NIR, for hardware that has busted signed zero behaviour for
fmin/fmax. Ian expressed interest in this for Intel.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075 >
2024-07-15 19:29:00 +00:00
Alyssa Rosenzweig
0e46f7b39a
nir/lower_alu: remove dead #define
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075 >
2024-07-15 19:29:00 +00:00
Alyssa Rosenzweig
4ab3d95c11
nir/lower_double_ops: handle signed zero with min/max
...
Ensure the following identities hold to match IEEE-754-2019 and upcoming NIR:
min(-0, +0) = -0
min(+0, -0) = -0
max(-0, +0) = +0
max(+0, -0) = +0
NVK uses this lowering. In a simple compute shader using fmin64 on an SSBO with
signed zero preserve required, testing the effect of this patch, the instruction
count goes from 47->52. Obviously I'm not thrilled by that but I also couldn't
find any obvious way of mitigating the issue. (Maybe NVIDIA has special hardware
support here. By instruction count, lowering all the way to int64 is a loss,
though I don't know how to count cycles on NVIDIA.)
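The four identities above can be checked with a small Python sketch. This is not the NIR lowering itself: `fmin_signed_zero` and `fmax_signed_zero` are hypothetical helpers written for illustration, and NaN inputs are deliberately left out of scope.

```python
import math

def is_negative(x):
    # copysign exposes the sign bit, which a plain comparison cannot:
    # in Python, -0.0 == 0.0 is True.
    return math.copysign(1.0, x) < 0

def fmin_signed_zero(a, b):
    """min that treats -0 as smaller than +0 (NaN not handled)."""
    if a == 0.0 and b == 0.0:
        return a if is_negative(a) else b
    return min(a, b)

def fmax_signed_zero(a, b):
    """max that treats +0 as larger than -0 (NaN not handled)."""
    if a == 0.0 and b == 0.0:
        return b if is_negative(a) else a
    return max(a, b)

for a, b in [(-0.0, 0.0), (0.0, -0.0)]:
    print(a, b, fmin_signed_zero(a, b), fmax_signed_zero(a, b))
```

The extra zero check is the kind of work that accounts for added instructions when hardware lacks native support for these semantics.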
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30075 >
2024-07-15 19:29:00 +00:00