Lionel Landwerlin
8ac7802ac8
brw: move final send lowering up into the IR
...
Because we emit the final send message form in code generation, a
lot of emissions look like this:
add(8) vgrf0, u0, 0x100
mov(1) a0.1, vgrf0 # emitted by the generator
send(8) ..., a0.1
By moving address register manipulation into the IR, we can get this
down to:
add(1) a0.1, u0, 0x100
send(8) ..., a0.1
This reduces register pressure around some send messages by 1 vgrf.
All lost shaders in the below results are fragment SIMD32, due to the
throughput estimator. If turned off, we lose no SIMD32 shaders with
this change.
DG2 results:
Assassin's Creed Valhalla:
Totals from 2044 (96.87% of 2110) affected shaders:
Instrs: 852879 -> 832044 (-2.44%); split: -2.45%, +0.00%
Subgroup size: 23832 -> 23824 (-0.03%)
Cycle count: 53345742 -> 52144277 (-2.25%); split: -5.08%, +2.82%
Spill count: 729 -> 554 (-24.01%); split: -28.40%, +4.39%
Fill count: 2005 -> 1256 (-37.36%)
Scratch Memory Size: 25600 -> 19456 (-24.00%); split: -32.00%, +8.00%
Max live registers: 116765 -> 115058 (-1.46%)
Max dispatch width: 19152 -> 18872 (-1.46%); split: +0.21%, -1.67%
Cyberpunk 2077:
Totals from 1181 (93.43% of 1264) affected shaders:
Instrs: 667192 -> 663615 (-0.54%); split: -0.55%, +0.01%
Subgroup size: 13016 -> 13032 (+0.12%)
Cycle count: 17383539 -> 17986073 (+3.47%); split: -0.93%, +4.39%
Spill count: 12 -> 8 (-33.33%)
Fill count: 9 -> 6 (-33.33%)
Dota2:
Totals from 173 (11.59% of 1493) affected shaders:
Cycle count: 274403 -> 280817 (+2.34%); split: -0.01%, +2.34%
Max live registers: 5787 -> 5779 (-0.14%)
Max dispatch width: 1344 -> 1152 (-14.29%)
Hitman3:
Totals from 5072 (95.39% of 5317) affected shaders:
Instrs: 2879952 -> 2841804 (-1.32%); split: -1.32%, +0.00%
Cycle count: 153208505 -> 165860401 (+8.26%); split: -2.22%, +10.48%
Spill count: 3942 -> 3200 (-18.82%)
Fill count: 10158 -> 8846 (-12.92%)
Scratch Memory Size: 257024 -> 223232 (-13.15%)
Max live registers: 328467 -> 324631 (-1.17%)
Max dispatch width: 43928 -> 42768 (-2.64%); split: +0.09%, -2.73%
Fortnite:
Totals from 360 (4.82% of 7472) affected shaders:
Instrs: 778068 -> 777925 (-0.02%)
Subgroup size: 3128 -> 3136 (+0.26%)
Cycle count: 38684183 -> 38734579 (+0.13%); split: -0.06%, +0.19%
Max live registers: 50689 -> 50658 (-0.06%)
Hogwarts Legacy:
Totals from 1376 (84.00% of 1638) affected shaders:
Instrs: 758810 -> 749727 (-1.20%); split: -1.23%, +0.03%
Cycle count: 27778983 -> 28805469 (+3.70%); split: -1.42%, +5.12%
Spill count: 2475 -> 2299 (-7.11%); split: -7.47%, +0.36%
Fill count: 2677 -> 2445 (-8.67%); split: -9.90%, +1.23%
Scratch Memory Size: 99328 -> 89088 (-10.31%)
Max live registers: 84969 -> 84671 (-0.35%); split: -0.58%, +0.23%
Max dispatch width: 11848 -> 11920 (+0.61%)
Metro Exodus:
Totals from 92 (0.21% of 43072) affected shaders:
Instrs: 262995 -> 262968 (-0.01%)
Cycle count: 13818007 -> 13851266 (+0.24%); split: -0.01%, +0.25%
Max live registers: 11152 -> 11140 (-0.11%)
Red Dead Redemption 2:
Totals from 451 (7.71% of 5847) affected shaders:
Instrs: 754178 -> 753811 (-0.05%); split: -0.05%, +0.00%
Cycle count: 3484078523 -> 3484111965 (+0.00%); split: -0.00%, +0.00%
Max live registers: 42294 -> 42185 (-0.26%)
Spiderman Remastered:
Totals from 6820 (98.02% of 6958) affected shaders:
Instrs: 6921500 -> 6747933 (-2.51%); split: -4.16%, +1.65%
Cycle count: 234400692460 -> 236846720707 (+1.04%); split: -0.20%, +1.25%
Spill count: 72971 -> 72622 (-0.48%); split: -8.08%, +7.61%
Fill count: 212921 -> 198483 (-6.78%); split: -12.37%, +5.58%
Scratch Memory Size: 3491840 -> 3410944 (-2.32%); split: -12.05%, +9.74%
Max live registers: 493149 -> 487458 (-1.15%)
Max dispatch width: 56936 -> 56856 (-0.14%); split: +0.06%, -0.20%
Strange Brigade:
Totals from 3769 (91.21% of 4132) affected shaders:
Instrs: 1354476 -> 1321474 (-2.44%)
Cycle count: 25351530 -> 25339190 (-0.05%); split: -1.64%, +1.59%
Max live registers: 199057 -> 193656 (-2.71%)
Max dispatch width: 30272 -> 30240 (-0.11%)
Witcher 3:
Totals from 25 (2.40% of 1041) affected shaders:
Instrs: 24621 -> 24606 (-0.06%)
Cycle count: 2218793 -> 2217503 (-0.06%); split: -0.11%, +0.05%
Max live registers: 1963 -> 1955 (-0.41%)
LNL results:
Assassin's Creed Valhalla:
Totals from 1928 (98.02% of 1967) affected shaders:
Instrs: 856107 -> 835756 (-2.38%); split: -2.48%, +0.11%
Subgroup size: 41264 -> 41280 (+0.04%)
Cycle count: 64606590 -> 62371700 (-3.46%); split: -5.57%, +2.11%
Spill count: 915 -> 669 (-26.89%); split: -32.79%, +5.90%
Fill count: 2414 -> 1617 (-33.02%); split: -36.62%, +3.60%
Scratch Memory Size: 62464 -> 44032 (-29.51%); split: -36.07%, +6.56%
Max live registers: 205483 -> 202192 (-1.60%)
Cyberpunk 2077:
Totals from 1177 (96.40% of 1221) affected shaders:
Instrs: 682237 -> 678931 (-0.48%); split: -0.51%, +0.03%
Subgroup size: 24912 -> 24944 (+0.13%)
Cycle count: 24355928 -> 25089292 (+3.01%); split: -0.80%, +3.81%
Spill count: 8 -> 3 (-62.50%)
Fill count: 6 -> 3 (-50.00%)
Max live registers: 126922 -> 125472 (-1.14%)
Dota2:
Totals from 428 (32.47% of 1318) affected shaders:
Instrs: 89355 -> 89740 (+0.43%)
Cycle count: 1152412 -> 1152706 (+0.03%); split: -0.52%, +0.55%
Max live registers: 32863 -> 32847 (-0.05%)
Fortnite:
Totals from 5354 (81.72% of 6552) affected shaders:
Instrs: 4135059 -> 4239015 (+2.51%); split: -0.01%, +2.53%
Cycle count: 132557506 -> 132427302 (-0.10%); split: -0.75%, +0.65%
Spill count: 7144 -> 7234 (+1.26%); split: -0.46%, +1.72%
Fill count: 12086 -> 12403 (+2.62%); split: -0.73%, +3.35%
Scratch Memory Size: 600064 -> 604160 (+0.68%); split: -1.02%, +1.71%
Hitman3:
Totals from 4912 (97.09% of 5059) affected shaders:
Instrs: 2952124 -> 2916824 (-1.20%); split: -1.20%, +0.00%
Cycle count: 179985656 -> 189175250 (+5.11%); split: -2.44%, +7.55%
Spill count: 3739 -> 3136 (-16.13%)
Fill count: 10657 -> 9564 (-10.26%)
Scratch Memory Size: 373760 -> 318464 (-14.79%)
Max live registers: 597566 -> 589460 (-1.36%)
Hogwarts Legacy:
Totals from 1471 (96.33% of 1527) affected shaders:
Instrs: 748749 -> 766214 (+2.33%); split: -0.71%, +3.05%
Cycle count: 33301528 -> 34426308 (+3.38%); split: -1.30%, +4.68%
Spill count: 3278 -> 3070 (-6.35%); split: -8.30%, +1.95%
Fill count: 4553 -> 4097 (-10.02%); split: -10.85%, +0.83%
Scratch Memory Size: 251904 -> 217088 (-13.82%)
Max live registers: 168911 -> 168106 (-0.48%); split: -0.59%, +0.12%
Metro Exodus:
Totals from 18356 (49.81% of 36854) affected shaders:
Instrs: 7559386 -> 7621591 (+0.82%); split: -0.01%, +0.83%
Cycle count: 195240612 -> 196455186 (+0.62%); split: -1.22%, +1.84%
Spill count: 595 -> 546 (-8.24%)
Fill count: 1604 -> 1408 (-12.22%)
Max live registers: 2086937 -> 2086933 (-0.00%)
Red Dead Redemption 2:
Totals from 4171 (79.31% of 5259) affected shaders:
Instrs: 2619392 -> 2719587 (+3.83%); split: -0.00%, +3.83%
Subgroup size: 86416 -> 86432 (+0.02%)
Cycle count: 8542836160 -> 8531976886 (-0.13%); split: -0.65%, +0.53%
Fill count: 12949 -> 12970 (+0.16%); split: -0.43%, +0.59%
Scratch Memory Size: 401408 -> 385024 (-4.08%)
Spiderman Remastered:
Totals from 6639 (98.94% of 6710) affected shaders:
Instrs: 6877980 -> 6800592 (-1.13%); split: -3.11%, +1.98%
Cycle count: 282183352210 -> 282100051824 (-0.03%); split: -0.62%, +0.59%
Spill count: 63147 -> 64218 (+1.70%); split: -7.12%, +8.82%
Fill count: 184931 -> 175591 (-5.05%); split: -10.81%, +5.76%
Scratch Memory Size: 5318656 -> 5970944 (+12.26%); split: -5.91%, +18.17%
Max live registers: 918240 -> 906604 (-1.27%)
Strange Brigade:
Totals from 3675 (92.24% of 3984) affected shaders:
Instrs: 1462231 -> 1429345 (-2.25%); split: -2.25%, +0.00%
Cycle count: 37404050 -> 37345292 (-0.16%); split: -1.25%, +1.09%
Max live registers: 361849 -> 351265 (-2.92%)
Witcher 3:
Totals from 13 (46.43% of 28) affected shaders:
Instrs: 593 -> 660 (+11.30%)
Cycle count: 28302 -> 28714 (+1.46%)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
a27d98e933
brw: avoid having the scratch surface handle partially written
...
Allows it to be visible through the def_analysis.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
aac906c16c
brw: add scheduler support for address registers
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
0a5bdf1199
brw: add infra to make use of the address register in the IR
...
This limits the address register to simple cases inside a block.
Validation ensures that the address register is only written once and
read once.
Instruction scheduling makes sure that instructions using the address
register in the generator are not scheduled while there is a use of
the register in the IR.
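One way to read the validation rule above is sketched below. The instruction struct and function names are hypothetical and much simpler than the real brw IR; the sketch only assumes that each virtual address register must be written exactly once and then read exactly once inside a single block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction: not the real brw IR. */
struct toy_inst {
   int addr_write; /* virtual address register written, or -1 for none */
   int addr_read;  /* virtual address register read, or -1 for none */
};

#define TOY_MAX_ADDR_REGS 16

/* Returns true if, within one block, every virtual address register that
 * appears is written exactly once and read exactly once after that write.
 */
static bool
toy_validate_block(const struct toy_inst *insts, size_t count)
{
   int writes[TOY_MAX_ADDR_REGS] = {0};
   int reads[TOY_MAX_ADDR_REGS] = {0};

   for (size_t i = 0; i < count; i++) {
      int r = insts[i].addr_read;
      int w = insts[i].addr_write;

      if (r >= TOY_MAX_ADDR_REGS || w >= TOY_MAX_ADDR_REGS)
         return false;

      if (r >= 0) {
         /* The single read must come after the single write. */
         if (writes[r] != 1 || reads[r] != 0)
            return false;
         reads[r]++;
      }

      if (w >= 0) {
         /* A second write to the same register is rejected. */
         if (writes[w] != 0)
            return false;
         writes[w]++;
      }
   }

   /* Every written register must also have been read in this block. */
   for (int n = 0; n < TOY_MAX_ADDR_REGS; n++) {
      if (writes[n] != reads[n])
         return false;
   }
   return true;
}
```

With this toy model, a write followed by one read passes, while a second read or a write that is never consumed in the same block is rejected.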
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
c9fa235c28
brw: split validation iteration into blocks
...
No functional change.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
9b73a73a6e
brw: use phys_nr() more in generation
...
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Lionel Landwerlin
b110b06447
brw: introduce a new register type for the address register
...
We want to reuse the brw::nr field as a virtual address register
identifier. So we can't use brw::file=ARF brw::nr=ADDRESS.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28199 >
2025-01-11 08:41:42 +00:00
Marek Olšák
842c91300f
mesa: enable GL name reuse by default for all drivers except virgl
...
v2: detect qemu, crosvm
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com > (v1)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32715 >
2025-01-11 05:54:52 +00:00
Marek Olšák
b15c8fe3f1
mesa: rework enablement of force_gl_names_reuse
...
force_gl_names_reuse is changed to integer.
-1 means default (currently disabled), 0 means disabled, 1 means enabled
The names reuse initialization is moved to _mesa_alloc_shared_state ->
_mesa_InitHashTable instead of _mesa_HashEnableNameReuse.
It will be enabled by default.
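The tri-state described above could be resolved along these lines. This is a minimal sketch: `resolve_names_reuse` and its `driver_default` parameter are hypothetical and are not taken from the Mesa sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: maps the integer option to a final on/off decision.
 *   -1 -> follow the default passed in by the caller
 *    0 -> force disabled
 *    1 -> force enabled
 */
static bool
resolve_names_reuse(int force_gl_names_reuse, bool driver_default)
{
   switch (force_gl_names_reuse) {
   case 0:
      return false;
   case 1:
      return true;
   default:
      return driver_default;
   }
}
```

Keeping -1 as a distinct value lets a later change flip the default without overriding users who set the option explicitly.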
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32715 >
2025-01-11 05:54:52 +00:00
Felix DeGrood
06423b1792
vk/overlay-layer: defer log creation to swapchain creation
...
Moving output file creation to coincide with swapchain creation
ensures that only the rendering thread creates or destroys the log file.
Previously, non-rendering processes could stomp on the log file.
Reviewed-by: Caleb Callaway <caleb.callaway@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32814 >
2025-01-10 23:44:24 +00:00
Kenneth Graunke
de1eaa4019
brw: Always use MEMORY_LOAD for load_ubo_uniform_block_intel intrinsics
...
Rather than emitting FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD to do block
loads that were cacheline aligned, loading entire cachelines at a time,
we now rely on NIR passes to group, CSE, and vectorize things into
appropriately sized blocks. This means that we'll usually still load
a cacheline, but we may load only 32B if we don't actually need anything
from the full 64B. Prior to Xe2, this saves us registers, and it ought
to save us some bandwidth as well as the response length can be lowered.
The cacheline-aligning hack was the main reason not to simply call
fs_nir_emit_memory_access(), so now we do that instead, porting yet
one more thing to the common memory opcode framework.
We unfortunately still emit the old FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD
opcode for non-block intrinsics. We'd have to clean up 16-bit handling
among other things in order to eliminate this, but we should in the
future.
fossil-db results on Alchemist for this and the previous patch together:
Instrs: 161481888 -> 161297588 (-0.11%); split: -0.12%, +0.01%
Subgroup size: 8102976 -> 8103000 (+0.00%)
Send messages: 7895489 -> 7846178 (-0.62%); split: -0.67%, +0.05%
Cycle count: 16583127302 -> 16703162264 (+0.72%); split: -0.57%, +1.29%
Spill count: 72316 -> 67212 (-7.06%); split: -7.25%, +0.19%
Fill count: 134457 -> 125970 (-6.31%); split: -6.83%, +0.52%
Scratch Memory Size: 4093952 -> 3787776 (-7.48%); split: -7.53%, +0.05%
Max live registers: 33037765 -> 32947425 (-0.27%); split: -0.28%, +0.00%
Max dispatch width: 5780288 -> 5778536 (-0.03%); split: +0.17%, -0.20%
Non SSA regs after NIR: 177862542 -> 178816944 (+0.54%); split: -0.06%, +0.60%
In particular, several titles see incredible reductions in spill/fills:
Shadow of the Tomb Raider: -65.96% / -65.44%
Batman: Arkham City GOTY: -53.49% / -28.57%
Witcher 3: -16.33% / -14.29%
Total War: Warhammer III: -9.60% / -10.14%
Assassins Creed Odyssey: -6.50% / -9.92%
Red Dead Redemption 2: -6.77% / -8.88%
Far Cry: New Dawn: -7.97% / -4.53%
Improves performance in many games on Arc A750:
Cyberpunk 2077: 5.8%
Witcher 3: 4%
Shadow of the Tomb Raider: 3.3%
Assassins Creed: Valhalla: 3%
Spiderman Remastered: 2.75%
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
21636ff9fa
brw: Align and combine constant-offset UBO loads in NIR
...
The hope here is to replace our backend handling for loading whole
cachelines at a time from UBOs into NIR-based handling, which plays
nicely with the NIR load/store vectorizer.
Rounding down offsets to multiples of 64B allows us to globally CSE
UBO loads across basic blocks. This is really useful. However, blindly
rounding down the offset to a multiple of 64B can trigger anti-patterns
where...a single unaligned memory load could have hit all the necessary
data, but rounding it down split it into two loads.
By moving this to NIR, we gain more control of the interplay between
nir_opt_load_store_vectorize and this rebasing and CSE'ing. The backend
can then simply load between nir_def_{first,last}_component_read() and
trust that our NIR has the loads blockified appropriately.
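The rounding anti-pattern mentioned above can be illustrated with a little arithmetic. The helper names below are hypothetical; the sketch only counts how many 64B-aligned blocks a byte range touches.

```c
#include <assert.h>
#include <stdint.h>

#define CACHELINE_BYTES 64u

/* Round a byte offset down to the start of its 64B block. */
static uint32_t
align_down_64(uint32_t offset)
{
   return offset & ~(CACHELINE_BYTES - 1u);
}

/* Number of 64B-aligned blocks needed to cover [offset, offset + size).
 * The size must be non-zero.
 */
static uint32_t
aligned_blocks_needed(uint32_t offset, uint32_t size)
{
   assert(size > 0);
   uint32_t first = align_down_64(offset);
   uint32_t last = align_down_64(offset + size - 1u);
   return (last - first) / CACHELINE_BYTES + 1u;
}
```

For example, 32 bytes at offset 48 could be fetched by one unaligned load, but `aligned_blocks_needed(48, 32)` is 2 because the range straddles the block boundary at byte 64.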
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
36d0485ae4
brw: Allow CSE of MEMORY_MODE_CONSTANT loads
...
This matches the behavior of FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
7ce66e2b61
brw: Add a new MEMORY_MODE_CONSTANT option
...
This will translate to HDC Constant Cache loads or LSC UGM loads.
On LSC, MEMORY_MODE_UNTYPED would be fine, but for HDC we need to
distinguish between the regular and constant cache data ports.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
cfbb5ebcdd
brw: Skip unread leading/trailing components in convergent block loads
...
The NIR vectorizer may produce block loads with unread trailing
components. Upcoming passes may produce unread leading components
as well. With a bit of finesse, we can skip loading those, and only
bother with the ones we actually need. This can sometimes save us on
loads and MOVs.
v2: Skip this for SLM reads on pre-LSC platforms (caught by Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
4f0c852a4e
brw: Skip unnecessary work for trivial emit_uniformize of IMMs
...
If we pass an immediate, just trivially return that immediate.
This preserves the property that if x was an IMM, emit_uniformize(x)
will also be an IMM, without the need for optimizations to eliminate
unnecessary operations. That way, you can call emit_uniformize() on
a value and still check whether it's constant afterwards.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
a0b1e07976
brw: Make get_nir_src_imm() usable for non-32-bit-sizes.
...
We return an immediate for 32-bit constant values, but fall back to
calling get_nir_src() for other values, as 64-bit, and even 8-bit
immediates have odd restrictions. We could probably support 16-bit
here without too many issues, but we leave it be for now.
This makes it usable for cases where we'd like to get constants for
32-bit values but where it may be a different bit-size too.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
03f948f5fd
brw: Skip fetching unread leading components of UBO loads
...
We were already skipping unread trailing components, but now we skip
them on both ends.
About -3.5% spills on Shadow of the Tomb Raider on Alchemist (mostly a
wash elsewhere, but it will help additional shaders with later patches).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
c8b2ab041e
brw: Add more safeguards against misaligned OWord Block messages
...
HDC doesn't support block loads/stores with sub-DWord (<4B) aligned
offsets, and shared local memory has to use the Aligned OWord Block
messages which require OWord (16B) alignment.
Make the validator detect this case and say no. Also make the lowering
code assert that the alignment is valid as a second line of defense.
LSC has no such restrictions.
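The alignment rules above can be summarized as a small predicate. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the real validator works on the backend's own instruction and alignment information rather than a plain byte offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of the path a block message takes. */
enum toy_mem_path {
   TOY_PATH_HDC,     /* HDC block load/store, non-SLM */
   TOY_PATH_HDC_SLM, /* shared local memory via Aligned OWord Block */
   TOY_PATH_LSC,     /* LSC: no such restriction */
};

/* Returns true if a block access at this byte offset is allowed. */
static bool
block_offset_alignment_ok(enum toy_mem_path path, uint32_t offset_bytes)
{
   switch (path) {
   case TOY_PATH_HDC:
      return (offset_bytes % 4u) == 0;  /* DWord (4B) alignment */
   case TOY_PATH_HDC_SLM:
      return (offset_bytes % 16u) == 0; /* OWord (16B) alignment */
   case TOY_PATH_LSC:
      return true;
   }
   return false;
}
```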
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Kenneth Graunke
2f334e8baf
nir: Add a nir_def_first_component_read() helper
...
Similar to nir_def_last_component_read(). Just a little nicer than
prodding at the bitmask of components read directly.
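As a rough illustration of what such helpers compute, the sketch below derives the first and last read component from a plain bitmask. These standalone functions are hypothetical stand-ins, not the NIR implementation, and returning -1 for an empty mask is a choice made for this example only.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit in the read mask, or -1 if nothing is read. */
static int
first_component_read(uint32_t read_mask)
{
   for (int i = 0; i < 32; i++) {
      if (read_mask & (UINT32_C(1) << i))
         return i;
   }
   return -1;
}

/* Index of the highest set bit in the read mask, or -1 if nothing is read. */
static int
last_component_read(uint32_t read_mask)
{
   for (int i = 31; i >= 0; i--) {
      if (read_mask & (UINT32_C(1) << i))
         return i;
   }
   return -1;
}
```

A load whose mask is 0x6 only needs components 1 through 2, so a backend using this range could skip component 0 and everything past component 2.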
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32888 >
2025-01-10 22:44:09 +00:00
Hyunjun Ko
638fc5e472
anv: change bool to VkResult
...
Fixes: 41caf3665c
("anv/image: allocate some memory for mv storage after video images.")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Hyunjun Ko
ec60462a65
anv: fix to set default cdf buf correctly.
...
v1. Store cdf index values to the state of the command buffer.
(Lionel Landwerlin <lionel.g.landwerlin@intel.com >)
Fixes: dEQP-VK.video.decode.av1.sizeup_8_separated_dpb
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Hyunjun Ko
e510efed05
anv: support in-loop super resolution for AV1 decoding
...
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Hyunjun Ko
788263501d
anv: calculate global parameters correctly for AV1 decoding
...
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Dave Airlie
8432b8b282
anv: add initial support for AV1 decoding
...
Co-authored-by: Hyunjun Ko <zzoon@igalia.com >
- Allow intrabc
- Fix to manage reference frames using referenceNameSlotIndices
- Fix to set bitmask of motion field projection correctly
- Set destination buffer offset to the BSD_OBJECT
- Support 10-bit decoding.
- Fix small bugs.
- Change to C-style comment.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Hyunjun Ko
0fd0a51df6
anv/video: Fix to return supported video format correctly.
...
Since 8-bit decoding is not default, we need to check the flag too.
Fixes: a64ae20d0
("anv: support HEVC 10-bit decoding" )
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Hyunjun Ko
3f3d6c04a3
intel/genxml: define MEMORYADDRESSATTRIBUTES for Gen12.5 with TILEF
...
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Dave Airlie
68477ae7c0
genxml: add av1 fields
...
Co-authored-by: Hyunjun Ko <zzoon@igalia.com >
- Remove HuC pipeline params of VD_PIPELINE_FLUSH
- Fix length of AVP_PIPE_MODE_SELECT, AVP_PIC_STATE, AVP_PIPE_BUF_ADDR_STATE
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Dave Airlie
6a28e7a6c7
anv: add default av1 tables from media-driver
...
Co-authored-by: Hyunjun Ko <zzoon@igalia.com >
- Change to C-style comment.
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32775 >
2025-01-10 21:45:04 +00:00
Brian Paul
b13e2a495e
svga: add svga_resource_create_with_modifiers() function
...
The dri_create_image() function returns early if the gallium
driver does not implement this function. Surface creation has
been broken for some time up to this fix.
Signed-off-by: Brian Paul <brian.paul@broadcom.com >
Reviewed-by: Neha Bhende <neha.Bhende@broadcom.com >
Reviewed-by: Neha Bhende <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32976 >
2025-01-10 21:15:12 +00:00
Caio Oliveira
7fadd864dd
intel/elk: Fix typo in assertion
...
Just assert that the array will fit whatever the MAX is for a given
Gfx version.
Fixes: 172c1ab984
("intel/elk: Add ELK_MAX_MRF_ALL for static allocating arrays")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32978 >
2025-01-10 20:16:59 +00:00
Mike Blumenkrantz
010732b8ef
glsl: enable OVR_multiview if OVR_multiview2 is enabled
...
according to spec
Fixes: 328c29d600
("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-by: Adam Jackson <ajax@redhat.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32946 >
2025-01-10 19:10:48 +00:00
Mike Blumenkrantz
3c5eae639d
glsl: make gl_ViewID_OVR visible to all shader stages
...
according to spec
Fixes: 328c29d600
("mesa,glsl,gallium: add GL_OVR_multiview")
Reviewed-by: Adam Jackson <ajax@redhat.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32946 >
2025-01-10 19:10:48 +00:00
Mary Guillemard
bacc5f4579
pan/genxml: Switch __gen_unpack to macros
...
This switches all __gen_unpack functions to macros to keep address space
information when working with OpenCL C.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
3f3bb741fb
pan/genxml: Switch [un]pack codegen to macros
...
Because of OpenCL C, we need a way to retain the address space information
carried by the pointers.
As a result, this switches all [un]pack functions to macros, so that
pointers retain their respective address space information.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
e15940008f
pan/genxml: Switch pan_section_ptr to cast to packed type
...
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
3b69edf825
pan/genxml: Enforce explicit packed types on pan_[un]pack
...
Provide a pan_cast_and_[un]pack() to help with the transition.
Those helpers should only be used when the caller is sure the
destination is big enough to emit the descriptor.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
bd80037441
pan/genxml: Move [un]pack internals to use packed structs
...
We are now strongly typing everything, pan_[un]pack will enforce this at
the API level next.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Boris Brezillon
b9caca64f2
pan/genxml: Generate MALI_XXX_PACKED_T macros
...
Will be useful to easily define packed type variables from the
pan_[un]pack() functions, which we'll need during the pan_pack
revamp.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
39d8b56c4a
pan/genxml: Emit struct details before pack function
...
We are going to use packed structs in [un]pack next so we need those to
be emitted before them.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Mary Guillemard
95435a788d
pan/genxml: Switch unpack to use uint32_t
...
Makes this match pack.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Boris Brezillon
ab1cd917ad
pan/genxml: Include pan_pack_helpers.h instead of copying it
...
The generic bits in autogen pack helpers files were extracted in a
common header, so let's include it from the autogenerated file rather
than copying its content there.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Boris Brezillon
39461e5818
pan/genxml: s/PAN_PAN_HELPERS_H/PAN_PACK_HELPERS_H/
...
Fix a typo in the multi-inclusion guard.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32962 >
2025-01-10 18:27:27 +00:00
Eric Engestrom
519f4bba6b
docs/release-calendar: push the 25.0 branchpoint back by 2 weeks
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32975 >
2025-01-10 18:19:06 +00:00
Michael Cheng
c3c05ffb5f
intel: Expose Shader hashes for utrace and Perfetto
...
This patch exposes shader hashes (computes and draws) to Perfetto and
utrace. By including these hashes in traces, developers can correlate
compute and draw calls with their associated ASM dumps when analyzing
the traces.
To achieve this, intel_tracepoint.py has been reworked to preprocess
tracepoint arguments dynamically. Any argument containing "hash" in its
variable name is now formatted as hexadecimal before being passed to the
tracepoint definition.
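The naming rule above can be mimicked at runtime with a small C helper. This is only an analogue: the actual change lives in the intel_tracepoint.py generator, and the function name, the output format, and the 32-bit value width below are assumptions made for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical formatter: arguments whose name contains "hash" are printed
 * as hexadecimal, everything else as decimal.
 */
static void
format_trace_arg(char *out, size_t out_size, const char *name, uint32_t value)
{
   if (strstr(name, "hash") != NULL)
      snprintf(out, out_size, "%s=0x%" PRIx32, name, value);
   else
      snprintf(out, out_size, "%s=%" PRIu32, name, value);
}
```

Printing hashes in hexadecimal makes them easier to match against identifiers that appear in hex in shader dumps.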
Signed-off-by: Michael <michael.cheng@intel.com >
Reviewed-by: José Roberto de Souza <jose.souza@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32708 >
2025-01-10 17:38:16 +00:00
Boris Brezillon
6f8fb6d73d
panfrost/ci: Add panvk and panfrost to the debian-x86_32 job
...
Useful to catch compile-time regressions.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32938 >
2025-01-10 15:53:36 +00:00
Boris Brezillon
dc1b988273
panvk: Fix panvk_priv_mem_bo() on 32-bit platforms
...
Masking with ~7ull promotes the value to 64-bit, leading
to a size mismatch when we cast it to a pointer.
Make sure we're using a uintptr_t type for the mask.
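A minimal sketch of the pattern, assuming the low three bits of a pointer-sized value carry flags; `untag_pointer` is a hypothetical name and not the panvk function itself.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the low three bits of a pointer-sized value and return a pointer.
 * Using ~(uintptr_t)7 keeps the whole expression pointer-sized. With ~7ull
 * the result would be 64-bit even on a 32-bit target, so the cast to a
 * pointer would be a cast from an integer of a different size.
 */
static void *
untag_pointer(uintptr_t tagged)
{
   return (void *)(tagged & ~(uintptr_t)7);
}
```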
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32938 >
2025-01-10 15:53:36 +00:00
Boris Brezillon
134f965b88
panvk: Fix an alignment issue on x86
...
On x86-32, long long is only aligned to 4 bytes, which breaks
the assumption we had about our sysvals struct layouts.
Define an aligned_u64 embedding the alignment attribute to
keep the alignment sane.
While at it, enforce this alignment with an alignment attribute
on the struct itself.
This fixes the build on x86-32, and should do what we expect,
though it's not been tested in practice.
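The approach described above might look roughly like this with GCC or Clang attributes. The struct and its fields are hypothetical; only the `aligned_u64` name comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer that is 8-byte aligned even where the ABI (for example
 * x86-32) would otherwise align it to 4 bytes inside a struct.
 */
typedef uint64_t __attribute__((aligned(8))) aligned_u64;

/* Hypothetical sysvals-like struct; the fields are for illustration only. */
struct toy_sysvals {
   uint32_t flags;
   aligned_u64 address;
} __attribute__((aligned(8)));

/* The layout is now the same on 32-bit and 64-bit x86. */
_Static_assert(offsetof(struct toy_sysvals, address) == 8,
               "address must start at byte 8");
_Static_assert(sizeof(struct toy_sysvals) == 16,
               "struct must be 16 bytes");
```

Without the attribute, a 32-bit x86 build would place `address` at offset 4, which is the kind of layout mismatch the compile-time assertions are meant to catch.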
Fixes: ae76a6a045
("panvk: Pack push constants")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12429
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32938 >
2025-01-10 15:53:36 +00:00
Rhys Perry
2b10930b48
aco: use VOP3 v_mov_b16 if necessary
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944 >
2025-01-10 15:05:00 +00:00
Rhys Perry
46787fc2d0
aco/util: fix bit_reference::operator&=
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Backport-to: 24.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32944 >
2025-01-10 15:05:00 +00:00