We use it for two different things. Pseudo-instructions are cheap, so split it up
for easier optimization passes. This also fixes the schedule classes: we can
move the cf_begin around if we want, it's inert.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
They're more trouble than they're worth for us. They were originally lifted
unthinkingly from ACO, where I assume they're necessary for software CF
lowering, but they're just an inconvenient convenience for us. Remove 'em.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Since we register allocate in SSA, the number of registers required (register
demand) equals the maximum number of simultaneously live values (register
pressure). So if we can reduce register pressure, we are guaranteed to reduce
register demand. Even an ineffective heuristic like randomly swapping
instructions can't make things worse, as long as it's conservative (bailing when
it doesn't actually help).
This implements one such heuristic: in each block, schedule backwards, selecting
the free instruction that looks like it will reduce liveness the most. In other
words, the greedy algorithm to reduce register pressure. At the end of the
block, if we haven't actually reduced pressure, we bail. This isn't optimal, but
it's well-motivated and optimally handles special cases (like 0-source
instructions).
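To make the greedy choice concrete, here's a toy sketch in C (struct toy_instr,
liveness_delta and pick_best are all made up for illustration; the real pass
works on the backend IR with proper per-channel liveness):

#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy IR: one def and up to two sources per instruction, named by small
 * integers.  Scheduling bottom-up, prefer the ready instruction whose
 * placement shrinks the set of values live above this point the most. */
struct toy_instr {
   int def;    /* SSA value defined, or -1 */
   int src[2]; /* SSA values read, or -1 */
};

static int
liveness_delta(const struct toy_instr *I, const bool *live)
{
   int delta = 0;

   if (I->def >= 0)
      delta++; /* the def dies above this point: pressure goes down */

   for (int s = 0; s < 2; ++s) {
      if (I->src[s] >= 0 && !live[I->src[s]])
         delta--; /* a source becomes newly live: pressure goes up */
   }

   return delta;
}

static const struct toy_instr *
pick_best(const struct toy_instr **ready, unsigned count, const bool *live)
{
   const struct toy_instr *best = NULL;
   int best_delta = INT_MIN;

   for (unsigned i = 0; i < count; ++i) {
      int delta = liveness_delta(ready[i], live);
      if (delta > best_delta) {
         best = ready[i];
         best_delta = delta;
      }
   }

   return best;
}

The conservative part is then just comparing the block's max pressure before and
after and keeping whichever order is better.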
This is based on the scheduler I originally wrote for Mali.
In my Dolphin ubershader branch, this improved performance at native 4K by 10fps
(105fps->115fps) when I measured together with some other optimizations. On top
of my current next (which notably includes nir_opt_sink improvements), this
commit alone goes (53fps->54fps) which is considerably less impressive :-p
shader-db results are a win, but not as large as we might hope. Instruction
count win seems to be from the smaller live ranges being easier on RA (fewer
swaps / moves). The two shaders affected for thread count are from fifa mobile,
which go from 640 threads ->
1024 (full occupancy). In other words... this heuristic does an excellent job in
a small subset of shaders. The Dolphin ubershader win was real, though :~)
Note these shader-db wins are on top of a branch with the nir_opt_sink
improvements. Without that, the stats are much better... The schedulers have
some overlap, but they're better together.
total instructions in shared programs: 1766635 -> 1763496 (-0.18%)
instructions in affected programs: 445855 -> 442716 (-0.70%)
helped: 1963
HURT: 350
Instructions are helped.
total bytes in shared programs: 11597648 -> 11586924 (-0.09%)
bytes in affected programs: 3106230 -> 3095506 (-0.35%)
helped: 2003
HURT: 374
Bytes are helped.
total halfregs in shared programs: 504609 -> 481980 (-4.48%)
halfregs in affected programs: 138322 -> 115693 (-16.36%)
helped: 3405
HURT: 311
Halfregs are helped.
total threads in shared programs: 18839936 -> 18840704 (<.01%)
threads in affected programs: 1280 -> 2048 (60.00%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25052>
Instead, we replace every use of it with nir_def. Most of this commit
was generated by sed:
sed -i -e 's/dest.ssa/def/g' src/**/*.h src/**/*.c src/**/*.cpp
A few manual fixups were required in lima and the nir_legacy code.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
Now that they're in the right blocks, this is easy. This includes an informal
proof, and the implementation itself is built around a finite state machine;
together, those meant this code worked on its first try :~)
And hey, it's a pointless little instruction-saving optimization I've wanted to
do for a while~
Major note is that this HAS to be done after register allocation, since it
doesn't update the control flow graph and would introduce critical edges
if it tried to actually delete the else block. The intuitive reason for this is
simple: sometimes RA needs to insert instructions into the else block, even if
it was empty in the original NIR, so we always need an else block even if we can
delete it with this pass after RA.
total instructions in shared programs: 1778390 -> 1776725 (-0.09%)
instructions in affected programs: 268459 -> 266794 (-0.62%)
helped: 1013
HURT: 0
Instructions are helped.
total bytes in shared programs: 12185102 -> 12175112 (-0.08%)
bytes in affected programs: 1927524 -> 1917534 (-0.52%)
helped: 1013
HURT: 0
Bytes are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
According to Dougall's pseudocode, else_icmp operates as:
if r0l == 0:
    r0l = n
elif r0l == 1:
    if cc.compare(A[thread], B[thread]):
        r0l = 0
    else:
        r0l = 1
exec_mask[thread] = (r0l == 0)
Notice that the comparison only happens when r0l == 1, that is, for threads that
are about to enter the else block. Threads that just executed the if body are
still active (r0l = 0) and skip the comparison. As such, the sources of
else_icmp are only read in the else block, and hence the whole instruction
should be placed in the else block for correctness with respect to live range
splitting.
shader-db is a wash, but shows some improvements due to correctly modelling the
liveness of the condition variable.
total instructions in shared programs: 1778376 -> 1778390 (<.01%)
instructions in affected programs: 14753 -> 14767 (0.09%)
helped: 35
HURT: 39
Inconclusive result (value mean confidence interval includes 0).
total bytes in shared programs: 12185018 -> 12185102 (<.01%)
bytes in affected programs: 101522 -> 101606 (0.08%)
helped: 35
HURT: 39
Inconclusive result (value mean confidence interval includes 0).
total halfregs in shared programs: 531174 -> 531032 (-0.03%)
halfregs in affected programs: 2320 -> 2178 (-6.12%)
helped: 40
HURT: 1
Halfregs are helped.
total threads in shared programs: 18909184 -> 18909440 (<.01%)
threads in affected programs: 1792 -> 2048 (14.29%)
helped: 2
HURT: 0
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
The SSA killer feature is that, under an "optimal" allocator, the number of
registers used (register demand) is *equal* to the number of registers required
(register pressure, the maximum number of variables simultaneously live at any
point in the program). I put "optimal" in scare quotes, because we don't need to
use the exact minimum number of registers as long as we don't sacrifice thread
count or introduce spilling, and using a few extra registers when possible can
help coalesce moves. Details-shmetails.
The problem is that, prior to this commit, our register allocator was not
well-behaved in certain circumstances, and would require an arbitrarily large
number of registers. In particular, since different variables have different
sizes and require contiguous allocation, in large programs the register file may
become fragmented, causing the RA to use arbitrarily many registers despite
having lots of registers free.
The solution is vector live range splitting. First, we calculate the register
pressure (the minimum number of registers that it is theoretically possible to
allocate successfully), and round up to the maximum number of registers we will
actually use (to give some wiggle room to coalesce moves). Then, we will treat
this maximum as a *bound*, requiring that we don't use more registers than
chosen. In the event that register file fragmentation prevents us from finding a
contiguous sequence of registers to allocate a variable, rather than giving up
or using registers we don't have, we shuffle the register file around
(defragmenting it) to make room for the new variable. That lets us use a
few moves to avoid sacrificing thread count or introducing spilling, which is
usually a great choice.
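Mechanically, the shuffle is in the spirit of this toy sketch (alloc_contiguous
and struct move are made up for illustration; the real allocator also has to
keep evicted vectors contiguous and update liveness as it goes):

#include <assert.h>
#include <stdbool.h>

struct move { unsigned from, to; };

/* Toy: we want `size` contiguous registers within the bound `max_regs`.
 * If fragmentation leaves no free run, evict whatever sits in [0, size)
 * into free slots elsewhere, recording the copies to emit, then hand out
 * [0, size).  The pressure bound guarantees enough total free space. */
static unsigned
alloc_contiguous(bool *used, unsigned max_regs, unsigned size,
                 struct move *moves, unsigned *nr_moves)
{
   /* Fast path: an existing free run. */
   for (unsigned r = 0; r + size <= max_regs; ++r) {
      bool free_run = true;
      for (unsigned i = 0; i < size; ++i)
         free_run &= !used[r + i];

      if (free_run) {
         for (unsigned i = 0; i < size; ++i)
            used[r + i] = true;
         return r;
      }
   }

   /* Fragmented: defragment by evicting the occupants of [0, size). */
   for (unsigned i = 0; i < size; ++i) {
      if (!used[i])
         continue;

      unsigned dest = size;
      while (dest < max_regs && used[dest])
         dest++;
      assert(dest < max_regs); /* holds if the pressure bound is respected */

      moves[(*nr_moves)++] = (struct move){ .from = i, .to = dest };
      used[i] = false;
      used[dest] = true;
   }

   for (unsigned i = 0; i < size; ++i)
      used[i] = true;

   return 0;
}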
Android GLES3.1 shader-db results are as expected: some noise / small
regressions for instruction count, but a bunch of shaders with improved thread
count. The massive increase in register demand may seem weird, but this is the
RA doing exactly what it's supposed to: using more registers if and only if they
would not hurt thread count. Notice that no programs whatsoever are hurt for
thread count, which is the salient part.
total instructions in shared programs: 1781473 -> 1781574 (<.01%)
instructions in affected programs: 276268 -> 276369 (0.04%)
helped: 1074
HURT: 463
Inconclusive result (value mean confidence interval includes 0).
total bytes in shared programs: 12196640 -> 12201670 (0.04%)
bytes in affected programs: 1987322 -> 1992352 (0.25%)
helped: 1060
HURT: 513
Bytes are HURT.
total halfregs in shared programs: 488755 -> 529651 (8.37%)
halfregs in affected programs: 295651 -> 336547 (13.83%)
helped: 358
HURT: 9737
Halfregs are HURT.
total threads in shared programs: 18875008 -> 18885440 (0.06%)
threads in affected programs: 64576 -> 75008 (16.15%)
helped: 82
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
The driver needs to lower MSAA (because only it knows the sample count). MSAA
lowering depends on discards getting lowered first (in order to get sample masks
on the discards, so sample shading works properly). Discard lowering in turn
depends on all discards having been emitted. But the driver needs to lower clip
planes, which generates discards. To break the circular dependency, we have the
driver call the discard lowering pass itself (in between lowering clip planes
and lowering MSAA). Technically this is probably a layering violation, but it's
the least gross solution I see.
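In driver terms, the call order ends up roughly like this (the pass names below
are placeholders, not the real ones; only the ordering matters):

bool progress = false;

/* Placeholder pass names; the point is the ordering. */
NIR_PASS(progress, nir, lower_clip_planes);       /* may emit discard_if */
NIR_PASS(progress, nir, agx_nir_lower_discard);   /* wants every discard emitted */
NIR_PASS(progress, nir, agx_nir_lower_msaa, nr_samples); /* wants lowered discards */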
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We already lower discard in NIR when depth/stencil writes are used in the
shader. In this patch, we extend that lowering for when depth/stencil writes are
not used, in which case the discard is lowered to a sample_mask instruction.
This is a step towards multisampling: the old lowering assumed single-sample,
and there's no way to express a sample mask with the standard NIR discard
instructions, so we need to lower in NIR anyway for sample shading (i.e. if a
discard_if diverges between samples in a pixel).
This changes the lowering for discard_if to be free of control flow (instead
executing a sample mask instruction unconditionally). This seems to be slightly
faster in SuperTuxKart and slightly slower in Dolphin, but I'm not too worried
right now.
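Roughly, the straight-line lowering looks like this sketch (the nir_builder
calls are standard, but emit_sample_mask and the all-or-nothing 16-bit mask are
simplifications, not literally what the pass emits):

/* Simplified: kill all samples of the pixel when the condition holds,
 * keep all of them otherwise, with no control flow.  The real pass
 * computes a proper per-sample mask so sample shading works. */
static void
lower_discard_if(nir_builder *b, nir_intrinsic_instr *discard)
{
   b->cursor = nir_before_instr(&discard->instr);

   nir_def *cond = discard->src[0].ssa;
   nir_def *all = nir_imm_intN_t(b, 0xFFFF, 16);
   nir_def *none = nir_imm_intN_t(b, 0, 16);
   nir_def *kept = nir_bcsel(b, cond, none, all);

   emit_sample_mask(b, kept); /* stand-in for the AGX sample_mask op */

   nir_instr_remove(&discard->instr);
}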
To make this work, we do need some extra lowering to ensure we always execute a
sample_mask instruction, in case a discard_if is buried in other control flow
(as occurs with Dolphin's ubershaders). So that's added too. We need that for
MSAA anyway, so pardon the line count.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23480>
We go to initialize the disk cache before we've compiled any shaders, so
agx_compiler_debug is 0 at that point. Don't try to read it directly; instead,
go through a safe getter that will do the right thing.
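i.e. something shaped like this (illustrative only; parse_debug_flags and the
exact getter name are stand-ins, and a real version should use once-style
initialization for thread safety):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

uint64_t parse_debug_flags(const char *str); /* hypothetical env parser */

uint64_t
agx_get_compiler_debug(void)
{
   /* Parse the debug flags on first use so early callers (like disk cache
    * setup) don't read a zero-initialized global. */
   static uint64_t flags = 0;
   static bool parsed = false;

   if (!parsed) {
      flags = parse_debug_flags(getenv("AGX_MESA_DEBUG"));
      parsed = true;
   }

   return flags;
}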
Fixes: 5e9538c12e ("agx: isolate compiler debug flags")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22891>
Add information about the relationship between program register usage and
program occupancy (the maximum number of threads that may execute concurrently
on a single shader core). This table is derived from studying the
maxTotalThreadsPerThreadgroup property in Metal while varying the register
usage, something I blogged about a few years back. It's probably not 100%
accurate and it hasn't been tested against hardware, but it matters "only" for
performance (not correctness) so I'm not super stressed about the details.
In the (near) future, RA will be able to make use of this information to know
exactly when it can use more registers without hurting performance. In the
present, it's just used for better shader-db statistics.
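The table is consumed by a simple lookup, roughly shaped like this (struct and
function names are my guess at the shape; the numeric entries are elided here):

/* Each entry says "with up to max_registers registers, at most max_threads
 * threads run concurrently on a core". */
struct agx_occupancy {
   unsigned max_registers;
   unsigned max_threads;
};

/* Filled in from the probed table, sorted by max_registers. */
extern const struct agx_occupancy occupancy_table[];
extern const unsigned occupancy_count;

static unsigned
max_threads_for_registers(unsigned nr_registers)
{
   for (unsigned i = 0; i < occupancy_count; ++i) {
      if (nr_registers <= occupancy_table[i].max_registers)
         return occupancy_table[i].max_threads;
   }

   /* Shouldn't happen: nr_registers is bounded by the register file. */
   return 0;
}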
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22353>
Also drop my email address from the copyright lines and fix some "Copyright 208
Alyssa Rosenzweig" lines; I'm not *that* old. Together this drops a lot of
boilerplate without losing any meaningful licensing information. SPDX is already
in use for the MIT-licensed code in turnip, venus, and a few other scattered
parts of the tree, so this should be ok from a Mesa licensing standpoint.
This reduces friction for creating new files, by paring down the copy/paste
boilerplate and being short enough that you can easily type it out if you want. It makes new
files seem less daunting: 20 lines of header for 30 lines of code is
discouraging, but 2 lines of header for 30 lines of code is reasonable for a
simple compiler pass. This has technical effects, as lowering the barrier to
making new files should encourage people to split code into more modular files
with (hopefully positive) effects on project compile time.
This helps with consistency between files. Across the tree we have at least a
half dozen variants of the MIT license text (probably more), plus code that uses
SPDX headers instead. I've already been using SPDX headers in Asahi manually, so
you can tell old vs new code based on the headers.
Finally, it means less for reviewers to scroll through in patches that add
files. There's minimal actual cognitive burden for reviewers thanks to banner
blindness, but the big headers still bloat diffs that add/delete files.
I originally proposed this in December (for much more of the tree) but someone
requested I wait until January to discuss. I've been trying to get in touch with
them since then. It is now almost April and, with still no response, I'd like to
press forward with this. So with a joint sign-off from the major authors of the
code in question, let's do this.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Emma Anholt <emma@anholt.net>
Acked-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Acked-by: Rose Hudson <rose@krx.sh>
Acked-by: Lyude Paul [over IRC: "yes I'm fine with that"]
Meh'd-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22062>
Our dead code elimination pass does two things:
1. delete instructions that are entirely unnecessary
2. delete unnecessary destinations of necessary instructions
To deal with pass ordering issues, we sometimes want to do #1 without #2.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21674>
The gallium disk cache is about to depend on these, and I don't want to
create a dependency on agx_opcodes.h.py for that. So, make a new header
for them that doesn't have build dependencies.
Rename them to agx_compiler_* too, to avoid collisions with the other
driver debug flags.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21776>
Fragment shaders with side effects need to be lowered to ensure they execute for
all shaded pixels but no helper threads. Add a lowering pass to handle this.
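One generic way to handle the no-helper-threads half (not necessarily what this
pass does) is to predicate the side effect, sketched below; is_side_effecting is
a hypothetical predicate, and making the op still run for all shaded pixels is
the part not shown:

/* Wrap a side-effecting intrinsic in "if (!helper_invocation)" so helper
 * threads skip it. */
static bool
lower_sidefx_instr(nir_builder *b, nir_intrinsic_instr *intr, void *data)
{
   if (!is_side_effecting(intr))
      return false;

   b->cursor = nir_before_instr(&intr->instr);
   nir_def *helper = nir_load_helper_invocation(b, 1);

   nir_push_if(b, nir_inot(b, helper));
   nir_instr_move(b->cursor, &intr->instr);
   nir_pop_if(b, NULL);

   return true;
}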
Fixes dEQP-GLES31.functional.shaders.opaque_type_indexing.atomic_counter.const_literal_fragment
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21712>
G13 does not support sampler descriptor LOD biasing, so this needs to be lowered
to shader code for APIs that require this functionality. Add an option to do
this lowering while doing our other backend texture lowerings. This generates
lod_bias_agx texture instructions which the driver is expected to lower
according to its binding model.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21276>
Move the decision of "can I copyprop this uniform?" from copyprop to a
standalone lowering pass. This is more straightforward and will enable the next
patch. This has the side effect of sinking load_preamble instructions, for a
nice reduction in register pressure. Instruction count increase is from
rematerializing some moves, which should be more than balanced out by the
reduced register pressure.
total instructions in shared programs: 1523285 -> 1523317 (<.01%)
instructions in affected programs: 1148 -> 1180 (2.79%)
helped: 0
HURT: 13
HURT stats (abs) min: 1.0 max: 4.0 x̄: 2.46 x̃: 2
HURT stats (rel) min: 0.69% max: 7.69% x̄: 3.65% x̃: 2.61%
95% mean confidence interval for instructions value: 1.78 3.14
95% mean confidence interval for instructions %-change: 2.16% 5.15%
Instructions are HURT.
total bytes in shared programs: 10444532 -> 10444724 (<.01%)
bytes in affected programs: 7386 -> 7578 (2.60%)
helped: 0
HURT: 13
HURT stats (abs) min: 6.0 max: 24.0 x̄: 14.77 x̃: 12
HURT stats (rel) min: 0.63% max: 7.14% x̄: 3.40% x̃: 2.48%
95% mean confidence interval for bytes value: 10.68 18.85
95% mean confidence interval for bytes %-change: 2.02% 4.78%
Bytes are HURT.
total halfregs in shared programs: 419444 -> 416434 (-0.72%)
halfregs in affected programs: 27080 -> 24070 (-11.12%)
helped: 634
HURT: 0
helped stats (abs) min: 1.0 max: 30.0 x̄: 4.75 x̃: 2
helped stats (rel) min: 2.90% max: 54.55% x̄: 13.13% x̃: 8.51%
95% mean confidence interval for halfregs value: -5.08 -4.41
95% mean confidence interval for halfregs %-change: -14.03% -12.23%
Halfregs are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21122>
To comply with The Ekstrand Rule.
AGX has a large number of "uniform registers" available. These may be loaded
with arbitrary ranges of GPU memory by the driver, or they can be written by the
preamble shader. Currently, the compiler runs nir_opt_preamble on the first half
of the uniform file, and then translates NIR sysvals to moves from the second
half of the uniform file, passing back a uniform->sysval map for the GL driver
to respect. This has (at least) two issues:
* Since nir_opt_preamble runs before gathering sysvals, it has to assume the
maximum number of sysvals are pushed, which can prevent it from moving some
computation to the preamble due to running out of partitioned uniform registers.
This is a problem for Dolphin's ubershaders, though it's unclear how much it
matters for Dolphin perf.
* This violates The Ekstrand Rule and apparently will be a problem for our
Vulkan driver. I'm just a compiler+GL girl, so I wouldn't know.
To fix this, we invert the order of operations. At the end of this series, we
instead lower NIR system values to NIR load_preamble instructions in the GL
driver. The compiler just translates these directly to uniform register reads. The
Vulkan driver will need its own version of this code, but maybe it can do
something clever and descriptor set aware.
This means that there will already be some load_preamble instructions when
nir_opt_preamble runs, so I've made minor changes to nir_opt_preamble to handle
that gracefully. This is a bit lazy... The alternative is to introduce a
`load_uniform_agx` intrinsic which `load_preamble` gets lowered to trivially.
But that's another pass over the IR (and due to AGX's shader variant hell I'm
sensitive to backend compile time) and it would be more complicated than what's
implemented here.
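Conceptually, the driver-side lowering is just this (struct sysval_layout and
sysval_offset are illustrative stand-ins; the real code also records which GPU
memory ranges the driver must upload):

/* Each sysval gets a slot in the second half of the uniform file, and
 * loads of it become load_preamble at that slot. */
static bool
lower_sysval_to_preamble(nir_builder *b, nir_intrinsic_instr *intr,
                         void *data)
{
   struct sysval_layout *layout = data;
   int offset = sysval_offset(layout, intr->intrinsic);
   if (offset < 0)
      return false;

   b->cursor = nir_before_instr(&intr->instr);
   nir_def *val = nir_load_preamble(b, intr->def.num_components,
                                    intr->def.bit_size, .base = offset);

   nir_def_rewrite_uses(&intr->def, val);
   nir_instr_remove(&intr->instr);
   return true;
}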
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Ella Stanforth <ella@iglunix.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20562>
Prior to this change, agx_opt_cse was our most expensive backend pass, due to the
time spent hashing instructions. hash_instr was calling into XXH32 a massive
number of times, often to hash only a single bit. It's much faster to hash
entire blocks of memory at a time. Optimize to do just that.
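The shape of the change, with simplified field names (this assumes the scalar
fields sit contiguously ahead of the source array with zeroed padding; the real
struct layout and field names differ):

#include <stddef.h>
#include "util/xxhash.h"

/* Before: one XXH32 call per field (sometimes per bit), chained via seed. */
static uint32_t
hash_instr_slow(const agx_instr *I)
{
   uint32_t hash = 0;
   hash = XXH32(&I->op, sizeof(I->op), hash);
   hash = XXH32(&I->imm, sizeof(I->imm), hash);
   /* ... and so on, one call per field ... */
   return hash;
}

/* After: hash the contiguous block of scalar fields in one call, then the
 * variable-length source array. */
static uint32_t
hash_instr_fast(const agx_instr *I)
{
   uint32_t hash = XXH32(&I->op,
                         offsetof(agx_instr, src) - offsetof(agx_instr, op),
                         0);
   hash = XXH32(I->src, sizeof(I->src[0]) * I->nr_srcs, hash);
   return hash;
}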
With this change, agx_opt_cse is now cheaper than instruction selection as
it should be.
No shader-db changes (except CPU time decrease).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20446>
See 0afd691f29 ("panfrost: clang-format the tree") for why I'm doing this.
Asahi already mostly follows Mesa style so this doesn't do much. But this means
we can all stop thinking about formatting and trust the robot poets to do that
for us.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20434>