Commit Graph

406 Commits

Author SHA1 Message Date
Faith Ekstrand
80a1836d8b nir: Get rid of nir_dest_bit_size()
We could add a nir_def_bit_size() helper but we use ->bit_size about 3x
as often as nir_dest_bit_size() today so that's a major Coccinelle
refactor anyway and this doesn't make it much worse.  Most of this
commit was generated byt the following semantic patch:

    @@
    expression D;
    @@

    <...
    -nir_dest_bit_size(D)
    +D.ssa.bit_size
    ...

Some manual fixup was needed, especially in cpp files where Coccinelle
tends to give up the moment it sees any interesting C++.

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
d9786a48aa agx: Remove agx_nir_ssa_index
Deduplicated from agx_def_index.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:52 +00:00
Alyssa Rosenzweig
6f66f3583e agx: Stop passing nir_dest around
Towards deleting nir_dest.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:52 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Alyssa Rosenzweig
7ac6176ea5 agx: Do not allow creating vec8
mem_access_bit_size needs to split up 64x4 into 2 loads. Fixes:

dEQP-VK.spirv_assembly.instruction.compute.64bit_compare.int64.comp_opiequal_vector

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:28 +00:00
Alyssa Rosenzweig
fd481d00d3 agx: Handle <32-bit local memory access
I don't know if this is possible to hit with GL, but it is with Vulkan. Fixes:

dEQP-VK.spirv_assembly.instruction.compute.workgroup_memory.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:28 +00:00
Alyssa Rosenzweig
aeffd22c30 agx: Handle f2f16_rtne like f2f16
TBD whether we can control round modes later on.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:28 +00:00
Alyssa Rosenzweig
73657cd011 agx: Handle conversions to 8-bit
These can't be lowered by nir_lower_bit_sizes but it doesn't actually matter.
Fixes SPIR-V conversions tests.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:28 +00:00
Mary
0f4e3a03fd agx: Move nir_lower_fragcolor out of agx_preprocess_nir
Do not apply "nir_lower_fragcolor" in the common code.

This fix a crash on agxv side when a frag shader have SSBO writes.

This is caused by "nir_lower_frag_color" assuming that every
"store_deref" will have a variable backing the
output.

Signed-off-by: Mary <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:28 +00:00
Alyssa Rosenzweig
a30c668e44 agx: Require an immediate for nest
There's no good reason to allow non-immediate nesting values, and this lets us
use the (smaller) mov_imm instruction without special casing. This matches what
Metal produces, so it seems like a good preference.

   total bytes in shared programs: 11720338 -> 11717310 (-0.03%)
   bytes in affected programs: 2341580 -> 2338552 (-0.13%)
   helped: 1385
   HURT: 0
   Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
e83b708676 agx: Optimize out pointless else instructions
Now that they're in the right blocks, this is easy. Includes an informal proof
and the implementation itself is built around a finite state machine, which
together meant this code worked on its first try :~)

And hey, it's a pointless little instruction saving optimization I've wanted to
do for a while~

Major note is that this HAS to be done after register allocation, since it
doesn't update the control flow graph and would introduce critical edges
if it tried to actually deleted the else block. The intuitive reason for this is
simple: sometimes RA needs to insert instructions into the else block, even if
it was empty in the original NIR, so we always need an else block even if we can
delete it with this pass after RA.

   total instructions in shared programs: 1778390 -> 1776725 (-0.09%)
   instructions in affected programs: 268459 -> 266794 (-0.62%)
   helped: 1013
   HURT: 0
   Instructions are helped.

   total bytes in shared programs: 12185102 -> 12175112 (-0.08%)
   bytes in affected programs: 1927524 -> 1917534 (-0.52%)
   helped: 1013
   HURT: 0
   Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
782055106f agx: Use unconditional else instruction
Rather than duplicating the condition. This matches the blob, so is presumably
the most energy-efficient way of expressing the logic.

No shader-db changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
41b7891673 agx: Put else instructions in the right block
According to Dougall's pseudocode, else_icmp operates as:

  if r0l == 0:
    r0l = n
  elif r0l == 1:
    if cc.compare(A[thread], B[thread]):
      r0l = 0
    else:
      r0l = 1

  exec_mask[thread] = (r0l == 0)

Notice that the comparison only happens when r0l == 1, that is, for threads that
are about to enter the else block. Threads that just executed the if body are
still active (r0l = 0) and skip the comparison. As such, the sources of
else_icmp are only read in the else block, and hence the whole instruction
should be placed in the else block for correctness with respect to live range
splitting.

shader-db is a wash, but shows some improvements due to correctly modelling the
liveness of the condition variable.

   total instructions in shared programs: 1778376 -> 1778390 (<.01%)
   instructions in affected programs: 14753 -> 14767 (0.09%)
   helped: 35
   HURT: 39
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 12185018 -> 12185102 (<.01%)
   bytes in affected programs: 101522 -> 101606 (0.08%)
   helped: 35
   HURT: 39
   Inconclusive result (value mean confidence interval includes 0).

   total halfregs in shared programs: 531174 -> 531032 (-0.03%)
   halfregs in affected programs: 2320 -> 2178 (-6.12%)
   helped: 40
   HURT: 1
   Halfregs are helped.

   total threads in shared programs: 18909184 -> 18909440 (<.01%)
   threads in affected programs: 1792 -> 2048 (14.29%)
   helped: 2
   HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
0d7b8bfce5 agx: Don't lower load_local_invocation_index
We have an SR for it, which can save a bit of math. This came up while working
on the spiller.

   total instructions in shared programs: 1778396 -> 1778376 (<.01%)
   instructions in affected programs: 3036 -> 3016 (-0.66%)
   helped: 10
   HURT: 3
   Instructions are helped.

   total bytes in shared programs: 12185182 -> 12185018 (<.01%)
   bytes in affected programs: 38640 -> 38476 (-0.42%)
   helped: 18
   HURT: 2
   Bytes are helped.

   total halfregs in shared programs: 531218 -> 531174 (<.01%)
   halfregs in affected programs: 471 -> 427 (-9.34%)
   helped: 6
   HURT: 0
   Halfregs are helped.

   total threads in shared programs: 18909056 -> 18909184 (<.01%)
   threads in affected programs: 1280 -> 1408 (10.00%)
   helped: 2
   HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
3e5d2f0c1b asahi,agx: Respect no16 even for I/O
Don't call lower_mediump_io for no16. This is helpful for debugging and soon
driconf-shaming apps with broken precision qualifiers.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
5f3d784c6c agx: Handle 8-bit vecs
These should "just" work, promoting the 8-bit channels to 16-bit registers
internally, allowing us to use our 8-bit stores with 8-bit data vectors packed
in 16-bit registers. All other non-conversion ALU gets lowered by the previous
patch, this is just needed for simple things like nir_op_vec of lowered math
passed to a vectorized store.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
c3b86bcbbc agx: Lower 8-bit ALU
No hardware support for it.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24635>
2023-08-11 20:31:27 +00:00
Alyssa Rosenzweig
144546f434 agx: Lower flat shading in NIR
We get this as part of the lowering we added for interpolateAtOffset.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24498>
2023-08-11 09:50:12 +00:00
Alyssa Rosenzweig
48029548f3 agx: Forcibly vectorize pointcoord coeffs
This avoids regressions from scalarizing pointcoord loads.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24498>
2023-08-11 09:50:11 +00:00
Alyssa Rosenzweig
22f694c008 agx: Implement nir_intrinsic_load_coefficients_agx
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24498>
2023-08-11 09:50:11 +00:00
Mike Blumenkrantz
e9a5da2f4b nir: add a filter cb to lower_io_to_scalar
this is useful for drivers that want to do selective scalarization
of io

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24565>
2023-08-11 09:02:53 +00:00
Alyssa Rosenzweig
a8013644a1 nir: Drop nir_alu_src::{negate,abs}
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432>
2023-08-03 22:40:28 +00:00
Alyssa Rosenzweig
ab0d878932 treewide: Remove more is_ssa asserts
Stuff Coccinelle missed.

   sed -i -e '/assert(.*\.is_ssa)/d' $(git grep -l is_ssa)
   sed -i -e '/ASSERT.*\.is_ssa)/d' $(git grep -l is_ssa)

+ a manual fixup to restore the assert for parallel copy lowering.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24432>
2023-08-03 22:40:28 +00:00
Alyssa Rosenzweig
51db19f7a2 nir: Rename scoped_barrier -> barrier
sed + ninja clang-format + fix up spacing for common code.

If you are unhappy that I did not manually change the whitespace of your driver,
you need to enable clang-format for it so the formatting would happen
automatically.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24428>
2023-08-01 23:18:29 +00:00
Alyssa Rosenzweig
5f167c9f72 asahi: Lower multisample image stores
These will be used for spilling multisampled render targets.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
10fc9e3d59 agx: Plumb in coverage mask
This is internally used by the hardware when writing to the tilebuffer. We need
to use it externally to spill multisample render targets.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
56bb3dcc21 agx: Require tag writes with side effects
Otherwise the fragment shader might be skipped entirely. (Possibly this is the
wrong approach to this though...)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
7ed2596fe7 agx: Implement fence_*_to_tex_agx intrinsics
We need these fencing intrinsics because our image caches aren't coherent with
memory. Furthermore, we need some sync intrinsics for imageblocks (which are
spicy images). These are a stub of what the final fragment shader interlock
implementation will look like, or what a real Metal-grade imageblock
implementation needs, but this is good enough for handling the sync requirements
with spilled render targets.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
c1afe26be6 agx: Don't emit silly barriers
Trust in the scoped_barrier.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
b618ba9330 agx: Emit global memory barriers for images
This is part of image atomics, since those go through the regular memory path.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
93f26abe49 agx: Implement image_load
Texture loads can be reordered freely but image loads can't be (since there
could be writes). Implement image_load natively to avoid subtle problems with
CSE and scheduling.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
e5f37ac5cb agx: Extract texture write mask handling
image_load will share the logic.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
02b1ddeca6 asahi,agx: Fix txf sampler
Bizarrely, the clamps/wrap modes are respected so we need to set them
appropriately for correct out-of-bounds behaviour (returning all zero). That in
turn means we can't use whatever sampler is already there, instead we need to
allocate a dedicated sampler just for txf. Good news is we have an extra sampler
state register available for the purpose.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
e2cfd2a228 agx: Add interleave opcode
We'll use it for texture atomics.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
76641762ce agx: Implement image barriers
Or cache flushes or whatever these actually are. Probably could be optimized
once we understand what the 4 individual instructions are actually doing. Fixes
dEQP-GLES31.functional.image_load_store.2d.qualifiers.*.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
4ef89e71ba agx: Translate image_store from NIR
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
13bb1209e2 agx: Translate texture bindless handles
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
f4aa6fd22e agx: Model texture bindless base
Extra source we need to implement bindless.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
80e103d718 agx: Reduce un/packs with mem access lowering
Often not needed and makes the NIR harder to read.

shader-db is noise.

total instructions in shared programs: 1752712 -> 1752688 (<.01%)
instructions in affected programs: 8338 -> 8314 (-0.29%)
helped: 21
HURT: 8
Inconclusive result (%-change mean confidence interval includes 0).

total bytes in shared programs: 11943572 -> 11943434 (<.01%)
bytes in affected programs: 56716 -> 56578 (-0.24%)
helped: 21
HURT: 8
Inconclusive result (%-change mean confidence interval includes 0).

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Alyssa Rosenzweig
8db9eeaeec asahi: Upload image descriptors
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24258>
2023-07-20 15:33:28 +00:00
Asahi Lina
1140bdb783 asahi: Arrange VS varyings in the correct order
The GPU ABI requires varyings to be grouped as follows:

- Position
- Smooth shaded fp32
- Flat shaded fp32
- Linear shaded fp32
- Smooth shaded fp16
- Flat shaded fp16
- Linear shaded fp16
- Point size

Use the flat shaded mask info we now have in the vertex shader key to
sort things properly, and pass the counts to the hardware.
FP16 is still TODO.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Asahi Lina
4a65b4bb14 asahi: Fix type confusion for fragment shader keys
We can't attempt to access the fs union member if this is not a FS.
That worked so far since there wasn't a VS shader key at all, but we're
about to introduce one.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Asahi Lina
90834353a1 asahi: Gather flat/linear shaded input info from uncompiled FS
We need to propagate shading model metadata from the FS to the VS in
order to correctly lay out the uniforms in the right order. This means
we need VS variants depending on this data.

We could use the existing shader info structure, but that applies to
compiled shaders which would introduce a dependency from the VS compile
to the FS compile. This information does not change with FS variants, so
we can introduce an agx_uncompiled_shader_info structure and gather it
early at precompilation time.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig
d9bf52e00f agx: Assert that barriers are not used in the preamble
It is nonsensical and confuses the hardware.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig
9bf7d14b2c agx: Use nir_opt_shrink_vectors
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig
c81a14c754 agx: Use nir_opt_shrink_stores
This especially helps with image stores, where we otherwise insert a bunch of
pointless moves to collect a vector even when we know the format only has a
single channel.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Alyssa Rosenzweig
5a4c9136cd agx: Add algebraic opt to help with discard lowering
When lowering discards, it will be convenient to generate the pattern:

   (cond ? 255 : 0) ^ 255

Add rules to optimize that to

   (cond ? 0 : 255)

This is not part of the main algebraic optimizer since this lowering happens
late.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
2023-07-05 05:11:49 +00:00
Yonggang Luo
62ce223245 treewide: Switch to use nir_foreach_function_with_impl when possible
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23903>
2023-06-29 08:36:03 +00:00
Alyssa Rosenzweig
05adeb850b agx: Use nir_lower_frag_coord_to_pixel_coord
Instead of open-coding the logic.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23836>
2023-06-27 14:38:21 +00:00
Alyssa Rosenzweig
766535c867 agx: Implement vector live range splitting
The SSA killer feature is that, under an "optimal" allocator, the number of
registers used (register demand) is *equal* to the number of registers required
(register pressure, the maximum number of variables simultaneously live at any
point in the program). I put "optimal" in scare quotes, because we don't need to
use the exact minimum number of registers as long as we don't sacrifice thread
count or introduce spilling, and using a few extra registers when possible can
help coalesce moves. Details-shmetails.

The problem is that, prior to this commit, our register allocator was not
well-behaved in certain circumstances, and would require an arbitrarily large
number of registers. In particular, since different variables have different
sizes and require contiguous allocation, in large programs the register file may
become fragmented, causing the RA to use arbitrarily many registers despite
having lots of registers free.

The solution is vector live range splitting. First, we calculate the register
pressure (the minimum number of registers that it is theoretically possible to
allocate successfully), and round up to the maximum number of registers we will
actually use (to give some wiggle room to coalesce moves). Then, we will treat
this maximum as a *bound*, requiring that we don't use more registers than
chosen. In the event that register file fragmentation prevents us from finding a
contiguous sequence of registers to allocate a variable, rather than giving up
or using registers we don't have, we shuffle the register file around
(defragmenting it) to make room for the new variable. That lets us use a
few moves to avoid sacrificing thread count or introducing spilling, which is
usually a great choice.

Android GLES3.1 shader-db results are as expected: some noise / small
regressions for instruction count, but a bunch of shaders with improved thread
count. The massive increase in register demand may seem weird, but this is the
RA doing exactly what it's supposed to: using more registers if and only if they
would not hurt thread count. Notice that no programs whatsoever are hurt for
thread count, which is the salient part.

   total instructions in shared programs: 1781473 -> 1781574 (<.01%)
   instructions in affected programs: 276268 -> 276369 (0.04%)
   helped: 1074
   HURT: 463
   Inconclusive result (value mean confidence interval includes 0).

   total bytes in shared programs: 12196640 -> 12201670 (0.04%)
   bytes in affected programs: 1987322 -> 1992352 (0.25%)
   helped: 1060
   HURT: 513
   Bytes are HURT.

   total halfregs in shared programs: 488755 -> 529651 (8.37%)
   halfregs in affected programs: 295651 -> 336547 (13.83%)
   helped: 358
   HURT: 9737
   Halfregs are HURT.

   total threads in shared programs: 18875008 -> 18885440 (0.06%)
   threads in affected programs: 64576 -> 75008 (16.15%)
   helped: 82
   HURT: 0
   Threads are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23832>
2023-06-23 17:37:41 +00:00