Commit Graph

207 Commits

Author SHA1 Message Date
Iago Toral Quiroga
c8e4ee8ecb broadcom/compiler: don't assign registers to unused nodes/temps
In programs with a lot of unused temps, if we don't do this, we may
end up recycling previously used rfs more often, which can be
detrimental to instruction pairing.

total instructions in shared programs: 11464335 -> 11444136 (-0.18%)
instructions in affected programs: 8976743 -> 8956544 (-0.23%)
helped: 33196
HURT: 33778
Inconclusive result

total max-temps in shared programs: 2230150 -> 2229445 (-0.03%)
max-temps in affected programs: 86413 -> 85708 (-0.82%)
helped: 2217
HURT: 1523
Max-temps are helped.

total sfu-stalls in shared programs: 18077 -> 17104 (-5.38%)
sfu-stalls in affected programs: 8669 -> 7696 (-11.22%)
helped: 2657
HURT: 2182
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 11482412 -> 11461240 (-0.18%)
inst-and-stalls in affected programs: 8995697 -> 8974525 (-0.24%)
helped: 33319
HURT: 33708
Inconclusive result

total nops in shared programs: 298140 -> 296185 (-0.66%)
nops in affected programs: 52805 -> 50850 (-3.70%)
helped: 3797
HURT: 2662
Inconclusive result

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
2023-10-13 22:37:42 +00:00
Iago Toral Quiroga
ce13aa4ee7 broadcom/compiler: improve allocation for final program instructions
The last 3 instructions can't use specific registers so flag all the
nodes for temps used in the last program instructions and try to
avoid assigning any of these. This may help us avoid injecting nops
for the last thread switch instruction.

Because regisster allocation needs to happen before QPU scheduling
and instruction merging we can't tell exactly what the last 3
instructions will be, so we do this for a few more instructions than
just 3.

We only do this for fragment shaders because other shader stages
always end with VPM store instructions that take an small immediate
and therefore will never allow us to merge the final thread switch
earlier, so limiting allocation for these shaders will never improve
anything and might instead be detrimental.

total instructions in shared programs: 11471389 -> 11464335 (-0.06%)
instructions in affected programs: 582908 -> 575854 (-1.21%)
helped: 4669
HURT: 578
Instructions are helped.

total max-temps in shared programs: 2230497 -> 2230150 (-0.02%)
max-temps in affected programs: 5662 -> 5315 (-6.13%)
helped: 344
HURT: 44
Max-temps are helped.

total sfu-stalls in shared programs: 18068 -> 18077 (0.05%)
sfu-stalls in affected programs: 264 -> 273 (3.41%)
helped: 37
HURT: 48
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 11489457 -> 11482412 (-0.06%)
inst-and-stalls in affected programs: 585180 -> 578135 (-1.20%)
helped: 4659
HURT: 588
Inst-and-stalls are helped.

total nops in shared programs: 301738 -> 298140 (-1.19%)
nops in affected programs: 14680 -> 11082 (-24.51%)
helped: 3252
HURT: 108
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
2023-10-13 22:37:42 +00:00
Iago Toral Quiroga
3a36a618d7 broadcom/compiler: try to use ldunif(a) instead of ldunif(a)rf in v71
The rf variants need to encode the destination in the cond bits, which
prevents these to be merged with any other instruction that need them.

In 4.x, ldunif(a) write to r5 which is a special register that only
ldunif(a) and ldvary can write so we have a special register class for
it and only allow it for them. Then when we need to choose a register
for a node, if this register is available we always use it.

In 7.x these instructions write to rf0, which can be used by any
instruction, so instead of restricting rf0, we track the temps that
are used as ldunif(a) destinations and use that information to favor
rf0 for them.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
2023-10-13 22:37:42 +00:00
Iago Toral Quiroga
b1548b18d3 broadcom/compiler: rename vir_writes_rX to vir_writes_rX_implicitly
Since that represents more accurately what they check..

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25450>
2023-10-13 22:37:41 +00:00
Iago Toral Quiroga
4afbf4ad31 v3d: get rid of shader_state pointer in v3d_key
Having this pointer in the key is undesirable since it makes
copying keys difficult and error prone (as seen in previous
patches), also, it is only there for convenience and we don't
strictly need it (in fact the vulkan driver doesn't use it at
all), so let's just get rid of it so our v3d_key is fully
static.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:07 +00:00
Iago Toral Quiroga
adc63d2503 broadcom/compiler: add a couple of shader key helpers
Our shader key includes a void pointer that we can't just memcmp,
so add helpers that allow us toget the 'static' portion and size
of a key. We will use this to fix up the shader cache in v3d in
a later patch.

Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25418>
2023-10-02 06:35:06 +00:00
Faith Ekstrand
f922ea7c07 broadcom: Stop using nir_dest directly
We want to get rid of nir_dest so back-ends need to stop storing it
in structs and passing it through helpers.

Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24674>
2023-08-14 21:22:53 +00:00
Alyssa Rosenzweig
09d31922de nir: Drop "SSA" from NIR language
Everything is SSA now.

   sed -e 's/nir_ssa_def/nir_def/g' \
       -e 's/nir_ssa_undef/nir_undef/g' \
       -e 's/nir_ssa_scalar/nir_scalar/g' \
       -e 's/nir_src_rewrite_ssa/nir_src_rewrite/g' \
       -e 's/nir_gather_ssa_types/nir_gather_types/g' \
       -i $(git grep -l nir | grep -v relnotes)

   git mv src/compiler/nir/nir_gather_ssa_types.c \
          src/compiler/nir/nir_gather_types.c

   ninja -C build/ clang-format
   cd src/compiler/nir && find *.c *.h -type f -exec clang-format -i \{} \;

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24585>
2023-08-12 16:44:41 -04:00
Iago Toral Quiroga
f0e603583e broadcom/compiler: drop execution environment from the shader key
We are no longer using this for anything.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24396>
2023-08-03 06:32:41 +00:00
Iago Toral Quiroga
1f8ecd3ae0 broadcom: use nir info to keep track of implicit sample shading
It seems NIR is tracking this for us now so we can stop doing this
in the backend.

Also, new CTS tests seem to add the requirement where in the presence of
some builtin's like gl_SampleID in a shader, even if unused, sample shading
is expected to be enabled, which is something we can't track in the backend
since the variable may have been dropped by then.

Fixes 2 failures in:
dEQP-VK.draw.renderpass.implicit_sample_shading.sample*

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23984>
2023-07-04 08:54:43 +00:00
Alyssa Rosenzweig
4601517f54 broadcom/compiler: Remove v3d_nir_lower_robust_access
Now unused.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23895>
2023-06-29 22:36:50 +00:00
Alejandro Piñeiro
dfdbf5bf94 broadcom/compiler: clarify use of QFILE_VPM
This was only used for version < 40 (See commit 22a02f3e3).

Adding some extra explanations and asserts of places where it is used.

As we are here also move the definition of a register with QFILE_VPM,
to avoid defining it if not needed.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22984>
2023-06-14 09:03:35 +00:00
Iago Toral Quiroga
c950098abb broadcom/compiler: move buffer loads to lower register pressure
If we are trying to lower register pressure this can make a big
difference in some cases. To avoid adding even more strategies,
merge this with disabling ubo load sorting, since they are basically
trying to do the same.

total instructions in shared programs: 12848024 -> 12844510 (-0.03%)
instructions in affected programs: 236537 -> 233023 (-1.49%)
helped: 195
HURT: 87
Instructions are helped.

total uniforms in shared programs: 3815601 -> 3814932 (-0.02%)
uniforms in affected programs: 31773 -> 31104 (-2.11%)
helped: 67
HURT: 115
Inconclusive result (value mean confidence interval includes 0).

total max-temps in shared programs: 2210803 -> 2210622 (<.01%)
max-temps in affected programs: 9362 -> 9181 (-1.93%)
helped: 114
HURT: 34
Max-temps are helped.

total spills in shared programs: 2556 -> 2330 (-8.84%)
spills in affected programs: 1391 -> 1165 (-16.25%)
helped: 39
HURT: 9

total fills in shared programs: 3840 -> 3317 (-13.62%)
fills in affected programs: 2379 -> 1856 (-21.98%)
helped: 39
HURT: 23

total sfu-stalls in shared programs: 21965 -> 21978 (0.06%)
sfu-stalls in affected programs: 2618 -> 2631 (0.50%)
helped: 45
HURT: 81
Inconclusive result (value mean confidence interval includes 0).

total inst-and-stalls in shared programs: 12869989 -> 12866488 (-0.03%)
inst-and-stalls in affected programs: 238771 -> 235270 (-1.47%)
helped: 193
HURT: 87
Inst-and-stalls are helped.

total nops in shared programs: 303501 -> 303274 (-0.07%)
nops in affected programs: 4159 -> 3932 (-5.46%)
helped: 87
HURT: 105
Inconclusive result (value mean confidence interval includes 0).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22824>
2023-05-03 13:01:58 +00:00
Harri Nieminen
c3c63cb1d8 broadcom: fix typos
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22591>
2023-04-21 17:19:46 +00:00
Iago Toral Quiroga
1e28f2a6f2 broadcom/compiler: track pending ldtmu count with each TMU lookup
And use this information when scheduling QPU to avoid merging
a new TMU request into a previous ldtmu instruction when doing
so may cause TMU output fifo overflow due to a stalling ldtmu.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22044>
2023-03-21 11:29:05 +00:00
Alejandro Piñeiro
1a1fa2393e v3d/v3dv: use shader_info->var_copies_lowered
Instead of passing allow_copies as a parameter for v3d_optimize_nir
(so manually doing that tracking).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>
2023-02-06 22:11:34 +00:00
Alejandro Piñeiro
41a081380a broadcom/compiler: v3d_nir_lower_txf_ms doesn't need v3d_compile
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20744>
2023-01-18 13:09:57 +00:00
Iago Toral Quiroga
22ef66bcc9 v3d/compiler: remove unused sample_coverage field from fs key.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20634>
2023-01-11 10:54:05 +00:00
Iago Toral Quiroga
c7150ad8e6 broadcom/compiler: drop unused v3d_compile parameter for nir pass
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19519>
2022-11-04 09:58:10 +00:00
Iago Toral Quiroga
24d9a80247 v3dv: implement VK_EXT_pipeline_robustness
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18883>
2022-10-27 08:17:11 +00:00
Alejandro Piñeiro
019529aa11 broadcom/compiler: call nir_opt_gcm with a custom strategy
nir_opt_gcm get us worse shader-db stats, but that is expected. But we
want to prevent to get worse values on spill/fills. Analyzing the
outcome with shader-db, this mostly happen with shaders that are
already complex, and are already spilling/filling.

So the best option here is adding a new strategy, that fall backs if
we get spill/fill using nir_opt_gcm.

It is not clear in which order we should disable gcm. For now we
disable it before loop unrolling.

We get a slight performance gain (in average) using nir_opt_gcm.

We don't show the shaderdb stats, as they are worse, but as mentioned,
this is expected.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
2022-10-26 12:29:30 +00:00
Alejandro Piñeiro
0bf31b0710 broadcom/compiler: add more lowerings/optimizations on v3d_optimize_nir
Optimizations that we are already calling on the Vulkan driver. As
preparation to the Vulkan frontend to use v3d_optimize_nir too.

We need to add a new parameter to v3d_optimize_nir in order to know if
we can call nir_opt_find_array_copies. As we don't track if we are
calling nir_var_lower_copies, we explicitly call it when we create the
uncompiled shader create. So instead of tracking, we assume that each
driver (v3d/v3dv) would call it when the shader is created. So when
v3d_optimize_nir is called as part of the process to compile it at the
compiler, we call it with allow_copies as false.

We exclude on purpose nir_opt_gcm as it is a case of a optimization
that could help performance even if it hurts shader db stats.

shaderdb stats:
  total instructions in shared programs: 11705923 -> 11705034 (<.01%)
  instructions in affected programs: 88350 -> 87461 (-1.01%)
  helped: 201
  HURT: 80
  Instructions are helped.

  total threads in shared programs: 375552 -> 375558 (<.01%)
  threads in affected programs: 6 -> 12 (100.00%)
  helped: 3
  HURT: 0

  total uniforms in shared programs: 3486108 -> 3485789 (<.01%)
  uniforms in affected programs: 7473 -> 7154 (-4.27%)
  helped: 90
  HURT: 1
  Uniforms are helped.

  total max-temps in shared programs: 2021860 -> 2021802 (<.01%)
  max-temps in affected programs: 800 -> 742 (-7.25%)
  helped: 21
  HURT: 3
  Max-temps are helped.

  total sfu-stalls in shared programs: 19299 -> 19296 (-0.02%)
  sfu-stalls in affected programs: 18 -> 15 (-16.67%)
  helped: 10
  HURT: 7
  Inconclusive result (value mean confidence interval includes 0).

  total inst-and-stalls in shared programs: 11725222 -> 11724330 (<.01%)
  inst-and-stalls in affected programs: 88402 -> 87510 (-1.01%)
  helped: 201
  HURT: 80
  Inst-and-stalls are helped.

  total nops in shared programs: 269674 -> 269386 (-0.11%)
  nops in affected programs: 3641 -> 3353 (-7.91%)
  helped: 103
  HURT: 29
  Nops are helped.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17185>
2022-10-26 12:29:30 +00:00
Iago Toral Quiroga
b6093ffbe7 v3dv: expose VK_EXT_image_robustness
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>
2022-09-27 09:08:29 +00:00
Iago Toral Quiroga
c7e022abfd broadcom/compiler: add a lowering for robust image access
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18820>
2022-09-27 09:08:29 +00:00
Iago Toral Quiroga
87a9951073 broadcom/compiler: track number of TMU operations in prog data
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17854>
2022-08-15 23:35:16 +00:00
Alejandro Piñeiro
0a50330c3d broadcom/compiler: make several passes to return a progress
Two advantages:
   * When using NIR_DEBUG=nir_print_xx, will print outcome only if
     there is a change
   * We can use NIR_PASS(_, ...) instead of NIR_PASS_V, that has
     slightly more validation checks.

This includes:
  * v3d_nir_lower_image_load_store
  * v3d_nir_lower_io
  * v3d_nir_lower_line_smooth
  * v3d_nir_lower_load_store_bitsize
  * v3d_nir_lower_robust_buffer_access
  * v3d_nir_lower_scratch
  * v3d_nir_lower_txf_ms

As we are here we also simplify some of them by using the
nir_shader_instructions_pass helper.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
2022-07-20 11:35:25 +00:00
Alejandro Piñeiro
81ca0b4191 broadcom/compiler: removed unused function
It is not even implemented.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17609>
2022-07-20 11:35:25 +00:00
Iago Toral Quiroga
90054e9c5d broadcom/compiler: track if a shader uses global intrinsics
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
2022-07-19 09:47:34 +02:00
Iago Toral Quiroga
487c213142 v3d/compiler: add more stats to prog_data
So we can expose them via VK_KHR_pipeline_executable_properties.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
2022-05-09 12:12:35 +00:00
Iago Toral Quiroga
ea3223e7a4 v3dv: implement VK_EXT_inline_uniform_block
Inline uniform blocks store their contents in pool memory rather
than a separate buffer, and are intended to provide a way in which
some platforms may provide more efficient access to the uniform
data, similar to push constants but with more flexible size
constraints.

We implement these in a similar way as push constants: for constant
access we copy the data in the uniform stream (using the new
QUNIFORM_UNIFORM_UBO_*) enums to identify the inline buffer from
which we need to copy and for indirect access we fallback to
regular UBO access.

Because at NIR level there is no distinction between inline and
regular UBOs and the compiler isn't aware of Vulkan descriptor
sets, we use the UBO index on UBO load intrinsics to identify
inline UBOs, just like we do for push constants. Particularly,
we reserve indices 1..MAX_INLINE_UNIFORM_BUFFERS for this,
however, unlike push constants, inline buffers are accessed
through descriptor sets, and therefore we need to make sure
they are located in the first slots of the UBO descriptor map.
This means we store them in the first MAX_INLINE_UNIFORM_BUFFERS
slots of the map, with regular UBOs always coming after these
slots.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15575>
2022-03-28 10:44:13 +00:00
Iago Toral Quiroga
a35b47a0b1 broadcom/compiler: add a strategy to disable scheduling of general TMU reads
This can add quite a bit of register pressure so it makes sense to disable it
to prevent us from dropping to 2 threads or increase spills:

total instructions in shared programs: 12672813 -> 12642413 (-0.24%)
instructions in affected programs: 256721 -> 226321 (-11.84%)
helped: 719
HURT: 77

total threads in shared programs: 415534 -> 416322 (0.19%)
threads in affected programs: 788 -> 1576 (100.00%)
helped: 394
HURT: 0

total uniforms in shared programs: 3711370 -> 3703861 (-0.20%)
uniforms in affected programs: 28859 -> 21350 (-26.02%)
helped: 204
HURT: 455

total max-temps in shared programs: 2159439 -> 2150686 (-0.41%)
max-temps in affected programs: 32945 -> 24192 (-26.57%)
helped: 585
HURT: 47

total spills in shared programs: 5966 -> 3255 (-45.44%)
spills in affected programs: 2933 -> 222 (-92.43%)
helped: 192
HURT: 4

total fills in shared programs: 9328 -> 4630 (-50.36%)
fills in affected programs: 5184 -> 486 (-90.62%)
helped: 196
HURT: 0

Compared to the stats before adding scheduling of non-filtered
memory reads we see we that we have now gotten back all that was
lost and then some:

total instructions in shared programs: 12663186 -> 12642413 (-0.16%)
instructions in affected programs: 2051803 -> 2031030 (-1.01%)
helped: 4885
HURT: 3338

total threads in shared programs: 415870 -> 416322 (0.11%)
threads in affected programs: 896 -> 1348 (50.45%)
helped: 300
HURT: 74

total uniforms in shared programs: 3711629 -> 3703861 (-0.21%)
uniforms in affected programs: 158766 -> 150998 (-4.89%)
helped: 1973
HURT: 499

total max-temps in shared programs: 2138857 -> 2150686 (0.55%)
max-temps in affected programs: 177920 -> 189749 (6.65%)
helped: 2666
HURT: 2035

total spills in shared programs: 3860 -> 3255 (-15.67%)
spills in affected programs: 2653 -> 2048 (-22.80%)
helped: 77
HURT: 21

total fills in shared programs: 5573 -> 4630 (-16.92%)
fills in affected programs: 3839 -> 2896 (-24.56%)
helped: 81
HURT: 15

total sfu-stalls in shared programs: 39583 -> 38154 (-3.61%)
sfu-stalls in affected programs: 8993 -> 7564 (-15.89%)
helped: 1808
HURT: 1038

total nops in shared programs: 324894 -> 323685 (-0.37%)
nops in affected programs: 30362 -> 29153 (-3.98%)
helped: 2513
HURT: 2077

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15276>
2022-03-09 15:53:04 +00:00
Iago Toral Quiroga
f761f8fd9e broadcom/compiler: simplify node/temp translation during register allocation
Now that we don't sort our nodes we can arrange them so we can
easily translate between nodes and temps without a mapping table,
just applying an offset.

To do this we have a single array of nodes where twe put first the nodes
for accumulators and then the nodes for temps. With this setup we can
ensure that for any given temp T, its node is always T + ACC_COUNT.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15168>
2022-03-02 08:09:11 +00:00
Iago Toral Quiroga
a4b164b57b broadcom/compiler: only patch temps that existed before the current spill
When we spill we add new temps. We should be careful not to access
liveness for these until we have re-computed it after all spills and
fill for that the spilled temp have been processed so as to avoid
out-of-bounds accesses to the c->temp_start and c->temp_end arrays.

This fixes a crash in a Three.js demo when we try to patch register
classes after a TMU spill that was caused because we would incorrectly
try to patch the same temps we had just added for the spill itself,
which is not only unnecessary but also incorrect since we these temps
would not have liveness information available yet and thus would
cause out of bounds accesses.

Fixes: f3c3228522 ('broadcom/compiler: do not rebuild the interference graph after each spill')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15107>
2022-02-22 06:41:51 +00:00
Iago Toral Quiroga
750eeecf4e broadcom/compiler: document that spill_base is used for spills and scratch
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
2022-02-18 08:38:19 +00:00
Iago Toral Quiroga
8883975209 broadcom/compiler: drop spill_count and add spilling boolean
We added spill_count to handle uniform batch spills, which we no longer do.
What we want now is a way to know if we are spilling registers.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
2022-02-18 08:38:19 +00:00
Iago Toral Quiroga
f3c3228522 broadcom/compiler: do not rebuild the interference graph after each spill
Instead, we only recompute liveness and we add new nodes and
interferences to the graph manually (we also need to patch
register classes in some cases).

To assist in this process, we also add an ip counter to our
instructions that we also recompute after each spill, which we use
to identify registers that cross thrsw boundries introduced with
TMU spills and fills and adjust their register classes accordingly
(removing their capacity to use accumulators).

This significantly reduces the CPU cost of spills. Using
shaders/closed/gputest/piano/7.shader_test as reference:

Compile time up to the first successful compile strategy in main is
~24s and with this change it is ~11s. With this speed up, we can now
try all 2-thread compile strategies (including the fallback scheduler)
in only ~15s.

A full shader-db run results in:
Total CPU time (seconds): 9904.67 -> 9087.98 (-8.25%)

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
2022-02-18 08:38:19 +00:00
Iago Toral Quiroga
40e091267d broadcom/compiler: define max number of tmu spills for compile strategies
Instead of whether they are allowed to spill or not. This is more flexible.
Also, while we are not currently enabling spilling on any 4-thread strategies,
should we do that in the future, always prefer a 4-thread compile.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15041>
2022-02-18 08:38:19 +00:00
Iago Toral Quiroga
7561ea8fa1 broadcom/compiler: allow ldunifa with read-only SSBOs
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14830>
2022-02-03 07:35:07 +00:00
Iago Toral Quiroga
765d9feb46 broadcom/compiler: add lowering pass to scalarize non 32-bit general load/store
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.

It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
2022-01-25 09:08:26 +00:00
Dave Airlie
ccbf700d6c nir: remove gl.h include from nir headers.
This saves a lot of pointless gl.h includes across the board,
it moves the one place that needs GLenum into a separate file
only used in those passes that require it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14605>
2022-01-19 21:54:58 +00:00
Iago Toral Quiroga
a65c605365 broadcom/compiler: track passthrough Z writes
In some cases we need to make the shaders write the Z value produced
from rasterization (FEP). Track these instances because they are relevant
to early EZ setup.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
2021-12-03 10:39:08 +00:00
Iago Toral Quiroga
6d4a645c90 broadcom/compiler: emit passthrough Z write if shader reads Z
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14037>
2021-12-03 10:39:08 +00:00
Iago Toral Quiroga
6923dd687c broadcom/compiler: allow color TLB writes in last instruction
Only Z writes are disallowed.

total instructions in shared programs: 11578449 -> 11577369 (<.01%)
instructions in affected programs: 38132 -> 37052 (-2.83%)
helped: 1080
HURT: 0
Instructions are helped.

total max-temps in shared programs: 2334416 -> 2334395 (<.01%)
max-temps in affected programs: 218 -> 197 (-9.63%)
helped: 21
HURT: 0
Max-temps are helped.

total inst-and-stalls in shared programs: 11607890 -> 11606810 (<.01%)
inst-and-stalls in affected programs: 38265 -> 37185 (-2.82%)
helped: 1080
HURT: 0
Inst-and-stalls are helped.

total nops in shared programs: 338316 -> 337236 (-0.32%)
nops in affected programs: 2625 -> 1545 (-41.14%)
helped: 1080
HURT: 0
Nops are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13964>
2021-11-29 06:44:07 +00:00
Iago Toral Quiroga
0cb58f80d2 v3d: use V3D_MAX_DRAW_BUFFERS instead of hardcoded constant
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13775>
2021-11-12 11:04:07 +00:00
Iago Toral Quiroga
3a95e25e84 v3dv,v3d: don't store swizzle pointer in shader/pipeline keys
We had been storing pointers to a driver owned swizzle table
rather than storing the actual swizzle value in various shader
and pipeline keys on both GL and Vulkan drivers.

This doesn't look very robust, particularly since we also
compute sha1 hashes from these values and we may store these
hashes to disk (for the disk cache).

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13738>
2021-11-10 11:24:26 +00:00
Alejandro Piñeiro
d50be41f8f broadcom/compiler: remove unused macro and function definition
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13444>
2021-10-20 10:08:27 +00:00
Alejandro Piñeiro
193898c8b0 broadcom/compiler: remove commented out vir_LOAD_IMM methods
It has been commented several years now. Let's remove it to reduce the
noise.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13008>
2021-09-24 08:46:06 +00:00
Juan A. Suarez Romero
53c8b4c093 broadcom: make vir_emit_last_thrsw() private
This function is only used in v3d_nir_to_vir(), so make it private.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12322>
2021-09-10 09:18:05 +00:00
Iago Toral Quiroga
b727eaac3c broadcom/compiler: add a vir_get_cond helper
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12278>
2021-08-10 08:47:40 +00:00
Iago Toral Quiroga
d5acae3206 broadcom/compiler: implement nir_intrinsic_load_view_index
This is used for multiview's gl_ViewIndex built-in.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
2021-07-27 07:31:31 +00:00