Commit Graph

4 Commits

Alyssa Rosenzweig
7f6491b76d nir: Combine if_uses with instruction uses
Every nir_ssa_def keeps two chains of uses, one for regular instruction uses
and one for if-uses, each implemented as a doubly linked list.  That means each
list requires 2 * 64-bit = 16 bytes per def, which is memory-intensive.
Together they require 32 bytes per def. Not cool.

To cut that memory use in half, we can combine the two linked lists into a
single use list that contains both regular instruction uses and if-uses. To do
this, we augment the nir_src with a boolean "is_if", and reimplement the
abstract if-uses operations on top of that list. That boolean should fit into
the padding already in nir_src, so it should not actually affect memory use, and
in the future we could sneak it into the bottom bit of a pointer.
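
As a minimal sketch of the combined representation (illustrative only; the real
nir_src carries additional members and the exact layout may differ):

	/* Sketch only, assuming the usual nir.h / util/list.h types: a single
	 * per-def use list whose entries know whether they are an if-use. */
	typedef struct {
	   struct list_head use_link;   /* entry in the def's single use list */
	   union {
	      nir_instr *parent_instr;  /* valid when !is_if */
	      nir_if *parent_if;        /* valid when is_if */
	   };
	   bool is_if;                  /* fits in existing padding of nir_src */
	} nir_src;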

However, this creates a new inefficiency: iterating over regular uses
separately from if-uses is now (nominally) more expensive. It turns out virtually
every caller of nir_foreach_if_use(_safe) also calls nir_foreach_use(_safe)
immediately before, so we rewrite most of the callers to instead call a new
single `nir_foreach_use_including_if(_safe)` which predicates the logic based on
`src->is_if`. This should mitigate the performance difference.
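
The typical caller rewrite then looks something like this sketch (the handle_*
callees are hypothetical placeholders for whatever the caller previously did in
its separate use and if-use loops):

	/* One walk over the single use list, predicated on src->is_if, instead of
	 * back-to-back nir_foreach_use() and nir_foreach_if_use() loops. */
	nir_foreach_use_including_if(src, def) {
	   if (src->is_if)
	      handle_if_use(src->parent_if);       /* hypothetical */
	   else
	      handle_instr_use(src->parent_instr); /* hypothetical */
	}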

There's a bit of churn, but this is largely a mechanical set of changes.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22343>
2023-04-07 23:48:03 +00:00
Emma Anholt
22f7f167cd nir/opt_phi_precision: Fix missing swizzles when narrowing phi srcs.
This NIR:

	vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138
	vec1 16 ssa_209 = f2fmp ssa_169.x
	vec1 16 ssa_210 = f2fmp ssa_169.y
	vec1 16 ssa_211 = f2fmp ssa_169.z
	vec1 16 ssa_212 = f2fmp ssa_169.w
	vec4 16 ssa_213 = vec4 ssa_209, ssa_210, ssa_211, ssa_212
	intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float16 /*144*/, io location=4 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)

would turn into:

	vec4 32 ssa_169 = phi block_1: ssa_168, block_2: ssa_138
	vec4 16 ssa_216 = phi block_1: ssa_214, block_2: ssa_215
	vec1 16 ssa_209 = f2fmp ssa_169.x
	vec1 16 ssa_210 = f2fmp ssa_169.y
	vec1 16 ssa_211 = f2fmp ssa_169.z
	vec1 16 ssa_212 = f2fmp ssa_169.w
	vec4 16 ssa_213 = vec4 ssa_216.x, ssa_216.x, ssa_216.x, ssa_216.x
	intrinsic store_output (ssa_213, ssa_171) (base=0, wrmask=xyzw /*15*/, component=0, src_type=float16 /*144*/, io location=4 slots=1 mediump /*8388740*/, xfb() /*0*/, xfb2() /*0*/)

ignoring the swizzles from the f2fmp srcs.  Fixes failures in
dEQP-GLES2.functional.shaders.random.all_features.fragment.20 on
turnip+ANGLE.
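
The gist of the missing step, sketched here as a hypothetical helper rather
than the literal diff: when a use of one of the f2fmp conversions is redirected
to read the new 16-bit phi directly, the conversion's source swizzle has to be
composed into that use instead of being dropped.

	/* Hypothetical sketch (assumes nir.h): compose the narrowing conversion's
	 * source swizzle into a use that has been rewritten to read the new
	 * low-precision phi, so .x/.y/.z/.w selections survive the rewrite. */
	static void
	fixup_rewritten_use_swizzle(nir_alu_src *use, const nir_alu_instr *conv)
	{
	   for (unsigned i = 0; i < NIR_MAX_VEC_COMPONENTS; i++)
	      use->swizzle[i] = conv->src[0].swizzle[use->swizzle[i]];
	}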

Fixes: c7b935962b ("nir: Add pass to lower phi precision")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19179>
2022-10-22 03:06:31 +00:00
Emma Anholt
673cc9323a nir: Move phi src setup to a helper.
Cleans up the ralloc/list push code all over the tree.
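
The open-coded pattern being replaced presumably looks something like this
sketch (the helper call shown is an assumption based on later NIR code, not
quoted from this diff):

	/* Before: ralloc a phi src and push it onto the phi's src list by hand. */
	nir_phi_src *src = ralloc(phi, nir_phi_src);
	src->pred = pred_block;
	src->src = nir_src_for_ssa(def);
	exec_list_push_tail(&phi->srcs, &src->node);

	/* After: a single helper call, along the lines of: */
	nir_phi_instr_add_src(phi, pred_block, nir_src_for_ssa(def));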

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11772>
2021-08-13 16:11:57 +00:00
Rob Clark
c7b935962b nir: Add pass to lower phi precision
In addition to the register pressure benefits from getting more fp16/int16,
this keeps i2imps from standing in the way of loop unrolling.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11545>
2021-06-29 23:27:28 +00:00