Out-of-bounds writes could be eliminated per spec:
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
"In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero."
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Otherwise get_io_offset would return out-of-bound offset which may
result in out-of-bound loading/storing of inputs/outputs,
that could cause issues in drivers down the line.
E.g. this fixes such dereference:
int vue_slot = vue_map->varying_to_slot[intrin->const_index[0]];
in brw_nir.c
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6428>
When handling arrays, range is increased based on the array size minus
one. But if such is zero, it has the effect of reducing the
range. Handle that case by returning the unknown range value.
v2:
* Add missing braces.
* Return unknown range in this case, instead of keeping the initial
range.
v3: Simplify code, using existing "fail" label. (Jason)
Fixes the following using v3dv:
dEQP-VK.graphicsfuzz.cov-simplify-clamp-max-itself
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6737>
freedreno results (note that cat6 is loads from memory as opposed to
pushed constants from the constant file):
total instructions in shared programs: 8044344 -> 8022085 (-0.28%)
total constlen in shared programs: 1411384 -> 1461964 (3.58%)
total cat6 in shared programs: 89983 -> 87065 (-3.24%)
Over the last 3 commits, we increased Manhattan31 performance by ~10%
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
For UBO accesses to be the same performance as classic GL default uniform
block uniforms, we need to be able to push them through the same path. On
freedreno, we haven't been uploading UBOs as push constants when they're
used for indirect array access, because we don't know what range of the
UBO is needed for an access.
I believe we won't be able to calculate the range in general in spirv
given casts that can happen, so we define a [0, ~0] range to be "We don't
know anything". We use that at the moment for all UBO loads except for
nir_lower_uniforms_to_ubo, where we now avoid losing the range information
that default uniform block loads come with.
In a departure from other NIR intrinsics with a "base", I didn't make the
base an be something you have to add to the src[1] offset. This keeps us
from needing to modify all drivers (particularly since the base+offset
thing can mean needing to do addition in the backend), makes backend
tracking of ranges easy, and makes the range calculations in
load_store_vectorizer reasonable. However, this could definitely cause
some confusion for people used to the normal NIR base.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6359>
The current align_mul calculation in the unknown-array-index
calculation is
align_mul = MIN3(parent_mul,
min_pow2_divisor(parent_offset),
min_pow2_divisor(stride))
which is certainly correct if parent_offset > 0. However, when
parent_offset = 0, min_pow2_divisor(parent_offset) isn't well-defined
and our calculation for it is 1 << -1 which isn't well-defined. That
said.... it's not actually needed.
The offset to the base of the array is
array_base = parent_mul * k + parent_offset
for some integer k. When we throw in an unknown array index i, we get
elem = parent_mul * k + parent_offset + stride * i.
If we set new_align = MIN2(parent_mul, min_pow2_divisor(stride)), then
both parent_mul and stride are divisible by new_align and
elem = (parent_mul / new_alig) * new_align * k +
(stride / new_align) * new_align * i + parent_offset
= new_align * ((parent_mul / new_alig) * k +
(stride / new_align) * i) + parent_offset
so elem = new_align * j + parent_offset where
j = (parent_mul / new_alig) * k + (stride / new_align) * i.
That's a very long-winded way of saying that we can delete one parameter
from the align_mul calculation and it's still fine. :-)
Fixes: 480329cf8b "nir: Add a helper for getting the alignment of a deref"
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Tested-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6628>
If the deref has no explicit alignment in the chain, we assume component
alignment which is what we currently assume for all derefs today. This
should be correct for all APIs in the sense that we can usually assume
at least component alignment. However, for some APIs such as OpenCL, we
could potentially make larger alignment assumptions. The intention is
that those will be handled via alignment-increasing casts.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6472>
This enables drivers and utils to get all IO information from intrinsics,
so that they don't have to walk the complex types of NIR variables to find
out other information about IO intrinsics.
NIR in/out variables can be removed after nir_lower_io. We could remove
the variables in the pass, but for now I just decided to remove
the variables in radeonsi before shaders are returned to st/mesa.
(st/mesa just needs adjustments to work without NIR in/out variables)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6442>
Instead of having separate lists of variables, roughly sorted by mode,
use a single list for all shader-level NIR variables. This makes a few
list walks a bit longer here and there but list walks aren't a very
common thing in NIR at all. On the other hand, it makes a lot of things
like validation, printing, etc. way simpler. Also, there are a number
of cases where we move variables from inputs/outputs to globals and this
makes it way easier because we no longer have to move them between
lists. We only have to deal with that if moving them from the shader to
a nir_function_impl.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5966>
For turnip, we use the "bindless" model on a6xx. Loads and stores with
the bindless model require a bindless base, which is an immediate field
in the instruction that selects between 5 different 64-bit "bindless
base registers", a 32-bit descriptor index that's added to the base, and
the usual 32-bit offset. The bindless base usually, but not always,
corresponds to the Vulkan descriptor set. We can handle the case where
the base is non-constant by using a bunch of if-statements, to make it a
little easier in core NIR, and this seems to be what Qualcomm's driver
does too. Therefore, the pointer format we need to use in NIR has a vec2
index, for the bindless base and descriptor index. Plumb this format
through core NIR.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5683>
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size. Not having the right size can mess up
passes which try to optimize access. In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that booleans uniforms weren't getting pushed as push constants. I
don't think this is an actual functional bug anywhere hence no CC to
stable but it may improve perf somewhere.
Shader-db results on ICL with iris:
total instructions in shared programs: 16076707 -> 16075246 (<.01%)
instructions in affected programs: 129034 -> 127573 (-1.13%)
helped: 487
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -1.37% -1.29%
Instructions are helped.
total cycles in shared programs: 338015639 -> 337983311 (<.01%)
cycles in affected programs: 971986 -> 939658 (-3.33%)
helped: 362
HURT: 110
helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18
HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
95% mean confidence interval for cycles value: -79.97 -57.01
95% mean confidence interval for cycles %-change: -4.60% -3.47%
Cycles are helped.
total sends in shared programs: 815037 -> 814550 (-0.06%)
sends in affected programs: 5701 -> 5214 (-8.54%)
helped: 487
HURT: 0
LOST: 2
GAINED: 0
The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs. This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>