The offset of the VPM write used for storing outputs from the geometry
shader isn't necessarily uniform across all the lanes. This can happen
if some of the lanes don't emit some of the vertices, in which case the
offsets for the subsequent vertices will differ in each lane. When that
happens we need to use the stvpmd instruction instead of stvpmv, since
it scatters the values out.
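A minimal sketch of the selection logic (names are illustrative, not
the actual compiler code):

    /* If every lane has emitted the same number of vertices, the VPM
     * offset is uniform and a plain vectorized write is enough;
     * otherwise each lane carries its own offset and we must scatter.
     */
    if (vpm_offset_is_uniform)
            emit_stvpmv(uniform_offset, value);  /* same offset in all lanes */
    else
            emit_stvpmd(per_lane_offset, value); /* per-lane scatter */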
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3150
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5621>
Right now, it's implemented as a no-op for everyone. For most drivers,
it's a switch case in the NIR -> backend translation that just breaks.
ir3 already has code to delete tessellation barriers, so we just add a
case to also delete memory_barrier_tcs_patch.
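For a typical backend the change is just the extra case; a sketch
(the surrounding translation loop varies per driver):

    switch (instr->intrinsic) {
    /* ... */
    case nir_intrinsic_memory_barrier_tcs_patch:
            /* No-op: nothing to do for this backend. */
            break;
    /* ... */
    }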
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>
When geometry shaders write a value to gl_Layer that doesn't correspond
to an existing layer in the target framebuffer, the rendering behavior
is undefined according to the spec. However, there are CTS tests that
trigger this scenario on purpose, probably to ensure that nothing
terrible happens.
For V3D, this situation is problematic because the binner uses the layer
index to select the offset to write into the tile state data, and we only
allocate tile state for MAX2(num_layers, 1), so we want to make sure we
don't produce values that would lead to out-of-bounds writes. The
simulator has an assert to catch this; although we haven't observed
issues on actual hardware, it is probably best to play it safe.
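A sketch of the clamp this implies (MAX2/MIN2 are the usual Mesa
macros; the variable names are illustrative):

    /* Never let a bogus gl_Layer value index past the tile state we
     * allocated for MAX2(num_layers, 1) layers.
     */
    uint32_t max_layer = MAX2(num_layers, 1) - 1;
    layer_index = MIN2(layer_index, max_layer);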
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in the v3d_nir_lower_io pass. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle the
EmitVertex() and EndPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
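Schematically (sizes in 32-bit words; illustrative only):

    /* GS VPM output layout:
     *
     *   general header        1 word
     *   vertex 0 header       1 word
     *   ...
     *   vertex N-1 header     1 word
     *   vertex 0 data         per-vertex output slots
     *   ...
     *   vertex N-1 data       per-vertex output slots
     */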
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO lowering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Specifically when we are in non-uniform control flow, as we would need
to set the condition for the last instruction. If (for example) an
image atomic load stores its value directly to a NIR register,
last_inst would be a nop, and setting the condition on it would fail.
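A sketch of the fix (helper and function names are assumptions, not
the exact vir API):

    /* If the last emitted instruction is a load signal we cannot
     * predicate it directly, so copy its result through a MOV and set
     * the condition on the MOV instead.
     */
    if (inst_is_ld_signal(last_inst)) /* hypothetical check */
            last_inst = emit_mov(c, dest, result);
    set_cond(last_inst, condition);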
Fixes piglit test:
spec/glsl-es-3.10/execution/cs-ssbo-atomic-if-else-2.shader_test
Fixes: 6281f26f06 ("v3d: Add support for shader_image_load_store.")
v2: (Changes suggested by Eric Anholt)
* Cover all sig.ld* signals, not just ldunif and ldtmu, as all of
them have the same restriction.
* Update comment explaining why we add a MOV in that case
* Tweak commit message.
v3:
* Drop extra set of parens (Eric)
* Add missing ld signal to is_ld_signal to fix shader-db regression.
Reviewed-by: Eric Anholt <eric@anholt.net>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
We set this for any TMU write, both for spills and general TMU
accesses. It is then used as part of v3d_emit_gl_shader_state later.
v2: add a new flag at v3d_compiler instead of dirtying the flag
at v3dx if there is any spill (change suggested by Eric, added by
Alejandro)
v3: set this for anything that is not a load and do it also in
v3d40_vir_emit_image_load_store (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
A set of opcodes doesn't provide enough flexibility in certain cases.
E.g. Utgard PP has a vector conditional select operation, but the
condition is always scalar. Lowering all vector selects to scalar
increases the instruction count, so we need a way to filter only those
ops that can't be handled in hardware.
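The filter is just a per-instruction callback; a minimal sketch (the
callback signature is assumed from NIR conventions):

    static bool
    scalarize_filter(const nir_instr *instr, const void *data)
    {
            if (instr->type != nir_instr_type_alu)
                    return false;
            /* Only scalarize selects; hardware handles the rest. */
            return nir_instr_as_alu(instr)->op == nir_op_bcsel;
    }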
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
This better matches all the other atomic intrinsics such as those for
SSBOs and shared variables where the sign is part of the intrinsic
opcode. Both generators (GLSL and SPIR-V) know the sign from the type
of the image variable or handle. In SPIR-V, signed min/max are separate
opcodes from unsigned.
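For illustration, the split follows the usual i/u naming convention
(exact intrinsic names may differ):

    image_deref_atomic_min -> image_deref_atomic_imin / _umin
    image_deref_atomic_max -> image_deref_atomic_imax / _umax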
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
v2:
- Fix incremental update of the const offset when we need to emit a sequence
with more than one write because of the writemask.
- Do not move the tmu write emission to a separate helper.
v3:
- Get the store writemask before the loop, use ffs to get the first
component to write and clear writemask bits as we process the
components (Eric); see the sketch after these notes.
- Simplify the code that figures out the number of components for the
TMU config based on the number of tmu writes for stores and atomics.
v4:
- Code clean-ups (Eric).
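A minimal sketch of the v3 writemask walk (variable names
illustrative):

    uint32_t writemask = nir_intrinsic_write_mask(instr);
    while (writemask) {
            /* Lowest component still pending. */
            int first_component = ffs(writemask) - 1;
            /* ... emit one write sequence starting there ... */
            writemask &= ~(1 << first_component); /* clear what we handled */
    }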
Fixes:
KHR-GLES31.core.shader_image_load_store.advanced-cast-cs
KHR-GLES31.core.shader_image_load_store.advanced-cast-fs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchBuffers-cs
KHR-GLES31.core.shader_storage_buffer_object.advanced-switchPrograms-cs
KHR-GLES31.core.shader_storage_buffer_object.basic-operations-case1-cs
Reviewed-by: Eric Anholt <eric@anholt.net>
When we implement write masks on store operations we might need to
emit multiple write sequences for a given store intrinsic. To make
that easier, let's split the emission of the tmud instructions into
their own block after we are done with the code that only needs to
run once no matter how many write sequences we need to emit.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: add to series
v3: update Makefile.sources
v4: don't remove a comment and break statement
v4: use nir_can_move_instr
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We want to split the tlb specifier setup from the color writes, because when
we implement per-sample color writes we want to do the latter for all the
samples, but the former only once.
Reviewed-by: Eric Anholt <eric@anholt.net>
We will soon be adding per-sample color writes, which means additional
complexity and more indentation (we will need another loop to emit
the writes for each individual sample), so this will help keep
things simple and a bit more readable.
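The upcoming shape is roughly this (illustrative, hypothetical helper):

    for (int sample = 0; sample < num_samples; sample++)
            emit_color_write(c, rt, sample);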
Reviewed-by: Eric Anholt <eric@anholt.net>
Among other things, this avoids the need to load 1/-1 constants (so
one less operation).
The removed comment suggested the option of adding support for inc/dec
in NIR. Intel just uses an auxiliary method to get the hw operation
that is needed, so no lowering is required, and since the logic is so
small it seems unreasonable to try to add a general lowering to NIR
itself. It is easier to just adapt the method here, which is what this
patch does.
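A sketch of the adapted helper (modeled on the description above; the
opcode names come from the generated packet headers and may differ):

    static uint32_t
    get_op_for_atomic_add(nir_src *src)
    {
            if (nir_src_is_const(*src)) {
                    int64_t add_val = nir_src_as_int(*src);
                    if (add_val == 1)
                            return V3D_TMU_OP_WRITE_AND_READ_INC;
                    else if (add_val == -1)
                            return V3D_TMU_OP_WRITE_OR_READ_DEC;
            }
            return V3D_TMU_OP_WRITE_ADD_READ_PREFETCH;
    }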
It is worth noting that we don't see any change in shader-db stats,
because these operations require GLSL >= 4.20 and the usual shader-db
set doesn't include such shaders. In general there aren't too many GLSL
ES 3.1 tests either.
As an alternative, we captured the GLES3/GLSL31/GLSL32 shaders used in
vk-gl-cts, even if that is not a real-life usage of shaders. With those
we get the following:
total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.
Note that the number of helped shaders is really low because most of
the vk-gl-cts tests using AtomicInc/Dec/Add are compute shaders.
Although right now there is a branch around with CS support, the usual
practice is to gather stats against master.
Reviewed-by: Eric Anholt <eric@anholt.net>
They are already defined, although in a slightly different format, in
the generated packet headers, so we needed to change how they are used
in nir_to_vir.
In addition to allowing us to remove some duplicated headers, this will
let us define just one get_op_for_atomic_add aux method later to
support using inc/dec instead of add with 1/-1.
Reviewed-by: Eric Anholt <eric@anholt.net>
Until now we have always been emitting our scoreboard locks on the last thread
switch to improve parallelism. We did this by emitting our last thread switch
right before our tlb writes at the very end of the program, where we know that
we are outside control flow.
Unfortunately, this strategy is not valid when we have tlb color reads too, as
these will happen before this point in the program and can happen inside
control flow.
To fix this we always emit a thread switch before the first tlb load and if we
see additional thread switches after that point, we change the strategy to lock
on the first thread switch.
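In pseudo-code, the strategy amounts to this (flag and helper names
are assumptions):

    /* Always lock-step before the first TLB read... */
    if (is_first_tlb_load)
            emit_thread_switch(c);
    /* ...and if any thread switch shows up after that point, take the
     * scoreboard lock on the first thread switch rather than the last.
     */
    if (c->emitted_tlb_load && is_thread_switch)
            c->lock_scoreboard_on_first_thrsw = true;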
v2: change the solution so it is expected to work in more scenarios (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
We will be emitting this intrinsic to signal TLB color loads when we implement
OpenGL logic operations, where we need to blend the fragment shader color
output with the existing color in the render target.
Per-sample TLB reads are not supported yet.
v2: fix the offset into the color_reads array (Eric).
Reviewed-by: Eric Anholt <eric@anholt.net>
That is: the five least significant bits provide the values of
'bits' and 'offset', which is the case for all hardware currently
supported by NIR that uses the bfm/bfe instructions.
This patch also changes the lowering of bitfield_insert/extract
using shifts to not use bfm and removes the flag 'lower_bfm'.
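In C terms, the assumed semantics are:

    uint32_t
    bfm(uint32_t bits, uint32_t offset)
    {
            bits &= 0x1f;   /* five least significant bits */
            offset &= 0x1f;
            return ((1u << bits) - 1) << offset;
    }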
Tested-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Either all channels execute the 'then' block, in which case all
channels will directly jump to the 'endif' block at the end of the
'then' block, or all channels execute the 'else' block (so no
execution masking is necessary).
Shader-db results:
total instructions in shared programs: 9119238 -> 9117550 (-0.02%)
instructions in affected programs: 401252 -> 399564 (-0.42%)
helped: 855
HURT: 77
total uniforms in shared programs: 3022622 -> 3022605 (<.01%)
uniforms in affected programs: 3566 -> 3549 (-0.48%)
helped: 17
HURT: 0
total max-temps in shared programs: 1327762 -> 1327774 (<.01%)
max-temps in affected programs: 619 -> 631 (1.94%)
helped: 2
HURT: 15
Reviewed-by: Eric Anholt <eric@anholt.net>
Currently, st/mesa is always calling the GLSL IR lower_instructions()
pass with MOD_TO_FLOOR set, so mod operations will be lowered before
ever reaching NIR. This enables the same lowering at the NIR level,
which will let me shut off the GLSL IR path for NIR-based drivers.
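For reference, MOD_TO_FLOOR expands mod with the floor identity:

    mod(x, y) = x - y * floor(x / y)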
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Eric Anholt <eric@anholt.net>
The difference between imov and fmov has been a constant source of
confusion in NIR for years. No one really knows why we have two or when
to use one vs. the other. The real reason is that they do different
things in the presence of source and destination modifiers. However,
without modifiers (which many back-ends don't have), they are identical.
Now that we've reworked nir_lower_to_source_mods to leave one abs/neg
instruction in place rather than replacing them with imov or fmov
instructions, we don't need two different instructions at all anymore.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Acked-by: Rob Clark <robdclark@chromium.org>
This can be used by both etnaviv and freedreno/a2xx as they are both vec4
architectures with some instructions being scalar-only.
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
I don't know why I thought NIR_PASS always set the progress variable.
Derp.
Fixes: d41cdef2a5 ("nir: Use the flrp lowering pass instead of nir_opt_algebraic")
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Coverity CID: 1444996
Coverity CID: 1444995
Coverity CID: 1444994
Coverity CID: 1444993
Coverity CID: 1444991
Coverity CID: 1444989
I tried to be very careful while updating all the various drivers, but I
don't have any of that hardware for testing. :(
i965 is the only platform that sets always_precise = true, and it is
only set true for fragment shaders. Gen4 and Gen5 both set lower_flrp32
only for vertex shaders. For fragment shaders, nir_op_flrp is lowered
during code generation as a(1-c)+bc. On all other platforms 64-bit
nir_op_flrp and on Gen11 32-bit nir_op_flrp are lowered using the old
nir_opt_algebraic method.
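For reference, that code generation lowering expands flrp as:

    flrp(a, b, c) = a * (1 - c) + b * c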
No changes on any other Intel platforms.
v2: Add panfrost changes.
Iron Lake and GM45 had similar results. (Iron Lake shown)
total cycles in shared programs: 188647754 -> 188647748 (<.01%)
cycles in affected programs: 5096 -> 5090 (-0.12%)
helped: 3
HURT: 0
helped stats (abs) min: 2 max: 2 x̄: 2.00 x̃: 2
helped stats (rel) min: 0.12% max: 0.12% x̄: 0.12% x̃: 0.12%
Reviewed-by: Matt Turner <mattst88@gmail.com>