Commit Graph

323 Commits

Author SHA1 Message Date
Jason Ekstrand
59fb59ad54 nir: Get rid of nir_shader::stage
It's redundant with nir_shader::info::stage.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2017-10-20 12:49:17 -07:00
Eric Anholt
c34295b1a3 nir: Move vc4's alpha test lowering to core NIR.
I've been doing this inside of vc4, but vc5 wants it as well and it may be
useful for other drivers (Intel has a related path for pre-gen6 with MRT,
and freedreno had a TGSI path for it at one point).

This required defining a common enum for the standard comparison
functions, but other lowering passes are likely to also want that enum.

v2: Add to meson.build as well.

Acked-by: Rob Clark <robdclark@gmail.com>
2017-10-10 11:42:04 -07:00
Eric Anholt
3752ad28f2 broadcom/vc4: Fix use-after-free when deleting a program.
By leaving the compiled shader in the context's stage state, the next
compile of a new FS would look in the old compiled FS for figuring out
whether to set various dirty flags for the VS compile.  Clear out the
pointer when deleting the program, and make sure that we always mark the
state as dirty if the previous program had been lost.  Fixes valgrind
warnings on glsl-max-varyings.

Fixes: 2350569a78 ("vc4: Avoid VS shader recompiles by keeping a set of FS inputs seen so far.")
2017-09-18 20:17:25 -07:00
Nicolai Hähnle
e044e9eb2a st/glsl_to_nir: move nir_lower_io to drivers
This allows drivers more freedom in how exactly they want to lower I/O,
e.g. first lowering I/O to temporaries.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:30 +02:00
Nicolai Hähnle
c5f97eab09 st/mesa: get rid of st_glsl_types
It's a duplicate of glsl_type::count_attribute_slots.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:30 +02:00
Eric Anholt
45b0172693 vc4: Clean up release build warnings using MAYBE_UNUSED.
These variables are all used in an assert(), so release builds see no
usages.
2017-06-20 09:09:09 -07:00
Jason Ekstrand
b86dba8a0e nir: Embed the shader_info in the nir_shader again
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer.  The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct.  This, however, has
caused a few problems:

 1) There are many things which generate NIR without GLSL.  This means
    we have to support both NIR shaders which come from GLSL and ones
    that don't and need to have an info elsewhere.

 2) The solution to (1) raises all sorts of ownership issues which have
    to be resolved with ralloc_parent checks.

 3) Ever since 00620782c9, we've been
    using nir_gather_info to fill out the final shader_info.  Thanks to
    cloning and the above ownership issues, the nir_shader::info may not
    point back to the gl_shader anymore and so we have to do a copy of
    the shader_info from NIR back to GLSL anyway.

All of these issues go away if we just embed the shader_info in the
nir_shader.  There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-05-09 15:07:47 -07:00
Jason Ekstrand
762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
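The naming rule described in 762a6333f2 can be illustrated with a short Python sketch. This is not Mesa's generator code: the helper `conversion_op_name` and its single-letter type prefixes are hypothetical, and choosing i2i versus u2u from the source signedness is an assumption drawn from the sign-extension remark in the message.

```python
# Hypothetical sketch of the naming rule; not Mesa's real generator code.
INT_TYPES = {"i", "u"}


def conversion_op_name(src_type, dst_type, dst_bits):
    """Build a conversion opcode name from single-letter type prefixes.

    src_type / dst_type: "f" (float), "i" (signed int), "u" (unsigned int),
    "b" (bool). dst_bits: destination bit size, e.g. 32 or 64.
    """
    # Bool conversions carry no size suffix: <src_type>2<dst_type>.
    if "b" in (src_type, dst_type):
        return f"{src_type}2{dst_type}"
    # Integer <-> integer conversions only exist as i2i and u2u.
    # Assumption: the source signedness picks the form, since it decides
    # whether an up-conversion sign-extends.
    if src_type in INT_TYPES and dst_type in INT_TYPES:
        dst_type = src_type
    # Everything else: <src_type>2<dst_type><size>.
    return f"{src_type}2{dst_type}{dst_bits}"
```

Under these assumptions, a float to 32-bit signed int conversion is named f2i32, and a signed to unsigned 64-bit conversion collapses to i2i64.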
Eric Anholt
0fca01d027 vc4: Report to shader-db how many threads a fragment shader has.
Doing instruction count analysis when we emit the thread switches that
will save us from tons of stalls is kind of missing the point.
2017-03-08 13:44:17 -08:00
Eric Anholt
61359324c1 Revert "vc4: Lazily emit our FS/VS input loads."
This reverts commit 292c24ddac.  It broke a
lot of GLES2 deqp, and I see at least one problem that will require some
serious rework to fix.
2017-03-08 13:44:17 -08:00
Brian Paul
73bafb5ee3 gallium: s/unsigned/enum pipe_shader_type/ for get_compiler_options()
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-08 08:50:20 -07:00
Eric Anholt
292c24ddac vc4: Lazily emit our FS/VS input loads.
This reduces register pressure in both types of shaders, by reordering the
input loads from the var->data.driver_location order to whatever order
they appear first in the NIR shader.  These instructions aren't
reorderable at our QIR scheduling level because the FS takes two in
lockstep to do an interpolation, and the VS takes multiple read
instructions in a row to get a whole vec4-level attribute read.

shader-db impact:
total instructions in shared programs: 76666 -> 76590 (-0.10%)
instructions in affected programs:     42945 -> 42869 (-0.18%)
total max temps in shared programs: 9395 -> 9208 (-1.99%)
max temps in affected programs:     2951 -> 2764 (-6.34%)

Some programs get their max temps hurt, depending on the order that the
load_input intrinsics appear, because we end up being unable to copy
propagate an older VPM read into its only use.
2017-02-24 17:01:29 -08:00
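The percentages in the shader-db report of 292c24ddac follow directly from the before/after counts. A minimal Python check, where `percent_change` is a hypothetical helper and not part of shader-db:

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# (label, before, after) copied from the shader-db report in the commit message.
rows = [
    ("total instructions in shared programs", 76666, 76590),
    ("instructions in affected programs", 42945, 42869),
    ("total max temps in shared programs", 9395, 9208),
    ("max temps in affected programs", 2951, 2764),
]

for label, before, after in rows:
    print(f"{label}: {before} -> {after} ({percent_change(before, after):+.2f}%)")
```

The "affected programs" rows show larger percentages because the same absolute saving is divided by a smaller baseline.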
Eric Anholt
f06915d7b7 vc4: Refactor the load_input code out of the intrinsic code.
It's going to gain most of ntq_setup_inputs(), so simplify it first.
2017-02-24 16:31:54 -08:00
Eric Anholt
84a304eb96 vc4: Track the last block we emitted at the top level.
This will be used for delaying our VPM reads (which must be unconditional)
until just before they're used.
2017-02-24 16:31:54 -08:00
Eric Anholt
0514b0bdc9 vc4: Enable glSampleMask() even when !rasterizer->multisample.
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.

Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
2017-02-10 14:17:05 -08:00
Eric Anholt
ce538a443d vc4: Use accurate 1/w in coordinate shader as well as vert shader.
We probably shouldn't be emitting different scaled viewport coordinates
between vertex and coord.
2017-02-10 14:17:04 -08:00
Eric Anholt
b230939303 vc4: Avoid emitting small immediates for UBO indirect load address guards.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp is rare enough that it's not worth
making the kernel let us.

Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a temporary.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-10 14:17:04 -08:00
Eric Anholt
c1299615fb vc4: Avoid an extra temporary and mov in ffloor/ffract/fceil.
shader-db results:

total instructions in shared programs: 92611 -> 91764 (-0.91%)
instructions in affected programs:     27417 -> 26570 (-3.09%)

The star is one shader in glmark2's terrain (drops 16% of its
instructions), but there are also wins in mupen64plus and glb2.7.
2017-01-28 19:35:20 -08:00
Jason Ekstrand
fb181196de nir: Rename convert_to_ssa lower_regs_to_ssa
This matches the naming of nir_lower_vars_to_ssa, the other to-SSA pass.
2016-12-29 16:02:44 -08:00
Eric Anholt
63e7671c7e vc4: Enable NIR-based loop unrolling.
This successfully unrolls a new shader in GLB2.7, which also gets that
shader to successfully compile in multithreaded mode.
2016-12-29 14:41:09 -08:00
Ilia Mirkin
fd249c803e treewide: s/comparitor/comparator/
git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-12 22:13:07 -05:00
Eric Anholt
8e5ec33f11 vc4: In a loop break/continue, jump if everyone has taken the path.
This should be a win for most loops, which tend to have uniform control
flow.

More importantly, it exposes important information to live variables: that
the break/continue here means that our jump target may have access to
values that were live on our input.  Previously, we were just setting the
exec mask and letting control flow fall through, so an intervening def
between the break and the end of the loop would appear to live variables
as if it screened off the variable, when it didn't actually.

Fixes a regression in glsl-vs-loop-redundant-condition.shader_test when a
perturbation of register allocation caused a live variable to get stomped.

Cc: 13.0 <mesa-stable@lists.freedesktop.org>
2016-11-30 19:58:09 -08:00
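The idea in 8e5ec33f11 can be modelled roughly in Python. This is a simplified sketch of exec-mask control flow, not the vc4 implementation: `simulate_break` and the list-of-booleans mask are hypothetical stand-ins for the per-channel execution state.

```python
def simulate_break(exec_mask, break_cond):
    """Model a loop `break` across SIMD channels.

    exec_mask[i] is True while channel i is still executing the loop body;
    break_cond[i] is True if channel i wants to break.
    Returns (new_exec_mask, jump). jump is True when no channel is left
    executing, so a real branch to the loop exit can be taken instead of
    falling through with every channel masked off.
    """
    new_mask = [active and not wants_break
                for active, wants_break in zip(exec_mask, break_cond)]
    jump = not any(new_mask)
    return new_mask, jump
```

With uniform control flow every active channel breaks together, so `jump` is True and the rest of the loop body is skipped. The explicit jump is also what makes the edge to the jump target visible to a liveness analysis.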
Eric Anholt
d4c20e82ae vc4: Restructure texture insts as ALU ops with tex_[strb] as the dst.
For now we're still just generating MOVs, but this will let us fold into
other ops in the future.  No difference on shader-db.
2016-11-29 08:38:59 -08:00
Eric Anholt
314f0c57e4 vc4: Refactor qir_get_op_nsrc(enum qop) to qir_get_nsrc(struct qinst *).
Every caller was dereffing the qinst, and this will let us make the number
of sources vary depending on the destination of the qinst so that we can
have general ALU ops that store to tex_[strb] and get an implicit uniform.
2016-11-29 08:38:59 -08:00
Marek Olšák
a3f6bea69a gallium: fix more occurrences of u_hash.h
this fixes compile failures since 86514d84e0
2016-11-22 18:28:18 +01:00
Eric Anholt
7f27ad5597 vc4: Try compiling our FSes in multithreaded mode on new kernels.
Multithreaded fragment shaders let us hide texturing latency by a
hyperthreading-style switch to another fragment shader.  This gets us up
to 20% framerate improvements on glmark2 tests.
2016-11-16 19:45:01 -08:00
Eric Anholt
96ffee2d02 vc4: Mark threaded FSes as non-singlethread in the CL.
2016-11-12 19:21:46 -08:00
Eric Anholt
ace0d810e5 vc4: Flag the last thread switch in the program as the last.
We don't allow the last thread switch to be inside control flow, to be
sure that we hit the last state exactly once.  If the last texturing was
in control flow, fall back to single threaded.
2016-11-12 19:21:46 -08:00
Eric Anholt
67f72c5f5d vc4: Add THRSW nodes after each tex sample setup in multithreaded mode.
This is a suboptimal implementation, but Jonas Pfeil found that it was
still a massive performance gain.
2016-11-12 19:21:46 -08:00
Eric Anholt
08d51487e3 vc4: Clamp the shadow comparison value.
Fixes piglit glsl-fs-shadow2D-clamp-z.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-09 15:33:56 -08:00
Eric Anholt
4d019bd703 vc4: Don't abort when a shader compile fails.
It's much better to just skip the draw call entirely.  Getting this
information out of register allocation will also be useful for
implementing threaded fragment shaders, which will need to retry
non-threaded if RA fails.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-09 15:33:56 -08:00
Eric Anholt
283d4d18e5 vc4: Use Newton-Raphson on the 1/W write to fix glmark2 terrain.
The 1/W was apparently not accurate enough, and we were getting sparklies
in the distance.  The closed driver also did a N-R step here.

Cc: <mesa-stable@lists.freedesktop.org>
2016-11-04 15:34:38 -07:00
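For reference, one Newton-Raphson step for a reciprocal, as mentioned in 283d4d18e5, looks like this in Python. This is a generic numerical sketch, not the driver's QIR code; `refine_reciprocal` and the sample values are made up for illustration.

```python
def refine_reciprocal(w, estimate):
    """One Newton-Raphson step for x ~= 1/w: x' = x * (2 - w * x)."""
    return estimate * (2.0 - w * estimate)


w = 3.0
rough = 0.3  # deliberately coarse stand-in for a low-precision hardware reciprocal
better = refine_reciprocal(w, rough)
print(rough, better, 1.0 / w)
```

If the estimate has relative error e, the refined value has relative error e squared, so a single step roughly doubles the number of correct bits.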
Eric Anholt
70fc3a941a vc4: Make sure that vertex shader texture2D() calls use LOD 0.
I noticed this while trying to debug glmark2 terrain (which does vertex
shader texturing, but no mipmaps on its textures sampled from the VS).
2016-11-04 15:34:38 -07:00
Marek Olšák
52d2b28f7f ralloc: use rzalloc where it's necessary
No change in behavior. ralloc_size is equivalent to rzalloc_size.
That will change though.

Calls not switched to rzalloc_size:
- ralloc_vasprintf
- glsl_type::name allocation (it's filled with snprintf)
- C++ classes where valgrind didn't show uninitialized values

I switched most of non-glsl stuff to rzalloc without checking whether
it's really needed.

Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Tested-by: Edmondo Tommasina <edmondo.tommasina@gmail.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-10-31 11:53:38 +01:00
Timothy Arceri
e1af20f18a nir/i965/anv/radv/gallium: make shader info a pointer
When restoring something from shader cache we won't have, and don't
want to create, a nir_shader; this change detaches the two.

There are other advantages such as being able to reuse the
shader info populated by GLSL IR.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-10-26 14:29:36 +11:00
Eric Anholt
8ff4182876 vc4: Avoid making temporaries for assignments to NIR registers.
Getting stores to NIR regs to not generate new MOVs is tricky, since the
result we're trying to store into the NIR reg may have been from a
conditional update of a temp, or a series of packed writes.  The easiest
solution seems to be to require that nir_store_dest()'s arg comes from an
SSA temp.

This causes us to put in a few more temporary MOVs in the NIR SSA dest
case, but copy propagation successfully cleans those up.

The shader-db change is modest:

total instructions in shared programs: 93774 -> 93598 (-0.19%)
instructions in affected programs:     14760 -> 14584 (-1.19%)
total estimated cycles in shared programs: 212135 -> 211946 (-0.09%)
estimated cycles in affected programs:     27005 -> 26816 (-0.70%)

but I was seeing patterns in some register-allocation failures in DEQP
tests that looked like the extra MOVs would increase maximum register
pressure in loops.  Some debug code indicates that that's not the case,
though I'm still a bit confused by that result.
2016-10-21 14:12:22 -07:00
Eric Anholt
78087676c9 vc4: Restructure the simulator mode.
Rather than having simulator mode changes scattered around vc4_bufmgr.c
and vc4_screen.c, make vc4_bufmgr.c just call a vc4_simulator_ioctl, which
then dispatches to a corresponding implementation.

This will give the simulator support a centralized place to do tricks like
storing most BOs directly in simulator memory rather than copying in and
out.

This leaves special casing of mmaping BOs and execution, because of the
winsys mapping.
2016-10-21 14:12:22 -07:00
Eric Anholt
d4ae5ca823 vc4: Fix live intervals analysis for screening defs in if statements.
If a conditional assignment is only conditioned on the exec mask, that's
still screening off the value in the executed channels (and, since we're
not storing to the unexecuted channels, we don't care what's in there).

Fixes a bunch of extra register pressure on Processing's Ribbons demo,
which is failing to allocate.
2016-10-06 18:09:24 -07:00
Eric Anholt
b30205b112 vc4: Fix assertion fails from trying to cast non-ALU instrs to ALU.
Fixes 100 piglit tests since the assertions were added to nir.h.  What's
amazing is that these tests used to pass, even when casting garbage.
2016-10-06 18:09:24 -07:00
Jason Ekstrand
2ed17d46de nir: Make nir_foo_first/last_cf_node return a block instead
One of NIR's invariants is that control flow lists always start and end
with blocks.  There's no good reason why we should return a cf_node from
these functions since we know that it's always a block.  Making it a block
lets us remove a bunch of code.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-10-06 09:16:37 -07:00
Eric Anholt
36f0f03182 nir: Allow opt_peephole_sel to be more aggressive in flattening IFs.
VC4 was running into a major performance regression from enabling control
flow in the glmark2 conditionals test, because of short if statements
containing an ffract.

This pass seems like it was trying to ensure that we only flattened
IFs that should be entirely a win by guaranteeing that there would be
fewer bcsels than there were MOVs otherwise.  However, if the number of
ALU ops is small, we can avoid the overhead of branching (which itself
costs cycles) and still get a win, even if it means moving real
instructions out of the THEN/ELSE blocks.

For now, just turn on aggressive flattening on vc4.  i965 will need some
tuning to avoid regressions.  It does look like this may be useful to
replace freedreno code.

Improves glmark2 -b conditionals:fragment-steps=5:vertex-steps=0 from 47
fps to 95 fps on vc4.

vc4 shader-db:
total instructions in shared programs: 101282 -> 99543 (-1.72%)
instructions in affected programs:     17365 -> 15626 (-10.01%)
total uniforms in shared programs: 31295 -> 31172 (-0.39%)
uniforms in affected programs:     3580 -> 3457 (-3.44%)
total estimated cycles in shared programs: 225182 -> 223746 (-0.64%)
estimated cycles in affected programs:     26085 -> 24649 (-5.51%)

v2: Update shader-db output.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
2016-09-22 11:10:21 +03:00
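The flattening that 36f0f03182 makes more aggressive can be pictured with a small Python model. It is only an illustration of the transformation, not NIR code: `branchy`, `flattened`, and the use of a fractional-part computation as the "cheap ALU op" are assumptions chosen to echo the ffract case in the message.

```python
import math


def branchy(cond, x):
    """Original form: the ALU op lives inside the THEN block."""
    if cond:
        y = x - math.floor(x)  # stands in for a cheap ALU op such as ffract
    else:
        y = x
    return y


def flattened(cond, x):
    """Flattened form: both sides are computed, then one select picks."""
    then_val = x - math.floor(x)  # hoisted out of the THEN block
    else_val = x
    return then_val if cond else else_val  # plays the role of bcsel
```

Both functions return the same value for every input. The flattened form always pays for the ALU op but avoids the branch, which is the trade-off the commit message describes for short if statements.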
Kenneth Graunke
2d8a3fa7ea nir: Report progress from nir_lower_phis_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:51 -07:00
Kenneth Graunke
32630e211e nir: Report progress from nir_lower_alu_to_scalar.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2016-09-14 12:01:49 -07:00
Eric Anholt
9688166bd9 vc4: Move the render job state into a separate structure.
This is a preparation step for having multiple jobs being queued up at the
same time.
2016-09-14 06:08:03 +01:00
Eric Anholt
60bed14d0f vc4: Handle discards while in control flow.
I missed this while adding loop support because the discard test inside a
loop was crashing before, anyway.  Fixes piglit glsl-fs-discard-04.
2016-08-29 11:03:11 -07:00
Eric Anholt
00c72acba5 vc4: Add support for fddx/fddy
Based vaguely on a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt
47e3cc7557 vc4: Tell state_tracker that we would prefer NIR.
Before this series, the code generation path was:

GLSL IR -> TGSI -> NIR -> NIR clone -> QIR -> QPU

Now it's (generally)

GLSL IR -> NIR -> NIR clone -> QIR -> QPU
2016-08-22 12:11:08 -07:00
Eric Anholt
f4d143f0d9 vc4: Use proper type sizes for uniforms.
2016-08-22 11:52:26 -07:00
Eric Anholt
bdb54cdc16 vc4: Add VARYING_SLOT_PNTC support.
We end up with this when doing GLSL-to-NIR.
2016-08-22 11:52:26 -07:00
Eric Anholt
e8378fee0c nir: Define system values for vc4's blending-lowering arguments.
In the GLSL-to-NIR conversion of VC4, I had a bit of trouble with what I
was calling the "state uniforms" that I was putting into the NIR fighting
with its other lowering passes.  Instead of using magic uniform base
numbers in the backend, follow the lead of load_user_clip_plane and just
define system values for them.

v2: Fix unintended change to channel_num, drop unspecified const_index
    value on blend_const_color_r_float.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-22 11:52:26 -07:00