Commit Graph

207 Commits

Author SHA1 Message Date
Eric Anholt
bdef17b052 v3d: Store the actual mask of color buffers present in the key.
If you only bound rt 1+, we'd still emit a write to the rt0 that isn't
present (noticed while debugging an
ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero
regression in another change).
2019-02-05 15:42:04 -08:00
Eric Anholt
3e743d8cd8 v3d: Avoid duplicating limits defines between gallium and v3d core.
We don't want to pull the compiler into every include in the gallium
driver, so just make a new little header to store the limits.
2019-01-27 08:30:03 -08:00
Eric Anholt
fe6a21c867 v3d: Fix overly-large vattr_sizes structs.
We want one vector size per vector, not per component.
2019-01-27 08:30:03 -08:00
Eric Anholt
f72820c851 v3d: Add support for CS barrier() intrinsics. 2019-01-14 15:40:55 -08:00
Eric Anholt
9b45b06d7c v3d: Add support for CS shared variable load/store/atomics.
CS shared variables are handled effectively as SSBO access to a temporary
buffer that will be allocated at CS dispatch time.
2019-01-14 15:40:55 -08:00
Eric Anholt
01d913cf90 v3d: Add support for CS workgroup/invocation id intrinsics.
We get a payload for the ivec3 workgroup and an int local invocation
index, and we use the core lowering to turn into the global invocation id
and the local invocation id ivec3s.
2019-01-14 15:40:55 -08:00
Eric Anholt
6281f26f06 v3d: Add support for shader_image_load_store.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
2019-01-14 15:40:55 -08:00
Eric Anholt
5932c2f0b9 v3d: Add SSBO/atomic counters support.
So far I assume that all the buffers get written.  If they weren't, you'd
probably be using UBOs instead.
2019-01-14 15:40:55 -08:00
Eric Anholt
d2b899c0ec v3d: Refactor compiler entrypoints.
Before, I had per-stage entryoints with some helpers shared between them.
As I extended for compute shaders and shader-db, it turned out that the
other common code in the middle wanted to be shared too.
2019-01-02 14:12:29 -08:00
Eric Anholt
5e9ee6e841 v3d: Fold comparisons for IF conditions into the flags for the IF.
total instructions in shared programs: 6193810 -> 6192844 (-0.02%)
instructions in affected programs: 800373 -> 799407 (-0.12%)
2019-01-02 14:12:29 -08:00
Eric Anholt
ebde5afb93 v3d: Move "does this instruction have flags" from sched to generic helpers.
I wanted to reuse it for DCE of flags updates.
2018-12-30 08:03:51 -08:00
Eric Anholt
696f63f1b4 v3d: Hook up some shader-db output to GL_ARB_debug_output.
This allows the original shader-db project's run.c runner to parse things
easily, and is probably a good thing to have for GL_ARB_debug_output in
general.  I formatted it more like Intel's so I can mostly reuse their
report script.
2018-12-30 08:03:51 -08:00
Eric Anholt
d80761b8f3 v3d: Drop shadow comparison state from shader variant key.
The shadow state is now in the sampler.
2018-12-20 11:29:30 -08:00
Eric Anholt
00e2cbc049 v3d: Fix the argument type for vir_BRANCH().
Apparently this has been spewing warnings for Jason's clang, but not my
gcc.
2018-12-17 09:52:23 -08:00
Eric Anholt
532b6c5671 v3d: Move uniform pretty-printing to its own helper function.
I want to reuse it in the QPU dump.
2018-12-14 17:48:01 -08:00
Eric Anholt
bad95bb13c v3d: Add VIR dumping of TMU config p0/p1.
I had a bit of it for V3D 3.x, but didn't update it for 4.x.
2018-12-07 16:48:23 -08:00
Eric Anholt
5932575299 v3d: Garbage collect unused uniforms code. 2018-12-07 16:48:23 -08:00
Eric Anholt
acecee4c2d v3d: Return the right gl_SampleMaskIn[] value.
It's supposed to be the dispatched sample mask for this pixel, not the GL
state's sample mask.
2018-12-07 16:48:23 -08:00
Eric Anholt
6870111051 v3d: Fix a comment typo 2018-12-07 16:48:23 -08:00
Eric Anholt
42652ea51e v3d: Use combined input/output segments.
The HW apparently has some issues (or at least a much more complicated VCM
calculation) with non-combined segments, and the closed source driver also
uses combined I/O.  Until I get the last CTS failure resolved (which does
look plausibly like some VPM stomping), let's use combined I/O too.
2018-12-07 16:48:23 -08:00
Eric Anholt
1561e4984e v3d: Emit the VCM_CACHE_SIZE packet.
This is needed to ensure that we don't get blocked waiting for VPM space
with bin/render overlapping.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
f2c0d310d6 v3d: Make sure that QPU instruction-has-a-dest matches VIR.
Found when debugging register spilling -- we would try to spill the dest
of a STVPMV, inserting spill code after entering the last segment.  In
fact, we were likely to to choose to do this, given that the STVPMV "dest"
temp was never read from, making it cheap to spill.

Cc: "18.2" <mesa-stable@lists.freedesktop.org>
2018-08-06 13:03:23 -07:00
Eric Anholt
3471ce9985 v3d: Add support for the TMUWT instruction.
This instruction is used to ensure that TMU stores have been processed
before moving on.  In particular, you need any TMU ops to be done by the
time the shader ends.
2018-07-31 16:05:04 -07:00
Eric Anholt
6b73a97f84 v3d: Implement a small immediates optimization, based on VC4's.
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).

total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs:     82711 -> 80163 (-3.08%)
2018-07-23 10:21:43 -07:00
Eric Anholt
e7ae900341 v3d: Switch to using the new SFU instructions on V3D 4.x.
These instructions let us write directly to the phys regfile, instead of
just R4.  That lets us avoid moving out of R4 to avoid conflicting with
other SFU results, and to avoid conflicting with thread switches.

There is still an extra instruction of latency, which is not represented
in the scheduler at the moment.  If you use the result before it's ready,
the QPU will just stall, unlike the magic R4 mode where you'd read the
previous value.  That means that the following shader-db results aren't
quite representative (since we now cause some stalls instead of emitting
nops), but they're impressive enough that I'm happy with the change.

total instructions in shared programs: 95669 -> 91275 (-4.59%)
instructions in affected programs:     82590 -> 78196 (-5.32%)
2018-07-23 10:21:43 -07:00
Eric Anholt
cdfa99657d v3d: Fix the name of the "flpop" operation.
Noticed while trying to sort a new op into the appropriate place to match
the documentation.
2018-07-23 10:21:43 -07:00
Eric Anholt
beeb94402f v3d: Implement noperspective varyings on V3D 4.x.
Fixes a bunch of piglit interpolation tests, and reduces my concern about
some MSAA blit shaders with noperspective varyings.
2018-07-09 11:48:32 -07:00
Eric Anholt
e130ada243 v3d: Fix shaders using pixel center W but no varyings.
The docs called this field "uses both center W and centroid W", but
actually it's "do you need center W even if varyings don't obviously call
for it?"

Fixes dEQP-GLES3.functional.shaders.builtin_variable.fragcoord_w
2018-06-15 16:09:39 -07:00
Eric Anholt
48011c42aa v3d: Remove unused QUNIFORM_STENCIL left over from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt
76ee9edcb4 broadcom/vc5: Add support for centroid varyings.
It would be nice to share the flags packet emit logic with flat shade
flags, but I couldn't come up with a good way while still using our pack
macros.  We need to refactor this to shader record setup at compile time,
anyway.

Fixes ext_framebuffer_multisample-interpolation * centroid-*
2018-04-26 11:30:22 -07:00
Eric Anholt
503716fa86 broadcom/vc5: Remove leftover vc4 MSAA lowering setup in the FS key. 2018-04-25 09:21:54 -07:00
Eric Anholt
baeb6a4b4a broadcom/vc5: Fix up the NIR types of FS outputs generated by NIR-to-TGSI.
Unfortunately TGSI doesn't record the type of the FS output like GLSL
does, but VC5's TLB writes depend on the output's base type.  Just record
the type in the key at variant compile time when we've got a TGSI input
and then fix it up.

Fixes KHR-GLES3.packed_pixels.pbo_rectangle.rgba32i/ui and apparently a
GPU hang that breaks most tests that come after it.
2018-03-21 14:02:34 -07:00
Eric Anholt
00910e3057 broadcom/vc5: Don't annotate dumps with stale live intervals.
As you're debugging register allocation, you may have changed the
intervals and not recomputed yet.  Just skip the dump in that case.
2018-03-19 16:44:20 -07:00
Eric Anholt
facc3c6f58 broadcom/vc5: Add support for register spilling.
Our register spilling support is nice to have since vc4 couldn't at all,
but we're still very restricted due to needing to not spill during a TMU
operation, or during the last segment of the program (which would be nice
to spill a value of, when there's a long-lived value being passed through
with little modification from the start to the end).

We could do better by emitting unspills for the last-segment values just
before the last thrsw, since the last segment is probably not the maximum
interference area.

Fixes GTF uniform_buffer_object_arrays_of_all_valid_basic_types and 3
others.
2018-03-19 16:44:06 -07:00
Eric Anholt
d721348dcd broadcom/vc5: Add cursors to the compiler infrastructure, like NIR's.
This will let me do lowering late in compilation using the same
instruction builder as we use in nir_to_vir.
2018-03-19 16:42:59 -07:00
Eric Anholt
c81d681742 broadcom/vc5: Move the umul macro to a header.
Anywhere we want to multiply, we probably want this.
2018-03-19 16:42:59 -07:00
Eric Anholt
9e28c18cd1 broadcom/vc5: Correct the arg count of TIDX/EIDX. 2018-03-19 16:42:59 -07:00
Eric Anholt
368bab43fd broadcom/vc5: Add support for loading varyings in V3D 4.1.
The LDVARY signal now writes an arbitrary register, so I took out the
magic src register file and replaced it with an instruction with LDVARY
set so we have somewhere to hang a QFILE_TEMP destination for register
allocation.
2018-01-12 21:57:21 -08:00
Eric Anholt
5aaea3c4a0 broadcom/vc5: Add compiler support for V3D 4.x texturing. 2018-01-12 21:56:57 -08:00
Eric Anholt
42a35da96d broadcom/vc5: Move V3D 3.3 texturing to a separate file.
V3D 4.x texturing changes enough that #ifdefs would just make a mess of
it.
2018-01-12 21:56:37 -08:00
Eric Anholt
acf30e4916 broadcom/vc5: Move V3D 3.3 VPM write setup to a separate file.
For V4.1 texturing, I need the V4.1 XML, so the main compiler needs to
stop including V3.3 XML.
2018-01-12 21:56:24 -08:00
Eric Anholt
90269ba353 broadcom/vc5: Use THRSW to enable multi-threaded shaders.
This is a major performance boost on all of V3D, but is required on V3D
4.x where shaders are always either 2- or 4-threaded.
2018-01-12 21:55:30 -08:00
Eric Anholt
a075bb6726 broadcom/vc5: Implement GFXH-1684 workaround.
Apparently the VPM writes need to be flushed out before we end the shader.
2018-01-12 21:55:15 -08:00
Eric Anholt
edbd817c30 broadcom/vc5: Use a physical-reg-only register class for LDVPM.
This is needed for LDVPM on V3D 4.x, but will also be needed for keeping
values out of the accumulators across THRSW.
2018-01-12 21:54:42 -08:00
Eric Anholt
22a02f3e34 broadcom/vc5: Use the new LDVPM/STVPM opcodes on V3D 4.1.
Now, instead of a magic write register for VPM stores we have an
instruction to do them (which means no packing of other ALU ops into it),
with the ability to reorder the VPM stores due to the offset being baked
into the instruction.

VPM loads also gain the ability to be reordered by packing the row into
the A argument.  They also no longer write to the r3 accumulator, and
instead must be stored to a physical register.
2018-01-12 21:54:33 -08:00
Eric Anholt
dfee62eed3 broadcom/vc5: Add support for V3Dv4 signal bits.
The WRTMUC replaces the implicit uniform loads in the first two texture
instructions.  LDVPM disappears in favor of an ALU op.  LDVARY, LDTMU,
LDTLB, and LDUNIF*RF now write to arbitrary registers, which required
passing the devinfo through to a few more functions.
2018-01-12 21:53:45 -08:00
Eric Anholt
8e5a0ed953 broadcom/vc5: Emit flat shade flags for varying components > 24.
This means that with no flatshading we'll emit the single-byte
ZERO_ALL_FLAT_SHADE_FLAGS, and otherwise emit a set of FLAT_SHADE_FLAGS to
get all the bits we need set.

There's a _SET enum in the packet we could use to possibly set entire
ranges of the bitfield without using another packet, but this at least
fixes the conformance failure.
2018-01-03 14:25:23 -08:00
Eric Anholt
2056e4a777 broadcom/vc5: Emit proper flatshading code for glShadeModel(GL_FLAT).
In updating the simulator, behavior changed slightly so that our old code
wasn't getting glxgears's flatshading interpolated right.  Emit flat
shading code just like we would for a normal flat-shaded varying, by
passing a flag in the shader key for glShadeModel(GL_FLAT) state and
customizing the color inputs based on that.
2018-01-03 14:25:23 -08:00
Eric Anholt
ba965084b6 broadcom/vc5: Move texture return channel setup into the compiler.
The compiler decides how many LDTMUs we're going to emit, and that must
match the P1 flags.  This brings the return channel counting to a single
place (so all that's passed into the compiler is "how many return channels
you may request from this texture's format), and was a necessary step for
shadow samplers once we stop using OVRTMUOUT=0.
2018-01-03 14:25:23 -08:00
Eric Anholt
e717e3e7cd broadcom/vc5: Add lowering for txf_ms to a txf on a 2x2-scaled texture.
The HW has no native sampler support for multisample textures, but since
we only need to support txf_ms and the layout is UIF, we just need to
scale up the texcoords and then add in the sample.

This drops the old TEXTURE_MSAA_ADDR special uniform, since we're treating
MSAA textures as textures, rather than basically texbos like VC4 had to.
2017-10-30 13:31:27 -07:00