third_party_mesa3d

Author	SHA1	Message	Date
Iago Toral Quiroga	10d50f2904	v3d: remove unused definitions Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	e540775f0c	v3d: add lowering for OpenGL logic operations This implements support for OpenGL logic operations by emitting code to read from the TLB if needed and blending the fragment output accordingly. It is similar to VC4's blend lowering pass, but exclusive to logic operations, since blending is otherwise supported in hardware. The pass doesn't handle MSAA targets yet. Fixes the following piglit tests: spec/!opengl 1.0/gl-1.0-logicop/* spec/!opengl 1.1/gl-1.1-xor spec/!opengl 1.1/gl-1.1-xor-copypixels It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which is rendered via glamor using the XOR logic operation. v2: fix checks for allowed variable location and maximum render target (Eric) Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	7c1d708911	v3d: acquire scoreboard lock before first tlb read Until now we have always been emitting our scoreboard locks on the last thread switch to improve parallelism. We did this by emitting our last thread switch right before our tlb writes at the very end of the program, where we know that we are outside control flow. Unfortunately, this strategy is not valid when we have tlb color reads too, as these will happen before this point in the program and can happen inside control flow. To fix this we always emit a thread switch before the first tlb load and if we see additional thread switches after that point, we change the strategy to lock on the first thread switch. v2: change the solution so it is expected to work in more scenarios (Eric). Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	6af1bdefa9	v3d: fix size of color_reads and sample_colors arrays We need to scale the size of these arrays to consider up to V3D_MAX_DRAW_BUFFERS render targets and 4 components per color. v2: we want to store each color component separately, so scale by 4 too. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	0279ac6e51	v3d: add color formats and swizzles to the fragment shader key We are going to need these very soon to emit correct reads from the tlb to implement logic operations. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	d26b35ba44	v3d: add helpers to emit ldtlb and ldtlbu signals The ldtlbu version will read an implicit uniform with the TLB read specifier and should be used for the first read in a sequence of TLB reads (unless the default configuration is valid, in which case we can use ldtlb). The ldtlb version is used for any subsequent TLB read in the sequence. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-07-12 09:16:38 +02:00
Iago Toral Quiroga	9b96ae69bc	v3d: don't emit point coordinates varyings if the FS doesn't read them We still need to emit them in V3D 3.x since there there is no mechanism to disable them. Reviewed-by: Eric Anholt <eric@anholt.net>	2019-06-07 08:29:42 +02:00
Eric Anholt	dc402be73e	v3d: Use the new lower_to_scratch implementation for indirects on temps. We can use the same register spilling infrastructure for our loads/stores of indirect access of temp variables, instead of doing an if ladder. Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db. Also causes several other KSP shaders with large bodies and large loop counts to not be force-unrolled. The change was originally motivated by NOLTIS slightly modifying register pressure in piglit temp mat4 array read/write tests, triggering register allocation failures.	2019-04-12 16:16:58 -07:00
Eric Anholt	6b1c659825	v3d: Add Compute Shader compilation support. While waiting for the CSD UABI to get reviewed, I keep having to rebase the CS patch. Just land the compiler side for now to keep it from diverging. For now this covers just GLES 3.1 compute shaders, not CL kernels.	2019-04-12 15:59:31 -07:00
Eric Anholt	8f065596d2	v3d: Add an optimization pass for redundant flags updates. Our exec masking introduces lots of redundant flags updates, and even without that there will be cases where NIR comparisons on the same sources for different reasons may generate the same comparison instruction before the selection. total instructions in shared programs: 6492930 -> 6460934 (-0.49%) total uniforms in shared programs: 2117460 -> 2115106 (-0.11%) total spills in shared programs: 4983 -> 4987 (0.08%) total fills in shared programs: 6408 -> 6416 (0.12%)	2019-04-11 09:24:02 -07:00
Eric Anholt	bfed0a7099	v3d: Remove some dead members of struct v3d_compile. These are more vc4 leftovers.	2019-03-21 14:20:50 -07:00
Eric Anholt	16f2770eb4	v3d: Upload all of UBO[0] if any indirect load occurs. The idea was that we could skip uploading the constant-indexed uniform data and just upload the uniforms that are variably-indexed. However, since the VS bin and render shaders may have a different set of uniforms used, this meant that we had to upload the UBO for each of them. The first case is generally a fairly small impact (usually the uniform array is the most space, other than a couple of FSes in shader-db), while the second is a larger impact: 3DMMES2 was uploading 38k/frame of uniforms instead of 18k. Given that the optimization is of dubious value, has a big downside, and is quite a bit of code, just drop it. No change in shader-db. No change on 3DMMES2 (n=15).	2019-03-21 14:20:50 -07:00
Eric Anholt	320e96bace	v3d: Move constant offsets to UBO addresses into the main uniform stream. We'd end up with the constant offset in the uniform stream anyway, since they're bigger than small immediates. Avoids the extra uniforms and adds in the shader in favor of just adding once on the CPU. shader-db: total instructions in shared programs: 6496865 -> 6494851 (-0.03%) total uniforms in shared programs: 2119511 -> 2117243 (-0.11%)	2019-03-21 14:20:50 -07:00
Eric Anholt	c36d2793ec	v3d: Rename v3d_tmu_config_data to v3d_unit_data. I want to reuse this for encoding small constant UBO/SSBO offsets into the uniform stream to reduce the extra uniform loads and adds for the small constant offsets.	2019-03-21 14:20:50 -07:00
Eric Anholt	e8ee1f8eaf	v3d: Eliminate the TLB and TLBU files. We can just use the magic register file like we do for other magic waddrs.	2019-03-05 12:57:39 -08:00
Eric Anholt	110f14d4b4	v3d: Use ldunif instructions for uniforms. The idea is that for repeated use of the same uniform, we could avoid loading it on each consumer. The results look pretty good. total instructions in shared programs: 6413571 -> 6521464 (1.68%) total threads in shared programs: 154214 -> 154000 (-0.14%) total uniforms in shared programs: 2393604 -> 2119629 (-11.45%) total spills in shared programs: 4960 -> 4984 (0.48%) total fills in shared programs: 6350 -> 6418 (1.07%) Once we do scheduling at the NIR level, the register pressure (and thus also instructions) issues we see here will drop back down.	2019-03-05 12:57:39 -08:00
Eric Anholt	4036fce8fd	v3d: Add support for register-allocating a ldunif to a QFILE_TEMP. On V3D 4.x, we can use ldunifrf to load uniforms to any register, and this will let us schedule the ldunif wherever we want in the program.	2019-03-05 12:57:39 -08:00
Eric Anholt	4739181a16	v3d: Switch implicit uniforms over to being any qinst->uniform != ~0. I'm not sure why I didn't do this before -- it's clearly much simpler to add dumping of the extra thing than to have it as another implicit source.	2019-03-05 12:57:39 -08:00
Eric Anholt	2780a99ff8	v3d: Move the stores for fixed function VS output reads into NIR. This lets us emit the VPM_WRITEs directly from nir_intrinsic_store_output() (useful once NIR scheduling is in place so that we can reduce register pressure), and lets future NIR scheduling schedule the math to generate them. Even in the meantime, it looks like this lets NIR DCE some more code and make better decisions. total instructions in shared programs: 6429246 -> 6412976 (-0.25%) total threads in shared programs: 153924 -> 153934 (<.01%) total loops in shared programs: 486 -> 483 (-0.62%) total uniforms in shared programs: 2385436 -> 2388195 (0.12%) Acked-by: Ian Romanick <ian.d.romanick@intel.com> (nir)	2019-03-05 10:59:40 -08:00
Eric Anholt	fd1d22b92e	v3d: Stop treating exec masking specially. In our backend, the successor edges from the blocks only point to where QPU control flow goes, not where the notional control flow goes from a "break" or "continue" modifying the execution mask to resume writing to some channels later. As a result, this attempt at restricting live ranges ended up missing the live range of a value where a conditional break/continue was present in a loop before the later def of a variable. The previous commit ended up fixing the problem that the flag tried to solve. Fixes glsl-vs-loop-continue.shader_test and/or glsl-vs-loop-redundant-condition.shader_test based on register allocation results.	2019-03-05 07:36:24 -08:00
Eric Anholt	c6ae666cf5	v3d: Restrict live intervals to the blocks reachable from any def. In the backend, we often have condition codes on writes to variables, such that there's no screening def anywhere and the previous live ranges algorithm would conclude that the start of the range extends to the start of the program. However, we do know that the live range can only extend as early as you can reach from all blocks writing to the variable. The motivation was that, while we have a couple of hacks to try to promote conditional writes up to being a def within the block, the exec_mask one was broken and needed a replacement. Based on `c3c1aa5aeb` ("intel/fs: Restrict live intervals to the subset possibly reachable from any definition.").	2019-03-05 07:36:24 -08:00
Eric Anholt	97566efe5c	v3d: Rematerialize MOVs of uniforms instead of spilling them. If we have a MOV of a uniform value available to spill, that's one of our best choices. We can just not spill the value, and emit a new load of the uniform as the fill. This saves bothering the TMU and the thrsw, and is the same cost in uniforms (since the spill offset is a uniform anyway). This doesn't have a huge impact on shader-db, since there aren't a whole lot of spills and we usually copy-prop the uniforms at the VIR level such that the only uniform MOVs are from vir_lower_uniforms: total instructions in shared programs: 6430292 -> 6430279 (<.01%) total uniforms in shared programs: 2386023 -> 2385787 (<.01%) total spills in shared programs: 4961 -> 4960 (-0.02%) total fills in shared programs: 6352 -> 6350 (-0.03%) However, I'm interested in dropping the uniforms copy-prop in the backend, since it would be cheaper to not load repeated uniforms if we have the registers to spare. This also saves many spills on dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20, which is what motivated a bunch of my recent backend work in the first place: before: 46 spills, 106 fills, 3062 instructions after: 0 spills, 0 fills, 2611 instructions	2019-02-25 21:33:47 -08:00
Eric Anholt	5a84d46896	v3d: Stop tracking num_inputs for VPM loads. It's unused in the VS (since we need vattr_sizes[] anyway), so move it to FS prog data.	2019-02-18 18:09:07 -08:00
Eric Anholt	581eba072d	v3d: Add a function to describe what the c->execute.file check means. This is what pointed out that we were misusing the check for last_thrsw in the previous commit.	2019-02-18 18:09:07 -08:00
Eric Anholt	441294962c	v3d: Fix the check for "is the last thrsw inside control flow" The execute.file check used to be good enough, until I stopped setting up the execute mask for uniform ifs. No known tests fixed, noticed while doing a refactor. Fixes: `0805060573` ("v3d: Handle dynamically uniform IF statements with uniform control flow.")	2019-02-18 18:09:07 -08:00
Eric Anholt	3022b4bd82	v3d: Kill off vir_PF(), which is hard to use right. You were allowed to pass in any old temp so that you could hopefully fold the PF up into the def of the temp. If we couldn't find one, it implicitly generated a MOV(nop, reg). However, that PF could have different behavior depending on whether the def being folded into was a float or int opcode, which the caller doesn't necessarily control. Due to the fragility of the function, just switch all callers over to vir_set_pf(). This also encourages the callers to use a _dest call for the inst they're putting the PF on, eliminating a bunch of temps in the pre-optimization VIR. shader-db says the change is in the noise: total instructions in shared programs: 6226247 -> 6227184 (0.02%) instructions in affected programs: 851068 -> 852005 (0.11%)	2019-02-18 18:09:06 -08:00
Eric Anholt	4586f9f902	v3d: Add a helper function for getting a nop register. Just a little refactor to explain what's going on with QFILE_NULL.	2019-02-18 18:09:06 -08:00
Eric Anholt	cd5e0b2729	v3d: Use the early_fragment_tests flag for the shader's disable-EZ field. Apparently we need disable-EZ flagged, not just "does Z writes". Fixes dEQP-GLES31.functional.image_load_store.early_fragment_tests.no_early_fragment_tests_depth_fbo on 7278, even though it passed in simulation. Signed-off-by: Eric Anholt <eric@anholt.net> Fixes: `051a41d3d5` ("v3d: Add support for the early_fragment_tests flag.")	2019-02-18 18:09:06 -08:00
Eric Anholt	940501a446	v3d: Fix copy-propagation of input unpacks. I had a single function for "does this do float input unpacking" with two major flaws: It was missing the most common thing to try to copy propagate a f32 input nunpack to (the VFPACK to an FP16 render target) along with several other ALU ops, and also would try to propagate an f32 unpack into a VFMUL which only does f16 unpacks. instructions in affected programs: 659232 -> 655895 (-0.51%) uniforms in affected programs: 132613 -> 135336 (2.05%) and a couple of programs increase their thread counts. The uniforms hit appears to be a pattern in generated code of doing (-a >= a) comparisons, which when a is abs(b) can result in the abs instruction being copy propagated once but not fully DCEed.	2019-02-05 15:46:04 -08:00
Eric Anholt	bdef17b052	v3d: Store the actual mask of color buffers present in the key. If you only bound rt 1+, we'd still emit a write to the rt0 that isn't present (noticed while debugging an ext_framebuffer_multisample-alpha-to-coverage-no-draw-buffer-zero regression in another change).	2019-02-05 15:42:04 -08:00
Eric Anholt	3e743d8cd8	v3d: Avoid duplicating limits defines between gallium and v3d core. We don't want to pull the compiler into every include in the gallium driver, so just make a new little header to store the limits.	2019-01-27 08:30:03 -08:00
Eric Anholt	fe6a21c867	v3d: Fix overly-large vattr_sizes structs. We want one vector size per vector, not per component.	2019-01-27 08:30:03 -08:00
Eric Anholt	f72820c851	v3d: Add support for CS barrier() intrinsics.	2019-01-14 15:40:55 -08:00
Eric Anholt	9b45b06d7c	v3d: Add support for CS shared variable load/store/atomics. CS shared variables are handled effectively as SSBO access to a temporary buffer that will be allocated at CS dispatch time.	2019-01-14 15:40:55 -08:00
Eric Anholt	01d913cf90	v3d: Add support for CS workgroup/invocation id intrinsics. We get a payload for the ivec3 workgroup and an int local invocation index, and we use the core lowering to turn into the global invocation id and the local invocation id ivec3s.	2019-01-14 15:40:55 -08:00
Eric Anholt	6281f26f06	v3d: Add support for shader_image_load_store. This is only exposed on V3D 4.1+, because we didn't have the TMU write operations for images on 3.3 (To do GLES 3.1 there, you have to lower it to SSBO load/stores, which is a problem to solve later).	2019-01-14 15:40:55 -08:00
Eric Anholt	5932c2f0b9	v3d: Add SSBO/atomic counters support. So far I assume that all the buffers get written. If they weren't, you'd probably be using UBOs instead.	2019-01-14 15:40:55 -08:00
Eric Anholt	d2b899c0ec	v3d: Refactor compiler entrypoints. Before, I had per-stage entryoints with some helpers shared between them. As I extended for compute shaders and shader-db, it turned out that the other common code in the middle wanted to be shared too.	2019-01-02 14:12:29 -08:00
Eric Anholt	5e9ee6e841	v3d: Fold comparisons for IF conditions into the flags for the IF. total instructions in shared programs: 6193810 -> 6192844 (-0.02%) instructions in affected programs: 800373 -> 799407 (-0.12%)	2019-01-02 14:12:29 -08:00
Eric Anholt	ebde5afb93	v3d: Move "does this instruction have flags" from sched to generic helpers. I wanted to reuse it for DCE of flags updates.	2018-12-30 08:03:51 -08:00
Eric Anholt	696f63f1b4	v3d: Hook up some shader-db output to GL_ARB_debug_output. This allows the original shader-db project's run.c runner to parse things easily, and is probably a good thing to have for GL_ARB_debug_output in general. I formatted it more like Intel's so I can mostly reuse their report script.	2018-12-30 08:03:51 -08:00
Eric Anholt	d80761b8f3	v3d: Drop shadow comparison state from shader variant key. The shadow state is now in the sampler.	2018-12-20 11:29:30 -08:00
Eric Anholt	00e2cbc049	v3d: Fix the argument type for vir_BRANCH(). Apparently this has been spewing warnings for Jason's clang, but not my gcc.	2018-12-17 09:52:23 -08:00
Eric Anholt	532b6c5671	v3d: Move uniform pretty-printing to its own helper function. I want to reuse it in the QPU dump.	2018-12-14 17:48:01 -08:00
Eric Anholt	bad95bb13c	v3d: Add VIR dumping of TMU config p0/p1. I had a bit of it for V3D 3.x, but didn't update it for 4.x.	2018-12-07 16:48:23 -08:00
Eric Anholt	5932575299	v3d: Garbage collect unused uniforms code.	2018-12-07 16:48:23 -08:00
Eric Anholt	acecee4c2d	v3d: Return the right gl_SampleMaskIn[] value. It's supposed to be the dispatched sample mask for this pixel, not the GL state's sample mask.	2018-12-07 16:48:23 -08:00
Eric Anholt	6870111051	v3d: Fix a comment typo	2018-12-07 16:48:23 -08:00
Eric Anholt	42652ea51e	v3d: Use combined input/output segments. The HW apparently has some issues (or at least a much more complicated VCM calculation) with non-combined segments, and the closed source driver also uses combined I/O. Until I get the last CTS failure resolved (which does look plausibly like some VPM stomping), let's use combined I/O too.	2018-12-07 16:48:23 -08:00
Eric Anholt	1561e4984e	v3d: Emit the VCM_CACHE_SIZE packet. This is needed to ensure that we don't get blocked waiting for VPM space with bin/render overlapping. Cc: "18.2" <mesa-stable@lists.freedesktop.org>	2018-08-06 13:03:23 -07:00

1 2 3 4

186 Commits