If we have two (or more) smooth varyings like this:
nop t3; ldvary.rf0
fmul t5, t3, t0
fadd t6, t5, r5
nop t7; ldvary.rf0
fmul t9, t7, t0
fadd t10, t9, r5
nop t11; ldvary.rf0
fmul t13, t11, t0
fadd t14, t13, r5
We may be able to pipeline them like this:
nop ; nop ; ldvary.r4
nop ; fmul r0, r4, rf0 ; ldvary.r1
fadd rf13, r0, r5 ; fmul r2, r1, rf0 ; ldvary.r3
fadd rf12, r2, r5 ; fmul r4, r3, rf0 ; ldvary.r0
But in order to do this, we will need to manually tweak the
QPU scheduling.
This patch tracks information about ldvary sequences that are
good candidates for pipelining, and a follow-up patch will
use this information to pipeline them when we emit the QPU
code.
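As a rough sketch, the tracking amounts to remembering the first and
last instruction of a candidate sequence as we emit code (the names
here are hypothetical, not the actual v3d_compile fields):

#include <assert.h>

struct qinst;

/* Hypothetical record of one candidate ldvary sequence. */
struct ldvary_sequence {
   struct qinst *start; /* first instruction of the sequence */
   struct qinst *end;   /* last instruction of the sequence */
};

static void
finish_ldvary_sequence(struct ldvary_sequence *seq)
{
   /* Per the v2 note: a sequence never starts and ends on the same
    * instruction. */
   assert(seq->start != seq->end);
   /* ...record [start, end] so the QPU scheduler can pipeline it... */
}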
v2 (apinheiro):
- Rename the v3d_compile fields to avoid confusion with the qinst fields.
- Assert that a sequence's start instruction is not the same as the end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9304>
If a new UBO load happens to read exactly at the offset right after the
previous UBO load (something that is fairly common, for example when
reading a matrix), we can skip the unifa write (with its 3 delay slots)
and just keep issuing ldunifa to read consecutive addresses.
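A minimal sketch of the check, with illustrative field names (the real
driver state differs): unifa auto-increments its address by 4 bytes
after each ldunifa, so a new load qualifies when it reads the same UBO
at exactly the next offset:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracking state for the unifa/ldunifa sequence. */
struct unifa_tracking {
   bool     valid;       /* a unifa/ldunifa sequence is in flight */
   uint32_t ubo_index;   /* which UBO the sequence reads */
   uint32_t next_offset; /* byte offset the next ldunifa returns */
};

static bool
can_skip_unifa_write(const struct unifa_tracking *t,
                     uint32_t ubo_index, uint32_t offset)
{
   /* Same UBO, consecutive address: skip the unifa write and its
    * 3 delay slots, just emit another ldunifa. */
   return t->valid && t->ubo_index == ubo_index &&
          t->next_offset == offset;
}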
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
Currently this is useful to clean up after DCEing leading ldunifa
instructions, but it can be expanded to handle more cases, which may
allow us to simplify the compiler code in places where we have been
trying to optimize manually for similar cases.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
If we emit a new uniform and that uniform has already been emitted in
the same block, we can just reuse the previous definition.
There is a balancing game here between reducing ldunif instructions
and not increasing register pressure too much, so we put a limit on
how far back we are willing to look for a previous definition of the
uniform. Based on shader-db results, a 20-instruction window produces
the best results.
total instructions in shared programs: 14928266 -> 14907432 (-0.14%)
instructions in affected programs: 6431841 -> 6411007 (-0.32%)
helped: 15270
HURT: 10772
Instructions are helped.
total uniforms in shared programs: 3944672 -> 3840276 (-2.65%)
uniforms in affected programs: 1827184 -> 1722788 (-5.71%)
helped: 30423
HURT: 845
Uniforms are helped.
total inst-and-stalls in shared programs: 14957813 -> 14936873 (-0.14%)
inst-and-stalls in affected programs: 6475349 -> 6454409 (-0.32%)
helped: 15287
HURT: 10852
Inst-and-stalls are helped.
v2 (Eric):
- consider ldunifrf too
- check that no other instruction writes to the register
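A toy model of the look-back described above, including both v2
requirements (types and helpers are illustrative; the real pass works
on VIR):

#include <stdint.h>

#define MAX_UNIFORM_LOOKBACK 20 /* best shader-db results, per above */

/* Toy instruction record. */
struct toy_inst {
   int      is_ldunif; /* ldunif or ldunifrf, per the v2 note */
   uint32_t uniform;   /* which uniform it loads */
   int      dst_reg;   /* register it writes */
};

/* Walk backwards up to MAX_UNIFORM_LOOKBACK instructions looking for a
 * previous load of the same uniform whose destination register has not
 * been written again since (the second v2 requirement). Returns the
 * instruction index, or -1 if the uniform must be emitted again. */
static int
find_reusable_ldunif(const struct toy_inst *insts, int count,
                     uint32_t uniform)
{
   int seen = 0;
   for (int i = count - 1; i >= 0 && seen < MAX_UNIFORM_LOOKBACK;
        i--, seen++) {
      if (!insts[i].is_ldunif || insts[i].uniform != uniform)
         continue;
      for (int j = i + 1; j < count; j++) {
         if (insts[j].dst_reg == insts[i].dst_reg)
            return -1; /* clobbered: cannot reuse */
      }
      return i;
   }
   return -1;
}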
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9077>
V3D 3.x has V3D_QPU_WADDR_TMU, which in V3D 4.x became
V3D_QPU_WADDR_UNIFA (which isn't a TMU write address). This change
passes a devinfo to any functions that need to do these checks so we
can account for the target V3D version correctly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8980>
The GL frontend can lower this weird GL feature away for us. This should
fix redeclaration of the gl_Color/SecondaryColor as centroid, since that
case had been missed in the !flat special case here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
We have been trying to avoid this by tracking fifo usage in the driver
and flushing all outstanding TMU sequences if we overflowed any of
them. However, this is not the most efficient strategy: instead, we
would like to flush only enough operations to get things going again,
which is better for pipelining. Doing that in the driver would require
some additional work, but thankfully it is not required, since this
seems to be what the hardware does automatically, so we can just
remove overflow tracking for these two fifos and enjoy the benefits.
This also further improves shader-db stats:
total instructions in shared programs: 8975062 -> 8955145 (-0.22%)
instructions in affected programs: 1637624 -> 1617707 (-1.22%)
helped: 4050
HURT: 2241
Instructions are helped.
total threads in shared programs: 236802 -> 237042 (0.10%)
threads in affected programs: 252 -> 492 (95.24%)
helped: 122
HURT: 2
Threads are helped.
total sfu-stalls in shared programs: 19901 -> 19592 (-1.55%)
sfu-stalls in affected programs: 4744 -> 4435 (-6.51%)
helped: 1248
HURT: 1051
Sfu-stalls are helped.
total inst-and-stalls in shared programs: 8994963 -> 8974737 (-0.22%)
inst-and-stalls in affected programs: 1636184 -> 1615958 (-1.24%)
helped: 4050
HURT: 2239
Inst-and-stalls are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
TMU pipelining can severely reduce our capacity to emit TMU spills,
causing us to fail to compile a shader we might otherwise be able to
compile. This is because pipelining extends the liveness of TMU
sequences by postponing the thread switch and LDTMU until a result is
needed, and we can't emit TMU spills while in the middle of a TMU
sequence.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
This follows the same idea as for general TMU instructions: reuse the
existing infrastructure to first count the required register writes
and flush outstanding TMU dependencies, and then emit the actual
writes. This requires splitting the code that decides about register
writes into a helper.
We also need to start using a component mask instead of the number
of components that we need to read with a particular TMU operation.
v2: update tmu_writes for V3D_QPU_WADDR_TMUOFF
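The resulting shape of the code is roughly the following, with the
same helper run twice, once to count and once to emit (the names here
are hypothetical):

#include "compiler/nir/nir.h"

struct v3d_compile;

enum reg_write_mode { MODE_COUNT, MODE_EMIT };

/* The helper split out of the emission code: visits every register
 * write the texture op needs; in MODE_COUNT it only bumps *tmu_writes,
 * in MODE_EMIT it emits the actual writes. */
static void
emit_texture_register_writes(struct v3d_compile *c, nir_tex_instr *instr,
                             enum reg_write_mode mode, int *tmu_writes);

static void flush_outstanding_tmu_if_needed(struct v3d_compile *c,
                                            int tmu_writes);

static void
emit_texture_op(struct v3d_compile *c, nir_tex_instr *instr)
{
   int tmu_writes = 0;
   emit_texture_register_writes(c, instr, MODE_COUNT, &tmu_writes);
   flush_outstanding_tmu_if_needed(c, tmu_writes); /* fifo safety */
   emit_texture_register_writes(c, instr, MODE_EMIT, &tmu_writes);
}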
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
This creates the basic infrastructure to implement TMU pipelining and
applies it to general TMU. Follow-up patches will expand this to
texture and image load/store operations.
TMU pipelining means that we don't immediately end TMU sequences,
and instead, we postpone the thread switch and LDTMU (for loads)
or TMUWT (for stores) until we really need to do them.
For loads, we may need to flush them if another instruction reads
the result of a load operation. We can detect this because in that
case ntq_get_src() will not find the definition for that ssa/reg
(since we have not emitted the LDTMU instructions for it yet), so
when that happens, we flush all pending TMU operations and then
try again to find the definition for the source.
We also need to flush pending TMU operations when we reach the end
of a control flow block, to prevent the case where we emit a TMU
operation in a block, but then we read the result in another block
possibly under control flow.
It is also required to flush across barriers and discards to honor
their semantics.
Since this change doesn't implement pipelining for texture and
image load/store, we also need to flush outstanding TMU operations
if we ever have to emit one of these. This will be corrected with
follow-up patches.
Finally, the TMU has 3 fifos where it can queue TMU operations.
These fifos have limited capacity, depending on the number of threads
used to compile the shader, so we also need to ensure that we
don't have too many outstanding TMU requests and flush pending
TMU operations if a new TMU operation would overflow any of these
fifos. While overflowing the Input and Config fifos only leads
to stalls (which we want to avoid anyway), overflowing the Output
fifo is incorrect and would end up with a broken shader. This means
that we need to know how many TMU register writes are required
to emit a TMU operation and use that information to decide if we need
to flush pending TMU operations before we emit any register
writes for the new TMU operation.
v2: fix TMU flushing for NIR registers reads (jasuarez)
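The flush decision can be pictured like this (fifo depths depend on
the thread count; the type and field names are illustrative):

#include <stdbool.h>

struct tmu_fifo_state {
   int input_used,  input_size;  /* overflow only stalls */
   int config_used, config_size; /* overflow only stalls */
   int output_used, output_size; /* overflow breaks the shader */
};

/* Returns true if a new TMU op needing 'writes' input-fifo slots and
 * 'words' output-fifo words requires flushing pending TMU work first. */
static bool
tmu_would_overflow_fifo(const struct tmu_fifo_state *s,
                        int writes, int words)
{
   return s->input_used + writes > s->input_size ||
          s->config_used + 1 > s->config_size ||
          s->output_used + words > s->output_size;
}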
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8825>
Similarly to if statements, uniform loops are now emitted without
predication, using simple branches for breaks and continues. The
uniformity of the loop is determined by running the
nir_divergence_analysis pass.
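In NIR terms the uniformity check is essentially this (a sketch; the
real code runs the analysis once per shader, not per statement):

#include "compiler/nir/nir.h"

static bool
if_is_uniform(nir_shader *s, nir_if *nif)
{
   nir_divergence_analysis(s); /* refresh divergence metadata */
   return !nir_src_is_divergent(nif->condition);
}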
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7726>
So far the v3d compiler has had them combined, as for OpenGL both are
the same. This change is intended to fit the v3d compiler better with
Vulkan, where they are separate concepts.
Note that NIR has had them separate for a long time, both on
nir_variable and in some NIR lowerings.
v2: (from Iago's feedback)
* Use key->num_tex/sampler_used to iterate through the array
* Fill in num_samplers_used on v3d, and assert that it is the same as
num_tex_used if possible.
v3: (Iago)
* Assert num_tex/samplers_used is smaller than the tex/sampler array
size.
v4: Update the assert mentioned in v3 to use <= instead of < (detected
by CI)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7545>
So far the support for R/B swapping in vertex attributes was only for
the generic attributes.
But there are cases, like glSecondaryColorPointer() supporting BGRA
formats, that require R/B swapping to also be allowed in the
non-generic vertex attributes (in this case, the COLOR1 attribute).
v2:
- Don't split line (Iago)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7196>
It is not used right now, so keeping it adds some noise/confusion.
So far, configuring the Z test is done through CFG_BITS. See
v3dX(emit_state) at v3dx_emit.c for v3d, and pack_cfg_bits at
v3dv_pipeline.c for v3dv. There, flags like z_updates_enable and
others are filled in.
That key field seems like a leftover from using vc4 as reference, as
that driver defines and uses a field with that name.
Some shaders that need to spill hundreds of registers can take a very
long time to compile, as each allocation attempt spills a single
register and restarts the allocation process. We can significantly cut
down these times if we allow the compiler to spill in batches, which
should be possible if we are spilling uniforms, which is in fact the
kind of spill we do first because it has lower cost than TMU spills.
Doing this could cause us to slightly over spill in some cases (depending on
the chosen batch size) leading to slightly worse performance, so we only
enable this behavior after we have started to spill over a certain threshold,
at which point we assume that performance won't be good and we want to
favor compilation speed instead.
v2:
- Keep it simple and just try to spill a fixed amount of registers in a
batch instead of trying to compute this dynamically based on accumulated
spills and current register pressure. (Eric).
v3:
- Check if the node is valid before doing anything with it.
- Drop the environment variable to select batch size and just fix it to 20.
With this we can take this CTS test from 35 minutes down to about 3 minutes:
dEQP-VK.ssbo.layout.random.all_shared_buffer.5
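The policy boils down to something like this (20 is the batch size
fixed in v3; the exact threshold is an assumption here, the log only
says "a certain threshold"):

#include <stdbool.h>

#define SPILL_BATCH_SIZE      20
#define BATCH_SPILL_THRESHOLD 20 /* assumed value */

/* Only uniform spills (low cost, tried first) are batched, and only
 * once we have spilled enough that compile time matters more than
 * peak performance. */
static int
spills_this_round(int spills_so_far, bool spilling_uniforms)
{
   if (spilling_uniforms && spills_so_far > BATCH_SPILL_THRESHOLD)
      return SPILL_BATCH_SIZE;
   return 1;
}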
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
For example, regarding gl_SampleID, the GLSL spec states:
"Any static use of this variable in a fragment shader causes the
entire shader to be evaluated per-sample."
So we need to track if the fragment shader does anything that implicitly
enables per-sample shading in the compiler for the driver to
auto-enable sample rate shading if needed.
v2:
- Instead of tracking reads of gl_SampleID, check SYSTEM_BIT_SAMPLE_ID
and SYSTEM_BIT_SAMPLE_POS as well as the sample layout qualifier like
other drivers are doing to activate this behavior (Eric).
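A minimal sketch of that check (SYSTEM_BIT_SAMPLE_ID/POS and
uses_sample_qualifier come from NIR's shader info; where the result is
stored for the driver to consume is up to the backend):

#include "compiler/nir/nir.h"

static bool
fs_forces_per_sample_shading(const nir_shader *s)
{
   return (s->info.system_values_read &
           (SYSTEM_BIT_SAMPLE_ID | SYSTEM_BIT_SAMPLE_POS)) ||
          s->info.fs.uses_sample_qualifier;
}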
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
We will need this in Vulkan to support vertex format
VK_FORMAT_B8G8R8A8_UNORM. The hardware doesn't support swizzling
vertex attribute components, so we need to do it in the shader.
v2:
- Use nir_intrinsic_io_semantics() to retrieve the location instead
of looping through the shader input variables (Eric).
- Assert that we only have one component (Eric).
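For illustration, the vec4 form of the swizzle in nir_builder terms
(the actual lowering works per scalar component, as the v2 notes say,
so this only shows the idea):

#include "compiler/nir/nir_builder.h"

/* Reorder an RGBA attribute load into BGRA by swapping x and z. */
static nir_ssa_def *
swap_rb(nir_builder *b, nir_ssa_def *rgba)
{
   static const unsigned swiz[4] = { 2, 1, 0, 3 };
   return nir_swizzle(b, rgba, swiz, 4);
}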
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
Vulkan lowers gl_InstanceIndex to load_base_instance +
load_instance_id, so we need to implement loading the base instance in
the compiler.
The base instance is set by the BASE_VERTEX_BASE_INSTANCE command
right before the instanced draw call and it is included in the VPM
payload together with the InstanceID and VertexID if this is requested
by the shader record.
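The lowering that makes this necessary is, in essence (this happens in
NIR before the backend sees the shader; shown only to illustrate why
load_base_instance must be supported):

#include "compiler/nir/nir_builder.h"

/* gl_InstanceIndex = gl_BaseInstance + gl_InstanceID */
static nir_ssa_def *
build_instance_index(nir_builder *b)
{
   return nir_iadd(b, nir_load_base_instance(b),
                      nir_load_instance_id(b));
}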
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
On OpenGL we would need to update the values for all the textures
used. There the value can always be taken from the context or the NIR
shader, but on Vulkan there are cases where that is not possible, or
where it would force us to recompute it.
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
To ask for debug output on a register allocation failure
(V3D_DEBUG_REGISTER_ALLOCATION seemed too long to me).
When the fallback register allocation algorithm was added, a failure
would only dump the current VIR with the register pressure info for
the failed fallback. But if we want to debug the problem, we would be
interested in both.
Additionally, it was strange that we got the full VIR dump on failure
even if no debug option was set.
We also add shader-db-like stats for those failures, to make it easier
to compare one against the other.
v2: keep a small warning message in case both register allocation
algorithms fail (Neil)
Reviewed-by: Neil Roberts <nroberts@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
v3d_compile is now split out into a helper function that gets called a
second time if compilation fails the first time with the result
reporting the register allocation failed. The second time it is run with
the fallback scheduler to try and increase the chances of successfully
allocating the registers.
v2: Add a performance debug message when using the fallback scheduler.
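The retry flow is roughly the following (the helper names are made up
for the sketch):

#include "compiler/nir/nir.h"

struct v3d_compile;
struct v3d_compiler;

struct v3d_compile *try_compile(const struct v3d_compiler *compiler,
                                nir_shader *s, bool fallback_scheduler);
void free_compile(struct v3d_compile *c);
bool failed_register_allocation(const struct v3d_compile *c);

struct v3d_compile *
compile_with_fallback(const struct v3d_compiler *compiler, nir_shader *s)
{
   struct v3d_compile *c = try_compile(compiler, s, false);
   if (c && failed_register_allocation(c)) {
      /* v2: emit a performance debug message here */
      free_compile(c);
      c = try_compile(compiler, s, true /* fallback scheduler */);
   }
   return c;
}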
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
Instead of just having a bool status for the failure, there is now an
enum so that the compilation can report a more detailed status.
Currently this is only used to report whether the failure was due to
failed register allocation. The “failed” bool doesn’t seem to actually
have been used anywhere so this doesn’t really change a lot.
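A plausible shape for the enum (names illustrative; only the register
allocation failure is distinguished so far):

enum v3d_compilation_result {
   V3D_COMPILATION_SUCCEEDED,
   V3D_COMPILATION_FAILED_REGISTER_ALLOCATION,
   V3D_COMPILATION_FAILED,
};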
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5953>
When line smoothing is enabled, the driver now increases the width of
the line so that it can add some semi-transparent pixels to either side
of the line. A lowering pass is added which modifies the alpha component
of every write to fragment output 0 so that if the fragment is outside
the width of the line then the alpha is reduced. It additionally
discards fragments that are completely invisible. It might seem bad to
use discard on a tiled renderer but the assumption is that any bad
effects from using discard will also happen anyway because of enabling
alpha blending.
v2: Disable the line smoothing pass entirely when the framebuffer
contains an integer colour output or one with no alpha channel.
Calculate the coverage once upfront and store in a global variable
instead of calculating each time an output write is modified. Also
do the conditional discard once upfront.
v3: Don’t check whether the output buffer has an alpha channel. Only
look at output 0. Use aa_line_width intrinsic instead of calculating
the real line width in the shader. Clamp the coverage as part of the
global variable, not per output write.
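A sketch of the coverage math in nir_builder terms, assuming dist
holds the fragment's distance from the line center and real_width the
pre-widening line width (the exact falloff is illustrative):

#include "compiler/nir/nir_builder.h"

static nir_ssa_def *
line_coverage(nir_builder *b, nir_ssa_def *dist, nir_ssa_def *real_width)
{
   /* Full coverage inside the real half-width, linear falloff past it;
    * clamp once here (v3), not per output write. */
   nir_ssa_def *half = nir_fmul_imm(b, real_width, 0.5f);
   nir_ssa_def *cov = nir_fsub(b, nir_fadd_imm(b, half, 0.5f),
                                  nir_fabs(b, dist));
   return nir_fsat(b, cov);
}

Each write to fragment output 0 then just multiplies its alpha by the
stored coverage value.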
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
When geometry shaders write a value to gl_Layer that doesn't correspond
to an existing layer in the target framebuffer, the rendering behavior
is undefined according to the spec. However, there are CTS tests that
trigger this scenario on purpose, probably to ensure that nothing
terrible happens.
For V3D, this situation is problematic because the binner uses the
layer index to select the offset to write into the tile state data,
and we only allocate tile state for MAX2(num_layers, 1), so we want to
make sure we don't produce values that would lead to out-of-bounds
writes. The simulator has an assert to catch this; although we haven't
observed issues on actual hardware, it is probably best to play it
safe.
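A minimal sketch of the clamp, applied where the GS writes gl_Layer:

#include "compiler/nir/nir_builder.h"

static nir_ssa_def *
clamp_gl_layer(nir_builder *b, nir_ssa_def *layer, unsigned fb_layers)
{
   /* Tile state is allocated for MAX2(num_layers, 1) layers, so never
    * let the written index go past the last one. */
   unsigned max_layer = fb_layers > 0 ? fb_layers - 1 : 0;
   return nir_umin(b, layer, nir_imm_int(b, max_layer));
}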
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Geometry shaders can output many vertices and thus have higher VPM
memory pressure as a result. It is possible for too-wide geometry
shader dispatches to exceed the maximum available VPM output
allocation, in which case we need to reduce the dispatch width until
the VPM memory requirements fit. Supported dispatch widths for
geometry shaders are 16, 8, 4 and 1.
There is a limit on the number of VPM output sectors that can be used
by a geometry shader. We can meet it by lowering the dispatch width at
compile time; however, at draw time we need to revisit this number
and, together with other elements that contribute to the total VPM
memory requirements, decide on a configuration that can fit the
program into the available VPM memory. Ideally, we also want to aim
for not using more than half of the available memory so that we can
run a pair of bin and render programs in parallel.
v2: fixed language in comment and typo in commit log. (Alejandro)
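The compile-time part reduces to picking the widest dispatch width
that fits (vpm_output_sectors_for() is a hypothetical stand-in for the
real VPM size computation):

static int vpm_output_sectors_for(int dispatch_width);

static int
choose_gs_dispatch_width(int max_vpm_output_sectors)
{
   static const int widths[] = { 16, 8, 4, 1 };
   for (int i = 0; i < 4; i++) {
      if (vpm_output_sectors_for(widths[i]) <= max_vpm_output_sectors)
         return widths[i];
   }
   return 1; /* width 1 must always fit */
}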
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Most of the relevant work happens in v3d_nir_lower_io. Since geometry
shaders can write any number of output vertices, this pass injects a
few variables into the shader code to keep track of things like the
number of vertices emitted or the offset into the VPM of the current
vertex output, etc. This is also where we handle the EmitVertex() and
EndPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
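Schematically (sizes in 32-bit words; a summary of the text above, not
a hardware-accurate definition):

/* GS VPM output segment:
 *
 *   word 0            general header
 *   words 1..N        one header word per output vertex
 *   words N+1...      vertex 0 data, vertex 1 data, ...
 */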
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update the comment in the IO lowering so it includes the GS stage
(Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Until now this made sense because we always paired vertex shaders
with fragment shaders, but as soon as we implement geometry and
tessellation shaders that will no longer be the case, so rename
this to (num_)used_outputs.
v2: Use 'used_outputs' instead of ns_outputs, which is more explicit (Eric).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We set this for any TMU write on spills and general TMU. It is then
used as part of v3d_emit_gl_shader_state later.
v2: add a new flag at v3d_compiler instead of dirtying the flag at
v3dx if there is any spill (change suggested by Eric, added by
Alejandro)
v3: set this for anything that is not a load, and do it also in
v3d40_vir_emit_image_load_store (Eric)
SFU operations have a latency of 2 cycles, so if their result is used
in the cycle right after the SFU instruction, the GPU stalls for an
extra cycle until the result is available.
This adds the number of such stalls to the shader-db debug mode, plus
the sum of instructions + stalls, to help evaluate optimizations that
schedule instructions so as to avoid generating sfu-stalls.
v2: Rename v3d_qpu_generates_sfu_stalls to v3d_qpu_instr_is_sfu (Eric)
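Counting then reduces to checking adjacent instruction pairs (the type
and predicates are stand-ins for the real v3d_qpu_* helpers):

#include <stdbool.h>

struct qpu_inst;
bool qpu_inst_is_sfu(const struct qpu_inst *inst);
bool qpu_inst_reads_result_of(const struct qpu_inst *inst,
                              const struct qpu_inst *prev);

static int
count_sfu_stalls(const struct qpu_inst **insts, int count)
{
   int stalls = 0;
   for (int i = 1; i < count; i++) {
      /* A stall: consuming, on the very next cycle, the result of an
       * SFU op with its 2-cycle latency. */
      if (qpu_inst_is_sfu(insts[i - 1]) &&
          qpu_inst_reads_result_of(insts[i], insts[i - 1]))
         stalls++;
   }
   return stalls;
}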
Reviewed-by: Eric Anholt <eric@anholt.net>
Among other things, this avoids the need to load 1/-1 constants (so
one less operation).
The removed comment suggested the option of adding support for inc/dec
in NIR. Intel just uses an auxiliary method to get the hw operation
that is needed, so no lowering is required, and, being so small, it
seems unreasonable to try to add a general one to NIR itself. It is
easier to just adapt the method here, which is what this patch does.
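The adapted method boils down to the following (the NIR helpers are
real; the V3D_TMU_OP_* names follow the hardware's combined write/read
op list, and the data source index is illustrative):

#include "compiler/nir/nir.h"

static uint32_t
v3d_tmu_op_for_add(nir_intrinsic_instr *instr, unsigned data_src)
{
   if (nir_src_is_const(instr->src[data_src])) {
      int64_t v = nir_src_as_int(instr->src[data_src]);
      if (v == 1)
         return V3D_TMU_OP_WRITE_AND_READ_INC; /* inc, no constant */
      if (v == -1)
         return V3D_TMU_OP_WRITE_OR_READ_DEC;  /* dec, no constant */
   }
   return V3D_TMU_OP_WRITE_ADD_READ_PREFETCH;  /* general add */
}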
It is worth noting that we don't see any change in shader-db stats,
because the affected methods are only used by shaders requiring GLSL >
4.2, and in general there aren't many GLSL ES 3.1 tests in the usual
shader-db set.
As an alternative, we captured the GLES3/GLSL31/GLSL32 shaders used in
vk-gl-cts, even if that is not real-life shader usage. With those we
get the following:
total instructions in shared programs: 1217022 -> 1217013 (<.01%)
instructions in affected programs: 117 -> 108 (-7.69%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 2 x̄: 1.50 x̃: 1
helped stats (rel) min: 3.57% max: 10.00% x̄: 8.09% x̃: 9.09%
95% mean confidence interval for instructions value: -2.07 -0.93
95% mean confidence interval for instructions %-change: -10.54% -5.64%
Instructions are helped.
Note that the number of helped shaders is really low because most of
the vk-gl-cts tests using AtomicInc/Dec/Add are compute shaders.
Although right now there is a branch around with CS support, the usual
practice is to gather stats against master.
Reviewed-by: Eric Anholt <eric@anholt.net>