third_party_mesa3d

Author	SHA1	Message	Date
Matt Turner	1d9f74eda7	glsl: Rebalance expression trees that are reduction operations. The intention of this pass was to give us better instruction scheduling opportunities, but it unexpectedly reduced some instruction counts as well: total instructions in shared programs: 1666639 -> 1666073 (-0.03%) instructions in affected programs: 54612 -> 54046 (-1.04%) (and trades 4 SIMD16 programs in SS3)	2014-06-19 16:11:51 -07:00
Ilia Mirkin	31b92aa2fc	glsl: add lowering passes for carry/borrow Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-05-02 12:01:35 -04:00
Ian Romanick	7016afe25d	glsl: Remove varying "base" parameters In February 2013 Paul unified the values used for shader stage outputs and shader stage inputs. See commits 8a076c5f0^..eed6baf76. Since that time, the location_base parameters are always VARYING_SLOT_VAR0. Instead of passing that around, just hard code it. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2014-05-02 07:16:54 -07:00
Kenneth Graunke	da22221aa3	glsl: Drop do_common_optimization's max_unroll_iterations parameter. Now that we pass in gl_shader_compiler_options, it makes sense to just use options->MaxUnrollIterations, rather than passing a separate parameter. Half of the invocations already passed options->MaxUnrollIterations, while the other half passed in a hardcoded value of 32. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2014-04-11 17:41:37 -07:00
Kenneth Graunke	73f80c20f6	glsl: Pass ctx->Const.NativeIntegers to do_algebraic. The next patch will introduce an optimization that only works when integers are not represented as floating point values. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-04-08 00:02:06 -07:00
Kenneth Graunke	169c645f12	glsl: Pass ctx->Const.NativeIntegers to do_common_optimization(). The next few patches will introduce an optimization that only works when integers are not represented as floating point values. v2: Re-word-wrap a line, as requested by Ian Romanick. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-04-08 00:02:03 -07:00
Kenneth Graunke	ac0a8b9540	glsl: Delete LRP_TO_ARITH lowering pass flag. Tt's kind of a trap---calling do_common_optimization() after lower_instructions() may cause opt_algebraic() to reintroduce ir_triop_lrp expressions that were lowered, effectively defeating the point. Because of this, nobody uses it. v2: Delete more code (caught by Ian Romanick). Cc: "10.1" <mesa-stable@lists.freedesktop.org> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Acked-by: Eric Anholt <eric@anholt.net>	2014-02-26 02:16:56 -08:00
Dave Airlie	122c3b9486	glsl/i965: move lower_offset_array up to GLSL compiler level. This lowering pass will be useful for gallium drivers as well, in order to support the GL TG4 oddity that is textureGatherOffsets. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz> Signed-off-by: Dave Airlie <airlied@redhat.com>	2014-02-25 13:28:57 +10:00
Matt Turner	4bd6e0d7c6	glsl: Vectorize multiple scalar assignments Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by 4.61%, and CS:GO by 5.71%. total instructions in shared programs: 1500153 -> 1498191 (-0.13%) instructions in affected programs: 59919 -> 57957 (-3.27%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 14:20:44 -08:00
Paul Berry	088494aa03	glsl/loops: Get rid of lower_bounded_loops and ir_loop::normative_bound. Now that loop_controls no longer creates normatively bound loops, there is no need for ir_loop::normative_bound or the lower_bounded_loops pass. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-12-09 10:55:09 -08:00
Paul Berry	2c17f97fe6	glsl/loops: consolidate bounded loop handling into a lowering pass. Previously, all of the back-ends (ir_to_mesa, st_glsl_to_tgsi, and the i965 fs and vec4 visitors) had nearly identical logic for handling bounded loops. This replaces the duplicate logic with an equivalent lowering pass that is used by all the back-ends. Note: on i965, there is a slight increase in instruction count. For example, a loop like this: for (int i = 0; i < 100; i++) { total += i; } would previously compile down to this (vec4) native code: mov(8) g4<1>.xD 0D mov(8) g8<1>.xD 0D loop: cmp.ge.f0(8) null g8<4;4,1>.xD 100D (+f0) break(8) add(8) g5<1>.xD g5<4;4,1>.xD g4<4;4,1>.xD add(8) g8<1>.xD g8<4;4,1>.xD 1D add(8) g4<1>.xD g4<4;4,1>.xD 1D while(8) loop After this patch, the "(+f0) break(8)" turns into: (+f0) if(8) break(8) endif(8) because the back-end isn't smart enough to recognize that "if (condition) break;" can be done using a conditional break instruction. However, it should be relatively easy for a future peephole optimization to properly optimize this. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-12-09 10:54:26 -08:00
Eric Anholt	fd05ede0d0	glsl: Add a CSE pass. This only operates on constant/uniform values for now, because otherwise I'd have to deal with killing my available CSE entries when assignments happen, and getting even this working in the tree ir was painful enough. As is, it has the following effect in shader-db: total instructions in shared programs: 1524077 -> 1521964 (-0.14%) instructions in affected programs: 50629 -> 48516 (-4.17%) GAINED: 0 LOST: 0 And, for tropics, that accounts for most of the effect, the FPS improvement is 11.67% +/- 0.72% (n=3). v2: Use read_only field of the variable, manually check the lod_info union members, use get_num_operands(), rename cse_operands_visitor to is_cse_candidate_visitor, move all is-a-candidate logic to that function, and call it before checking for CSE on a given rvalue, more comments, use private keyword. Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-11-01 10:25:33 -07:00
Matt Turner	d0b8ea60b7	glsl: Add ldexp_to_arith lowering pass. Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-09-17 16:59:23 -07:00
Marek Olšák	d13003f544	glsl: don't eliminate texcoords that can be set by GL_COORD_REPLACE Tested by examining generated TGSI shaders from piglit/glsl-routing. Cc: mesa-stable@lists.freedesktop.org Reviewed-by: Henri Verbeet <hverbeet@gmail.com> Tested-by: Henri Verbeet <hverbeet@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-08-18 12:27:08 +02:00
Paul Berry	3b0cf7027d	glsl/linker: Properly pack GS input varyings. Since geometry shader inputs are arrays (where the array index indicates which vertex is being examined), varying packing needs to treat them differently. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-08-01 20:22:59 -07:00
Vinson Lee	21f97446f4	glsl: Remove comma at end of enumerator list. Fixes this build error on OpenBSD 5.3. In file included from ../../src/mesa/main/ff_fragment_shader.cpp:53: ./../glsl/ir_optimization.h:64: error: comma at end of enumerator list Signed-off-by: Vinson Lee <vlee@freedesktop.org>	2013-07-17 20:57:54 -07:00
Marek Olšák	b3d8b4c0b4	glsl/linker: eliminate unused and set-but-unused built-in varyings This eliminates built-in varyings such as gl_Color, gl_SecondaryColor, gl_TexCoord, and gl_FogFragCoord if they are unused by the next stage or not written at all (e.g. gl_TexCoord elements). The gl_TexCoord array is broken down into separate vec4s if needed. v2: - use a switch statement in varying_info_visitor::visit(ir_variable*) - use snprintf - disable the optimization for GLES2 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2013-07-02 17:02:14 +02:00
Jordan Justen	5ebf547312	glsl linker: remove interface block instance names Convert interface blocks with instance names into flat interface blocks without an instance name. Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-05-23 09:37:12 -07:00
Ian Romanick	ee7a6dad30	glsl: Add lowering pass for ir_triop_vector_insert This will eventually replace do_vec_index_to_cond_assign. This lowering pass is called in all the places where do_vec_index_to_cond_assign or do_vec_index_to_swizzle is called. v2: Use WRITEMASK_* instead of integer literals. Use a more concise method of generating broadcast_index. Both suggested by Eric. v3: Use a series of scalar compares instead of a single vector compare. Suggested by Eric and Ken. It still uses 'if (cond) v.x = y;' instead of conditional assignments because ir_builder doesn't do conditional assignments, and I'd rather keep the code simple. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2013-05-13 12:05:19 -07:00
Kenneth Graunke	e413d3f15c	glsl: Add a pass to flip matrix/vector multiplies to use dot products. This pass flips (matrix * vector) operations to (vector * matrixTranspose) for certain built-in matrices (currently gl_ModelViewProjectionMatrix and gl_TextureMatrix). This is equivalent, but results in dot products rather than multiplies and adds. On some hardware, this is more efficient. This pass is conditionalized on ctx->mvp_with_dp4, the flag drivers set to indicate they prefer dot products. Improves performance in Lightsmark by 1.01131% +/- 0.162069% (n = 10) on a Haswell GT2 system. Passes Piglit on Ivybridge. v2: Use struct gl_shader_compiler_options instead of plumbing through another boolean flag for this purpose. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-05-12 09:36:46 -07:00
Kenneth Graunke	b765740a66	glsl: Pass struct shader_compiler_options into do_common_optimization. do_common_optimization may need to make choices about whether to emit certain kinds of instructions. gl_context::ShaderCompilerOptions contains exactly that information, so it makes sense to pass it in. Rather than passing the whole array, pass the structure for the stage that's currently being worked on. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-05-12 09:36:41 -07:00
Matt Turner	dafd050883	glsl: Add a pass to lower bitfield-insert into bfm+bfi. i965/Gen7+ and Radeon/Evergreen+ have bfm/bfi instructions to implement bitfieldInsert() from ARB_gpu_shader5. v2: Add ir_binop_bfm and ir_triop_bfi to st_glsl_to_tgsi.cpp. Remove spurious temporary assignment and dereference. Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2013-05-06 10:17:13 -07:00
Kenneth Graunke	edc52a8f28	glsl: Add an optimization pass to flatten simple nested if blocks. GLBenchmark 2.7's shaders contain conditional blocks like: if (x) { if (y) { ... } } where the outer conditional's then clause contains exactly one statement (the nested if) and there are no else clauses. This can easily be optimized into: if (x && y) { ... } This saves a few instructions in GLBenchmark 2.7: total instructions in shared programs: 11833 -> 11649 (-1.55%) instructions in affected programs: 8234 -> 8050 (-2.23%) It also helps CS:GO slightly (-0.05%/-0.22%). More importantly, however, it simplifies the control flow graph, which could enable other optimizations. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2013-04-04 15:38:19 -07:00
Kenneth Graunke	93066ce129	glsl: Convert mix() to use a new ir_triop_lrp opcode. Many GPUs have an instruction to do linear interpolation which is more efficient than simply performing the algebra necessary (two multiplies, an add, and a subtract). Pattern matching or peepholing this is more desirable, but can be tricky. By using an opcode, we can at least make shaders which use the mix() built-in get the more efficient behavior. Currently, all consumers lower ir_triop_lrp. Subsequent patches will actually generate different code. v2 [mattst88]: - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a subsequent patch and ir_triop_lrp translated directly. v3 [mattst88]: - Move changes from the next patch to opt_algebraic.cpp to accept 3-src operations. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2013-02-28 13:18:59 -08:00
Matt Turner	321555fb41	glsl: Add support for lowering 4x8 pack/unpack operations Lower them to arithmetic and bit manipulation expressions. Reviewed-by: Chad Versace <chad.versace@linux.intel.com> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2013-01-25 14:10:23 -08:00
Chad Versace	b9f56ea923	glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v4) Lower them to arithmetic and bit manipulation expressions. v2: Rewrite using ir_builder [for idr]. v3: Comment typos. [for mattst88] v4: Fix arithmetic error in comments. Factor out a shift instruction. Don't heap allocate factory.instructions. [for paul] Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2) Reviewed-by: Matt Tuner <mattst88@gmail.com> (v3) Reviewed-by: Paul Berry <stereotype441@gmail.com> (v4) Signed-off-by: Chad Versace <chad.versace@linux.intel.com>	2013-01-24 21:24:10 -08:00
Paul Berry	1745a4d751	glsl: Add a lowering pass for packing varyings. This lowering pass generates GLSL code that manually packs varyings into vec4 slots, for the benefit of back-ends that don't support packed varyings natively. No functional change--the lowering pass is not yet used. Reviewed-by: Eric Anholt <eric@anholt.net> v2: Don't use ir_hierarchical_visitor--just loop over instructions directly. Also, make the names of the packed varyings include the names of the original varyings that were packed into them.	2012-12-14 10:49:21 -08:00
Paul Berry	18392443d4	glsl/lower_clip_distance: Update symbol table. This patch modifies the clip distance lowering pass so that the new symbol it generates (glClipDistanceMESA) is added to the shader's symbol table. This will allow a later patch to modify the linker so that it finds transform feedback varyings using the symbol table rather than having to iterate through all the declarations in the shader. Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2012-12-14 10:48:28 -08:00
Eric Anholt	a75f2681d2	glsl: Add a lowering pass to turn complicated UBO references to vector loads. v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping the ir_variable as the variable part of the offset from handle_rvalue(), and track the constant offsets from that with a plain old integer value, avoiding a bunch of temporary variables in the array and struct handling. Also, fix file description doxygen. v3: Fix a row vs col typo, and fix spelling in a comment. Reviewed-by: Eric Anholt <eric@anholt.net>	2012-08-07 13:54:47 -07:00
José Fonseca	e88f9b9546	glsl: Fix lower_discard_flow prototype mismatch. Should fix MSVC link failure.	2012-05-15 12:27:15 +01:00
Eric Anholt	3de1395fa5	glsl: Implement the GLSL 1.30+ discard control flow rule in GLSL IR. Previously, I tried implementing this in the i965 driver, but did so in a way that violated the intent of the spec, and broke Tropics. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2012-05-14 17:03:51 -07:00
Eric Anholt	60177d5e2a	glsl: Add an array splitting pass. I've had this code laying around almost done for a long time. The idea is like opt_structure_splitting, that we've got a bunch of transforms at the GLSL IR level that only understand scalars and vectors, which just skip complicated dereferences. While driver backends may manage some optimization after they split matrices up themselves, it would be better to bring all of our optimization to bear on the problem. While I wasn't expecting changes quite yet, a few programs end up winning: a gstreamer convolution shader, and the Humus dynamic branching demo: Total instructions: 269430 -> 269342 3/2148 programs affected (0.1%) 1498 -> 1410 instructions in affected programs (5.9% reduction)	2012-04-11 18:08:21 -07:00
Vincent Lejeune	6d4b35c036	glsl: Add a lowering pass to remove reads of shader output variables. This is similar to Gallium's existing glsl_to_tgsi::remove_output_read lowering pass, but done entirely inside the GLSL compiler. Signed-off-by: Vincent Lejeune <vljn@ovi.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2012-01-06 13:36:44 +00:00
Ian Romanick	1d5d67f8ad	glsl: Add uniform_locations_assigned parameter to do_dead_code opt pass Setting this flag prevents declarations of uniforms from being removed from the IR. Since the IR is directly used by several API functions that query uniforms in shaders, uniform declarations cannot be removed after the locations have been set. However, it should still be safe to reorder the declarations (this is not tested). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980 Tested-by: Brian Paul <brianp@vmware.com> Reviewed-by: Bryan Cain <bryancain3@gmail.com> Cc: Vinson Lee <vlee@vmware.com> Cc: José Fonseca <jfonseca@vmware.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>	2011-10-25 17:51:43 -07:00
Paul Berry	c06e325967	glsl: Implement a lowering pass for gl_ClipDistance. In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance needs to be laid out as a pair of vec4's (the first containing clip distances 0-3, and the second containing clip distances 4-7). However, it is declared in GLSL as an array of 8 floats. This lowering pass acts at the GLSL level, modifying the declaration of gl_ClipDistance so that it is an array of vec4's rather than an array of floats, and renaming it to gl_ClipDistanceMESA. In addition, it modifies all accesses to the array so that they access the appropiate component of one of the vec4's. Since some hardware may not internally represent gl_ClipDistance as a pair of vec4's, this lowering pass is optional. To enable it, set the LowerClipDistance flag in gl_shader_compiler_options to true. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2011-09-23 13:28:43 -07:00
Bryan Cain	478034f34a	glsl: Use a separate div_to_mul_rcp lowering flag for integers. Using multiply and reciprocal for integer division involves potentially lossy floating point conversions. This is okay for older GPUs that represent integers as floating point, but undesirable for GPUs with native integer division instructions. TGSI, for example, has UDIV/IDIV instructions for integer division, so it makes sense to handle this directly. Likewise for i965. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Signed-off-by: Bryan Cain <bryancain3@gmail.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2011-08-31 12:02:18 -07:00
Ian Romanick	90cc372400	glsl: Factor out code that generates block of index comparisons Reviewed-by: Eric Anholt <eric@anholt.net>	2011-07-23 01:24:18 -07:00
Paul Berry	5fb79fc69f	glsl: Remove unused function prototypes. No functional change. Remove prototypes for do_mod_to_fract() and do_sub_to_add_neg(), which haven't existed since November 2010.	2011-07-08 09:59:29 -07:00
Eric Anholt	e31266ed3e	glsl: Add a new opt_copy_propagation variant that does it channel-wise. This patch cleans up many of the extra copies in GLSL IR introduced by i965's scalarizing passes. It doesn't result in a statistically significant performance difference on nexuiz high settings (n=3) or my demo (n=10), due to brw_fs.cpp's register coalescing covering most of those extra moves anyway. However, it does make the debug of wine's GLSL shaders much more tractable, and reduces instruction count of glsl-fs-convolution-2 from 376 to 288.	2011-02-04 12:18:38 -06:00
Kenneth Graunke	9ac6a9b2fa	glsl: Support if-flattening beyond a given maximum nesting depth. This adds a new optional max_depth parameter (defaulting to 0) to lower_if_to_cond_assign, and makes the pass only flatten if-statements nested deeper than that. By default, all if-statements will be flattened, just like before. This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign, to match the new naming conventions.	2010-12-27 00:59:31 -08:00
Ian Romanick	c4285be9a5	glsl: Lower ir_binop_pow to a sequence of EXP2 and LOG2	2010-12-01 12:01:13 -08:00
Kenneth Graunke	940df10100	glsl: Add a lowering pass to move discards out of if-statements. This should allow lower_if_to_cond_assign to work in the presence of discards, fixing bug #31690 and likely #31983. NOTE: This is a candidate for the 7.9 branch.	2010-12-01 11:52:43 -08:00
Kenneth Graunke	9a1d063c6d	glsl: Add an optimization pass to simplify discards. NOTE: This is a candidate for the 7.9 branch.	2010-12-01 11:52:43 -08:00
Kenneth Graunke	63684a9ae7	glsl: Combine many instruction lowering passes into one. This should save on the overhead of tree-walking and provide a convenient place to add more instruction lowering in the future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2010-11-19 15:56:28 -08:00
Ian Romanick	11d6f1c698	glsl: Add ir_quadop_vector expression The vector operator collects 2, 3, or 4 scalar components into a vector. Doing this has several advantages. First, it will make ud-chain tracking for components of vectors much easier. Second, a later optimization pass could collect scalars into vectors to allow generation of SWZ instructions (or similar as operands to other instructions on R200 and i915). It also enables an easy way to generate IR for SWZ instructions in the ARB_vertex_program assembler.	2010-11-19 15:00:26 -08:00
Eric Anholt	aae338104f	glsl: Add a lowering pass for texture projection.	2010-09-30 20:23:36 -07:00
Ian Romanick	a6ecd1c372	glsl2: Add flags to enable variable index lowering	2010-09-17 11:00:24 +02:00
Luca Barbieri	a47539c7a1	glsl: add pass to lower variable array indexing to conditional assignments Currenly GLSL happily generates indirect addressing of any kind of arrays. Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in general. This pass fixes that by lowering such constructs to a binary search on the values, followed at the end by vectorized generation of equality masks, and 4 conditional assignments for each mask generation. Note that this requires the ir_binop_equal change so that we can emit SEQ to generate the boolean masks. Unfortunately, ir_structure_splitting is too dumb to turn the resulting constant array references to individual variables, so this will need to be added too before this pass can actually be effective for temps. Several patches in the glsl2-lower-variable-indexing were squashed into this commit. These patches fix bugs in Luca's original implementation, and the individual patches can be seen in that branch. This was done to aid bisecting in the future. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>	2010-09-17 10:58:58 +02:00
Ian Romanick	8f2214f489	glsl2: Add pass to remove redundant jumps	2010-09-13 14:25:26 -07:00
Luca Barbieri	3361cbac2a	glsl: add continue/break/return unification/elimination pass (v2) Changes in v2: - Base class renamed to ir_control_flow_visitor - Tried to comply with coding style This is a new pass that supersedes ir_if_return and "lowers" jumps to if/else structures. Currently it causes no regressions on softpipe and nv40, but I'm not sure whether the piglit glsl tests are thorough enough, so consider this experimental. It can be asked to: 1. Pull jumps out of ifs where possible 2. Remove all "continue"s, replacing them with an "execute flag" 3. Replace all "break" with a single conditional one at the end of the loop 4. Replace all "return"s with a single return at the end of the function, for the main function and/or other functions This gives several great benefits: 1. All functions can be inlined after this pass 2. nv40 and other pre-DX10 chips without "continue" can be supported 3. nv30 and other pre-DX10 chips with no control flow at all are better supported Note that for full effect we should also teach the unroller to unroll loops with a fixed maximum number of iterations but with the canonical conditional "break" that this pass will insert if asked to. Continues are lowered by adding a per-loop "execute flag", initialized to TRUE, that when cleared inhibits all execution until the end of the loop. Breaks are lowered to continues, plus setting a "break flag" that is checked at the end of the loop, and trigger the unique "break". Returns are lowered to breaks/continues, plus adding a "return flag" that causes loops to break again out of their enclosing loops until all the loops are exited: then the "execute flag" logic will ignore everything until the end of the function. Note that "continue" and "return" can also be implemented by adding a dummy loop and using break. However, this is bad for hardware with limited nesting depth, and prevents further optimization, and thus is not currently performed.	2010-09-13 13:03:09 -07:00

1 2

70 Commits