Commit Graph

23 Commits

Author SHA1 Message Date
Luca Barbieri
a47539c7a1 glsl: add pass to lower variable array indexing to conditional assignments
Currenly GLSL happily generates indirect addressing of any kind of
arrays.

Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.

This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.

Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.

Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.

Several patches in the glsl2-lower-variable-indexing were squashed
into this commit.  These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-09-17 10:58:58 +02:00
Ian Romanick
8f2214f489 glsl2: Add pass to remove redundant jumps 2010-09-13 14:25:26 -07:00
Luca Barbieri
3361cbac2a glsl: add continue/break/return unification/elimination pass (v2)
Changes in v2:
- Base class renamed to ir_control_flow_visitor
- Tried to comply with coding style

This is a new pass that supersedes ir_if_return and "lowers" jumps
to if/else structures.

Currently it causes no regressions on softpipe and nv40, but I'm not sure
whether the piglit glsl tests are thorough enough, so consider this
experimental.

It can be asked to:
1. Pull jumps out of ifs where possible
2. Remove all "continue"s, replacing them with an "execute flag"
3. Replace all "break" with a single conditional one at the end of the loop
4. Replace all "return"s with a single return at the end of the function,
   for the main function and/or other functions

This gives several great benefits:
1. All functions can be inlined after this pass
2. nv40 and other pre-DX10 chips without "continue" can be supported
3. nv30 and other pre-DX10 chips with no control flow at all are better supported

Note that for full effect we should also teach the unroller to unroll
loops with a fixed maximum number of iterations but with the canonical
conditional "break" that this pass will insert if asked to.

Continues are lowered by adding a per-loop "execute flag", initialized to
TRUE, that when cleared inhibits all execution until the end of the loop.

Breaks are lowered to continues, plus setting a "break flag" that is checked
at the end of the loop, and trigger the unique "break".

Returns are lowered to breaks/continues, plus adding a "return flag" that
causes loops to break again out of their enclosing loops until all the
loops are exited: then the "execute flag" logic will ignore everything
until the end of the function.

Note that "continue" and "return" can also be implemented by adding
a dummy loop and using break.
However, this is bad for hardware with limited nesting depth, and
prevents further optimization, and thus is not currently performed.
2010-09-13 13:03:09 -07:00
Ian Romanick
547131ac87 glsl2: Add lowering pass to remove noise opcodes 2010-09-09 15:39:51 -07:00
Luca Barbieri
e591c4625c glsl: add several EmitNo* options, and MaxUnrollIterations
This increases the chance that GLSL programs will actually work.

Note that continues and returns are not yet lowered, so linking
will just fail if not supported.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-09-08 20:36:37 -07:00
Eric Anholt
8f8cdbfba4 glsl2: Add a pass to strip out noop swizzles.
With the glsl2-965 branch, the optimization of glsl-algebraic-rcp-rcp
regressed due to noop swizzles hiding information from ir_algebraic.
This cleans up those noop swizzles for us.
2010-08-13 17:54:47 -07:00
Eric Anholt
2f4fe15168 glsl2: Move the common optimization passes to a helper function.
These are passes that we expect all codegen to be happy with.  The
other lowering passes for Mesa IR are moved to the Mesa IR generator.
2010-08-13 17:47:00 -07:00
Eric Anholt
5854d4583c glsl2: Add a pass to transform ir_binop_sub to add(op0, neg(op1))
All the current HW backends transform subtract to adding the negation,
so I haven't bothered peepholing it back out in Mesa IR.  This allows
some subtract of subtract to get removed in ir_algebraic.
2010-08-09 21:41:14 -07:00
Eric Anholt
8bebbeb7c5 glsl2: Add constant propagation.
Whereas constant folding evaluates constant expressions at rvalue
nodes, constant propagation tracks constant components of vectors
across execution to replace (possibly swizzled) variable dereferences
with constant values, triggering possible constant folding or reduced
variable liveness.
2010-08-09 19:21:18 -07:00
Eric Anholt
bc4034b243 glsl2: Add a pass to convert exp and log to exp2 and log2.
Fixes ir_to_mesa handling of unop_log, which used the weird ARB_vp LOG
opcode that doesn't do what we want.  This also lets the multiplication
coefficients in there get constant-folded, possibly.

Fixes:
glsl-fs-log
2010-08-05 15:34:00 -07:00
Eric Anholt
7f7eaf0285 ir_structure_splitting: New pass to chop structures into their components.
This doesn't do anything if your structure goes through an uninlined
function call or if whole-structure assignment occurs.  As such, the
impact is limited, at least until we do some global copy propagation
to reduce whole-structure assignment.
2010-08-05 12:56:03 -07:00
Eric Anholt
2e853ca23c glsl2: Add a pass for removing unused functions.
For a shader involving many small functions, this avoids running
optimization across all of them after they've been inlined
post-linking.

Reduces the runtime of linking and running a fragment shader from Yo
Frankie from 1.6 seconds to 0.9 seconds (-44.9%, +/- 3.3%).
2010-08-05 10:18:31 -07:00
Eric Anholt
784695442c glsl2: Add new tree grafting optimization pass. 2010-07-31 15:52:21 -07:00
Eric Anholt
66d4c65ee2 glsl2: Make the dead code handler make its own talloc context.
This way, we don't need to pass in a parse state, and the context
doesn't grow with the number of passes through optimization.
2010-07-27 11:46:05 -07:00
Eric Anholt
832aad989e glsl2: Add optimization pass for algebraic simplifications.
This cleans up the assembly output of almost all the non-logic tests
glsl-algebraic-*.  glsl-algebraic-pow-two needs love (basically,
flattening to a temporary and squaring it).
2010-07-27 09:43:52 -07:00
Eric Anholt
29ce44ad2b glsl2: Add a pass for converting if statements to conditional assignment.
This will be used on 915 and similar hardware of that generation.
2010-07-19 10:21:38 -07:00
Eric Anholt
6d8a0a0aad glsl2: Add a new pass at the IR level to break down matrix ops to vector ops.
This will be used by the Mesa IR and likely most HW backends, as it
allows other optimizations to occur that might not otherwise.

Fixes glsl-vs-mat-sub-1, glsl-vs-mat-div-1.
2010-07-12 13:26:46 -07:00
Eric Anholt
d674ebcee0 glsl2: Add a pass to simplify if statements returning from both sides.
This allows function inlining making the following tests work even
without function calls implemented:
glsl-fs-functions-2
glsl-fs-functions-3
glsl-vs-functions
glsl-vs-functions-2
glsl-vs-functions-3
glsl-vs-vec4-indexing-5

(Note that those tests were designed to trigger actual function calls,
and this defeats them.  However, those testcases ended up catching the
bug in the previous commit.)
2010-07-07 09:10:48 -07:00
Eric Anholt
a36334be02 glsl2: Add pass for supporting variable vector indexing in rvalues.
The Mesa IR needs this to support vector indexing correctly, and
hardware backends such as 915 would want this behavior as well.

Fixes glsl-vs-vec4-indexing-2.
2010-07-06 18:49:24 -07:00
Eric Anholt
9a0e421983 glsl2: Add a pass to break ir_binop_div to _mul and _rcp.
This results in constant folding of a constant divisor.
2010-07-02 11:27:06 -07:00
Eric Anholt
8a1f186cc5 glsl2: Add a pass to convert mod(a, b) to b * fract(a/b).
This is used by the Mesa IR backend to implement mod, fixing glsl-fs-mod.
2010-07-01 11:07:23 -07:00
Eric Anholt
bda27424cf glsl2: Use the parser state as the talloc context for dead code elimination.
This cuts runtime by around 20% from talloc_parent() lookups.
2010-06-25 13:38:38 -07:00
Eric Anholt
2928588267 glsl2: Move the compiler to the subdirectory it will live in in Mesa. 2010-06-24 15:36:00 -07:00