Commit Graph

52 Commits

Author SHA1 Message Date
Ian Romanick
ee7a6dad30 glsl: Add lowering pass for ir_triop_vector_insert
This will eventually replace do_vec_index_to_cond_assign.  This lowering
pass is called in all the places where do_vec_index_to_cond_assign or
do_vec_index_to_swizzle is called.

v2: Use WRITEMASK_* instead of integer literals.  Use a more concise
method of generating broadcast_index.  Both suggested by Eric.

v3: Use a series of scalar compares instead of a single vector compare.
Suggested by Eric and Ken.  It still uses 'if (cond) v.x = y;' instead
of conditional assignments because ir_builder doesn't do conditional
assignments, and I'd rather keep the code simple.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-05-13 12:05:19 -07:00
Kenneth Graunke
e413d3f15c glsl: Add a pass to flip matrix/vector multiplies to use dot products.
This pass flips (matrix * vector) operations to (vector *
matrixTranspose) for certain built-in matrices (currently
gl_ModelViewProjectionMatrix and gl_TextureMatrix).

This is equivalent, but results in dot products rather than multiplies
and adds.  On some hardware, this is more efficient.

This pass is conditionalized on ctx->mvp_with_dp4, the flag drivers set
to indicate they prefer dot products.

Improves performance in Lightsmark by 1.01131% +/- 0.162069% (n = 10)
on a Haswell GT2 system.  Passes Piglit on Ivybridge.

v2: Use struct gl_shader_compiler_options instead of plumbing through
    another boolean flag for this purpose.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-05-12 09:36:46 -07:00
Kenneth Graunke
b765740a66 glsl: Pass struct shader_compiler_options into do_common_optimization.
do_common_optimization may need to make choices about whether to emit
certain kinds of instructions.  gl_context::ShaderCompilerOptions
contains exactly that information, so it makes sense to pass it in.

Rather than passing the whole array, pass the structure for the stage
that's currently being worked on.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-05-12 09:36:41 -07:00
Matt Turner
dafd050883 glsl: Add a pass to lower bitfield-insert into bfm+bfi.
i965/Gen7+ and Radeon/Evergreen+ have bfm/bfi instructions to implement
bitfieldInsert() from ARB_gpu_shader5.

v2: Add ir_binop_bfm and ir_triop_bfi to st_glsl_to_tgsi.cpp.
    Remove spurious temporary assignment and dereference.

Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2013-05-06 10:17:13 -07:00
Kenneth Graunke
edc52a8f28 glsl: Add an optimization pass to flatten simple nested if blocks.
GLBenchmark 2.7's shaders contain conditional blocks like:

if (x) {
    if (y) {
        ...
    }
}

where the outer conditional's then clause contains exactly one statement
(the nested if) and there are no else clauses.  This can easily be
optimized into:

if (x && y) {
    ...
}

This saves a few instructions in GLBenchmark 2.7:

    total instructions in shared programs: 11833 -> 11649 (-1.55%)
    instructions in affected programs:     8234 -> 8050 (-2.23%)

It also helps CS:GO slightly (-0.05%/-0.22%).  More importantly,
however, it simplifies the control flow graph, which could enable other
optimizations.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-04-04 15:38:19 -07:00
Kenneth Graunke
93066ce129 glsl: Convert mix() to use a new ir_triop_lrp opcode.
Many GPUs have an instruction to do linear interpolation which is more
efficient than simply performing the algebra necessary (two multiplies,
an add, and a subtract).

Pattern matching or peepholing this is more desirable, but can be
tricky.  By using an opcode, we can at least make shaders which use the
mix() built-in get the more efficient behavior.

Currently, all consumers lower ir_triop_lrp.  Subsequent patches will
actually generate different code.

v2 [mattst88]:
   - Add LRP_TO_ARITH flag to ir_to_mesa.cpp. Will be removed in a
     subsequent patch and ir_triop_lrp translated directly.
v3 [mattst88]:
   - Move changes from the next patch to opt_algebraic.cpp to accept
     3-src operations.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-02-28 13:18:59 -08:00
Matt Turner
321555fb41 glsl: Add support for lowering 4x8 pack/unpack operations
Lower them to arithmetic and bit manipulation expressions.

Reviewed-by: Chad Versace <chad.versace@linux.intel.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-01-25 14:10:23 -08:00
Chad Versace
b9f56ea923 glsl: Add lowering pass for GLSL ES 3.00 pack/unpack operations (v4)
Lower them to arithmetic and bit manipulation expressions.

v2: Rewrite using ir_builder [for idr].
v3: Comment typos. [for mattst88]
v4: Fix arithmetic error in comments.
    Factor out a shift instruction.
    Don't heap allocate factory.instructions.
    [for paul]

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v2)
Reviewed-by: Matt Tuner <mattst88@gmail.com> (v3)
Reviewed-by: Paul Berry <stereotype441@gmail.com> (v4)
Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-01-24 21:24:10 -08:00
Paul Berry
1745a4d751 glsl: Add a lowering pass for packing varyings.
This lowering pass generates GLSL code that manually packs varyings
into vec4 slots, for the benefit of back-ends that don't support
packed varyings natively.

No functional change--the lowering pass is not yet used.

Reviewed-by: Eric Anholt <eric@anholt.net>

v2: Don't use ir_hierarchical_visitor--just loop over instructions
directly.  Also, make the names of the packed varyings include the
names of the original varyings that were packed into them.
2012-12-14 10:49:21 -08:00
Paul Berry
18392443d4 glsl/lower_clip_distance: Update symbol table.
This patch modifies the clip distance lowering pass so that the new
symbol it generates (glClipDistanceMESA) is added to the shader's
symbol table.

This will allow a later patch to modify the linker so that it finds
transform feedback varyings using the symbol table rather than having
to iterate through all the declarations in the shader.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2012-12-14 10:48:28 -08:00
Eric Anholt
a75f2681d2 glsl: Add a lowering pass to turn complicated UBO references to vector loads.
v2: Reduce the impenetrable code in emit_ubo_loads() by 23 lines by keeping
    the ir_variable as the variable part of the offset from handle_rvalue(),
    and track the constant offsets from that with a plain old integer value,
    avoiding a bunch of temporary variables in the array and struct handling.
    Also, fix file description doxygen.
v3: Fix a row vs col typo, and fix spelling in a comment.

Reviewed-by: Eric Anholt <eric@anholt.net>
2012-08-07 13:54:47 -07:00
José Fonseca
e88f9b9546 glsl: Fix lower_discard_flow prototype mismatch.
Should fix MSVC link failure.
2012-05-15 12:27:15 +01:00
Eric Anholt
3de1395fa5 glsl: Implement the GLSL 1.30+ discard control flow rule in GLSL IR.
Previously, I tried implementing this in the i965 driver, but did so
in a way that violated the intent of the spec, and broke Tropics.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2012-05-14 17:03:51 -07:00
Eric Anholt
60177d5e2a glsl: Add an array splitting pass.
I've had this code laying around almost done for a long time.  The
idea is like opt_structure_splitting, that we've got a bunch of
transforms at the GLSL IR level that only understand scalars and
vectors, which just skip complicated dereferences.  While driver
backends may manage some optimization after they split matrices up
themselves, it would be better to bring all of our optimization to
bear on the problem.

While I wasn't expecting changes quite yet, a few programs end up
winning: a gstreamer convolution shader, and the Humus dynamic
branching demo:
Total instructions: 269430 -> 269342
3/2148 programs affected (0.1%)
1498 -> 1410 instructions in affected programs (5.9% reduction)
2012-04-11 18:08:21 -07:00
Vincent Lejeune
6d4b35c036 glsl: Add a lowering pass to remove reads of shader output variables.
This is similar to Gallium's existing glsl_to_tgsi::remove_output_read
lowering pass, but done entirely inside the GLSL compiler.

Signed-off-by: Vincent Lejeune <vljn@ovi.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2012-01-06 13:36:44 +00:00
Ian Romanick
1d5d67f8ad glsl: Add uniform_locations_assigned parameter to do_dead_code opt pass
Setting this flag prevents declarations of uniforms from being removed
from the IR.  Since the IR is directly used by several API functions
that query uniforms in shaders, uniform declarations cannot be removed
after the locations have been set.  However, it should still be safe
to reorder the declarations (this is not tested).

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41980
Tested-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Bryan Cain <bryancain3@gmail.com>
Cc: Vinson Lee <vlee@vmware.com>
Cc: José Fonseca <jfonseca@vmware.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
2011-10-25 17:51:43 -07:00
Paul Berry
c06e325967 glsl: Implement a lowering pass for gl_ClipDistance.
In i965 GEN6+ (and I suspect most other hardware), gl_ClipDistance
needs to be laid out as a pair of vec4's (the first containing clip
distances 0-3, and the second containing clip distances 4-7).
However, it is declared in GLSL as an array of 8 floats.

This lowering pass acts at the GLSL level, modifying the declaration
of gl_ClipDistance so that it is an array of vec4's rather than an
array of floats, and renaming it to gl_ClipDistanceMESA.  In addition,
it modifies all accesses to the array so that they access the
appropiate component of one of the vec4's.

Since some hardware may not internally represent gl_ClipDistance as a
pair of vec4's, this lowering pass is optional.  To enable it, set the
LowerClipDistance flag in gl_shader_compiler_options to true.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2011-09-23 13:28:43 -07:00
Bryan Cain
478034f34a glsl: Use a separate div_to_mul_rcp lowering flag for integers.
Using multiply and reciprocal for integer division involves potentially
lossy floating point conversions.  This is okay for older GPUs that
represent integers as floating point, but undesirable for GPUs with
native integer division instructions.

TGSI, for example, has UDIV/IDIV instructions for integer division,
so it makes sense to handle this directly.  Likewise for i965.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Signed-off-by: Bryan Cain <bryancain3@gmail.com>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2011-08-31 12:02:18 -07:00
Ian Romanick
90cc372400 glsl: Factor out code that generates block of index comparisons
Reviewed-by: Eric Anholt <eric@anholt.net>
2011-07-23 01:24:18 -07:00
Paul Berry
5fb79fc69f glsl: Remove unused function prototypes.
No functional change.  Remove prototypes for do_mod_to_fract() and
do_sub_to_add_neg(), which haven't existed since November 2010.
2011-07-08 09:59:29 -07:00
Eric Anholt
e31266ed3e glsl: Add a new opt_copy_propagation variant that does it channel-wise.
This patch cleans up many of the extra copies in GLSL IR introduced by
i965's scalarizing passes.  It doesn't result in a statistically
significant performance difference on nexuiz high settings (n=3) or my
demo (n=10), due to brw_fs.cpp's register coalescing covering most of
those extra moves anyway.  However, it does make the debug of wine's
GLSL shaders much more tractable, and reduces instruction count of
glsl-fs-convolution-2 from 376 to 288.
2011-02-04 12:18:38 -06:00
Kenneth Graunke
9ac6a9b2fa glsl: Support if-flattening beyond a given maximum nesting depth.
This adds a new optional max_depth parameter (defaulting to 0) to
lower_if_to_cond_assign, and makes the pass only flatten if-statements
nested deeper than that.

By default, all if-statements will be flattened, just like before.

This patch also renames do_if_to_cond_assign to lower_if_to_cond_assign,
to match the new naming conventions.
2010-12-27 00:59:31 -08:00
Ian Romanick
c4285be9a5 glsl: Lower ir_binop_pow to a sequence of EXP2 and LOG2 2010-12-01 12:01:13 -08:00
Kenneth Graunke
940df10100 glsl: Add a lowering pass to move discards out of if-statements.
This should allow lower_if_to_cond_assign to work in the presence of
discards, fixing bug #31690 and likely #31983.

NOTE: This is a candidate for the 7.9 branch.
2010-12-01 11:52:43 -08:00
Kenneth Graunke
9a1d063c6d glsl: Add an optimization pass to simplify discards.
NOTE: This is a candidate for the 7.9 branch.
2010-12-01 11:52:43 -08:00
Kenneth Graunke
63684a9ae7 glsl: Combine many instruction lowering passes into one.
This should save on the overhead of tree-walking and provide a
convenient place to add more instruction lowering in the future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-11-19 15:56:28 -08:00
Ian Romanick
11d6f1c698 glsl: Add ir_quadop_vector expression
The vector operator collects 2, 3, or 4 scalar components into a
vector.  Doing this has several advantages.  First, it will make
ud-chain tracking for components of vectors much easier.  Second, a
later optimization pass could collect scalars into vectors to allow
generation of SWZ instructions (or similar as operands to other
instructions on R200 and i915).  It also enables an easy way to
generate IR for SWZ instructions in the ARB_vertex_program assembler.
2010-11-19 15:00:26 -08:00
Eric Anholt
aae338104f glsl: Add a lowering pass for texture projection. 2010-09-30 20:23:36 -07:00
Ian Romanick
a6ecd1c372 glsl2: Add flags to enable variable index lowering 2010-09-17 11:00:24 +02:00
Luca Barbieri
a47539c7a1 glsl: add pass to lower variable array indexing to conditional assignments
Currenly GLSL happily generates indirect addressing of any kind of
arrays.

Unfortunately DirectX 9 GPUs are not guaranteed to support any of them in
general.

This pass fixes that by lowering such constructs to a binary search on the
values, followed at the end by vectorized generation of equality masks, and
4 conditional assignments for each mask generation.

Note that this requires the ir_binop_equal change so that we can emit SEQ
to generate the boolean masks.

Unfortunately, ir_structure_splitting is too dumb to turn the resulting
constant array references to individual variables, so this will need to
be added too before this pass can actually be effective for temps.

Several patches in the glsl2-lower-variable-indexing were squashed
into this commit.  These patches fix bugs in Luca's original
implementation, and the individual patches can be seen in that branch.
This was done to aid bisecting in the future.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-09-17 10:58:58 +02:00
Ian Romanick
8f2214f489 glsl2: Add pass to remove redundant jumps 2010-09-13 14:25:26 -07:00
Luca Barbieri
3361cbac2a glsl: add continue/break/return unification/elimination pass (v2)
Changes in v2:
- Base class renamed to ir_control_flow_visitor
- Tried to comply with coding style

This is a new pass that supersedes ir_if_return and "lowers" jumps
to if/else structures.

Currently it causes no regressions on softpipe and nv40, but I'm not sure
whether the piglit glsl tests are thorough enough, so consider this
experimental.

It can be asked to:
1. Pull jumps out of ifs where possible
2. Remove all "continue"s, replacing them with an "execute flag"
3. Replace all "break" with a single conditional one at the end of the loop
4. Replace all "return"s with a single return at the end of the function,
   for the main function and/or other functions

This gives several great benefits:
1. All functions can be inlined after this pass
2. nv40 and other pre-DX10 chips without "continue" can be supported
3. nv30 and other pre-DX10 chips with no control flow at all are better supported

Note that for full effect we should also teach the unroller to unroll
loops with a fixed maximum number of iterations but with the canonical
conditional "break" that this pass will insert if asked to.

Continues are lowered by adding a per-loop "execute flag", initialized to
TRUE, that when cleared inhibits all execution until the end of the loop.

Breaks are lowered to continues, plus setting a "break flag" that is checked
at the end of the loop, and trigger the unique "break".

Returns are lowered to breaks/continues, plus adding a "return flag" that
causes loops to break again out of their enclosing loops until all the
loops are exited: then the "execute flag" logic will ignore everything
until the end of the function.

Note that "continue" and "return" can also be implemented by adding
a dummy loop and using break.
However, this is bad for hardware with limited nesting depth, and
prevents further optimization, and thus is not currently performed.
2010-09-13 13:03:09 -07:00
Ian Romanick
547131ac87 glsl2: Add lowering pass to remove noise opcodes 2010-09-09 15:39:51 -07:00
Luca Barbieri
e591c4625c glsl: add several EmitNo* options, and MaxUnrollIterations
This increases the chance that GLSL programs will actually work.

Note that continues and returns are not yet lowered, so linking
will just fail if not supported.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
2010-09-08 20:36:37 -07:00
Eric Anholt
8f8cdbfba4 glsl2: Add a pass to strip out noop swizzles.
With the glsl2-965 branch, the optimization of glsl-algebraic-rcp-rcp
regressed due to noop swizzles hiding information from ir_algebraic.
This cleans up those noop swizzles for us.
2010-08-13 17:54:47 -07:00
Eric Anholt
2f4fe15168 glsl2: Move the common optimization passes to a helper function.
These are passes that we expect all codegen to be happy with.  The
other lowering passes for Mesa IR are moved to the Mesa IR generator.
2010-08-13 17:47:00 -07:00
Eric Anholt
5854d4583c glsl2: Add a pass to transform ir_binop_sub to add(op0, neg(op1))
All the current HW backends transform subtract to adding the negation,
so I haven't bothered peepholing it back out in Mesa IR.  This allows
some subtract of subtract to get removed in ir_algebraic.
2010-08-09 21:41:14 -07:00
Eric Anholt
8bebbeb7c5 glsl2: Add constant propagation.
Whereas constant folding evaluates constant expressions at rvalue
nodes, constant propagation tracks constant components of vectors
across execution to replace (possibly swizzled) variable dereferences
with constant values, triggering possible constant folding or reduced
variable liveness.
2010-08-09 19:21:18 -07:00
Eric Anholt
bc4034b243 glsl2: Add a pass to convert exp and log to exp2 and log2.
Fixes ir_to_mesa handling of unop_log, which used the weird ARB_vp LOG
opcode that doesn't do what we want.  This also lets the multiplication
coefficients in there get constant-folded, possibly.

Fixes:
glsl-fs-log
2010-08-05 15:34:00 -07:00
Eric Anholt
7f7eaf0285 ir_structure_splitting: New pass to chop structures into their components.
This doesn't do anything if your structure goes through an uninlined
function call or if whole-structure assignment occurs.  As such, the
impact is limited, at least until we do some global copy propagation
to reduce whole-structure assignment.
2010-08-05 12:56:03 -07:00
Eric Anholt
2e853ca23c glsl2: Add a pass for removing unused functions.
For a shader involving many small functions, this avoids running
optimization across all of them after they've been inlined
post-linking.

Reduces the runtime of linking and running a fragment shader from Yo
Frankie from 1.6 seconds to 0.9 seconds (-44.9%, +/- 3.3%).
2010-08-05 10:18:31 -07:00
Eric Anholt
784695442c glsl2: Add new tree grafting optimization pass. 2010-07-31 15:52:21 -07:00
Eric Anholt
66d4c65ee2 glsl2: Make the dead code handler make its own talloc context.
This way, we don't need to pass in a parse state, and the context
doesn't grow with the number of passes through optimization.
2010-07-27 11:46:05 -07:00
Eric Anholt
832aad989e glsl2: Add optimization pass for algebraic simplifications.
This cleans up the assembly output of almost all the non-logic tests
glsl-algebraic-*.  glsl-algebraic-pow-two needs love (basically,
flattening to a temporary and squaring it).
2010-07-27 09:43:52 -07:00
Eric Anholt
29ce44ad2b glsl2: Add a pass for converting if statements to conditional assignment.
This will be used on 915 and similar hardware of that generation.
2010-07-19 10:21:38 -07:00
Eric Anholt
6d8a0a0aad glsl2: Add a new pass at the IR level to break down matrix ops to vector ops.
This will be used by the Mesa IR and likely most HW backends, as it
allows other optimizations to occur that might not otherwise.

Fixes glsl-vs-mat-sub-1, glsl-vs-mat-div-1.
2010-07-12 13:26:46 -07:00
Eric Anholt
d674ebcee0 glsl2: Add a pass to simplify if statements returning from both sides.
This allows function inlining making the following tests work even
without function calls implemented:
glsl-fs-functions-2
glsl-fs-functions-3
glsl-vs-functions
glsl-vs-functions-2
glsl-vs-functions-3
glsl-vs-vec4-indexing-5

(Note that those tests were designed to trigger actual function calls,
and this defeats them.  However, those testcases ended up catching the
bug in the previous commit.)
2010-07-07 09:10:48 -07:00
Eric Anholt
a36334be02 glsl2: Add pass for supporting variable vector indexing in rvalues.
The Mesa IR needs this to support vector indexing correctly, and
hardware backends such as 915 would want this behavior as well.

Fixes glsl-vs-vec4-indexing-2.
2010-07-06 18:49:24 -07:00
Eric Anholt
9a0e421983 glsl2: Add a pass to break ir_binop_div to _mul and _rcp.
This results in constant folding of a constant divisor.
2010-07-02 11:27:06 -07:00
Eric Anholt
8a1f186cc5 glsl2: Add a pass to convert mod(a, b) to b * fract(a/b).
This is used by the Mesa IR backend to implement mod, fixing glsl-fs-mod.
2010-07-01 11:07:23 -07:00