Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.
By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.
Compare to f179f419d1 on the FS side.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
This prevents unnecessary (and wrong) register allocation in the
scheduler for preloaded values in fixed registers.
Fixes interpolation-mixed.shader_test on rv770
(and probably on all other pre-evergreen chips).
Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
I noticed this in a shader in Unigine Heaven that was spilling. While it
doesn't really reduce register pressure, it shaves a few instructions
anyway (7955 -> 7882).
v2: Fix turning "0 >> x" into "x" instead of "0" (caught by Erik
Faye-Lund).
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While ir_builder is slightly less efficient, we're only increasing the
work when there's actual optimization being done, and it's way more
readable code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Matt and I had each screwed up these common required patterns recently, in
ways that wouldn't have been noticed for a long time if not for code
review. Just enforce it in the caller so that we don't rely on code
review catching these bugs.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There is nothing in the OpenGL specification which prevents the user from
calling glGenQueries to generate a new query object while another object is
active. Neither is there anything in the Mesa implementation which prevents
this. So remove the INVALID_OPERATION errors in this case.
Similarly, it is explicitly allowed by the OpenGL specification to delete an
active query, so remove the assertion for that case, replacing it with the
necesssary state updates to end the query, (clear the bindpt pointer and call
into the driver's EndQuery hook).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
The normal drawing path does this, and it's necessary on Ivybridge,
so let's try it on Sandybridge too. It's not explicitly documented
as necessary, but might help with hangs.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."
We normally do this, but BLORP was failing to do so in the case where it
disables depth.
Not observed to fix anything yet.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
For some reason, we put the flush in the caller, rather than just before
emitting the packet. This is more than a cosmetic problem: BLORP calls
gen6_emit_3dstate_multisample() directly, and so it missed the flush.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
From the comments above intel_emit_post_sync_nonzero_flush:
"[DevSNB-C+{W/A}] Before any depth stall flush (including those
produced by non-pipelined state commands), software needs to first
send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."
This suggests that every non-pipelined (0x79xx) command needs a
post-sync non-zero flush before it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
Otherwise the gen6 w/a in the kernel won't kick in and the write will
land nowhere.
Inspired by a patch Ken pointed me at which had the same issue (but
isn't yet merged and also for a gen7+ feature). An audit of the entire
driver didn't reveal any other case than the one in in the write_reg
helper used by the gen6 queryobj code.
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
The main purpose of this patch is to increase readability of
the array code by introducing is_unsized_array() to glsl_types.
Some redundent is_array() checks are also removed, and small number
of other related clean ups.
The introduction of is_unsized_array() should also make the
ARB_arrays_of_arrays code simpler and more readable when it arrives.
V2: Also replace code that checks for unsized arrays directly with the
length variable
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
v3 (Paul Berry <stereotype441@gmail.com>): clean up formatting.
Separate whitespace cleanups to their own patch.
Reviewed-by: Paul Berry <stereotype441@gmail.com>
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output. When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.
Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.
This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID. In the case where the FS input really is
gl_PrimitiveID, this produces the intended result. In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.
Fixes piglit test primitive-id-no-gs.
v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID. This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.
Reviewed-by: Eric Anholt <eric@anholt.net>
v2: Actually implement interop between the gallium
state tracker and the VDPAU backend.
v3: Make it also available in non legacy contexts,
fix video buffer sharing.
v4: deny interop if we don't have the same screen object
v5: rebased on upstream changes
v6: implemented VDPAUGetSurfaceivNV, improved error handling,
unregister all surfaces in VDPAUFiniNV
v7: squash merge with Mareks changes
Signed-off-by: Christian König <christian.koenig@amd.com>
ir_txf expects an ivec* coordinate, and may be larger than ivec2;
shuffle things around so that this will work.
V2: Fix style nits, use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
It turns out that nonzero offsets with gsampler2DRect don't work -- they
just return garbage. Work around this by folding the offset into the
coord.
Done as an IR pass rather than yet another hack in the visitors because
it's clear what's going on this way. Can possibly reuse this to replace
the existing txf coord+offset hacks.
V2: Use ir_builder
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We don't have a message that does 4 independent offsets; a lowering
pass needs to lower it to 4 normal gather4s before reaching this
point.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Note that gather4_po_c's parameters are too long for SIMD16. It might be
worth emitting 2xSIMD8 messages in this case at some point.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gather4_c's argument layout is straightforward -- refz just goes on the
end.
gather4_po_c's layout however -- the array index is replaced with refz.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
ARB_gpu_shader5's textureGather*() functions which take shadow samplers
have a separate `refz` parameter rather than adding it to the
coordinate.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
V3: fixup crazy check for whether we need to emit the coordinate after
custom handling.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Some texturing ops are about to have nonconstant offset support; the
offset in the header in these cases should be zero.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Prior to ARB_gpu_shader5 / GLSL 4.0, the offset is required to be
a constant expression.
With that extension, it is relaxed to be an arbitrary expression.
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Since 062317d667 (i965: Go back to using the kernel SOL reset feature.)
we've been flushing the batch on BeginTransformFeedback(). So it's not
necessary to do it on EndTransformFeedback(). A PIPE_CONTROL will work.
This makes gen7_end_transform_feedback() exactly the same as the gen6
variant. However, they'll diverge again shortly.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>