59437 Commits

Author SHA1 Message Date
Brian Paul
ea9fe9ebdb svga: reindent drawing code 2013-10-29 08:09:34 -06:00
Eric Anholt
415d6dc5bd i965/vec4: Reduce working set size of live variables computation.
Orbital Explorer was generating a 4000 instruction geometry shader, which
was taking 275 trips through dead code elimination and register
coalescing, each of which updated live variables to get its work done, and
invalidated those live variables afterwards.

By using bitfields instead of bools (reducing the working set size by a
factor of 8) in live variables analysis, it drops from 88% of the profile
to 57%, and reduces overall runtime from I-got-bored-and-killed-it (Paul
says 3+ minutes) to 10.5 seconds.

Compare to f179f419d1 on the FS side.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-29 00:27:35 -07:00
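The bools-to-bitfields change described in 415d6dc5bd can be illustrated with a minimal sketch. It assumes one bit per variable packed into 32-bit words, and it uses the textbook liveness equation rather than the driver's actual code. All names here (bitset_word, bitset_set, bitset_test, update_livein) are hypothetical and not taken from Mesa.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; not taken from Mesa. */
typedef uint32_t bitset_word;
#define BITS_PER_WORD ((unsigned)(sizeof(bitset_word) * CHAR_BIT))
#define WORDS_FOR(n)  (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Mark variable i as a member of the set. */
static void bitset_set(bitset_word *set, unsigned i)
{
   set[i / BITS_PER_WORD] |= (bitset_word)1 << (i % BITS_PER_WORD);
}

/* Return true if variable i is a member of the set. */
static bool bitset_test(const bitset_word *set, unsigned i)
{
   return (set[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1u;
}

/* One step of the textbook liveness equation for a single block:
 *    livein = use | (liveout & ~def)
 * Each loop iteration handles 32 variables at once.
 * Returns true if livein changed, so a caller can iterate to a fixed point.
 */
static bool update_livein(bitset_word *livein, const bitset_word *liveout,
                          const bitset_word *use, const bitset_word *def,
                          unsigned num_vars)
{
   bool progress = false;

   assert(num_vars > 0);
   for (unsigned w = 0; w < WORDS_FOR(num_vars); w++) {
      bitset_word new_in = use[w] | (liveout[w] & ~def[w]);
      if (new_in != livein[w]) {
         livein[w] = new_in;
         progress = true;
      }
   }
   return progress;
}
```

A one-byte bool per variable needs eight times the memory of one bit per variable, which matches the factor of 8 quoted in the commit message.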
Vadim Girlin
8bd4476010 r600g/sb: fix value::is_fixed()
This prevents unnecessary (and wrong) register allocation in the
scheduler for preloaded values in fixed registers.

Fixes interpolation-mixed.shader_test on rv770
(and probably on all other pre-evergreen chips).

Signed-off-by: Vadim Girlin <vadimgirlin@gmail.com>
Tested-by: Andreas Boll <andreas.boll.dev@gmail.com>
2013-10-29 05:49:21 +04:00
Eric Anholt
08bf52712e glsl: Drop no-op shifts involving 0.
I noticed this in a shader in Unigine Heaven that was spilling.  While it
doesn't really reduce register pressure, it shaves a few instructions
anyway (7955 -> 7882).

v2: Fix turning "0 >> x" into "x" instead of "0" (caught by Erik
    Faye-Lund).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-10-28 14:07:31 -07:00
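The shift rules from 08bf52712e, including the v2 fix, can be sketched on a toy expression tree. The struct expr type and the simplify_shift function are hypothetical, and the sketch ignores the scalar/vector type handling that a real GLSL optimizer would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal expression node; not Mesa's ir_expression. */
enum expr_kind { EXPR_CONST, EXPR_VAR, EXPR_LSHIFT, EXPR_RSHIFT };

struct expr {
   enum expr_kind kind;
   int value;               /* meaningful for EXPR_CONST only */
   const struct expr *lhs;  /* shifts: the value being shifted */
   const struct expr *rhs;  /* shifts: the shift count */
};

static bool is_zero(const struct expr *e)
{
   return e->kind == EXPR_CONST && e->value == 0;
}

/* Return a simpler equivalent of e, or e itself if no rule applies. */
static const struct expr *simplify_shift(const struct expr *e)
{
   if (e->kind != EXPR_LSHIFT && e->kind != EXPR_RSHIFT)
      return e;

   assert(e->lhs != NULL && e->rhs != NULL);

   /* 0 << x and 0 >> x are 0. The result is the zero operand,
    * not the shift count (the mistake corrected in v2). */
   if (is_zero(e->lhs))
      return e->lhs;

   /* x << 0 and x >> 0 are x. */
   if (is_zero(e->rhs))
      return e->lhs;

   return e;
}
```

Both rules return the left operand, which makes the v2 distinction easy to see: the result is always the value being shifted, never the count.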
Eric Anholt
3a0fdf2ab6 glsl: Use ir_builder more in opt_algebraic.
While ir_builder is slightly less efficient, we're only increasing the
work when there's actual optimization being done, and it's way more
readable code.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-10-28 14:07:31 -07:00
Eric Anholt
27bcb5063f glsl: Move common code out of opt_algebraic's handle_expression().
Matt and I had each screwed up these common required patterns recently, in
ways that wouldn't have been noticed for a long time if not for code
review.  Just enforce it in the caller so that we don't rely on code
review catching these bugs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2013-10-28 14:07:31 -07:00
Carl Worth
29996e2199 Remove error when calling glGenQueries/glDeleteQueries while a query is active
There is nothing in the OpenGL specification which prevents the user from
calling glGenQueries to generate a new query object while another object is
active. Neither is there anything in the Mesa implementation which prevents
this. So remove the INVALID_OPERATION errors in this case.

Similarly, it is explicitly allowed by the OpenGL specification to delete an
active query, so remove the assertion for that case, replacing it with the
necessary state updates to end the query (clear the bindpt pointer and call
into the driver's EndQuery hook).

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Tested-by: Brian Paul <brianp@vmware.com>
2013-10-28 12:56:49 -07:00
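The state updates that 29996e2199 describes for deleting an active query can be sketched as follows. This is a simplified model under stated assumptions: a single bind point, and hypothetical names (struct query, struct context, delete_query, count_end_query) that stand in for the real Mesa structures and driver hook.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real context and query types. */
struct query {
   unsigned id;
   bool active;
};

struct context {
   struct query *current_query;  /* bind point of the active query, if any */
   int end_query_calls;          /* bookkeeping for this example only */
   void (*end_query)(struct context *ctx, struct query *q);  /* driver hook */
};

/* Example driver hook: only counts how often it was called. */
static void count_end_query(struct context *ctx, struct query *q)
{
   (void)q;
   ctx->end_query_calls++;
}

/* Delete a query. If it is still active, end it first instead of
 * treating the call as an error. */
static void delete_query(struct context *ctx, struct query *q)
{
   assert(ctx->end_query != NULL);

   if (q->active) {
      if (ctx->current_query == q)
         ctx->current_query = NULL;   /* clear the bind point */
      q->active = false;
      ctx->end_query(ctx, q);         /* let the driver end the query */
   }
   /* A real implementation would release the query object here. */
}
```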
Kenneth Graunke
5563dfabc8 i965: Also emit HiZ and Stencil packets when disabling depth on Gen6.
The normal drawing path does this, and it's necessary on Ivybridge,
so let's try it on Sandybridge too.  It's not explicitly documented
as necessary, but might help with hangs.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:36 -07:00
Kenneth Graunke
29e5d5db51 i965: Also emit HIER_DEPTH and STENCIL packets when disabling depth.
From the documentation:
"[DevIVB] 3DSTATE_DEPTH_BUFFER must always be programmed along with the
 other Depth/Stencil state commands(i.e. 3DSTATE_CLEAR_PARAMS,
 3DSTATE_STENCIL_BUFFER, or 3DSTATE_HIER_DEPTH_BUFFER)."

We normally do this, but BLORP was failing to do so in the case where it
disables depth.

Not observed to fix anything yet.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:33 -07:00
Kenneth Graunke
65b1f642ac i965: Move post-sync non-zero flush for 3DSTATE_MULTISAMPLE.
For some reason, we put the flush in the caller, rather than just before
emitting the packet.  This is more than a cosmetic problem: BLORP calls
gen6_emit_3dstate_multisample() directly, and so it missed the flush.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:32 -07:00
Kenneth Graunke
10a918e52c i965: Also guard 3DSTATE_DRAWING_RECTANGLE with a flush in blorp.
Non-pipelined commands need this flush.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:31 -07:00
Kenneth Graunke
3aef1fefb4 i965: Emit post-sync non-zero flush before 3DSTATE_DRAWING_RECTANGLE.
This is another non-pipelined command that needs a flush on Sandybridge.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:29 -07:00
Kenneth Graunke
436e815a25 i965: Emit post-sync non-zero flush before 3DSTATE_GS_SVB_INDEX.
From the comments above intel_emit_post_sync_nonzero_flush:
"[DevSNB-C+{W/A}] Before any depth stall flush (including those
 produced by non-pipelined state commands), software needs to first
 send a PIPE_CONTROL with no bits set except Post-Sync Operation != 0."

This suggests that every non-pipelined (0x79xx) command needs a
post-sync non-zero flush before it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:27 -07:00
Daniel Vetter
32a3f5f6d7 i965: CS writes/reads should use I915_GEM_INSTRUCTION
Otherwise the gen6 w/a in the kernel won't kick in and the write will
land nowhere.

Inspired by a patch Ken pointed me at which had the same issue (but
isn't yet merged and also for a gen7+ feature). An audit of the entire
driver didn't reveal any other case than the one in the write_reg
helper used by the gen6 queryobj code.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Tested-by: Xinkai Chen <yeled.nova@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Cc: "9.2" <mesa-stable@lists.freedesktop.org>
2013-10-28 11:29:15 -07:00
Anuj Phogat
f278d49c4b i965: Do not set bilinear_filter flag in case of multisample blits
Setting bilinear_filter flag in case of multisample blits with
GL_LINEAR filter causes incorrect behavior in translate_dst_to_src()
function. This broke Modern Warfare (1, 2 and 3) on SNB, IVB and HSW.

Tested on SNB and IVB, no Piglit regressions. Trace file of the game
(taken with apitrace) works fine with this patch.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69078
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reported-by: Armin K <krejzi@email.com>
Tested-by: Armin K <krejzi@email.com>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-28 09:33:01 -07:00
Rico Schüller
14f02cdee8 mesa: Remove trailing whitespace in texparam.c
Signed-off-by: Rico Schüller <kgbricola@web.de>
Signed-off-by: Brian Paul <brianp@vmware.com>
2013-10-28 08:43:40 -06:00
Brian Paul
0ce3bfbd40 mesa: use void in _mesa_VDPAUFiniNV() as in the header file 2013-10-28 08:37:39 -06:00
Timothy Arceri
b59c5926cb glsl: Add check for unsized arrays to glsl types
The main purpose of this patch is to increase readability of
the array code by introducing is_unsized_array() to glsl_types.
Some redundant is_array() checks are also removed, and a small number
of other related clean ups.

The introduction of is_unsized_array() should also make the
ARB_arrays_of_arrays code simpler and more readable when it arrives.

V2: Also replace code that checks for unsized arrays directly with the
length variable

Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>

v3 (Paul Berry <stereotype441@gmail.com>): clean up formatting.
Separate whitespace cleanups to their own patch.

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-28 06:06:04 -07:00
Timothy Arceri
5cd7eb9f07 glsl: whitespace cleanups.
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>

v2 (Paul Berry <stereotype441@gmail.com>): Separate from "glsl: Add
check for unsized arrays to glsl types".

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-28 06:06:04 -07:00
Timothy Arceri
e14abf566b glsl: Fix comment
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-28 06:05:51 -07:00
Christian König
925ffa8c4a vl/h264: split fields into SPS/PPS
Add a lot of missing fields as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-10-28 11:08:12 +01:00
Christian König
6f2410c9aa radeon/uvd: fix H264 chroma format handling
Signed-off-by: Christian König <christian.koenig@amd.com>
2013-10-28 11:06:37 +01:00
Christian König
cc49baeedc vl: add 400 chroma format as well
Signed-off-by: Christian König <christian.koenig@amd.com>
2013-10-28 11:06:18 +01:00
Chia-I Wu
d2fdc0d634 ilo: minor cleanups for recent interface changes
Kill ilo_bind_sampler_states2 and ilo_set_sampler_views2.  Map
PIPE_FORMAT_R10G10B10A2_UINT to BRW_SURFACEFORMAT_R10G10B10A2_UINT.
2013-10-28 11:40:41 +08:00
Timothy Arceri
d1d3b1e361 glsl: Move error message inside validation check reducing duplicate message handling
v2 (Paul Berry <stereotype441@gmail.com>): Fix precedence error in call
to _mesa_glsl_error().

Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-10-27 10:23:52 -07:00
Paul Berry
e79e6c5911 i965: Make fs gl_PrimitiveID input work even when there's no gs.
When a geometry shader is present, the fragment shader gl_PrimitiveID
input acts like an ordinary varying, receiving data from the gs
gl_PrimitiveID output.  When there's no geometry shader, we have to
ask the fixed function SF hardware to provide the primitive ID to the
fragment shader instead.

Previously, the SF setup code would handle this situation by
recognizing that the FS gl_PrimitiveID input didn't match to any VS
output; since normally an FS input with no corresponding VS output
leads to undefined data, the SF setup code used to just arbitrarily
assign it to receive data from attribute 0.

This patch changes the SF setup code so that instead of arbitrarily
using attribute 0, it assigns the unmatched FS input to receive
gl_PrimitiveID.  In the case where the FS input really is
gl_PrimitiveID, this produces the intended result.  In all other
cases, no harm is done since GL specifies that the behaviour is
undefined.

Fixes piglit test primitive-id-no-gs.

v2: If an attribute is already being overridden with point
coordinates, don't try to also override it with gl_PrimitiveID.  This
is necessary to avoid regressing piglit tests such as
shaders/glsl-fs-pointcoord.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-10-27 10:23:39 -07:00
Vinson Lee
7f76368305 mesa: Add GL_NV_vdpau_interop functions to dispatch_sanity.cpp.
Fixes 'make check' failures introduced with commit
80964226e9.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70900
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
2013-10-26 23:13:51 -07:00
Brian Paul
bc23944091 mesa: add vdpau.c and st_vdpau.c to src/mesa/SConscript
Fixes SCons build.
2013-10-26 07:24:17 -06:00
Christian König
80964226e9 implement NV_vdpau_interop v7
v2: Actually implement interop between the gallium
    state tracker and the VDPAU backend.

v3: Make it also available in non legacy contexts,
    fix video buffer sharing.

v4: deny interop if we don't have the same screen object

v5: rebased on upstream changes

v6: implemented VDPAUGetSurfaceivNV, improved error handling,
    unregister all surfaces in VDPAUFiniNV

v7: squash merge with Marek's changes

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-10-26 12:13:36 +02:00
Christian König
3d3a0b9b67 winsys/radeon: make radeon_drm_winsys_create public
Otherwise OpenGL/VDPAU interop won't work as expected.

Signed-off-by: Christian König <christian.koenig@amd.com>
2013-10-26 12:13:36 +02:00
Chris Forbes
598ca510b8 i965: Remove ir_txf coord+offset special case in visitors
Just let it be handled by the lowering pass.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:56:27 +13:00
Chris Forbes
06de9f8ff1 i965: Generalize coord+offset lowering pass for ir_txf
ir_txf expects an ivec* coordinate, and may be larger than ivec2;
shuffle things around so that this will work.

V2: Fix style nits, use ir_builder

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:56:25 +13:00
Chris Forbes
72b5e9c42a i965: Add lowering pass to fold offset into unnormalized coords
It turns out that nonzero offsets with gsampler2DRect don't work -- they
just return garbage. Work around this by folding the offset into the
coord.

Done as an IR pass rather than yet another hack in the visitors because
it's clear what's going on this way. Can possibly reuse this to replace
the existing txf coord+offset hacks.

V2: Use ir_builder

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:56:09 +13:00
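The workaround in 72b5e9c42a relies on rectangle-texture coordinates being unnormalized, so a texel offset can be added to the coordinate directly. A minimal sketch, assuming a hypothetical struct tex_op with a 2D float coordinate and an optional integer offset (not the driver's IR types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one texture operation on a rect sampler. */
struct tex_op {
   float coord[2];   /* unnormalized coordinate, in texels */
   int offset[2];    /* texel offset, valid only if has_offset is set */
   bool has_offset;
};

/* Fold the texel offset into the coordinate and drop the offset.
 * Returns true if the operation was changed. */
static bool fold_offset_into_coord(struct tex_op *op)
{
   assert(op != NULL);

   if (!op->has_offset)
      return false;

   for (int i = 0; i < 2; i++) {
      op->coord[i] += (float)op->offset[i];
      op->offset[i] = 0;
   }
   op->has_offset = false;
   return true;
}
```

With normalized coordinates the same fold would need a division by the texture size, which is why the sketch is limited to the unnormalized case named in the commit subject.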
Chris Forbes
a936000db6 i965: Add lowering pass for splitting textureGatherOffsets
Rewrites textureGatherOffsets(s, p, offsets) into

   gvec4(
      textureGatherOffset(s, p, offsets[0]).w,
      textureGatherOffset(s, p, offsets[1]).w,
      textureGatherOffset(s, p, offsets[2]).w,
      textureGatherOffset(s, p, offsets[3]).w
      )

V2: Use ir_builder to be slightly clearer.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:28:26 +13:00
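The rewrite in a936000db6 can be emulated on the CPU to show why taking the .w component of four single-offset gathers is sufficient. The sketch assumes a single-channel texture, clamp-to-edge addressing, and a caller that already knows the integer base texel (i0, j0) of the gather footprint. The types and function names are hypothetical.

```c
#include <assert.h>

struct vec4 { float x, y, z, w; };

/* Hypothetical single-channel texture, stored row by row. */
struct texture {
   int width, height;
   const float *texels;
};

static int clamp_int(int v, int lo, int hi)
{
   return v < lo ? lo : (v > hi ? hi : v);
}

/* Fetch one texel with clamp-to-edge addressing. */
static float fetch(const struct texture *t, int i, int j)
{
   assert(t->width > 0 && t->height > 0);
   i = clamp_int(i, 0, t->width - 1);
   j = clamp_int(j, 0, t->height - 1);
   return t->texels[j * t->width + i];
}

/* Model of textureGatherOffset: the 2x2 footprint is returned in the
 * order (i0,j1), (i1,j1), (i1,j0), (i0,j0), so .w is the texel at
 * (i0,j0) after the offset is applied. */
static struct vec4 gather_offset(const struct texture *t, int i0, int j0,
                                 const int offset[2])
{
   int i = i0 + offset[0];
   int j = j0 + offset[1];
   struct vec4 r = {
      fetch(t, i, j + 1), fetch(t, i + 1, j + 1),
      fetch(t, i + 1, j), fetch(t, i, j)
   };
   return r;
}

/* Model of the lowered textureGatherOffsets: four single-offset
 * gathers, keeping only the .w component of each. */
static struct vec4 gather_offsets(const struct texture *t, int i0, int j0,
                                  const int offsets[4][2])
{
   struct vec4 r;
   r.x = gather_offset(t, i0, j0, offsets[0]).w;
   r.y = gather_offset(t, i0, j0, offsets[1]).w;
   r.z = gather_offset(t, i0, j0, offsets[2]).w;
   r.w = gather_offset(t, i0, j0, offsets[3]).w;
   return r;
}
```

Each of the four gathers fetches three texels that are then discarded, which is the cost of expressing the operation with the existing single-offset gather.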
Chris Forbes
4c1eae5395 i965: Add asserts to ensure that ir_tg4 offset arrays are lowered
We don't have a message that does 4 independent offsets; a lowering
pass needs to lower it to 4 normal gather4s before reaching this
point.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:28:05 +13:00
Chris Forbes
de8948a0b6 glsl: add signatures for textureGatherOffsets()
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:28:03 +13:00
Chris Forbes
a9de744a26 glsl: add support for texture functions with offset arrays
This is needed for textureGatherOffsets()

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:27:37 +13:00
Chris Forbes
3c98d77460 i965/fs: Add support for shadow comparators with gather4
Note that gather4_po_c's parameters are too long for SIMD16. It might be
worth emitting 2xSIMD8 messages in this case at some point.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:16:32 +13:00
Chris Forbes
32f898a71c i965/vs: Add support for shadow comparators with gather4
gather4_c's argument layout is straightforward -- refz just goes on the
end.

gather4_po_c's layout differs, however: the array index is replaced with refz.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:16:28 +13:00
Chris Forbes
070c841111 i965: Add Gen7 gather4_c and gather4_po_c message types
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:16:27 +13:00
Chris Forbes
43e3ae112f glsl: Add new textureGather[Offset]() overloads for shadow samplers
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:16:24 +13:00
Chris Forbes
af1dfd99b7 glsl: Add support for separate reference Z for shadow samplers
ARB_gpu_shader5's textureGather*() functions which take shadow samplers
have a separate `refz` parameter rather than adding it to the
coordinate.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:16:19 +13:00
Chris Forbes
fb08769bb6 i965/vs: add support for gather4 with nonconstant offsets
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
2013-10-26 22:10:02 +13:00
Chris Forbes
938d909894 i965/fs: add support for gather4 with nonconstant offsets
V3: fixup crazy check for whether we need to emit the coordinate after
    custom handling.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-10-26 22:08:51 +13:00
Chris Forbes
bdcacaed9c i965: relax brw_texture_offset assert
Some texturing ops are about to have nonconstant offset support; the
offset in the header in these cases should be zero.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-10-26 21:54:15 +13:00
Chris Forbes
6bb2cf2107 i965: Add SHADER_OPCODE_TG4_OFFSET for gather with nonconstant offsets.
The generator code ends up clearer this way than if we had to sniff
via the message length. Implemented via the gather4_po message in
hardware, which is present in Gen7 and later.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-10-26 21:54:15 +13:00
Chris Forbes
cd8505bfb8 i965: add missing tg4 case in brw_instruction_name
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-10-26 21:54:15 +13:00
Chris Forbes
4fa123deac glsl: relax const offset requirement for textureGatherOffset
Prior to ARB_gpu_shader5 / GLSL 4.0, the offset is required to be
a constant expression.

With that extension, it is relaxed to be an arbitrary expression.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-10-26 21:54:15 +13:00
Chris Forbes
00235402a0 glsl: Add ARB_gpu_shader5 textureGatherOffset signatures
- gsampler2DRect
- optional `comp` parameter

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2013-10-26 21:54:15 +13:00
Kenneth Graunke
d07d38e696 i965: Weaken the flushing in gen7_end_transform_feedback().
Since 062317d667 (i965: Go back to using the kernel SOL reset feature.)
we've been flushing the batch on BeginTransformFeedback().  So it's not
necessary to do it on EndTransformFeedback().  A PIPE_CONTROL will work.

This makes gen7_end_transform_feedback() exactly the same as the gen6
variant.  However, they'll diverge again shortly.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-10-25 22:25:38 -07:00