third_party_mesa3d

Author	SHA1	Message	Date
Anuj Phogat	a92e5f7cf6	i965: Use sample barycentric coordinates with per sample shading Current implementation of arb_sample_shading doesn't set 'Barycentric Interpolation Mode' correctly. We use pixel barycentric coordinates for per sample shading. Instead we should select perspective sample or non-perspective sample barycentric coordinates. It also enables using sample barycentric coordinates in case of a fragment shader variable declared with 'sample' qualifier. e.g. sample in vec4 pos; A piglit test to verify the implementation has been posted on piglit mailing list for review. V2: Do not interpolate all the 'in' variables at sample position if fragment shader uses 'sample' qualifier with one of them. For example we have a fragment shader: #version 330 #extension ARB_gpu_shader5: require sample in vec4 a; in vec4 b; main() { ... } Only 'a' should be sampled at sample location, not 'b'. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-01-21 14:42:27 -08:00
Anuj Phogat	3313cc269b	i965: Add an option to ignore sample qualifier This will be useful in my next patch which depends on a functionality of _mesa_get_min_invocations_per_fragment() to ignore the sample qualifier (prog->IsSample) based on a flag passed to it. Cc: mesa-stable@lists.freedesktop.org Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com> Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>	2014-01-21 14:42:27 -08:00
Matt Turner	78d65476b6	mesa/x86: Remove dead read_rgba_span_x86.h. Dead since `304f7a13`.	2014-01-21 14:20:44 -08:00
Matt Turner	bf0773aeca	i965/fs: Optimize LRP with x == y into a MOV. total instructions in shared programs: 1487331 -> 1485988 (-0.09%) instructions in affected programs: 45638 -> 44295 (-2.94%) GAINED: 7 LOST: 0 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:20:44 -08:00
Jordan Justen	8d37e9915a	glsl: Optimize open-coded lrp into lrp. total instructions in shared programs: 1498191 -> 1487051 (-0.74%) instructions in affected programs: 669388 -> 658248 (-1.66%) GAINED: 1 LOST: 0 Reviewed-by: Matt Turner <mattst88@gmail.com> Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:20:44 -08:00
Matt Turner	13100ac142	i965: Enable AOS optimizations for the geometry shader. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 14:20:44 -08:00
Matt Turner	4bd6e0d7c6	glsl: Vectorize multiple scalar assignments Reduces vertex shader instruction counts in DOTA2 by 6.42%, L4D2 by 4.61%, and CS:GO by 5.71%. total instructions in shared programs: 1500153 -> 1498191 (-0.13%) instructions in affected programs: 59919 -> 57957 (-3.27%) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 14:20:44 -08:00
Matt Turner	5e82d8a9da	glsl: Add parameter to .equals() to ignore an IR type. Only implemented for ir_swizzles currently, but perhaps will be useful for other IR types in the future. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 14:20:44 -08:00
Matt Turner	ebf91993c1	mesa: rename PreferDP4 to OptimizeForAOS. This flag was really just a proxy for determining whether the backend was vector (AOS) or scalar (SOA). It will be used to apply a future optimization only for vector backends. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 14:20:44 -08:00
Matt Turner	413622fbef	i965/fs: Print the maximum register pressure. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:20:44 -08:00
Kenneth Graunke	391eaa59bd	i965/fs: Show register pressure in dump_instructions() output. Dumping the number of live registers at each IP allows us to see register pressure and identify any local maxima. This should aid in debugging passes designed to reduce register pressure, as well as optimizations that suddenly trigger spilling. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:20:44 -08:00
Kenneth Graunke	3b74f4b233	i965: Compute the number of live registers at each IP. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-21 14:20:44 -08:00
Matt Turner	0ea600ef1a	i965/fs: Call opt_peephole_sel later in the optimization loop. Calling it after value numbering (added in the next commit) prevents some instruction count regressions. total instructions in shared programs: 1524387 -> 1523905 (-0.03%) instructions in affected programs: 13112 -> 12630 (-3.68%) GAINED: 0 LOST: 3 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	ede6c341f6	i965/fs: Calculate interference better in register_coalesce. Previously we simply considered two registers whose live ranges overlapped to interfere. Cases such as set A ------ ... \| mov B, A -- \| ... \| B \| A use B -- \| ... \| use A ------ would be considered to interfere, even though B is an unmodified copy of A whose live range fit wholly inside that of A. If no writes to A or B occur between the mov B, A and the use of B then we can safely coalesce them. Instead of removing MOV instructions, we make them NOPs and remove them at once after the main pass is finished in order to avoid recomputing live intervals (which are needed to perform the previous step). total instructions in shared programs: 1543768 -> 1513077 (-1.99%) instructions in affected programs: 951563 -> 920872 (-3.23%) GAINED: 46 LOST: 22 Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	4a7d0c550e	i965/fs: Support coalescing registers of size > 1. total instructions in shared programs: 1550048 -> 1549880 (-0.01%) instructions in affected programs: 1896 -> 1728 (-8.86%) Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	78fa6172e1	i965/fs: Assert that var < num_vars. Helped to track down a problem in a version of the next commit. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	9bb4d71fd2	i965/fs: Add a comment explaining how register coalescing works. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	2dfb067139	i965/fs: Add and use MAX_SAMPLER_MESSAGE_SIZE definition. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	81d52419cf	mesa: Add STRINGIFY macro. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	80b949f16b	i965/fs: Fix the example about overwriting uniforms in SIMD16. mov takes only a single source argument. Example instruction inexplicably changed from add to mov in commit `f10f5e49`. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Matt Turner	71bc11a375	i965: Print reg_offset for vgrf of size > 1 in dump_instruction(). Previously we wouldn't print the +0 for the first part of a VGRF of size greater than 1. Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>	2014-01-21 14:09:33 -08:00
Grigori Goronzy	955c93dc08	glsl: Match unnamed record types across stages. Unnamed record types are assigned to separate types per stage, e.g. if uniform struct { ... } a; is defined in both vertex and fragment shader, two separate types will result with different names. When linking the shader, this results in a type conflict. However, there is no reason why this should not be allowed according to GLSL specifications. Compare and match record types when linking shader stages to avoid this conflict. Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-21 14:01:09 -08:00
Grigori Goronzy	41c9bf884f	glsl: Extract function for record comparisons. Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-21 14:01:09 -08:00
Brian Paul	6d8cf5181a	docs: remove some ancient README.* files None of this info is relevant anymore. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-21 10:53:51 -08:00
Brian Paul	b9f68d927e	svga: implement TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS Fixes several colorbuffer tests, including piglit "fbo-drawbuffers-none" for "gl_FragColor" and "glDrawPixels" cases. v2: rework patch to only avoid creating extra shader variants when TGSI_PROPERTY_FS_COLOR0_WRITES_ALL_CBUFS is not specified. Per Jose. Use a write_color0_to_n_cbufs key field to replicate color0 to N color buffers only when N > 0 and WRITES_ALL_CBUFS is set. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2014-01-21 10:53:51 -08:00
Brian Paul	384fd64ab1	svga: rename color output variables Just to be a bit more readable. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2014-01-21 10:53:51 -08:00
Brian Paul	f6bc7d6586	svga: fix clearing for null color buffers Fixes piglit "fbo-drawbuffers-none glClear" test. Reviewed-by: José Fonseca <jfonseca@vmware.com>	2014-01-21 10:53:51 -08:00
Brian Paul	ff59b3d9ee	mesa: add missing TYPE_DOUBLEN_2 cases in get.c The new TYPE_DOUBLEN_2 type was added in `0e60d850` but the code to return values of that type wasn't completed. Fixes conform's default state test. glGetFloatv(GL_DEPTH_RANGE) wasn't returning anything. v2: remove stray 'break' statements. Reviewed-by: Jose Fonseca <jfonseca@vmware.com>	2014-01-21 10:53:12 -08:00
Paul Berry	51000c2ff8	i965: Modify some error messages to refer to "vec4" instead of "vs". These messages are in code that is shared between the VS and GS back-ends, so use the terminology "vec4" to avoid confusion. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-21 09:05:33 -08:00
Paul Berry	a4d68e9ee9	i965: Add GS support to INTEL_DEBUG=shader_time. Previously, time spent in geometry shaders would be counted as part of the vertex shader time. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-21 09:05:12 -08:00
Roland Scheidegger	e23e4f67be	draw: fix points with negative w coords for d3d style point clipping Even with depth clipping disabled, vertices which have negative w coords must be discarded. And since we don't have a proper guardband implementation yet (relying on driver to handle all values except infs/nans in rasterization for such points) we need to kill them off manually (as they can end up with coordinates inside viewport otherwise). v2: use 0.0f instead of 0 (spotted by Brian). Reviewed-by: Jose Fonseca <jfonseca@vmware.com> Reviewed-by: Brian Paul <brianp@vmware.com>	2014-01-21 17:49:02 +01:00
Kenneth Graunke	ad04e396fa	i965: Reserve space for "Vertex Count" in GS outputs. v2: Also increment ir->offset in the GS visitor, rather than at the final assembly generation stage (requested by Paul). Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Paul Berry <stereotype441@gmail.com>	2014-01-21 00:20:14 -08:00
Kenneth Graunke	94c0a11b19	i965: Update blitter code for 48-bit addresses. v2: Rebase on Eric's SET_FIELD changes. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> [v1]	2014-01-20 16:21:52 -08:00
Kenneth Graunke	23827756f3	i965: Update PIPE_CONTROL packet lengths for Broadwell. On Broadwell, PIPE_CONTROL needs an extra DWord to accommodate the 48-bit addressing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-20 15:38:24 -08:00
Kenneth Graunke	f7e76e00b6	i965: Re-combine the Gen4-5 and Gen6+ write_depth_count functions. Now that we have a helper function that handles the PIPE_CONTROL variations between the various platforms, these are basically the same. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-20 15:38:23 -08:00
Kenneth Graunke	f5dd608db2	i965: Create a helper function for emitting PIPE_CONTROL writes. There are a lot of places that use PIPE_CONTROL to write a value to a buffer (either an immediate write, TIMESTAMP, or PS_DEPTH_COUNT). Creating a single function to do this seems convenient. As part of this refactor, we now set the PPGTT/GTT selection bit correctly on Gen7+. Previously, we set bit 2 of DW2 on all platforms. This is correct for Sandybridge, but actually part of the address on Ivybridge and later! Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to adjust that in substantially fewer places, giving us confidence that we've hit them all. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-20 15:38:23 -08:00
Kenneth Graunke	35458a99c0	i965: Use full-length PIPE_CONTROL packets for workaround writes. I believe that PIPE_CONTROL uses the length field to decide whether to do 32-bit or 64-bit writes. A length of 4 would do a 32-bit write, while a length of 5 would do a 64-bit write. (I haven't verified this, though.) For workaround writes, we don't care what value gets written, or how much data. We're only writing something because hardware bugs mandate that we do so. So using a 64-bit write should be fine. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-20 15:38:23 -08:00
Kenneth Graunke	4b9e5c985c	i965: Emit full-length PIPE_CONTROLs for (non-write) flushes. The PIPE_CONTROL packet actually has 5 DWords on Gen6+: 1. Header 2. Flags 3. Address 4. Immediate Data: Lower DWord 5. Immediate Data: Upper DWord We just never emitted the last one. While it appears to work, it's probably safer to emit the entire thing. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Matt Turner <mattst88@gmail.com>	2014-01-20 15:38:23 -08:00
Kenneth Graunke	9420b577dd	i965: Create a helper function for emitting PIPE_CONTROL flushes. These days, we need to emit PIPE_CONTROL flushes all over the place. Being able to do that via a single function call seems convenient. Broadwell will also increase the length of these packets by 1; with the refactoring, we should have to do this in substantially fewer places. v2: Add back forgotten intel_emit_post_sync_nonzero_flush (caught by Eric Anholt). Drop unlikely() from BLT_RING check. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Eric Anholt <eric@anholt.net>	2014-01-20 15:38:16 -08:00
Kenneth Graunke	ded5674689	i965: Fix MI_STORE_REGISTER_MEM for Broadwell. It now takes a 48-bit address. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net>	2014-01-20 15:12:23 -08:00
Kenneth Graunke	f11c1feaf7	i965: Introduce an OUT_RELOC64 macro. Broadwell uses 48-bit addresses. The first DWord is the low 32 bits, and the second DWord is the high 16 bits. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-20 15:12:23 -08:00
Kenneth Graunke	67ebcb4711	i965: Use the new drm_intel_bo offset64 field. libdrm 2.4.52 introduces a new 'uint64_t offset64' field, intended to replace the old 'unsigned long offset' field. To preserve ABI, libdrm continues to store the presumed offset in both locations. On Broadwell, a 64-bit kernel may place BOs at "high" (> 4G) addresses. However, with a 32-bit userspace, the 'unsigned long offset' field will only be 32-bit, which is not large enough to hold this value. We need to use a proper uint64_t (like the kernel does). Technically, a lot of this code doesn't affect Broadwell, so we could leave it using the old field. But it makes sense to just switch to the new, properly typed field. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-20 15:12:23 -08:00
Kenneth Graunke	77425ef91a	build: Require libdrm 2.4.52 for Intel. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-20 15:12:23 -08:00
Kenneth Graunke	5f4eed3575	i965: Delete intel_batchbuffer_emit_reloc_fenced. Nothing in i965 uses it. Signed-off-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>	2014-01-20 15:12:12 -08:00
Ian Romanick	4cd8011907	i915: Silence warning: unused parameter warning in intel_bufferobj_buffer intel_buffer_objects.c: In function 'old_intel_bufferobj_buffer': intel_buffer_objects.c:471:17: warning: unused parameter 'flag' [-Wunused-parameter] The parameter hasn't been used since the i915 and i965 drivers had their breakup. i965 got the flags, and i915 got to cry itself to sleep. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-20 11:40:46 -08:00
Ian Romanick	8468f437e8	i915: Ensure that intel_bufferobj_map_range meets alignment guarantees Not actually tested, but the changes are identical to the i965 changes that are tested. v2: Remove MAX2(64, ...). Suggested by Ken (in the i965 version of this patch). Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Cc: Siavash Eliasi <siavashserver@gmail.com>	2014-01-20 11:40:41 -08:00
Ian Romanick	1ec663ab19	i965: Ensure that intel_bufferobj_map_range meets alignment guarantees No piglit regressions on IVB. With minor tweaks to the arb_map_buffer_alignment-map-invalidate-range test (disable the extension check, set alignment to 64 instead of querying), the i965 driver would fail the test without this patch (as predicted by Eric). With this patch, it passes. v2: Remove MAX2(64, ...). Suggested by Ken. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Eric Anholt <eric@anholt.net> Cc: Siavash Eliasi <siavashserver@gmail.com>	2014-01-20 11:40:34 -08:00
Ian Romanick	c2352a88ed	docs: Note that GL_ARB_viewport_array is done on i965 At least for GEN7+, anyway. Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-20 11:32:05 -08:00
Courtney Goeltzenleuchter	7837f425e7	i965: Enable ARB_viewport_array v2 (idr): Only enable the extension on GEN7+ w/core profile because it requires geometry shaders. v3 (idr): Add some casting to fix setting of ViewportBounds.Min. Negating an unsigned value, then casting to float doesn't do what you might think it does. Signed-off-by: Courtney Goeltzenleuchter <courtney@LunarG.com> Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-20 11:32:05 -08:00
Ian Romanick	d3ee8ba346	i965: Consider all viewports before enabling guardband clipping Signed-off-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2014-01-20 11:32:05 -08:00
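
Commit f11c1feaf7 above describes the Broadwell address layout: a 48-bit address is emitted as two DWords, the low 32 bits first and the high 16 bits second. The following is a minimal sketch of that split only, not the actual OUT_RELOC64 macro. The names `addr48_dwords` and `split_addr48` are hypothetical and do not come from the Mesa source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two DWords of a 48-bit address. */
struct addr48_dwords {
    uint32_t low;   /* address bits 0..31 */
    uint32_t high;  /* address bits 32..47; the upper 16 bits stay zero */
};

/* Split a 48-bit address into (low 32 bits, high 16 bits). */
static struct addr48_dwords split_addr48(uint64_t address)
{
    struct addr48_dwords out;

    /* Illustrative precondition: the address must fit in 48 bits. */
    assert((address >> 48) == 0);

    out.low = (uint32_t)(address & 0xffffffffu);
    out.high = (uint32_t)(address >> 32);
    return out;
}
```

For example, the address `0x123456789abc` would split into a low DWord of `0x56789abc` and a high DWord of `0x1234`. This also illustrates why commit 67ebcb4711 moves to a `uint64_t` offset: a 32-bit field can hold only the low DWord.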

61131 Commits
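
Commit 7837f425e7 in the log above remarks that negating an unsigned value and then casting it to float "doesn't do what you might think it does". A small sketch of that pitfall follows. The function names and the `max_size` parameter are hypothetical illustrations, not the code from the i965 driver.

```c
#include <stdint.h>

/* Convert to float first, then negate: yields the intended negative bound. */
static float negated_bound_correct(unsigned max_size)
{
    return -(float)max_size;
}

/*
 * Negate while the value is still unsigned: the result wraps modulo 2^N,
 * so the cast to float produces a large positive number instead.
 */
static float negated_bound_wrong(unsigned max_size)
{
    return (float)(-max_size);
}
```

With an input of `16384u`, the first function returns `-16384.0f`, while the second returns a large positive value because the unsigned negation wraps around before the conversion.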