55787 Commits

Author SHA1 Message Date
Chris Forbes
2a528889a3 docs: mark ARB_texture_storage_multisample done
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:42 +13:00
Chris Forbes
d25b4d5e90 i965: enable ARB_texture_storage_multisample on Gen6+
This can be enabled everywhere that ARB_texture_multisample is
supported -- ARB_texture_storage is supported on everything.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:40 +13:00
Chris Forbes
e0015c819c mesa: allow multisample texture targets in [Get]TexParameter*
ARB_texture_storage_multisample allows texture parameters to be
queried for TEXTURE_2D_MULTISAMPLE and TEXTURE_2D_MULTISAMPLE_ARRAY
targets.

Some parameters may also be set, with the following exceptions:

- TEXTURE_BASE_LEVEL may not be set to a nonzero value; generates
   INVALID_OPERATION

- any state which appears in the `per-sampler` state table may not
  be set; generates INVALID_OPERATION

V2: Don't introduce bogus handling of TEXTURE_MAX_LEVEL

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:36 +13:00
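The rules listed in e0015c819c for setting parameters on multisample targets can be sketched as a small validation function. This is an illustrative model only, not the Mesa code: the function name `check_tex_parameter_set` and the string constants are hypothetical, and the `PER_SAMPLER_STATE` set is an assumed, partial stand-in for the spec's per-sampler state table.

```python
# Hypothetical model of the checks described in the commit message.
MULTISAMPLE_TARGETS = frozenset({
    "TEXTURE_2D_MULTISAMPLE",
    "TEXTURE_2D_MULTISAMPLE_ARRAY",
})

# Assumed (partial) list of per-sampler state, for illustration only.
PER_SAMPLER_STATE = frozenset({
    "TEXTURE_WRAP_S", "TEXTURE_WRAP_T", "TEXTURE_WRAP_R",
    "TEXTURE_MIN_FILTER", "TEXTURE_MAG_FILTER",
    "TEXTURE_MIN_LOD", "TEXTURE_MAX_LOD",
    "TEXTURE_COMPARE_MODE", "TEXTURE_COMPARE_FUNC",
    "TEXTURE_BORDER_COLOR",
})


def check_tex_parameter_set(target, pname, value):
    """Return the GL error name to report, or None if the set is allowed."""
    if target not in MULTISAMPLE_TARGETS:
        return None  # other targets are outside the scope of this sketch
    if pname in PER_SAMPLER_STATE:
        return "INVALID_OPERATION"
    if pname == "TEXTURE_BASE_LEVEL" and value != 0:
        return "INVALID_OPERATION"
    return None
```

Setting TEXTURE_BASE_LEVEL to zero stays legal in this model, which matches the "nonzero value" wording of the commit message.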
Chris Forbes
b15c558c85 mesa: improve reported function name in Tex*Multisample
Now that there are 4 variants, just pass the function name into
teximagemultisample rather than reconstructing it.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:34 +13:00
Chris Forbes
9cbfe98bfc mesa: add enable bit for ARB_texture_storage_multisample
Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:32 +13:00
Chris Forbes
719974b54c glapi: add definition of ARB_texture_storage_multisample
Adds XML for the extension, dispatch_sanity enabling, and the two new
entrypoints. These are both implemented by calling the shared
teximagemultisample() with immutable=GL_TRUE.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:28 +13:00
Chris Forbes
788b0f8535 mesa: add support for immutable textures to teximagemultisample()
The new entrypoints will come later, but this adds the actual logic for
supporting immutable multisample textures:

- The immutability flag is set as desired.
- Attempting to modify an immutable multisample texture produces
  INVALID_OPERATION.

Note: The extension spec does not mention adding this behavior to
TexImage*Multisample, but it seems like the reasonable thing to do.

V2: - Cover missing error cases (unsized formats; texture object zero)

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
[V1] Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:22 +13:00
Chris Forbes
7f32b9560b mesa: extract _mesa_is_legal_tex_storage_format helper
This is about to be used in teximagemultisample() when immutable=true.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-31 22:19:13 +13:00
Kenneth Graunke
fdc5941972 mesa: Delete VERT_ATTRIB_GENERIC_NV and VERT_BIT_GENERIC_NV macros.
These haven't been used since we deleted NV_vertex_program support.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-30 19:19:45 -07:00
Eric Anholt
0967c362bf i965: Fix an inconsistency in the VUE map with gl_ClipVertex on gen4/5.
We are intentionally not allocating a slot for gl_ClipVertex.  But by
leaving the bit set in the slots_valid, the fragment shader's computation
of where varyings are in urb entry coming out of the SF would be off by
one.  Fixes rendering in Freespace 2 SCP, and improves rendering in TF2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62830
Tested-by: Joaquín Ignacio Aramendía <samsagax@gmail.com>
NOTE: This is a candidate for the 9.1 branch.
Reviewed-and-tested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
2013-03-30 17:24:18 -07:00
Eric Anholt
9dd19575d3 intel: Remove a never-taken debug print path.
Alessandro Pignotti noted when I added this code in commit
0e723b135b that it's in the else block for
"if (busy)", so this debug print couldn't happen.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-30 17:23:50 -07:00
Brian Paul
c34bbe110d st/mesa: add ir_lod case in GLSL->TGSI code to silence warning
2013-03-29 17:21:33 -06:00
Ian Romanick
e0131196ca glsl: Generate masked write instead of vector array index for UBO lowering
When reading a column from a row-major matrix, we would slot the single
value read into the vector using an ir_dereference_array of the vector
with a constant index.  This will (eventually) get optimized to a
masked-write, so just generate the masked write in the first place.

v2: Remove unused variable 'chan'.  Suggested by Ken.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
2013-03-29 12:01:14 -07:00
Ian Romanick
65cc68f430 glsl: Replace open-coded dot-product with dot
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
2013-03-29 12:01:11 -07:00
Ian Romanick
dbf94d105a glsl: Replace constant-index vector array accesses with swizzles
Search and replace:

    ][0] -> ].x
    ][1] -> ].y
    ][2] -> ].z
    ][3] -> ].w

Fixes piglit tests inverse-mat[234].{vert,frag}.  These tests call the
inverse function with constant parameters and expect proper constant
folding to happen.  My suspicion is that this patch papers over some bug
in constant propagation involving array accesses.

Either way, all of these accesses eventually get lowered to swizzles.
This cuts out the middle man (saving a trivial amount of CPU).

NOTE: This is a candidate for the 9.1 branch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: Eric Anholt <eric@anholt.net>
Cc: Paul Berry <stereotype441@gmail.com>
2013-03-29 12:01:07 -07:00
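The search and replace described in dbf94d105a can be reproduced with a short script. This is a sketch under the assumption that the rewrite is purely textual; the helper name `index_to_swizzle` and the sample expression are hypothetical and do not come from the Mesa tree.

```python
import re

# Component names for constant indices 0..3.
SWIZZLE = "xyzw"


def index_to_swizzle(source: str) -> str:
    """Rewrite every `][N]` (N in 0..3) to `].c`, as in the commit message."""
    return re.sub(
        r"\]\[([0-3])\]",
        lambda m: "]." + SWIZZLE[int(m.group(1))],
        source,
    )


example = "m[0][1] * m[1][0] - m[2][3]"
rewritten = index_to_swizzle(example)
```

Only the second subscript of each pair is turned into a swizzle, so the column access `m[0]` is left alone and the component access becomes `.y`.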
Ian Romanick
c770faea0a glsl: Add missing bool case in glsl_type::get_scalar_type
Since the case was missing, bvec4->get_scalar_type() would return bvec4,
but vec4->get_scalar_type() would return float.

NOTE: This is a candidate for stable branches.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2013-03-29 12:01:01 -07:00
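The bug fixed by c770faea0a can be modelled in a few lines. This is a toy model, not the glsl_type class: `Base`, `GlslType`, and the `handle_bool` switch are hypothetical names, and the fall-through behaviour (return the type unchanged when the base type is not handled) is assumed from the commit message's description.

```python
from dataclasses import dataclass
from enum import Enum


class Base(Enum):
    UINT = "uint"
    INT = "int"
    FLOAT = "float"
    BOOL = "bool"


@dataclass(frozen=True)
class GlslType:
    base: Base
    components: int


def get_scalar_type(t: GlslType, handle_bool: bool = True) -> GlslType:
    """Return the scalar type for t; unhandled base types fall through unchanged."""
    handled = {Base.UINT, Base.INT, Base.FLOAT}
    if handle_bool:
        handled.add(Base.BOOL)  # the case the commit adds
    if t.base in handled:
        return GlslType(t.base, 1)
    return t  # assumed default: hand back the original type


bvec4 = GlslType(Base.BOOL, 4)
vec4 = GlslType(Base.FLOAT, 4)
```

With `handle_bool=False` the model returns `bvec4` for `bvec4`, which is the inconsistency the commit message describes.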
Kenneth Graunke
57a502518e i965: Fix INTEL_DEBUG=shader_time for fragment shaders with discards.
"discard" instructions generate HALT instructions which jump to a final
HALT near the end of the shader.  Previously, fs_generator created this
final jump target when it saw the first FS_OPCODE_FB_WRITE, causing it
to jump right before the FB write epilogue.  This is normally good.

However, INTEL_DEBUG=shader_time also has an epilogue section which
records the final timestamp.  The frontend emits IR for this just before
FS_OPCODE_FB_WRITE.  Unfortunately, this led to the following ordering:

1. Shader Time Epilogue
2. Final HALT (where discards jump)
3. Framebuffer Write Epilogue

This meant that discarded pixels completely skipped the shader time
epilogue, causing no ending timestamp to be written.  This obviously
led to inaccurate results.

This patch adds a new FS_OPCODE_PLACEHOLDER_HALT in the IR stream just
before any epilogue sections.  This is where the final HALT should be
generated, and makes it easy to ensure the correct ordering:

1. Final HALT
2. Shader Time Epilogue
3. Framebuffer Write Epilogue

For shaders that don't discard, this opcode compiles away to nothing.
The scheduler adds barrier dependencies to make sure that it doesn't
get moved above any FS_OPCODE_DISCARD_JUMP instructions.

One 8-wide shader in GLBenchmark 2.7 dropped from 2291.67 Gcycles to
a mere 5.13 Gcycles.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 11:39:32 -07:00
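The two orderings described in 57a502518e can be illustrated with a toy code generator that works on lists of opcode names. This is a simplified sketch, not fs_generator: the function names and the string opcodes (other than the idea of a placeholder HALT taken from the commit message) are hypothetical.

```python
def generate_old(ir):
    """Old behaviour: emit the final HALT when the first FB write is seen."""
    out = []
    halt_emitted = False
    for op in ir:
        if op == "FB_WRITE" and not halt_emitted:
            out.append("FINAL_HALT")
            halt_emitted = True
        out.append(op)
    return out


def generate_new(ir):
    """New behaviour: emit the final HALT where the placeholder sits."""
    has_discard = "DISCARD_JUMP" in ir
    out = []
    for op in ir:
        if op == "PLACEHOLDER_HALT":
            if has_discard:
                out.append("FINAL_HALT")
            continue  # without discards the placeholder compiles to nothing
        out.append(op)
    return out


old_ir = ["BODY", "DISCARD_JUMP", "SHADER_TIME_EPILOGUE", "FB_WRITE"]
new_ir = ["BODY", "DISCARD_JUMP", "PLACEHOLDER_HALT",
          "SHADER_TIME_EPILOGUE", "FB_WRITE"]
```

In the old ordering the jump target lands after the shader time epilogue, so discarded pixels skip it; in the new ordering the target precedes both epilogues.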
Eric Anholt
20d846ce8b i965: Add names for all instructions to dump_instruction() in FS and VS.
I'd previously added the minimum names to understand my dumps, but this
makes dumps in general much easier to read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 11:39:21 -07:00
Matt Turner
ed6186f0e8 i965: Enable ARB_texture_query_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:21:14 -07:00
Matt Turner
b8aa9f7d3a i965/fs: Generate LOD sampler message from ir_lod.
v2: Support Ironlake as well.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:21:14 -07:00
Dave Airlie
110ca8b1f3 glsl: Implement ARB_texture_query_lod
v2 [mattst88]:
   - Rebase.
   - #define GL_ARB_texture_query_lod to 1.
   - Remove comma after ir_lod in ir.h for MSVC.
   - Handled ir_lod in ir_hv_accept.cpp, ir_rvalue_visitor.cpp,
     opt_tree_grafting.cpp.
   - Rename textureQueryLOD to textureQueryLod, see
     https://www.khronos.org/bugzilla/show_bug.cgi?id=821
   - Fix ir_reader of (lod ...).
v3 [mattst88]:
   - Rename textureQueryLod to textureQueryLOD, pending resolution of
     Khronos 821.
   - Add ir_lod case to ir_to_mesa.cpp.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 10:20:26 -07:00
Matt Turner
0e0ab8a071 i965/fs: Use measured Gen7 instruction timings on Gen6.
x before
+ after
+------------------------------------------------------------------------------+
|   x                                   x   +                                  |
|   xx  ++                              x   +                                  |
|   xx  ++ +                           xx   ++                                 |
|x xxx x+++++          +           xxx x*x+*+++ +         x                   +|
|   |_____|____________A______A____M____M_|_______|                            |
+------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
    x  23       8083.78       8287.83       8205.55     8162.7461     68.307951
    +  23       8107.56       8358.74       8224.33     8186.1765     71.506301
    No difference proven at 95.0% confidence

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner
f085b21b25 i965/fs: Increase and document MAD latency on Gen7.
58% of mad(8) generated in shader-db are reading registers from the same
bank.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner
414ea2f560 i965/fs: Add LRP instruction latency.
Set its latency to what happens to be the default floating-point
instruction latency. One day we may want to handle latency based on
register bank information.

Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner
ad4507b355 i965/fs: Add Haswell cycle timings
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:27 -07:00
Matt Turner
7997e59b65 i965: Note that write-after-write dependencies are blocking.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:26 -07:00
Matt Turner
f91e371fee i965: Reword comment about the shared mathbox.
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-29 10:13:26 -07:00
Roland Scheidegger
5f41e08cf3 gallivm: consolidate some half-to-float and r11g11b10-to-float code
Similar enough that we can try to use shared code.
v2: fix a stupid bug using wrong variable causing mayhem with Inf and NaNs.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-29 16:39:40 +01:00
Chris Forbes
4412f3bc13 mesa: provide default implementation of QuerySamplesForFormat
Previously at least i915 failed to provide an implementation, but
exposed ARB_internalformat_query anyway, leading to crashes when
QueryInternalformativ was called.

The default implementation just returns 1 for everything, so it is
suitable for any driver which does not support multisampling.

V2: - Move from intel to core mesa.

Signed-off-by: Chris Forbes <chrisf@ijw.co.nz>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-29 20:54:36 +13:00
Christoph Bumiller
ee624ced36 nvc0: implement MP performance counters
There are more, but this adds only (most of) the counters that are
handled directly by the shader processors.
The other counter domains are not handled on the multiprocessor and
there are no FIFO object methods for configuring them.
Instead, they have to be programmed by the kernel via PCOUNTER, and
the interface for this isn't in place yet.
2013-03-29 00:33:01 +01:00
Christoph Bumiller
480359bcf6 nvc0: enable compression when supported
2013-03-29 00:33:01 +01:00
Christoph Bumiller
25722e3454 nvc0: use NOUVEAU_GETPARAM_GRAPH_UNITS to get MP count
2013-03-29 00:33:00 +01:00
Christoph Bumiller
443b247878 nv50,nvc0: fix 3d blits, restore viewport after blit
2013-03-29 00:33:00 +01:00
Christoph Bumiller
090e73fc46 nv50: fix 3D render target setup
2013-03-29 00:33:00 +01:00
Brian Paul
b54ce3738a llvmpipe: put .bmp extension on dumped image files
2013-03-28 17:17:26 -06:00
Brian Paul
e90c56bc4e llvmpipe: add 'f' suffix to 1.0 in fixed_to_float()
2013-03-28 17:17:26 -06:00
Brian Paul
499aa3ddb4 draw: fix some build breakage when LLVM is not used
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62883
Tested-by: Vinson Lee <vlee@freedesktop.org>
2013-03-28 17:15:58 -06:00
Marek Olšák
9ad9141917 mesa: handle STATE_CURRENT_ATTRIB_MAYBE_VP_CLAMPED for parameter printing
Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-28 20:02:50 +01:00
Kenneth Graunke
9fe47756b3 i965: Tidy shader time printing code by using printf's field widths.
We can use %-6s%-6s rather than manually counting characters, resulting
in much more readable code.

This necessitates a small secondary change: using "total fs16" and ""
now causes the "" string to be padded out to 6 characters, resulting in
too much whitespace.  Splitting it into "total" and "fs16" produces the
same output as before.

Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:44 -07:00
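The padding effect described in 9fe47756b3 is easy to reproduce, since Python's % operator follows the same field-width rules as printf for %-6s. This sketch assumes only the "%-6s%-6s" format quoted in the commit message; the surrounding driver code is not shown and the variable names are hypothetical.

```python
FMT = "%-6s%-6s"

# One long string plus an empty one: the empty field is still padded to 6.
combined = FMT % ("total fs16", "")

# Two short strings: each is left-justified in its own 6-character field.
split = FMT % ("total", "fs16")

# Trailing padding contributed by each variant.
combined_padding = len(combined) - len(combined.rstrip())
split_padding = len(split) - len(split.rstrip())
```

The combined form carries six trailing spaces from the empty field, while the split form carries only the two needed to fill out "fs16".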
Eric Anholt
6192e9b377 i965/vs: Include URB payload setup in shader_time.
This much more accurately reflects the cost of the vertex shader, since
the payload setup is often a significant fraction of the instructions in
the VS.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:41 -07:00
Eric Anholt
55feb19704 i965/vs: Use a send from a 2-register VGRF for shader time writes.
This will let us emit it later, after we're setting up MRFs for the
URB write.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:37 -07:00
Eric Anholt
130138030a i965/vs: Teach copy propagation about sends from GRFs.
This incidentally also teaches it a bit about gen6 math -- we now allow
unswizzled, unmodified GRF temps as the sources for math.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:34 -07:00
Eric Anholt
c3a22d42a8 i965/vs: Prepare split_virtual_grfs() for the presence of SENDs from GRFs.
v2: Fix silly bool handling, and don't add new tabs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:29 -07:00
Eric Anholt
47e795d861 i965/fs: Include everything but the final FB write in shader_time.
Previously, if you just wrote a constant color to the render target, no
time got noted at all.  This is convenient for doing single-instruction
timings, but not so much for actual program analysis.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:23 -07:00
Eric Anholt
5c5218ea61 i965/fs: Switch shader_time writes to using GRFs.
This avoids conflicts between shader_time and FB writes, so we can include
more of the program under our profiling.  This does mean hiding more of
the message setup from the optimizer, which doesn't have a way to handle
multi-reg sends from GRFs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:15 -07:00
Eric Anholt
5c039543db i965: Provide more detailed information to match shader_time to programs.
Ken asked me the other day what -1 vs 0 vs 3 vs other meant in our shader
names, and I realized that it was really unclear.  I'd like to do even
better, like noting which one is the clear shader, but that would require
exposing the metaops struct to the driver.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:11 -07:00
Eric Anholt
d2ba1c24b4 i965: Track ARB program state along with GLSL state for shader_time.
This will let us do much better printouts for non-GLSL programs.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 11:46:01 -07:00
Marek Olšák
a19f6e880a st/dri: fix crash with HUD and single buffering
2013-03-28 18:17:21 +01:00
Marek Olšák
6b5dfa42c9 st/mesa: remove leftover printfs from ReadPixels
Oops, I thought I had removed all debugging code.
2013-03-28 18:17:21 +01:00
Eric Anholt
eda434921d i965/fs: Improve performance of copy propagation dataflow using bitsets.
Reduces compile time of l4d2's slowest shader by 17.8% +/- 1.3% (n=10).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-28 09:48:50 -07:00
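The idea behind eda434921d, running a dataflow analysis over bitsets so that each block's state is updated with a few bitwise operations, can be sketched generically. This is a textbook "available copies" forward analysis using Python integers as bitsets, not the i965 implementation: the function name, the block layout, and the gen/kill encoding are all assumptions made for illustration.

```python
def solve_available_copies(preds, gen, kill, num_copies):
    """Forward dataflow over a CFG with one bit per tracked copy.

    preds[b] lists the predecessors of block b; gen[b] and kill[b] are
    bitsets of copies created and invalidated in block b.
    """
    n = len(preds)
    full = (1 << num_copies) - 1
    avail_in = [0] * n
    avail_out = [full] * n  # optimistic start, narrowed by iteration
    changed = True
    while changed:
        changed = False
        for b in range(n):
            if preds[b]:
                new_in = full
                for p in preds[b]:
                    new_in &= avail_out[p]  # copy must reach on every path
            else:
                new_in = 0  # nothing is available at the entry block
            new_out = gen[b] | (new_in & ~kill[b])
            if new_in != avail_in[b] or new_out != avail_out[b]:
                avail_in[b], avail_out[b] = new_in, new_out
                changed = True
    return avail_in, avail_out


# Diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
preds = [[], [0], [0], [1, 2]]
gen = [0b01, 0b10, 0b00, 0b00]
kill = [0b00, 0b00, 0b01, 0b00]
avail_in, avail_out = solve_available_copies(preds, gen, kill, 2)
```

In the diamond example, copy 0 is killed on one branch and copy 1 exists on only the other, so neither is available at the join block.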