87313 Commits

Author SHA1 Message Date
Jason Ekstrand
b18cd8ce2c i965/miptree: Use intel_miptree_copy for maps
What we're really doing is copying a texture, not blitting it in the sense
of glBlitFramebuffer.  Also, the intel_miptree_copy function is capable of
properly handling compressed textures, which intel_miptree_blit is not.

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97473
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-13 15:48:34 -08:00
Jason Ekstrand
157971e450 i965/blit: Fix the src dimension sanity check in miptree_copy
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
2016-12-13 15:48:13 -08:00
Lionel Landwerlin
9fe3f2649e docs: add INTEL_conservative_rasterization to release notes for 13.1.0
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:28:00 +00:00
Lionel Landwerlin
60330d730b main: add INTEL_conservative_rasterization enum query support
v2: add extra parameter (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:59 +00:00
Lionel Landwerlin
d4b753a50b glapi: add missing INTEL_conservative_rasterization
v2: put enum directly in gl_API.xml (Ilia)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:56 +00:00
Lionel Landwerlin
47285d4602 extensions: update INTEL_conservative_rasterization dependencies
Suggested by Ilia.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:54 +00:00
Lionel Landwerlin
300d96a433 main: don't error when enabling conservative rasterization on gles
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2016-12-13 16:27:51 +00:00
Lionel Landwerlin
9854a3ba8b main: use new driver flag for conservative rasterization state
Suggested by Marek.

v2: Use new driver flag (Marek)

v3: Fix i965 comments (Lionel)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-13 16:27:33 +00:00
Iago Toral Quiroga
da3389a331 nir/lower_tex: lower gradients on shadow cube maps if lower_txd_shadow is set
Even if lower_txd_cube_map isn't. Suggested by Ken to make the flag more
consistent with its name.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:29 +01:00
Iago Toral Quiroga
44873ad0a4 i965: remove brw_lower_texture_gradients
This has been ported to NIR now so we don't need to keep the GLSL IR
lowering any more.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:20 +01:00
Iago Toral Quiroga
77f65b3b64 i965/nir: enable lowering of texture gradient for shadow samplers
This gets the lowering on the Vulkan driver too, which is required for
hardware that does not have the sample_l_d message (up to Ivy Bridge).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:33:14 +01:00
Iago Toral Quiroga
5be2e785b1 nir/lower_tex: add lowering for texture gradient on shadow samplers
This is ported from the Intel lowering pass that we use with GLSL IR.
This takes care of lowering texture gradients on shadow samplers other
than cube maps. Intel hardware requires this for gen < 8.

v2 (Ken):
 - Use the helper function to retrieve ddx/ddy
 - Swizzle away size components we are not interested in

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:52 +01:00
Iago Toral Quiroga
f90da64fc6 i965/nir: enable lowering of texture gradient for cube maps
This gets the lowering on the Vulkan driver too.

Fixes Vulkan CTS cube map texture gradient tests in:
dEQP-VK.glsl.texture_functions.texturegrad.*

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:32:46 +01:00
Iago Toral Quiroga
a8e740c354 nir/lower_tex: add lowering for texture gradient on cube maps
This is ported from the Intel lowering pass that we use with GLSL IR.
The NIR pass only handles cube maps, not shadow samplers, which are
also lowered for gen < 8 on Intel hardware. We will add support for
that in a later patch, at which point we should be able to remove
the GLSL IR lowering pass.

v2:
- added a helper to retrieve ddx/ddy parameters (Ken)
- No need to make size.z=1.0, we are only using component x anyway (Iago)

v3:
- Get rid of the ddx/ddy helper and use nir_tex_instr_src_index
  instead (Ken, Eric)

v4:
- When emitting the textureLod operation, copy all texture parameters
  from the original textureGrad() (except for ddx/ddy) using a loop
- Add a 'continue' statement if the lowering makes progress because it
  replaces the original texture instruction

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v3)
2016-12-13 10:32:00 +01:00
Iago Toral Quiroga
bac303c286 nir/lower_tex: generalize get_texture_size()
This was written specifically for RECT samplers. Make it more generic so
we can call this from the gradient lowerings too.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-12-13 10:31:38 +01:00
Ilia Mirkin
fd249c803e treewide: s/comparitor/comparator/
git grep -l comparitor | xargs sed -i 's/comparitor/comparator/g'

Just happened to notice this in a patch that was sent and included one
of the tokens in question.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-12-12 22:13:07 -05:00
Ian Romanick
a0ce9ff8c4 nir: Only float and double types can be matrices
In 19a541f (nir: Get rid of nir_constant_data) a number of places that
operated on nir_constant::values were mechanically converted to operate
on the whole array without regard for the base type.  Only
GLSL_TYPE_FLOAT and GLSL_TYPE_DOUBLE can be matrices, so only those
types can have data in the non-0 array element.

See also b870394.
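To illustrate the rule this commit enforces, here is a minimal sketch of a comparison that only walks the matrix columns that can actually hold data. The `base_type` enum, `struct constant`, and both functions are simplified, hypothetical stand-ins, not NIR's real `nir_constant` or `glsl_base_type` definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified stand-in for glsl_base_type. */
enum base_type { TYPE_UINT, TYPE_INT, TYPE_FLOAT, TYPE_DOUBLE, TYPE_BOOL };

/* Hypothetical constant: up to 4 columns of 4 components each. */
struct constant {
    unsigned values[4][4];
};

/* Only float and double types can be matrices, so every other base type
 * keeps its data in column 0 only. */
static unsigned num_columns(enum base_type type, unsigned matrix_columns)
{
    return (type == TYPE_FLOAT || type == TYPE_DOUBLE) ? matrix_columns : 1;
}

/* Compares only the columns that can contain data, ignoring whatever is
 * left in the unused array elements. */
static bool constants_equal(const struct constant *a, const struct constant *b,
                            enum base_type type, unsigned matrix_columns,
                            unsigned components)
{
    assert(matrix_columns >= 1 && matrix_columns <= 4);
    assert(components >= 1 && components <= 4);
    unsigned cols = num_columns(type, matrix_columns);
    for (unsigned c = 0; c < cols; c++) {
        if (memcmp(a->values[c], b->values[c],
                   components * sizeof(a->values[c][0])) != 0)
            return false;
    }
    return true;
}
```

For an integer vector, stale data in `values[1]` no longer affects the result, while a float `mat2` still compares both columns.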

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
Cc: Iago Toral Quiroga <itoral@igalia.com>
2016-12-12 17:17:12 -08:00
Tim Rowley
75149088be swr: [rasterizer core/memory] StoreTile: AVX512 progress
Fixes to 128-bit formats.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2016-12-12 17:52:39 -06:00
Matt Turner
ac6646129f nir: Move fsat outside of fmin/fmax if second arg is 0 to 1.
instructions in affected programs: 550 -> 544 (-1.09%)
helped: 6

cycles in affected programs: 6952 -> 6850 (-1.47%)
helped: 6

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:39:27 -08:00
Matt Turner
7bed52bb5f i965/fs: Reject copy propagation into SEL if not min/max.
We shouldn't ever see a SEL with conditional mod other than GE (for max)
or L (for min), but we might see one with predication and no conditional
mod.
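The guard described above amounts to a predicate on the SEL's conditional mod. This is a minimal sketch with a hypothetical enum and function name; the real pass uses the i965 backend's own conditional-mod constants and checks more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of conditional modifiers. */
enum cmod { CMOD_NONE, CMOD_Z, CMOD_NZ, CMOD_G, CMOD_GE, CMOD_L, CMOD_LE };

/* Only SEL.GE (max) and SEL.L (min) are treated as min/max; a predicated
 * SEL with no conditional mod selects an operand based on the flag
 * register instead, so it is rejected here. */
static bool can_copy_propagate_into_sel(enum cmod cmod)
{
    assert(cmod <= CMOD_LE);
    return cmod == CMOD_GE || cmod == CMOD_L;
}
```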

total instructions in shared programs: 8241806 -> 8241902 (0.00%)
instructions in affected programs: 13284 -> 13380 (0.72%)
HURT: 62

total cycles in shared programs: 84165104 -> 84166244 (0.00%)
cycles in affected programs: 75364 -> 76504 (1.51%)
helped: 10
HURT: 34

Fixes generated code in at least Sanctum 2, Borderlands 2, Goat
Simulator, XCOM: Enemy Unknown, and Shogun 2.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92234
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:55 -08:00
Matt Turner
091a8a04ad i965/fs: Add unit tests for copy propagation pass.
Pretty basic, but it's a start.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:50 -08:00
Matt Turner
6014da50ec i965/fs: Rename opt_copy_propagate -> opt_copy_propagation.
Matches the vec4 backend, cmod propagation, and saturate propagation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-12-12 12:38:43 -08:00
Nicolai Hähnle
ec0a0a60cc radeonsi: shrink the GSVS ring to account for the reduced item sizes
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:17 +01:00
Nicolai Hähnle
6fdef7d265 radeonsi: shrink each vertex stream to the actually required size
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:13 +01:00
Nicolai Hähnle
2f2e941e2d radeonsi: use a single descriptor for the GSVS ring
We can hardcode all of the fields for swizzling in the geometry shader.

The advantage is that we use fewer descriptor slots and we no longer have to
update any of the (ring) descriptors when the geometry shader changes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:05 +01:00
Nicolai Hähnle
18616e7551 radeonsi: pack GS output components for each vertex stream contiguously
Note that the memory layout of one vertex stream inside one "item" (= memory
written by one GS wave) on the GSVS ring is:

  t0v0c0 ... t15v0c0 t0v1c0 ... t15v1c0 ... t0vLc0 ... t15vLc0
  t0v0c1 ... t15v0c1 t0v1c1 ... t15v1c1 ... t0vLc1 ... t15vLc1
                        ...
  t0v0cL ... t15v0cL t0v1cL ... t15v1cL ... t0vLcL ... t15vLcL
  t16v0c0 ... t31v0c0 t16v1c0 ... t31v1c0 ... t16vLc0 ... t31vLc0
  t16v0c1 ... t31v0c1 t16v1c1 ... t31v1c1 ... t16vLc1 ... t31vLc1
                        ...
  t16v0cL ... t31v0cL t16v1cL ... t31v1cL ... t16vLcL ... t31vLcL

                        ...

  t48v0c0 ... t63v0c0 t48v1c0 ... t63v1c0 ... t48vLc0 ... t63vLc0
  t48v0c1 ... t63v0c1 t48v1c1 ... t63v1c1 ... t48vLc1 ... t63vLc1
                        ...
  t48v0cL ... t63v0cL t48v1cL ... t63v1cL ... t48vLcL ... t63vLcL

where tNN indicates the thread number, vNN the vertex number (in the order of
EMIT_VERTEX), and cNN the output component (vL and cL are the last vertex and
component, respectively).

The vertex streams are laid out sequentially.

The swizzling by 16 threads is hard-coded in the way the VGT generates the
offset passed into the GS copy shader, and the jump every 16 threads is
calculated from VGT_GSVS_RING_OFFSET_n and VGT_GSVS_RING_ITEMSIZE in a way
that makes it difficult to deviate from this layout (at least that's what
I've experimentally confirmed on VI after first trying to go the simpler
route of just interleaving the vertex streams).
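The layout above can be turned into an address computation. This minimal sketch assumes 64-thread waves, one dword per output component, and offsets counted from the start of the given vertex stream's section of an item; `gsvs_dword_offset` is a hypothetical name, not a radeonsi function.

```c
#include <assert.h>

/* Dword offset of (thread, vertex, component) within one vertex stream of
 * one GSVS ring item. Threads are grouped by 16; inside a group, each
 * component occupies a row of num_vertices * 16 dwords, and within a row
 * the 16 threads of each vertex are adjacent. */
static unsigned gsvs_dword_offset(unsigned thread, unsigned vertex,
                                  unsigned component, unsigned num_vertices,
                                  unsigned num_components)
{
    assert(thread < 64 && vertex < num_vertices && component < num_components);
    unsigned group = thread / 16;   /* t0-t15, t16-t31, ... */
    unsigned lane = thread % 16;    /* position within the 16-thread group */
    unsigned group_size = num_components * num_vertices * 16;
    return group * group_size +
           component * num_vertices * 16 +
           vertex * 16 +
           lane;
}
```

With 4 vertices and 4 components, t0v1c0 lands at dword 16, t0v0c1 at 64, t16v0c0 at 256, and t63v3c3 at 1023, the last dword of a 64 * 4 * 4 dword stream section.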

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:05:00 +01:00
Nicolai Hähnle
edf034ac14 radeonsi: do not write non-existent components through the GSVS ring
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:58 +01:00
Nicolai Hähnle
af976f12a5 radeonsi: only write values belonging to the stream when emitting GS vertex
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:54 +01:00
Nicolai Hähnle
bdf1bf1cb5 radeonsi: generate an explicit switch instruction over vertex streams
SimplifyCFG generates a switch instruction anyway when all four streams
are present, but is simultaneously not smart enough to eliminate some
redundant jumps that it generates.

The generated assembly is still a bit silly, probably because the
control flow annotation doesn't know how to handle a switch with uniform
condition.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:49 +01:00
Nicolai Hähnle
bae929f96e radeonsi: fetch only outputs of current vertex stream from the GSVS ring
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:46 +01:00
Nicolai Hähnle
dfb69cac33 radeonsi: only export from GS copy shader for vertex stream 0
When running the copy shader for vertex streams != 0, the SX does not need
any data from us (there is no rasterization for the higher vertex streams,
only streamout).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:43 +01:00
Nicolai Hähnle
21f2bb22a3 radeonsi: do not export VS outputs from vertex streams != 0
This affects GS copy shaders. When an output is meant for vertex
stream != 0, we don't have to make it available to the pixel
shader.

There is a minor inefficiency here because the GLSL varying packing pass
does not group varyings of the same vertex stream together, but it
shouldn't be important in practice.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:36 +01:00
Nicolai Hähnle
fc0e009aa7 radeonsi: pull iteration over vertex streams into GS copy shader logic
The iteration is not needed for normal vertex shaders.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:33 +01:00
Nicolai Hähnle
180ae18ec5 radeonsi: group streamout writes by vertex stream
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:30 +01:00
Nicolai Hähnle
d89592836a radeonsi: load the streamout buf descriptors closer to their use
LLVM can still decide to hoist the loads since they're marked invariant.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:27 +01:00
Nicolai Hähnle
564f17f0d7 radeonsi: extract writing of a single streamout output
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:24 +01:00
Nicolai Hähnle
b41dd00235 radeonsi: separate the call to si_llvm_emit_streamout from exports
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:22 +01:00
Nicolai Hähnle
5ad6e56ca3 radeonsi: plumb the output vertex_stream through to si_shader_output_values
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:19 +01:00
Nicolai Hähnle
2985708fa0 radeonsi: rename members of si_shader_output_values
Be a bit more verbose and avoid confusion in future patches.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:16 +01:00
Nicolai Hähnle
88509518b0 radeonsi: fix an off-by-one error in the bounds check for max_vertices
The spec actually says that calling EmitStreamVertex is undefined when
you exceed max_vertices. But we do need to avoid trampling over memory
outside the GSVS ring.
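One common shape of this kind of off-by-one is a guard written as `emitted > max_vertices` instead of `emitted >= max_vertices`, which lets exactly one extra vertex through. The sketch below models a single invocation's ring slot; the struct and function are hypothetical, not radeonsi code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GS invocation's slot in the GSVS ring. */
struct gs_ring_slot {
    unsigned max_vertices;
    unsigned emitted;   /* vertices emitted so far */
    float data[8];      /* room for up to 8 vertices, one float each */
};

/* Returns true if the vertex was written. Valid indices are
 * 0..max_vertices-1, so the guard must reject emitted >= max_vertices;
 * a '>' comparison would write one element past the slot. */
static bool emit_vertex(struct gs_ring_slot *slot, float value)
{
    assert(slot->max_vertices <= sizeof(slot->data) / sizeof(slot->data[0]));
    if (slot->emitted >= slot->max_vertices)
        return false;   /* undefined per spec; this sketch drops it */
    slot->data[slot->emitted++] = value;
    return true;
}
```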

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:13 +01:00
Nicolai Hähnle
7655bccce8 radeonsi: do not kill GS with memory writes
Vertex emits beyond the specified maximum number of vertices are supposed to
have no effect, which is why we used to always kill GS that reached the limit.

However, if the GS also writes to memory (SSBO, atomics, shader images), then
we must keep going and only skip the vertex emit itself.
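The distinction can be modelled with a GS whose loop body is "increment an atomic counter, then emit a vertex". This is a hypothetical simulation, not the actual shader code radeonsi generates: killing the invocation at the first over-limit emit is only safe when nothing else it does is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one GS invocation. */
struct gs_sim {
    unsigned max_vertices;
    unsigned emitted;
    unsigned atomic_counter;
    bool writes_memory;
    bool killed;
};

static void gs_emit_vertex(struct gs_sim *gs)
{
    if (gs->emitted < gs->max_vertices) {
        gs->emitted++;
        return;
    }
    /* Over the limit: the emit itself must have no effect. Kill only if
     * the invocation has no memory side effects left to perform. */
    if (!gs->writes_memory)
        gs->killed = true;
}

/* Each iteration: atomicCounterIncrement() if enabled, then EmitVertex(). */
static void gs_run(struct gs_sim *gs, unsigned iterations)
{
    for (unsigned i = 0; i < iterations && !gs->killed; i++) {
        if (gs->writes_memory)
            gs->atomic_counter++;
        gs_emit_vertex(gs);
    }
    assert(gs->emitted <= gs->max_vertices);
}
```

With `max_vertices = 2` and four iterations, the memory-writing invocation keeps running and performs all four increments, while the other one stops after its third emit.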

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:10 +01:00
Nicolai Hähnle
7b5b3d63c5 radeonsi: update all GSVS ring descriptors for new buffer allocations
Fixes GL45-CTS.gtf40.GL3Tests.transform_feedback3.transform_feedback3_geometry_instanced.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:06 +01:00
Nicolai Hähnle
2eaacba7f2 st/glsl_to_tgsi: plumb the GS output stream qualifier through to TGSI
Allow drivers to emit GS outputs in a smarter way.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:03 +01:00
Nicolai Hähnle
cc34a6f0bd tgsi/scan: collect information about output usagemasks
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:04:01 +01:00
Nicolai Hähnle
cf8e9778fc tgsi/scan: collect information about output vertex streams
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:57 +01:00
Nicolai Hähnle
81d0dc5e55 gallium: extract individual streamout output structure
So that we can pass pointers to individual array entries around.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:54 +01:00
Nicolai Hähnle
04811354c8 tgsi: add Stream{X,Y,Z,W} fields to tgsi_declaration_semantic
This is for geometry shader outputs. Without it, drivers have no way of
knowing which stream each output is intended for, and have to
conservatively write all outputs to all streams.

Separate stream numbers for each component are required due to output
packing.
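Because packing can place components destined for different streams in one output slot, a driver needs a per-stream writemask for each output. This sketch uses a simplified, hypothetical struct that mirrors the four new fields as 2-bit bitfields (enough for streams 0 to 3); it is not the full `tgsi_declaration_semantic`.

```c
#include <assert.h>

/* Hypothetical mirror of the per-component stream numbers. */
struct output_streams {
    unsigned StreamX : 2;
    unsigned StreamY : 2;
    unsigned StreamZ : 2;
    unsigned StreamW : 2;
};

/* Writemask (bit 0 = X ... bit 3 = W) of the components that belong to
 * the given vertex stream. */
static unsigned stream_component_mask(struct output_streams s, unsigned stream)
{
    assert(stream < 4);
    unsigned mask = 0;
    if (s.StreamX == stream) mask |= 1u << 0;
    if (s.StreamY == stream) mask |= 1u << 1;
    if (s.StreamZ == stream) mask |= 1u << 2;
    if (s.StreamW == stream) mask |= 1u << 3;
    return mask;
}
```

For example, a vec2 for stream 0 packed with a vec2 for stream 1 yields mask 0x3 for stream 0 and 0xC for stream 1, so neither stream has to write the other's components.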

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:51 +01:00
Nicolai Hähnle
173d80b401 glsl: remember per-component vertex streams for packed varyings
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-12-12 09:03:47 +01:00
Grazvydas Ignotas
6092169b96 i965/blorp: fix release build unused variable warning
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2016-12-12 07:09:33 +01:00
Edward O'Callaghan
5e6b2b05a5 virgl: Fix a strict-aliasing violation in the encoder
As per the C spec, accessing an object through a pointer to an
incompatible type violates the strict-aliasing rule and is undefined
behaviour. After optimization passes, this results in very subtle bugs
that happen only on a full moon.

Use a memcpy() as a well-defined coercion between the double
and uint64_t interpretations of the memory.
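The general pattern looks like the sketch below. `double_to_bits` is a hypothetical helper rather than the virgl encoder function, and it uses C11 `static_assert` from `<assert.h>` to stay self-contained, whereas the commit uses Mesa's C99-compatible `STATIC_ASSERT()` macro.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit copy below only makes sense if both types have the same size. */
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Returns the raw bit pattern of d. Copying the bytes with memcpy() is well
 * defined, unlike reading the double through a uint64_t pointer; with
 * optimization enabled, compilers usually reduce it to a register move. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof(bits));
    return bits;
}
```

On IEEE 754 targets, `double_to_bits(1.0)` is `0x3FF0000000000000`.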

V.2: Use static_assert() instead of assert().
V.3: Use C99 compat STATIC_ASSERT() over C11 static_assert().

Signed-off-by: Edward O'Callaghan <funfunctor@folklore1984.net>
Acked-by: Dave Airlie <airlied@redhat.com>
2016-12-12 16:50:15 +11:00