Commit Graph

84857 Commits

Author SHA1 Message Date
Francisco Jerez
4135fc22ff i965/fs: Hook up coherent framebuffer reads to the NIR front-end.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:09 -07:00
Francisco Jerez
be12a1f36e i965/fs: Remove special casing of framebuffer writes in scheduler code.
The reason why it was safe for the scheduler to ignore the side
effects of framebuffer write instructions was that their side effects
couldn't have had any influence on any other instruction in the
program, because we weren't doing framebuffer reads, and framebuffer
writes were always non-overlapping.  We need actual memory dependency
analysis in order to determine whether a side-effectful instruction
can be reordered with respect to other instructions in the program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:09 -07:00
Francisco Jerez
3daa0fae4b i965/fs: Don't CSE render target messages with different target index.
We weren't checking the fs_inst::target field when comparing whether
two instructions are equal.  For FB writes it doesn't matter because
they aren't CSE-able anyway, but this would have become a problem with
FB reads which are expression-like instructions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
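As a rough illustration of the CSE fix in 3daa0fae4b: an instruction equality check used for CSE has to compare every field that affects the result, including the render target index. The sketch below is a toy model and not the i965 code; struct toy_inst and toy_insts_match() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced instruction record. */
struct toy_inst {
   int opcode;
   int src[2];
   unsigned target;   /* render target index of an FB read or write */
};

/* Two instructions are CSE candidates only if every field that
 * influences the result matches, including the render target index. */
static bool
toy_insts_match(const struct toy_inst *a, const struct toy_inst *b)
{
   assert(a != NULL && b != NULL);
   return a->opcode == b->opcode &&
          a->src[0] == b->src[0] &&
          a->src[1] == b->src[1] &&
          a->target == b->target;
}
```

Without the last comparison, two framebuffer reads that differ only in their target would be treated as the same expression.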
Francisco Jerez
db123df747 i965/fs: Define logical framebuffer read opcode and lower it to physical reads.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
f2f75b0cf0 i965/fs: Define framebuffer read virtual opcode.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
71d639f69e i965/disasm: Fix RC message type strings on Gen7+.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
26ac16fe2f i965/eu: Add codegen support for the Gen9+ render target read message.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
29eb8059fd i965/eu: Take into account the target cache argument in brw_set_dp_read_message.
brw_set_dp_read_message() was setting the data cache as send message
SFID on Gen7+ hardware, ignoring the target cache specified by the
caller.  Some of the callers were passing a bogus target cache value
as argument relying on brw_set_dp_read_message not to take it into
account.  Fix them too.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
8a2f19a777 i965: Flip the non-coherent framebuffer fetch extension bit on G45-Gen8 hardware.
This is not enabled on the original Gen4 part because it lacks surface
state tile offsets so it may not be possible to sample from arbitrary
non-zero layers of the framebuffer depending on the miptree layout (it
should be possible to work around this by allocating a scratch surface
and doing the same hack currently used for render targets, but meh...).

On Gen9+ even though it should mostly work (feel free to force-enable
it in order to compare the coherent and non-coherent paths in terms of
performance), there are some corner cases like 1D array layered
framebuffers that cannot be handled easily by the non-coherent path
because of the incompatible layout in memory of 1D and 2D miptrees (it
should be possible to work around this too by doing state-dependent
recompiles, but it's hard to care enough since Gen9 has native support
for coherent render target reads...)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
ecc4800383 i965: Implement glBlendBarrier.
This is a no-op if the platform supports coherent framebuffer fetch.
If it doesn't, we just need to flush the render cache and invalidate
the texture cache in order for previous rendering to be visible to
framebuffer fetch.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
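The barrier logic described in ecc4800383 could be sketched roughly as follows. This is a toy model under stated assumptions: struct toy_context, the flag names and toy_blend_barrier() are hypothetical and do not correspond to the real driver entry points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush flags, standing in for real pipe control bits. */
enum {
   TOY_FLUSH_RENDER_CACHE        = 1u << 0,
   TOY_INVALIDATE_TEXTURE_CACHE  = 1u << 1,
};

struct toy_context {
   bool coherent_fb_fetch;     /* platform has coherent FB fetch */
   uint32_t pending_flushes;   /* flags accumulated for the next flush */
};

static void
toy_blend_barrier(struct toy_context *ctx)
{
   assert(ctx != NULL);

   /* Coherent framebuffer fetch needs no explicit barrier. */
   if (ctx->coherent_fb_fetch)
      return;

   /* Otherwise make previous rendering visible to the sampler-based
    * fetch: flush the render cache, invalidate the texture cache. */
   ctx->pending_flushes |= TOY_FLUSH_RENDER_CACHE |
                           TOY_INVALIDATE_TEXTURE_CACHE;
}
```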
Francisco Jerez
786108e7b2 i965: Upload surface state for non-coherent framebuffer fetch.
This iterates over the list of attached render buffers and binds
appropriate surface state structures to the binding table block
allocated for shader framebuffer read.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:08 -07:00
Francisco Jerez
dc96968dbf i965: Implement support for overriding the texture target in brw_emit_surface_state.
This allows the caller to bind a miptree using a texture target other
than the one it was created with.  The code should work even if the
memory layouts of the specified and original targets don't match, as
long as the caller only intends to access a single slice of the
miptree structure.

This will be exploited by the next commit in order to support
non-coherent framebuffer fetch of a single layer of a 3D texture
(since some generations lack the minimum array element control for 3D
textures bound to the sampler unit), and multiple layers of a 1D array
texture (since binding it as an actual 1D array texture would require
state-dependent recompiles because the same shader couldn't
simultaneously work for 1D and 2D array textures due to the different
texel fetch coordinate ordering).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
49ea2bd175 i965: Massage argument list of brw_emit_surface_state().
This commit does three different things in a single pass in order to
keep the amount of churn low: Remove the for_gather boolean argument
which was unused, pass the isl_view argument by value rather than by
reference since I'll have to modify it from within the function, and
add a target argument to allow callers to bind textures using a target
other than the original.  The prototype of the function now looks
like:

 void brw_emit_surface_state(struct brw_context *brw,
                             struct intel_mipmap_tree *mt,
                             GLenum target, struct isl_view view,
                             uint32_t mocs, uint32_t *surf_offset, int surf_index,
                             unsigned read_domains, unsigned write_domains);

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
74e4baec59 i965: Add missing has_surface_tile_offset flag to the Gen8+ device info structures.
This surface state control has been supported by all hardware
generations since G45.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
0fe732e66f i965: Return the correct layout from get_isl_dim_layout for pre-ILK cube textures.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
5759eb458b i965: Factor out isl_surf_dim/isl_dim_layout calculation into functions.
The logic to calculate the right layout and dimensionality for a given
GL texture target is going to be useful elsewhere, so factor it out from
intel_miptree_get_isl_surf().

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
99fb167839 i965: Resolve color for non-coherent FB fetch at UpdateState time.
This is required because the sampler unit used to fetch from the
framebuffer is unable to interpret non-color-compressed fast-cleared
single-sample texture data.  Roughly the same limitation applies for
surfaces bound to texture or image units, but unlike texture sampling,
non-coherent framebuffer fetch is by definition non-coherent with
previous rendering, so the brw_render_cache_set_check_flush() call can
be omitted except after resolve.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
071665c161 i965: Return whether the miptree was resolved from intel_miptree_resolve_color().
This will allow optimizing out the cache flush in some cases when
resolving wasn't necessary.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
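The point of returning a boolean from the resolve, as in 071665c161, is that the caller can skip the cache flush when nothing was resolved. A minimal sketch, assuming a toy miptree that only tracks whether a resolve is pending; all names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_miptree {
   bool needs_resolve;    /* e.g. fast-cleared and not yet resolved */
};

struct toy_context {
   unsigned flush_count;  /* number of cache flushes issued so far */
};

/* Returns true only if a resolve actually took place. */
static bool
toy_resolve_color(struct toy_miptree *mt)
{
   assert(mt != NULL);
   if (!mt->needs_resolve)
      return false;
   mt->needs_resolve = false;
   return true;
}

/* The flush is only paid for when the resolve did some work. */
static void
toy_prepare_fb_fetch(struct toy_context *ctx, struct toy_miptree *mt)
{
   assert(ctx != NULL);
   if (toy_resolve_color(mt))
      ctx->flush_count++;
}
```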
Francisco Jerez
f24e393bd5 i965/fs: Translate nir_intrinsic_load_output on a fragment output.
This gets the non-coherent framebuffer fetch path hooked up to the NIR
front-end.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:07 -07:00
Francisco Jerez
b00a236d6a i965/fs: Allocate fragment output temporaries on demand.
This gets rid of the duplication of logic between nir_setup_outputs()
and get_frag_output() by allocating fragment output temporaries lazily
whenever get_frag_output() is called.  This makes nir_setup_outputs()
a no-op for the fragment shader stage.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
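Lazy allocation of the kind described in b00a236d6a can be sketched as a lookup that allocates on first use. This is an illustrative model only: the table sizes, the use of 0 as "not allocated yet" and the function name toy_get_frag_output() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LOCATIONS 8
#define TOY_MAX_INDICES   2

struct toy_frag_outputs {
   /* Register number per (location, index); 0 means not allocated. */
   int regs[TOY_MAX_LOCATIONS][TOY_MAX_INDICES];
   int next_reg;   /* last register number handed out */
};

/* Returns the temporary for the given output, allocating it the first
 * time it is requested so that no separate setup pass is needed. */
static int
toy_get_frag_output(struct toy_frag_outputs *outs,
                    unsigned location, unsigned index)
{
   assert(outs != NULL);
   assert(location < TOY_MAX_LOCATIONS && index < TOY_MAX_INDICES);

   if (outs->regs[location][index] == 0)
      outs->regs[location][index] = ++outs->next_reg;

   return outs->regs[location][index];
}
```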
Francisco Jerez
7dac882073 i965/fs: Rework representation of fragment output locations in NIR.
The problem with the current approach is that driver output locations
are represented as a linear offset within the nir_outputs array, which
makes it rather difficult for the back-end to figure out what color
output and index some nir_intrinsic_load/store_output was meant for,
because the offset of a given output within the nir_output array is
dependent on the type and size of all previously allocated outputs.
Instead this defines the driver location of an output to be the pair
formed by its GLSL-assigned location and index (I've borrowed the
bitfield macros from brw_defines.h in order to represent the pair of
integers as a single scalar value that can be assigned to
nir_variable_data::driver_location).  nir_assign_var_locations is no
longer useful for fragment outputs.

Because fragment outputs are now allocated independently rather than
within the nir_outputs array, the get_frag_output() helper becomes
necessary in order to obtain the right temporary register for a given
location-index pair.

The type_size helper passed to nir_lower_io is now type_size_dvec4
rather than type_size_vec4_times_4 so that output array offsets are
provided in terms of whole array elements rather than in terms of
scalar components (dvec4 is the largest vector type supported by
GLSL, so this will cause all individual fragment outputs to have a size
of one regardless of the type).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
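The idea in 7dac882073 of storing a (location, index) pair in a single scalar can be sketched with plain shifts and masks. The field layout below (index in bit 0, location in bits 1 to 31) and all names are assumptions chosen for illustration, not necessarily what the driver uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout for the packed driver location. */
#define TOY_INDEX_SHIFT     0
#define TOY_INDEX_BITS      1
#define TOY_LOCATION_SHIFT  1
#define TOY_LOCATION_BITS   31

/* Mask of 'bits' bits starting at 'shift', computed in unsigned
 * arithmetic so that a 31-bit field does not overflow. */
static uint32_t
toy_field_mask(unsigned shift, unsigned bits)
{
   assert(bits > 0 && bits < 32 && shift + bits <= 32);
   return ((1u << bits) - 1) << shift;
}

static uint32_t
toy_pack_frag_output(uint32_t location, uint32_t index)
{
   assert(location <= toy_field_mask(0, TOY_LOCATION_BITS));
   assert(index <= toy_field_mask(0, TOY_INDEX_BITS));
   return (location << TOY_LOCATION_SHIFT) | (index << TOY_INDEX_SHIFT);
}

static uint32_t
toy_frag_output_location(uint32_t packed)
{
   return (packed & toy_field_mask(TOY_LOCATION_SHIFT, TOY_LOCATION_BITS))
          >> TOY_LOCATION_SHIFT;
}

static uint32_t
toy_frag_output_index(uint32_t packed)
{
   return (packed & toy_field_mask(TOY_INDEX_SHIFT, TOY_INDEX_BITS))
          >> TOY_INDEX_SHIFT;
}
```

Unlike a linear offset into an output array, the packed value can be decoded without knowing the type or size of any other output.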
Francisco Jerez
4e990b67ce i965: Fix undefined signed overflow in INTEL_MASK for bitfields of 31 bits.
Most likely we had only ever used this macro on bitfields of less than
31 bits -- That's going to change shortly.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
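To illustrate the overflow addressed by 4e990b67ce: building a mask by shifting a signed 1 left by 31 bits is undefined behavior in C, while the same shift on an unsigned constant is well defined. The sketch below is not the actual Mesa definition; TOY_MASK and toy_mask() are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Bits [low, high] set, everything else clear.  The unsigned constant
 * keeps the intermediate shift well defined for fields up to 31 bits
 * wide; with a plain signed 1, a 31-bit field would evaluate 1 << 31,
 * which overflows int. */
#define TOY_MASK(high, low) (((1u << ((high) - (low) + 1)) - 1) << (low))

/* Function form with precondition checks.  A full 32-bit field is
 * excluded because 1u << 32 is itself undefined. */
static uint32_t
toy_mask(unsigned high, unsigned low)
{
   assert(low <= high && high < 32);
   assert(high - low + 1 < 32);
   return TOY_MASK(high, low);
}
```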
Francisco Jerez
f3cb2c34f2 i965/fs: Special-case nir_intrinsic_store_output for the fragment shader.
I'm about to change how fragment shader output locations are
represented, so the generic nir_intrinsic_store_output implementation
that assumes that outputs are just contiguous elements in the big
nir_outputs array won't work anymore.  This somewhat simplified
implementation of nir_intrinsic_store_output for fragment shaders
should be functionally equivalent to the current fall-back one.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
af0cc743e6 i965/fs: Implement non-coherent framebuffer fetch using the sampler unit.
v2: Memoize sample ID, misc codestyle changes. (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
fe6abb5755 i965/fs: Emit interpolation setup if non-coherent framebuffer fetch is in use.
This will be required for the next commit since the non-coherent path
makes use of the fragment coordinates implicitly, so they need to be
calculated.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
98d61ee083 i965/fs: Force per-sample dispatch if the shader reads from a multisample FBO.
The result of a framebuffer fetch from a multisample FBO is inherently
per-sample, so the spec requires at least those sections of the shader
that depend on the framebuffer fetch result to be executed once per
sample.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
08705badfe i965: Allocate space in the binding table for non-coherent FB fetch.
Unfortunately due to the inconsistent meaning of some surface state
structure fields, we cannot re-use the same binding table entries for
sampling from and rendering into the same set of render buffers, so we
need to allocate a separate binding table block specifically for
render target reads if the non-coherent path is in use.

The slight noise is due to the change of
brw_assign_common_binding_table_offsets to return the next available
binding table index rather than void.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
40b23ad57e i965/fs: Add brw_wm_prog_key bit specifying whether FB reads should be coherent.
Some of the following changes in this series are specific to the
non-coherent path, so I need some way to tell whether the coherent or
non-coherent path is in use.  The flag defaults to the value of the
gl_extensions::MESA_shader_framebuffer_fetch enable so that it can be
overridden easily on hardware that supports both framebuffer fetch
extensions in order to test the non-coherent path, like:

 MESA_EXTENSION_OVERRIDE=-GL_EXT_shader_framebuffer_fetch

(Of course trying to force-enable the coherent framebuffer fetch
extension on hardware without native support won't work and will lead to
assertion failures).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:06 -07:00
Francisco Jerez
4a87e4ade7 i965/fs: Get rid of fs_visitor::do_dual_src.
This boolean flag was being used for two different things:

 - To set the brw_wm_prog_data::dual_src_blend flag.  Instead we can
   just set it based on whether the dual_src_output register is valid,
   which will be the case if the shader writes the secondary blending
   color.

 - To decide whether to call emit_single_fb_write() once, or in a loop
   that would iterate only once, which seems pretty useless.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:36:00 -07:00
Francisco Jerez
aee3d8f0d9 nir: Handle FB fetch outputs correctly in nir_lower_io_to_temporaries.
This requires emitting a series of copies at the top of the program
from each output variable to the corresponding temporary.  The initial
copy can be skipped for non-framebuffer fetch outputs whose initial
value is undefined, and the final copy needs to be skipped for
read-only outputs (i.e. gl_LastFragData), since it would be illegal to
emit a store output intrinsic for it.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:33:29 -07:00
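The copy rules stated in aee3d8f0d9 can be summarized as a small decision function: copy the output into its temporary at the top only for framebuffer fetch outputs, and copy the temporary back at the end unless the output is read-only. The types and names below are hypothetical and only model the decision, not the NIR pass itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_output {
   bool fb_fetch_output;   /* initial value comes from the framebuffer */
   bool read_only;         /* e.g. gl_LastFragData */
};

struct toy_copy_plan {
   bool copy_in;    /* output -> temporary at the top of the program */
   bool copy_out;   /* temporary -> output at the end of the program */
};

static struct toy_copy_plan
toy_plan_output_copies(const struct toy_output *out)
{
   struct toy_copy_plan plan;

   assert(out != NULL);
   /* An ordinary output starts undefined, so no initial copy. */
   plan.copy_in = out->fb_fetch_output;
   /* Storing to a read-only output would be illegal. */
   plan.copy_out = !out->read_only;
   return plan;
}
```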
Francisco Jerez
97ac3eba58 nir: Pass through fb_fetch_output and OutputsRead from GLSL IR.
The NIR representation of framebuffer fetch is the same as the GLSL
IR's until interface variables are lowered away, at which point it
will be translated to load output intrinsics.  The GLSL-to-NIR pass
just needs to copy the bits over to the NIR program.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-25 18:33:29 -07:00
Eric Anholt
00c72acba5 vc4: Add support for fddx/fddy
Based vaguely on a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt
e763e19808 vc4: Add register allocation support for MUL output rotation.
We need the source to be in r0-r3, so make a new register class for it.
It will be up to the surrounding passes to make sure that the r0-r3
allocation of its source won't conflict with any other class
requirements on that temp.
2016-08-25 17:24:11 -07:00
Eric Anholt
8ce6526178 vc4: Add support for MUL output rotation.
Extracted from a patch by jonasarrow on github.
2016-08-25 17:24:11 -07:00
Eric Anholt
074f1f3c0c vc4: Add support for the 2-bit LOAD_IMM variants.
Extracted and fixed up from a patch by jonasarrow on github.  This ended
up not getting used for ddx/ddy, but seems like it might still be useful.
2016-08-25 17:24:11 -07:00
Eric Anholt
3da4e38f48 vc4: Add QPU scheduling to handle MUL rotate sources.
We need MUL rotates to do ddx/ddy support.
2016-08-25 17:24:11 -07:00
Eric Anholt
b0b99a7952 vc4: Add disassembly for constant MUL rotates
2016-08-25 17:24:11 -07:00
Eric Anholt
b160708e03 vc4: Add real validation for MUL rotation.
Caught problems in the upcoming DDX/DDY implementation.
2016-08-25 17:24:11 -07:00
Eric Anholt
31da39ddc9 vc4: Add a QIR value for the QPU element register.
This will be used in the ddx/ddy support for "Am I the top half?" or "Am I
the left half?" checks.
2016-08-25 17:24:11 -07:00
Chad Versace
5b03975889 i965: Respect miptree offsets in intel_readpixels_tiled_memcpy()
Respect intel_miptree_slice::x_offset,y_offset and
intel_mipmap_tree::offset. All three may be non-zero when glReadPixels
is called on an EGLImage created from the non-base slice of a miptree.

Patch 2/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Change-Id: I4b397b27e55a743a7094d29fb0a6a4b6b34352b0
2016-08-25 16:52:00 -07:00
Chad Versace
c82f99e883 i965: Fix miptree layout for EGLImage-based renderbuffers
When glEGLImageTargetRenderbufferStorageOES() was given an EGLImage
created from the non-base slice of a miptree,
intel_image_target_renderbuffer_storage() forgot to apply the intra-tile
offsets __DRIimage::tile_x,tile_y to the miptree layout.

This patch fixes the problem with a quick hack suitable for
cherry-picking. A proper fix requires more thorough plumbing in
intel_miptree_create_layout() and brw_tex_layout().

Patch 1/2 that fixes test
'dEQP-EGL.functional.image.create.gles2_cubemap_*'.

Reported-by: Haixia Shi <hshi@chromium.org>
Diagnosed-by: Haixia Shi <hshi@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: mesa-stable@lists.freedesktop.org
Change-Id: I8a64b0048a1ee9e714ebb3f33fffd8334036450b
2016-08-25 16:52:00 -07:00
Jason Ekstrand
bebc1a1d99 intel: Flatten the makefile structure
This pulls isl and genxml into a single make file so that they can properly
build in parallel.  This isn't terribly important now as genxml just
generates sources which happens serially first anyway but it will be more
important as we add more stuff to src/intel.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 15:29:48 -07:00
Jason Ekstrand
c19fc5e019 isl/tests: Use a longer path for isl.h
The tests assumed that isl would be in the include path but that usually
isn't the case.  Instead, we usually have src/intel and you need to add an
"isl/" prefix.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 15:29:47 -07:00
Jason Ekstrand
8bdf605214 intel/isl/gen9: Only use the magic 1D alignment for GEN9_1D surfaces
If the surface has a layout of GEN4_2D then we need to compute a normal 2D
alignment and not use the magic linear 1D alignment.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:11:15 -07:00
Jason Ekstrand
cda1a5dc0e intel/isl: Pass the dim_layout into choose_alignment_el
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:10:43 -07:00
Jason Ekstrand
f68cfb05fa intel/isl: Use DIM_LAYOUT_GEN4_2D for tiled 1-D surfaces on SKL
The Sky Lake 1D layout is only used if the surface is linear.  For tiled
surfaces such as depth and stencil the old gen4 2D layout is used.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
2016-08-25 14:09:44 -07:00
Jason Ekstrand
78715c7211 nir/phi_builder: Don't recurse in value_get_block_def
In some programs, we can have very deep dominance trees and the recursion
can cause us to risk stack overflows.  Instead, we replace the recursion
with a pair of loops, one at the start and one at the end.  This is
functionally equivalent to what we had before and it's actually a bit
easier to read in the new form without the recursion.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-25 14:08:07 -07:00
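The "pair of loops" approach from 78715c7211 can be sketched on a toy dominance tree: one loop walks up to the nearest dominator that already has a definition, and a second loop caches the result in every block passed on the way. This is a simplified model, not the nir_phi_builder code; struct toy_block and toy_get_block_def() are hypothetical, and definitions are represented as plain int pointers.

```c
#include <assert.h>
#include <stddef.h>

struct toy_block {
   struct toy_block *idom;   /* immediate dominator, NULL at the root */
   const int *def;           /* cached definition, NULL if unknown */
};

/* Iterative replacement for a recursive "ask my dominator" lookup.
 * 'undef' is returned when no dominator has a definition. */
static const int *
toy_get_block_def(struct toy_block *block, const int *undef)
{
   struct toy_block *dom;
   const int *def;

   assert(block != NULL && undef != NULL);

   /* First loop: find the nearest dominator with a known definition. */
   dom = block;
   while (dom != NULL && dom->def == NULL)
      dom = dom->idom;

   def = dom != NULL ? dom->def : undef;

   /* Second loop: cache the result in every block walked past, so
    * later lookups stop immediately. */
   for (dom = block; dom != NULL && dom->def == NULL; dom = dom->idom)
      dom->def = def;

   return def;
}
```

Stack usage stays constant regardless of how deep the dominance tree is.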
Chad Versace
3eddf5219e .mailmap: Update my address again
I joined Google's Chrome OS graphics team.
2016-08-25 13:55:52 -07:00
Matt Turner
e53130cc27 nir: Walk blocks in source code order in lower_vars_to_ssa.
Prior to this commit rename_variables_block() is recursively called,
performing a depth-first traversal of the control flow graph. The
function uses a non-trivial amount of stack space for local variables,
which puts us in danger of smashing the stack, given a sufficiently deep
dominance tree.

XCOM: Enemy Within contains a shader with such a dominance tree (1574
nir_blocks in total, depth of at least 143).

Jason tells me that he believes that any walk over the nir_blocks that
respects dominance is sufficient (a DFS might have been necessary prior
to the introduction of nir_phi_builder).

In fact, the introduction of nir_phi_builder made the problem worse:
rename_variables_block() walks to the bottom of the dominance tree
before calling nir_phi_builder_value_get_block_def() which walks back to
the top of the dominance tree...

In any case, this patch ensures we avoid that problem as well.

Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97225
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-08-25 13:45:39 -07:00
Marek Olšák
a491b9e945 radeonsi: don't use allocas for arrays with LLVM 3.8
It crashes.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97413
2016-08-25 21:19:17 +02:00