1089 Commits

Author SHA1 Message Date
Francisco Jerez
92649a3e67 i965/vec4: fix assert to detect SIMD lowered DF instructions in IVB
On IVB, DF instructions have lowered the SIMD width to 4 but the
exec_size will be later doubled. Fix the assert to avoid crashing in
this case.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Simplify assert.  Except for the 'inst->group % 4
  == 0' part the assertion was redundant with the previous assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
6e3265eae5 i965/vec4: split VEC4_OPCODE_FROM_DOUBLE into one opcode per destination's type
This way we can set the destination type as double to all these new opcodes,
avoiding any optimizer's confusion that was happening before.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop no_spill workaround originally needed due to
  the bogus destination type of VEC4_OPCODE_FROM_DOUBLE. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
50a5217637 i965/vec4: split d2x conversion and data gathering from one opcode to two explicit ones
When converting from a 64-bit type to a smaller data type, the destination should
be aligned to 64-bits. Because of that, we need to gather the data after the
actual conversion.

Until now, these two operations were done by VEC4_OPCODE_FROM_DOUBLE but
now we split them explicitly in two different instructions:
VEC4_OPCODE_FROM_DOUBLE just does the conversion and
VEC4_OPCODE_PICK_LOW_32BIT will gather the data.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
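As a rough illustration of the split described in 50a5217637, the sketch below models the two steps on the CPU: a conversion that writes each 32-bit result into a 64-bit aligned slot, followed by a gather of the low dwords. The helper names `convert_to_qword_slots` and `pick_low_32bit` are hypothetical and only mirror the roles of VEC4_OPCODE_FROM_DOUBLE and VEC4_OPCODE_PICK_LOW_32BIT; this is not the driver code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

// Step 1 (hypothetical model of the conversion): each double is converted to
// float and stored in the low 32 bits of a 64-bit slot. The high 32 bits are
// zeroed here and stand in for data the caller must not rely on.
std::vector<uint64_t> convert_to_qword_slots(const std::vector<double> &src)
{
   std::vector<uint64_t> slots;
   slots.reserve(src.size());
   for (double d : src) {
      const float f = static_cast<float>(d);
      uint32_t bits;
      std::memcpy(&bits, &f, sizeof(bits));
      slots.push_back(static_cast<uint64_t>(bits));
   }
   return slots;
}

// Step 2 (hypothetical model of the gather): pack the low dword of every
// 64-bit slot into a contiguous array of floats.
std::vector<float> pick_low_32bit(const std::vector<uint64_t> &slots)
{
   std::vector<float> out;
   out.reserve(slots.size());
   for (uint64_t s : slots) {
      const uint32_t bits = static_cast<uint32_t>(s & 0xffffffffu);
      float f;
      std::memcpy(&f, &bits, sizeof(f));
      out.push_back(f);
   }
   return out;
}
```

Calling `pick_low_32bit(convert_to_qword_slots(values))` yields the packed floats; keeping the steps separate is what lets each one carry an accurate destination type.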
Juan A. Suarez Romero
cfaf14a126 i965/vec4: fix VEC4_OPCODE_FROM_DOUBLE for IVB/BYT
In the generator we must generate slightly different code for
Ivybridge/Baytrail, because of the way the stride works in
this hardware.

v2:
- Use stride and don't need to fix dst (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
be445d3ea3 i965/vec4: keep original type when dealing with null registers
Keep the original type when dealing with null registers. Especially
because we do not want to introduce an implicit conversion between
types that could affect the conditional flags.

This affects especially when the original type is DF, and we are working
on Ivybridge/Baytrail.

v2 (Curro)
- Fix typo.
- Use retype() instead of applying the type directly.
- Remove unneeded retype.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a21dc2b500 i965/vec4: split DF instructions and later double its execsize in IVB/BYT
We need to split DF instructions in two on IVB/BYT as it needs an
execsize 8 to process 4 DF values (one GRF in total).

v2:
- Rename helper and make it static inline function (Matt).
- Fix indentation and add braces (Matt).

v3:
- Don't edit IR instruction when doubling exec_size (Curro)
- Add comment into the code (Curro).
- Manage ARF registers like the others (Curro)

v4:
- Add get_exec_type() function and use it to calculate the execution
  size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Take
  destination type as execution type where there is no valid source.
  Assert-fail if the deduced execution type is byte.  Clarify comment
  in get_lowered_simd_width().  Move SIMD width workaround outside of
  'if (...inst->size_written > REG_SIZE)' conditional block, since the
  problem should be independent of whether the amount of data written
  by the instruction is greater or lower than a GRF.  Drop redundant
  is_ivb_df definition.  Drop bogus inst->exec_size < 8 check.
  Simplify channel group assertion. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Samuel Iglesias Gonsálvez
a5399e8b1c i965/fs: lower all non-force_writemask_all DF instructions to SIMD4 on IVB/BYT
The hardware applies the same channel enable signals to both halves of
the compressed instruction which will be just wrong under non-uniform
control flow. Fix this by splitting those instructions to SIMD4.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:08 -07:00
Francisco Jerez
ebfb703d44 i965/fs: Get 64-bit indirect moves working on IVB.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Matt Turner
630b84cdc8 i965: Use source region <1,2,0> when converting to DF.
Doing so allows us to use a single MOV in VEC4_OPCODE_TO_DOUBLE instead
of two.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:08 -07:00
Juan A. Suarez Romero
3198ce3f96 i965/fs: fix lower SIMD width for IVB/BYT's MOV_INDIRECT
According to the IVB and HSW PRMs:

"2.When the destination requires two registers and the sources are
 indirect, the sources must use 1x1 regioning mode."

So for DF instructions the execution size is not limited by the number
of address registers that are available, but by the EU decompression
logic not handling VxH indirect addressing correctly.

This patch limits the SIMD width to 4 in this case.

v2:
- Fix typo (Matt).
- Fix condition (Curro)

v3:
- Add spec quote (Curro)

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
571cbd05eb i965/fs: fix dst stride in IVB/BYT type conversions
When converting DF to a 32-bit type, we set dst stride to 2,
to fulfill alignment restrictions because the upper Dword of every
Qword will be written with undefined value.

But in IVB/BYT, this is not necessary, as each DF conversion already
writes two dwords: the first one holds the real value and the second one a 0.
That is, IVB/BYT already set stride = 2 implicitly, so we must set it to
1 explicitly to avoid ending up with stride = 4.

v2:
- Fix typo (Matt)

v3:
- Fix stride in the destination's brw_reg, don't modify IR (Curro)

v4:
- Remove 'is_dst' argument of brw_reg_from_fs_reg() (Curro)
- Fix comment (Curro).
- Relax hstride assert (Curro)

Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Minor spelling fixes. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
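The stride arithmetic in 571cbd05eb can be sketched as follows. This is a minimal model, assuming (as the commit message states) that IVB/BYT implicitly write two dwords per converted element; the function names are hypothetical and do not exist in the driver.

```cpp
#include <cassert>

// Hypothetical helper: destination stride to program for a DF -> 32-bit
// conversion whose results should land in every other dword.
unsigned programmed_dst_stride(bool is_ivb_or_byt)
{
   const unsigned wanted_stride = 2;  // one result per 64-bit slot
   // Assumption from the commit message: IVB/BYT write two dwords per
   // conversion, which behaves like an implicit stride of 2.
   const unsigned implicit_factor = is_ivb_or_byt ? 2 : 1;
   assert(wanted_stride % implicit_factor == 0);
   return wanted_stride / implicit_factor;
}

// Stride, in dwords, that results once the implicit factor is applied.
unsigned effective_dst_stride(bool is_ivb_or_byt)
{
   const unsigned implicit_factor = is_ivb_or_byt ? 2 : 1;
   return programmed_dst_stride(is_ivb_or_byt) * implicit_factor;
}
```

In this model both platform classes end up with an effective stride of 2; programming 2 on IVB/BYT would instead give the stride of 4 that the commit avoids.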
Samuel Iglesias Gonsálvez
af6fc3a8ea i965/fs: rename lower_d2x to lower_conversions
v2:
- Change the name to lower_conversions.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
dee31311eb Revert "i965/fs: Don't emit SEL instructions for type-converting MOVs."
This reverts commit 7dccd38b40.

The d2x pass fixes SEL instructions that perform a type conversion
by doing a SEL without type conversion and then converting the result.
This pass also takes non-uniform control flow into account.

Therefore, 7dccd38b40 is no longer needed.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-14 14:56:07 -07:00
Samuel Iglesias Gonsálvez
aeecc82d05 i965/fs: generalize the legalization d2x pass
Generalize it to lower any unsupported narrower conversion.

v2 (Curro):
- Add supports_type_conversion()
- Reuse existing instruction instead of cloning it.
- Generalize d2x to narrower and equal size conversions.

v3 (Curro):
- Make supports_type_conversion() const and improve it.
- Use foreach_block_and_inst to process added instructions.
- Simplify code.
- Add assert and improve comments.
- Remove redundant mov.
- Remove useless comment.
- Remove saturate == false assert and add support for saturation
  when fixing the conversion.
- Add get_exec_type() function.

v4 (Curro):
- Use get_exec_type() function to get sources' type.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Matt Turner
94ffeb7fa2 i965: Use <0,2,1> region for scalar DF sources on IVB/BYT.
On HSW+, scalar DF sources can be accessed using the normal <0,1,0>
region, but on IVB and BYT DF regions must be programmed in terms of
floats. A <0,2,1> region accomplishes this.

v2:
- Apply region <0,2,1> in brw_reg_from_fs_reg() (Curro).

v3:
- Added comment explaining the reason (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
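To see why a <0,2,1> region works for a scalar DF source in 94ffeb7fa2, the sketch below computes the byte offset each channel reads for a region <vstride,width,hstride>, using the usual row/column interpretation of regions. The `Region` struct and `channel_offsets` function are hypothetical illustration code, and the doubled execution size used for IVB/BYT is an assumption taken from the related commits in this log.

```cpp
#include <cassert>
#include <vector>

// Region description <vstride,width,hstride>, all counted in elements.
struct Region {
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

// Byte offset read by each channel: channels are laid out in rows of
// `width` elements, rows are `vstride` elements apart and elements within
// a row are `hstride` elements apart.
std::vector<unsigned> channel_offsets(Region r, unsigned exec_size,
                                      unsigned elem_bytes)
{
   assert(r.width > 0 && exec_size % r.width == 0);
   std::vector<unsigned> offsets;
   offsets.reserve(exec_size);
   for (unsigned ch = 0; ch < exec_size; ch++) {
      const unsigned row = ch / r.width;
      const unsigned col = ch % r.width;
      offsets.push_back((row * r.vstride + col * r.hstride) * elem_bytes);
   }
   return offsets;
}
```

With 8-byte elements, `channel_offsets({0, 1, 0}, 4, 8)` reads offset 0 for every channel. With 4-byte elements and a doubled execution size, `channel_offsets({0, 2, 1}, 8, 4)` alternates between offsets 0 and 4, so every pair of channels reads both halves of the same double.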
Samuel Iglesias Gonsálvez
82d17615f4 i965/fs: clamp exec_size when an instruction has a scalar DF source
Then the SIMD lowering pass will get rid of any compressed instructions with scalar
source (whether force_writemask_all or not) and we avoid hitting the Gen7 region
decompression bug.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
Juan A. Suarez Romero
0f1316d4db i965/fs: double regioning parameters and execsize for DF in IVB/BYT
In IVB and BYT, both regioning parameters and execution sizes are measured as
32-bits element size.

So when we have something like:

mov(8) g2<1>DF g3<4,4,1>DF

We are not actually moving 8 doubles (our intention), but 4 doubles.

We need to double the parameters to cope with this issue. However,
horizontal strides don't behave as they're supposed to on IVB
for DF regions, they will cause each 32-bit half of DF sources to be
strided individually, and doubling the value won't make any difference.

v2:
- Use devinfo directly (Matt).
- Use Baytrail instead of Valleview (Matt).
- Use IvyBridge instead of Ivy (Matt)
- Double the exec_size in code emission (Curro)

v3:
- Change hstride doubling by an assert and fix commit log (Curro).
- Substitute remaining compiler->devinfo by devinfo (Curro).

v4:
- Fix comment (Curro).

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
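The doubling described in 0f1316d4db can be sketched as a small transformation. `DfInst` and `to_ivb_units` are hypothetical names used only for illustration; the assert mirrors the commit's note that the horizontal stride is checked rather than doubled.

```cpp
#include <cassert>

// Parameters of a DF instruction, counted in 64-bit elements.
struct DfInst {
   unsigned exec_size;
   unsigned vstride;
   unsigned width;
   unsigned hstride;
};

// Hypothetical helper: re-express the parameters in 32-bit elements, which
// is how IVB/BYT interpret them according to the commit message.
DfInst to_ivb_units(DfInst inst)
{
   // Doubling hstride would not have the intended effect on IVB, so this
   // sketch only accepts packed sources.
   assert(inst.hstride == 1);
   inst.exec_size *= 2;
   inst.vstride *= 2;
   inst.width *= 2;
   return inst;
}
```

For example, a 4-wide DF move with a <4,4,1> source region becomes an 8-wide instruction with an <8,8,1> region in this model.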
Juan A. Suarez Romero
79af256388 i965/fs: add helper to retrieve instruction execution type
The execution data size is the biggest type size of any instruction
operand.

We will use it to know if the instruction deals with DF, because in Ivy
we need to double the execution size and regioning parameters.

v2:
- Fix typo in commit log (Matt)
- Use static inline function instead of fs_inst's method (Curro).
- Define the result as a constant (Curro).
- Fix indentation (Matt).
- Add braces to nested control flow (Matt).

v3 (Curro):
- Add get_exec_type() and other auxiliary functions and use them to
  calculate its size.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Fix bogus 'type != BAD_FILE' check.  Fix deduced
  execution type for integer vector types.  Take destination type as
  execution type where there is no valid source.  Assert-fail if the
  deduced execution type is byte.  Move into brw_ir_fs.h header for
  consistency with the VEC4 back-end. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:07 -07:00
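A minimal sketch of the idea behind the helper in 79af256388 follows. It works on plain type sizes in bytes rather than on the driver's register types, and it follows the bracketed note: the sources decide the execution size, with the destination used only when there is no valid source. `exec_type_size` is a hypothetical name, not the actual get_exec_type() implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// src_sizes: type size in bytes of every valid source operand.
// dst_size:  type size in bytes of the destination.
unsigned exec_type_size(const std::vector<unsigned> &src_sizes,
                        unsigned dst_size)
{
   unsigned size = 0;
   for (unsigned s : src_sizes)
      size = std::max(size, s);

   // No valid source: fall back to the destination type.
   if (size == 0)
      size = dst_size;

   // A byte-sized execution type is not expected.
   assert(size > 1);
   return size;
}
```

A result of 8 indicates that the instruction deals with DF data and therefore needs the IVB/BYT doubling of execution size and regioning parameters.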
Matt Turner
fd349d29e4 i965: Handle IVB DF differences in the validator.
On IVB/BYT, region parameters and execution size for DF are in terms of
32-bit elements, so they are doubled. For evaluating the validity of an
instruction, we halve them.

v2 (Sam):
- Add comments.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-04-14 14:56:07 -07:00
Iago Toral Quiroga
fbac8b1f94 i965/disasm: also print nibctrl in IVB for execsize=8
4-wide DF operations where NibCtrl applies require an execsize of 8
in IvyBridge/BayTrail.

v2:
- Refactor NibCtrl printing (Matt)

Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2017-04-14 14:56:06 -07:00
Francisco Jerez
147e71242c i965/fs: Take into account lower frequency of conditional blocks in spilling cost heuristic.
The individual branches of an if/else/endif construct will be executed
some unknown number of times between 0 and 1 relative to the parent
block.  Use some factor in between as weight while approximating the
cost of spill/fill instructions within a conditional if-else branch.
This favors spilling registers used within conditional branches which
are likely to be executed less frequently than registers used at the
top level.

Improves the framerate of the SynMark2 OglCSDof benchmark by ~1.9x on
my SKL GT4e.  Should have a comparable effect on other platforms.  No
significant regressions.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-11 15:28:54 -07:00
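The weighting idea in 147e71242c can be sketched like this. The commit only says "some factor in between" 0 and 1; the default of 0.5 below is an assumption for illustration, and `RegUse` and `spill_cost` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// One use of a register, tagged with how many if/else branches enclose it.
struct RegUse {
   unsigned conditional_depth;
};

// Approximate cost of spilling a register: every use costs 1 at the top
// level and is scaled by `branch_weight` for each enclosing conditional.
double spill_cost(const std::vector<RegUse> &uses, double branch_weight = 0.5)
{
   assert(branch_weight > 0.0 && branch_weight < 1.0);
   double cost = 0.0;
   for (const RegUse &use : uses) {
      double weight = 1.0;
      for (unsigned i = 0; i < use.conditional_depth; i++)
         weight *= branch_weight;
      cost += weight;
   }
   return cost;
}
```

Two registers with the same number of uses then get different costs if one is used only inside conditional branches, which makes that register the cheaper spill candidate.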
Jason Ekstrand
3503b2714b i965/fs: Always provide a default LOD of 0 for TXS and TXL
We already provide a default LOD for textureQueryLevels and texture() on
non-fragment stages.  However, there are more cases where one is needed
such as textureSize(gsampler2DMS*) in SPIR-V.  Instead of trying to list
out all of the cases one at a time, just provide the default for all TXS
and TXL operations.  This fixes a shader validation error in the new
Sascha deferredmultisampling demo which uses textureSize(gsampler2DMS).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100391
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-04-04 18:33:35 -07:00
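The rule in 3503b2714b amounts to the following, shown here as a standalone sketch. `TexOp`, `Lod` and `with_default_lod` are hypothetical stand-ins, not the types used by the back-end.

```cpp
#include <cassert>

enum class TexOp { TEX, TXL, TXS, TXF };

// LOD argument of a texture operation; `present` is false when the shader
// did not supply one.
struct Lod {
   bool present;
   float value;
};

// Supply an LOD of 0 for every TXS and TXL operation that lacks one,
// instead of enumerating the individual cases that need it.
Lod with_default_lod(TexOp op, Lod lod)
{
   const bool needs_lod = (op == TexOp::TXS || op == TexOp::TXL);
   if (needs_lod && !lod.present)
      lod = Lod{true, 0.0f};
   assert(!needs_lod || lod.present);
   return lod;
}
```

An explicitly provided LOD is left untouched, and other operations are not affected.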
Jason Ekstrand
405ef7bb33 intel/vec4: Add some fall through comments
Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-04-03 16:58:35 -07:00
Alejandro Piñeiro
2f8d6bd578 i965: expose BRW_OPCODE_[F32TO16/F16TO32] name on gen8+
Technically those hw operations are only available on gen7, as gen8+
support the conversion on the MOV. But, when using the builder to
implement nir operations (example: nir_op_fquantize2f16), it is not
needed to do the gen check. This check is done later, on the final
emission at brw_F32TO16 (brw_eu_emit), choosing between the MOV or the
specific operation accordingly.

So in the middle, during optimization phases those hw operations can
be around for gen8+ too.

Without this patch, several (at least 95) vulkan-cts quantize tests
crash when using INTEL_DEBUG=optimizer. For example:
dEQP-VK.spirv_assembly.instruction.graphics.opquantize.too_small_vert

v2: simplify the code using GEN_GE (Ilia Mirkin)
v3: tweak brw_instruction_name instead of changing opcode_descs
    table, that is used for validation (Matt Turner)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2017-03-29 17:34:15 +02:00
Matt Turner
7dccd38b40 i965/fs: Don't emit SEL instructions for type-converting MOVs.
SEL can only convert between a few integer types, which we basically
never do.

Fixes fs/vs-double-uniform-array-direct-indirect-non-uniform-control-flow
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Acked-by: Francisco Jerez <currojerez@riseup.net>
2017-03-27 10:59:42 -07:00
Iago Toral Quiroga
ddb2bb3ed4 anv/pipeline: make FragCoord include sample positions when sample shading
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.

Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).

Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so the key's
value is always false (the default).

Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-24 08:11:53 +01:00
Matt Turner
7499bc7fd7 i965: Replace OPT_V() with OPT().
We want to be able to check the progress of each pass and dump the NIR
for debugging purposes if it changed.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
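The motivation in 7499bc7fd7, knowing per pass whether anything changed, can be sketched with a small runner. This is not the OPT() macro itself; `Shader`, `Pass`, `run_pass` and `remove_zeros` are hypothetical names chosen for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a shader: a list of integers a pass may rewrite.
using Shader = std::vector<int>;
using Pass = std::function<bool(Shader &)>;

// Run one pass, fold its result into the overall `progress` flag and return
// the per-pass result so the caller can react to it (e.g. dump the shader).
bool run_pass(Shader &shader, const Pass &pass, bool &progress)
{
   const bool this_progress = pass(shader);
   if (this_progress)
      progress = true;
   return this_progress;
}

// Example pass: drops zero entries and reports whether it removed any.
bool remove_zeros(Shader &shader)
{
   const auto old_size = shader.size();
   shader.erase(std::remove(shader.begin(), shader.end(), 0), shader.end());
   return shader.size() != old_size;
}
```

A pass that returns nothing cannot take part in this scheme, which is why the following commits make demote_sample_qualifiers() and move_interpolation_to_top() return progress.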
Matt Turner
1be91bd9d8 i965/fs: Return progress from demote_sample_qualifiers().
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Matt Turner
fd3351246c i965/fs: Return progress from move_interpolation_to_top().
And mark as static at the same time.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-23 14:34:44 -07:00
Emil Velikov
2438c0a236 intel/compiler: consistently use ifndef guards over pragma once
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:22 +00:00
Emil Velikov
3b277bae66 i965: make brw_setup_image_uniform_values static
Used only internally.

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Vedran Miletić <vedran@miletic.net>
Acked-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Reviewed-by: Edward O'Callaghan <funfunctor@folklore1984.net>
2017-03-22 16:55:21 +00:00
Jason Ekstrand
762a6333f2 nir: Rework conversion opcodes
The NIR story on conversion opcodes is a mess.  We've had way too many
of them, naming is inconsistent, and which ones have explicit sizes was
sort-of random.  This commit re-organizes things and makes them all
consistent:

 - All non-bool conversion opcodes now have the explicit size in the
   destination and are named <src_type>2<dst_type><size>.

 - Integer <-> integer conversion opcodes now only come in i2i and u2u
   forms (i2u and u2i have been removed) since the only difference
   between the different integer conversions is whether or not they
   sign-extend when up-converting.

 - Boolean conversion opcodes all have the explicit size on the bool and
   are named <src_type>2<dst_type>.

Making things consistent also allows nir_type_conversion_op to be moved
to nir_opcodes.c and auto-generated using mako.  This will make adding
int8, int16, and float16 versions much easier when the time comes.

Reviewed-by: Eric Anholt <eric@anholt.net>
2017-03-14 07:36:40 -07:00
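The naming rules listed in 762a6333f2 can be captured in a few lines. The function below is a hypothetical sketch of the scheme, not the mako-generated nir_type_conversion_op; base types are abbreviated as single characters ('f', 'i', 'u', 'b').

```cpp
#include <cassert>
#include <string>

// Build a conversion opcode name following the rules in the commit message:
//  - non-bool conversions: <src_type>2<dst_type><size>
//  - conversions to or from bool: <src_type>2<dst_type>
//  - integer <-> integer conversions exist only as i2i and u2u
std::string conversion_op_name(char src, char dst, unsigned dst_bits)
{
   assert(!((src == 'i' && dst == 'u') || (src == 'u' && dst == 'i')));

   std::string name;
   name += src;
   name += '2';
   name += dst;
   if (src != 'b' && dst != 'b')
      name += std::to_string(dst_bits);
   return name;
}
```

For instance, `conversion_op_name('f', 'i', 32)` gives `f2i32`, while `conversion_op_name('f', 'b', 32)` gives `f2b` because bool conversions carry no size suffix in this scheme.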
Jason Ekstrand
7107b32155 i965/fs: Re-arrange conversion operations
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
bab4610e9c i965/vec4: Get rid of the type parameter from to/from_double
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-03-14 07:36:40 -07:00
Jason Ekstrand
b377be9213 i965/fs: Use num_components from the SSA def in image intrinsics
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2017-03-14 07:36:40 -07:00
Iago Toral Quiroga
e8eeb759b7 intel: fix compiler build
compiler/brw_vec4_gs_visitor.cpp:744:39: error:
‘GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES’ was not declared in this scope
           output_vertex_size_bytes <= GEN7_MAX_GS_OUTPUT_VERTEX_SIZE_BYTES);

Fixes: d0d4a5f43b ("i965: split EU defines to brw_eu_defines.h")
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-03-13 13:09:24 +01:00
Emil Velikov
aa09c9552c intel/compiler: whitespace cleanups
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Emil Velikov
bdc5036464 intel/compiler: link all tests against gtest, even test_eu_compact
At the moment all the tests but test_eu_compact are actual C++ gtests.
To simplify things, we can move the gtest.la to the common TEST_LIBS.
As we're here, we can change the test extension [to .cpp] to
avoid using the confusing dummy.cpp.

Add a nice comment in the makefile for posterity.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:35 +00:00
Jason Ekstrand
700bebb958 i965: Move the back-end compiler to src/intel/compiler
Mostly a dummy git mv with a couple of noticeable parts:
 - With the earlier header cleanups, nothing in src/intel depends on
files from src/mesa/drivers/dri/i965/
 - Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
 - brw_util.[ch] is not really compiler specific, so it's moved to i965.

v2:
 - move brw_eu_defines.h instead of brw_defines.h
 - remove no-longer applicable includes
 - add missing vulkan/ prefix in the Android build (thanks Tapani)

v3:
 - don't list brw_defines.h in src/intel/Makefile.sources (Jason)
 - rebase on top of the oa patches

[Emil Velikov: commit message, various small fixes throughout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2017-03-13 11:16:34 +00:00