Prior to gen8, the flag [sub]register number is in a different spot on
3src instructions than on other instructions. Starting with Broadwell,
they made it consistent. This commit fixes bugs that occur when a
conditional modifier gets propagated into a 3src instruction such as a
MAD.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of doing a memcpy, this moves us to start with a blank
instruction (memset to zero) and copy the fields over one at a time.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Adds suppport for ARB_fragment_shader_interlock. We achieve
the interlock and fragment ordering by issuing a memory fence
via sendc.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT. When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The only reason it was it's own opcode was so that we could detect it
and adjust the source register based on the payload setup. Now that
we're using the ATTR file for FS inputs, there's no point in having a
magic opcode for this.
v2 (Jason Ekstrand):
- Break the bit which removes the CINTERP opcode into its own patch
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.
v2 (Jason Ekstrand):
- Break into multiple patches
- Change the units of the FS ATTR to be in logical scalars
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This is better than compression control because it naturally extends to
SIMD32.
v2:
- Push/pop instruction state around adjusted codegen (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
The fall-back does not work correctly in SIMD16 mode and the register
allocator should ensure that we never hit this case anyway.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This rollbacks the revert of this same patch introduced in
commit 7b9c15628a.
And also squahes the following patch to prevent a piglit regression caused
by this change:
intel/compiler: Fix lower_conversions for 8-bit types.
Author: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
For 8-bit types the execution type is word. A byte raw MOV has 16-bit
execution type and 8-bit destination and it shouldn't be considered
a conversion case. So there is no need to change alignment and enter
in lower_conversions for these instructions.
Fixes a regresion in the piglit test "glsl-fs-shader-stencil-export"
that is introduced with this patch from the Vulkan shaderInt16 series:
'i965/compiler: handle conversion to smaller type in the lowering
pass for that'. The problem is caused because there is already a case
in the driver that injects Byte instructions like this:
mov(8) g127<1>UB g2<32,8,4>UB
And the aforementioned pass was not accounting for the special
handling of the execution size of Byte instructions. This patch
fixes this.
v2: (Jason Ekstrand)
- Simplify is_byte_raw_mov, include reference to PRM and not
consider B <-> UB conversions as raw movs.
v3: (Matt Turner)
- Indentation style fixes.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106393
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are subject to the general restriction that anything that is converted
to 64-bit needs to be aligned to 64-bit. We had this already in place for
32-bit to 64-bit conversions, so this patch generalizes the implementation
to take effect on any conversion to 64-bit from a source smaller than
64-bit.
Fixes assembly validation errors in the following CTS tests in BSW:
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_int64
dEQP-VK.spirv_assembly.instruction.compute.uconvert.uint16_to_uint64
dEQP-VK.spirv_assembly.instruction.compute.sconvert.int16_to_uint64
Tested-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
NIR assumes that booleans are always 32-bit, but Intel hardware produces
16-bit booleans for 16-bit comparisons. This means that we need to convert
the 16-bit result to 32-bit.
In the future we want to add an optimization pass to clean this up and
hopefully remove the conversions.
v2 (Jason): use the type of the source for the temporary and use
brw_reg_type_from_bit_size for the conversion to 32-bit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
These are not supported in hardware for 16-bit integers.
We do the lowering pass after the optimization loop to ensure that we
lower ALU operations injected by algebraic optimizations too.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
16-bit immediates need to replicate the 16-bit immediate value
in both words of the 32-bit value. This needs to be careful
to avoid sign-extension, which the previous implementation was
not handling properly.
For example, with the previous implementation, storing the value
-3 would generate imm.d = 0xfffffffd due to signed integer sign
extension, which is not correct. Instead, we should cast to
uint16_t, which gives us the correct result: imm.ud = 0xfffdfffd.
We only had a couple of cases hitting this path in the driver
until now, one with value -1, which would work since all bits are
one in this case, and another with value -2 in brw_clip_tri(),
which would hit the aforementioned issue (this case only affects
gen4 although we are not aware of whether this was causing an
actual bug somewhere).
v2: Make explicit uint32_t casting for left shift (Jason Ekstrand)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
From Intel Skylake PRM, vol 07, "Immediate" section (page 768):
"For a word, unsigned word, or half-float immediate data,
software must replicate the same 16-bit immediate value to both
the lower word and the high word of the 32-bit immediate field
in a GEN instruction."
This fixes the int16/uint16 negate and abs immediates that weren't
taking into account the replication in lower and upper words.
v2: Integer cases are different to Float cases. (Jason Ekstrand)
Included reference to PRM (Jose Maria Casanova)
v3: Make explicit uint32_t casting for left shift (Jason Ekstrand)
Split half float implementation. (Jason Ekstrand)
Fix brw_abs_immediate (Jose Maria Casanova)
Cc: "18.0 18.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The lowering pass was specialized to act on 64-bit to 32-bit conversions only,
but the implementation is valid for other cases.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to use 16-bit constants with 16-bit instructions,
otherwise we get the following validation error:
"Destination stride must be equal to the ratio of the sizes of
the execution data type to the destination type"
Because the execution data type is 4B due to the 32-bit integer
constant.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vertex Elements are now:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <DrawID, is-indexed-draw, 0, 0>
VE1 is it kept as it was before, VE2 additionally contains the new
system value.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::count_reads_remaining(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:764:72: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::count_reads_remaining(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::setup_liveness(cfg_t*)’:
src/intel/compiler/brw_schedule_instructions.cpp:769:51: warning: unused parameter ‘cfg’ [-Wunused-parameter]
vec4_instruction_scheduler::setup_liveness(cfg_t *cfg)
^~~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual void vec4_instruction_scheduler::update_register_pressure(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:774:75: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::update_register_pressure(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:779:80: warning: unused parameter ‘be’ [-Wunused-parameter]
vec4_instruction_scheduler::get_register_pressure_benefit(backend_instruction *be)
^~
src/intel/compiler/brw_schedule_instructions.cpp: In member function ‘virtual int vec4_instruction_scheduler::issue_time(backend_instruction*)’:
src/intel/compiler/brw_schedule_instructions.cpp:1550:61: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction_scheduler::issue_time(backend_instruction *inst)
^~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Since all of the fs_generator::generate_foo methods take a fs_inst * as
the first parameter, just remove the name to quiet the compiler.
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_barrier(fs_inst*, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:743:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_discard_jump(fs_inst*)’:
src/intel/compiler/brw_fs_generator.cpp:1326:46: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_discard_jump(fs_inst *inst)
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_pack_half_2x16_split(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1675:54: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_pack_half_2x16_split(fs_inst *inst,
^~~~
src/intel/compiler/brw_fs_generator.cpp: In member function ‘void fs_generator::generate_shader_time_add(fs_inst*, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_fs_generator.cpp:1743:49: warning: unused parameter ‘inst’ [-Wunused-parameter]
fs_generator::generate_shader_time_add(fs_inst *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_set_simd4x2_header_gen9(brw_codegen*, brw::vec4_instruction*, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1412:52: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp: In function ‘void generate_mov_indirect(brw_codegen*, brw::vec4_instruction*, brw_reg, brw_reg, brw_reg, brw_reg)’:
src/intel/compiler/brw_vec4_generator.cpp:1430:41: warning: unused parameter ‘inst’ [-Wunused-parameter]
vec4_instruction *inst,
^~~~
src/intel/compiler/brw_vec4_generator.cpp:1432:63: warning: unused parameter ‘length’ [-Wunused-parameter]
struct brw_reg indirect, struct brw_reg length)
^~~~~~
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
They are send messages and this makes size_read() and mlen agree. For
both of these opcodes, the payload is just a dummy so mlen == 1 and this
should decrease register pressure a bit.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Cc: mesa-stable@lists.freedesktop.org
All operations with offset_reg at do_vector_read are done
with UD type. So copy propagation was not working through
the generated MOVs:
mov(8) vgrf9:UD, vgrf7:D
This change allows removing the MOV generated for reading the
first components for 16-bit and 64-bit ssbo reads with
non-constant offsets.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Until we set gl_BaseVertex to zero for non-indexed draw calls
both have an identical value.
The Vertex Elements are kept like that:
* VE 1: <BaseVertex/firstvertex, BaseInstance, VertexID, InstanceID>
* VE 2: <Draw ID, 0, 0, 0>
v2 (idr): Mark nir_intrinsic_load_first_vertex as "unreachable" in
emit_system_values_block and fs_visitor::nir_emit_vs_intrinsic.
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
brw_reg::type is "enum brw_reg_type type:4". For whatever reason, GCC
is treating this as an int instead of an enum. As a result, it doesn't
detect missing switch cases and it doesn't detect that flow can get out
of the switch.
This silences the warning:
src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’:
src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
../src/intel/compiler/brw_reg.h: In function ‘bool brw_regs_negative_equal(const brw_reg*, const brw_reg*)’:
../src/intel/compiler/brw_reg.h:305:1: warning: control reaches end of non-void function [-Wreturn-type]
Introduced by 8f83eea71e ("i965: Add negative_equals methods").
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Add helpers to get the number of src/dest components for an intrinsic,
and update spots that were open-coding this logic to use the helpers
instead.
Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Otherwise, any indirect push constant access results in an assertion
failure when we start digging through the channel_sizes array. This
fixes dEQP-VK.pipeline.push_constant.graphics_pipeline.dynamic_index_vert
on Haswell. It should be a harmless no-op for GL since indirect push
constants aren't used there.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e69e5c7006 "i965/vec4: load dvec3/4 uniforms first in the..."
The new name make the zero-input behavior more obvious. The next
patch adds a new function with different zero-input behavior.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
This was causing us to walk dest_components times over a thing with no
destination. This happened to work because all of the image intrinsics
without a destination also happened to have dest_components == 0. We
shouldn't be reading dest_components if has_dest == false.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:
mad.l.f0 null, ...
Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend. This commit basically ports that code
to the vec4 backend.
NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below. This commit fixes those
tests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
No shader-db changes. This source must have been written by a previous
instruction, so it cannot be a uniform or a shader input. However, this
change allows the next commit to help more shaders.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>