With shaders using a lot of inputs/outputs, like this (from Gtk+) :
layout(location = 0) in vec2 inPos;
layout(location = 1) in float inGradientPos;
layout(location = 2) in flat int inRepeating;
layout(location = 3) in flat int inStopCount;
layout(location = 4) in flat vec4 inClipBounds;
layout(location = 5) in flat vec4 inClipWidths;
layout(location = 6) in flat ColorStop inStops[8];
layout(location = 0) out vec4 outColor;
we're missing the programming of the input_slots_valid field leading
to an assert further down the backend code.
v2: Use valid slots of the geometry or vertex stage (Jason)
v3: Use helper to find correct vue map (Jason)
v4: Set the valid slots off the previous stages (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is the same we do in the GL driver: the hardware provides gl_Layer
in the VUE header, so when the fragment shader reads it we can't skip it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far, input_reads was a bitmap tracking which vertex input locations
were being used.
In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.
But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.
To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.
As example, if in our GLSL/IR shader we have three attributes:
layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;
then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.
Checking carefully, basically we are using slots rather than locations
in NIR.
When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.
v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.
v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We use *64*_PASSTHRU formats to upload vertex attributes of 64 bits
to avoid conversions. From the BDW PRM, Volume 2d, page 586
(VERTEX_ELEMENT_STATE):
"When SourceElementFormat is set to one of the *64*_PASSTHRU
formats, 64-bit components are stored in the URB without any
conversion. In this case, vertex elements must be written as 128
or 256 bits, with VFCOMP_STORE_0 being used to pad the output
as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red
component into the URB, Component 1 must be specified as
VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
in order to output a 128-bit vertex element, or Components 1-3 must
be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
element. Likewise, use of R64G64B64_PASSTHRU requires Component 3
to be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
element."
v2,v3 (Jason):
- Don't delete unused formats.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Because border color is handled pre-swizzle, when we move the alpha
channel around in the format, the OPAQUE_BLACK border colors don't work
correctly on B4G4R4A4_UNORM_PACK16 with the hack. This fixes the
following Vulkan CTS tests on Broadwell:
dEQP-VK.pipeline.sampler.view_type.2d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.2d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.3d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
The specification section 9.4 says :
When an application attempts to create many pipelines in a single
command, it is possible that some subset may fail creation. In that
case, the corresponding entries in the pPipelines output array will
be filled with VK_NULL_HANDLE values. If any pipeline fails
creation (for example, due to out of memory errors), the
vkCreate*Pipelines commands will return an error code. The
implementation will attempt to create all pipelines, and only
return VK_NULL_HANDLE values for those that actually failed.
Fixes :
dEQP-VK.api.object_management.alloc_callback_fail_multiple.graphics_pipeline
dEQP-VK.api.object_management.alloc_callback_fail_multiple.compute_pipeline
v2: C is hard let's go shopping (Lionel)
v3: Remove unnecessary condition in for loops (Lionel)
v4: Document why we return on first failure (Eduardo)
Move i declaration inside for() (Eduardo)
v5: Move array cleanup out of loop (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The SPIR-V capability isn't even marked as enabled, and there are no
tests in Vulkan-CTS. Per Jason Ekstrand, this won't work in anv as such
write-only surfaces require additional setup which is currently not
performed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Dave Airlie <airlied@redhat.com>
Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>
We rename it to nir_deref_clone, re-order the sources to match the other
clone functions, and expose nir_deref_var_clone. This past part, in
particular, lets us get rid of quite a few lines since we no longer have
to call nir_copy_deref and wrap it in deref_as_var.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
The spec implicitly allows the incoming count to be 0. From the Vulkan
1.0.38 spec, Section 4.1 Physical Devices:
If the value referenced by pQueueFamilyPropertyCount is not 0 [then
do stuff].
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec indicates that
vkGetPhysicalDeviceQueueFamilyProperties() should overwrite
pQueueFamilyPropertyCount with the number of structures actually
written to pQueueFamilyProperties.
Signed-off-by: Damien Grassart <damien@grassart.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cc: mesa-stable@lists.freedesktop.org
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()
We want to do this pass in nir to be able to move loop unrolling
to nir.
There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
The changes seem to be caused be the difference in the GLSL IR vs
NIR variable index lowering passes. The GLSL IR pass creates a
simple if ladder for arrays of size 4 or less, while the NIR pass
implements a binary search for all arrays regardless of size.
Shader-db results BDW:
total instructions in shared programs: 13021176 -> 13021819 (0.00%)
instructions in affected programs: 57693 -> 58336 (1.11%)
helped: 20
HURT: 190
total cycles in shared programs: 299805580 -> 299750826 (-0.02%)
cycles in affected programs: 2290024 -> 2235270 (-2.39%)
helped: 337
HURT: 442
total fills in shared programs: 19984 -> 19984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 4
GAINED: 0
V2: remove the do_copy_propagation() call from the i965 GLSL IR
linking code. This call was added in f7741c5211 but since we are
moving the variable index lowering to NIR we no longer need it and
can just rely on the nir copy propagation pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This fixes a regression in a bunch of image store vulkan CTS tests
from commit ad38ba1134, which started
using OWORD block read messages to implement UBO loads. The reason
for the failure is that we were giving bogus buffer alignment limits
to the application (1B), so the CTS would happily come back with
descriptor sets pointing at not even word-aligned uniform buffer
addresses.
Surprisingly the sampler messages used to fetch pull constants before
that commit were able to cope with the non-texel aligned addresses,
but the dataport messages used to fetch pull constants after that
commit and the ones used to access storage buffers (before and after
the same commit) aren't as permissive with unaligned addresses.
Cc: <mesa-stable@lists.freedesktop.org>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99097
Reported-by: Mark Janes <mark.a.janes@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()
We want to do this pass in nir to be able to move loop unrolling
to nir.
There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
Shader-db results BDW:
total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79
total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37
LOST: 2
GAINED: 0
This is already supported in genX_state.c, expose the extension string.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
In an attempt to fix 3DSTATE_DEPTH_BUFFER for stencil-only cases, I
accidentally kept setting the SurfaceType to 2D in the stencil-only case
thanks to a copy+paste error.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Set the include paths to consider in-tree headers before out-of-tree
headers.
Avoids the build failing due to stale headers being present in
$prefix. Previosuly 'make -ki install' or something similar was required
to update the out-of-tree headers to allow the build to succeed.
Also avoids having to rebuild the entire thing after every 'make
install'.
Cc: Rob Clark <robdclark@gmail.com>
Cc: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Cleaner this way and we avoid including gen9_pack.h when we compile with
gen8_pack.h. We also avoid the if (cherryview) condition for non-gen8
gens that don't need it.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The batch chain logic only needs the pre-gen8 size of
MI_BATCH_BUFFER_START, which seems like something we can make a special
case for. The other two gen7 references, MI_BATCH_BUFFER_END and
MI_NOOP, are the same on all gens.
Signed-off-by: Kristian H. Kristensen <hoegsberg@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Remove duplicate .alphaToOne, add missing .shaderResourceMinLod, and
reorder a few entries to match their vulkan.h order. All the sparse
features are still left out entirely.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
This matches what NVIDIA and AMD hardware expose, as well as what Intel
hardware supports.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The 1-D special case doesn't actually apply to depth or HiZ. I discovered
this while converting BLORP over to genxml and ISL. The reason is that the
1-D special case only applies to the new Sky Lake 1-D layout which is only
used for LINEAR 1-D images. For tiled 1-D images, such as depth buffers,
the old gen4 2-D layout is used and the QPitch should be in rows.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
The gen7/8_cmd_buffer logic already sets the clamp, and it's piped
through via the dynamic state.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
These are all regularly available in desktop GL, so the backend fully
supports them.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I asked Emil to switch from 0 (success) vs. -1 (fail) to use a boolean
in my review comments. The "not" went missing. Easy mistake, but the
result is that nothing runs at all :)
Fix whitespace while we're here too.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>