According to Vulkan spec:
"pColorBlendState is a pointer to an instance of the
VkPipelineColorBlendStateCreateInfo structure, and is ignored if the
pipeline has rasterization disabled or if the subpass of the render pass the
pipeline is created against does not use any color attachments."
Fixes tests from CL#2505:
dEQP-VK.renderpass.*.simple.color_unused_omit_blend_state
v2:
- Check that blend is not NULL before usage.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Once we've gotten rid of everything but the main entrypoint, there's no
reason why we should go ahead and lower them all. This is what radv
does and it will make future work easier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Since the intermediate states of active_stages are not used,
i.e. active_stages is read only after all stages were set into it,
just set its value before compiling the shaders.
This will allow to conditionally run certain passes based on what
other shaders are being used, e.g. a certain pass might only be
applicable to the vertex shader if there's no geometry or tessellation
shader being used.
v2: Use vk_to_mesa_shader_stage. (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.
There are two solutions to this, one is to use reference counting on
pipeline layout objects. The other is to avoid holding references to
pipeline layouts where they are not really needed.
This patch takes a step towards the second option by making the
pipeline shader compile code take pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.
A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.
v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Do three-pass to set properly the locations of fragment
outputs when having arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants on Vulkan or at least don't use it heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Enables SPV_KHR_16bit_storage on gen 8+.
VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
if the extension is supported or not.
v2: update due rebase against master (Alejandro)
v3: (Jason Ekstrand)
- Move this patch up in VK_KHR_16bit_storage series enabling only
storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
- Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
- Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
enablement for SSBO/UBO.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes
this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Now that the only thing we put in the array up-front are client push
constants, we can simplify anv_pipeline_compile a bit.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we were calculating up-front and then filling in later. Now we
just grow as needed in anv_nir_apply_pipeline_layout.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This way any image uniforms end up having locations higher than
MAX_PUSH_CONSTANT_SIZE. There's no bug here at the moment, but this
consistency will make the next commit easier. Also, because
nir_apply_pipeline_layout properly increments nir->num_uniforms when
it expands the param array, we no longer need to stomp it to match
prog_data::nr_params because it already does.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Instead of making the caller of brw_compile_cs add something to the
param array for thread_local_id_index, just add it on-demand in
brw_nir_intrinsics and grow the array. This is now safe to do because
everyone is now using ralloc for prog_data::param.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This way we stop leaking it. This is completely safe because, when we
hand it off to anv_shader_bin_create or anv_pipeline_cache_upload_kernel,
they make a copy of the entire param array.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us avoid some of the manual ralloc stealing and prepares for
future commits in which we will want to ralloc prog_data::param.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass implements all the implicit conversions required by the
VK_KHR_sampler_ycbcr_conversion specification.
It also inserts plane sources onto sampling instructions that we then
let the pipeline layout pass deal with, when mapping things correctly
to descriptors.
v2: Add new file to meson build (Lionel)
Use nir_frcp() rather than (1.0f / x) (Jason)
Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
Return progress (Jason)
Account for array of samplers (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This structure contains two fields, binding and index, that store the
binding in the descriptor set and the index inside the binding.
These structures are defined as uint8_t, but the types in Vulkan
specification are uint32_t, so big values are clamp.
This fixes dEQP-VK.binding_model.shader_access.*.multiple_arbitrary_descriptors.*
v2: use UINT32_MAX for index when having no render targets (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Avoids Clang's warning about the current code:
warning: suggest braces around initialization of subobject
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Right now, OpenGL uses the GLSL lowering for shared variables and anv
uses NIR to lower them. For a long time, we've done this weird thing
where we do the NIR lowering unconditionally and then add the SLM sizes
from the two together. This works because one of them will always be 0
but it's a bit sketchy. Let's just move the NIR-based lowering into
anv_pipeline and get rid of the sketch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We don't support the general version yet because that requires us to
lower shared variables up-front in SPIR-V -> NIR. This shouldn't be a
whole lot of work but it's not something we support today.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
In the previous commit, forgot to apply v2 suggestions.
Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check
enable vertex inputs)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
When initializing the ANV pipeline, one of the tasks is checking which
vertex inputs are enabled. This is done by checking if the enabled bits
in inputs_read.
But the mask to use is computed doing `(1 << (VERT_ATTRIB_GENERIC0 +
desc->location))`. The problem here is that if location is 15 or
greater, the sum is 32 or greater. But C is handling 1 as a 32-bit
integer, which means the displaced bit is out of range and thus the full
value is 0.
Thus, use 1ull, which is an unsigned long long value.
This fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved
v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
SPIR-V tessellation shaders that were created from HLSL will have
the primitive generation domain set in tessellation control shader
(hull shader in HLSL) instead of the tessellation evaluation shader.
v2:
- Add assert (Kenneth)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c9, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shader hashing is very closely related to shader compilation. Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing. The only
real change (other than the order of hashing) is that we now hash in the
shader stage.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.
Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).
Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).
Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>