Instead of having each anv_pipeline_compile_* function populate the
shader key, make it part of the anv_pipeline_stage struct and fill it
out up-front.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Instead of just looking at the number of color attachments, look at
which ones are actually used by the subpass. This lets us potentially
throw away chunks of the fragment shader. In DXVK, for example, all
subpasses have 8 attachments and most are VK_ATTACHMENT_UNUSED so this
is very helpful in that case.
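
As an illustration only, the check for which attachments a subpass actually
uses could be sketched as below. The helper name and the bitmask
representation are assumptions made for this sketch, not the driver's real
code; ATTACHMENT_UNUSED mirrors the value of VK_ATTACHMENT_UNUSED so the
example builds without the Vulkan headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as VK_ATTACHMENT_UNUSED, redefined locally for the sketch. */
#define ATTACHMENT_UNUSED (~0u)

/* Hypothetical helper: bit i of the result is set when color attachment
 * slot i of the subpass refers to a real attachment. */
static uint32_t
used_color_attachment_mask(const uint32_t *attachments, uint32_t count)
{
   assert(count <= 32);

   uint32_t mask = 0;
   for (uint32_t i = 0; i < count; i++) {
      if (attachments[i] != ATTACHMENT_UNUSED)
         mask |= 1u << i;
   }
   return mask;
}
```

A subpass with 8 slots of which only slots 0 and 3 are used yields the mask
0x9, so anything the fragment shader writes to the other six slots is a
candidate for removal.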
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The back-end compiler emits the number of color writes specified by
wm_prog_key::nr_color_regions regardless of what nir_store_outputs we
have. Once we've gone through and figured out which render targets
actually exist and are written by the shader, we should restrict the key
to avoid extra RT write messages.
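
One possible way to derive the restricted count is sketched below. It
assumes the key has to cover every render target up to the highest one that
is still written, and that the written targets are tracked as a bitmask;
both are assumptions of this sketch, and the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: number of color regions needed to cover every
 * render target whose bit is set in written_rt_mask, i.e. the index of
 * the highest set bit plus one (0 when nothing is written). */
static unsigned
color_regions_needed(uint32_t written_rt_mask)
{
   unsigned count = 0;
   while (written_rt_mask != 0) {
      count++;
      written_rt_mask >>= 1;
   }
   return count;
}
```

With this, a shader that only writes render target 0 would get a key with a
single color region instead of one per attachment slot.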
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
With the new deref instructions, we have to keep the modes consistent
between the derefs and the variables they reference. Since we remove
outputs by changing them to local variables, we need to run the fixup
pass to fix the modes.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until now, we had separate passes for lowering gl_PatchVerticesIn to
a statically known constant (for TES inputs when linked against a TCS),
and a uniform in the other cases. Annoyingly, one had to be run before
nir_lower_system_values, and the other afterward. This simplified the
passes, but made life painful for the callers.
This patch combines both into a single pass. If you give it a non-zero
static count, it uses that. If you give it Mesa state slots, it turns
it back into a built-in uniform. Otherwise, it does nothing.
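
The three-way behaviour described above can be sketched as a small decision
function. The enum, the function name and the opaque state-slot pointer are
hypothetical stand-ins for this sketch; it also assumes that a non-zero
static count takes priority when both inputs are given.

```c
#include <stddef.h>

/* Hypothetical result type for the sketch. */
enum patch_vertices_lowering {
   PATCH_VERTICES_UNCHANGED, /* pass does nothing */
   PATCH_VERTICES_CONSTANT,  /* replace with the static count */
   PATCH_VERTICES_UNIFORM,   /* replace with a built-in uniform load */
};

/* Decide how gl_PatchVerticesIn would be lowered given the two inputs.
 * state_slots stands in for the Mesa state slot tokens. */
static enum patch_vertices_lowering
choose_patch_vertices_lowering(unsigned static_count, const void *state_slots)
{
   if (static_count != 0)
      return PATCH_VERTICES_CONSTANT;
   if (state_slots != NULL)
      return PATCH_VERTICES_UNIFORM;
   return PATCH_VERTICES_UNCHANGED;
}
```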
This also moves the i965 uniform lowering out to shared code.
v2: Make token arrays const.
Reviewed-by: Eric Anholt <eric@anholt.net>
The original pass only looked for load_uniform intrinsics but there are
a number of other places that could end up loading a push constant. One
obvious omission was images which always implicitly use a push constant.
Legacy VS clip planes also get pushed into the shader. This fixes some
new Vulkan CTS tests that test random combinations of bindings and, in
particular, test lots of UBOs and images together.
Cc: mesa-stable@lists.freedesktop.org
Cc: Kenneth Graunke <kenneth@whitecape.org>
Note that the use of ICMS_INNER_CONSERVATIVE disagrees with the GL driver.
Perhaps it's more performant than ICMS_NORMAL and is otherwise permitted?
Not sure, so I left it as-is.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Enables SPV_KHR_8bit_storage and VK_KHR_8bit_storage on gen 8+
using the VK_KHR_get_physical_device_properties2 functionality
to expose whether the extension is supported.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This makes certain checks a bit easier and means that we don't have
the attachment information duplicated in the attachment list and in
depth_stencil_attachment.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Before, we were only hashing the shader if we had a shader cache to
cache things in. This means that if we ever get it wrong, we could end
up trying to cache a shader with an undefined hash. Since not having a
shader cache is an extremely uncommon case, let's optimize for code
clarity and obvious correctness over avoiding a hash operation.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit removes most of the deref instruction lowering. Instead of
lowering early, we only lower textures and images and we only do so
right before any of the anv image lowering passes.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This commit completely reworks function calls in NIR. Instead of having
a set of variables for the parameters and return value, nir_call_instr
now simply has a number of sources which get mapped to load_param
intrinsics inside the functions. It's up to the client API to build an
ABI on top of that. In SPIR-V, out parameters are handled by passing
the result of a deref through as an SSA value and storing to it.
The virtue of this approach can be seen by how much it allows us to
delete from core NIR. In particular, nir_inline_functions gets halved
and goes from a fairly difficult pass to understand in detail to almost
trivial. It also simplifies spirv_to_nir somewhat because NIR functions
never were a good fit for SPIR-V.
Unfortunately, there is no good way to do this without a mega-commit.
Core NIR and SPIR-V have to be changed at the same time. This also
requires changes to anv and radv because nir_inline_functions couldn't
handle deref instructions before this change and can't work without them
after this change.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Before, we were doing structure splitting in spirv_to_nir.
Unfortunately, this doesn't really work when you think about passing
struct pointers into functions. Doing it later in NIR is a much better
plan.
Acked-by: Rob Clark <robdclark@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
According to Vulkan spec:
"pColorBlendState is a pointer to an instance of the
VkPipelineColorBlendStateCreateInfo structure, and is ignored if the
pipeline has rasterization disabled or if the subpass of the render pass the
pipeline is created against does not use any color attachments."
Fixes tests from CL#2505:
dEQP-VK.renderpass.*.simple.color_unused_omit_blend_state
v2:
- Check that blend is not NULL before usage.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Once we've gotten rid of everything but the main entrypoint, there's no
reason why we shouldn't go ahead and lower them all. This is what radv
does and it will make future work easier.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Since the intermediate states of active_stages are not used,
i.e. active_stages is read only after all stages were set into it,
just set its value before compiling the shaders.
This will allow us to conditionally run certain passes based on what
other shaders are being used, e.g. a certain pass might only be
applicable to the vertex shader if there's no geometry or tessellation
shader being used.
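
A rough sketch of the idea, using locally defined stage bits whose values
match the corresponding VK_SHADER_STAGE_*_BIT flags. The function names are
hypothetical and only illustrate computing the mask once and then querying
it before any stage is compiled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values match the Vulkan VK_SHADER_STAGE_*_BIT flags. */
#define STAGE_VERTEX_BIT        0x01u
#define STAGE_TESS_CONTROL_BIT  0x02u
#define STAGE_TESS_EVAL_BIT     0x04u
#define STAGE_GEOMETRY_BIT      0x08u
#define STAGE_FRAGMENT_BIT      0x10u

/* Hypothetical helper: OR together the stage bits of every stage in the
 * create info before compiling anything. */
static uint32_t
collect_active_stages(const uint32_t *stage_bits, unsigned count)
{
   uint32_t active = 0;
   for (unsigned i = 0; i < count; i++)
      active |= stage_bits[i];
   return active;
}

/* Example query: true when no geometry or tessellation stage is present,
 * so a vertex-only pass would be applicable. */
static bool
vertex_only_pass_applies(uint32_t active_stages)
{
   const uint32_t later_stages = STAGE_TESS_CONTROL_BIT |
                                 STAGE_TESS_EVAL_BIT |
                                 STAGE_GEOMETRY_BIT;
   return (active_stages & STAGE_VERTEX_BIT) != 0 &&
          (active_stages & later_stages) == 0;
}
```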
v2: Use vk_to_mesa_shader_stage. (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This requires us to bump the subgroup size to 32 for all shader stages
because Vulkan requires that to be a physical device query.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This is part of the device groups extension/feature but it's a decent
chunk of work in its own right so it's worth breaking into its own
patch. The mechanism we use is fairly straightforward: we just push the
base work group id into the shader and add it to the work group id we
get from dispatch.
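
In plain C, the arithmetic amounts to the following sketch. The struct and
function names are made up for illustration; in the driver the addition
happens inside the shader, with the base delivered as pushed data.

```c
#include <stdint.h>

/* Hypothetical three-component work group id. */
struct group_id {
   uint32_t x, y, z;
};

/* The id a shader invocation sees is the id handed out by the dispatch
 * plus the base pushed into the shader. */
static struct group_id
offset_work_group_id(struct group_id base, struct group_id from_dispatch)
{
   struct group_id id = {
      .x = base.x + from_dispatch.x,
      .y = base.y + from_dispatch.y,
      .z = base.z + from_dispatch.z,
   };
   return id;
}
```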
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
If a shader only writes to an output via a constant initializer we
need to lower it before we call nir_remove_dead_variables so that
this pass sees the stores from the initializer and doesn't kill the
output.
Fixes test failures in new work-in-progress CTS tests:
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_vert
dEQP-VK.spirv_assembly.instruction.graphics.variable_init.output_frag
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The Vulkan spec states that VkPipelineLayout objects must not be
destroyed while any command buffer that uses them is in the recording
state, but it permits them to be destroyed otherwise. This means that
applications are allowed to free pipeline layouts after command recording
is finished even if there are pipeline objects that still exist and were
created with these layouts.
There are two solutions to this: one is to use reference counting on
pipeline layout objects; the other is to avoid holding references to
pipeline layouts where they are not really needed.
This patch takes a step towards the second option by making the
pipeline shader compile code take pipeline layout from the
VkGraphicsPipelineCreateInfo provided rather than the pipeline
object.
A follow-up patch will remove any remaining uses of the layout field
so we can remove it from the pipeline object and avoid the need
for reference counting.
v2: Use ANV_FROM_HANDLE, remove unnecessary braces (Jason)
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Older OpenGL defines two equations for converting from signed-normalized
to floating point data. These are:
f = (2c + 1)/(2^b - 1) (equation 2.2)
f = max{c/(2^(b-1) - 1), -1.0} (equation 2.3)
Both OpenGL 4.2+ and OpenGL ES 3.0+ mandate that equation 2.3 is to be
used in all scenarios, and remove equation 2.2. DirectX uses equation
2.3 as well. Intel hardware only supports equation 2.3, so Gen7.5+
systems that use the vertex fetcher hardware to do the conversions
always get formula 2.3.
This can make a big difference for 10-10-10-2 formats - the 2-bit value
can represent 0 with equation 2.3, and cannot with equation 2.2.
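
To make the 2-bit case concrete, here is a small sketch that evaluates both
equations. The function names are made up for this example; c is the signed
integer value and bits is b.

```c
#include <assert.h>

/* Equation 2.2: f = (2c + 1) / (2^b - 1) */
static double
snorm_to_float_eq22(int c, unsigned bits)
{
   assert(bits >= 2 && bits < 32);
   return (2.0 * c + 1.0) / (double)((1u << bits) - 1u);
}

/* Equation 2.3: f = max(c / (2^(b-1) - 1), -1.0) */
static double
snorm_to_float_eq23(int c, unsigned bits)
{
   assert(bits >= 2 && bits < 32);
   double f = (double)c / (double)((1u << (bits - 1)) - 1u);
   return f < -1.0 ? -1.0 : f;
}
```

For b = 2 the possible values of c are -2, -1, 0 and 1. Equation 2.2 maps
them to -1, -1/3, 1/3 and 1, so zero is not representable. Equation 2.3
maps them to -1, -1, 0 and 1.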
Ivybridge and older were using equation 2.2 for OpenGL, and 2.3 for ES.
Now that Ivybridge supports OpenGL 4.2, this is wrong - we need to use
the new rules, at least in core profile. That would leave Gen4-6 doing
something different than all other hardware, which seems...lame.
With context version promotion, applications that requested a pre-4.2
context may get promoted to 4.2, and thus get the new rules. Zero cases
have been reported of this being a problem. However, we've received a
report that following the old rules breaks expectations. SuperTuxKart
apparently renders the cars red when following equation 2.2, and works
correctly when following equation 2.3:
https://github.com/supertuxkart/stk-code/issues/2885#issuecomment-353858405
So, this patch deletes the legacy equation 2.2 support entirely, making
all hardware and APIs consistently use the new equation 2.3 rules.
If we ever find an application that truly requires the old formula, then
we'd likely want that application to work on modern hardware, too. We'd
likely restore this support as a driconf option. Until then, drop it.
This commit will regress Piglit's draw-vertices-2101010 test on
pre-Haswell without the corresponding Piglit patch to accept either
formula (commit 35daaa1695ea01eb85bc02f9be9b6ebd1a7113a1):
draw-vertices-2101010: Accept either SNORM conversion formula.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisforbes@google.com>
We can write to the same output but in different components, like
in this example:
layout(location = 0, component = 0) out ivec2 dEQP_FragColor_0;
layout(location = 0, component = 2) out ivec2 dEQP_FragColor_1;
Therefore, they are not two different outputs but only one.
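
One way to picture this is to fold every variable at a given location into a
single per-location component mask. The struct and function below are
hypothetical and only illustrate the idea; they are not the driver's data
structures.

```c
/* Hypothetical description of one fragment output variable. */
struct frag_output {
   unsigned location;       /* layout(location = N) */
   unsigned component;      /* layout(component = N), 0..3 */
   unsigned num_components; /* e.g. 2 for ivec2 */
};

/* Combine all variables that share a location into one 4-bit mask of
 * written components, so the location is treated as a single output. */
static unsigned
location_component_mask(const struct frag_output *outputs, unsigned count,
                        unsigned location)
{
   unsigned mask = 0;
   for (unsigned i = 0; i < count; i++) {
      if (outputs[i].location != location)
         continue;
      unsigned bits = (1u << outputs[i].num_components) - 1u;
      mask |= bits << outputs[i].component;
   }
   return mask & 0xfu;
}
```

For the two ivec2 declarations above, location 0 ends up with mask 0xf:
components 0-1 come from the first variable and components 2-3 from the
second.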
Fixes:
dEQP-VK.glsl.440.linkage.varying.component.frag_out.*
v3:
- Remove FRAG_RESULT_MAX.
- Add const and use sizeof (Ian).
- Use three passes to properly set the locations of fragment
outputs when they are arrays (Jason).
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Push constants on Intel hardware are significantly more performant than
pull constants. Since most Vulkan applications don't actively use push
constants, or at least don't use them heavily, we're pulling way
more than we should be. By enabling pushing chunks of UBOs we can get
rid of a lot of those pulls.
On my SKL GT4e, this improves the performance of Dota 2 and Talos by
around 2.5% and improves Aztec Ruins by around 2%.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Enables SPV_KHR_16bit_storage on gen 8+.
VK_KHR_16bit_storage is enabled for SSBO/UBO using the
VK_KHR_get_physical_device_properties2 functionality to expose
whether the extension is supported.
v2: update due to rebase against master (Alejandro)
v3: (Jason Ekstrand)
- Move this patch up in VK_KHR_16bit_storage series enabling only
storageBuffer16BitAccess and uniformAndStorageBuffer16BitAccess.
- Only expose VK_KHR_16bit_storage on Gen8+
v4: (Jason Ekstrand)
- Squash enable SPV_KHR_16bit_storage into VK_KHR_16bit_storage
enablement for SSBO/UBO.
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Not to be confused with variablePointersStorageBuffer which is the
subset of VK_KHR_variable_pointers required to enable the extension.
This means we now have "full" support for variable pointers.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
This is a bit more general and lets us pass additional options into the
spirv_to_nir pass beyond what capabilities we support.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
We currently have a bug where nir_lower_system_values gets called before
nir_lower_var_copies so it will miss any system value uses which come
from a copy_var intrinsic. Moving it to after brw_preprocess_nir fixes
this problem.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable@lists.freedesktop.org