This way we stop leaking it. This is completely safe because, when we
hand it off to anv_shader_bin_create or anv_pipeline_cache_upload_kernel,
they make a copy of the entire param array.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This lets us avoid some of the manual ralloc stealing and prepares for
future commits in which we will want to ralloc prog_data::param.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This pass implements all the implicit conversions required by the
VK_KHR_sampler_ycbcr_conversion specification.
It also inserts plane sources onto sampling instructions that we then
let the pipeline layout pass deal with, when mapping things correctly
to descriptors.
v2: Add new file to meson build (Lionel)
Use nir_frcp() rather than (1.0f / x) (Jason)
Reuse nir_tex_instr_dest_size() rather than handwritten one (Jason)
Return progress (Jason)
Account for array of samplers (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This structure contains two fields, binding and index, that store the
binding in the descriptor set and the index inside the binding.
These structures are defined as uint8_t, but the types in Vulkan
specification are uint32_t, so big values are clamp.
This fixes dEQP-VK.binding_model.shader_access.*.multiple_arbitrary_descriptors.*
v2: use UINT32_MAX for index when having no render targets (Tapani)
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Avoids Clang's warning about the current code:
warning: suggest braces around initialization of subobject
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Right now, OpenGL uses the GLSL lowering for shared variables and anv
uses NIR to lower them. For a long time, we've done this weird thing
where we do the NIR lowering unconditionally and then add the SLM sizes
from the two together. This works because one of them will always be 0
but it's a bit sketchy. Let's just move the NIR-based lowering into
anv_pipeline and get rid of the sketch.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
We don't support the general version yet because that requires us to
lower shared variables up-front in SPIR-V -> NIR. This shouldn't be a
whole lot of work but it's not something we support today.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
In the previous commit, forgot to apply v2 suggestions.
Fixes: 28d0c38 (anv/pipeline: use unsigned long long constant to check
enable vertex inputs)
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
When initializing the ANV pipeline, one of the tasks is checking which
vertex inputs are enabled. This is done by checking if the enabled bits
in inputs_read.
But the mask to use is computed doing `(1 << (VERT_ATTRIB_GENERIC0 +
desc->location))`. The problem here is that if location is 15 or
greater, the sum is 32 or greater. But C is handling 1 as a 32-bit
integer, which means the displaced bit is out of range and thus the full
value is 0.
Thus, use 1ull, which is an unsigned long long value.
This fixes:
dEQP-VK.pipeline.vertex_input.max_attributes.16_attributes.binding_one_to_one.interleaved
v2: use 1ull instead of BITFIELD64_BIT() (Matt Turner)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
SPIR-V tessellation shaders that were created from HLSL will have
the primitive generation domain set in tessellation control shader
(hull shader in HLSL) instead of the tessellation evaluation shader.
v2:
- Add assert (Kenneth)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c9, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand):
- Take a view_mask rather than a whole subpass
- Build the view mask into the VS shader key
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We want to insert more lowering code that may insert system values and
we need to gather info after that lowering.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Shader hashing is very closely related to shader compilation. Putting
them right next to each other in anv_pipeline makes it easier to verify
that we're actually hashing everything we need to be hashing. The only
real change (other than the order of hashing) is that we now hash in the
shader stage.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We need to know if sample shading has been requested during shader
compilation since that affects the way fragment coordinates are
computed.
Notice that the semantics of fragment coordinates only depend on
whether sample shading has been requested, not on whether more
than one sample will actually be produced (that is,
minSampleShading and rasterizationSamples do not affect this
behavior).
Because this setting affects the code we generate for the shader, we also
need to include it in the WM prog key. Notice we don't need to alter the
OpenGL code because it doesn't ever use this behavior, so they key's
value is always false (the default).
Fixes:
dEQP-VK.glsl.builtin_var.fragcoord_msaa.*
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
According to section 14.6 of the Vulkan specification:
"When sample shading is enabled, the x and y components of FragCoord
reflect the location of the sample corresponding to the shader
invocation."
So add a boolean parameter to the lowering pass to select this behavior
when we need it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Specifically, report 'out of memory' errors that might have happened while
emitting the pipeline's batch.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
We have a performance problem with dynamic buffer descriptors. Because
we are currently implementing them by pushing an offset into the shader
and adding that offset onto the already existing offset for the UBO/SSBO
operation, all UBO/SSBO operations on dynamic descriptors are indirect.
The back-end compiler implements indirect pull constant loads using what
basically amounts to a texelFetch instruction. For pull constant loads
with constant offsets, however, we use an oword block read message which
goes through the constant cache and reads a whole cache line at a time.
Because of these two things, direct pull constant loads are much faster
than indirect pull constant loads. Because all loads from dynamically
bound buffers are indirect, the user takes a substantial performance
penalty when using this "performance" feature.
There are two potential solutions I have seen for this problem. The
alternate solution is to continue pushing offsets into the shader but
wire things up in the back-end compiler so that we use the oword block
read messages anyway. The only reason we can do this because we know a
priori that the dynamic offsets are uniform and 16-byte aligned.
Unfortunately, thanks to the 16-byte alignment requirement of the oword
messages, we can't do some general "if the indirect offset is uniform,
use an oword message" sort of thing.
This solution, however, is recommended for a few of reasons:
1. Surface states are relatively cheap. We've been using on-the-fly
surface state setup for some time in GL and it works well. Also,
dynamic offsets with on-the-fly surface state should still be
cheaper than allocating new descriptor sets every time you want to
change a buffer offset which is really the only requirement of the
dynamic offsets feature.
2. This requires substantially less compiler plumbing. Not only can we
delete the entire apply_dynamic_offsets pass but we can also avoid
having to add architecture for passing dynamic offsets to the back-
end compiler in such a way that it can continue using oword messages.
3. We get robust buffer access range-checking for free. Because the
offset and range are baked into the surface state, we no longer need
to pass ranges around and do bounds-checking in the shader.
4. Once we finally get UBO pushing implemented, it will be much easier
to handle pushing chunks of dynamic descriptors if the compiler
remains blissfully unaware of dynamic descriptors.
This commit improves performance of The Talos Principle on ULTRA
settings by around 50% and brings it nicely into line with OpenGL
performance.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Over the course of driver development, we've come up with a number of
different schemes for adding giant blocks of asserts inside the driver.
This one is only being used once in anv_pipeline.c and the way it's
being used actually generates compiler warnings in release builds. This
commit drops the anv_validate macro and just puts the contents of the
one validation function in side of a "#ifdef DEBUG" guard.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We will be using the image layout. Store the full struct directly from
the user.
Signed-off-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This just enables basic MSAA compression (no fast clears) for all
multisampled surfaces. This improves the framerate of the Sascha
"multisampling" demo by 76% on my Sky Lake laptop. Running Talos on
medium settings with 8x MSAA, this improves the framerate in the
benchmark by 80%.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
This allows shaders to write to storage images declared with unknown
format if they are decorated with NonReadable ("writeonly" in GLSL).
Previously an image view would always use a lowered format for its
surface state, however when a shader declares a write-only image, we
should use the real format. Since we don't know at view creation time
whether it will be used with only write-only images in shaders, create
two surface states using both the original format and the lowered
format. When emitting the binding table, choose between the states
based on whether the image is declared write-only in the shader.
Tested on both Sascha Willems' computeshader sample (with the original
shaders and ones modified to declare images writeonly and omit their
format qualifiers) and on our own shaders for which we need support
for this.
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
When multiple shader stages exist in the same SPIR-V module, we compile
all entry points and their inputs/outputs, then dead code eliminate the
ones not related to the specific entry point later.
nir_lower_wpos_center was being run prior to eliminating those random
other variables, which made it trip up, thinking it found gl_FragCoord
when it actually found something else like gl_PerVertex[3].
Fixes dEQP-VK.spirv_assembly.instruction.graphics.module.same_module.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
With shaders using a lot of inputs/outputs, like this (from Gtk+) :
layout(location = 0) in vec2 inPos;
layout(location = 1) in float inGradientPos;
layout(location = 2) in flat int inRepeating;
layout(location = 3) in flat int inStopCount;
layout(location = 4) in flat vec4 inClipBounds;
layout(location = 5) in flat vec4 inClipWidths;
layout(location = 6) in flat ColorStop inStops[8];
layout(location = 0) out vec4 outColor;
we're missing the programming of the input_slots_valid field leading
to an assert further down the backend code.
v2: Use valid slots of the geometry or vertex stage (Jason)
v3: Use helper to find correct vue map (Jason)
v4: Set the valid slots off the previous stages (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()
We want to do this pass in nir to be able to move loop unrolling
to nir.
There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
The changes seem to be caused be the difference in the GLSL IR vs
NIR variable index lowering passes. The GLSL IR pass creates a
simple if ladder for arrays of size 4 or less, while the NIR pass
implements a binary search for all arrays regardless of size.
Shader-db results BDW:
total instructions in shared programs: 13021176 -> 13021819 (0.00%)
instructions in affected programs: 57693 -> 58336 (1.11%)
helped: 20
HURT: 190
total cycles in shared programs: 299805580 -> 299750826 (-0.02%)
cycles in affected programs: 2290024 -> 2235270 (-2.39%)
helped: 337
HURT: 442
total fills in shared programs: 19984 -> 19984 (0.00%)
fills in affected programs: 0 -> 0
helped: 0
HURT: 0
LOST: 4
GAINED: 0
V2: remove the do_copy_propagation() call from the i965 GLSL IR
linking code. This call was added in f7741c5211 but since we are
moving the variable index lowering to NIR we no longer need it and
can just rely on the nir copy propagation pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This moves the nir_lower_indirect_derefs() call into
brw_preprocess_nir() so thats is called by both OpenGL and Vulkan
and removes that call to the old GLSL IR pass
lower_variable_index_to_cond_assign()
We want to do this pass in nir to be able to move loop unrolling
to nir.
There is a increase of 1-3 instructions in a small number of shaders,
and 2 Kerbal Space program shaders that increase by 32 instructions.
Shader-db results BDW:
total instructions in shared programs: 8705873 -> 8706194 (0.00%)
instructions in affected programs: 32515 -> 32836 (0.99%)
helped: 3
HURT: 79
total cycles in shared programs: 74618120 -> 74583476 (-0.05%)
cycles in affected programs: 528104 -> 493460 (-6.56%)
helped: 47
HURT: 37
LOST: 2
GAINED: 0
This fixes a bunch of new CTS tests which look for exactly this. Even in
the cases where we just call vk_free to free a CPU data structure, we still
handle NULL explicitly. This way we're less likely to forget to handle
NULL later should we actually do something less trivial.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>