Instead of this fragile use_primitive_replication bit which we set
differently depending on whether or not we pulled the shader out of the
cache, compute and use the information up-front during the compile and
then always fetch it from the vue_map after that. This way, regardless
of whether the shader comes from the cache or not, we have the same flow
and there are no inconsistencies.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17602>
Wa_14010455700 is dependent on the format and sample count, but our
code to track whether or not it had been applied was only dependent on
the format.
As a result, we failed to enable the workaround when an app used a D16
2xMSAA buffer, then a D16 1xMSAA buffer right afterwards.
Make the workaround tracking code sample-dependent to fix this.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17859>
Instead of having 2 VkMemoryType pointing to the same VkMemoryHeap, we
have each VkMemoryType with VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT (one
host visible, the other not) point to its own VkMemoryHeap. For the
local heap that is host visible, we'll use the
I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS flag at GEM BO creation.
When the smallbar uAPI is not available we fallback to a single heap
and do not use I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS.
v2: Handle probed_cpu_visible_size == probed_size (Matthew)
v3:
* Jordan: Use region info from devinfo
v4: Also make the vram host visible heap as local (Ken)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16739>
When we made depth/stencil dynamic, we lost the optimization. This is
particularly important for cases where the stencil test is enabled but
never writes anything as certain combinations with discard can cause
the stencil write (which doesn't do anything) to get moved late which
can be a measurable perf hit. According to 028e1137e6 ("anv/pipeline:
Be smarter about depth/stencil state", it was a couple percent for DOTA2
on Broadwell back in the day. No idea how it affects current titles.
This may also improve the depth/stncil PMA workarounds on Gen8 and Gen9
since they're now looking at optimized depth/stencil state.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
For one thing, we were deceptively setting it wrong in genX_cmd_buffer.c
and then overwriting it in each of of gfx7_cmd_buffer.c and
gfx8_cmd_buffer.c. Pull it all into genX_cmd_buffer.c so it's no longer
duplicated. Also, stop doing the PATCHLIST conversion in anv_pipeline.c
and just store the number of patch vertices.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
The only reason why we recorded them per-sample-count is because Intel
hardware is weird starting with Broadwell. The API, requires that the
dynamic sample pattern be reset every time the sample count changes so
we only need to record the pattern for the current sample count.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17564>
Those fields have confusing names. They hold non dynamic state,
specified at pipeline creation that we copy into the dynamic state of
the command buffer whenever we bind a pipeline. This non dynamic state
might get picked up in the dynamically emitted instructions of the 3D
pipeline because our HW packets are not exactly splitted like the
Vulkan API.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17601>
Because clear colors are stored as 4 32bit component values, there is
an issue if you try to format instance :
- clearing in R16G16_UNORM
- draw in R32_UINT
Clear will use 2 components of the clear color in dword0 & dword1.
While draw will use only one component of dword0.
This change uses the mutable format information to track whether clear
colors can be non-zero for fast clears.
With :
- non mutable formats, we can fast clear with any color on Gfx > 8
- mutable formats with incompatible component sizes, we can only
fast clear with 0 color
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5930
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17329>
In the following case :
vkCmdBindPipeline(compute_pipeline);
vkCmdDispatch(...);
vkCmdBindPipeline(graphics_pipeline);
vkCmdBindIndexBuffer(buffer)
vkCmdDraw(...);
We're emitting the 3DSTATE_INDEX_BUFFER instruction while the HW is
still in GPGPU mode, because we're dealing the pipeline selection to
vkCmdDraw().
Found while debugging Age Of Empire 4, HW is hung on
3DSTATE_INDEX_BUFFER instruction.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17153>
With this option enabled range of input values for fsin and fcos is
limited to [-2*pi : 2*pi] by calculating the reminder after 2*pi modulo
division. This helps to improve calculation precision for large input
arguments on Intel.
-v2: Add limit_trig_input_range option to prog_key to update shader
cache (Lionel)
Signed-off-by: Vadym Shovkoplias <vadym.shovkoplias@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16388>
On Gfx7 we can only give the sample location for a given multisample
number. This means everytime the multisampling value changes, we have
to re-emit the locations. It's fine because it's also where
(3DSTATE_MULTISAMPLE) the number of samples is stored.
On Gfx8+ though, 3DSTATE_MULTISAMPLE only holds the number of samples
and all the sample locations for all number of samples are located in
3DSTATE_SAMPLE_PATTERN. So to be more effecient there, we need to
track the locations for all sample numbers and compare new values with
the relevant sample count when touching the dynamic state.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16220>
Discrete platforms don't have LLC, but on those, we mmap our buffers
with WC. So we shouldn't need to clflush there.
Anv already had a boolean field on the physical device to know whether
we need to use clflush(), based off the memory heaps available. So use
that instead.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15780>
Without this we might choose 8 or 16 width, while the app assumes 32.
With subgroup operations it may cause wrong calculations and thus bugs.
Examples of such games are Aperture Desk Job and DOOM Eternal.
v2: Make it a driconf option instead of applying unconditionally, move
from brw_required_dispatch_width to brw_compile_cs
v3: Rename allow_assuming_full_subgroups -> assume_full_subgroups.
Include assume_full_subgroups value in anv_pipeline_hash_compute().
v4: Move actual workaround code from brw_fs.c -> anv_pipeline.c.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6171
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15708>
Instead of having two different helpers, delete the pipeline_cache ones.
Also, instead of manually handling the cache == NULL case in every
vkCreateFooPipelines call, handle it inside the helpers. This means
that BLORP can use them too by passing cache=NULL.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13184>