PA_SC_AA_CONFIG might be updated when conversative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
Found by inspection, doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When a fragment shader includes an input variable decorated with
SampleId or SamplePosition, sample shading should be enabled
because minSampleShadingFactor is expected to be 1.0.
Cc: 19.2, 19.3 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In order to prevent a potential malicious pipeline tainting our
secure compile process and interfering with successive pipelines
we want to create a fresh fork for each pipeline compile.
Benchmarking has shown that simply forking on each pipeline
creation doubles the total time it takes to compile a fossilize db
collection. So instead here we fork the process at device creation
so that we have a slim copy of the device and then fork this
otherwise idle and untainted process each time we compile a
pipeline. Forking this slim copy of the device results in only a
20% increase in compile time vs a 100% increase.
Fixes: cff53da3 ("radv: enable secure compile support")
When the scratch ringbuffer settings are changed, the shader unit has
to be idle or we will have shaders using old and new settings.
That combination is not supported on the HW (likely the offset is
ringbuffer idx * WAVESIZE * 1024).
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The primitive indices have to be swapped to follow the drawing
order.
This fixes corruption with Overwatch when NGG GS is force enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This extension allows to control the subgroup size by allowing a
varying subgroup size and also specifying a required subgroup size.
This implementation only allows to specify a required subgroup
size for compute shaders because there is some caveats with
other shader stages (eg. NGG with geometry shader). This
basically allows apps to use Wave32 for compute shaders.
This extension is enabled for all chips but only GFX10 supports
Wave32. ACO doesn't support it.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If an app first creates a compute pipeline with
VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT set, then re-compile it
without that flag, the driver should re-compile the compute shader.
Otherwise, it will return the unoptimized one.
Fixes: ce188813bf ("radv: add initial support for VK_PIPELINE_CREATE_DISABLE_OPTIMIZATION_BIT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Got some int->pointer warnings and 20 is not a valid pointer ....
Fixes: 2e3a635ee6 "radv: Add an early exit in the secure compile if we already have the cache entries."
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Can be enabled via the environment variable which tells the
driver how many compilation threads are expected to be called,
and therefore how many forked processes the driver should
create.
For example we would expect to call fossilize replay with
something like this:
RADV_SECURE_COMPILE_THREADS=8 ./fossilize-replay --num-threads 8 \
--shader-cache-size 0 --ignore-derived-pipelines pipeline_cache.foz
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This function will be called by the parent process when doing a
secure compile. It first selects a free process to work with then
passes it all the information it needs to compile the pipeline.
Once the pipeline information has been passed to the secure
process, it then waits around to read/write any disk cache entries
required before exiting.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is cleaner and avoids having to read/write an additional copy of
topology for use with secure compile.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The number of vertices has to be adjusted with the output primitive
type.
This fixes dEQP-VK.transform_feedback.simple.triangle_strip_*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit allows RADV to set the shared VGPR count according to
the shader config.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Apparently we already enabled it without having support ...
Not sure if we also need to set disable_start_of_prim when the PS
has memory writes, but this mirrors radeonsi.
Doubles fillrate in my dual_quad_bench from ~16 pixels/cycles to
~32 pixels/cycle on a Raven.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
No GS copy shader if a pipeline enables NGG GS.
This fixes
dEQP-VK.pipeline.executable_properties.graphics.*geometry_stage*.
Fixes: 86864eedd2 ("radv: Implement radv_GetPipelineExecutablePropertiesKHR.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Otherwise the wave IDs are probably 0 and it hangs. NGG_WAVE_ID_EN
generates wave IDs for GDS OA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This internal option is turned off by default because NGG streamout
still hangs. It seems like it's related to GDS as RadeonSI.
That option will be turned on once all issues are resolved.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes some interactions when NGG GS is enabled. It fixes:
- dEQP-VK.clipping.user_defined.clip_cull_distance_dynamic_index.*geom*
- dEQP-VK.tessellation.geometry_interaction.passthrough.*
For some reasons, using the computed ESGS ring size randomly hangs
with CTS. For now, just use the maximum LDS size for ESGS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This shouldn't be in NIR->LLVM because ACO also needs the shader
info. This will also help for computing some NGG values that are
necessary for declaring LDS symbols.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We usually use these counts as a simple way to figure out if a change
reduces the number of instructions or shrinks an instruction. However,
since .rodata sections aren't executed, we shouldn't be counting their
size for this analysis. Make the linker return the total executable
size, and use it to report the more useful size in both drivers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>