Commit Graph

354 Commits

Author SHA1 Message Date
Samuel Pitoiset
91aa25f462 radv: fix alpha-to-coverage when there is unused color attachments
When alphaToCoverage is enabled, we should always write the alpha
channel of MRT0 if it's unused. This now matches RadeonSI.

This fixes the new CTS:
dEQP-VK.pipeline.multisample.alpha_to_coverage_unused_attachment.samples_*.alpha_invisible

Cc: 19.0 19.1 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl
2019-06-10 09:23:41 +02:00
Nicolai Hähnle
f480b8aaa4 amd/common: use generated register header 2019-06-03 20:05:20 -04:00
Samuel Pitoiset
da26013eb7 radv: implement VK_EXT_sample_locations and disable it
Basically, this extension allows applications to use custom
sample locations. It doesn't support variable sample locations
during subpass. Note that we don't have to upload the user
sample locations because the spec doesn't allow this.

The extension is currently disabled because the driver needs to
support variable sample locations during layout transitions. The
depth decompress needs to know them and that's a bit invasive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-30 09:52:16 +02:00
Marek Olšák
b257956021 ac: treat Mullins as Kabini, remove the enum
it's the same design
2019-05-27 15:10:51 -04:00
Samuel Pitoiset
a7763ddcf2 radv: clean up the sample locations codebase
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-22 08:36:35 +02:00
Marek Olšák
ccfcb9d818 ac: rename SI-CIK-VI to GFX6-GFX7-GFX8
Acked-by: Dave Airlie <airlied@redhat.com>

We already use GFX9 and I don't want us to have confusing naming
in the driver. GFXn naming is better from the driver perspective,
because it's the real version of the gfx portion of the hw. Also,
CIK means Bonaire-Kaveri-Kabini, it doesn't mean CI.

It shouldn't confuse our SDMA, UVD, VCE etc. code much. Those have
nothing to do with GFXn and they have their own version numbers.
2019-05-15 20:54:10 -04:00
Samuel Pitoiset
53dfff1c4d radv: fix setting the number of rectangles when it's dyanmic
We need to know the number of rectangles.

This fixes new CTS dEQP-VK.draw.discard_rectangles.dynamic_*.

Fixes: 5db0bf9994 ("radv: Implement VK_EXT_discard_rectangles.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-05-09 11:42:25 +02:00
Bas Nieuwenhuizen
5c3467e74a radv: Run the new ycbcr lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-25 19:56:20 +00:00
Bas Nieuwenhuizen
028ce52739 radv: Add non-uniform indexing lowering.
This patch does it as late as possible so the potential extra
basic blocks don't inhibit other optimizations.

Big thanks to Jason for writing the lowering pass.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-04-10 02:04:13 +02:00
Samuel Pitoiset
775191cd99 radv: fix getting the vertex strides if the bindings aren't contiguous
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110349
Fixes: a66b186beb ("radv: use typed buffer loads for vertex input fetches")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-04-08 21:17:15 +02:00
Samuel Pitoiset
db07f0554a radv: add missing initializations since VK_EXT_pipeline_creation_feedback
This fixes the world.

Fixes: 5f5ac19f13 ("radv: Implement VK_EXT_pipeline_creation_feedback.")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-21 09:42:31 +01:00
Bas Nieuwenhuizen
5f5ac19f13 radv: Implement VK_EXT_pipeline_creation_feedback.
Does what it says on the tin.

The per stage time is only an approximation due to linking and
the Vega merged stages.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-03-20 21:19:46 +00:00
Samuel Pitoiset
a66b186beb radv: use typed buffer loads for vertex input fetches
This drastically reduces the number of SGPRs because the driver
now uses descriptors per vertex binding, instead of per vertex
attribute format.

29077 shaders in 15096 tests
Totals:
SGPRS: 1354285 -> 1282109 (-5.33 %)
VGPRS: 909896 -> 908800 (-0.12 %)
Spilled SGPRs: 24840 -> 24811 (-0.12 %)
Code Size: 49221144 -> 48986628 (-0.48 %) bytes
Max Waves: 243930 -> 244229 (0.12 %)

Totals from affected shaders:
SGPRS: 390648 -> 318472 (-18.48 %)
VGPRS: 288432 -> 287336 (-0.38 %)
Spilled SGPRs: 94 -> 65 (-30.85 %)
Code Size: 11548412 -> 11313896 (-2.03 %) bytes
Max Waves: 86460 -> 86759 (0.35 %)

This gives a really tiny boost.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 13:31:11 +01:00
Samuel Pitoiset
0b9a06a1a0 radv: store more vertex attribute infos as pipeline keys
They are required for using typed buffer loads.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-13 13:31:08 +01:00
Samuel Pitoiset
9e787904d0 rav: use 32_AR instead of 32_ABGR when alpha coverage is required
This export format is faster. Seems to improve performance in
Wreckfest.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-03-04 12:02:01 +01:00
Bas Nieuwenhuizen
a1fdd4a4a7 radv: Fix float16 interpolation set up.
float16 types can have non-flat interpolation so set up the HW
correctly for that.

Fixes: 62024fa775 "radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-22 17:06:55 +01:00
Bas Nieuwenhuizen
688f5e456a radv: Disable depth clamping even without EXT_depth_range_unrestricted.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-20 23:24:31 +00:00
Bas Nieuwenhuizen
9f7e0523ce radv: Implement VK_EXT_depth_clip_enable.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-02-20 23:24:31 +00:00
Bas Nieuwenhuizen
572854e706 radv: Clean up a bunch of compiler warnings.
Random unused vars.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-02-20 03:21:09 +01:00
Samuel Pitoiset
47616810ed radv: fix writing the alpha channel of MRT0 when alpha coverage is enabled
This version is better and safer.

Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 18:06:07 +01:00
Samuel Pitoiset
0d8f096293 radv: write the alpha channel of MRT0 when alpha coverage is enabled
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109597
Cc: 18.3 19.0 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-18 12:14:22 +01:00
Samuel Pitoiset
210aec3612 radv: store vertex attribute formats as pipeline keys
The formats will be used for reducing the number of loaded channels.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-14 09:10:09 +01:00
Samuel Pitoiset
334da034d8 radv: always export gl_SampleMask when the fragment shader uses it
For some reasons, this breaks trees rendering in Project Cars.

Fixes: 85010585cd ("radv: only enable gl_SampleMask if MSAA is enabled too")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109401
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-13 23:01:30 +01:00
Dave Airlie
90c6880df7 radv: remove alloc parameter from pipeline init
clang points out this isn't used.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-11 10:04:40 +10:00
Samuel Pitoiset
6430616e77 radv: track if subpasses have color attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:19:14 +01:00
Samuel Pitoiset
a20c2e38d8 radv: store the list of attachments for every subpass
This reworks how the depth stencil attachment is used for
simplicity. This also introduces radv_render_pass_compile()
helper that will be used for further optimizations.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-02-04 13:17:54 +01:00
Rhys Perry
e4c6423c5e radv: avoid context rolls when binding graphics pipelines
It's common in some applications to bind a new graphics pipeline without
ending up changing any context registers.

This has a pipline have two command buffers: one for setting context
registers and one for everything else. The context register command buffer
is only emitted if it differs from the previous pipeline's.

v2: ensure late scissor emission is done when radv_emit_rbplus_state() is
    called
v2: make use of cmd_buffer->state.workaround_scissor_bug
v3: rename "workaround_scissor_bug" to
    "context_roll_without_scissor_emitted"

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-21 14:37:53 +00:00
Rhys Perry
8a52e4cc4f radv: use dithered alpha-to-coverage
This matches the behaviour of AMDVLK and hides banding.
It is also seems to be allowed by the Vulkan spec.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-01-16 20:49:23 +00:00
Bas Nieuwenhuizen
568e7a2998 radv: Set partial_vs_wave for pipelines with just GS, not tess.
Looking at -pro we need to enable it for pipelines with just a
GS too.

This seems to reduce the hangs from
https://bugs.freedesktop.org/show_bug.cgi?id=109242 on a RX 550 to
the point where I can't reproduce, after the false start with the
wd_switch_on_eop patch due to flakiness.

(but people are reporting it does not fix the issue completely for
 them on polaris 11)

CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-01-15 10:22:30 +01:00
Samuel Pitoiset
d58b11e709 radv: get rid of bunch of KHR suffixes
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-01-09 12:26:48 +01:00
Timothy Arceri
50de3f80a8 nir: rename nir_link_constant_varyings() nir_link_opt_varyings()
The following patches will add support for an additional
optimisation so this function will no longer just optimise varying
constants.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-01-02 12:19:17 +11:00
Jason Ekstrand
11dc130779 nir: Add a bool to int32 lowering pass
We also enable it in all of the NIR drivers.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-16 21:03:02 +00:00
Samuel Pitoiset
d8325f1f07 radv: switch on EOP when primitive restart is enabled with triangle strips
Otherwise, Yakuza hangs the GPU with DXVK. We don't know if
linetrip and pointlist are affected, so my point is to do that
only for triangle strips.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-12-13 09:21:16 +01:00
Timothy Arceri
95b513c937 radv: make use of nir_move_out_const_to_consumer()
vkpipeline-db results:

Totals from affected shaders:
SGPRS: 28400 -> 28576 (0.62 %)
VGPRS: 27916 -> 27692 (-0.80 %)
Spilled SGPRs: 140 -> 138 (-1.43 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1534456 -> 1520560 (-0.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 3541 -> 3582 (1.16 %)
Wait states: 0 -> 0 (0.00 %)

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-11-14 09:41:50 +11:00
Samuel Pitoiset
f70c5d31cd radv: set PA.SC_CONSERVATIVE_RASTERIZATION.NULL_SQUAD_AA_MASK_ENABLE
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-13 10:24:33 +01:00
Samuel Pitoiset
d9d14346c2 radv: clean up setting partial_es_wave for distributed tess on VI
Only needed when the pipeline actually uses tessellation. I don't
think that changes anything, except improving readability.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-11-12 09:35:44 +01:00
Samuel Pitoiset
9cbdcc86b7 radv: set PA_SU_PRIM_FILTER_CNTL optimally
Ported from RadeonSI. It's always TRUE for CIK+ because RADV
doesn't support 16 samples.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-01 08:49:15 +01:00
Samuel Pitoiset
85010585cd radv: only enable gl_SampleMask if MSAA is enabled too
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-01 08:49:11 +01:00
Samuel Pitoiset
79410b1e87 radv: add support for Raven2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-11-01 08:48:52 +01:00
Samuel Pitoiset
b4eb029062 radv: implement VK_EXT_transform_feedback
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:10:58 +01:00
Samuel Pitoiset
98c09c3fcd radv: adjust the number of output components per stream
Same as the previous patch, except that is only the number of
components.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:09:08 +01:00
Samuel Pitoiset
4649471a9e radv: adjust the GSVS ring sizes based on the number of components
For multiple streams support we have to set the different ring
buffer sizes correctly. This relies on the number of output
components per stream.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 17:09:08 +01:00
Samuel Pitoiset
9e56ffb0b4 radv: remove wrong comment in calculate_gs_ring_sizes() about streams
The computation seems correct compared to RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-10-29 12:33:58 +01:00
Timothy Arceri
0ff1ccca25 radv: call nir_link_xfb_varyings()
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-10-24 08:21:29 +11:00
Timothy Arceri
c769ed10de radv: move nir_lower_io_to_scalar_early() to radv_link_shaders()
nir_lower_io_to_scalar_early() is really part of the link time
optimisations. Moving it here allows the code to be simplified
and also keeps the code easy to follow in the next patch.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-10-24 08:21:29 +11:00
Timothy Arceri
06675711e7 radv: use nir_opt_find_array_copies()
Totals from affected shaders:
SGPRS: 1112 -> 1112 (0.00 %)
VGPRS: 1492 -> 1196 (-19.84 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 112172 -> 101316 (-9.68 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 93 -> 98 (5.38 %)
Wait states: 0 -> 0 (0.00 %)

All affected shaders are from "Batman: Arkham City" over DXVK.

The pass detects that the temporary array created by DXVK for
storing TCS inputs is a copy of the input arrays and allows
us to avoid copying all of the input data and then indirecting
on it with if-ladders, instead we just do indirect indexing.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-18 15:04:09 +11:00
Samuel Pitoiset
26a2ce35ab radv: do not force the flat qualifier for clip/cull distances
This fixes some new CTS that reads clip/cull distances
from the fragment shader stage:

dEQP-VK.clipping.user_defined.clip_*

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-15 21:55:28 +02:00
Samuel Pitoiset
d179312b53 radv: add a workaround for a VGT hang with prim restart and strips
Otherwise, Yakuza and The Evil Within hang the GPU with DXVK.
This apparently only works on Polaris.

Suggested by Marek.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-11 10:16:11 +02:00
Dave Airlie
7c04b96f03 radv: don't pass shader key by copy
Coverity pointed out we were copying 168 bytes here unnecessarily.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-10-11 10:18:43 +10:00
Samuel Pitoiset
d3682766f6 radv: tidy up radv_pipeline_init_multisample_state()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-10-08 14:17:43 +02:00