Connor Abbott
3b143369a5
ac/nir, radv, radeonsi: Switch to using ac_shader_args
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
2019-11-25 14:17:10 +01:00
Marek Olšák
442ef8c3e3
radeonsi: keep serialized NIR instead of nir_shader in si_shader_selector
...
This decreases memory usage, because serialized NIR is more compact.
The main shader part is compiled from nir_shader.
Monolithic shader variants are compiled from nir_binary.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-11-05 23:28:45 -05:00
Marek Olšák
2f42d4cacc
radeonsi/gfx10: use fma for TGSI_OPCODE_FMA
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-09-09 23:43:03 -04:00
Marek Olšák
223b3174bd
radeonsi/nir: always lower ballot masks as 64-bit, codegen handles it
...
This fixes KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks.
This solution is better, because the IR isn't dependent on wave32.
2019-08-19 17:23:38 -04:00
Marek Olšák
1f8a661748
radeonsi: clean up si_llvm_context_set_tgsi
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-08-19 17:23:38 -04:00
Marek Olšák
0ef4c1c04d
radeonsi: don't use lp_build_if for the wrapping if block in merged shaders
2019-07-30 22:06:23 -04:00
Marek Olšák
9234275320
radeonsi/nir: implement FBFETCH for KHR_blend_equation_advanced
2019-07-30 22:06:23 -04:00
Marek Olšák
88efb63caf
radeonsi/gfx10: implement Wave32
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-07-19 20:16:19 -04:00
Marek Olšák
1d3bffaf9c
radeonsi/gfx10: enable image stores with DCC
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Acked-by: Dave Airlie <airlied@redhat.com>
2019-07-09 17:24:16 -04:00
Nicolai Hähnle
792a638b03
radeonsi/gfx10: implement streamout-related queries
...
The NGG hardware pipeline doesn't track these statistics automatically,
and in fact *cannot* track them automatically when API geometry shaders
are involved, so we accumulate statistics in the shader using atomic
adds.
This implementation accumulates statistics via the memory system and
the RW buffer descriptor setup. We could use GDS, but since these
atomics aren't latency-sensitive, that basically just trades off
L2$ bandwidth vs. export bus bandwidth. One single memory transaction
per shader workgroup doesn't seem too bad. The result ring buffer in
memory is needed either way to avoid pipeline stalls.
The shader code contains the atomic unconditionally, though the
GFX10_GS_QUERY_BUF is a null buffer when no queries are active. The
atomic is simply discarded by the shader hardware in that case.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
4ecc39e1aa
radeonsi/gfx10: NGG geometry shader PM4 and upload
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
a04aa4be2b
radeonsi/gfx10: generate geometry shaders for NGG
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
612489bd5d
radeonsi/gfx10: generate VS and TES as NGG merged ESGS shaders
...
This does not support geometry shading yet. Also missing are streamout
and NGG-specific optimizations.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
112bf7f900
radeonsi: make emit_streamout_output externally accessible
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
04e27ec136
radeonsi: make get_primitive_id externally visible
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
5059a4df8a
radeonsi: make si_llvm_export_vs externally available
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-07-03 15:51:12 -04:00
Nicolai Hähnle
bf8a1ca902
radeonsi: use the new run-time linker for shaders
...
v2:
- fix a memory leak
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-12 20:28:23 -04:00
Marek Olšák
43aa2f4f7c
radeonsi: make functions for creating LLVM functions non-static
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2019-05-16 13:10:07 -04:00
Marek Olšák
be0bd95abf
radeonsi: fix GPU hangs with bindless textures and LLVM 7.0
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-09-10 15:19:56 -04:00
Marek Olšák
c5442c1165
radeonsi: add TGSI_SEMANTIC_CS_USER_DATA for reading up to 4 SGPRs with TGSI
2018-08-29 15:31:42 -04:00
Marek Olšák
e80e8d7adc
ac: fix WAITCNT flags for GFX9
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-22 14:34:43 -04:00
Marek Olšák
a2c18bfbe3
radeonsi: don't use emit_data->args in atomic_emit
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-08-14 21:21:03 -04:00
Marek Olšák
cb6b241c30
ac,radeonsi: reduce optimizations for complex compute shaders on older APUs (v2)
...
To make dEQP-GLES31.functional.ssbo.layout.random.all_shared_buffer.23
finish sooner on the older CPUs. (otherwise it gets killed and we fail
the test)
Acked-by: Dave Airlie <airlied@gmail.com>
2018-08-01 15:25:18 -04:00
Dave Airlie
0eb65b4944
radeonsi: rename si_compiler -> ac_llvm_compiler
...
As precursor to moving init to common code, just rename the struct
and move it.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-07-04 05:31:32 +10:00
Marek Olšák
7bd40dc2f2
radeonsi: clean up some #includes
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák
f154555733
radeonsi: clean up passing the is_monolithic flag for compilation
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-06-25 18:33:58 -04:00
Marek Olšák
87eb597758
radeonsi: add struct si_compiler containing LLVMTargetMachineRef
...
It will contain more variables.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Benedikt Schemmer <ben at besd.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-04-27 17:56:04 -04:00
Marek Olšák
4c5efc40f4
radeonsi: update copyrights
...
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
2018-04-05 15:34:58 -04:00
Marek Olšák
2be6143032
radeonsi: implement GL_KHR_blend_equation_advanced
...
MSAA is supported using sample shading. Layered rendering and all texture
targets are also supported.
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:25 -04:00
Marek Olšák
e04631b0f2
radeonsi: rename unpack_param -> si_unpack_param
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-04-02 13:55:23 -04:00
Timothy Arceri
50cc97d98a
radeonsi: add si_llvm_emit_kill() helper
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-08 11:28:37 +11:00
Timothy Arceri
6e1a142863
radeonsi: make use of if/loop build helpers in ac
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-03-08 10:12:34 +11:00
Marek Olšák
9779f34326
radeonsi: remove si_llvm_add_attribute
2018-03-07 13:55:49 -05:00
Timothy Arceri
2a68c6c6c8
radeonsi: move si_nir_load_input_gs() to si_shader.c
...
All the tess shader and tgsi equivalents are here and it allows
us to use llvm_type_is_64bit() in the following patch without
exposing it externally.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-03-06 11:44:06 +11:00
Marek Olšák
63ea0a00a3
radeonsi: preload the tess offchip ring in TES
...
so that it's not done multiple times in branches
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
Marek Olšák
2d03c4cac8
radeonsi: move tess ring address into TCS_OUT_LAYOUT, removes 2 TCS user SGPRs
...
TCS_OUT_LAYOUT has 13 unused bits. That's enough for a 32-bit address
aligned to 512KB. Hey, it's a 13-bit pointer!
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:29 +01:00
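The "13-bit pointer" remark in 2d03c4cac8 follows from simple arithmetic: a 512KB-aligned address has its low 19 bits clear, so only the upper 32 - 19 = 13 bits of a 32-bit address carry information. The sketch below only illustrates that arithmetic. The helper names and the choice to treat the field in isolation are assumptions for illustration; the actual bit position inside TCS_OUT_LAYOUT is not stated in the commit message and is not modeled here.

```python
# Illustrative only: how a 512 KiB-aligned 32-bit address fits in 13 bits.
# pack_ring_addr / unpack_ring_addr are hypothetical names, not driver functions.

ALIGN_SHIFT = 19                 # 512 KiB == 1 << 19
FIELD_BITS = 32 - ALIGN_SHIFT    # bits left over for the address field


def pack_ring_addr(addr: int) -> int:
    """Drop the low zero bits of an aligned 32-bit address."""
    if not 0 <= addr < (1 << 32):
        raise ValueError("address must fit in 32 bits")
    if addr & ((1 << ALIGN_SHIFT) - 1):
        raise ValueError("address must be aligned to 512 KiB")
    return addr >> ALIGN_SHIFT


def unpack_ring_addr(field: int) -> int:
    """Rebuild the full 32-bit address from the 13-bit field."""
    if not 0 <= field < (1 << FIELD_BITS):
        raise ValueError("field must fit in 13 bits")
    return field << ALIGN_SHIFT
```

Packing and unpacking round-trip for any aligned address, and an unaligned address is rejected rather than silently truncated.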
Marek Olšák
41895c26d3
radeonsi: move TCS_OUT_LAYOUT.PatchVerticesIn to lower bits
...
For a later patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-02-24 23:08:28 +01:00
Timothy Arceri
6d338d757f
ac/radeonsi: pass type to load_tess_varyings()
...
We need this to be able to load 64bit varyings.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-22 09:31:00 +11:00
Timothy Arceri
b6cf898ec2
radeonsi: make si_declare_compute_memory() more generic and call for nir
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-13 14:43:05 +11:00
Timothy Arceri
9c52902c76
ac/radeonsi: add num_work_groups to the abi
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri
c8066cdfa7
ac/radeonsi: add local_invocation_ids to the abi
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri
fa5239c153
ac/radeonsi: add workgroup_ids to the abi
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Marek Olšák
472361dd7e
radeonsi: remove unused si_shader_context members
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Timothy Arceri
3a47b138e3
radeonsi/nir: add si_nir_lookup_interp_param() helper
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Marek Olšák
b633999a4e
ac: rename and move si_const_array into common code
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-27 02:09:09 +01:00
Timothy Arceri
9622b445c8
ac/radeonsi: add tcs load outputs support
...
The code to load outputs is essentially the same as load inputs
so we make the interface more generic to maximise code sharing.
We will make use of the new support in the following patch.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-18 00:03:33 +11:00
Timothy Arceri
9e1a3caf32
ac/radeonsi: add tcs_rel_ids to the abi
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
f93740efc1
ac: add {tcs,tes}_patch_id to the abi
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Timothy Arceri
e04bf8a619
radeonsi: add si_nir_load_input_tes()
...
V2: drop type param and just use ctx->i32
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Samuel Pitoiset
225b198802
amd/common: add ac_build_waitcnt()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00