Marek Olšák
81091a5183
ac: create the LLVM builder in ac_llvm_context_init
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2019-07-19 20:16:19 -04:00
Marek Olšák
eb54b8c222
ac: create the LLVM module for Wave32 or Wave64 in ac_llvm_context_init
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2019-07-19 20:16:19 -04:00
Marek Olšák
9e467d111b
ac: initial Wave32 support in LLVM build helpers
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2019-07-19 20:16:19 -04:00
Marek Olšák
14450c8c41
ac: remove unused AC_WAIT_EXP
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2019-07-04 15:39:01 -04:00
Marek Olšák
8a71f60194
ac: replace glc,slc with cache_policy for loads
...
cosmetic change
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2019-07-04 15:38:56 -04:00
Marek Olšák
a29e781961
ac: replace glc,slc with cache_policy for stores
...
cosmetic change
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2019-07-04 15:38:54 -04:00
Marek Olšák
969e5176c2
ac: rework ac_build_waitcnt for gfx10
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-07-03 15:51:13 -04:00
Marek Olšák
4bdf44724f
radeonsi/gfx10: set DLC for loads when GLC is set
...
This fixes L1 shader array cache coherency.
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-07-03 15:51:13 -04:00
Nicolai Hähnle
7ba80c1d19
amd/common/gfx10: add GS_ALLOC_REQ message define
...
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-07-03 15:51:12 -04:00
Marek Olšák
ac4b1e2f0a
radeonsi: set the calling convention for inlined function calls
...
otherwise the behavior is undefined
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de >
2019-06-24 21:04:10 -04:00
Connor Abbott
3bf8981c51
ac,radeonsi: Always mark buffer stores as inaccessiblememonly
...
inaccessiblememonly means that it doesn't modify memory accessible via
normal LLVM pointers. This lets LLVM's dead store elimination, memcpy
forwarding, etc. ignore functions with this attribute. We don't
represent descriptors as pointers, so this property is always true of
buffer and image stores. There are plans to represent descriptors via
pointers, but this just means that now nothing is inaccessiblememonly,
as LLVM will then understand loads/stores via its usual alias analysis.
Radeonsi was mistakenly only setting it if the driver could prove that
there were no reads, and then it was cargo-culted into ac_llvm_build
and ac_llvm_to_nir. Rip it out of everything.
statistics with nir enabled:
Totals from affected shaders:
SGPRS: 152 -> 152 (0.00 %)
VGPRS: 128 -> 132 (3.12 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 9324 -> 9244 (-0.86 %) bytes
LDS: 2 -> 2 (0.00 %) blocks
Max Waves: 17 -> 17 (0.00 %)
Wait states: 0 -> 0 (0.00 %)
The only difference was a manhattan31 shader.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com >
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-06-19 14:08:27 +02:00
Marek Olšák
4773f5a293
radeonsi: use the ac helper for index buffer stores in the culling shader
2019-06-11 20:05:21 -04:00
Samuel Pitoiset
6970a9a6ca
ac,radv: remove the vec3 restriction with LLVM 9+
...
This change requires LLVM r356755.
32706 shaders in 16744 tests
Totals:
SGPRS: 1448848 -> 1455984 (0.49 %)
VGPRS: 1016684 -> 1016220 (-0.05 %)
Spilled SGPRs: 25871 -> 25815 (-0.22 %)
Spilled VGPRs: 122 -> 122 (0.00 %)
Scratch size: 11964 -> 11956 (-0.07 %) dwords per thread
Code Size: 55324500 -> 55301152 (-0.04 %) bytes
Max Waves: 235660 -> 235586 (-0.03 %)
Totals from affected shaders:
SGPRS: 293704 -> 300840 (2.43 %)
VGPRS: 246716 -> 246252 (-0.19 %)
Spilled SGPRs: 159 -> 103 (-35.22 %)
Scratch size: 188 -> 180 (-4.26 %) dwords per thread
Code Size: 8653664 -> 8630316 (-0.27 %) bytes
Max Waves: 60811 -> 60737 (-0.12 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-06-03 11:30:08 +02:00
Nicolai Hähnle
81fe33735a
amd/common: add ac_build_opencoded_fetch_format
...
Implement software emulation of buffer_load_format for all types required
by vertex buffer fetches.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-05-13 17:07:23 +02:00
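As a rough illustration of what "software emulation of buffer_load_format" involves for one family of vertex formats, the sketch below unpacks 8-bit normalized channels from a raw dword into floats. The function names are hypothetical and this is not the code from the commit; it only shows the kind of per-channel conversion a shader has to open-code once the hardware format load is no longer used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: decode one 8-bit UNORM channel of a packed dword
 * into the range [0.0, 1.0]. Channel 0 is the least significant byte. */
static float decode_unorm8(uint32_t packed, unsigned channel)
{
    assert(channel < 4);
    uint32_t bits = (packed >> (channel * 8)) & 0xffu;
    return (float)bits / 255.0f;
}

/* Hypothetical helper: decode one 8-bit SNORM channel into [-1.0, 1.0].
 * The most negative encoding (-128) is clamped to -1.0. */
static float decode_snorm8(uint32_t packed, unsigned channel)
{
    assert(channel < 4);
    int value = (int)((packed >> (channel * 8)) & 0xffu);
    if (value >= 128)
        value -= 256; /* sign-extend the 8-bit field */
    float f = (float)value / 127.0f;
    return f < -1.0f ? -1.0f : f;
}
```

Other formats required by vertex fetches (16-bit, packed 10/10/10/2, scaled integers) would follow the same pattern with different field widths and scale factors.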
Rhys Perry
bd4c661ad0
ac,ac/nir: use a better sync scope for shared atomics
...
https://reviews.llvm.org/rL356946 (present in LLVM 9 and later) changed
the meaning of the "system" sync scope, making it no longer restricted to
the memory operation's address space. So a single address space sync scope
is needed for shared atomic operations (such as "system-one-as" or
"workgroup-one-as") otherwise buffer_wbinvl1 and s_waitcnt instructions
can be created at each shared atomic operation.
This mostly reimplements LLVMBuildAtomicRMW and LLVMBuildAtomicCmpXchg
to allow for more sync scopes and uses the new functions in ac->nir with
the "workgroup-one-as" or "workgroup" sync scopes.
F1 2017 (4K, Ultra High settings, TAA), avg FPS : 59 -> 59.67 (+1.14%)
Strange Brigade (4K, ~highest settings), avg FPS : 51.5 -> 51.6 (+0.19%)
RotTR/mountain (4K, VeryHigh settings, FXAA), avg FPS : 57.2 -> 57.2 (+0.0%)
RotTR/tomb (4K, VeryHigh settings, FXAA), avg FPS : 42.5 -> 43.0 (+1.17%)
RotTR/valley (4K, VeryHigh settings, FXAA), avg FPS : 40.7 -> 41.6 (+2.21%)
Warhammer II/fallen, avg FPS : 31.63 -> 31.83 (+0.63%)
Warhammer II/skaven, avg FPS : 37.77 -> 38.07 (+0.79%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-04-29 18:20:44 +01:00
Marek Olšák
35cd57df2e
ac: add ac_get_i1_sgpr_mask
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de >
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com >
2019-04-23 11:28:56 -04:00
Samuel Pitoiset
fd4041987b
ac: add ac_build_load_helper_invocation() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
590a4c8981
ac: add ac_build_ddxy_interp() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
4cb13e9462
ac: add ac_build_umax() and use it where possible
...
This changes the predicate from LessThan to Equal.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2019-04-12 17:30:55 +02:00
Samuel Pitoiset
52c02d921f
ac: add ac_build_frexp_exp() helper and 16-bit/32-bit support
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-28 13:02:48 +01:00
Samuel Pitoiset
1bf9311c59
ac: add ac_build_frexp_mant() helper and 16-bit/32-bit support
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-28 13:02:46 +01:00
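For readers unfamiliar with the frexp split that the two helpers above expose, here is a minimal CPU-side sketch for 32-bit floats. It assumes the usual frexp convention (mantissa magnitude in [0.5, 1.0), x = mant * 2^exp) and handles only zero and normal finite inputs; the function name is hypothetical and the code says nothing about how the helpers are implemented in LLVM IR.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split x into a mantissa in [0.5, 1.0) (by magnitude)
 * and a power-of-two exponent, so that x == mantissa * 2^exponent.
 * Assumption: x is zero or a normal finite float. */
static float split_frexp(float x, int *exp_out)
{
    if (x == 0.0f) {
        *exp_out = 0;
        return x;
    }

    uint32_t bits;
    memcpy(&bits, &x, sizeof(bits));

    uint32_t biased = (bits >> 23) & 0xffu;
    assert(biased != 0 && biased != 0xffu); /* no denormals, inf or NaN */

    /* A biased exponent of 126 corresponds to the range [0.5, 1.0). */
    *exp_out = (int)biased - 126;
    bits = (bits & 0x807fffffu) | (126u << 23);

    float mant;
    memcpy(&mant, &bits, sizeof(mant));
    return mant;
}
```

For example, 8.0 splits into mantissa 0.5 and exponent 4, and -3.0 into mantissa -0.75 and exponent 2.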
Samuel Pitoiset
d6a07732c9
ac: use llvm.amdgcn.fmed3 intrinsic for nir_op_fmed3
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-27 14:45:52 +01:00
Samuel Pitoiset
ff11c9dcc7
ac: add f16_0 and f16_1 constants
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-21 12:13:05 +01:00
Samuel Pitoiset
b235d77e18
ac: add ac_build_tbuffer_store_byte() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-21 09:02:18 +01:00
Samuel Pitoiset
104dbc64a5
ac: add ac_build_tbuffer_load_byte() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-21 09:02:14 +01:00
Samuel Pitoiset
6e632eb24b
ac: add various int8 definitions
...
Original patch by Rhys Perry.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-21 09:02:10 +01:00
Samuel Pitoiset
9d960c17a8
ac: use new LLVM 8 intrinsic when storing 16-bit values
...
vindex is always 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-20 22:19:14 +01:00
Samuel Pitoiset
2a9d331898
ac: add ac_build_{struct,raw}_tbuffer_store() helpers
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-20 22:19:12 +01:00
Samuel Pitoiset
a2073f49f1
ac: add ac_build_buffer_store_format() helper
...
Similar to ac_build_buffer_load_format().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-20 22:18:50 +01:00
Samuel Pitoiset
cbf022cb31
ac: use the raw tbuffer version for 16-bit SSBO loads
...
vindex is always 0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-13 14:16:14 +01:00
Samuel Pitoiset
045fae0f73
ac: add ac_build_{struct,raw}_tbuffer_load() helpers
...
The struct version sets IDXEN=1, while the raw version sets IDXEN=0.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-13 14:15:05 +01:00
Samuel Pitoiset
489dac0d21
ac: rework typed buffers loads for LLVM 7
...
Be more generic; this will be used by an upcoming series.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-03-13 13:31:06 +01:00
Bas Nieuwenhuizen
a1fdd4a4a7
radv: Fix float16 interpolation set up.
...
float16 types can have non-flat interpolation so set up the HW
correctly for that.
Fixes: 62024fa775
"radv: enable VK_KHR_16bit_storage extension / 16bit storage features"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2019-02-22 17:06:55 +01:00
Samuel Pitoiset
f0223143a8
ac: add ac_build_llvm8_tbuffer_load() helper
...
It uses the new LLVM intrinsics.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-02-18 12:14:17 +01:00
Samuel Pitoiset
2154fac6f3
ac: make use of ac_build_expand_to_vec4() in visit_image_store()
...
And make ac_build_expand() a static function.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-02-14 09:09:48 +01:00
Bas Nieuwenhuizen
e00d9a9a72
amd/common: Add gep helper for pointer increment.
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2019-02-06 22:35:36 +01:00
Nicolai Hähnle
300876a9a7
amd/common: scan/reduce across waves of a workgroup
...
Order-aware scan/reduce can trade off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2018-12-19 12:01:17 +01:00
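The general shape of a scan across the waves of a workgroup can be sketched on the CPU as a two-level prefix sum: each wave scans its own lanes, per-wave totals are exchanged through shared storage, and each wave then adds the totals of the waves before it. The sketch below assumes Wave64 and an arbitrary cap of 16 waves, uses a plain array as a stand-in for LDS, and does not model the ordering or atomics aspects mentioned in the commit message. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAVE_SIZE 64 /* assumption: Wave64 */
#define MAX_WAVES 16 /* assumption: arbitrary cap for this sketch */

/* Hypothetical helper: in-place inclusive add-scan over "count" lanes,
 * structured the way a workgroup of several waves would do it. */
static void workgroup_inclusive_scan_add(uint32_t *data, size_t count)
{
    uint32_t wave_totals[MAX_WAVES] = {0}; /* stand-in for LDS */
    size_t num_waves = (count + WAVE_SIZE - 1) / WAVE_SIZE;
    assert(num_waves <= MAX_WAVES);

    /* Step 1: every wave scans its own lanes and publishes its total. */
    for (size_t w = 0; w < num_waves; w++) {
        size_t begin = w * WAVE_SIZE;
        size_t end = begin + WAVE_SIZE < count ? begin + WAVE_SIZE : count;
        uint32_t running = 0;
        for (size_t i = begin; i < end; i++) {
            running += data[i];
            data[i] = running;
        }
        wave_totals[w] = running;
    }

    /* Step 2: every wave adds the sum of all preceding waves' totals. */
    uint32_t prefix = 0;
    for (size_t w = 0; w < num_waves; w++) {
        size_t begin = w * WAVE_SIZE;
        size_t end = begin + WAVE_SIZE < count ? begin + WAVE_SIZE : count;
        for (size_t i = begin; i < end; i++)
            data[i] += prefix;
        prefix += wave_totals[w];
    }
}
```

A reduce is the same structure with only the final total kept; on real hardware step 1 maps to wave-level operations and step 2 to a small amount of LDS traffic.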
Nicolai Hähnle
3963402fd3
amd/common: add ac_build_ifcc
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2018-12-19 12:01:15 +01:00
Samuel Pitoiset
3fbdcd942f
amd: remove support for LLVM 6.0
...
Users are encouraged to switch to LLVM 7.0 released in September 2018.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
2018-12-06 14:02:56 +01:00
Dave Airlie
ec9fe8abc7
ac: avoid casting pointers on bcsel and stores
...
For variable pointers we really don't want to cast the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-11-21 08:54:25 +10:00
Connor Abbott
59535b05cf
ac: Introduce ac_build_expand()
...
And implement ac_build_expand_to_vec4() on top of it.
Fixes: 7e7ee82698
("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-10-22 09:44:51 +02:00
Marek Olšák
bfc795670e
ac: add helpers for fast integer division by a constant
2018-10-16 17:23:25 -04:00
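For context, fast division by a constant usually means precomputing a multiplier and shift amounts once per divisor, then replacing each division with a multiply-high and a few shifts. The sketch below uses the classic unsigned 32-bit scheme from Granlund and Montgomery ("Division by Invariant Integers using Multiplication"). It is an illustration of the technique only: the struct and function names are hypothetical, and the helpers added by this commit may use a different formulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical precomputed state for dividing by one fixed divisor. */
struct fast_udiv {
    uint32_t multiplier;
    unsigned shift1;
    unsigned shift2;
};

/* Precompute multiplier and shifts for an unsigned 32-bit divisor d. */
static struct fast_udiv fast_udiv_init(uint32_t d)
{
    assert(d != 0);

    /* l = ceil(log2(d)) */
    unsigned l = 0;
    while (l < 32 && (UINT64_C(1) << l) < d)
        l++;

    struct fast_udiv info;
    uint64_t two_l = UINT64_C(1) << l;
    /* m' = floor(2^32 * (2^l - d) / d) + 1, which fits in 32 bits. */
    info.multiplier = (uint32_t)((((two_l - d) << 32) / d) + 1);
    info.shift1 = l < 1 ? l : 1;
    info.shift2 = l > 0 ? l - 1 : 0;
    return info;
}

/* Compute n / d using only a multiply-high, adds and shifts. */
static uint32_t fast_udiv(uint32_t n, struct fast_udiv info)
{
    uint32_t t = (uint32_t)(((uint64_t)n * info.multiplier) >> 32);
    return (t + ((n - t) >> info.shift1)) >> info.shift2;
}
```

The setup cost is paid once on the CPU (or at compile time), which is why this pays off when the same divisor is applied to many values in a shader.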
Samuel Pitoiset
416013b4f5
radv: emit the GLC bit for SSBO loads/stores when needed
...
This fixes some new memory model tests:
dEQP-VK.memory_model.message_passing.core11.u32.coherent.fence_fence.atomicwrite.device.*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108112
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-10-12 08:42:08 +02:00
Marek Olšák
77903c8cfb
ac: add ac_build_round
2018-10-06 21:50:09 -04:00
Marek Olšák
a668c8d6ba
ac: define all address spaces properly
2018-10-06 21:50:09 -04:00
Samuel Pitoiset
cfd6314cfe
ac: add 16-bit constant values for zero and one
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-09-17 15:18:26 +02:00
Samuel Pitoiset
074e29183c
ac: add ac_build_bitfield_reverse() helper
...
Are we missing 64-bit support?
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-09-17 15:18:23 +02:00
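On the open question of 64-bit support: a bitfield reverse reflects the bit order of an integer, and a 64-bit reverse can be composed from two 32-bit reverses with the halves swapped. The following CPU-side sketch shows that composition. The names are hypothetical, and it makes no claim about which widths the helper in this commit actually handles.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the bit order of a 32-bit value by swapping progressively
 * larger groups: single bits, pairs, nibbles, bytes, half-words. */
static uint32_t reverse32(uint32_t v)
{
    v = ((v >> 1) & 0x55555555u) | ((v & 0x55555555u) << 1);
    v = ((v >> 2) & 0x33333333u) | ((v & 0x33333333u) << 2);
    v = ((v >> 4) & 0x0f0f0f0fu) | ((v & 0x0f0f0f0fu) << 4);
    v = ((v >> 8) & 0x00ff00ffu) | ((v & 0x00ff00ffu) << 8);
    return (v >> 16) | (v << 16);
}

/* A 64-bit reverse is two 32-bit reverses with the halves exchanged. */
static uint64_t reverse64(uint64_t v)
{
    uint64_t low_reversed = reverse32((uint32_t)v);
    uint64_t high_reversed = reverse32((uint32_t)(v >> 32));
    return (low_reversed << 32) | high_reversed;
}

/* Hypothetical width-dispatching wrapper for this sketch. */
static uint64_t bitfield_reverse(uint64_t v, unsigned bit_size)
{
    assert(bit_size == 32 || bit_size == 64);
    return bit_size == 32 ? reverse32((uint32_t)v) : reverse64(v);
}
```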
Samuel Pitoiset
371c35e5bb
ac: add ac_build_bit_count() helper
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2018-09-17 15:18:20 +02:00
Marek Olšák
be0bd95abf
radeonsi: fix GPU hangs with bindless textures and LLVM 7.0
...
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de >
2018-09-10 15:19:56 -04:00
Marek Olšák
e80e8d7adc
ac: fix WAITCNT flags for GFX9
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
2018-08-22 14:34:43 -04:00