Marek Olšák
f68aeaa2c2
ac/llvm: remove inst_offset parameter from ac_build_buffer_store_dword
...
There was a bug that inst_offset was added to soffset in one codepath and
to voffset in all other codepaths. The correct behavior is to add it
to voffset.
Reviewed-by: Mihai Preda <mhpreda@gmail.com >
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966 >
2022-04-23 01:45:17 +00:00
Marek Olšák
8abb612cba
ac/llvm: remove immoffset parameter from ac_build_tbuffer_load
...
Reviewed-by: Mihai Preda <mhpreda@gmail.com >
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966 >
2022-04-23 01:45:17 +00:00
Marek Olšák
a3e777a89a
ac/llvm: add AC_WAIT_EXP for ac_build_waitcnt
...
Reviewed-by: Mihai Preda <mhpreda@gmail.com >
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15966 >
2022-04-23 01:45:17 +00:00
Marek Olšák
5af4d0c2dc
ac/llvm: remove LLVM pass ac_optimize_vs_outputs
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14414 >
2022-04-22 22:21:11 +00:00
Mihai Preda
279eea5bda
amd/llvm: Transition to LLVM "opaque pointers"
...
For context, see LLVM opaque pointers:
https://llvm.org/docs/OpaquePointers.html
https://llvm.org/devmtg/2015-10/slides/Blaikie-OpaquePointerTypes.pdf
This fixes the deprecation warnings in src/amd/llvm only.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15572 >
2022-04-09 05:20:29 +00:00
Marek Olšák
fd14e0afb1
radeonsi: set done=1 for PS exports at the end of si_llvm_build_ps_epilog
...
so that we can reorder the exports
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266 >
2022-01-05 12:46:30 +00:00
Marek Olšák
d382ceea2b
ac/llvm: remove the num_channels parameter from ac_build_buffer_store_dword
...
It was used when LLVM didn't support vec3 and we had to pass vec4
with num_channels=3. We no longer need to do that.
This also removes the vec3 splitting or conversion to vec4 in callers.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266 >
2022-01-05 12:46:30 +00:00
Marek Olšák
e6aac44051
ac/llvm: add vindex into ac_build_buffer_store_dword
...
for future work
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14266 >
2022-01-05 12:46:30 +00:00
Marek Olšák
efaab0ec50
ac/llvm: add helper ac_build_is_inf_or_nan
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13380 >
2021-10-19 12:49:06 +00:00
Samuel Pitoiset
cf3e31fd11
ac/llvm: implement nir_intrinsic_image_deref_atomic_{fmin,fmax}
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12716 >
2021-09-15 14:10:42 +00:00
Marek Olšák
b330c7cb2a
radeonsi: use a trick to extract and pack edgeflags using fewer instructions
...
This removes 4 instructions from the prim export packing.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12343 >
2021-09-14 15:24:11 +00:00
Marek Olšák
2e95ad1433
ac/llvm: implement a bunch of NIR AMD intrinsics for NGG
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12570 >
2021-09-07 17:51:41 +00:00
Timur Kristóf
1e49018ced
amd: Add extra source to the mbcnt_amd NIR intrinsic.
...
The v_mbcnt instructions can take an extra source that they add to
the result. This is not exposed in SPIR-V but we now expose it in NIR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11072 >
2021-06-09 16:48:51 +00:00
Marek Olšák
13acbaecd8
radeonsi: rewrite the prefix sum computation for shader culling
...
Instead of storing the vertex mask per wave into LDS and then computing
the prefix sum, store 8-bit bitcounts (vertex counts) of the vertex masks
into LDS. This allows us to compute the sum using v_sad_u8, which computes
a sum of 4 i8vec4 components in one instruction.
Each i8vec4 of vertex counts is loaded in parallel threads (one dword
per thread) instead of all being loaded in thread 0, and readlane copies
them to SGPRs instead of readfirstlane.
LDS is no longer initialized before culling. Instead, the counts for
inactive waves are masked with AND later.
Incorrect old comments are also fixed.
This change removes 80 bytes from the code size, and it allows increasing
the workgroup size from 128 to 256. (which is the main motivation for this)
Now changing the workgroup size with wave64 has no effect on the code size.
Switching to wave32 with 8 waves even generates slightly smaller code than
wave64 with 4 waves.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813 >
2021-05-25 16:15:44 +00:00
Marek Olšák
57e182c75b
ac/llvm: allow ac_build_optimization_barrier with SGPRs, pointers, and metadata
...
sgpr=true prevents moving the value to a VGPR.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813 >
2021-05-25 16:15:44 +00:00
Marek Olšák
5f33f80dc7
ac/llvm: expose set_range_metadata to more users
...
I sometimes use it for experiments. It will be used later.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10813 >
2021-05-25 16:15:44 +00:00
Marek Olšák
1dff495057
ac/llvm: implement 16-bit packed VS outputs and FS inputs
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9051 >
2021-04-13 21:10:43 -04:00
Marek Olšák
230a6dc55d
ac,radeonsi: add sampler changes for Aldebaran
...
- no 3D and cube textures
- no mipmapping
- no border color
- image_sample is the only supported opcode with a sampler (behaves like _lz)
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9389 >
2021-03-10 18:02:27 +00:00
Marek Olšák
18c1c1404d
ac/llvm: add type parameter into ac_build_buffer_load to fix 16-bit TES inputs
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395 >
2021-03-03 20:06:09 +00:00
Marek Olšák
3475c79328
ac/llvm: add support for 16-bit source operands for samplers
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9395 >
2021-03-03 20:06:09 +00:00
Rhys Perry
6d5e26752c
ac/nir: implement sparse image/texture loads
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7775 >
2021-01-08 14:27:07 +00:00
Marek Olšák
fc212dcaa5
amd/llvm: fix C++ compile failures
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7807 >
2020-12-09 16:01:21 -05:00
James Park
ee72cd0757
amd: Remove bitfield sizes from enum values
...
Fixes negative indexing on MSVC.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7791 >
2020-11-27 20:49:00 -08:00
Marek Olšák
d425d765bf
ac: add build_alloca with an initializer
...
combining alloca_undef + BuildStore.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7542 >
2020-11-18 06:19:59 +00:00
Marek Olšák
aa757f4f8c
ac/llvm: fix demote inside conditional branches
...
The big comment explains it.
v2: don't kill if subgroup ops are used
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7586 >
2020-11-12 21:02:05 +00:00
Marek Olšák
e690a1b78b
ac/llvm: don't lower bool to int32, switch to native i1 bool
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7077 >
2020-10-20 10:21:39 +00:00
Samuel Pitoiset
5000c344cc
ac/llvm: move AC_FETCH_FORMAT to non-LLVM code
...
While we are it, give it a name.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7065 >
2020-10-12 13:13:40 +00:00
Samuel Pitoiset
31a0574b96
ac/nir: implement nir_op_fsat
...
With fmed3 if available, otherwise fallback to fmin/fmax.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6932 >
2020-10-08 12:38:04 +00:00
Marek Olšák
98a52fecda
radeonsi: implement 16-bit FS color outputs
...
This removes type conversions from 16 bits to 32 bits in the main function
and then back to 16 bits in the epilog.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6622 >
2020-09-22 02:44:53 +00:00
Pierre-Eric Pelloux-Prayer
82d2d73e03
amd/llvm: switch to 3-spaces style
...
Follow-up of !4319 using the same clang-format config.
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5310 >
2020-09-07 10:00:20 +02:00
Marek Olšák
e8d55e6db3
ac/llvm: fix b2f for v2f16
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284 >
2020-09-06 14:36:21 +00:00
Marek Olšák
d9a77f9ca3
ac/llvm: add better code for fsign
...
There are 2 improvements:
- better code for 16, 32, and 64 bits
- vector support for 16 and 32 bits
Totals:
SGPRS: 2639738 -> 2625882 (-0.52 %)
VGPRS: 1534120 -> 1533916 (-0.01 %)
Spilled SGPRs: 3541 -> 3557 (0.45 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 292 -> 292 (0.00 %) dwords per thread
Code Size: 55640332 -> 55384892 (-0.46 %) bytes
Max Waves: 964785 -> 964857 (0.01 %)
Totals from affected shaders:
SGPRS: 377352 -> 363496 (-3.67 %)
VGPRS: 209800 -> 209596 (-0.10 %)
Spilled SGPRs: 1979 -> 1995 (0.81 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 256 -> 256 (0.00 %)
Scratch size: 256 -> 256 (0.00 %) dwords per thread
Code Size: 12549300 -> 12293860 (-2.04 %) bytes
Max Waves: 105762 -> 105834 (0.07 %)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284 >
2020-09-06 14:36:21 +00:00
Marek Olšák
ca74603b4f
ac/llvm: add better code for isign
...
There are 2 improvements:
- select v_med3_i32
- support vectors
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284 >
2020-09-06 14:36:21 +00:00
Marek Olšák
84500eebd7
ac/llvm: remove stub prototype for fmed3
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6284 >
2020-09-06 14:36:20 +00:00
Marek Olšák
70b6d54011
ac/nir: support 16-bit data in image opcodes
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003 >
2020-06-02 16:29:25 -04:00
Marek Olšák
c3e0ba52a0
ac/nir: support 16-bit data in buffer_load_format opcodes
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003 >
2020-06-02 16:29:25 -04:00
Marek Olšák
b819ba949b
ac/nir: remove type and num_channels args from ac_build_buffer_store_common
...
They were only used for type overloading where we can just use
the type of data.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003 >
2020-06-02 16:29:25 -04:00
Marek Olšák
e5ea87cde8
ac/nir: use more types from ac_llvm_context
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5003 >
2020-06-02 16:29:25 -04:00
Samuel Pitoiset
14292310d9
ac/nir: implement nir_intrinsic_shader_clock with device scope
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5117 >
2020-05-24 20:37:58 +02:00
Samuel Pitoiset
0d63a1a84d
ac/llvm: add support for texturing with clamped LOD
...
This is a requirement for the shaderResourceMinLod feature which
allows to clamp LOD. This uses all image_sample_*_cl variants.
All dEQP-VK.glsl.texture_functions.texture*clamp.* pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4989 >
2020-05-14 10:05:44 +00:00
Pierre-Eric Pelloux-Prayer
17acff01a0
radeonsi: skip vs output optimizations for some outputs
...
If PT_SPRITE_TEX is enabled, PS inputs are overriden at runtime so
we can't apply the vs output optim.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2747
Fixes: 3ec9975555
("radeonsi: eliminate trivial constant VS outputs")
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4559 >
2020-04-20 08:45:16 +02:00
Samuel Pitoiset
ba2ec1f369
ac/nir: use llvm.amdgcn.rcp in ac_build_fdiv()
...
Instead of emitting 1.0 / x which includes a slow division that
LLVM doesn't always optimize even if the metadata is correctly set.
No pipeline-db changes with VEGA10/LLVM 9.
pipeline-db (VEGA10/LLVM 10):
Totals from affected shaders:
SGPRS: 6672 -> 6672 (0.00 %)
VGPRS: 6652 -> 6652 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 561780 -> 561692 (-0.02 %) bytes
Max Waves: 1043 -> 1043 (0.00 %)
pipeline-db (VEGA10/LLVM 11 - 92744f62478):
Totals from affected shaders:
SGPRS: 84608 -> 83768 (-0.99 %)
VGPRS: 106768 -> 106636 (-0.12 %)
Spilled SGPRs: 1625 -> 1713 (5.42 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 10850936 -> 10726712 (-1.14 %) bytes
Max Waves: 3152 -> 3180 (0.89 %)
LLVM 11 (master) is more affected than previous versions, but
based on the small impact with LLVM 9/10, I decided to emit it
unconditionally.
Cc: 20.0 <mesa-stable@lists.freedesktop.org >
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4326 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4326 >
2020-03-27 08:05:43 +01:00
Daniel Schürmann
de57ea2a3d
amd/llvm: implement nir_intrinsic_demote(_if) and nir_intrinsic_is_helper_invocation
...
The current implementation uses a temporary helper variable
to ensure correct behavior until LLVM provides an intrinsic.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047 >
2020-03-09 12:29:32 +00:00
Marek Olšák
4e4b2d13f0
ac: add helper ac_build_triangle_strip_indices_to_triangle
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-20 16:16:11 -05:00
Marek Olšák
0f45d4dc2b
ac: add ac_build_readlane without optimization barrier
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-20 16:16:11 -05:00
Marek Olšák
77393cf39b
ac: add prefix bitcount functions
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-20 16:16:11 -05:00
Marek Olšák
9b71041627
ac: add ac_build_s_endpgm
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-08 16:03:48 -05:00
Marek Olšák
1c44480538
ac: add 128-bit bitcount
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-08 16:00:41 -05:00
Marek Olšák
d1c8aeb24f
ac: unify primitive export code
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-08 16:00:38 -05:00
Marek Olšák
1c77a18cc2
ac: unify build_sendmsg_gs_alloc_req
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
2020-01-08 16:00:36 -05:00