Marek Olšák
be973ed21f
radeonsi: load the right number of components for VS inputs and TBOs
...
The supported counts are 1, 2, and 4; a count of 3 is loaded as 4.
The following snippet loads float, vec2, vec3, and vec4:
Before:
buffer_load_format_x v9, v4, s[0:3], 0 idxen ; E0002000 80000904
buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen ; E00C2000 80020005
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen ; E00C2000 80010507
After:
buffer_load_format_x v10, v4, s[0:3], 0 idxen ; E0002000 80000A04
buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen ; E0042000 80020805
buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
s_waitcnt vmcnt(0) ; BF8C0F70
buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen ; E00C2000 80010307
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
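The component-count rule from be973ed21f can be sketched as a small helper. This is an illustration only: `round_load_components` is a hypothetical name, not a function from the radeonsi source, and it assumes the requested count is between 1 and 4.

```c
#include <assert.h>

/* Hypothetical helper (not from Mesa): round a requested component count
 * to one of the counts the commit lists as supported (1, 2, 4).
 * A request for 3 components is widened to 4. */
static unsigned round_load_components(unsigned requested)
{
    assert(requested >= 1 && requested <= 4);
    return requested == 3 ? 4 : requested;
}
```

With this rule, the float, vec2, vec3, and vec4 loads in the "After" listing map to the x, xy, xyzw, and xyzw load variants respectively.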
Dave Airlie
16dd0eb517
ac/llvm: bump the number of results to 8.
...
This function can be asked to access a 64-bit dvec4, which means we
have to load 8 components.
This fixes:
R600_DEBUG=nir ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/built-in-functions/fs-abs-dvec4.shader_test -auto
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 05:37:16 +10:00
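The arithmetic behind 16dd0eb517 can be sketched as follows, assuming results are counted in 32-bit units. `num_32bit_components` is a hypothetical name used only for illustration.

```c
#include <assert.h>

/* Hypothetical helper (not from Mesa): number of 32-bit result slots
 * needed for a vector of `num_components` elements of `bit_size` bits. */
static unsigned num_32bit_components(unsigned num_components, unsigned bit_size)
{
    assert(bit_size == 32 || bit_size == 64);
    return num_components * (bit_size / 32);
}
```

A dvec4 has four 64-bit elements, so it occupies 4 * 2 = 8 slots, while a vec4 needs only 4.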
Marek Olšák
b633999a4e
ac: rename and move si_const_array into common code
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-27 02:09:09 +01:00
Samuel Pitoiset
51e14bc3c0
ac: pass the number of channels to ac_build_buffer_load_format()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Samuel Pitoiset
d7c93b558a
ac: add ac_build_buffer_load_common() helper
...
For both versions of llvm.amdgcn.buffer.load.{format}.*.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-26 12:14:27 +01:00
Timothy Arceri
5b9362c248
ac: fix ac_build_varying_gather_values() for packed layouts
...
This fixes a segfault for varyings not starting at component 0.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2018-01-23 10:00:52 +11:00
Timothy Arceri
38876c88d1
ac: add i64_0 and i64_1 to llvm build context
...
These will be used in the following patch.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-14 11:40:03 +11:00
Timothy Arceri
d7b6b8ba52
ac: add f64_0 to the llvm build context
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:18 +11:00
Timothy Arceri
c0eb304acd
ac: add f64_1 to the llvm build context
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-12 09:29:17 +11:00
Samuel Pitoiset
7239e265eb
amd/common: import get_{load,store}_intr_attribs() from RadeonSI
...
v2: move those helpers to the header and use static inline
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v1)
2018-01-10 19:02:23 +01:00
Marek Olšák
a140aeb619
ac: add ac_build_fmin/fmax helpers
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-01-06 09:51:43 +01:00
Timothy Arceri
4a0c24f2dd
ac: rework ac_llvm_extract_elem()
...
Simplifies the logic a little and asserts index is 0.
Suggested-by: Nicolai Hähnle <nhaehnle@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 12:20:38 +11:00
Timothy Arceri
b99ebaa4fd
ac: move some helpers to ac_llvm_build.c
...
We will call these from the radeonsi NIR backend.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-05 11:58:55 +11:00
Samuel Pitoiset
03ef264146
amd/common: pass the family to ac_llvm_context_init()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-22 10:38:44 +01:00
Samuel Pitoiset
225b198802
amd/common: add ac_build_waitcnt()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:44 +01:00
Samuel Pitoiset
d43e72fd8c
radeonsi: make use of ac_build_fdiv()
...
And move the comment to amd/common.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2017-12-14 22:24:38 +01:00
Timothy Arceri
caf15ce670
ac: move build_varying_gather_values() to ac_llvm_build.h and expose
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-12-04 12:52:19 +11:00
Timothy Arceri
7f4966731f
ac: add v2f32 to the common code and make use of it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:46 +11:00
Timothy Arceri
ee376ac6f4
ac: add v3i32 to the common code and make use of it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:45 +11:00
Timothy Arceri
309a51411d
ac: add v2i32 to the common code and use it
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-03 14:54:45 +11:00
Dave Airlie
82d47b9d38
ac/llvm: consolidate find lsb function.
...
This was the same between si and ac.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:31 +10:00
Dave Airlie
a76b6c2192
ac/llvm: add i1false/i1true to common code.
...
These get used in a fair few places.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:18 +10:00
Dave Airlie
f925f5b074
ac/nir: move lds declaration/load/store into shared code.
...
This was duplicated between both drivers; share it here.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-10-26 15:59:11 +10:00
Marek Olšák
2a414c3961
radeonsi: postponed KILL isn't postponed anymore, but maintains WQM
...
This restores performance for the drirc workaround, i.e.
KILL_IF does:
visible = src0 >= 0;
kill_flag &= visible; // accumulate kills
amdgcn_kill(wqm_vote(visible)); // kill fully dead quads only
And all helper pixels are killed at the end of the shader:
amdgcn_kill(kill_flag);
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
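The KILL_IF scheme in 2a414c3961 can be modeled on the CPU for a single 2x2 quad. The sketch below is a simulation under stated assumptions, not driver code: `quad_state`, `kill_if`, and `end_of_shader` are hypothetical names, the WQM vote is assumed to return true for every lane of a quad when any lane in it voted true, and a kill is modeled as clearing a lane's `alive` bit.

```c
#include <assert.h>
#include <stdbool.h>

#define QUAD_SIZE 4

/* Hypothetical model of one quad of pixel-shader lanes. */
struct quad_state {
    bool alive[QUAD_SIZE];     /* lane is still executing */
    bool kill_flag[QUAD_SIZE]; /* stays true while the pixel is visible */
};

static void quad_init(struct quad_state *q)
{
    for (int i = 0; i < QUAD_SIZE; i++) {
        q->alive[i] = true;
        q->kill_flag[i] = true;
    }
}

/* Models amdgcn_kill(cond): lanes whose condition is false stop executing. */
static void kill_lanes_if_false(struct quad_state *q, const bool cond[QUAD_SIZE])
{
    for (int i = 0; i < QUAD_SIZE; i++) {
        if (!cond[i])
            q->alive[i] = false;
    }
}

/* Models KILL_IF: accumulate per-lane kills, but only stop the quad
 * right away when every lane in it is dead. */
static void kill_if(struct quad_state *q, const float src0[QUAD_SIZE])
{
    bool any_visible = false;
    bool vote[QUAD_SIZE];

    for (int i = 0; i < QUAD_SIZE; i++) {
        bool visible = src0[i] >= 0.0f;
        q->kill_flag[i] = q->kill_flag[i] && visible; /* accumulate kills */
        any_visible = any_visible || visible;
    }
    for (int i = 0; i < QUAD_SIZE; i++)
        vote[i] = any_visible; /* assumed wqm_vote behaviour */

    kill_lanes_if_false(q, vote);
}

/* Models the final amdgcn_kill(kill_flag) at the end of the shader. */
static void end_of_shader(struct quad_state *q)
{
    kill_lanes_if_false(q, q->kill_flag);
}
```

In this model a lane that fails the test in a partially visible quad keeps running as a helper, so its neighbours can still compute derivatives, and is only discarded by `end_of_shader`.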
Marek Olšák
478afbe525
ac: use llvm.amdgcn.kill with LLVM 6.0
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Marek Olšák
1ff9e27cbd
ac: replace ac_build_kill with ac_build_kill_if_false
...
This will be a new LLVM intrinsic and will also work nicely with
llvm.amdgcn.wqm.vote.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-24 14:56:34 +02:00
Eric Anholt
34c04c734f
ac: Fix a compiler warning for possibly undefined "name"
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-23 10:14:40 -07:00
Dave Airlie
1dda214d9c
ac/nir: init full exec mask for merged shaders.
...
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-10-20 01:50:40 +02:00
Marek Olšák
854593b8eb
ac: clean up ac_build_indexed_load function interfaces
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-17 22:03:03 +02:00
Marek Olšák
bcd3e761a3
ac: properly document a buffer.store LLVM workaround
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-10-06 02:56:11 +02:00
Marek Olšák
94d800bfa3
ac: silence a warning
2017-10-04 17:00:05 +02:00
Nicolai Hähnle
052b974fed
amd/common: move ac_build_phi from radeonsi
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-10-02 12:17:15 +02:00
Nicolai Hähnle
a6ea4c1b93
amd/common: save an instruction in the build_cube_select sequence
...
Avoid a v_cndmask: the absolute value is free due to input modifiers.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:07 +02:00
Nicolai Hähnle
5be5c1e0fa
amd/common: fix build_cube_select
...
Fix the custom cube coord selection sequence to be identical to
the hardware v_cubesc/tc and OpenGL spec. Affects texture sampling
with user-provided derivatives.
Fixes dEQP-GLES3.functional.shaders.texture_functions.texturegrad.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2017-09-29 11:43:04 +02:00
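For reference alongside 5be5c1e0fa, the cube coordinate selection described by the OpenGL specification can be sketched in plain C. This is not the LLVM IR sequence the commit builds: `cube_select` and `cube_coords` are hypothetical names, and the tie-breaking order when two magnitudes are equal (Z before Y before X) is an assumption, since the specification leaves that choice to the implementation.

```c
#include <assert.h>

/* Hypothetical result type: face index follows the order
 * +X, -X, +Y, -Y, +Z, -Z (0..5). */
struct cube_coords {
    int face;
    float sc, tc, ma;
};

static float absf(float v)
{
    return v < 0.0f ? -v : v;
}

/* Pick the major axis of the direction (rx, ry, rz) and return the
 * sc/tc/ma values from the OpenGL cube map face selection table. */
static struct cube_coords cube_select(float rx, float ry, float rz)
{
    const float ax = absf(rx), ay = absf(ry), az = absf(rz);
    struct cube_coords c;

    if (az >= ax && az >= ay) {          /* Z is the major axis */
        c.face = rz >= 0.0f ? 4 : 5;
        c.sc = rz >= 0.0f ? rx : -rx;
        c.tc = -ry;
        c.ma = rz;
    } else if (ay >= ax) {               /* Y is the major axis */
        c.face = ry >= 0.0f ? 2 : 3;
        c.sc = rx;
        c.tc = ry >= 0.0f ? rz : -rz;
        c.ma = ry;
    } else {                             /* X is the major axis */
        c.face = rx >= 0.0f ? 0 : 1;
        c.sc = rx >= 0.0f ? -rz : rz;
        c.tc = -ry;
        c.ma = rx;
    }
    return c;
}
```

The face-local texture coordinates then follow as s = 0.5 * (sc / |ma| + 1) and t = 0.5 * (tc / |ma| + 1).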
Nicolai Hähnle
94736d31c3
amd/common: add workaround for cube map array layer clamping
...
Fixes dEQP-GLES31.functional.texture.filtering.cube_array.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
6772452e4c
amd/common: remove has_ds_bpermute argument from ac_build_ddxy
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
3db86d86ed
amd/common: add chip_class to ac_llvm_context
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-09-18 11:25:18 +02:00
Nicolai Hähnle
e0af3bed2c
amd/common: round cube array slice in ac_prepare_cube_coords
...
The NIR-to-LLVM pass already does this; now the same fix covers
radeonsi as well.
Fixes various tests of
dEQP-GLES31.functional.texture.filtering.cube_array.combinations.*
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-18 11:25:18 +02:00
Connor Abbott
b909d278d0
ac: remove bitcast_to_float()
...
ac_to_float() does a superset of what it does.
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:24:56 +01:00
Connor Abbott
50967cd0b0
ac: move ac_to_integer() and ac_to_float() to ac_llvm_build.c
...
We'll need to use ac_to_integer() for other stuff in ac_llvm_build.c.
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:24:02 +01:00
Connor Abbott
fafa299511
ac: fix ac_get_type_size() for doubles
...
Reviewed-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:19:47 +01:00
Connor Abbott
b8a51c8c4b
radeonsi: move the guts of ARB_shader_group_vote emission to ac
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:12:49 +01:00
Connor Abbott
bd73b89792
radeonsi: move si_emit_ballot() to ac
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:12:42 +01:00
Connor Abbott
ac27fa7294
radeonsi: move emit_optimization_barrier() to ac
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:06:47 +01:00
Connor Abbott
c181d4f2b7
radeonsi: move llvm_get_type_size() to ac
...
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-09-08 04:04:16 +01:00
Dave Airlie
cb6f16dce9
radeon/ac: use ds_swizzle for derivs on si/cik.
...
This looks like it has been supported since at least LLVM 3.9,
so switch radeonsi and radv over to using it; -pro also
uses this. We can now drop creating LDS for these operations,
as the ds_swizzle operation doesn't actually write to LDS at all.
Acked-by: Marek Olšák <marek.olsak@amd.com>
(stable requested due to fixing radv CIK conformance tests)
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-08-02 00:12:01 +01:00
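The derivative computation that cb6f16dce9 moves onto ds_swizzle can be sketched for one 2x2 quad. This is a CPU model under assumptions, not the driver's code: the cross-lane read is replaced by plain array indexing, the function names are hypothetical, and the lane layout is assumed to put the X position in bit 0 and the Y position in bit 1 of the lane index.

```c
#include <assert.h>

#define QUAD_SIZE 4

/* Stand-in for a cross-lane read within a quad. On the GPU this is the
 * step that needs no LDS storage; here it is just an array access. */
static float read_lane(const float quad[QUAD_SIZE], unsigned src)
{
    assert(src < QUAD_SIZE);
    return quad[src];
}

/* Fine horizontal derivative: right neighbour minus left neighbour
 * in the same row as `lane`. */
static float quad_ddx_fine(const float quad[QUAD_SIZE], unsigned lane)
{
    assert(lane < QUAD_SIZE);
    const unsigned left = lane & ~1u;
    return read_lane(quad, left + 1) - read_lane(quad, left);
}

/* Fine vertical derivative: bottom neighbour minus top neighbour
 * in the same column as `lane`. */
static float quad_ddy_fine(const float quad[QUAD_SIZE], unsigned lane)
{
    assert(lane < QUAD_SIZE);
    const unsigned top = lane & ~2u;
    return read_lane(quad, top + 2) - read_lane(quad, top);
}
```

Every lane only reads values that neighbouring lanes already hold in registers, which is why no LDS allocation is needed for these operations.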
Nicolai Hähnle
a69afb68c9
radeonsi: use new function ac_build_umin for edgeflag clamping
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:42 +02:00
Nicolai Hähnle
ac2ab5acad
ac/nir: add always_vector argument to ac_build_gather_values_extended
...
This simplifies a bunch of places that no longer need special treatment
of value_count == 1. We rely on LLVM to optimize away the 1-element vector
types.
This fixes a bunch of bugs where 1-element arrays are indexed indirectly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-07-31 14:55:42 +02:00
Dave Airlie
ff422500cc
ac/nir: remove last remnants of v16i8
...
LLVM doesn't need this workaround anymore.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-28 20:22:30 +01:00
Nicolai Hähnle
edfd3be77e
ac: add ac_llvm_context::v8i32
...
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-06-27 10:28:29 +10:00