Commit Graph

34 Commits

Author SHA1 Message Date
Marek Olšák
ff311df6b5 winsys/amdgpu: remove amdgpu_winsys_bo::num_cs_references to remove atomics
This decreases the CPU time percentage of amdgpu_cs_add_buffer by 50%
on Ryzen 3900X.

We don't need to call amdgpu_bo_is_referenced_by_any_cs
in amdgpu_bo_can_reclaim. The reclaim function is only called for buffers
that have 0 references.

The only downside is that amdgpu_bo_is_referenced_by_cs might be slower
in some very rare cases. Overall the driver overhead is better.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8849>
2021-02-06 05:41:22 +00:00
Marek Olšák
e97af11ba9 winsys/amdgpu,pb_slab: add slabs with 3/4 of power of two sizes to save memory
Instead of aligning slab allocations to powers of two (e.g. 129K -> 256K),
implement slab allocations with 3/4 of power of two sizes to reduce
overallocation. (e.g. 129K -> 192K)

The limitation is that the alignment must be 1/3rd of the allocation size.

DeusExMD allocates 2.1 GB of VRAM. Without this, slabs waste 194 MB due
to alignment, i.e. 9.2%. This commit reduces the waste to 102 MB, i.e. 4.9%.
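
As a rough sketch of the size rounding described above (not the actual pb_slab code; the function names are hypothetical), a request can be rounded up to the smaller of the next power of two and 3/4 of that power of two:

```c
#include <stdint.h>

/* Smallest power of two that is >= x (returns 1 for x == 0). */
static uint64_t sketch_next_pow2(uint64_t x)
{
   uint64_t p = 1;
   while (p < x)
      p <<= 1;
   return p;
}

/* Round a request up to either a power of two or 3/4 of one,
 * whichever is the smallest size that still fits the request. */
static uint64_t sketch_slab_entry_size(uint64_t size)
{
   uint64_t pot = sketch_next_pow2(size);
   uint64_t three_quarters = pot / 4 * 3;

   return size <= three_quarters ? three_quarters : pot;
}
```

With this rounding, a 129K request becomes 192K rather than 256K, while a 200K request still needs the full 256K.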

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8683>
2021-02-03 21:53:34 +00:00
Pierre-Eric Pelloux-Prayer
2be8cebd0b amdgpu_bo: make cache_entry an extensible array
Improves performance in SPECviewperf13 snx.
e.g.: test10 fps evolution: 270 -> 280.

"pahole radeonsi_dri.so -C amdgpu_winsys_bo" after:

struct amdgpu_winsys_bo {
	struct pb_buffer           base;                 /*     0    32 */
	union {
		struct {
			amdgpu_va_handle va_handle;      /*    32     8 */
			uint32_t   kms_handle;           /*    40     4 */
			int        map_count;            /*    44     4 */
		} real;                                  /*    32    16 */
		[...]
	} u;                                             /*    32    40 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	[...]
	struct pb_cache_entry      cache_entry[];        /*   144     0 */

	/* size: 144, cachelines: 3, members: 17 */
};
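
The zero-sized trailing cache_entry[] member in the dump above is a C flexible array member. A minimal sketch of how such a member might be used, assuming the entry is only allocated for buffers that can go back into the cache (the struct layout, field names and helpers below are hypothetical, not the Mesa definitions):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pb_cache_entry. */
struct sketch_cache_entry {
   void *prev, *next;
   long long start, end;
};

struct sketch_bo {
   unsigned size;
   bool reusable;
   /* Flexible array member: occupies no space unless extra bytes
    * are allocated past the end of the struct. */
   struct sketch_cache_entry cache_entry[];
};

/* Bytes to allocate: the trailing entry is paid for only when needed. */
static size_t sketch_bo_alloc_size(bool reusable)
{
   return sizeof(struct sketch_bo) +
          (reusable ? sizeof(struct sketch_cache_entry) : 0);
}

static struct sketch_bo *sketch_bo_create(unsigned size, bool reusable)
{
   struct sketch_bo *bo = calloc(1, sketch_bo_alloc_size(reusable));

   if (!bo)
      return NULL;
   bo->size = size;
   bo->reusable = reusable;
   return bo;
}
```

Only code paths that know the buffer is reusable may touch bo->cache_entry[0]; for other buffers that storage does not exist.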

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7532>
2020-11-19 12:44:40 +00:00
Pierre-Eric Pelloux-Prayer
111a1b2e1c winsys/amdgpu: make RADEON_ALL_BOS a debug only feature
Improves performance in SPECviewperf13 snx.
e.g.: test10 fps evolution: 235 -> 270.

Extract from "pahole radeonsi_dri.so -C amdgpu_winsys_bo", before:

struct amdgpu_winsys_bo {
	struct pb_buffer           base;                 /*     0    32 */
	union {
		struct {
			struct pb_cache_entry cache_entry; /*    32    56 */

			/* XXX last struct has 4 bytes of padding */

			/* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
			amdgpu_va_handle va_handle;      /*    88     8 */
			int        map_count;            /*    96     4 */
			_Bool      use_reusable_pool;    /*   100     1 */

			/* XXX 3 bytes hole, try to pack */

			struct list_head global_list_item; /*   104    16 */
			uint32_t   kms_handle;           /*   120     4 */
		} real;
		[...]
	} u;                                             /*    32    96 */
	[...]
	/* size: 200, cachelines: 4, members: 15 */
};

After:

struct amdgpu_winsys_bo {
	struct pb_buffer           base;                 /*     0    32 */
	union {
		struct {
			struct pb_cache_entry cache_entry; /*    32    56 */

			/* XXX last struct has 4 bytes of padding */

			/* --- cacheline 1 boundary (64 bytes) was 24 bytes ago --- */
			amdgpu_va_handle va_handle;      /*    88     8 */
			int        map_count;            /*    96     4 */
			_Bool      use_reusable_pool;    /*   100     1 */

			/* XXX 3 bytes hole, try to pack */

			uint32_t   kms_handle;           /*   104     4 */
		} real;                                  /*    32    80 */
	} u;                                             /*    32    80 */
	/* --- cacheline 1 boundary (64 bytes) was 48 bytes ago --- */
	[...]
	/* size: 184, cachelines: 3, members: 15 */
};

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7532>
2020-11-19 12:44:40 +00:00
Marek Olšák
745f0b8a31 winsys/amdgpu: move amdgpu_winsys_bo::lock for better packing
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:43 -05:00
Marek Olšák
bccb9a7457 winsys/amdgpu: replace amdgpu_winsys_bo::initial_domain with pb_buffer::placement
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:41 -05:00
Marek Olšák
9c239aa638 winsys/amdgpu: replace amdgpu_winsys_bo::flags with pb_buffer::usage
Let's use the field so as not to waste memory.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:40 -05:00
Marek Olšák
37cdce0146 winsys/amdgpu: remove amdgpu_winsys_bo::sparse
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:38 -05:00
Marek Olšák
a09bc2db18 winsys/amdgpu: remove amdgpu_winsys_bo::u::sparse::flags
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7585>
2020-11-18 23:50:36 -05:00
Marek Olšák
3586068557 gallium: rename pipe_transfer_usage -> pipe_map_flags
Acked-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5749>
2020-09-22 03:20:54 +00:00
Marek Olšák
7a6af4c5ed winsys/amdgpu: make amdgpu_bo_unmap non-static
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5798>
2020-07-22 12:08:19 -04:00
Pierre-Eric Pelloux-Prayer
fe2a3b804b amdgpu: add encrypted slabs support
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4401>
2020-05-11 10:25:53 +02:00
Bas Nieuwenhuizen
d80fb02430 winsys/amdgpu: Retrieve WC flags from imported buffers.
Otherwise reading from an imported mapped GTT+WC linear texture
is painfully slow.

Sadly no radeon winsys implementation, as I don't know a suitable
kernel driver operation.

Hit this in vaGetImage with an image imported from minigbm (which
we are switching to allocate WC for SCANOUT images).

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4542>
2020-04-16 13:51:28 +00:00
Pierre-Eric Pelloux-Prayer
ab54624d0d radeonsi: stop using the VM_ALWAYS_VALID flag
Allocating all the BOs as ALWAYS_VALID means they must all fit in memory
(vram + gtt) at each command submission.
This causes some trouble when the total allocated memory is greater than
the available memory.

Possible solutions:
- being able to tag/untag a bo as ALWAYS_VALID: would require kernel changes
- disable VM_ALWAYS_VALID when memory usage is more than a percentage of the
  available memory
- disable VM_ALWAYS_VALID entirely

v1 of this patch implemented option 2. v2 (this version) implements option 3.

Related issues:
 - https://gitlab.freedesktop.org/drm/amd/issues/607
 - https://gitlab.freedesktop.org/mesa/mesa/issues/1257

It also helps with some piglit tests (-t maxsize -t "max[_-].*size" -t maxuniformblocksize):
instead of crashing the machine, the tests fail cleanly.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2190
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3430>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3430>
2020-01-29 09:05:04 +01:00
Michel Dänzer
cb446dc0fa winsys/amdgpu: Add amdgpu_screen_winsys
It extends pipe_screen / radeon_winsys and references amdgpu_winsys.
Multiple amdgpu_screen_winsys instances may reference the same
amdgpu_winsys instance, which corresponds to an amdgpu_device_handle.

The purpose of amdgpu_screen_winsys is to keep a duplicate of the DRM
file descriptor passed to amdgpu_winsys_create, which will be needed
in the next change.

v2:
* Add comment in amdgpu_winsys_unref explaining why it always returns
  true (Marek Olšák)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-07-03 09:19:07 +00:00
Nicolai Hähnle
eb94b6bd5c winsys/amdgpu: explicitly declare whether buffer_map is permanent or not
Introduce a new driver-private transfer flag RADEON_TRANSFER_TEMPORARY
that specifies whether the caller will use buffer_unmap or not. The
default behavior is set to permanent maps, because that's what drivers
do for Gallium buffer maps.

This should eliminate the need for hacks in libdrm. Assertions are added
to catch when the buffer_unmap calls don't match the (temporary)
buffer_map calls.
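
A simplified sketch of the pairing rule described above, assuming a per-buffer counter of temporary maps and a cached CPU pointer (the flag value, struct and function names are hypothetical stand-ins, and the real mapping call is replaced by a local array):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_TRANSFER_TEMPORARY (1u << 0) /* hypothetical flag value */

struct sketch_bo {
   void *cpu_ptr;    /* cached mapping, NULL while unmapped */
   int map_count;    /* outstanding temporary maps */
   bool permanent;   /* set once a permanent map was requested */
   char storage[64]; /* stands in for the real CPU mapping */
};

static void *sketch_map(struct sketch_bo *bo, unsigned flags)
{
   if (!bo->cpu_ptr)
      bo->cpu_ptr = bo->storage; /* only the first map does real work */

   if (flags & SKETCH_TRANSFER_TEMPORARY)
      bo->map_count++;           /* caller promises a matching unmap */
   else
      bo->permanent = true;      /* caller will never unmap */

   return bo->cpu_ptr;
}

static void sketch_unmap(struct sketch_bo *bo)
{
   /* Catches an unmap that has no matching temporary map. */
   assert(bo->map_count > 0);

   bo->map_count--;
   if (bo->map_count == 0 && !bo->permanent)
      bo->cpu_ptr = NULL;
}
```

In this sketch a permanent map never touches the counter, so repeated permanent maps reduce to returning the cached pointer.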

I did my best to update r600 for consistency (r300 needs no changes
because it never calls buffer_unmap), even though the radeon winsys
ignores the new flag.

As an added bonus, this should actually improve the performance of
the normal fast path, because we no longer call into libdrm at all
after the first map, and there's one less atomic in the winsys itself
(there are now no atomics left in the UNSYNCHRONIZED fast path).

Cc: Leo Liu <leo.liu@amd.com>
v2:
- remove comment about visible VRAM (Marek)
- don't rely on amdgpu_bo_cpu_map doing an atomic write
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-11-28 18:24:14 +01:00
Nicolai Hähnle
35eb81987c winsys/amdgpu: add amdgpu_winsys_bo::lock
We'll use it in the upcoming mapping change. Sparse buffers have always
had one.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-11-28 18:23:29 +01:00
Marek Olšák
461a864316 winsys/amdgpu: pass the BO list via the CS ioctl on DRM >= 3.27.0
2018-08-03 18:35:19 -04:00
Timothy Arceri
87f02ddfd1 amdgpu: use simple mtx
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-09 12:07:48 +11:00
Marek Olšák
529cdce799 radeonsi: remove 'Authors:' comments
It's inaccurate. Instead, see the copyright and use "git log" and
"git blame" to know the authorship.

Acked-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-11-02 18:19:03 +01:00
Christian König
214b565bc2 winsys/amdgpu: set AMDGPU_GEM_CREATE_VM_ALWAYS_VALID if possible v2
When the kernel supports it set the local flag and
stop adding those BOs to the BO list.

Can probably be optimized much more.

v2: rename new flag to AMDGPU_GEM_CREATE_VM_ALWAYS_VALID

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-08-31 14:55:38 +02:00
Nicolai Hähnle
e348248647 winsys/amdgpu: add sparse buffer data structures
v2:
- remove pipe_mutex_*
- use a simple page commitment array

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-04-05 10:37:18 +02:00
Nicolai Hähnle
ffa1c669dd winsys/amdgpu: enable buffer allocation from slabs
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:23 +02:00
Nicolai Hähnle
a987e4377a winsys/amdgpu: add slab entry structures to amdgpu_winsys_bo
Already adjust amdgpu_bo_map/unmap accordingly.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:15 +02:00
Nicolai Hähnle
5af9eef719 winsys/amdgpu: do not synchronize unsynchronized buffers
When a buffer is added to a CS without the SYNCHRONIZED usage flag, we now
no longer add a dependency on the buffer's fence(s).

However, we still need to add a fence to the buffer during flush, so that
cache reclaim works correctly (and in the hypothetical case that the buffer
is later added to a CS _with_ the SYNCHRONIZED flag).

It is now possible that the submissions referring to a buffer are no longer
linearly ordered, and so we may have to keep multiple fences around. We keep
the fences in a FIFO. It should usually stay quite short (# of contexts * 2,
for gfx + dma rings).
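
One way to picture the per-buffer fence FIFO described above is the following sketch. It assumes a small fixed-capacity array and a plain boolean for the fence state; all names are hypothetical and the real winsys structures differ:

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FENCES 8 /* arbitrary capacity for the sketch */

struct sketch_fence {
   bool signaled;
};

struct sketch_bo {
   struct sketch_fence *fences[SKETCH_MAX_FENCES]; /* oldest first */
   unsigned num_fences;
};

/* Drop fences that have signaled, keeping the order of the rest. */
static void sketch_prune_fences(struct sketch_bo *bo)
{
   unsigned kept = 0;

   for (unsigned i = 0; i < bo->num_fences; i++) {
      if (!bo->fences[i]->signaled)
         bo->fences[kept++] = bo->fences[i];
   }
   bo->num_fences = kept;
}

/* Called at flush time for each buffer used by the submission. */
static bool sketch_add_fence(struct sketch_bo *bo, struct sketch_fence *fence)
{
   sketch_prune_fences(bo);
   if (bo->num_fences == SKETCH_MAX_FENCES)
      return false; /* a real implementation would grow the array */
   bo->fences[bo->num_fences++] = fence;
   return true;
}

/* A buffer can be reclaimed only once every tracked fence has signaled. */
static bool sketch_bo_is_idle(struct sketch_bo *bo)
{
   sketch_prune_fences(bo);
   return bo->num_fences == 0;
}
```

Because unsynchronized submissions are not ordered against each other, the idle check has to consider every remaining fence rather than only the newest one.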

While we're at it, extract amdgpu_add_fence_dependency for a single buffer,
which will make adding the distinction between real buffer and slab cases
easier.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-27 16:45:11 +02:00
Nicolai Hähnle
11cbf4d7ae winsys/amdgpu: use only one fence per BO
The fence that is added to the BO during flush is guaranteed to be
signaled after all the fences that were in the fences array of the BO
before the flush, because those fences are added as dependencies for the
submission (and all this happens atomically under the bo_fence_lock).

Therefore, keeping only the last fence around is sufficient.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:59 +02:00
Nicolai Hähnle
339867c077 gallium/radeon/winsyses: remove #includes of pb_bufmgr.h
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-09-12 13:54:36 +02:00
Marek Olšák
1e04483c22 winsys/amdgpu: track the amount of mapped memory
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-08-10 01:11:10 +02:00
Marek Olšák
53f33619a4 winsys/amdgpu: add back multithreaded command submission
Ported from the initial amdgpu winsys from the private AMD branch.

The thread creates the buffer list, submits IBs, and cleans up
the submission context, which can also destroy buffers.

3-5% reduction in CPU overhead is expected for apps submitting a lot
of IBs per frame. This is most visible with DMA IBs.

v2: use a semaphore instead of a busy loop in amdgpu_ws_queue_cs
    add another amdgpu_cs_sync_flush call into amdgpu_bo_map

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-05-26 16:43:45 +02:00
Marek Olšák
e78170f388 winsys/amdgpu: split IB data into a new structure in preparation for CE
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2016-04-19 18:10:30 +02:00
Marek Olšák
e707b9d8ba winsys/amdgpu: optionally use buffer lists with all allocated buffers
Set RADEON_ALL_BOS=1 to use it.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2016-01-23 17:01:54 +01:00
Marek Olšák
1e05812fcd winsys/amdgpu: don't use the "rws" abbreviation for amdgpu_winsys
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
6f4e74d165 winsys/amdgpu: use pb_cache instead of pb_cache_manager
This is a prerequisite for the removal of radeon_winsys_cs_handle.

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-12-11 15:25:12 +01:00
Marek Olšák
2eb067db0f winsys/amdgpu: add a new winsys for the new kernel driver
v2: - lots of changes according to Emil Velikov's comments
    - implemented radeon_winsys::read_registers

v3: - a lot of new work, much of it adapting to libdrm interface changes
Squashed patches:
winsys/amdgpu: implement radeon_winsys context support
winsys/amdgpu: add reference counting for contexts
winsys/amdgpu: add userptr support
winsys/amdgpu: allocate IBs like normal buffers
winsys/amdgpu: add IBs to the buffer list, adapt to interface changes
winsys/amdgpu: don't use KMS handles as reloc hash keys
winsys/amdgpu: sync buffer accesses to different rings
winsys/amdgpu: use dependencies instead of waiting for last fence v2
gallium/radeon: unify buffer_wait and buffer_is_busy in the winsys interface (amdgpu part)
winsys/amdgpu: track fences per ring and be thread-safe
winsys/amdgpu: simplify waiting on a variable in amdgpu_fence_wait
gallium/radeon: allow the winsys to choose the IB size (amdgpu part)
winsys/amdgpu: switch to new amdgpu_cs_query_fence_status interface
winsys/amdgpu: handle fence and dependencies merge
winsys/amdgpu: follow libdrm change to move user fence into UMD
winsys/amdgpu: use amdgpu_bo_va_op for va map/unmap v2
winsys/amdgpu: use the new tiling flags
winsys/amdgpu: switch to new GTT_USWC definition
winsys/amdgpu: expose amdgpu_cs_query_reset_state to drivers
winsys/amdgpu: fix valgrind warnings
winsys/amdgpu: don't use VRAM with APUs that don't have much of it
winsys/amdgpu: require LLVM 3.6.1 for VI because of bug fixes there
winsys/amdgpu: remove amdgpu_winsys::num_cpus
winsys/amdgpu: align BO size to page size
winsys/amdgpu: reduce BO cache timeout
winsys/amdgpu: remove useless flushing and waiting in amdgpu_bo_set_tiling
winsys/amdgpu: use amdgpu_device_handle as a unique device ID instead of fd
winsys/amdgpu: use safer access to amdgpu_fence_wait::signalled
winsys/amdgpu: allow maximum IB size of 4 MB
winsys/amdgpu: add ip_instance into amdgpu_fence
gallium/radeon: add RING_COMPUTE instead of RADEON_FLUSH_COMPUTE
winsys/amdgpu: set the ring type at CS initialization
winsys/amdgpu: query the GART page size from the kernel
winsys/amdgpu: correctly wait for shared buffers to become idle
winsys/amdgpu: set the amdgpu_cs_fence structure only once at fence creation
winsys/amdgpu: add a specific error message for cs_submit -> -ENOMEM
winsys/amdgpu: check num_active_ioctls before calling amdgpu_bo_wait_for_idle
winsys/amdgpu: clear user fence BO after allocating it
winsys/amdgpu: fix user fences
winsys/amdgpu: make amdgpu_winsys_create public
winsys/amdgpu: remove thread offloading
winsys/amdgpu: flatten the amdgpu_cs_context structure and simplify more

v4: require libdrm 2.4.63
2015-08-14 15:02:28 +02:00