anv/xe: try harder when the vm_bind ioctl fails

From all the many possible errors returned by the vm_bind ioctl, some can actually happen in the wild when the system is under memory pressure. Thomas Hellström pointed to us that, due to its asynchronous nature, the vm_bind ioctl itself has to pin some memory, so if the number of bind operations passed is too big, there is a probability that it may run out of memory. Previously the Kernel would return ENOMEM when this condition happened. Since commit e8babb280b5e ("drm/xe: Convert multiple bind ops into single job") the Kernel has started returning ENOBUFS when it doesn't have enough memory to do what it wants but thinks we'd succeed if we tried to do one bind operation at a time (instead of doing multiple operations in the same ioctl), and ENOMEM in some other situations. Still-uncommitted commit "drm/xe: Return -ENOBUFS if a kmalloc fails which is tied to an array of binds" proposes converting a few more ENOMEM cases no ENOBUFS. Still, even ENOMEM situations could in theory be possible to recover from, because if we wait some amount of time, resources that may have been consuming memory could end up being freed by other threads or processes, allowing the operations to succeed. So our main idea in this patch is that we treat both ENOMEM and ENOBUFS in the same way, so our implementation can work with any xe.ko driver regardless of having or not having the commits mentioned above. So in this patch, when we detect the system is under memory pressure (i.e., the vm_bind() function returns VK_ERROR_OUT_OF_HOST_MEMORY), we throw away our performance expectations and try to go slowly and steady. First we wait everything we're supposed to wait (hoping that this alone could also help to alleviate the memory pressure), and then we synchronously bind one piece at a time (as this will ensure ENOBUFS can't be returned), hoping that this won't cause the Kernel to try to reserve too much memory. All this while also hoping that whatever thing that may be eating all the memory goes away in the meantime. If even this fails, we give up and hope the upper layer will be able to figure out what to do. This fixes a bunch of LNL failures and flaky tests (as LNL is our first officially supported xe.ko platform). This can be seen in dEQP but only if multiple tests are being run parallel. Happens in multiple tests, some of which may include: - dEQP-VK.sparse_resources.image_sparse_binding.2d_array.rgba8_snorm.1024_128_8 - dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16_snorm.1024_128_8 - dEQP-VK.sparse_resources.image_sparse_binding.3d.rgba16ui.512_256_6 I don't ever see these errors when running Alchemist/DG2 with xe.ko. Fixes: e9f63df2f2 ("intel/dev: Enable LNL PCI IDs without INTEL_FORCE_PROBE") Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30276>
2024-03-27 15:53:50 -07:00
parent 8b86653da6
commit dd5362c78a
2 changed files with 85 additions and 3 deletions
--- a/src/intel/vulkan/anv_sparse.c
+++ b/src/intel/vulkan/anv_sparse.c
@@ -798,7 +798,76 @@ anv_sparse_bind_vm_bind(struct anv_device *device,
   if (!queue)
      assert(submit->wait_count == 0 && submit->signal_count == 0);

-   return device->kmd_backend->vm_bind(device, submit, ANV_VM_BIND_FLAG_NONE);
+   VkResult result = device->kmd_backend->vm_bind(device, submit,
+                                                  ANV_VM_BIND_FLAG_NONE);
+
+   if (result == VK_ERROR_OUT_OF_HOST_MEMORY) {
+      /* If we get this, the system is under memory pressure. First we
+       * manually wait for all our dependency syncobjs hoping that some memory
+       * will be released while we wait, then we try to issue each bind
+       * operation in a single ioctl as it requires less Kernel memory and so
+       * we may be able to move things forward, although slowly, while also
+       * waiting for each operation to complete before issuing the next.
+       * Performance isn't a concern at this point: we're just trying to move
+       * progress forward without crashing until whatever is eating too much
+       * memory goes away.
+       */
+
+      result = vk_sync_wait_many(&device->vk, submit->wait_count,
+                                 submit->waits, VK_SYNC_WAIT_COMPLETE,
+                                 INT64_MAX);
+      if (result != VK_SUCCESS)
+         return vk_queue_set_lost(&queue->vk, "vk_sync_wait_many failed");
+
+      struct vk_sync *sync;
+      result = vk_sync_create(&device->vk,
+                              &device->physical->sync_syncobj_type,
+                              VK_SYNC_IS_TIMELINE, 0 /* initial_value */,
+                              &sync);
+      if (result != VK_SUCCESS)
+         return vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
+
+      for (int b = 0; b < submit->binds_len; b++) {
+         struct vk_sync_signal sync_signal = {
+            .sync = sync,
+            .signal_value = b + 1,
+         };
+         struct anv_sparse_submission s = {
+            .queue = submit->queue,
+            .binds = &submit->binds[b],
+            .binds_len = 1,
+            .binds_capacity = 1,
+            .wait_count = 0,
+            .signal_count = 1,
+            .waits = NULL,
+            .signals = &sync_signal,
+         };
+         result = device->kmd_backend->vm_bind(device, &s,
+                                               ANV_VM_BIND_FLAG_NONE);
+         if (result != VK_SUCCESS) {
+            vk_sync_destroy(&device->vk, sync);
+            return vk_error(device, result); /* Well, at least we tried... */
+         }
+
+         result = vk_sync_wait(&device->vk, sync, sync_signal.signal_value,
+                               VK_SYNC_WAIT_COMPLETE, UINT64_MAX);
+         if (result != VK_SUCCESS) {
+            vk_sync_destroy(&device->vk, sync);
+            return vk_queue_set_lost(&queue->vk, "vk_sync_wait failed");
+         }
+      }
+
+      vk_sync_destroy(&device->vk, sync);
+
+      for (uint32_t i = 0; i < submit->signal_count; i++) {
+         struct vk_sync_signal *s = &submit->signals[i];
+         result = vk_sync_signal(&device->vk, s->sync, s->signal_value);
+         if (result != VK_SUCCESS)
+            return vk_queue_set_lost(&queue->vk, "vk_sync_signal failed");
+      }
+   }
+
+   return VK_SUCCESS;
 }

 VkResult
--- a/src/intel/vulkan/xe/anv_kmd_backend.c
+++ b/src/intel/vulkan/xe/anv_kmd_backend.c
@@ -242,10 +242,23 @@ xe_vm_bind_op(struct anv_device *device,
   if (signal_bind_timeline)
      intel_bind_timeline_bind_end(&device->bind_timeline);

+   /* The vm_bind ioctl can return a wide variety of error codes, but most of
+    * them shouldn't happen in the real world. Here we list the interesting
+    * error case:
+    *
+    * - EINVAL: shouldn't happen. This is most likely a bug in our driver.
+    * - ENOMEM: generic out-of-memory error.
+    * - ENOBUFS: an out-of-memory error that is related to having too many
+    *   bind operations in the same ioctl, so the recommendation here is to
+    *   try to issue fewer binds per ioctl (ideally 1).
+    *
+    * The xe.ko team has plans to differentiate between lack of device memory
+    * vs lack of host memory in the future.
+    */
   if (ret) {
      assert(errno_ != EINVAL);
-      if (errno_ == ENOMEM)
-         result = vk_error(device, VK_ERROR_OUT_OF_HOST_MEMORY);
+      if (errno_ == ENOMEM || errno_ == ENOBUFS)
+         result = VK_ERROR_OUT_OF_HOST_MEMORY;
      else
         result = vk_device_set_lost(&device->vk,
                                     "vm_bind failed with errno %d", errno_);