intel/clflush: Utilize clflushopt in intel_invalidate_range

On MTL ChromeOS boards, we observed significant overhead from cache
invalidations during AI-based video conferencing. Debugging showed that
intel_invalidate_range() was flushing with clflush, which is inefficient.

With this change, compute workloads such as zoo models run up to ~25%
faster in the best case.

Rework:
 * Jordan: Call intel_clflushopt_range() rather than
   __builtin_ia32_clflushopt() because intel_mem.c is not compiled
   with -mclflushopt.

Backport-to: 24.1 24.2
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30238>
This commit is contained in:
Sushma Venkatesh Reddy
2024-07-17 18:55:46 -07:00
committed by Marge Bot
parent fd0592afd3
commit 2f6919e6c2

@@ -78,7 +78,7 @@ intel_invalidate_range(void *start, size_t size)
    if (size == 0)
       return;
-   intel_clflush_range(start, size);
+   intel_flush_range_no_fence(start, size);
    /* Modern Atom CPUs (Baytrail+) have issues with clflush serialization,
     * where mfence is not a sufficient synchronization barrier. We must
@@ -90,6 +90,15 @@ intel_invalidate_range(void *start, size_t size)
     * ("drm: Restore double clflush on the last partial cacheline")
     * and https://bugs.freedesktop.org/show_bug.cgi?id=92845.
     */
+#ifdef HAVE___BUILTIN_IA32_CLFLUSHOPT
+   /* clflushopt doesn't include an mfence like clflush */
+   if (util_get_cpu_caps()->has_clflushopt) {
+      __builtin_ia32_mfence();
+      intel_clflushopt_range(start + size - 1, 1);
+      __builtin_ia32_mfence();
+      return;
+   }
+#endif
    __builtin_ia32_clflush(start + size - 1);
    __builtin_ia32_mfence();
 }
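
For readers unfamiliar with the pattern in the hunk above, the following is a minimal standalone sketch of it and not the Mesa implementation. It flushes every cacheline in a range, then flushes the last cacheline a second time between fences. All sketch_* names, the 64-byte line size, the CPUID-based feature check, and the inline-asm clflushopt (used so the file needs no -mclflushopt) are assumptions made for illustration. On non-x86-64 targets the flushes compile to no-ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>     /* __get_cpuid_count */
#include <emmintrin.h> /* _mm_clflush, _mm_mfence (SSE2, baseline on x86-64) */
#endif

#define SKETCH_CACHELINE 64 /* assumed cacheline size in bytes */

/* Hypothetical helper: CPUID.(EAX=7,ECX=0):EBX bit 23 reports CLFLUSHOPT. */
static bool
sketch_has_clflushopt(void)
{
#if defined(__x86_64__)
   unsigned int eax, ebx, ecx, edx;
   if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
      return false;
   return (ebx >> 23) & 1;
#else
   return false;
#endif
}

/* Hypothetical helper: flush the cacheline containing addr. */
static void
sketch_flush_line(uintptr_t addr, bool use_opt)
{
#if defined(__x86_64__)
   if (use_opt) {
      /* Inline asm avoids needing -mclflushopt for this file. */
      __asm__ volatile("clflushopt %0" : "+m"(*(volatile char *)addr));
   } else {
      _mm_clflush((const void *)addr);
   }
#else
   (void)addr;
   (void)use_opt;
#endif
}

/* Flush all cachelines overlapping [start, start + size), then flush the
 * last one again between fences. Returns the number of distinct cachelines
 * covered by the range (the repeated flush is not counted).
 */
static size_t
sketch_invalidate_range(void *start, size_t size)
{
   assert(start != NULL);
   if (size == 0)
      return 0;

   const bool use_opt = sketch_has_clflushopt();
   const uintptr_t end = (uintptr_t)start + size;
   uintptr_t line = (uintptr_t)start & ~(uintptr_t)(SKETCH_CACHELINE - 1);
   size_t lines = 0;

   for (; line < end; line += SKETCH_CACHELINE, lines++)
      sketch_flush_line(line, use_opt);

#if defined(__x86_64__)
   /* clflushopt is weakly ordered, so fence before the repeated flush. */
   if (use_opt)
      _mm_mfence();
#endif
   sketch_flush_line(end - 1, use_opt);
#if defined(__x86_64__)
   _mm_mfence();
#endif
   return lines;
}
```

Calling sketch_invalidate_range(buf, 256) on a 64-byte-aligned buffer covers four cachelines. Starting one byte into the buffer with a size of 64 straddles two lines. The return value exists only to make the sketch easy to check; the function in the patch returns void.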