intel/clflush: Utilize clflushopt in intel_invalidate_range

On MTL ChromeOS boards, during AI based video conference, we were
observing a lot of overhead from invalidations. Upon debug, it was found
that we were using clflush in this function and that isn't efficient.
With this change, while executing compute workloads like zoo models, we
are getting ~25% performance improvements in a best case scenario.

Rework:
 * Jordan: Call intel_clflushopt_range() rather than
   __builtin_ia32_clflushopt() because intel_mem.c is not compiled with
   -mclflushopt.

Backport-to: 24.1 24.2
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30238>
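
The rework note turns on a build detail: a file that is not compiled with -mclflushopt cannot use the clflushopt builtin directly, so the commit calls a helper instead. The following is a minimal sketch of one way to get a similar effect in a single file, assuming GCC or Clang on x86-64. It is not how this commit does it: the per-function `target` attribute is a swapped-in technique, and every `sketch_*` name and the 64-byte line size are hypothetical. Support is detected at runtime through CPUID leaf 7 (EBX bit 23), and the clflushopt path is bracketed by fences as in the diff.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <immintrin.h>

#define SKETCH_CACHELINE 64u /* assumed line size, not queried from the CPU */

/* CPUID.(EAX=7,ECX=0):EBX bit 23 reports CLFLUSHOPT support. */
static int
sketch_has_clflushopt(void)
{
   unsigned eax, ebx, ecx, edx;
   if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
      return 0;
   return (ebx >> 23) & 1;
}

/* The target attribute enables clflushopt for this one function only, so the
 * rest of the file can be built without -mclflushopt. */
static __attribute__((target("clflushopt"))) void
sketch_clflushopt_range(void *start, size_t size)
{
   uintptr_t p = (uintptr_t)start & ~(uintptr_t)(SKETCH_CACHELINE - 1);
   uintptr_t end = (uintptr_t)start + size;
   for (; p < end; p += SKETCH_CACHELINE)
      _mm_clflushopt((void *)p);
}

/* Fallback path using plain clflush. */
static void
sketch_clflush_range(void *start, size_t size)
{
   uintptr_t p = (uintptr_t)start & ~(uintptr_t)(SKETCH_CACHELINE - 1);
   uintptr_t end = (uintptr_t)start + size;
   for (; p < end; p += SKETCH_CACHELINE)
      _mm_clflush((void *)p);
}

/* Flush every line of [start, start + size). Returns 1 if the clflushopt
 * path was taken, 0 otherwise. */
static int
sketch_flush_range(void *start, size_t size)
{
   if (size == 0)
      return 0;
   if (sketch_has_clflushopt()) {
      _mm_mfence();
      sketch_clflushopt_range(start, size);
      _mm_mfence();
      return 1;
   }
   sketch_clflush_range(start, size);
   _mm_mfence();
   return 0;
}
#else
/* Non-x86-64 builds: nothing to flush in this sketch. */
static int
sketch_flush_range(void *start, size_t size)
{
   (void)start;
   (void)size;
   return 0;
}
#endif
```

Calling `sketch_flush_range(buf, len)` leaves the buffer contents unchanged; only the cache state is affected. The return value exists purely so a caller can observe which path ran.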
Committed by: Marge Bot
Parent: fd0592afd3
Commit: 2f6919e6c2

@@ -78,7 +78,7 @@ intel_invalidate_range(void *start, size_t size)
    if (size == 0)
       return;
 
-   intel_clflush_range(start, size);
+   intel_flush_range_no_fence(start, size);
 
    /* Modern Atom CPUs (Baytrail+) have issues with clflush serialization,
     * where mfence is not a sufficient synchronization barrier. We must
@@ -90,6 +90,15 @@ intel_invalidate_range(void *start, size_t size)
     * ("drm: Restore double clflush on the last partial cacheline")
     * and https://bugs.freedesktop.org/show_bug.cgi?id=92845.
     */
+#ifdef HAVE___BUILTIN_IA32_CLFLUSHOPT
+   /* clflushopt doesn't include an mfence like clflush */
+   if (util_get_cpu_caps()->has_clflushopt) {
+      __builtin_ia32_mfence();
+      intel_clflushopt_range(start + size - 1, 1);
+      __builtin_ia32_mfence();
+      return;
+   }
+#endif
    __builtin_ia32_clflush(start + size - 1);
    __builtin_ia32_mfence();
 }
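
Both the new clflushopt path and the existing clflush fallback flush the address `start + size - 1`, that is, the cacheline holding the last byte of the range. As a rough illustration of that arithmetic, here is a minimal sketch assuming a 64-byte cacheline; the `CACHELINE_SIZE` constant and both helper names are hypothetical and do not appear in Mesa.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cacheline size. 64 bytes is typical on current x86 CPUs, but real
 * code should query the CPU instead of hard-coding it. */
#define CACHELINE_SIZE 64u

/* Hypothetical helper: base address of the cacheline containing addr. */
static inline uintptr_t
cacheline_base(uintptr_t addr)
{
   return addr & ~(uintptr_t)(CACHELINE_SIZE - 1);
}

/* Hypothetical helper: number of cachelines touched by the byte range
 * [start, start + size). The last byte lives at start + size - 1. */
static inline size_t
cachelines_in_range(uintptr_t start, size_t size)
{
   if (size == 0)
      return 0;
   uintptr_t first = cacheline_base(start);
   uintptr_t last = cacheline_base(start + size - 1);
   return (size_t)((last - first) / CACHELINE_SIZE) + 1;
}
```

For example, a 2-byte range starting at 0x103f straddles two lines, and `cacheline_base(0x103f + 2 - 1)` gives 0x1040, the line that the final flush in the function targets again.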