intel/compiler/gfx12.5+: Lower 64-bit cluster_broadcast with 32-bit ops

For MTL (verx10 == 125), float64 is supported, but int64 is not.
Therefore we need to lower cluster broadcast using 32-bit int ops.

For gfx12.5+ platforms that support int64, the register regions used
by cluster broadcast aren't supported by the 64-bit pipeline.

On MTL, dEQP-VK.subgroups.clustered.*_double* and
dEQP-VK.subgroups.clustered.*_dvec* were failing to validate the
compiled shader in debug mode, and reportedly GPU-hanging in release
mode.

With this change, dEQP-VK.subgroups.clustered.*_double* passed all 48
tests and dEQP-VK.subgroups.clustered.*_dvec* passed all 140 tests on
MTL.

Rework:
 * Move from generator to brw_fs_lower_regioning.cpp. (Suggested by
   Francisco)
 * Apply to verx10 >= 125. (Suggested by Francisco)

Cc: 23.1 <mesa-stable>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> (v1)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22569>
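
The idea described in the message can be illustrated with a small standalone sketch. This is not Mesa code: `DeviceSketch`, `needs_32bit_lowering`, and `broadcast64_via_32` are hypothetical names invented here, and the array-based "lanes" only model the data movement on the CPU. The first function mirrors the shape of the patched condition (64-bit types are handled as 32-bit unsigned dwords when verx10 >= 125); the second shows why that is safe for a broadcast, since copying a 64-bit value as two 32-bit halves preserves its bits regardless of whether it is a double or an integer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the device fields the condition consults.
// This is NOT Mesa's intel_device_info; it exists only for illustration.
struct DeviceSketch {
    int verx10;      // e.g. 125 for MTL, per the commit message
    bool has_64bit;  // platform supports 64-bit types at all
    bool is_chv;     // models the INTEL_PLATFORM_CHV check
    bool is_9lp;     // models the intel_device_info_is_9lp() check
};

// Mirrors the shape of the patched condition: returns true when a type
// wider than 4 bytes should be moved as 32-bit unsigned dwords instead.
bool needs_32bit_lowering(const DeviceSketch &dev, unsigned type_size_bytes)
{
    return (!dev.has_64bit || dev.verx10 >= 125 || dev.is_chv || dev.is_9lp) &&
           type_size_bytes > 4;
}

// CPU model of a broadcast done with 32-bit operations: the 64-bit value in
// lane `src_lane` is split into two dwords, and both dwords are copied to
// every destination lane. No 64-bit integer arithmetic is involved.
void broadcast64_via_32(const double *src, double *dst,
                        unsigned lanes, unsigned src_lane)
{
    uint32_t halves[2];
    std::memcpy(halves, &src[src_lane], sizeof(halves));
    for (unsigned i = 0; i < lanes; i++)
        std::memcpy(&dst[i], halves, sizeof(halves));
}
```

Because the copy is purely bitwise, the sketch gives the same result as a native 64-bit move would; the sketch only changes the width of the individual copies, which is the property the commit relies on.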
2023-04-18 20:11:41 -04:00
parent 74ab940156
commit fcb72ffd0c
1 changed file with 8 additions and 1 deletion
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -174,10 +174,17 @@ namespace {
          *    integer DWord multiply, indirect addressing must not be
          *    used."
          *
+          * For MTL (verx10 == 125), float64 is supported, but int64 is not.
+          * Therefore we need to lower cluster broadcast using 32-bit int ops.
+          *
+          * For gfx12.5+ platforms that support int64, the register regions
+          * used by cluster broadcast aren't supported by the 64-bit pipeline.
+          *
          * Work around the above and handle platforms that don't
          * support 64-bit types at all.
          */
-         if ((!has_64bit || devinfo->platform == INTEL_PLATFORM_CHV ||
+         if ((!has_64bit || devinfo->verx10 >= 125 ||
+              devinfo->platform == INTEL_PLATFORM_CHV ||
              intel_device_info_is_9lp(devinfo)) && type_sz(t) > 4)
            return BRW_REGISTER_TYPE_UD;
         else