intel/compiler: Add and use a pass to generate imul_32x16 instructions

Gfx8 and Gfx9 platforms are helped for cycles because now many
instructions like

    mul(8)          g12<1>D         g10<8,8,1>D     6D

become

    mul(8)          g12<1>D         g10<8,8,1>D     6W

It is the same number of instructions, but the 32x16 multiply is a
little faster.

v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)".
Noticed by Caio. Use nir_src_is_const instead of open coding it.
Suggested by Caio.

Broadwell and Skylake had similar results. (Skylake shown)
total cycles in shared programs: 845748380 -> 845145547 (-0.07%)
cycles in affected programs: 446346348 -> 445743515 (-0.14%)
helped: 6017
HURT: 0
helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8
helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39%
95% mean confidence interval for cycles value: -113.37 -87.00
95% mean confidence interval for cycles %-change: -0.42% -0.41%
Cycles are helped.

Skylake
Cycles in all programs: 8844820715 -> 8828897462 (-0.2%)
Cycles helped: 47914
Cycles hurt: 1

No shader-db or fossil-db changes on any other Intel platform.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>
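
The v2 note implies that the corrected range test compares the low bound against INT16_MIN and the high bound against INT16_MAX. The following is a minimal sketch of that idea in plain C, not the code of the pass itself. The helper names range_fits_int16, mul_32x16_model and mul_32x32_model are hypothetical and exist only for illustration. The sketch assumes the operand's possible values are described by an inclusive [lo, hi] range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when every value in the inclusive range
 * [lo, hi] is representable as a signed 16-bit word, so the operand
 * could be narrowed from D (32-bit) to W (16-bit).
 */
static bool
range_fits_int16(int64_t lo, int64_t hi)
{
   return lo >= INT16_MIN && hi <= INT16_MAX;
}

/* Hypothetical model of a 32x16 multiply: the 16-bit source is
 * sign-extended and the low 32 bits of the product are returned
 * as a bit pattern.
 */
static uint32_t
mul_32x16_model(int32_t a, int16_t b)
{
   return (uint32_t)a * (uint32_t)(int32_t)b;
}

/* Hypothetical model of the full 32x32 multiply, low 32 bits only. */
static uint32_t
mul_32x32_model(int32_t a, int32_t b)
{
   return (uint32_t)a * (uint32_t)b;
}
```

When range_fits_int16 holds for the second operand, both models return the same low 32 bits, which is why a constant such as 6 can be emitted as 6W instead of 6D without changing the result.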
@@ -176,6 +176,8 @@ void brw_nir_analyze_ubo_ranges(const struct brw_compiler *compiler,
 
 bool brw_nir_opt_peephole_ffma(nir_shader *shader);
 
+bool brw_nir_opt_peephole_imul32x16(nir_shader *shader);
+
 void brw_nir_optimize(nir_shader *nir,
                       const struct brw_compiler *compiler,
                       bool is_scalar,