radv,aco: allow unaligned LDS access on GFX9+

fossil-db (GFX10.3):
Totals from 223 (0.16% of 139391) affected shaders:
SGPRs: 10032 -> 10096 (+0.64%)
VGPRs: 7480 -> 7592 (+1.50%)
CodeSize: 853960 -> 821920 (-3.75%); split: -3.76%, +0.01%
MaxWaves: 5916 -> 5908 (-0.14%)
Instrs: 154935 -> 150281 (-3.00%); split: -3.01%, +0.01%
Cycles: 3202496 -> 3080680 (-3.80%); split: -3.81%, +0.00%
VMEM: 48187 -> 46671 (-3.15%); split: +0.29%, -3.44%
SMEM: 13869 -> 13850 (-0.14%); split: +1.52%, -1.66%
VClause: 3110 -> 3085 (-0.80%); split: -1.03%, +0.23%
SClause: 4376 -> 4381 (+0.11%)
Copies: 12132 -> 12065 (-0.55%); split: -2.61%, +2.06%
Branches: 5204 -> 5203 (-0.02%)
PreVGPRs: 6304 -> 6359 (+0.87%); split: -0.10%, +0.97%

See https://reviews.llvm.org/D82788

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8762>
Rhys Perry
2020-07-16 11:43:50 +01:00
committed by Marge Bot
parent c2d57f55a8
commit 1a0b0e8460
4 changed files with 25 additions and 12 deletions

@@ -3053,6 +3053,9 @@ mem_vectorize_callback(unsigned align_mul, unsigned align_offset,
 		       nir_intrinsic_instr *low, nir_intrinsic_instr *high,
 		       void *data)
 {
+	struct radv_device *device = data;
+	enum chip_class chip = device->physical_device->rad_info.chip_class;
 	if (num_components > 4)
 		return false;
@@ -3081,9 +3084,9 @@ mem_vectorize_callback(unsigned align_mul, unsigned align_offset,
 		FALLTHROUGH;
 	case nir_intrinsic_load_shared:
 	case nir_intrinsic_store_shared:
-		if (bit_size * num_components == 96) /* 96 bit loads require 128 bit alignment and are split otherwise */
+		if (chip < GFX9 && bit_size * num_components == 96) /* 96 bit loads require 128 bit alignment on GFX6-8 and are split otherwise */
 			return align % 16 == 0;
-		else if (bit_size * num_components == 128) /* 128 bit loads require 64 bit alignment and are split otherwise */
+		else if (chip < GFX9 && bit_size * num_components == 128) /* 128 bit loads require 64 bit alignment on GFX6-8 and are split otherwise */
 			return align % 8 == 0;
 		else
 			return align % (bit_size == 8 ? 2 : 4) == 0;
@@ -3330,6 +3333,7 @@ VkResult radv_create_shaders(struct radv_pipeline *pipeline,
 				nir_var_mem_push_const | nir_var_mem_shared |
 				nir_var_mem_global,
 			.callback = mem_vectorize_callback,
+			.cb_data = device,
 			.robust_modes = 0,
 		};
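
For readers who want to try the new shared-memory rule outside of Mesa, here is a minimal standalone sketch of the predicate from the second hunk. The names `sketch_chip_class`, `SKETCH_GFX*` and `lds_vectorize_ok` are hypothetical stand-ins, not Mesa identifiers, and `align` is passed in directly as an assumption instead of being derived from `align_mul` and `align_offset` as the real callback does.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Mesa's chip_class enum; only the ordering matters. */
enum sketch_chip_class {
    SKETCH_GFX6,
    SKETCH_GFX7,
    SKETCH_GFX8,
    SKETCH_GFX9,
    SKETCH_GFX10,
    SKETCH_GFX10_3,
};

/*
 * Returns true if a shared-memory (LDS) access of num_components values of
 * bit_size bits each, known to be aligned to `align` bytes, may be emitted
 * as a single vectorized access.
 */
static bool
lds_vectorize_ok(enum sketch_chip_class chip, unsigned align,
                 unsigned bit_size, unsigned num_components)
{
    if (num_components > 4)
        return false;

    unsigned total_bits = bit_size * num_components;

    /* GFX6-8 only: 96-bit accesses need 128-bit (16-byte) alignment. */
    if (chip < SKETCH_GFX9 && total_bits == 96)
        return align % 16 == 0;
    /* GFX6-8 only: 128-bit accesses need 64-bit (8-byte) alignment. */
    else if (chip < SKETCH_GFX9 && total_bits == 128)
        return align % 8 == 0;
    /* Everything else, including wide accesses on GFX9+. */
    else
        return align % (bit_size == 8 ? 2 : 4) == 0;
}
```

With this sketch, a 4-byte-aligned access of four 32-bit components is rejected for `SKETCH_GFX8` but accepted for `SKETCH_GFX9` and later, which is the behaviour change the commit title describes.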