radeonsi: adjust tess SGPRs to allow fully occupied 3 HS waves of triangles

With triangles and 3 HS waves, 3 lanes were unoccupied. Adjust the SGPR
encoding to allow 1 more triangle to fit there.

Some of the fields are not large enough, but they weren't large enough
before either.
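
The lane arithmetic behind the message can be sketched as follows. This is an illustration only, not code from the commit: it assumes 64-lane waves, one lane per patch control point, and that the old 6-bit field capped the patch count at 63 (an inference that matches the 3 idle lanes mentioned above). The names WAVE_SIZE, max_patches and idle_lanes are hypothetical.

```c
#include <assert.h>

/* Assumption: 64 lanes per HS wave, one lane per patch control point. */
enum { WAVE_SIZE = 64 };

/* Number of patches that fit in num_waves waves, limited by the largest
 * patch count the SGPR field can encode (max_encodable). */
static unsigned max_patches(unsigned num_waves, unsigned verts_per_patch,
                            unsigned max_encodable)
{
   assert(verts_per_patch > 0);
   unsigned fit = num_waves * WAVE_SIZE / verts_per_patch;
   return fit < max_encodable ? fit : max_encodable;
}

/* Lanes left without work once that many patches are scheduled. */
static unsigned idle_lanes(unsigned num_waves, unsigned verts_per_patch,
                           unsigned max_encodable)
{
   unsigned used = max_patches(num_waves, verts_per_patch, max_encodable) *
                   verts_per_patch;
   return num_waves * WAVE_SIZE - used;
}
```

Under these assumptions, idle_lanes(3, 3, 63) gives 3 and idle_lanes(3, 3, 64) gives 0: storing "count - 1" in the same 6 bits lets the 64th triangle fill the last three lanes.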

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7623>
Marek Olšák
2020-11-12 22:07:56 -05:00
committed by Marge Bot
parent 9659384744
commit 9b5b5cbc53
3 changed files with 18 additions and 13 deletions

@@ -117,25 +117,25 @@ struct si_shader_context {
  /* API TCS & TES */
  /* Layout of TCS outputs in the offchip buffer
  * # 6 bits
- * [0:5] = the number of patches per threadgroup, max = NUM_PATCHES (40)
- * # 6 bits
- * [6:11] = the number of output vertices per patch, max = 32
- * # 20 bits
- * [12:31] = the offset of per patch attributes in the buffer in bytes.
- * max = NUM_PATCHES*32*32*16
+ * [0:5] = the number of patches per threadgroup - 1, max = 63
+ * # 5 bits
+ * [6:10] = the number of output vertices per patch - 1, max = 31
+ * # 21 bits
+ * [11:31] = the offset of per patch attributes in the buffer in bytes.
+ * max = NUM_PATCHES*32*32*16 = 1M
  */
  struct ac_arg tcs_offchip_layout;
  /* API TCS */
  /* Offsets where TCS outputs and TCS patch outputs live in LDS:
- * [0:15] = TCS output patch0 offset / 16, max = NUM_PATCHES * 32 * 32
+ * [0:15] = TCS output patch0 offset / 16, max = NUM_PATCHES * 32 * 32 = 64K (TODO: not enough bits)
  * [16:31] = TCS output patch0 offset for per-patch / 16
- * max = (NUM_PATCHES + 1) * 32*32
+ * max = (NUM_PATCHES + 1) * 32*32 = 66624 (TODO: not enough bits)
  */
  struct ac_arg tcs_out_lds_offsets;
  /* Layout of TCS outputs / TES inputs:
  * [0:12] = stride between output patches in DW, num_outputs * num_vertices * 4
- * max = 32*32*4 + 32*4
+ * max = 32*32*4 + 32*4 = 4224
  * [13:18] = gl_PatchVerticesIn, max = 32
  * [19:31] = high 13 bits of the 32-bit address of tessellation ring buffers
  */