iris: Use more efficient binding table pointer formats on Icelake+.

Skylake and older use a 15:5 binding table pointer format, which means
our binder can be at most 64kB in size.  Each binding table within the
binder must be aligned to 32B.

XeHP uses a new 20:5 binding table format, which allows us to increase
the binder size to 1MB while retaining the nice 32B alignment.  Larger
binders mean fewer stalls as we update the base address for the binder.

Icelake and Tigerlake can either use the 15:5 format or an 18:8 format.
18:8 mode requires the base of each binding table to be aligned to 256B
instead of 32B, but it gives us a maximum binder size of 512kB.
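
The diff in this commit converts a binder offset into a binding table
pointer by shifting it right by 3 on Gfx11 through Gfx12.0 and leaving it
unchanged elsewhere.  A minimal sketch of that conversion follows.  It
assumes a runtime `gfx_verx10` parameter in place of the compile-time
GFX_VER macros, and both helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical runtime stand-in for the compile-time check in the diff:
 * Gfx11 up to (but not including) Gfx12.5 shifts by 3, all others by 0.
 */
static unsigned
bt_offset_shift(int gfx_verx10)
{
   return (gfx_verx10 >= 110 && gfx_verx10 < 125) ? 3 : 0;
}

/* Convert a byte offset within the binder into the value written to the
 * binding table pointer field (illustrative only).
 */
static uint32_t
bt_offset_to_pointer(int gfx_verx10, uint32_t offset)
{
   unsigned shift = bt_offset_shift(gfx_verx10);

   /* The shift must not discard any set bits of the offset. */
   assert((offset & ((1u << shift) - 1)) == 0);

   return offset >> shift;
}
```

With this sketch, a 256B-aligned offset of 0x100 becomes 0x20 on Gfx11 and
stays 0x100 on Gfx9 or Gfx12.5.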

We can store 64 binding table entries in a 256B chunk (256B / 4B = 64),
but only 8 entries in a 32B chunk (32B / 4B = 8).  Assuming that most
binding tables have fewer than 64 entries, this means that with the 18:8
format, we're likely to be able to fit 2048 (512kB / 256B) tables into a
buffer before needing to allocate a new one and stall.

Technically, the old format could also store 2048 binding tables per
buffer (64kB / 32B = 2048).  However, any table that needed more than
8 entries would take multiple aligned 32B chunks, while with the larger
256B format it could fit in a single one, so in practice fewer tables
fit in the old buffer.
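
The chunk arithmetic above can be sketched as follows.  This is an
illustration only, not code from the driver: the helper names are
hypothetical, entries are assumed to be 4B each as stated above, and an
empty table is assumed to still occupy one chunk:

```c
#include <stdint.h>

/* Number of aligned chunks one binding table occupies.
 * Each entry is 4B; the table is rounded up to whole chunks.
 * Assumption: an empty table still takes one chunk.
 */
static uint32_t
bt_chunks_needed(uint32_t num_entries, uint32_t alignment)
{
   uint32_t bytes = num_entries * 4;
   uint32_t chunks = (bytes + alignment - 1) / alignment;

   return chunks ? chunks : 1;
}

/* How many tables of the given size fit in one binder. */
static uint32_t
bt_tables_per_binder(uint32_t binder_size, uint32_t alignment,
                     uint32_t num_entries)
{
   return (binder_size / alignment) /
          bt_chunks_needed(num_entries, alignment);
}
```

For example, a 20-entry table needs three 32B chunks, so only 682 such
tables fit in a 64kB binder, while the same table takes a single 256B
chunk and 2048 of them fit in a 512kB binder.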

This cuts binder resets by 6.3% on a Shadow of Mordor benchmark trace.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14507>
Kenneth Graunke
2019-06-25 13:16:50 -07:00
committed by Marge Bot
parent a83c91a261
commit db34c71513
6 changed files with 74 additions and 24 deletions

@@ -154,7 +154,7 @@ blorp_alloc_binding_table(struct blorp_batch *blorp_batch,
                           unsigned num_entries,
                           unsigned state_size,
                           unsigned state_alignment,
-                          uint32_t *bt_offset,
+                          uint32_t *out_bt_offset,
                           uint32_t *surface_offsets,
                           void **surface_maps)
 {
@@ -162,8 +162,11 @@ blorp_alloc_binding_table(struct blorp_batch *blorp_batch,
    struct iris_binder *binder = &ice->state.binder;
    struct iris_batch *batch = blorp_batch->driver_batch;
-   *bt_offset = iris_binder_reserve(ice, num_entries * sizeof(uint32_t));
-   uint32_t *bt_map = binder->map + *bt_offset;
+   unsigned bt_offset =
+      iris_binder_reserve(ice, num_entries * sizeof(uint32_t));
+   uint32_t *bt_map = binder->map + bt_offset;
+   *out_bt_offset = bt_offset;
    for (unsigned i = 0; i < num_entries; i++) {
       surface_maps[i] = stream_state(batch, ice->state.surface_uploader,
@@ -181,7 +184,8 @@ static uint32_t
 blorp_binding_table_offset_to_pointer(struct blorp_batch *batch,
                                       uint32_t offset)
 {
-   return offset;
+   /* See IRIS_BT_OFFSET_SHIFT in iris_state.c */
+   return offset >> ((GFX_VER >= 11 && GFX_VERx10 < 125) ? 3 : 0);
 }
 static void *