We don't have any locking issues yet because we use the pool size itself as
a mutex in block_pool_alloc to guarantee that only one thread is resizing
at a time. However, we are about to add support for growing the block pool
at both ends. This introduces two potential races:
 1) Two block_pool_alloc() calls could both try to grow the block pool,
    one from each end.
 2) The relocation handling code now has to track not only the bo that
    we use for the block pool but also the offset from the start of
    that bo to the center of the block pool. The block pool growing
    code could race with the relocation handling code and leave the bo
    and offset out of sync.
Grabbing the device mutex solves both of these problems. Thanks to (2), we
can't really do anything more granular.
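
A minimal sketch of why one lock covers both races, assuming a pthread
mutex on the device. The sketch_* names and fields are hypothetical
stand-ins for illustration, not the driver's actual structures. Growth
from either end and the relocation-side read take the same mutex, so the
bo and the center offset are always seen as a matching pair:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-ins for the device and block pool. */
struct sketch_device {
   pthread_mutex_t mutex;
};

struct sketch_block_pool {
   struct sketch_device *device;
   uint32_t bo_handle;        /* identifies the current backing bo */
   uint32_t center_bo_offset; /* offset from bo start to pool center */
   uint32_t size;             /* total size of the pool in bytes */
};

/* Grow the pool by "front" bytes before the center and "back" bytes after
 * it. The bo and the center offset change together under the device
 * mutex, so two growers (one per end) are serialized as well. */
static void
sketch_pool_grow(struct sketch_block_pool *pool, uint32_t front, uint32_t back)
{
   assert(front + back > 0);

   pthread_mutex_lock(&pool->device->mutex);
   pool->bo_handle++; /* pretend a new, larger bo replaced the old one */
   pool->center_bo_offset += front;
   pool->size += front + back;
   pthread_mutex_unlock(&pool->device->mutex);
}

/* Relocation side: read the bo and center offset as a consistent pair. */
static void
sketch_pool_snapshot(struct sketch_block_pool *pool,
                     uint32_t *bo_handle, uint32_t *center_bo_offset)
{
   pthread_mutex_lock(&pool->device->mutex);
   *bo_handle = pool->bo_handle;
   *center_bo_offset = pool->center_bo_offset;
   pthread_mutex_unlock(&pool->device->mutex);
}
```

A lock private to the pool would serialize the two growers, but the
relocation code would still need the same lock to read the pair, which is
why the sketch hangs the mutex off the device.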
The anv_block_pool data structure suffered from the exact same race as the
state pool. Namely, the uniqueness of the blocks handed out depends on
the next_block value increasing monotonically. However, this invariant
did not hold because of our block "return" concept.
The previous algorithm had a race because of the way we were using
__sync_fetch_and_add for everything. In particular, the concept of
"returning" over-allocated states in the "next > end" case was completely
bogus. If too many threads were hitting the state pool at the same time,
it was possible to have the following sequence:
    A: Get an offset (next == end)
    B: Get an offset (next > end)
    A: Resize the pool (now next < end by a lot)
    C: Get an offset (next < end)
    B: Return the over-allocated offset
    D: Get an offset
in which case D will get the same offset as C. The solution to this race
is to get rid of the concept of "returning" over-allocated states.
Instead, the thread that gets a new block simply sets the next and end
offsets directly and threads that over-allocate don't return anything and
just futex-wait. Since you can only ever hit the over-allocate case if
someone else hit the "next == end" case and hasn't resized yet, you're
guaranteed that the end value will get updated and the futex won't block
forever.
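
The scheme described above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual patch: the
sketch_pool type, the packing of next and end into one 64-bit word, and
the grow_by field are all hypothetical, every allocation is assumed to use
the same size, and a sched_yield() polling loop stands in for the futex
wait so that the example stays portable.

```c
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical pool state: "next" lives in the low 32 bits and "end" in
 * the high 32 bits, so a single 64-bit atomic covers both values. */
struct sketch_pool {
   uint64_t state;
   uint32_t grow_by; /* bytes added per resize; a multiple of the size */
};

static uint32_t state_next(uint64_t s) { return (uint32_t)s; }
static uint32_t state_end(uint64_t s) { return (uint32_t)(s >> 32); }

static uint64_t
state_pack(uint32_t next, uint32_t end)
{
   return ((uint64_t)end << 32) | next;
}

static uint32_t
sketch_pool_alloc(struct sketch_pool *pool, uint32_t size)
{
   assert(pool->grow_by > 0 && pool->grow_by % size == 0);

   for (;;) {
      /* Bump next and learn the old (next, end) pair in one atomic step. */
      uint64_t s = __sync_fetch_and_add(&pool->state, size);
      uint32_t next = state_next(s), end = state_end(s);

      if (next < end) {
         /* Common case: the offset lies inside the pool. */
         return next;
      } else if (next == end) {
         /* Exactly one thread lands here per resize. It sets next and end
          * directly, which discards whatever the over-allocating threads
          * added to next in the meantime. A real pool would resize its
          * backing storage before publishing the new state. */
         uint32_t new_end = end + pool->grow_by;
         (void)__sync_lock_test_and_set(&pool->state,
                                        state_pack(next + size, new_end));
         return next;
      } else {
         /* Over-allocated: return nothing, wait until end changes, then
          * retry. Polling stands in for futex-wait in this sketch. */
         while (state_end(__sync_fetch_and_add(&pool->state, 0)) == end)
            sched_yield();
      }
   }
}
```

Because nothing is ever subtracted from next, the "B returns an offset
after A resized" step from the sequence above cannot occur in this
sketch.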
We have pools, so we should be using them. Also, I think this will help
keep valgrind from getting confused when we end up fighting with system
allocations such as those from malloc/free and mmap/munmap.
Jason started the task by creating anv_cmd_buffer.c and anv_cmd_emit.c.
This patch finishes the task by renaming all other files except
gen*_pack.h and glsl_scraper.py.