docs/panfrost: use math-role more

This renders cleaner and more consistent with the other math around
here.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29902>
Erik Faye-Lund
2024-06-25 15:06:49 +02:00
committed by Marge Bot
parent 7033623acd
commit a5f892b5cb


@@ -16,11 +16,12 @@ One option would be to do:
\text{instance id} = \text{linear id} / \text{num vertices}
but this involves a costly division and modulus by an arbitrary number.
-Instead, we could pad num_vertices. We dispatch padded_num_vertices *
-num_instances threads instead of num_vertices * num_instances, which results
-in some "extra" threads with vertex_id >= num_vertices, which we have to
-discard. The more we pad num_vertices, the more "wasted" threads we
-dispatch, but the division is potentially easier.
+Instead, we could pad num_vertices. We dispatch
+:math:`\text{padded_num_vertices} \cdot \text{num_instances}` threads instead
+of :math:`\text{num_vertices} \cdot \text{num_instances}`, which results
+in some "extra" threads with :math:`\text{vertex_id} \geq \text{num_vertices}`,
+which we have to discard. The more we pad num_vertices, the more "wasted"
+threads we dispatch, but the division is potentially easier.
One straightforward choice is to pad num_vertices to the next power of two,
which means that the division and modulus are just simple bit shifts and
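The power-of-two option in this hunk can be illustrated with a short sketch: it splits a linear thread index into a vertex index and an instance index with a mask and a shift instead of a division and a modulus. ``split_linear_id_pow2`` is a hypothetical helper, not driver code; it only models the arithmetic, including the test that discards padding threads.

```python
def split_linear_id_pow2(linear_id: int, num_vertices: int) -> tuple[int, int, bool]:
    """Hypothetical helper: decompose linear_id when num_vertices is padded
    to the next power of two. Returns (vertex_id, instance_id, keep)."""
    if num_vertices < 1:
        raise ValueError("num_vertices must be at least 1")
    # Next power of two >= num_vertices (1 stays 1, 70 becomes 128).
    padded = 1 << (num_vertices - 1).bit_length()
    shift = padded.bit_length() - 1          # log2(padded)
    vertex_id = linear_id & (padded - 1)     # same as linear_id % padded
    instance_id = linear_id >> shift         # same as linear_id // padded
    # Threads in the padding (vertex_id >= num_vertices) must be discarded.
    keep = vertex_id < num_vertices
    return vertex_id, instance_id, keep
```

With num_vertices = 70 the padding goes up to 128, so 58 of every 128 threads are discarded; thread 200, for instance, maps to vertex 72 of instance 1 and is dropped. That waste is the cost a finer-grained padding trades against.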
@@ -50,14 +51,15 @@ high bits padded_num_vertices
111x :math:`2^{n+4}`
========== =======================
-For example, if num_vertices = 70 is passed to glDraw(), its binary
-representation is 1000110, so n = 3 and the high bits are 1000, and
-therefore padded_num_vertices = :math:`9 \cdot 2^3` = 72.
+For example, if :math:`\text{num_vertices} = 70` is passed to glDraw(),
+its binary representation is 1000110, so :math:`n = 3` and the high bits
+are 1000, and therefore
+:math:`\text{padded_num_vertices} = 9 \cdot 2^3 = 72`.
The attribute unit works in terms of the original linear_id. if
-num_instances = 1, then they are the same, and everything is simple.
-However, with instancing things get more complicated. There are four
-possible modes, two of them we can group together:
+:math:`\text{num_instances} = 1`, then they are the same, and everything
+is simple. However, with instancing things get more complicated. There are
+four possible modes, two of them we can group together:
1. Use the linear_id directly. Only used when there is no instancing.
@@ -66,12 +68,14 @@ attributes with instancing enabled by making the constant equal
padded_num_vertices. Because the modulus is always padded_num_vertices, this
mode only supports a modulus that is a power of 2 times 1, 3, 5, 7, or 9.
The shift field specifies the power of two, while the extra_flags field
-specifies the odd number. If shift = n and extra_flags = m, then the modulus
-is :math:`(2m + 1) \cdot 2^n`. As an example, if num_vertices = 70, then as
-computed above, padded_num_vertices = :math:`9 \cdot 2^3`, so we should set
-extra_flags = 4 and shift = 3. Note that we must exactly follow the hardware
-algorithm used to get padded_num_vertices in order to correctly implement
-per-vertex attributes.
+specifies the odd number. If :math:`\text{shift} = n` and
+:math:`\text{extra_flags} = m`, then the modulus is
+:math:`(2m + 1) \cdot 2^n`. As an example, if
+:math:`\text{num_vertices} = 70`, then as computed above,
+:math:`\text{padded_num_vertices} = 9 \cdot 2^3`, so we should set
+:math:`\text{extra_flags} = 4` and :math:`\text{shift} = 3`. Note that we
+must exactly follow the hardware algorithm used to get padded_num_vertices
+in order to correctly implement per-vertex attributes.
3. Divide the linear_id by a constant. In order to correctly implement
instance divisors, we have to divide linear_id by padded_num_vertices times
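For the modulus mode in this hunk, the driver must write padded_num_vertices as :math:`(2m + 1) \cdot 2^n` and program shift = n and extra_flags = m. A minimal sketch of that split follows, assuming padded_num_vertices was already produced by the hardware's padding rule (which is not reimplemented here); ``modulus_fields`` is a hypothetical name.

```python
def modulus_fields(padded_num_vertices: int) -> tuple[int, int]:
    """Hypothetical helper: split padded_num_vertices = (2m + 1) * 2^n
    into (shift, extra_flags) = (n, m)."""
    if padded_num_vertices < 1:
        raise ValueError("padded_num_vertices must be positive")
    # Isolate the lowest set bit; its position is the power-of-two exponent.
    shift = (padded_num_vertices & -padded_num_vertices).bit_length() - 1
    odd = padded_num_vertices >> shift
    # This mode only supports odd factors 1, 3, 5, 7 and 9.
    if odd not in (1, 3, 5, 7, 9):
        raise ValueError("not a power of 2 times 1, 3, 5, 7 or 9")
    extra_flags = (odd - 1) // 2
    return shift, extra_flags
```

``modulus_fields(72)`` returns ``(3, 4)``, matching the worked example, and :math:`(2 \cdot 4 + 1) \cdot 2^3 = 72` recovers the modulus.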
@@ -94,7 +98,7 @@ The hardware further assumes the multiplier is between :math:`2^{31}` and
to 0 by the driver -- presumably this simplifies the hardware multiplier a
little. The hardware first multiplies linear_id by the multiplier and
takes the high 32 bits, then applies the round-down correction if
-extra_flags = 1, then finally shifts right by the shift field.
+:math:`\text{extra_flags} = 1`, then finally shifts right by the shift field.
There are some differences between ridiculousfish's algorithm and the Mali
hardware algorithm, which means that the reference code from ridiculousfish
@@ -105,8 +109,9 @@ It also forces the multiplier to be at least :math:`2^{31}`, which means
that the exponent is entirely fixed, so there is no trial-and-error.
Altogether, given the divisor d, the algorithm the driver must follow is:
-1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
+1. Set :math:`\text{shift} = \lfloor \log_2(d) \rfloor`.
2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} % d`.
-3. If :math:`e \leq 2^{shift}`, then we need to use the round-down algorithm. Set
-magic_divisor = m - 1 and extra_flags = 1.
-4. Otherwise, set magic_divisor = m and extra_flags = 0.
+3. If :math:`e \leq 2^{shift}`, then we need to use the round-down algorithm.
+Set :math:`\text{magic_divisor} = m - 1` and :math:`\text{extra_flags} = 1`.
+4. Otherwise, set :math:`\text{magic_divisor} = m` and
+:math:`\text{extra_flags} = 0`.
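The four steps above translate almost line for line into code. The sketch below is an illustration, not the driver's implementation: ``compute_magic_divisor`` and ``magic_divide`` are hypothetical names. ``magic_divide`` models the division with ridiculousfish's round-up formula :math:`\lfloor n m / 2^{32 + s} \rfloor` and round-down formula :math:`\lfloor (n + 1) m / 2^{32 + s} \rfloor`; that is an assumption, not a bit-exact model of the Mali datapath, which the text says differs from ridiculousfish's. It only serves to check that the parameters yield :math:`\lfloor n / d \rfloor`.

```python
def compute_magic_divisor(d: int) -> tuple[int, int, int]:
    """Return (shift, magic_divisor, extra_flags) for 1 <= d < 2^32,
    following steps 1-4 above."""
    if not 1 <= d < (1 << 32):
        raise ValueError("d must be a non-zero 32-bit unsigned integer")
    shift = d.bit_length() - 1          # step 1: floor(log2(d))
    num = 1 << (shift + 32)             # 2^(shift + 32)
    m = -(-num // d)                    # step 2: ceil(2^(shift + 32) / d)
    e = num % d                         #         2^(shift + 32) mod d
    if e <= (1 << shift):               # step 3: round-down variant
        return shift, m - 1, 1
    return shift, m, 0                  # step 4: round-up variant


def magic_divide(n: int, shift: int, magic: int, extra_flags: int) -> int:
    """Assumed software model: multiply (n + extra_flags) by the magic
    number, keep the high 32 bits, then shift right by `shift`."""
    high = ((n + extra_flags) * magic) >> 32
    return high >> shift
```

For :math:`d = 3` the steps give shift = 1, magic_divisor = 0xAAAAAAAA and extra_flags = 1, because :math:`e = 2^{33} \bmod 3 = 2 \leq 2^1`. Every magic number this produces lies in :math:`[2^{31}, 2^{32})`, consistent with the hardware forcing the multiplier to be at least :math:`2^{31}`.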