docs/panfrost: use math-role more

This renders more cleanly and is more consistent with the other math around here.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29902>
2024-06-25 15:06:49 +02:00
parent 7033623acd
commit a5f892b5cb
1 changed files with 27 additions and 22 deletions
--- a/docs/drivers/panfrost/instancing.rst
+++ b/docs/drivers/panfrost/instancing.rst
@@ -16,11 +16,12 @@ One option would be to do:
   \text{instance id} = \text{linear id} / \text{num vertices}

 but this involves a costly division and modulus by an arbitrary number.
-Instead, we could pad num_vertices. We dispatch padded_num_vertices *
-num_instances threads instead of num_vertices * num_instances, which results
-in some "extra" threads with vertex_id >= num_vertices, which we have to
-discard.  The more we pad num_vertices, the more "wasted" threads we
-dispatch, but the division is potentially easier.
+Instead, we could pad num_vertices. We dispatch
+:math:`\text{padded_num_vertices} \cdot \text{num_instances}` threads instead
+of :math:`\text{num_vertices} \cdot \text{num_instances}`, which results
+in some "extra" threads with :math:`\text{vertex_id} \geq \text{num_vertices}`,
+which we have to discard.  The more we pad num_vertices, the more "wasted"
+threads we dispatch, but the division is potentially easier.

 One straightforward choice is to pad num_vertices to the next power of two,
 which means that the division and modulus are just simple bit shifts and
@@ -50,14 +51,15 @@ high bits   padded_num_vertices
 111x		   :math:`2^{n+4}`
 ==========  =======================

-For example, if num_vertices = 70 is passed to glDraw(), its binary
-representation is 1000110, so n = 3 and the high bits are 1000, and
-therefore padded_num_vertices = :math:`9 \cdot 2^3` = 72.
+For example, if :math:`\text{num_vertices} = 70` is passed to glDraw(),
+its binary representation is 1000110, so :math:`n = 3` and the high bits
+are 1000, and therefore
+:math:`\text{padded_num_vertices} = 9 \cdot 2^3 = 72`.

 The attribute unit works in terms of the original linear_id. if
-num_instances = 1, then they are the same, and everything is simple.
-However, with instancing things get more complicated. There are four
-possible modes, two of them we can group together:
+:math:`\text{num_instances} = 1`, then they are the same, and everything
+is simple. However, with instancing things get more complicated. There are
+four possible modes, two of them we can group together:

 1. Use the linear_id directly. Only used when there is no instancing.

@@ -66,12 +68,14 @@ attributes with instancing enabled by making the constant equal
 padded_num_vertices. Because the modulus is always padded_num_vertices, this
 mode only supports a modulus that is a power of 2 times 1, 3, 5, 7, or 9.
 The shift field specifies the power of two, while the extra_flags field
-specifies the odd number. If shift = n and extra_flags = m, then the modulus
-is :math:`(2m + 1) \cdot 2^n`. As an example, if num_vertices = 70, then as
-computed above, padded_num_vertices = :math:`9 \cdot 2^3`, so we should set
-extra_flags = 4 and shift = 3. Note that we must exactly follow the hardware
-algorithm used to get padded_num_vertices in order to correctly implement
-per-vertex attributes.
+specifies the odd number. If :math:`\text{shift} = n` and
+:math:`\text{extra_flags} = m`, then the modulus is
+:math:`(2m + 1) \cdot 2^n`. As an example, if
+:math:`\text{num_vertices} = 70`, then as computed above,
+:math:`\text{padded_num_vertices} = 9 \cdot 2^3`, so we should set
+:math:`\text{extra_flags} = 4` and :math:`\text{shift} = 3`. Note that we
+must exactly follow the hardware algorithm used to get padded_num_vertices
+in order to correctly implement per-vertex attributes.

 3. Divide the linear_id by a constant. In order to correctly implement
 instance divisors, we have to divide linear_id by padded_num_vertices times
@@ -94,7 +98,7 @@ The hardware further assumes the multiplier is between :math:`2^{31}` and
 to 0 by the driver -- presumably this simplifies the hardware multiplier a
 little. The hardware first multiplies linear_id by the multiplier and
 takes the high 32 bits, then applies the round-down correction if
-extra_flags = 1, then finally shifts right by the shift field.
+:math:`\text{extra_flags} = 1`, then finally shifts right by the shift field.

 There are some differences between ridiculousfish's algorithm and the Mali
 hardware algorithm, which means that the reference code from ridiculousfish
@@ -105,8 +109,9 @@ It also forces the multiplier to be at least :math:`2^{31}`, which means
 that the exponent is entirely fixed, so there is no trial-and-error.
 Altogether, given the divisor d, the algorithm the driver must follow is:

-1. Set shift = :math:`\lfloor \log_2(d) \rfloor`.
+1. Set :math:`\text{shift} = \lfloor \log_2(d) \rfloor`.
 2. Compute :math:`m = \lceil 2^{shift + 32} / d \rceil` and :math:`e = 2^{shift + 32} % d`.
-3. If :math:`e \leq 2^{shift}`, then we need to use the round-down algorithm. Set
-   magic_divisor = m - 1 and extra_flags = 1.
-4. Otherwise, set magic_divisor = m and extra_flags = 0.
+3. If :math:`e \leq 2^{shift}`, then we need to use the round-down algorithm.
+   Set :math:`\text{magic_divisor} = m - 1` and :math:`\text{extra_flags} = 1`.
+4. Otherwise, set :math:`\text{magic_divisor} = m` and
+   :math:`\text{extra_flags} = 0`.
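
The four driver steps in the last hunk can be sketched in Python as below. This is a minimal illustration, not the Mesa implementation: the function names `compute_magic_divisor` and `emulate_divide` are hypothetical, the multiplier is kept as a full 32-bit value (the documentation says the driver clears the implied top bit before programming it), and the round-down correction is modeled in the ridiculousfish style as incrementing the dividend before the multiply, which is an assumption about the arithmetic rather than a description of the exact hardware datapath.

```python
def compute_magic_divisor(d):
    """Return (magic_divisor, shift, extra_flags) for a 32-bit divisor d >= 1.

    Hypothetical helper following steps 1-4 of the documented algorithm.
    """
    if not 1 <= d < (1 << 32):
        raise ValueError("divisor must fit in 32 bits and be non-zero")

    # Step 1: shift = floor(log2(d)).
    shift = d.bit_length() - 1

    # Step 2: m = ceil(2^(shift + 32) / d) and e = 2^(shift + 32) % d.
    t = 1 << (shift + 32)
    m = -(-t // d)  # integer ceiling division
    e = t % d

    # Step 3: e <= 2^shift selects the round-down variant.
    if e <= (1 << shift):
        return m - 1, shift, 1

    # Step 4: otherwise use the round-up multiplier unchanged.
    return m, shift, 0


def emulate_divide(linear_id, magic_divisor, shift, extra_flags):
    """Arithmetic model of the division (assumption, not hardware-exact).

    Round-down is modeled as multiplying (linear_id + 1) instead of
    linear_id; the result is then shifted right by 32 + shift bits.
    """
    n = linear_id + extra_flags
    return (n * magic_divisor) >> (32 + shift)


if __name__ == "__main__":
    for d in (3, 11, 70, 72):
        magic, shift, flags = compute_magic_divisor(d)
        print(d, magic, shift, flags, emulate_divide(1000, magic, shift, flags))
```

Under this model the multiplier always lands in the range from 2^31 up to (but not including) 2^32, matching the range stated in the documentation, so dropping the top bit loses no information.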