docs/panfrost: quote identifiers

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29902>
Erik Faye-Lund
2024-06-25 20:40:15 +02:00
committed by Marge Bot
parent 577b9efa75
commit 0277d0321a

@@ -16,33 +16,33 @@ One option would be to do:
 \text{instance id} = \text{linear id} / \text{num vertices}
 but this involves a costly division and modulus by an arbitrary number.
-Instead, we could pad num_vertices. We dispatch
+Instead, we could pad ``num_vertices``. We dispatch
 :math:`\text{padded_num_vertices} \cdot \text{num_instances}` threads instead
 of :math:`\text{num_vertices} \cdot \text{num_instances}`, which results
 in some "extra" threads with :math:`\text{vertex_id} \geq \text{num_vertices}`,
-which we have to discard. The more we pad num_vertices, the more "wasted"
+which we have to discard. The more we pad ``num_vertices``, the more "wasted"
 threads we dispatch, but the division is potentially easier.
-One straightforward choice is to pad num_vertices to the next power of two,
-which means that the division and modulus are just simple bit shifts and
-masking. But the actual algorithm is a bit more complicated. The thread
+One straightforward choice is to pad ``num_vertices`` to the next power
+of two, which means that the division and modulus are just simple bit shifts
+and masking. But the actual algorithm is a bit more complicated. The thread
 dispatcher has special support for dividing by 3, 5, 7, and 9, in addition
-to dividing by a power of two. As a result, padded_num_vertices can be
-1, 3, 5, 7, or 9 times a power of two. This results in less wasted threads,
+to dividing by a power of two. As a result, ``padded_num_vertices`` can
+be 1, 3, 5, 7, or 9 times a power of two. This results in less wasted threads,
 since we need less padding.
-padded_num_vertices is picked by the hardware. The driver just specifies the
-actual number of vertices. Note that padded_num_vertices is a multiple of four
-(presumably because threads are dispatched in groups of 4). Also,
-padded_num_vertices is always at least one more than num_vertices, which seems
-like a quirk of the hardware. For larger num_vertices, the hardware uses the
-following algorithm: using the binary representation of num_vertices, we look at
-the most significant set bit as well as the following 3 bits. Let n be the
-number of bits after those 4 bits. Then we set padded_num_vertices according to
-the following table:
+``padded_num_vertices`` is picked by the hardware. The driver just specifies
+the actual number of vertices. Note that ``padded_num_vertices`` is a multiple
+of four (presumably because threads are dispatched in groups of 4). Also,
+``padded_num_vertices`` is always at least one more than ``num_vertices``,
+which seems like a quirk of the hardware. For larger ``num_vertices``, the
+hardware uses the following algorithm: using the binary representation of
+``num_vertices``, we look at the most significant set bit as well as the
+following 3 bits. Let n be the number of bits after those 4 bits. Then we
+set ``padded_num_vertices`` according to the following table:
 ========== =======================
-high bits  padded_num_vertices
+high bits  ``padded_num_vertices``
 ========== =======================
 1000       :math:`9 \cdot 2^n`
 1001       :math:`5 \cdot 2^{n+1}`
@@ -56,32 +56,32 @@ For example, if :math:`\text{num_vertices} = 70` is passed to
 and the high bits are 1000, and therefore
 :math:`\text{padded_num_vertices} = 9 \cdot 2^3 = 72`.
-The attribute unit works in terms of the original linear_id. if
+The attribute unit works in terms of the original ``linear_id``. if
 :math:`\text{num_instances} = 1`, then they are the same, and everything
 is simple. However, with instancing things get more complicated. There are
 four possible modes, two of them we can group together:
-1. Use the linear_id directly. Only used when there is no instancing.
-2. Use the linear_id modulo a constant. This is used for per-vertex
+1. Use the ``linear_id`` directly. Only used when there is no instancing.
+2. Use the ``linear_id`` modulo a constant. This is used for per-vertex
 attributes with instancing enabled by making the constant equal
-padded_num_vertices. Because the modulus is always padded_num_vertices, this
-mode only supports a modulus that is a power of 2 times 1, 3, 5, 7, or 9.
-The shift field specifies the power of two, while the extra_flags field
-specifies the odd number. If :math:`\text{shift} = n` and
+``padded_num_vertices``. Because the modulus is always ``padded_num_vertices``,
+this mode only supports a modulus that is a power of 2 times 1, 3, 5, 7,
+or 9. The shift field specifies the power of two, while the ``extra_flags``
+field specifies the odd number. If :math:`\text{shift} = n` and
 :math:`\text{extra_flags} = m`, then the modulus is
 :math:`(2m + 1) \cdot 2^n`. As an example, if
 :math:`\text{num_vertices} = 70`, then as computed above,
 :math:`\text{padded_num_vertices} = 9 \cdot 2^3`, so we should set
 :math:`\text{extra_flags} = 4` and :math:`\text{shift} = 3`. Note that we
-must exactly follow the hardware algorithm used to get padded_num_vertices
+must exactly follow the hardware algorithm used to get ``padded_num_vertices``
 in order to correctly implement per-vertex attributes.
-3. Divide the linear_id by a constant. In order to correctly implement
-instance divisors, we have to divide linear_id by padded_num_vertices times
-to user-specified divisor. So first we compute padded_num_vertices, again
-following the exact same algorithm that the hardware uses, then multiply it
-by the GL-level divisor to get the hardware-level divisor. This case is
+3. Divide the ``linear_id`` by a constant. In order to correctly implement
+instance divisors, we have to divide ``linear_id`` by ``padded_num_vertices``
+times to user-specified divisor. So first we compute ``padded_num_vertices``,
+again following the exact same algorithm that the hardware uses, then multiply
+it by the GL-level divisor to get the hardware-level divisor. This case is
 further divided into two more cases. If the hardware-level divisor is a
 power of two, then we just need to shift. The shift amount is specified by
 the shift field, so that the hardware-level divisor is just
@@ -96,7 +96,7 @@ https://ridiculousfish.com/files/faster_unsigned_division_by_constants.pdf.
 The hardware further assumes the multiplier is between :math:`2^{31}` and
 :math:`2^{32}`, so the high bit is implicitly set to 1 even though it is set
 to 0 by the driver -- presumably this simplifies the hardware multiplier a
-little. The hardware first multiplies linear_id by the multiplier and
+little. The hardware first multiplies ``linear_id`` by the multiplier and
 takes the high 32 bits, then applies the round-down correction if
 :math:`\text{extra_flags} = 1`, then finally shifts right by the shift field.