docs/freedreno: Extract LRZ docs from tu_lrz

Most of the docs describe HW and are not specific to Turnip.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20491>
2023-01-03 13:34:24 +01:00
parent 22543653d5
commit e176eb6c39
3 changed files with 132 additions and 82 deletions
--- a/docs/drivers/freedreno.rst
+++ b/docs/drivers/freedreno.rst
@@ -26,6 +26,11 @@ Adreno is a mostly tile-mode renderer, but with the option to bypass tiling
 mostly write combined memory but with the ability to map some buffers as cache
 coherent with the CPU.

+.. toctree::
+   :glob:
+
+   freedreno/hw/*
+
 Hardware acronyms
 ^^^^^^^^^^^^^^^^^

--- a/docs/drivers/freedreno/hw/lrz.rst
+++ b/docs/drivers/freedreno/hw/lrz.rst
@@ -0,0 +1,122 @@
+Low Resolution Z Buffer
+=======================
+
+This doc is based on reverse engineering of a6xx HW; a5xx is expected to be
+similar to a6xx before gen3.
+
+The Low Resolution Z buffer is very similar to a depth prepass: it helps
+the HW avoid executing the fragment shader on fragments that would
+subsequently be discarded by the depth test.
+
+The interesting part of this feature is that it allows applications
+to submit the vertices in any order, without having to sort geometry front to back.
+
+Quoting the official Adreno documentation:
+
+::
+
+  [A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
+  depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
+  and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
+  is then used during the rendering pass to reject pixels efficiently before testing
+  against the full resolution Z-buffer.
+
+TODO: a7xx
+
+Limitations
+-----------
+
+There are two main limitations of LRZ:
+
+- Since LRZ is an early depth test, it cannot be used when late-z is required;
+- The LRZ buffer can only be built in one direction; changing the depth comparison
+  direction without disabling LRZ would lead to a malformed LRZ buffer.
+
+Pre-a650 (before gen3)
+----------------------
+
+The direction is fully tracked on the CPU. Within a renderpass, LRZ starts with
+an unknown direction; the direction is set the first time a depth write occurs,
+and if it changes afterwards, the direction becomes invalid and LRZ is
+disabled for the rest of the renderpass.
+
+Since the direction is not tracked by the GPU, it's impossible to know whether
+LRZ is enabled during construction of secondary command buffers.
+
+For the same reason, it's impossible to reuse LRZ between renderpasses.
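A minimal sketch of the CPU-side direction tracking described above. All names (``lrz_cpu_state``, ``lrz_on_depth_write``, the enum values) are hypothetical and do not mirror Turnip's structures; only depth-writing draws with a LESS-like or GREATER-like comparison are modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical CPU-side LRZ direction state for one renderpass. */
enum lrz_dir {
   LRZ_DIR_UNKNOWN,  /* no depth write seen yet in this renderpass */
   LRZ_DIR_LESS,     /* LESS / LESS_OR_EQUAL */
   LRZ_DIR_GREATER,  /* GREATER / GREATER_OR_EQUAL */
   LRZ_DIR_INVALID,  /* direction changed: LRZ off until renderpass end */
};

struct lrz_cpu_state {
   enum lrz_dir dir;
};

static void lrz_begin_renderpass(struct lrz_cpu_state *s)
{
   s->dir = LRZ_DIR_UNKNOWN;
}

/* Called for each draw that writes depth; returns whether LRZ may be used. */
static bool lrz_on_depth_write(struct lrz_cpu_state *s, enum lrz_dir draw_dir)
{
   assert(draw_dir == LRZ_DIR_LESS || draw_dir == LRZ_DIR_GREATER);

   if (s->dir == LRZ_DIR_UNKNOWN)
      s->dir = draw_dir;                  /* first depth write fixes it */
   else if (s->dir != LRZ_DIR_INVALID && s->dir != draw_dir)
      s->dir = LRZ_DIR_INVALID;           /* sticky for the renderpass */

   return s->dir != LRZ_DIR_INVALID;
}
```

Once ``LRZ_DIR_INVALID`` is reached it stays there until ``lrz_begin_renderpass`` runs again. The same model shows the secondary command buffer problem: a secondary is recorded without knowing which ``dir`` it will start from.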
+
+A650+ (gen3+)
+-------------
+
+Here the LRZ direction can also be tracked on the GPU. There are two parts:
+
+- A direction byte which stores the current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
+- The parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.
+
+The idea is the same as when LRZ is tracked on the CPU: when ``GRAS_LRZ_CNTL``
+is used, its direction is compared to the previously known direction,
+and the direction byte is set to disabled when the directions are incompatible.
+
+Additionally, to reuse LRZ between renderpasses, ``GRAS_LRZ_CNTL`` checks
+if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
+stored in the buffer. If not, LRZ is disabled. This is necessary
+because the depth buffer may have several layers and mip levels, while the
+LRZ buffer represents only a single layer and mip level.
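The a650+ check can be sketched the same way, except that the state lives in GPU memory next to the LRZ buffer. This models the described behavior, not the real encoding: ``struct depth_view`` and its fields are hypothetical stand-ins for ``GRAS_LRZ_DEPTH_VIEW``, and because the text does not say how the stored view is first written, the sketch simply records it in a hypothetical ``lrz_gpu_init``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum lrz_gpu_dir {
   LRZ_GPU_DIR_NONE,      /* no direction recorded yet */
   LRZ_GPU_DIR_LESS,
   LRZ_GPU_DIR_GREATER,
   LRZ_GPU_DIR_DISABLED,  /* incompatible use seen: LRZ stays off */
};

/* Hypothetical stand-in for the GRAS_LRZ_DEPTH_VIEW parameters. */
struct depth_view {
   uint32_t base_layer;
   uint32_t layer_count;
   uint32_t base_mip_level;
};

/* Model of the state kept in GPU memory alongside the LRZ buffer. */
struct lrz_gpu_state {
   enum lrz_gpu_dir dir;    /* the "direction byte" */
   struct depth_view view;  /* view the LRZ buffer was built for */
};

static bool depth_view_equal(const struct depth_view *a,
                             const struct depth_view *b)
{
   return a->base_layer == b->base_layer &&
          a->layer_count == b->layer_count &&
          a->base_mip_level == b->base_mip_level;
}

static void lrz_gpu_init(struct lrz_gpu_state *s, const struct depth_view *view)
{
   s->dir = LRZ_GPU_DIR_NONE;
   s->view = *view;
}

/* Effect of a draw using GRAS_LRZ_CNTL with direction dir and the current
 * depth view; returns whether LRZ stays enabled. */
static bool lrz_gpu_use(struct lrz_gpu_state *s, enum lrz_gpu_dir dir,
                        const struct depth_view *cur_view)
{
   assert(dir == LRZ_GPU_DIR_LESS || dir == LRZ_GPU_DIR_GREATER);

   if (s->dir == LRZ_GPU_DIR_DISABLED)
      return false;

   if (!depth_view_equal(&s->view, cur_view))
      s->dir = LRZ_GPU_DIR_DISABLED;  /* LRZ built for another layer/mip */
   else if (s->dir == LRZ_GPU_DIR_NONE)
      s->dir = dir;                   /* first use fixes the direction */
   else if (s->dir != dir)
      s->dir = LRZ_GPU_DIR_DISABLED;  /* incompatible direction */

   return s->dir != LRZ_GPU_DIR_DISABLED;
}
```

Because this state persists in memory rather than in the command stream, a later renderpass on the same depth view can continue from the stored direction, which is what makes reuse between renderpasses possible.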
+
+LRZ Fast-Clear
+--------------
+
+The LRZ fast-clear buffer is initialized to zeroes and read/written
+when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store one bit per block.
+``0`` means the block still has the original depth clear value, and ``1`` means
+that the corresponding block in LRZ has been modified.
+
+LRZ fast-clear clears the LRZ buffer conservatively. At the point where LRZ
+is written, the LRZ block which corresponds to a single fast-clear bit is
+cleared:
+
+- To ``0.0`` if the depth comparison is ``GREATER``;
+- To ``1.0`` if the depth comparison is ``LESS``.
+
+This way it's always valid to fast-clear: these are the most permissive
+values for each direction, so an untouched block never causes LRZ to reject
+a fragment.
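A minimal model of the fast-clear bit semantics above. For simplicity it assumes one LRZ value per block (real LRZ blocks cover several pixels) and does not model how LRZ updates a block after the first write; ``struct lrz_block``, ``lrz_fast_clear`` and ``lrz_block_begin_write`` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum lrz_cmp { LRZ_CMP_LESS, LRZ_CMP_GREATER };

/* Hypothetical per-block model: one fast-clear bit plus one LRZ value. */
struct lrz_block {
   bool fc_bit;  /* false: still has the clear value; true: modified */
   float value;  /* LRZ value, meaningful only once fc_bit is set */
};

/* Fast-clear only resets the bits; LRZ values are left untouched. */
static void lrz_fast_clear(struct lrz_block *blocks, size_t count)
{
   assert(blocks != NULL || count == 0);
   for (size_t i = 0; i < count; i++)
      blocks[i].fc_bit = false;
}

/* Value an untouched block is cleared to when LRZ is first written:
 * the most permissive value for the comparison direction. */
static float lrz_fc_clear_value(enum lrz_cmp cmp)
{
   return cmp == LRZ_CMP_GREATER ? 0.0f : 1.0f;
}

/* Called at the point where LRZ writes a block. */
static void lrz_block_begin_write(struct lrz_block *b, enum lrz_cmp cmp)
{
   if (!b->fc_bit) {
      b->value = lrz_fc_clear_value(cmp);
      b->fc_bit = true;
   }
}
```

A block whose bit is already set keeps its value; only blocks still marked as "clear value" are reset, which is why the clear itself only has to touch one bit per block.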
+
+LRZ Precision
+-------------
+
+LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)``, which is
+not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
+This especially raises questions in the context of fast-clear: if fast-clear
+uses a value which cannot be precisely represented by LRZ, we wouldn't
+be able to round it in the correct direction, since the direction is tracked
+on the GPU.
+
+However, it seems that depth comparisons with LRZ values have some "slack"
+and nothing special should be done for such depth clear values.
+
+How it was tested:
+
+- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``
+
+  - LRZ buffer contains all zeroes.
+
+- Do draws and check whether all samples are passing:
+
+  - ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
+  - ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
+  - ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - passing;
+  - ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing;
+  - ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.
+
+In all cases the resulting LRZ buffer is all zeroes and the LRZ direction is updated.
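The rounding problem can be made concrete with a small calculation. This sketch assumes the standard UNORM16 encoding (``q = d * 65535``, with ``d`` clamped to [0, 1]); the helper names are hypothetical. "Conservative" here means the stored bound is never tighter than the real depth: round up for ``LESS``, down for ``GREATER``. That is the choice the CPU cannot make without knowing the GPU-tracked direction.

```c
#include <assert.h>
#include <stdint.h>

enum depth_dir { DEPTH_LESS, DEPTH_GREATER };

/* Assumes standard UNORM16 encoding: q = d * 65535, d clamped to [0, 1]. */
static double z16_scale(double d)
{
   if (d <= 0.0)
      return 0.0;
   if (d >= 1.0)
      return 65535.0;
   return d * 65535.0;
}

static uint16_t z16_round_down(double d)
{
   return (uint16_t)z16_scale(d);  /* truncation == floor for x >= 0 */
}

static uint16_t z16_round_up(double d)
{
   double x = z16_scale(d);
   uint16_t q = (uint16_t)x;
   return x > (double)q ? (uint16_t)(q + 1) : q;
}

static uint16_t z16_round_nearest(double d)
{
   return (uint16_t)(z16_scale(d) + 0.5);
}

/* Conservative encoding of a stored LRZ bound for a given direction. */
static uint16_t z16_conservative(double d, enum depth_dir dir)
{
   assert(dir == DEPTH_LESS || dir == DEPTH_GREATER);
   return dir == DEPTH_LESS ? z16_round_up(d) : z16_round_down(d);
}
```

For ``1.f / (1 << 17)`` the scaled value is just under 0.5, so rounding down or to nearest gives 0, consistent with the all-zero LRZ buffer in the test above, while a conservative ``LESS`` encoding would need 1. That ``OP_LESS`` draws still pass with LRZ holding 0 is the "slack" described above.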
+
+LRZ Caches
+----------
+
+``LRZ_FLUSH`` flushes and invalidates the LRZ caches. There are two caches:
+
+- Cache for fast-clear buffer;
+- Cache for direction byte + depth view params.
+
+They can be cleared by ``LRZ_CLEAR``. To become visible in GPU memory,
+the cleared caches should then be flushed with ``LRZ_FLUSH``.
+
+``GRAS_LRZ_CNTL`` reads from these caches.
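A toy model of why ``LRZ_FLUSH`` has to follow ``LRZ_CLEAR``. It treats each of the two caches as a single write-back word in front of GPU memory; the struct, the function names and the value ``LRZ_CLEAR`` writes (0 here) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one LRZ cache (fast-clear buffer, or
 * direction byte + depth view params) backed by GPU memory. */
struct lrz_cache {
   uint32_t memory;  /* contents of GPU memory */
   uint32_t line;    /* cached copy */
   bool valid;       /* line holds data */
   bool dirty;       /* line differs from memory */
};

/* GRAS_LRZ_CNTL-style read: served from the cache, filled on a miss. */
static uint32_t lrz_cache_read(struct lrz_cache *c)
{
   if (!c->valid) {
      c->line = c->memory;
      c->valid = true;
      c->dirty = false;
   }
   return c->line;
}

/* LRZ_CLEAR modeled as a write into the cache only. */
static void lrz_cache_clear(struct lrz_cache *c)
{
   c->line = 0;
   c->valid = true;
   c->dirty = true;
}

/* LRZ_FLUSH: write back dirty data, then invalidate. */
static void lrz_cache_flush(struct lrz_cache *c)
{
   assert(c->valid || !c->dirty);  /* dirty data must be cached */
   if (c->dirty)
      c->memory = c->line;
   c->valid = false;
   c->dirty = false;
}
```

Between ``lrz_cache_clear`` and ``lrz_cache_flush`` reads through the cache already see the cleared value while memory does not, which is the window Turnip's flush-right-after-change policy closes.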
--- a/src/freedreno/vulkan/tu_lrz.c
+++ b/src/freedreno/vulkan/tu_lrz.c
@@ -10,16 +10,7 @@
 #include "tu_cs.h"
 #include "tu_image.h"

-/* Low-resolution Z buffer is very similar to a depth prepass that helps
- * the HW avoid executing the fragment shader on those fragments that will
- * be subsequently discarded by the depth test afterwards.
- *
- * The interesting part of this feature is that it allows applications
- * to submit the vertices in any order.
- *
- * In the binning pass it is possible to store the depth value of each
- * vertex into internal low resolution depth buffer and quickly test
- * the primitives against it during the render pass.
+/* See lrz.rst for how the HW works. Only the implementation notes are here.
 *
 * There are a number of limitations when LRZ cannot be used:
 * - Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
@@ -32,38 +23,12 @@
 * - (pre-a650) Using secondary command buffers;
 * - Sysmem rendering (with small caveat).
 *
- * Pre-a650 (before gen3)
- * ======================
- *
- * The direction is fully tracked on CPU. In renderpass LRZ starts with
- * unknown direction, the direction is set first time when depth write occurs
- * and if it does change afterwards - direction becomes invalid and LRZ is
- * disabled for the rest of the renderpass.
- *
- * Since direction is not tracked by GPU - it's impossible to know whether
- * LRZ is enabled during construction of secondary command buffers.
- *
- * For the same reason it's impossible to reuse LRZ between renderpasses.
- *
 * A650+ (gen3+)
 * =============
 *
- * Now LRZ direction could be tracked on GPU. There are to parts:
- * - Direction byte which stores current LRZ direction;
- * - Parameters of the last used depth view.
- *
- * The idea is the same as when LRZ tracked on CPU: when GRAS_LRZ_CNTL
- * is used - its direction is compared to previously known direction
- * and direction byte is set to disabled when directions are incompatible.
- *
- * Additionally, to reuse LRZ between renderpasses, GRAS_LRZ_CNTL checks
- * if current value of GRAS_LRZ_DEPTH_VIEW is equal to the value
- * stored in the buffer, if not - LRZ is disabled. (This is necessary
- * because depth buffer may have several layers and mip levels, on the
- * other hand LRZ buffer represents only a single layer + mip level).
- *
- * LRZ direction between renderpasses is disabled when underlying depth
- * buffer is changed, the following commands could change depth image:
+ * While LRZ could be reused between renderpasses, it is disabled when the
+ * underlying depth buffer is changed.
+ * The following commands could change a depth image:
 * - vkCmdBlitImage*
 * - vkCmdCopyBufferToImage*
 * - vkCmdCopyImage*
@@ -71,59 +36,17 @@
 * LRZ Fast-Clear
 * ==============
 *
- * The LRZ fast-clear buffer is initialized to zeroes and read/written
- * when GRAS_LRZ_CNTL.FC_ENABLE (b3) is set. It appears to store 1b/block.
- * '0' means block has original depth clear value, and '1' means that the
- * corresponding block in LRZ has been modified.
- *
- * LRZ fast-clear conservatively clears LRZ buffer, at the point where LRZ is
- * written the LRZ block which corresponds to a single fast-clear bit is cleared:
- * - To 0.0 if depth comparison is GREATER;
- * - To 1.0 if depth comparison is LESS;
- *
- * This way it's always valid to fast-clear. On the other hand we disable
+ * It's always valid to fast-clear. On the other hand, we disable
 * fast-clear if depth clear value is not 0.0 or 1.0 because it may be worse
 * for perf if some primitives are expected to fail depth test against the
 * actual depth clear value.
 *
- * LRZ Precision
- * =============
- *
- * LRZ always uses Z16_UNORM. The epsilon for it is 1.f / (1 << 16) which is
- * not enough to represent all values of Z32_UNORM or Z32_FLOAT.
- * This especially rises questions in context of fast-clear, if fast-clear
- * uses a value which cannot be precisely represented by LRZ - we wouldn't
- * be able to round it in the correct direction since direction is tracked
- * on GPU.
- *
- * However, it seems that depth comparisons with LRZ values have some "slack"
- * and nothing special should be done for such depth clear values.
- *
- * How it was tested:
- * - Clear Z32_FLOAT attachment to 1.f / (1 << 17)
- *   - LRZ buffer contains all zeroes
- * - Do draws and check whether all samples are passing:
- *   - OP_GREATER with (1.f / (1 << 17) + float32_epsilon) - passing;
- *   - OP_GREATER with (1.f / (1 << 17) - float32_epsilon) - not passing;
- *   - OP_LESS with (1.f / (1 << 17) - float32_epsilon) - samples;
- *   - OP_LESS with() 1.f / (1 << 17) + float32_epsilon) - not passing;
- *   - OP_LESS_OR_EQ with (1.f / (1 << 17) + float32_epsilon) - not passing;
- * In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
- *
 * LRZ Caches
 * ==========
 *
 * ! The policy here is to flush LRZ cache right after it is changed,
 * so if LRZ data is needed afterwards - there is no need to flush it
 * before using LRZ.
- *
- * LRZ_FLUSH flushes and invalidates LRZ caches, there are two caches:
- * - Cache for fast-clear buffer;
- * - Cache for direction byte + depth view params.
- * They could be cleared by LRZ_CLEAR. To become visible in GPU memory
- * the caches should be flushed with LRZ_FLUSH afterwards.
- *
- * GRAS_LRZ_CNTL reads from these caches.
 */

 static void