docs/freedreno: Extract LRZ docs from tu_lrz

Most of the docs describe HW and are not specific to Turnip.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20491>
This commit is contained in:
Danylo Piliaiev
2023-01-03 13:34:24 +01:00
committed by Marge Bot
parent 22543653d5
commit e176eb6c39
3 changed files with 132 additions and 82 deletions

View File

@@ -26,6 +26,11 @@ Adreno is a mostly tile-mode renderer, but with the option to bypass tiling
mostly write combined memory but with the ability to map some buffers as cache
coherent with the CPU.
.. toctree::
:glob:
freedreno/hw/*
Hardware acronyms
^^^^^^^^^^^^^^^^^

View File

@@ -0,0 +1,122 @@
Low Resolution Z Buffer
=======================
This doc is based on a6xx HW reverse engineering, a5xx should be similar to
a6xx before gen3.
Low Resolution Z buffer is very similar to a depth prepass that helps
the HW to avoid executing the fragment shader on those fragments that will
be subsequently discarded by the depth test afterwards.
The interesting part of this feature is that it allows applications
to submit the vertices in any order.
Citing official Adreno documentation:
::
[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
is then used during the rendering pass to reject pixels efficiently before testing
against the full resolution Z-buffer.
TODO: a7xx
Limitations
-----------
There are two main limitations of LRZ:
- Since LRZ is an early depth test, such test cannot be used when late-z is required;
- LRZ buffer could be formed only in one direction, changing depth comparison directions
without disabling LRZ would lead to a malformed LRZ buffer.
Pre-a650 (before gen3)
----------------------
The direction is fully tracked on CPU. In renderpass LRZ starts with
unknown direction, the direction is set first time when depth write occurs
and if it does change afterwards then the direction becomes invalid and LRZ is
disabled for the rest of the renderpass.
Since the direction is not tracked by the GPU, it's impossible to know whether
LRZ is enabled during construction of secondary command buffers.
For the same reason, it's impossible to reuse LRZ between renderpasses.
A650+ (gen3+)
-------------
Now LRZ direction can be tracked on GPU. There are two parts:
- Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.
The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
is used, its direction is compared to the previously known direction
and direction byte is set to disabled when directions are incompatible.
Additionally, to reuse LRZ between renderpasses, ``GRAS_LRZ_CNTL`` checks
if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
stored in the buffer. If not, LRZ is disabled. This is necessary
because depth buffer may have several layers and mip levels, while the
LRZ buffer represents only a single layer + mip level.
LRZ Fast-Clear
--------------
The LRZ fast-clear buffer is initialized to zeroes and read/written
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
``0`` means block has original depth clear value, and ``1`` means that the
corresponding block in LRZ has been modified.
LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
written the LRZ block which corresponds to a single fast-clear bit is cleared:
- To ``0.0`` if depth comparison is ``GREATER``
- To ``1.0`` if depth comparison is ``LESS``
This way it's always valid to fast-clear.
LRZ Precision
-------------
LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
This especially raises questions in context of fast-clear, if fast-clear
uses a value which cannot be precisely represented by LRZ - we wouldn't
be able to round it in the correct direction since direction is tracked
on GPU.
However, it seems that depth comparisons with LRZ values have some "slack"
and nothing special should be done for such depth clear values.
How it was tested:
- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``
- LRZ buffer contains all zeroes.
- Do draws and check whether all samples are passing:
- ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
- ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
- ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
- ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
- ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.
In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
LRZ Caches
----------
``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:
- Cache for fast-clear buffer;
- Cache for direction byte + depth view params.
They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
the caches should be flushed with ``LRZ_FLUSH`` afterwards.
``GRAS_LRZ_CNTL`` reads from these caches.

View File

@@ -10,16 +10,7 @@
#include "tu_cs.h"
#include "tu_image.h"
/* Low-resolution Z buffer is very similar to a depth prepass that helps
* the HW avoid executing the fragment shader on those fragments that will
* be subsequently discarded by the depth test afterwards.
*
* The interesting part of this feature is that it allows applications
* to submit the vertices in any order.
*
* In the binning pass it is possible to store the depth value of each
* vertex into internal low resolution depth buffer and quickly test
* the primitives against it during the render pass.
/* See lrz.rst for how HW works. Here are only the implementation notes.
*
* There are a number of limitations when LRZ cannot be used:
* - Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
@@ -32,38 +23,12 @@
* - (pre-a650) Using secondary command buffers;
* - Sysmem rendering (with small caveat).
*
* Pre-a650 (before gen3)
* ======================
*
* The direction is fully tracked on CPU. In renderpass LRZ starts with
* unknown direction, the direction is set first time when depth write occurs
* and if it does change afterwards - direction becomes invalid and LRZ is
* disabled for the rest of the renderpass.
*
* Since direction is not tracked by GPU - it's impossible to know whether
* LRZ is enabled during construction of secondary command buffers.
*
* For the same reason it's impossible to reuse LRZ between renderpasses.
*
* A650+ (gen3+)
* =============
*
* Now LRZ direction could be tracked on GPU. There are to parts:
* - Direction byte which stores current LRZ direction;
* - Parameters of the last used depth view.
*
* The idea is the same as when LRZ tracked on CPU: when GRAS_LRZ_CNTL
* is used - its direction is compared to previously known direction
* and direction byte is set to disabled when directions are incompatible.
*
* Additionally, to reuse LRZ between renderpasses, GRAS_LRZ_CNTL checks
* if current value of GRAS_LRZ_DEPTH_VIEW is equal to the value
* stored in the buffer, if not - LRZ is disabled. (This is necessary
* because depth buffer may have several layers and mip levels, on the
* other hand LRZ buffer represents only a single layer + mip level).
*
* LRZ direction between renderpasses is disabled when underlying depth
* buffer is changed, the following commands could change depth image:
* While LRZ could be reused between renderpasses LRZ, it is disabled when
* underlying depth buffer is changed.
* The following commands could change a depth image:
* - vkCmdBlitImage*
* - vkCmdCopyBufferToImage*
* - vkCmdCopyImage*
@@ -71,59 +36,17 @@
* LRZ Fast-Clear
* ==============
*
* The LRZ fast-clear buffer is initialized to zeroes and read/written
* when GRAS_LRZ_CNTL.FC_ENABLE (b3) is set. It appears to store 1b/block.
* '0' means block has original depth clear value, and '1' means that the
* corresponding block in LRZ has been modified.
*
* LRZ fast-clear conservatively clears LRZ buffer, at the point where LRZ is
* written the LRZ block which corresponds to a single fast-clear bit is cleared:
* - To 0.0 if depth comparison is GREATER;
* - To 1.0 if depth comparison is LESS;
*
* This way it's always valid to fast-clear. On the other hand we disable
* It's always valid to fast-clear. On the other hand we disable
* fast-clear if depth clear value is not 0.0 or 1.0 because it may be worse
* for perf if some primitives are expected to fail depth test against the
* actual depth clear value.
*
* LRZ Precision
* =============
*
* LRZ always uses Z16_UNORM. The epsilon for it is 1.f / (1 << 16) which is
* not enough to represent all values of Z32_UNORM or Z32_FLOAT.
* This especially rises questions in context of fast-clear, if fast-clear
* uses a value which cannot be precisely represented by LRZ - we wouldn't
* be able to round it in the correct direction since direction is tracked
* on GPU.
*
* However, it seems that depth comparisons with LRZ values have some "slack"
* and nothing special should be done for such depth clear values.
*
* How it was tested:
* - Clear Z32_FLOAT attachment to 1.f / (1 << 17)
* - LRZ buffer contains all zeroes
* - Do draws and check whether all samples are passing:
* - OP_GREATER with (1.f / (1 << 17) + float32_epsilon) - passing;
* - OP_GREATER with (1.f / (1 << 17) - float32_epsilon) - not passing;
* - OP_LESS with (1.f / (1 << 17) - float32_epsilon) - samples;
* - OP_LESS with() 1.f / (1 << 17) + float32_epsilon) - not passing;
* - OP_LESS_OR_EQ with (1.f / (1 << 17) + float32_epsilon) - not passing;
* In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
*
* LRZ Caches
* ==========
*
* ! The policy here is to flush LRZ cache right after it is changed,
* so if LRZ data is needed afterwards - there is no need to flush it
* before using LRZ.
*
* LRZ_FLUSH flushes and invalidates LRZ caches, there are two caches:
* - Cache for fast-clear buffer;
* - Cache for direction byte + depth view params.
* They could be cleared by LRZ_CLEAR. To become visible in GPU memory
* the caches should be flushed with LRZ_FLUSH afterwards.
*
* GRAS_LRZ_CNTL reads from these caches.
*/
static void