docs/freedreno: Extract LRZ docs from tu_lrz
Most of the docs describe HW and are not specific to Turnip. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20491>
This commit is contained in:

committed by
Marge Bot

parent
22543653d5
commit
e176eb6c39
@@ -26,6 +26,11 @@ Adreno is a mostly tile-mode renderer, but with the option to bypass tiling
|
||||
mostly write combined memory but with the ability to map some buffers as cache
|
||||
coherent with the CPU.
|
||||
|
||||
.. toctree::
|
||||
:glob:
|
||||
|
||||
freedreno/hw/*
|
||||
|
||||
Hardware acronyms
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
122
docs/drivers/freedreno/hw/lrz.rst
Normal file
122
docs/drivers/freedreno/hw/lrz.rst
Normal file
@@ -0,0 +1,122 @@
|
||||
Low Resolution Z Buffer
|
||||
=======================
|
||||
|
||||
This doc is based on a6xx HW reverse engineering, a5xx should be similar to
|
||||
a6xx before gen3.
|
||||
|
||||
Low Resolution Z buffer is very similar to a depth prepass that helps
|
||||
the HW to avoid executing the fragment shader on those fragments that will
|
||||
be subsequently discarded by the depth test afterwards.
|
||||
|
||||
The interesting part of this feature is that it allows applications
|
||||
to submit the vertices in any order.
|
||||
|
||||
Citing official Adreno documentation:
|
||||
|
||||
::
|
||||
|
||||
[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
|
||||
depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
|
||||
and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
|
||||
is then used during the rendering pass to reject pixels efficiently before testing
|
||||
against the full resolution Z-buffer.
|
||||
|
||||
TODO: a7xx
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
There are two main limitations of LRZ:
|
||||
|
||||
- Since LRZ is an early depth test, such test cannot be used when late-z is required;
|
||||
- LRZ buffer could be formed only in one direction, changing depth comparison directions
|
||||
without disabling LRZ would lead to a malformed LRZ buffer.
|
||||
|
||||
Pre-a650 (before gen3)
|
||||
----------------------
|
||||
|
||||
The direction is fully tracked on CPU. In renderpass LRZ starts with
|
||||
unknown direction, the direction is set first time when depth write occurs
|
||||
and if it does change afterwards then the direction becomes invalid and LRZ is
|
||||
disabled for the rest of the renderpass.
|
||||
|
||||
Since the direction is not tracked by the GPU, it's impossible to know whether
|
||||
LRZ is enabled during construction of secondary command buffers.
|
||||
|
||||
For the same reason, it's impossible to reuse LRZ between renderpasses.
|
||||
|
||||
A650+ (gen3+)
|
||||
-------------
|
||||
|
||||
Now LRZ direction can be tracked on GPU. There are two parts:
|
||||
|
||||
- Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
|
||||
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.
|
||||
|
||||
The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
|
||||
is used, its direction is compared to the previously known direction
|
||||
and direction byte is set to disabled when directions are incompatible.
|
||||
|
||||
Additionally, to reuse LRZ between renderpasses, ``GRAS_LRZ_CNTL`` checks
|
||||
if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
|
||||
stored in the buffer. If not, LRZ is disabled. This is necessary
|
||||
because depth buffer may have several layers and mip levels, while the
|
||||
LRZ buffer represents only a single layer + mip level.
|
||||
|
||||
LRZ Fast-Clear
|
||||
--------------
|
||||
|
||||
The LRZ fast-clear buffer is initialized to zeroes and read/written
|
||||
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
|
||||
``0`` means block has original depth clear value, and ``1`` means that the
|
||||
corresponding block in LRZ has been modified.
|
||||
|
||||
LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
|
||||
written the LRZ block which corresponds to a single fast-clear bit is cleared:
|
||||
|
||||
- To ``0.0`` if depth comparison is ``GREATER``
|
||||
- To ``1.0`` if depth comparison is ``LESS``
|
||||
|
||||
This way it's always valid to fast-clear.
|
||||
|
||||
LRZ Precision
|
||||
-------------
|
||||
|
||||
LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
|
||||
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
|
||||
This especially raises questions in context of fast-clear, if fast-clear
|
||||
uses a value which cannot be precisely represented by LRZ - we wouldn't
|
||||
be able to round it in the correct direction since direction is tracked
|
||||
on GPU.
|
||||
|
||||
However, it seems that depth comparisons with LRZ values have some "slack"
|
||||
and nothing special should be done for such depth clear values.
|
||||
|
||||
How it was tested:
|
||||
|
||||
- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``
|
||||
|
||||
- LRZ buffer contains all zeroes.
|
||||
|
||||
- Do draws and check whether all samples are passing:
|
||||
|
||||
- ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
|
||||
- ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
|
||||
- ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
|
||||
- ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
|
||||
- ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.
|
||||
|
||||
In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
|
||||
|
||||
LRZ Caches
|
||||
----------
|
||||
|
||||
``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:
|
||||
|
||||
- Cache for fast-clear buffer;
|
||||
- Cache for direction byte + depth view params.
|
||||
|
||||
They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
|
||||
the caches should be flushed with ``LRZ_FLUSH`` afterwards.
|
||||
|
||||
``GRAS_LRZ_CNTL`` reads from these caches.
|
Reference in New Issue
Block a user