docs/freedreno: Extract LRZ docs from tu_lrz
Most of the docs describe HW and are not specific to Turnip. Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20491>
This commit is contained in:

committed by
Marge Bot

parent
22543653d5
commit
e176eb6c39
@@ -26,6 +26,11 @@ Adreno is a mostly tile-mode renderer, but with the option to bypass tiling
|
||||
mostly write combined memory but with the ability to map some buffers as cache
|
||||
coherent with the CPU.
|
||||
|
||||
.. toctree::
|
||||
:glob:
|
||||
|
||||
freedreno/hw/*
|
||||
|
||||
Hardware acronyms
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
|
122
docs/drivers/freedreno/hw/lrz.rst
Normal file
122
docs/drivers/freedreno/hw/lrz.rst
Normal file
@@ -0,0 +1,122 @@
|
||||
Low Resolution Z Buffer
|
||||
=======================
|
||||
|
||||
This doc is based on a6xx HW reverse engineering, a5xx should be similar to
|
||||
a6xx before gen3.
|
||||
|
||||
Low Resolution Z buffer is very similar to a depth prepass that helps
|
||||
the HW to avoid executing the fragment shader on those fragments that will
|
||||
be subsequently discarded by the depth test afterwards.
|
||||
|
||||
The interesting part of this feature is that it allows applications
|
||||
to submit the vertices in any order.
|
||||
|
||||
Citing official Adreno documentation:
|
||||
|
||||
::
|
||||
|
||||
[A Low Resolution Z (LRZ)] pass is also referred to as draw order independent
|
||||
depth rejection. During the binning pass, a low resolution Z-buffer is constructed,
|
||||
and can reject LRZ-tile wide contributions to boost binning performance. This LRZ
|
||||
is then used during the rendering pass to reject pixels efficiently before testing
|
||||
against the full resolution Z-buffer.
|
||||
|
||||
TODO: a7xx
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
There are two main limitations of LRZ:
|
||||
|
||||
- Since LRZ is an early depth test, such test cannot be used when late-z is required;
|
||||
- LRZ buffer could be formed only in one direction, changing depth comparison directions
|
||||
without disabling LRZ would lead to a malformed LRZ buffer.
|
||||
|
||||
Pre-a650 (before gen3)
|
||||
----------------------
|
||||
|
||||
The direction is fully tracked on CPU. In renderpass LRZ starts with
|
||||
unknown direction, the direction is set first time when depth write occurs
|
||||
and if it does change afterwards then the direction becomes invalid and LRZ is
|
||||
disabled for the rest of the renderpass.
|
||||
|
||||
Since the direction is not tracked by the GPU, it's impossible to know whether
|
||||
LRZ is enabled during construction of secondary command buffers.
|
||||
|
||||
For the same reason, it's impossible to reuse LRZ between renderpasses.
|
||||
|
||||
A650+ (gen3+)
|
||||
-------------
|
||||
|
||||
Now LRZ direction can be tracked on GPU. There are two parts:
|
||||
|
||||
- Direction byte which stores current LRZ direction - ``GRAS_LRZ_CNTL.DIR``.
|
||||
- Parameters of the last used depth view - ``GRAS_LRZ_DEPTH_VIEW``.
|
||||
|
||||
The idea is the same as when LRZ tracked on CPU: when ``GRAS_LRZ_CNTL``
|
||||
is used, its direction is compared to the previously known direction
|
||||
and direction byte is set to disabled when directions are incompatible.
|
||||
|
||||
Additionally, to reuse LRZ between renderpasses, ``GRAS_LRZ_CNTL`` checks
|
||||
if the current value of ``GRAS_LRZ_DEPTH_VIEW`` is equal to the value
|
||||
stored in the buffer. If not, LRZ is disabled. This is necessary
|
||||
because depth buffer may have several layers and mip levels, while the
|
||||
LRZ buffer represents only a single layer + mip level.
|
||||
|
||||
LRZ Fast-Clear
|
||||
--------------
|
||||
|
||||
The LRZ fast-clear buffer is initialized to zeroes and read/written
|
||||
when ``GRAS_LRZ_CNTL.FC_ENABLE`` is set. It appears to store 1b/block.
|
||||
``0`` means block has original depth clear value, and ``1`` means that the
|
||||
corresponding block in LRZ has been modified.
|
||||
|
||||
LRZ fast-clear conservatively clears LRZ buffer. At the point where LRZ is
|
||||
written the LRZ block which corresponds to a single fast-clear bit is cleared:
|
||||
|
||||
- To ``0.0`` if depth comparison is ``GREATER``
|
||||
- To ``1.0`` if depth comparison is ``LESS``
|
||||
|
||||
This way it's always valid to fast-clear.
|
||||
|
||||
LRZ Precision
|
||||
-------------
|
||||
|
||||
LRZ always uses ``Z16_UNORM``. The epsilon for it is ``1.f / (1 << 16)`` which is
|
||||
not enough to represent all values of ``Z32_UNORM`` or ``Z32_FLOAT``.
|
||||
This especially raises questions in context of fast-clear, if fast-clear
|
||||
uses a value which cannot be precisely represented by LRZ - we wouldn't
|
||||
be able to round it in the correct direction since direction is tracked
|
||||
on GPU.
|
||||
|
||||
However, it seems that depth comparisons with LRZ values have some "slack"
|
||||
and nothing special should be done for such depth clear values.
|
||||
|
||||
How it was tested:
|
||||
|
||||
- Clear ``Z32_FLOAT`` attachment to ``1.f / (1 << 17)``
|
||||
|
||||
- LRZ buffer contains all zeroes.
|
||||
|
||||
- Do draws and check whether all samples are passing:
|
||||
|
||||
- ``OP_GREATER`` with ``(1.f / (1 << 17) + float32_epsilon)`` - passing;
|
||||
- ``OP_GREATER`` with ``(1.f / (1 << 17) - float32_epsilon)`` - not passing;
|
||||
- ``OP_LESS`` with ``(1.f / (1 << 17) - float32_epsilon)`` - samples;
|
||||
- ``OP_LESS`` with ``(1.f / (1 << 17) + float32_epsilon)``- not passing;
|
||||
- ``OP_LESS_OR_EQ`` with ``(1.f / (1 << 17) + float32_epsilon)`` - not passing.
|
||||
|
||||
In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
|
||||
|
||||
LRZ Caches
|
||||
----------
|
||||
|
||||
``LRZ_FLUSH`` flushes and invalidates LRZ caches, there are two caches:
|
||||
|
||||
- Cache for fast-clear buffer;
|
||||
- Cache for direction byte + depth view params.
|
||||
|
||||
They could be cleared by ``LRZ_CLEAR``. To become visible in GPU memory
|
||||
the caches should be flushed with ``LRZ_FLUSH`` afterwards.
|
||||
|
||||
``GRAS_LRZ_CNTL`` reads from these caches.
|
@@ -10,16 +10,7 @@
|
||||
#include "tu_cs.h"
|
||||
#include "tu_image.h"
|
||||
|
||||
/* Low-resolution Z buffer is very similar to a depth prepass that helps
|
||||
* the HW avoid executing the fragment shader on those fragments that will
|
||||
* be subsequently discarded by the depth test afterwards.
|
||||
*
|
||||
* The interesting part of this feature is that it allows applications
|
||||
* to submit the vertices in any order.
|
||||
*
|
||||
* In the binning pass it is possible to store the depth value of each
|
||||
* vertex into internal low resolution depth buffer and quickly test
|
||||
* the primitives against it during the render pass.
|
||||
/* See lrz.rst for how HW works. Here are only the implementation notes.
|
||||
*
|
||||
* There are a number of limitations when LRZ cannot be used:
|
||||
* - Fragment shader side-effects (writing to SSBOs, atomic operations, etc);
|
||||
@@ -32,38 +23,12 @@
|
||||
* - (pre-a650) Using secondary command buffers;
|
||||
* - Sysmem rendering (with small caveat).
|
||||
*
|
||||
* Pre-a650 (before gen3)
|
||||
* ======================
|
||||
*
|
||||
* The direction is fully tracked on CPU. In renderpass LRZ starts with
|
||||
* unknown direction, the direction is set first time when depth write occurs
|
||||
* and if it does change afterwards - direction becomes invalid and LRZ is
|
||||
* disabled for the rest of the renderpass.
|
||||
*
|
||||
* Since direction is not tracked by GPU - it's impossible to know whether
|
||||
* LRZ is enabled during construction of secondary command buffers.
|
||||
*
|
||||
* For the same reason it's impossible to reuse LRZ between renderpasses.
|
||||
*
|
||||
* A650+ (gen3+)
|
||||
* =============
|
||||
*
|
||||
* Now LRZ direction could be tracked on GPU. There are to parts:
|
||||
* - Direction byte which stores current LRZ direction;
|
||||
* - Parameters of the last used depth view.
|
||||
*
|
||||
* The idea is the same as when LRZ tracked on CPU: when GRAS_LRZ_CNTL
|
||||
* is used - its direction is compared to previously known direction
|
||||
* and direction byte is set to disabled when directions are incompatible.
|
||||
*
|
||||
* Additionally, to reuse LRZ between renderpasses, GRAS_LRZ_CNTL checks
|
||||
* if current value of GRAS_LRZ_DEPTH_VIEW is equal to the value
|
||||
* stored in the buffer, if not - LRZ is disabled. (This is necessary
|
||||
* because depth buffer may have several layers and mip levels, on the
|
||||
* other hand LRZ buffer represents only a single layer + mip level).
|
||||
*
|
||||
* LRZ direction between renderpasses is disabled when underlying depth
|
||||
* buffer is changed, the following commands could change depth image:
|
||||
* While LRZ could be reused between renderpasses LRZ, it is disabled when
|
||||
* underlying depth buffer is changed.
|
||||
* The following commands could change a depth image:
|
||||
* - vkCmdBlitImage*
|
||||
* - vkCmdCopyBufferToImage*
|
||||
* - vkCmdCopyImage*
|
||||
@@ -71,59 +36,17 @@
|
||||
* LRZ Fast-Clear
|
||||
* ==============
|
||||
*
|
||||
* The LRZ fast-clear buffer is initialized to zeroes and read/written
|
||||
* when GRAS_LRZ_CNTL.FC_ENABLE (b3) is set. It appears to store 1b/block.
|
||||
* '0' means block has original depth clear value, and '1' means that the
|
||||
* corresponding block in LRZ has been modified.
|
||||
*
|
||||
* LRZ fast-clear conservatively clears LRZ buffer, at the point where LRZ is
|
||||
* written the LRZ block which corresponds to a single fast-clear bit is cleared:
|
||||
* - To 0.0 if depth comparison is GREATER;
|
||||
* - To 1.0 if depth comparison is LESS;
|
||||
*
|
||||
* This way it's always valid to fast-clear. On the other hand we disable
|
||||
* It's always valid to fast-clear. On the other hand we disable
|
||||
* fast-clear if depth clear value is not 0.0 or 1.0 because it may be worse
|
||||
* for perf if some primitives are expected to fail depth test against the
|
||||
* actual depth clear value.
|
||||
*
|
||||
* LRZ Precision
|
||||
* =============
|
||||
*
|
||||
* LRZ always uses Z16_UNORM. The epsilon for it is 1.f / (1 << 16) which is
|
||||
* not enough to represent all values of Z32_UNORM or Z32_FLOAT.
|
||||
* This especially rises questions in context of fast-clear, if fast-clear
|
||||
* uses a value which cannot be precisely represented by LRZ - we wouldn't
|
||||
* be able to round it in the correct direction since direction is tracked
|
||||
* on GPU.
|
||||
*
|
||||
* However, it seems that depth comparisons with LRZ values have some "slack"
|
||||
* and nothing special should be done for such depth clear values.
|
||||
*
|
||||
* How it was tested:
|
||||
* - Clear Z32_FLOAT attachment to 1.f / (1 << 17)
|
||||
* - LRZ buffer contains all zeroes
|
||||
* - Do draws and check whether all samples are passing:
|
||||
* - OP_GREATER with (1.f / (1 << 17) + float32_epsilon) - passing;
|
||||
* - OP_GREATER with (1.f / (1 << 17) - float32_epsilon) - not passing;
|
||||
* - OP_LESS with (1.f / (1 << 17) - float32_epsilon) - samples;
|
||||
* - OP_LESS with() 1.f / (1 << 17) + float32_epsilon) - not passing;
|
||||
* - OP_LESS_OR_EQ with (1.f / (1 << 17) + float32_epsilon) - not passing;
|
||||
* In all cases resulting LRZ buffer is all zeroes and LRZ direction is updated.
|
||||
*
|
||||
* LRZ Caches
|
||||
* ==========
|
||||
*
|
||||
* ! The policy here is to flush LRZ cache right after it is changed,
|
||||
* so if LRZ data is needed afterwards - there is no need to flush it
|
||||
* before using LRZ.
|
||||
*
|
||||
* LRZ_FLUSH flushes and invalidates LRZ caches, there are two caches:
|
||||
* - Cache for fast-clear buffer;
|
||||
* - Cache for direction byte + depth view params.
|
||||
* They could be cleared by LRZ_CLEAR. To become visible in GPU memory
|
||||
* the caches should be flushed with LRZ_FLUSH afterwards.
|
||||
*
|
||||
* GRAS_LRZ_CNTL reads from these caches.
|
||||
*/
|
||||
|
||||
static void
|
||||
|
Reference in New Issue
Block a user