third_party_mesa3d

OpenHarmony/third_party_mesa3d


Commit Graph

Author	SHA1	Message	Date

Alyssa Rosenzweig

735c63c75e

libagx: hoist code out of loop

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30382>

2024-07-26 18:40:47 +00:00

Alyssa Rosenzweig

d26ae4f455

asahi,libagx: tessellate on device

Add OpenCL kernels implementing the tessellation algorithm on device. This is an
OpenCL C port of the D3D11 reference tessellator, originally written by
Microsoft in C++. There are significant differences compared to the CPU-based
reference implementation:

* significant simplifications and cleanup. The reference code did a lot of
  things in weird ways that would be inefficient on the GPU. I did a *lot* of
  work here to get good AGX assembly generated for the tessellation kernels ...
  the first attempts were quite bad! Notably, everything is carefully written to
  ensure that all private memory access is optimized out in NIR; the resulting
  kernels do not use scratch and do not spill on G13.

* prefix sum variants. To implement geom+tess efficiently, we need to first
  calculate the count of indices generated by the tessellator, then prefix sum
  that, then tessellate using the prefix sum results, writing into one large
  index buffer for a single indirect draw. This isn't too bad: we already have
  most of the logic, and the guts of the prefix sum kernel are shared with
  geometry shaders.

* VDM generation variant. To implement tess alone, it's fastest to generate a
  hardware Index List word for each patch, adding an appropriate 32-bit index
  bias to the dynamically allocated U16 index buffers. Then from the CPU, we
  have the illusion of a single draw to Stream Link with Return to. This
  requires packing hardware control words from the tessellator kernel.
  Fortunately, we have GenXML available so we just use agx_pack like we would in
  the driver.

Along the way, we pick up indirect tess support (this follows on naturally),
which gets rid of the other bit of tessellation-related cheating. Implementing
this requires reworking our internal agx_launch data structures, but that has
the nice side effect of speeding up GS invocations too (by fixing the workgroup
size).

Don't get me wrong. tessellator.cl is the single most unhinged file of my
career, featuring GenXML-based pack macros fed by dynamic memory allocation fed
by the inscrutable tessellation algorithm.

But it works *really* well.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30051>

2024-07-15 20:09:00 +00:00

Alyssa Rosenzweig

9753cd44f7

asahi: Implement skeleton for tessellation

This implements a rough skeleton of what's needed for tessellation. It contains
the relevant lowerings to merge the VS and TCS, running them as a compute
kernel, and to lower the TES to a new VS (possibly merged in with a subsequent
GS). This is sufficient for both standalone tessellation and tess + geom/xfb
together. It does not yet contain a GPU-accelerated tessellator, simply falling
back to the CPU for that for now. Nevertheless the data structures are
engineered with that end goal in mind, in particular to be able to tessellate
all patches in parallel without needing any prefix sums, etc. (using simple
watermark allocation for the heap).

Work on fleshing out the skeleton continues in parallel. For now, this does pass
the tests and lets the harder stuff get regression tested more easily. And
merging early will ease rebase.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27616>

2024-02-14 21:02:28 +00:00

3 Commits