We had re-enabled this because of some test regressions:
KHR-GLES31.core.geometry_shader.limits.max_input_components and
ext_transform_feedback-max-varyings failed to register allocate,
but now that we support indirect indexing on vertex shader outputs natively
this is no longer an issue.
Piglit's max-samplers tests failed. These tests use indirect indexing
on samplers which is not supported and fail to link with this error message:
"Failed to link: error: sampler arrays indexed with non-constant expressions
is forbidden in GLSL 110". This is expected. The reason these were passing
before is that loop unrolling was able to turn indirect indexing into
direct indexing. We add them to the expected fail list.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
We have been able to handle indirect offsets on GS outputs for
a while and we have just implemented this for VS, so we can enable
this capability and avoid the horrible if-ladder code to convert
indirect output indices to constant indices.
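
As an illustration only (plain C rather than NIR, with hypothetical
function names), the difference between the lowered if-ladder and a
native indirect store looks roughly like this:

```c
#include <assert.h>

#define NUM_OUTPUTS 4

/* Roughly what the lowering produces: one constant-index store per
 * possible value of the dynamic index. The ladder grows with the array. */
static void store_output_if_ladder(float out[NUM_OUTPUTS], unsigned idx,
                                   float value)
{
    assert(idx < NUM_OUTPUTS);
    if (idx == 0)
        out[0] = value;
    else if (idx == 1)
        out[1] = value;
    else if (idx == 2)
        out[2] = value;
    else
        out[3] = value;
}

/* With indirect addressing supported natively, a single store suffices. */
static void store_output_indirect(float out[NUM_OUTPUTS], unsigned idx,
                                  float value)
{
    assert(idx < NUM_OUTPUTS);
    out[idx] = value;
}
```

Both functions write the same element; the ladder just costs one
compare-and-branch per array slot.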
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
The GL driver was getting loop unrolling from the GLSL compiler frontend,
but NIR unrolling is more sophisticated, so prefer that.
The only caveat is that loop unrolling is implemented in the Mesa state
tracker, so our backend won't have a chance to undo the optimization if
it causes us to lower thread count or spill, so we choose to be a bit more
conservative with the configuration than what we were doing with GLSL.
Shader-db results follow. Increase in instruction counts is expected due
to additional unrolling. We lose threads in very few shaders, but we
make up for this with the additional unrolling and reduced spilling. We
also managed to get 3 more shaders to compile successfully.
total instructions in shared programs: 13416427 -> 13461431 (0.34%)
instructions in affected programs: 96936 -> 141940 (46.43%)
helped: 58
HURT: 216
Instructions are HURT.
total threads in shared programs: 410626 -> 410598 (<.01%)
threads in affected programs: 56 -> 28 (-50.00%)
helped: 0
HURT: 14
Threads are HURT.
total loops in shared programs: 2121 -> 1708 (-19.47%)
loops in affected programs: 468 -> 55 (-88.25%)
helped: 446
HURT: 47
Loops are helped.
total uniforms in shared programs: 3676567 -> 3691185 (0.40%)
uniforms in affected programs: 25304 -> 39922 (57.77%)
helped: 23
HURT: 199
Uniforms are HURT.
total spills in shared programs: 5902 -> 5727 (-2.97%)
spills in affected programs: 285 -> 110 (-61.40%)
helped: 19
HURT: 0
total fills in shared programs: 13308 -> 13121 (-1.41%)
fills in affected programs: 301 -> 114 (-62.13%)
helped: 19
HURT: 0
total sfu-stalls in shared programs: 31860 -> 32856 (3.13%)
sfu-stalls in affected programs: 1692 -> 2688 (58.87%)
helped: 25
HURT: 196
Sfu-stalls are HURT.
total inst-and-stalls in shared programs: 13448287 -> 13494287 (0.34%)
inst-and-stalls in affected programs: 98404 -> 144404 (46.75%)
helped: 57
HURT: 217
Inst-and-stalls are HURT.
total nops in shared programs: 329276 -> 329551 (0.08%)
nops in affected programs: 2189 -> 2464 (12.56%)
helped: 58
HURT: 181
Nops are HURT.
LOST: 0
GAINED: 3
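
For readers checking the figures above, the percentages appear to be the
relative change from the "before" value. A minimal sketch, assuming that
definition (percent_change is a hypothetical helper, not part of
shader-db):

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, percent_change(96936, 141940) gives roughly 46.43, which
matches the "instructions in affected programs" line, and the absolute
delta (45004) is the same as in the shared-programs total.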
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>
This enables the SAND modifier with 128-byte-wide columns for the
NV12 format.
When DRM_FORMAT_MOD_BROADCOM_SAND128 is enabled, an imported NV12
texture has a different layout: the luma and chroma planes are
interleaved for every 128-byte-wide column.
Although TFU was supposed to convert a NV12 with SAND_COL128 modifier
from YUV to sRGB color space, it expects a particular swizzle that is not
the one provided by the video decoder available at the Raspberry Pi 4.
This patch follows a similar approach to VC4 YUV blit, using a custom
blit shader that transforms a NV12 texture with SAND_COL128 modifier
with the two interleaved planes into two non-interleaved textures in
UIF format, as if it were a regular NV12 texture.
To reduce the number of texture-fetch operations during the blit, we
are reading and writing the textures in 32-bit pixel groups. This
implies some swizzling of the pixels to meet the particularities
of the different micro-tile layouts for 8bpp, 16bpp and 32bpp.
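
To make the column layout more concrete, here is a simplified addressing
sketch. It is not the code in this patch: it assumes 8-bit samples, that
columns are stored one after another in memory, and that column_height
is the number of 128-byte lines between the starts of two consecutive
columns. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAND128_COLUMN_WIDTH 128u /* bytes per column line */

/* Byte offset of the sample at (x, y) in a plane that starts at
 * plane_offset within the first column (assumed layout, see above). */
static size_t sand128_byte_offset(size_t plane_offset,
                                  uint32_t column_height,
                                  uint32_t x, uint32_t y)
{
    assert(y < column_height);
    uint32_t column = x / SAND128_COLUMN_WIDTH;
    uint32_t x_in_column = x % SAND128_COLUMN_WIDTH;

    return plane_offset +
           (size_t)column * column_height * SAND128_COLUMN_WIDTH +
           (size_t)y * SAND128_COLUMN_WIDTH +
           x_in_column;
}
```

Under these assumptions, two horizontally adjacent samples that straddle
a column boundary are a whole column apart in memory, which is why the
blit has to re-tile the data rather than copy it linearly.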
With this approach, we are not adding a new format that could be named
"NV12_SAND128". We are just enabling a format modifier.
v2: Rework checks for supported modifiers (Alejandro Piñeiro)
Destroy custom shaders on context destroy (Alejandro Piñeiro)
Add more comments (Alejandro Piñeiro)
SAND128 in query_dmabuf_modifiers should report external_only true.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10051>
The GL frontend can lower this weird GL feature away for us. This should
fix redeclaration of the gl_Color/SecondaryColor as centroid, since that
case had been missed in the !flat special case here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8601>
Add a "do you support this modifier?" query to all
drivers which support format modifiers. This will
be used in a subsequent change to fully
encapsulate modifier validation and auxiliary plane
count calculation logic behind the driver
abstraction, which will in turn simplify the
addition of device-class-specific format modifiers
in the nouveau driver.
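
The shape of such a query can be sketched as follows. This is only an
illustration of the pattern: example_screen and
example_is_modifier_supported are hypothetical names, and a real driver
hook would likely also take the format and report further properties.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-driver state: the modifiers this driver can handle. */
struct example_screen {
    const uint64_t *modifiers;
    size_t num_modifiers;
};

/* "Do you support this modifier?" answered by the driver itself, so
 * callers need no driver-specific knowledge. */
static bool example_is_modifier_supported(const struct example_screen *screen,
                                          uint64_t modifier)
{
    assert(screen != NULL);
    for (size_t i = 0; i < screen->num_modifiers; i++) {
        if (screen->modifiers[i] == modifier)
            return true;
    }
    return false;
}
```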
Signed-off-by: James Jones <jajones@nvidia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3723>
This is needed for Vulkan because the spec (31.1. Limit Requirements)
requires the following minimum values for these limits:
maxPerStageDescriptorSampledImages 16
maxPerStageDescriptorStorageImages 4
maxPerStageDescriptorInputAttachments 4
And we are using v3d textures for all of them, so current limit would
not be enough for some cases.
Note that as the current comment explains there is not exactly a HW
limit for it, so we could bump to 32 for example, but let's just be
conservative and ask the minimum required.
It is worth noting that we needed to keep the same value for the
OpenGL case, as the new value causes register allocation failures in
some GL cases. We tried to fix that with small changes to the NIR
scheduler, but found that it would require some non-trivial effort
(which we will eventually need to do).
Fixes tests like:
dEQP-VK.binding_model.descriptorset_random.sets16.constant.ubolimitlow.sbolimitlow.imglimitlow.noiub.uab.comp.noia.0
v2: keep the previous limit for OpenGL (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6999>
So it could be used by both the OpenGL and the Vulkan driver.
In addition to the move, some small API changes were needed. For
example, the simulator used to receive a v3d_screen on initialization
and set v3d_screen->sim_file itself. Now it returns the newly created
sim_file.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5666>
Adds PIPE_CAP_PRIMITIVE_RESTART_FIXED_INDEX which is a subset of the
primitive restart cap for when the hardware can only support the fixed
indices specified in GLES.
The switch statements were automatically modified with this command:
find \( \( -name \*.cpp -o -name \*.c \) \! -type l \) \
-exec sed -i -r \
's/^(\s*case\s+PIPE_CAP_PRIMITIVE_RESTART)\s*:.*$/\0\n\1_FIXED_INDEX:/' \
{} \;
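
The effect of the sed expression is to duplicate every matching case
label, so the new cap falls through to whatever the driver already
returned for PIPE_CAP_PRIMITIVE_RESTART. A self-contained sketch of the
result (the enum is a local stand-in for the Gallium one, and
example_get_param is a hypothetical driver function):

```c
#include <assert.h>

/* Local stand-in for the relevant values of the Gallium cap enum. */
enum example_pipe_cap {
    PIPE_CAP_PRIMITIVE_RESTART,
    PIPE_CAP_PRIMITIVE_RESTART_FIXED_INDEX,
    PIPE_CAP_EXAMPLE_OTHER,
    PIPE_CAP_EXAMPLE_COUNT
};

static int example_get_param(enum example_pipe_cap cap)
{
    assert(cap < PIPE_CAP_EXAMPLE_COUNT);
    switch (cap) {
    case PIPE_CAP_PRIMITIVE_RESTART:
    case PIPE_CAP_PRIMITIVE_RESTART_FIXED_INDEX: /* line added by sed */
        return 1;
    default:
        return 0;
    }
}
```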
v2: Add a note in screen.rst
Reviewed-by: Eric Anholt <eric@anholt.net> (v1)
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5559>
Fixes all the ARB_texture_query_lod piglit tests, and needed to get
the Vulkan CTS textureQueryLOD passing with the ongoing Vulkan driver.
Note that the LOD Query bit flag only became available on V42 of the HW,
but the v3d40_tex is using V41 as reference. In order to avoid setting
up the infrastructure to support both v41 and v42, we manually set the
bit if the device version is the correct one.
We also fix how ARB_texture_query_lod (and so EXT_texture_query_lod)
is exposed. Before this commit it was always exposed (wrongly as it
was not really supported). Now it is exposed for devinfo.ver >= 42.
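
A minimal sketch of the version gating described above. Only the
"ver >= 42" condition comes from this message; the struct, the function
names and the bit position are hypothetical placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_LOD_QUERY_BIT (1u << 0) /* hypothetical bit position */

struct example_devinfo {
    uint8_t ver; /* e.g. 41 or 42 */
};

/* The extension is only exposed where the hardware has the bit. */
static bool example_has_texture_query_lod(const struct example_devinfo *devinfo)
{
    assert(devinfo != NULL);
    return devinfo->ver >= 42;
}

/* Set the flag by hand on top of state packed against the V41
 * definitions, but only on hardware that understands it. */
static uint32_t example_pack_tex_flags(const struct example_devinfo *devinfo,
                                       uint32_t packed, bool lod_query)
{
    if (lod_query && example_has_texture_query_lod(devinfo))
        packed |= EXAMPLE_LOD_QUERY_BIT;
    return packed;
}
```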
v2: move _need_sampler helper to nir.h (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
V3D can do indirect inputs so we don't need it. Also, the lowering
produces horrible if-ladder code that is particularly bad for geometry
shaders where inputs are always arrays and shader bodies usually have
a loop indexing into them.
This fixes a couple of geometry shader tests in CTS that would fail to
register allocate otherwise.
There are no changes in shader-db.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
To make PIPE_FORMATs usable from non-gallium parts of Mesa, I want to
move their helpers out of gallium. Since u_format used
util_copy_rect(), I moved that in there, too.
I've put it in a separate directory in util/ because it's a big chunk
of related code, and it's not clear to me whether we might want it as
a separate library from libmesa_util at some point.
Closes: #1905
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This will expose GL_EXT_primitive_bounding_box and
GL_OES_primitive_bounding_box after previous commits
expose OpenGL ES 3.1 once Compute Shaders are available.
Reviewed-by: Eric Anholt <eric@anholt.net>
This adapts the v3d driver to the new CL submit ioctl interface that
allows the driver to request a flush of the caches after the render
job has completed. This seems to eliminate the kernel write violation
errors reported during CTS and Piglit executions, fixing some CTS tests
and GPU resets along the way.
v2:
- Adapt to changes in the kernel side.
- Disable shader storage and shader images if the kernel doesn't
implement cache flushing.
Fixes CTS tests:
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-float
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-int
KHR-GLES31.core.shader_image_size.basic-nonMS-fs-uint
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-float
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-int
KHR-GLES31.core.shader_image_size.advanced-nonMS-fs-uint
KHR-GLES31.core.shader_atomic_counters.advanced-usage-many-draw-calls2
KHR-GLES31.core.shader_atomic_counters.advanced-usage-draw-update-draw
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-int
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-matR
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std140-struct
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-matC-pad
KHR-GLES31.core.shader_storage_buffer_object.advanced-unsizedArrayLength-fs-std430-vec
Reviewed-by: Eric Anholt <eric@anholt.net>
Now that the UAPI has landed, add the pipe_context function for
dispatching compute shaders. This is the last major feature for GLES 3.1,
though it's not enabled quite yet.
We have a cap bit for gallium and a GLSL compiler flag to control this.
Just trust what GLSL gives us and stop forcing it. In order for this to
be safe, we have to advertise another cap in some of the gallium
drivers.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a relatively minimal change to adjust all the gallium interfaces
to use bool instead of boolean. I tried to avoid making unrelated
changes inside of drivers to flip boolean -> bool to reduce the risk of
regressions (the compiler will much more easily allow "dirty" values
inside a char-based boolean than a C99 _Bool).
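
To illustrate the "dirty" values concern, here is a small sketch.
example_boolean is a stand-in for a char-based boolean typedef, and the
helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned char example_boolean; /* stand-in, char-based */

/* A char-based boolean keeps whatever bits fit in a char. */
static int store_as_char_boolean(int flags)
{
    example_boolean b = (example_boolean)flags;
    return b;
}

/* A C99 _Bool normalizes any non-zero value to 1. */
static int store_as_c99_bool(int flags)
{
    bool b = flags;
    return b;
}
```

With an input of 4, the char-based version yields 4 and the _Bool
version yields 1. With an input of 256, the char-based version
truncates to 0 while the _Bool version still yields 1. Code that
compares against a specific value or relies on the truncation can
therefore change behavior when the type is flipped.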
This has been build-tested on amd64 with:
Gallium drivers: nouveau r300 r600 radeonsi freedreno swrast etnaviv v3d
vc4 i915 svga virgl swr panfrost iris lima kmsro
Gallium st: mesa xa xvmc vdpau va
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
PIPE_CAP_SM3 has always been an odd one out of all our caps. While most
other caps are fine-grained and single-purpose, this cap encodes several
features in one. And since OpenGL cares more about single features, it'd
be nice to get rid of this one.
As it turns out, this is now relatively simple. We only really care about
three features using this cap, and those already got their own caps. So
we can remove it, and make sure all current drivers just give the same
response to all of them.
The only place we *really* care about SM3 is in nine, and there we can
instead just re-construct the information based on the finer-grained
caps. This avoids DX9 semantics from needlessly leaking into all of the
drivers, most of which don't care a whole lot about DX9 specifically.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The V3D 4.2 HW has a limit to MSAA texture sizes of 4096. With non-MSAA,
we can go up to 7680 (actually probably 8138, but that hasn't been
validated by the HW team). Exposing 7680 in X11 will allow dual 4k displays.
The _LEVELS assumes that the max is always a power of two. For V3D 4.2, we
can support up to 7680 non-power-of-two MSAA textures, which will let X11
support dual 4k displays on newer hardware.
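
The limitation of a levels-based cap can be seen with a little
arithmetic. The helpers below are hypothetical and only assume the usual
relation between a mip level count and the largest representable size:

```c
#include <assert.h>

/* Largest texture size a given level count can describe. */
static unsigned max_size_from_levels(unsigned levels)
{
    assert(levels >= 1 && levels <= 31);
    return 1u << (levels - 1);
}

/* Largest level count whose size does not exceed `size`,
 * i.e. floor(log2(size)) + 1. */
static unsigned levels_for_size(unsigned size)
{
    unsigned levels = 0;

    assert(size > 0);
    while (size) {
        levels++;
        size >>= 1;
    }
    return levels;
}
```

Here levels_for_size(7680) is 13, and max_size_from_levels(13) is only
4096, while 14 levels would claim 8192. A limit of 7680 cannot be
expressed as a level count, which is the motivation for a size-based
cap.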
Reviewed-by: Marek Olšák <marek.olsak@amd.com>