Commit Graph

168376 Commits

Author SHA1 Message Date
Kenneth Graunke
b6878d456f st/mesa, iris: Add optional CPU-based ASTC void extent denorm flushing
Intel Gen9 GPUs have hardware ASTC support, but have a bug where they
don't handle denormalized values in void extent blocks correctly.  This
isn't that hard to work around - on upload, we can detect such blocks,
and flush any denorms to zero.  Because we're altering the data behind
the application's back, and applications can theoretically ask to
download the original unaltered image data, we unfortunately need to
maintain shadow copies of the data.

To make sure that we don't accidentally skip the void-extent flushing
via any fast-upload paths, and support download correctly, we plug this
into the st/mesa compressed texture format fallback paths, which store
a CPU copy of the original image data, and upload altered data.

This is unfortunately common code for what's likely to be a single
driver's issue (on a single generation), but it beats replicating an
entire framework we already have inside the driver.

Fixes dEQP-GLES3.functional.texture.compressed.astc.void_extent_ldr.*
using iris on Intel Gen9 GPUs.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4167
Reviewed-by: Emma Anholt <emma@anholt.net>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21943>
2023-03-17 21:30:48 +00:00
Alyssa Rosenzweig
f534c36ca5 ci: Enforce clang-format for asahi
Some drivers use clang-format exclusively. We would like to lint for correct
formatting in CI to catch style issues before they land, because mixing
clang-format and not clang-format within a codebase is a recipe for conflicts.

We don't expect this lint to ever fail in "normal" usage, since we expect
developers on these drivers to setup automatic formatting in their editor.
However, it can be useful as a failsafe or for drive-by contributors who don't
know the style guide.

Enable the linting for Asahi. We'll enable for Panfrost shortly, but Panfrost
isn't clang-format clean quite yet.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20553>
2023-03-17 19:59:21 +00:00
Alyssa Rosenzweig
5c1b360eaa ci: Add clang-format to the amd64 container
We need clang-format available in order to check for formatting errors later.
Add it to the amd64 container only (this requires some shenigans to avoid
multi-arch conflicts).

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20553>
2023-03-17 19:59:21 +00:00
José Roberto de Souza
d2621ef81d iris: Implement gem_mmap() in Xe kmd backend
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21937>
2023-03-17 19:31:56 +00:00
José Roberto de Souza
16dbf50ad9 iris: Implement gem_create() in Xe kmd backend
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21937>
2023-03-17 19:31:56 +00:00
José Roberto de Souza
c9fdfae334 iris: Implement the function to destroy VM in Xe
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21937>
2023-03-17 19:31:56 +00:00
José Roberto de Souza
60f4bd61b6 iris: Implement the Xe version of iris_bufmgr_init_global_vm()
As Xe KMD requires VM, iris_bufmgr_init_global_vm() now is returing
a boolean telling if bufmgr creationg should continue or not.

Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21937>
2023-03-17 19:31:56 +00:00
José Roberto de Souza
7f65b94451 iris: Only mark buffer as exported if drmPrimeHandleToFD() succeed
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21966>
2023-03-17 17:36:15 +00:00
Rhys Perry
596f2ef361 aco: set needs_flat_scr=true for RT
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Fixes: 39c828cb9f ("aco: remove aco::rt_stack variable")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21961>
2023-03-17 16:55:57 +00:00
Rhys Perry
184cf1cb79 aco/gfx11: fix RT prolog scratch initialization
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Friedrich Vock <friedrich.vock@gmx.de>
Fixes: 6446b79168 ("aco: implement select_rt_prolog()")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21961>
2023-03-17 16:55:57 +00:00
Michel Dänzer
2ead574abe ci: Enable LTO for fedora-release job
Requires -Wno-error=... to be passed to the linking stage.

NOTE: This does not imply that it's safe to enable LTO for Fedora
package builds yet. It just helps prevent moving further away from that
long term goal.

v2:
* Keep passing -Wno-error=array-bounds & -Wno-error=stringop-overread.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:34 +00:00
Michel Dänzer
eb9cd45ef6 ci: Install procps-ng in Fedora image
For GCC LTO wrapper scripts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:34 +00:00
Michel Dänzer
2b739ca31d ci: Drop ccache from Fedora image
It started hanging in F36 as well.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:34 +00:00
Michel Dänzer
bca2bcfec9 ci: Make ccache optional
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:34 +00:00
Michel Dänzer
fe53fa5117 ci: Allow passing c{,pp}_link_args to meson
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:34 +00:00
Michel Dänzer
b6e0bf8b76 ci: Pass -Werror to compiler linking stage for LTO
With LTO, some compiler warnings are generated only at the compiler's
linking stage. Therefore -Werror needs to be passed to the linking stage
as well for warnings to be turned into errors.

Meson should really do this when both werror and b_lto are enabled, but
meanwhile let's do it ourselves.

We can't just add -Werror to c{,pp}_link_args, because those are passed
for Meson's feature checks, some of which generate warnings, resulting
in false negatives. We use gcc/g++ wrapper scripts instead.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
86c6634897 intel/vk/grl: Do not use no_override_init_args for C++
It's only valid for C code.

Avoids

cc1plus: error: command-line option '-Wno-override-init' is valid for C/ObjC but not for C++ [-Werror]

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
66e34fe914 ci: Split up -Werror workarounds for debian-mingw32-x86_64 job
Most of them are only needed for C++ code, one of them only for C.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
86496167ea ci: Remove some -Werror workarounds for debian-android job
No more corresponding warnings.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
2f3dc68948 iris: Use ralloc_free for memory allocated with rzalloc
Pointed out by GCC with LTO:

../src/gallium/drivers/iris/iris_context.c: In function 'iris_create_context':
../src/gallium/drivers/iris/iris_context.c:304:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
  304 |       free(ctx);
      |       ^
[...]
../src/gallium/drivers/iris/iris_context.c:313:7: error: 'free' called on pointer 'block_180' with nonzero offset 48 [-Werror=free-nonheap-object]
  313 |       free(ctx);
      |       ^

v2:
* Use ice pointer instead of ctx. (Karol Herbst)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
c65948a34b crocus: Use ralloc_free for memory allocated with rzalloc
Pointed out by GCC with LTO:

../src/gallium/drivers/crocus/crocus_context.c: In function 'crocus_create_context':
../src/gallium/drivers/crocus/crocus_context.c:261:7: error: 'free' called on pointer 'block_174' with nonzero offset 48 [-Werror=free-nonheap-object]
  261 |       free(ctx);
      |       ^

v2:
* Use ice pointer instead of ctx. (Karol Herbst)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
c67633be62 r600: Use container_of instead of direct pointer cast
Fixes strict aliasing violation:

In function 'r600_init_resource_fields',
    inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:578:2:
../src/gallium/drivers/r600/r600_buffer_common.c:139:48: warning: array subscript 'struct r600_texture[0]' is partly outside array bounds of 'unsigned char[264]' [-Warray-bounds]
  139 |         if ((res->b.b.target != PIPE_BUFFER && !rtex->surface.is_linear) ||
      |                                                ^~~~~~~~~~~~~~~~~~~~~~~~
In file included from ../src/util/os_memory.h:37,
                 from ../src/util/u_memory.h:38,
                 from ../src/gallium/include/pipe/p_state.h:47,
                 from ../src/gallium/auxiliary/util/u_inlines.h:34,
                 from ../src/gallium/auxiliary/pipebuffer/pb_buffer.h:49,
                 from ../src/gallium/include/winsys/radeon_winsys.h:46,
                 from ../src/gallium/drivers/r600/r600_pipe_common.h:37,
                 from ../src/gallium/drivers/r600/r600_cs.h:33,
                 from ../src/gallium/drivers/r600/r600_buffer_common.c:27:
In function 'r600_alloc_buffer_struct',
    inlined from 'r600_buffer_create' at ../src/gallium/drivers/r600/r600_buffer_common.c:576:34:
../src/util/os_memory_stdc.h:41:27: note: object of size 264 allocated by 'malloc'
   41 | #define os_malloc(_size)  malloc(_size)
      |                           ^~~~~~~~~~~~~
../src/util/u_memory.h:46:24: note: in expansion of macro 'os_malloc'
   46 | #define MALLOC(_size)  os_malloc(_size)
      |                        ^~~~~~~~~
../src/util/u_memory.h:54:41: note: in expansion of macro 'MALLOC'
   54 | #define MALLOC_STRUCT(T)   (struct T *) MALLOC(sizeof(struct T))
      |                                         ^~~~~~
../src/gallium/drivers/r600/r600_buffer_common.c:554:19: note: in expansion of macro 'MALLOC_STRUCT'
  554 |         rbuffer = MALLOC_STRUCT(r600_resource);
      |                   ^~~~~~~~~~~~~

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Michel Dänzer
ff73392774 nouveau: Make getSize return unsigned int
This matches the type of the underlying size member, and is consistent
with other getSize methods.

Avoids compiler warning with LTO enabled:

In member function '__ct ',
    inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:401:26,
    inlined from 'convertToSSA' at ../src/nouveau/codegen/nv50_ir_ssa.cpp:310:28,
    inlined from 'nv50_ir_generate_code' at ../src/nouveau/codegen/nv50_ir.cpp:1331:22:
../src/nouveau/codegen/nv50_ir_ssa.cpp:407:48: error: argument 1 value '18446744073709551615' exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
  407 |    stack = new Stack[func->allLValues.getSize()];
      |                                                ^
/usr/include/c++/12/new: In function 'nv50_ir_generate_code':
/usr/include/c++/12/new:128:26: note: in a call to allocation function 'operator new []' declared here
  128 | _GLIBCXX_NODISCARD void* operator new[](std::size_t) _GLIBCXX_THROW (std::bad_alloc)
      |                          ^

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21781>
2023-03-17 16:08:33 +00:00
Alejandro Piñeiro
20a066e9ab v3dv/debug: add debug option to disable TFU codepaths
This can have two main uses:
  * If we suspect a problem with TFU copies, we can disable it and
    check if other codepaths gets a test/app working.
  * To test other codepaths, as in general, TFU is the preferred
    option for copies.

Note that for now this is only for v3dv, as for v3d, mipmap generation
uses TFU without an alternative codepath.

With this option we also adds an assert if we try to submit a TFU job,
just in case we keep adding other methods that use TFU, and forget to
include the debug option there.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21952>
2023-03-17 15:20:25 +00:00
Mike Blumenkrantz
46813ffecb zink: only flag rp info for updating on flush, don't actually update
this is more consistent with actual usage

also sanitize rp info on flush to ensure it isn't reused

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
430db81071 aux/tc: rework inter-batch renderpass info handling
the tricky part of tracking renderpass info in tc is handling batch
flushing. there are a number of places that can trigger it, but
there are only two types of flushes:
* full flushes (all commands execute)
* partial flushes (more commands will execute after)

the latter case is the important one, as it effectively means that
the current renderpass info needs to "roll over" into the next one,
and it's really the next info that the driver will want to look at.
this is made trickier by there being no way (for the driver) to distinguish
when a rollover happens in order to delay beginning a renderpass for
further parsing

to solve this, add a member to renderpass info to chain the rolled-over info,
which tc can then process when the driver tries to wait. this works "most"
of the time, except when an app/test blows out the tc batch count, in which
case this pointer will be non-null, and it can be directly signaled as a less
optimized renderpass to avoid deadlocking

also sometimes a flush will trigger sub-flushes for buffer lists, so
add an assert to ensure nobody tries using this with driver_calls_flush_notify=true

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
3d96049191 aux/tc: make some of the rp tracking api private
this enables some more under-the-hood changes without touching the header
that will force all of gallium to be recompiled

also update/clarify rules for using rp tracking; these haven't changed,
but the documentation was less clear before

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
64a256c66a aux/tc: fix initial rp info allocation
this value is -1 by default, which means the initial allocation yields
9 info structs instead of 10 (though this has no bearing on functionality)

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
1a9ba0aaa3 aux/tc: add a function to reset rp info
drivers should be maintaining a local copy of the rp info, and this
provides a consistent way to reset that info if a renderpass is ended
early

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
4a5d3590d6 aux/tc: don't sync for get_sample_position
no drivers actually use the context for this, so a sync is pointless

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
4f58507855 aux/tc: track the number of active queries
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
0f4c3cb05c aux/tc: fix renderpass splitting on flush
it's expected that a driver won't immediately trigger a deferred flush
if a fence is present, so don't split the renderpass in this case since
that breaks everything

Fixes: 07017aa137 ("util/tc: implement renderpass tracking")

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
454772c123 aux/tc: use a local 'deferred' variable in tc_flush()
no functional changes

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
767ef6e02e aux/tc: flag late zs clears as partial clears
this ensures drivers can't optimize out a zs attachment that gets
a late clear

Fixes: 07017aa137 ("util/tc: implement renderpass tracking")

Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
4c359f785f zink: trigger oom flushes more aggressively from copy ops
this cuts down on needing to flush from set_fb or draw

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
11b1ad9f3f zink: disable tc flush notify with rp optimizing
this is extremely broken and nonfunctional since it randomly flushes
mid-renderpass and triggers invalidations

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
e3f0eaf5f9 zink: disable queries when flushing clears from set_fb
this otherwise has weird side effects, especially with rp optimizing enabled

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:38:00 +00:00
Mike Blumenkrantz
5d94887f08 zink: add and use a function for "safely" ending renderpasses
these are all points at which a renderpass should be split, so make sure
renderpass data isn't reset in any way here

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:37:59 +00:00
Mike Blumenkrantz
64b9cf5760 zink: reset tc fb info upon splitting a renderpass
not sure if this actually affects anything, but if a renderpass has
to be split for some reason, ensure subsequent renderpasses don't lose
data

also ensure that rp data isn't lost when triggering primgen clears and
delete a now-invalid assert

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:37:59 +00:00
Mike Blumenkrantz
73528dd3b7 zink: don't use/update tc rp info while blitting
this is illegal

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:37:59 +00:00
Mike Blumenkrantz
a858bcbb37 zink: add an assert to catch renderpass optimizing bugs
this should only trigger if tc has a bug

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21800>
2023-03-17 14:37:59 +00:00
Francisco Jerez
76b4255cd8 intel/fs: Fix register coalesce in presence of force_writemask_all copy source writes.
This fixes the behavior of register coalesce in cases where the source
of a copy is written elsewhere in the program by a force_writemask_all
instruction, which could cause the overwrite to be executed for an
inactive channel under non-uniform control flow, causing
can_coalesce_vars() to give incorrect results.  This has been reported
in cases like:

> while (true) {
>    x = imageSize(img);
>    if (non_uniform_condition()) {
>       y = x;
>       break;
>    }
> }
> use(y);

Currently the register coalesce pass would coalesce x and y in the
example above, which is invalid since in the example above imageSize()
is implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence coalescing y and x into the same register changes the
behavior of the program.

Note that this is a regression introduced by commit a4b36cd3dd.  In
order to avoid the problem without reverting that patch, we prevent
register coalesce if there is an overwrite of the source with
force_writemask_all behavior inconsistent with the copy and this
occurs anywhere in the intersection of the live ranges of source and
destination, even if it occurs lexically before the copy, since it
might be physically executed after the copy under divergent loop
control flow.

Fixes: a4b36cd3dd ("intel/fs: Coalesce when the src live range is contained in the dst")
Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:24 -07:00
Francisco Jerez
d4015bcb38 intel/fs: Fix copy propagation dataflow analysis in presence of force_writemask_all ACP overwrites.
This fixes the behavior of copy propagation in cases where either the
source or destination of an ACP is overwritten elsewhere in the
program by a force_writemask_all instruction, which could cause the
overwrite to be executed for an inactive channel under non-uniform
control flow, causing the current per-channel dataflow propagation to
give incorrect results.  This has been reported in cases like:

> while (true) {
>    x = imageSize(img);
>    if (non_uniform_condition()) {
>       y = x;
>       break;
>    }
> }
> use(y);

Currently the copy propagation pass would propagate copy 'y = x' into
'use(y)', which is invalid since in the example above imageSize() is
implemented as a force_writemask_all SEND message, whose result is
broadcast to all channels, so when a given channel executes 'y = x'
and breaks out of the loop, another divergent channel can execute a
subsequent iteration of the loop overwriting 'x' with a different
value, hence replacing 'y' with 'x' at 'use(y)' changes the behavior
of the program.

This patch extends the global dataflow analysis algorithm to determine
whether there is any control flow path from a given copy to an
overwrite of its source or destination which has force_writemask_all
behavior inconsistent with the copy, and in such case prevents copy
propagation for that ACP entry at any point of the program which can
be reached from the overwrite, even if the copy is statically
re-executed along all such control flow paths (as in the example
above), since the execution of the overwrite for a given channel i may
corrupt other channels j!=i inactive for the subsequently re-executed
copy.

Note that a simpler solution has been attempted which fully shuts down
copy propagation if such a force_writemask_all ACP overwrite is
present /anywhere/ in the program regardless of its location in the
control flow graph, however that led to large shader-db regressions in
some programs from shader-db (like a CS from Car Chase which would
emit 53% more instructions).  With this solution the only handful of
shaders that suffer instruction count regressions seem to be getting
misoptimized right now (e.g. some compute shaders from Deus Ex
Mankind).  This solution doesn't seem to affect the run-time of
shader-db significantly, it's less than 1% higher with the fix
applied.

Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:20 -07:00
Francisco Jerez
1c1be23497 intel/fs: Track force_writemask_all behavior of copy propagation ACP entries.
force_writemask_all determines whether all channels of the copy are
actually valid, and may be required to be set for it to be propagated
safely in cases where the destination of the copy is used by another
force_writemask_all instruction, or when the copy occurs in a
divergent control flow block different from its use.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21351>
2023-03-17 03:05:18 -07:00
Kenneth Graunke
14f9f98dcb i965/vec4: Implement uclz in the vec4 backend
Commit 28311f9d02 moved ufind_msb lowering to NIR and started emitting
uclz.  Unfortunately, the vec4 backend never actually implemented uclz.

It's trivial to do.  Now it does.

Fixes: 28311f9d02 ("nir: intel/compiler: Move ufind_msb lowering to NIR")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Kenneth Graunke
e7ea2aa46c intel/fs: Make bld.F16TO32 actually emit F16TO32 not F32TO16
Ahem, "add builder helpers that work on Gfx7"...now might actually work.
Too much copy and paste...

Fixes: 966995d911 ("intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Kenneth Graunke
84197bc0a4 intel/vec4: Retype texture/sampler indexes to UD
generate_tex() asserts that sampler_index.type == UD, but commit
83fd7a5ed1 removed the uint temporary, which caused us to see D at
some points.  Really, either should be fine, but let's just put the
UD retype back.  This fixes a ton of things in crocus.

Fixes: 83fd7a5ed1 ("intel: Use nir_lower_tex_options::lower_index_to_offset")
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21974>
2023-03-17 09:01:18 +00:00
Jesse Natalie
49885f87c3 nir: Propagate alignment when rematerializing cast derefs
Fixes: 878a8daca6 ("nir: Add alignment information to cast derefs")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21975>
2023-03-17 08:16:03 +00:00
Mike Blumenkrantz
9df68c633e zink: track tc fences better
tc fence lifetimes can exceed the lifetimes of their parent contexts,
which means they can be destroyed after mfence->fence has been destroyed

to avoid invalid memory access on a destroyed fence, store all the assigned
tc fences into an array on the real fence and then use that to unset fence
pointers on any outstanding tc fences

fixes flakiness in dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12

in caselist:
dEQP-EGL.functional.query_context.get_current_surface.rgba4444_pbuffer
dEQP-EGL.functional.create_surface.platform_window.rgba5551_depth_no_stencil
dEQP-EGL.functional.query_surface.simple.pbuffer.rgb888_depth_no_stencil
dEQP-EGL.functional.color_clears.multi_context.gles2.rgb888_pixmap
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2.rgba8888_window
dEQP-EGL.functional.color_clears.multi_context.gles1_gles2_gles3.rgb888_window
dEQP-EGL.functional.render.multi_thread.gles2_gles3.rgba5551_pbuffer
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.buffers.buffersubdata.3
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.programs.link.6
dEQP-EGL.functional.sharing.gles2.multithread.random_egl_sync.images.texsubimage2d.12

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21843>
2023-03-17 07:58:10 +00:00
Giancarlo Devich
7edae456e2 d3d12: Track up to 16 contexts worth of batch references locally in bos
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21909>
2023-03-17 07:43:08 +00:00