some passes (e.g., opt_shrink_vector) operate on the assumption that
sparse tex ops have a certain number of components and then remove components
and unset the sparse flag if they can optimize out the sparse usage
zink's sparse ops do not have the standard number of components, which
causes such passes to make incorrect assumptions and tag them as
not being sparse, which breaks everything
fix#10540
Fixes: 0d652c0c8d ("zink: shrink vectors during optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27414>
the util function here takes a bitmask of memory type indices, not properties.
rename the function and correct the usage
fixes sparse on nvidia blob
Fixes: c71287e70c ("zink: correct sparse bo mem_type_idx placement")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27414>
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).
v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Note: a future commit will expand the sampler message type to the 6 bits
used on Xe2.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Drop XE2_SAMPLER_MESSAGE_SAMPLE_BIAS_MLOD as it does not actually
exist. This resulted in some bigger changes in brw_disasm.c. Noticed
by Sagar.
v4: Now that XE2_SAMPLER_MESSAGE_SAMPLE_MLODc conflicts with
GFX7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C, the determination of
min_lod_is_first must include devinfo->ver or previous platforms will
break.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Message types are expanded to 6-bit encoding now. 5 bits are still the
same field from the Sampler Message Descriptor. The most significant bit
is now bit 31 of the Sampler Message Descriptor. The messages that have
'1 in bit 6 are only to support programmable offsets and those would
require message header. If a sampler type shows only 5 bits encoding, it
is implied bit 6 equal to 0 and there is no requirement for header.
v2 (idr): Trivial formatting changes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
This small refactor simplifies a later commit that will optionally emit
some opcodes before the switch (as is already done with the shadow
comparitor).
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3 (Jordan): SHADER_OPCODE_TXL => SHADER_OPCODE_TXL_LZ (was
SHADER_OPCODE_TXF_LZ).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The Bspec also says, "The table below describes the SIMD modes which
are supported. SIMD32 and SIMD64 are used for media-type operations
only." Perhaps this commit should just add
if (devinfo->ver >= 20)
return 16;
instead.
v2: Use reg_unit in get_sampler_lowered_simd_width. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Some hardware that doesn't support true static samplers, emulates it
by copying all static samplers into a reserved portion of every descriptor
heap. To support Vulkan's required 4000 live sampler limit in bindless
mode, D3D is now able to create descriptor heaps which do not have a reserved
portion. Any descriptor heaps above the MaxSamplerDescriptorHeapSizeWithStaticSamplers
limit will not have that reserved portion and cannot be used with static samplers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27348>
DXIL doesn't have instruction-level coherency. We have 3 options:
1. Promote the instruction to an atomic instruction. We can only do this
for 32-bit or 64-bit ops.
2. If using bindless, declare the local resource declaration as globally-coherent.
3. If not using bindless, add globally-coherent to the global resource declaration.
This pass does all 3 of these, stopping at the intrinsic level for supported types
of atomics, otherwise assigning to the global resource declaration, which will be
unused if we're doing bindless, where instead we'll get it from the instruction.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27348>
The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.
All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.
DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)
Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
As per this optimisations description:
"Takes assignments to variables that are dereferenced only
once and pastes the RHS expression into where the variables
dereferenced."
However the optimisation is run at compile time before multiple
shaders from the same stage could have been pasted together.
So this optimisation can incorrectly assume a global is only
referenced once since it cannot see the other pieces of the
shader stage until link time.
Here we skip the optimisation if the variable is a global. We
could change it to only run at link time however this
optimisation is only run at link time if we are being forced
to use GLSL IR to inline a function that glsl to nir cannot
handle and this will also be removed in a future patchset.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10482
Fixes: d75a36a9ee ("glsl: remove do_copy_propagation_elements() optimisation pass")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27351>
I put dashes in the dates when I first introduced the image tags; it
made sense to improve date readability as we had only a handful of these
and they barely combined.
Nowadays we combine a lot of these tags to form the docker image tags,
and we often run out of space.
Let's remove these dashes, making dates slightly harder to read, and
instead allow these two extra characters to be used in the
unique/descriptive part of the tag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27379>
Use template specialization to handle the static control flow based
on template parameters during code generation instead of relying
on the optimizer to remove the unused code path.
v2: - Fix function opening brace location (zmike)
- remove accidently added dead code
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>
Use template specialization to handle the static control flow based
on template parameters during code generation instead of relying
on the optimizer to remove the unused code path.
v2: - move function opening braces to new line and fix indetion (zmike)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>
Replace the generic true/false by an enum to make the intent clearer.
Factor out the emission of the barrier, and use template specialization
to pick the type of barrier that is to be emitted, because with template
specialization the control flow is avoided altogether, whereas with
the static code flow it is up to the optimizer to remove the unused bits -
which may not happen in debug builds.
v2: Fix function start braces (zmike)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27327>