* int64 is a core type on Haiku (and potentially other platforms)
* rename to int64_avail matching other similar calls
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes:
- Sample shading now uses per-sample interpolation for colors if colors
are the only inputs. (this is the only case that was broken)
Optimizations:
- BC_OPTIMIZE (barycentric optimization) is now enabled with MSAA if colors
are qualified with both center and centroid. (BC_OPTIMIZE means that
the hardware skips initializing centroid (i,j) if they are equal to
center (i,j))
- If MSAA is disabled and at least 2 out of (center, centroid, sample) are
used by all inputs now including colors, center is forced for all inputs.
- If INTERP_MODE_COLOR is not used and the legacy GL shade model is flat,
the shader variant for flat shading is not generated.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8225>
If multiple rules could match, the rule that appears first in the file
is used.
Only Tiger Lake and Ice Lake are affected. Other platforms either have
a LRP instruction or can't run any shaders from shader-db that would
benefit.
v2: Fix issues created when this commit was rebased on top of
3c8934a644 ("nir/algebraic: add flrp patterns for 16 and 64 bits").
Noticed by Caio.
Tiger Lake and Ice Lake had similar results.
total instructions in shared programs: 20908672 -> 20908661 (<.01%)
instructions in affected programs: 419 -> 408 (-2.63%)
helped: 5
HURT: 0
helped stats (abs) min: 1 max: 3 x̄: 2.20 x̃: 3
helped stats (rel) min: 1.85% max: 3.19% x̄: 2.49% x̃: 2.65%
95% mean confidence interval for instructions value: -3.56 -0.84
95% mean confidence interval for instructions %-change: -3.24% -1.73%
Instructions are helped.
total cycles in shared programs: 473513940 -> 473513793 (<.01%)
cycles in affected programs: 7176 -> 7029 (-2.05%)
helped: 12
HURT: 0
helped stats (abs) min: 5 max: 22 x̄: 12.25 x̃: 12
helped stats (rel) min: 0.84% max: 3.24% x̄: 2.09% x̃: 1.80%
95% mean confidence interval for cycles value: -15.43 -9.07
95% mean confidence interval for cycles %-change: -2.57% -1.61%
Cycles are helped.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This prevents other transformations from converting them to 'a != 0'.
For example, both of these transformations can do this:
(('~flt', 0.0, ('fabs', a)), ('fne', a, 0.0)),
(('~flt', ('fneg', ('fabs', a)), 0.0), ('fne', a, 0.0)),
Both fsign(fabs(NaN)) and fsign(fneg(fabs(NaN))) should produce zero,
but, since 'NaN != 0.0' is true, cascading these transformations could
cause them to generate 1.0 or -1.0 respecively.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
OpenGL GLSL, OpenGL ARB assembly shaders, and DX9 are pretty loose about
the behavior in the presence of NaNs. Many GPUs that implement these
specifications do not even have a representation of NaN. However,
OpenCL and Vulkan SPIR-V are not so lax. Both actually have some
required behavior in the presence of NaN, and, of the two, OpenCL is the
most strict.
For years we have implemented SPIR-V by using the same comparison
opcodes as we use for OpenGL GLSL and OpenGL assembly shaders. This has
repeatedly caused problems where an optimization that is valid in the
NaN-relaxed world is not valid in Vulkan or OpenCL. To fix this, set
the "exact" flag on comparisons instructions generated from SPIR-V.
This will block optimizations that may have different NaN behavior.
v2: Set the exact flag in the nir_builder, not in the vtn_builder.
v3: Add an assertion in vtn_handle_constant that the exact flag wasn't
set (because it's ignored). Rebase on 80163bbec3 ("nir/vtn: Support
OpOrdered and OpUnordered opcodes"). Mark the NIR generated for those
opcodes as exact as well.
v4: s/unused_exact/exact/ in a couple places, and assert that exact has
the expected value (true in one place, false in the other). Suggested
by Caio.
Closes: #3345
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Fixes: 8513b12590 ("nir/opt_if: split ALU from Phi more aggressively")
This commit doesn't really fix anything in 8513b12590. However,
without 8513b12590, a regression is triggered in RADV on No Man's
Sky. I want to ensure that this change is only applied on top of
8513b12590, and Fixes: seems the safest way to do that.
No shader-db changes on any Intel platform. This only affects SPIR-V,
and we have no OpenGL SPIR-V shaders in shader-db.
124 shaders in Shadow of the Tomb Raider (Steam "native") were hurt by 1
spill and 1 fill each.
All Intel platforms had similar results. (Tiger Lake shown)
Instructions in all programs: 155668276 -> 155685764 (+0.0%)
SENDs in all programs: 6474570 -> 6474570 (+0.0%)
Loops in all programs: 35271 -> 35271 (+0.0%)
Cycles in all programs: 3198055373 -> 3198628031 (+0.0%)
Spills in all programs: 231522 -> 231646 (+0.1%)
Fills in all programs: 347571 -> 347695 (+0.0%)
Vega
Totals:
SGPRs: 20955712 -> 20956756 (+0.00%); split: -0.02%, +0.03%
VGPRs: 13476920 -> 13473132 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613371940 -> 613339348 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 3111886 -> 3112481 (+0.02%); split: +0.02%, -0.00%
Instrs: 120723785 -> 120746991 (+0.02%); split: -0.04%, +0.06%
Cycles: 626658992 -> 626862708 (+0.03%); split: -0.05%, +0.08%
VMEM: 216330854 -> 216343196 (+0.01%); split: +0.04%, -0.04%
SMEM: 32079391 -> 32081972 (+0.01%); split: +0.05%, -0.04%
VClause: 2688784 -> 2688789 (+0.00%); split: -0.03%, +0.03%
SClause: 6554669 -> 6556251 (+0.02%); split: -0.01%, +0.03%
Copies: 5356667 -> 5353283 (-0.06%); split: -0.36%, +0.29%
Branches: 954466 -> 954716 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 9078300 -> 9081626 (+0.04%); split: -0.01%, +0.05%
PreVGPRs: 10972090 -> 10966576 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.08% of 399432) affected shaders:
SGPRs: 2713984 -> 2715028 (+0.04%); split: -0.16%, +0.19%
VGPRs: 1997804 -> 1994016 (-0.19%); split: -0.46%, +0.27%
CodeSize: 172094092 -> 172061500 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 337327 -> 337922 (+0.18%); split: +0.20%, -0.02%
Instrs: 33053657 -> 33076863 (+0.07%); split: -0.15%, +0.22%
Cycles: 254961228 -> 255164944 (+0.08%); split: -0.12%, +0.20%
VMEM: 15165226 -> 15177568 (+0.08%); split: +0.59%, -0.51%
SMEM: 3304938 -> 3307519 (+0.08%); split: +0.49%, -0.41%
VClause: 766225 -> 766230 (+0.00%); split: -0.12%, +0.12%
SClause: 1332645 -> 1334227 (+0.12%); split: -0.04%, +0.16%
Copies: 2040651 -> 2037267 (-0.17%); split: -0.94%, +0.77%
Branches: 743668 -> 743918 (+0.03%); split: -0.01%, +0.05%
PreSGPRs: 1697667 -> 1700993 (+0.20%); split: -0.07%, +0.27%
PreVGPRs: 1718424 -> 1712910 (-0.32%); split: -0.39%, +0.07%
Polaris
Totals:
SGPRs: 21349172 -> 21354376 (+0.02%); split: -0.02%, +0.04%
VGPRs: 13690680 -> 13686920 (-0.03%); split: -0.07%, +0.04%
CodeSize: 613745824 -> 613704988 (-0.01%); split: -0.06%, +0.05%
MaxWaves: 2775012 -> 2775189 (+0.01%); split: +0.01%, -0.00%
Instrs: 120735079 -> 120756209 (+0.02%); split: -0.04%, +0.06%
Cycles: 627906100 -> 628076156 (+0.03%); split: -0.05%, +0.08%
VMEM: 216623065 -> 216641838 (+0.01%); split: +0.04%, -0.04%
SMEM: 32295618 -> 32299338 (+0.01%); split: +0.05%, -0.04%
VClause: 2711025 -> 2711141 (+0.00%); split: -0.03%, +0.04%
SClause: 6545185 -> 6546769 (+0.02%); split: -0.01%, +0.03%
Copies: 5387723 -> 5383249 (-0.08%); split: -0.37%, +0.29%
Branches: 953775 -> 953954 (+0.02%); split: -0.01%, +0.03%
PreSGPRs: 9148814 -> 9153211 (+0.05%); split: -0.01%, +0.06%
PreVGPRs: 11029429 -> 11023915 (-0.05%); split: -0.06%, +0.01%
Totals from 48239 (12.00% of 402052) affected shaders:
SGPRs: 2682056 -> 2687260 (+0.19%); split: -0.16%, +0.35%
VGPRs: 1994436 -> 1990676 (-0.19%); split: -0.46%, +0.27%
CodeSize: 170857060 -> 170816224 (-0.02%); split: -0.21%, +0.19%
MaxWaves: 295429 -> 295606 (+0.06%); split: +0.07%, -0.01%
Instrs: 32808802 -> 32829932 (+0.06%); split: -0.16%, +0.22%
Cycles: 254633252 -> 254803308 (+0.07%); split: -0.13%, +0.20%
VMEM: 14897934 -> 14916707 (+0.13%); split: +0.65%, -0.52%
SMEM: 3289726 -> 3293446 (+0.11%); split: +0.53%, -0.42%
VClause: 775318 -> 775434 (+0.01%); split: -0.11%, +0.13%
SClause: 1304867 -> 1306451 (+0.12%); split: -0.04%, +0.16%
Copies: 2026334 -> 2021860 (-0.22%); split: -0.99%, +0.77%
Branches: 742554 -> 742733 (+0.02%); split: -0.02%, +0.04%
PreSGPRs: 1690887 -> 1695284 (+0.26%); split: -0.07%, +0.33%
PreVGPRs: 1717709 -> 1712195 (-0.32%); split: -0.40%, +0.07%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This prevents some fossil-db regressions in "spir-v: Mark floating point
comparisons exact".
v2: Note that the patterns and replacements produce the same value when
isnan(b). Suggested by Caio.
v3: Use C99 isfinite() instead of (obsolete) BSD finite(). Fixes
various Windows builds.
No fossil-db changes on any Inetl platform, Vega, or Polaris10.
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908670 -> 20908672 (<.01%)
instructions in affected programs: 69 -> 71 (2.90%)
helped: 0
HURT: 1
total cycles in shared programs: 473515288 -> 473513940 (<.01%)
cycles in affected programs: 4942 -> 3594 (-27.28%)
helped: 2
HURT: 0
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This also prevents some fossil-db regressions in "spir-v: Mark floating
point comparisons exact".
v2: Mark the fmin / fmax in the replacement exact to prevent other
optimizations from ruining the NaN-clensing property of the fmin / fmax.
Suggested by Rhys. Don't assume that constants are not NaN because some
components of a vector might be NaN while others are numbers. Noticed
by Rhys. This causes ~8 more shaders in Age of Wonders III (dxvk) to
regress on cycles (not instructions) by less than 1% when "spir-v: Mark
floating point comparisons exact" is applied. This difference is too
small to care.
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 20908668 -> 20908670 (<.01%)
instructions in affected programs: 9196 -> 9198 (0.02%)
helped: 10
HURT: 5
helped stats (abs) min: 1 max: 2 x̄: 1.40 x̃: 1
helped stats (rel) min: 0.02% max: 5.41% x̄: 2.20% x̃: 2.16%
HURT stats (abs) min: 2 max: 6 x̄: 3.20 x̃: 3
HURT stats (rel) min: 2.44% max: 16.67% x̄: 9.39% x̃: 12.50%
95% mean confidence interval for instructions value: -1.22 1.49
95% mean confidence interval for instructions %-change: -2.08% 5.41%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 473515330 -> 473515288 (<.01%)
cycles in affected programs: 67146 -> 67104 (-0.06%)
helped: 10
HURT: 7
helped stats (abs) min: 1 max: 36 x̄: 15.90 x̃: 17
helped stats (rel) min: 0.01% max: 1.29% x̄: 0.66% x̃: 0.89%
HURT stats (abs) min: 1 max: 48 x̄: 16.71 x̃: 4
HURT stats (rel) min: 0.08% max: 1.94% x̄: 0.87% x̃: 0.19%
95% mean confidence interval for cycles value: -13.88 8.94
95% mean confidence interval for cycles %-change: -0.56% 0.49%
Inconclusive result (value mean confidence interval includes 0).
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
GLSL and SPIR-V GLSL.std.450 don't have any requirements for fsign(NaN),
and both only require that FSign(-0.0) == 0.0. OpenCL, on the other
hand, requires sign(-0.0) be exactly -0.0. It also requires that
sign(NaN) be exactly 0.0.
In practice, this change is difficult to test. Our GLSL frontend
already constant folds sign(NaN) to 0.0 before even getting to NIR. As
far as I can tell, glslang does the same. I don't have a good way to
run an OpenCL SPIR-V test. Maybe SPIR-V GLSL.std.450 assembly?
No shader-db or fossil-db changes on any Intel platform.
Acked-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
I originally noticed that 3b30814791 ("nir/algebraic: Optimize 1-bit
Booleans") caused this pattern no longer be matched by incorrectly
replacing b@32 with b@1. Making that correct had no effect on
shader-db. When this pattern originally was added, it only affected 4
shaders, so it's not worth the effort to debug further.
This reverts commit f50400cc80.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
The original comment was a little terse and a little incorrect. The
rearrangements are fine w.r.t. NaN. However, they produce incorrect
results if one operand is +Inf and the other is -Inf.
A later commit, "nir/algebraic: Add some compare-with-zero optimizations
that are exact", will add some more patterns here. It may be reasonable
to squash this commit (forward) into that commit.
v2: Fix some incorrect comparisons operators in the comment (<= vs >=).
Add commentary that subtraction works like addition w.r.t. NaN. Both
noticed / suggested by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>
This commit only documents the current behavior, even if that behavior
is not the behavior preferred by the relevant specs.
In SPIR-V, there are two flavors of the sign instruction, and each lives
in an extended instruction set. The GLSL.std.450 FSign instruction is
defined as:
Result is 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.
This also matches the GLSL 4.60 definition.
However, the OpenCL.ExtendedInstructionSet.100 sign instruction is
defined as:
Returns 1.0 if x > 0, -0.0 if x = -0.0, +0.0 if x = +0.0, or -1.0 if
x < 0. Returns 0.0 if x is a NaN.
There are two differences. Each treats -0.0 differently, and each also
treats NaN differently. Specifically, GLSL.std.450 FSign does not
define any specific behavior for NaN.
There has been some discussion in Khronos about the NaN behavior of
GLSL.std.450 FSign. As part of that discussion, I did some research
into how we treat NaN for nir_op_fsign, and this commit just captures
some of those notes.
v2: Document the expected behavior of nir_op_fsign more thoroughly.
Suggested by Rhys. Note that the current implementation of constant
folding does not produce the expected result for NaN. Suggested by
Caio.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> [v1]
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6358>