Commit Graph

50 Commits

Author SHA1 Message Date
Ian Romanick
73fa3a1ca4 soft-fp64/fadd: Instead of tracking "b < a", track sign of the difference
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 824403 -> 822766 (-0.20%)
instructions in affected programs: 756260 -> 754623 (-0.22%)
helped: 68
HURT: 1
helped stats (abs) min: 1 max: 118 x̄: 26.26 x̃: 18
helped stats (rel) min: 0.02% max: 0.97% x̄: 0.31% x̃: 0.23%
HURT stats (abs)   min: 149 max: 149 x̄: 149.00 x̃: 149
HURT stats (rel)   min: 0.17% max: 0.17% x̄: 0.17% x̃: 0.17%
95% mean confidence interval for instructions value: -31.94 -15.51
95% mean confidence interval for instructions %-change: -0.37% -0.23%
Instructions are helped.

total cycles in shared programs: 6828935 -> 6816791 (-0.18%)
cycles in affected programs: 6385191 -> 6373047 (-0.19%)
helped: 73
HURT: 0
helped stats (abs) min: 2 max: 852 x̄: 166.36 x̃: 120
helped stats (rel) min: <.01% max: 0.80% x̄: 0.22% x̃: 0.17%
95% mean confidence interval for cycles value: -210.80 -121.91
95% mean confidence interval for cycles %-change: -0.27% -0.17%
Cycles are helped.

total fills in shared programs: 1442 -> 1497 (3.81%)
fills in affected programs: 1442 -> 1497 (3.81%)
helped: 0
HURT: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
5b07f542e5 soft-fp64: Optimize __fmin64 and __fmax64 by using different evaluation order [v2]
v2: Go to extra effort to avoid flow control inserted to implement
short-circuit evaluation rules.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 797779 -> 796849 (-0.12%)
instructions in affected programs: 3499 -> 2569 (-26.58%)
helped: 21
HURT: 0
helped stats (abs) min: 8 max: 112 x̄: 44.29 x̃: 44
helped stats (rel) min: 16.09% max: 33.15% x̄: 25.72% x̃: 24.62%
95% mean confidence interval for instructions value: -55.94 -32.63
95% mean confidence interval for instructions %-change: -28.14% -23.30%
Instructions are helped.

total cycles in shared programs: 6601355 -> 6588351 (-0.20%)
cycles in affected programs: 25376 -> 12372 (-51.25%)
helped: 21
HURT: 0
helped stats (abs) min: 156 max: 1410 x̄: 619.24 x̃: 526
helped stats (rel) min: 42.39% max: 53.98% x̄: 50.12% x̃: 50.75%
95% mean confidence interval for cycles value: -776.58 -461.89
95% mean confidence interval for cycles %-change: -51.57% -48.67%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
617a69107e soft-fp64/ffloor: Simplify the >= 0 comparison
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 797951 -> 797779 (-0.02%)
instructions in affected programs: 126482 -> 126310 (-0.14%)
helped: 15
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 11.47 x̃: 10
helped stats (rel) min: <.01% max: 0.60% x̄: 0.28% x̃: 0.29%
95% mean confidence interval for instructions value: -14.79 -8.14
95% mean confidence interval for instructions %-change: -0.40% -0.16%
Instructions are helped.

total cycles in shared programs: 6601437 -> 6601355 (<.01%)
cycles in affected programs: 1089336 -> 1089254 (<.01%)
helped: 15
HURT: 0
helped stats (abs) min: 2 max: 12 x̄: 5.47 x̃: 6
helped stats (rel) min: <.01% max: 0.04% x̄: 0.01% x̃: 0.01%
95% mean confidence interval for cycles value: -7.06 -3.87
95% mean confidence interval for cycles %-change: -0.02% <.01%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
abf28d6a70 soft-fp64: Relax the way NaN is propagated
Also reassociate a couple expressions to encourage some CSE.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 813599 -> 797951 (-1.92%)
instructions in affected programs: 796110 -> 780462 (-1.97%)
helped: 92
HURT: 0
helped stats (abs) min: 3 max: 5198 x̄: 170.09 x̃: 83
helped stats (rel) min: 0.36% max: 5.50% x̄: 1.57% x̃: 1.40%
95% mean confidence interval for instructions value: -282.42 -57.75
95% mean confidence interval for instructions %-change: -1.71% -1.42%
Instructions are helped.

total cycles in shared programs: 6687128 -> 6601437 (-1.28%)
cycles in affected programs: 6582246 -> 6496555 (-1.30%)
helped: 92
HURT: 0
helped stats (abs) min: 36 max: 14442 x̄: 931.42 x̃: 592
helped stats (rel) min: 0.45% max: 3.16% x̄: 1.44% x̃: 1.23%
95% mean confidence interval for cycles value: -1257.58 -605.27
95% mean confidence interval for cycles %-change: -1.58% -1.30%
Cycles are helped.

total spills in shared programs: 759 -> 702 (-7.51%)
spills in affected programs: 759 -> 702 (-7.51%)
helped: 3
HURT: 0

total fills in shared programs: 2412 -> 1442 (-40.22%)
fills in affected programs: 2412 -> 1442 (-40.22%)
helped: 3
HURT: 0

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
8178fa8876 soft-fp64/fsat: Micro-optimize x >= 1 test
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841590 -> 841332 (-0.03%)
instructions in affected programs: 121957 -> 121699 (-0.21%)
helped: 7
HURT: 0
helped stats (abs) min: 15 max: 54 x̄: 36.86 x̃: 41
helped stats (rel) min: 0.16% max: 0.33% x̄: 0.23% x̃: 0.18%
95% mean confidence interval for instructions value: -49.73 -23.98
95% mean confidence interval for instructions %-change: -0.29% -0.16%
Instructions are helped.

total cycles in shared programs: 6926828 -> 6923967 (-0.04%)
cycles in affected programs: 1038569 -> 1035708 (-0.28%)
helped: 7
HURT: 0
helped stats (abs) min: 128 max: 616 x̄: 408.71 x̃: 446
helped stats (rel) min: 0.18% max: 0.44% x̄: 0.29% x̃: 0.22%
95% mean confidence interval for cycles value: -571.72 -245.70
95% mean confidence interval for cycles %-change: -0.38% -0.19%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
b6f58b4709 soft-fp64/fsat: Micro-optimize x < 0 test
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841647 -> 841590 (<.01%)
instructions in affected programs: 122014 -> 121957 (-0.05%)
helped: 7
HURT: 0
helped stats (abs) min: 3 max: 12 x̄: 8.14 x̃: 9
helped stats (rel) min: 0.04% max: 0.07% x̄: 0.05% x̃: 0.04%
95% mean confidence interval for instructions value: -11.23 -5.06
95% mean confidence interval for instructions %-change: -0.06% -0.03%
Instructions are helped.

total cycles in shared programs: 6926904 -> 6926828 (<.01%)
cycles in affected programs: 1038645 -> 1038569 (<.01%)
helped: 7
HURT: 0
helped stats (abs) min: 4 max: 16 x̄: 10.86 x̃: 12
helped stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -14.97 -6.74
95% mean confidence interval for cycles %-change: -0.01% <.01%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
7673dcbd21 soft-fp64/fsat: Correctly handle NaN
fsat is defined as min(max(a, 0.0), 1.0), and IEEE defines both min and
max to return the non-NaN value when one value is NaN.  Based on this,
fsat should definitely return 0.0 for NaN.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 841666 -> 841647 (<.01%)
instructions in affected programs: 122033 -> 122014 (-0.02%)
helped: 7
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.71 x̃: 3
helped stats (rel) min: 0.01% max: 0.02% x̄: 0.02% x̃: 0.01%
95% mean confidence interval for instructions value: -3.74 -1.69
95% mean confidence interval for instructions %-change: -0.02% -0.01%
Instructions are helped.

total cycles in shared programs: 6927246 -> 6926904 (<.01%)
cycles in affected programs: 1038987 -> 1038645 (-0.03%)
helped: 7
HURT: 0
helped stats (abs) min: 18 max: 72 x̄: 48.86 x̃: 54
helped stats (rel) min: 0.03% max: 0.05% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -67.38 -30.33
95% mean confidence interval for cycles %-change: -0.05% -0.02%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: a42163cbbc ("compiler: Add lowering support for 64-bit saturate operations to software")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2585
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
b421c0466d soft-fp64/flt: Perform checks in a different order
The change to nir_opt_algebraic cleans up a pattern that was never
produced before the rest of this commit was added.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 843005 -> 841666 (-0.16%)
instructions in affected programs: 460655 -> 459316 (-0.29%)
helped: 64
HURT: 17
helped stats (abs) min: 1 max: 72 x̄: 21.72 x̃: 20
helped stats (rel) min: 0.01% max: 28.07% x̄: 12.67% x̃: 16.07%
HURT stats (abs)   min: 1 max: 7 x̄: 3.00 x̃: 2
HURT stats (rel)   min: 0.01% max: 0.04% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -20.87 -12.19
95% mean confidence interval for instructions %-change: -12.35% -7.66%
Instructions are helped.

total cycles in shared programs: 6944998 -> 6927246 (-0.26%)
cycles in affected programs: 3891872 -> 3874120 (-0.46%)
helped: 71
HURT: 10
helped stats (abs) min: 2 max: 772 x̄: 254.21 x̃: 156
helped stats (rel) min: <.01% max: 66.44% x̄: 21.72% x̃: 18.40%
HURT stats (abs)   min: 18 max: 69 x̄: 29.70 x̃: 20
HURT stats (rel)   min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -270.82 -167.50
95% mean confidence interval for cycles %-change: -24.41% -13.65%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
f6992bf624 soft-fp64/fneg: Don't treat NaN specially
__fabs64 doesn't do anything special, and the value is still NaN
regardless of the value of the MSB.  In a strict sense, it's possible
that both functions should set the "signal" bit.

lts on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 844558 -> 843005 (-0.18%)
instructions in affected programs: 725975 -> 724422 (-0.21%)
helped: 53
HURT: 4
helped stats (abs) min: 1 max: 313 x̄: 29.87 x̃: 21
helped stats (rel) min: 0.01% max: 0.94% x̄: 0.30% x̃: 0.22%
HURT stats (abs)   min: 4 max: 11 x̄: 7.50 x̃: 7
HURT stats (rel)   min: 0.03% max: 0.09% x̄: 0.05% x̃: 0.04%
95% mean confidence interval for instructions value: -39.02 -15.47
95% mean confidence interval for instructions %-change: -0.34% -0.21%
Instructions are helped.

total cycles in shared programs: 6962024 -> 6944998 (-0.24%)
cycles in affected programs: 6185470 -> 6168444 (-0.28%)
helped: 59
HURT: 0
helped stats (abs) min: 64 max: 2863 x̄: 288.58 x̃: 208
helped stats (rel) min: 0.11% max: 0.87% x̄: 0.33% x̃: 0.27%
95% mean confidence interval for cycles value: -387.15 -190.00
95% mean confidence interval for cycles %-change: -0.38% -0.28%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
de4acd8816 soft-fp64: Store sign value as 0 or 0x80000000
...instead of 0 or 1.  Many places the sign bit is extracted, then later
put back in the same position.  This saves some left-shift operations.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 848106 -> 844558 (-0.42%)
instructions in affected programs: 833480 -> 829932 (-0.43%)
helped: 106
HURT: 1
helped stats (abs) min: 1 max: 995 x̄: 33.48 x̃: 12
helped stats (rel) min: 0.15% max: 2.20% x̄: 0.60% x̃: 0.35%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: <.01% max: <.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for instructions value: -51.88 -14.43
95% mean confidence interval for instructions %-change: -0.71% -0.47%
Instructions are helped.

total cycles in shared programs: 6969125 -> 6962024 (-0.10%)
cycles in affected programs: 6717689 -> 6710588 (-0.11%)
helped: 78
HURT: 7
helped stats (abs) min: 2 max: 2083 x̄: 110.27 x̃: 56
helped stats (rel) min: <.01% max: 0.30% x̄: 0.11% x̃: 0.11%
HURT stats (abs)   min: 2 max: 1340 x̄: 214.29 x̃: 4
HURT stats (rel)   min: 0.01% max: 0.71% x̄: 0.13% x̃: 0.02%
95% mean confidence interval for cycles value: -144.02 -23.06
95% mean confidence interval for cycles %-change: -0.12% -0.07%
Cycles are helped.

total spills in shared programs: 814 -> 759 (-6.76%)
spills in affected programs: 814 -> 759 (-6.76%)
helped: 2
HURT: 1

total fills in shared programs: 2488 -> 2412 (-3.05%)
fills in affected programs: 2488 -> 2412 (-3.05%)
helped: 2
HURT: 1

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
598e2fc6a1 soft-fp64: Pick a single idiom for treating sign value as a Boolean
Replace all of the bool(qSign) with qSign != 0u.  Remove unnecessary
parenthesis from around most of the existing qSign != 0u.

This dramatically simplifies the next commit.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 848109 -> 848106 (<.01%)
instructions in affected programs: 53 -> 50 (-5.66%)
helped: 1
HURT: 0

total cycles in shared programs: 6969145 -> 6969125 (<.01%)
cycles in affected programs: 396 -> 376 (-5.05%)
helped: 1
HURT: 0

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
325a21f5eb soft-fp64: Simplify __countLeadingZeros32 function
findMSB returns -1 for an input of zero, so 31 - findMSB(a) is
sufficient on its own.

There's only one user of findMSB in shader-db, and it does not match
this pattern.

TODO: Add a pattern in the backend code generator that emits 31 -
nir_op_ufind_msb(a) as if it were nir_op_uclz.  That should save a couple
instructions.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 859509 -> 848109 (-1.33%)
instructions in affected programs: 841058 -> 829658 (-1.36%)
helped: 97
HURT: 0
helped stats (abs) min: 3 max: 1161 x̄: 117.53 x̃: 72
helped stats (rel) min: 0.98% max: 6.74% x̄: 1.70% x̃: 1.35%
95% mean confidence interval for instructions value: -147.21 -87.84
95% mean confidence interval for instructions %-change: -1.94% -1.46%
Instructions are helped.

total cycles in shared programs: 7072275 -> 6969145 (-1.46%)
cycles in affected programs: 6955767 -> 6852637 (-1.48%)
helped: 97
HURT: 0
helped stats (abs) min: 32 max: 10900 x̄: 1063.20 x̃: 560
helped stats (rel) min: 1.18% max: 7.58% x̄: 1.84% x̃: 1.45%
95% mean confidence interval for cycles value: -1339.43 -786.96
95% mean confidence interval for cycles %-change: -2.11% -1.57%
Cycles are helped.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
812230fd94 soft-fp64: Don't open-code umulExtended
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 928859 -> 859509 (-7.47%)
instructions in affected programs: 866293 -> 796943 (-8.01%)
helped: 76
HURT: 0
helped stats (abs) min: 75 max: 8042 x̄: 912.50 x̃: 688
helped stats (rel) min: 5.35% max: 21.02% x̄: 10.35% x̃: 7.58%
95% mean confidence interval for instructions value: -1138.37 -686.63
95% mean confidence interval for instructions %-change: -11.69% -9.00%
Instructions are helped.

total cycles in shared programs: 7272912 -> 7072275 (-2.76%)
cycles in affected programs: 6763486 -> 6562849 (-2.97%)
helped: 76
HURT: 0
helped stats (abs) min: 214 max: 30136 x̄: 2639.96 x̃: 1923
helped stats (rel) min: 1.75% max: 9.20% x̄: 4.04% x̃: 2.41%
95% mean confidence interval for cycles value: -3455.29 -1824.63
95% mean confidence interval for cycles %-change: -4.69% -3.39%
Cycles are helped.

total spills in shared programs: 817 -> 814 (-0.37%)
spills in affected programs: 791 -> 788 (-0.38%)
helped: 2
HURT: 0

total fills in shared programs: 2438 -> 2488 (2.05%)
fills in affected programs: 2392 -> 2442 (2.09%)
helped: 0
HURT: 2

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Ian Romanick
d1e0227ef1 soft-fp64/b2f: Reimplement using bitwise logic ops
This doesn't help a lot of shaders, but it helps those few a LOT.

This could also be implemented using bcsel.  That version is very
slightly worse because the generated SEL instruction wants to have two
immediate sources, so one of them usually needs an extra MOV instruction
to load.

Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 929619 -> 928859 (-0.08%)
instructions in affected programs: 1651 -> 891 (-46.03%)
helped: 8
HURT: 0
helped stats (abs) min: 38 max: 152 x̄: 95.00 x̃: 95
helped stats (rel) min: 42.70% max: 86.36% x̄: 49.88% x̃: 44.66%
95% mean confidence interval for instructions value: -132.97 -57.03
95% mean confidence interval for instructions %-change: -62.28% -37.49%
Instructions are helped.

total cycles in shared programs: 7280180 -> 7272912 (-0.10%)
cycles in affected programs: 12960 -> 5692 (-56.08%)
helped: 8
HURT: 0
helped stats (abs) min: 352 max: 1456 x̄: 908.50 x̃: 910
helped stats (rel) min: 52.45% max: 91.19% x̄: 59.24% x̃: 55.15%
95% mean confidence interval for cycles value: -1274.03 -542.97
95% mean confidence interval for cycles %-change: -70.06% -48.41%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142>
2020-03-18 20:36:29 +00:00
Francisco Jerez
a30bb25a7a glsl: Fix software 64-bit integer to 32-bit float conversions.
The current implementation was broken for any integers between 2^24
and 2^30 (it would return zero for me on ICL).  The reason is that for
such integers we wouldn't take the 'if (0 <= shiftCount)' early return
path, however 'shiftCount + 7' would be positive, leading to a
negative 'count' argument passed to __shift64RightJamming(), which
would give undefined results.

This reworks the affected conversion functions to use either
__shortShift64Left() or __shift64RightJamming() based on the sign of
the final shift count, which should avoid the problem.  In addition
this should qualify as a clean-up/optimization -- This implementation
of the conversion functions translates to 7 instructions less than the
original on Intel hardware.

This fixes the 'KHR-GL46.shader_ballot_tests.ShaderBallotFunctionBallot'
conformance tests on soft fp64 hardware with large enough subgroup
size (>16).

Fixes: d5cf6e92b4 "glsl: Add built-in functions to do uint64_to_fp32(uint64_t)"
Fixes: c9d333a6b7 "glsl: Add built-in functions to do int64_to_fp32(int64_t)"
Cc: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
2020-01-10 10:51:58 -08:00
Sagar Ghuge
06807e1948 glsl: Fix round64 conversion function
Fix round64 function to handle round to nearest even cases specially
with positive and negative numbers with fraction part 0.5.

v2: 1) Simplify unused bits (Elie Tournier)

Fixes:
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec2
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec3
   KHR-GL45.gpu_shader_fp64.builtin.round_dvec4
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_double
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec2
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec3
   KHR-GL45.gpu_shader_fp64.builtin.roundeven_dvec4

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
Acked-by: Anuj Phogat <anuj.phogat@gmail.com>
2019-06-25 15:19:10 -07:00
Anuj Phogat
a42163cbbc compiler: Add lowering support for 64-bit saturate operations to software
Fixes 7 Khronos GL CTS tests:
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_dvec{double, 2, 3, 4}
KHR-GL45.gpu_shader_fp64.builtin.smoothstep_against_scalar_dvec{2, 3, 4}

Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-15 23:30:30 +00:00
Sagar Ghuge
f998ce4111 glsl: Add "built-in" functions to do fp32_to_int64(fp32)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
2632c12477 glsl: Add "built-in" functions to do fp32_to_uint64(fp32)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
876a4b85fe glsl: Add "built-in" functions to do fp64_to_int64(fp64)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
21e9bb2b3f glsl: Add utility function to round and pack int64_t value
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
5a674fd789 glsl: Add "built-in" functions to do fp64_to_uint64(fp64)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
5a87441807 glsl: Add utility function to round and pack uint64_t value
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
c9d333a6b7 glsl: Add "built-in" functions to do int64_to_fp32(int64_t)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
d5cf6e92b4 glsl: Add "built-in" functions to do uint64_to_fp32(uint64_t)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
b830efb191 glsl: Add "built-in" functions to do int64_to_fp64(int64_t)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Sagar Ghuge
7c5b982b89 glsl: Add "built-in" functions to do uint64_to_fp64(uint64_t)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
2019-01-09 16:42:40 -08:00
Matt Turner
15757bc80b glsl: Add "built-in" functions to convert bool to double
And vice versa.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-01-09 16:42:40 -08:00
Matt Turner
e213f3871f glsl: Add "built-in" functions to do ffract(fp64)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-01-09 16:42:40 -08:00
Matt Turner
5c9a659f50 glsl: Add "built-in" function to do ffloor(fp64)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-01-09 16:42:40 -08:00
Matt Turner
83762afa66 glsl: Add "built-in" functions to do fmin/fmax(fp64)
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-01-09 16:42:40 -08:00
Matt Turner
92ac2169fb glsl: Add "built-in" functions to do ffma(fp64)
Definitely not actually a fused-multiply add.

Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
3db81b5d9f glsl: Add "built-in" functions to do round(fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
48891ab441 glsl: Add "built-in" functions to do trunc(fp64)
v2: use mix.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
2119094b1d glsl: Add "built-in" functions to do sqrt(fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
cad58fc5e7 glsl: Add "built-in" functions to do fp32_to_fp64(fp32)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
407bd1bbf9 glsl: Add "built-in" functions to do fp64_to_fp32(fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
f499942b31 glsl: Add "built-in" functions to do int_to_fp64(int)
v2: use mix
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
773190f281 glsl: Add "built-in" functions to do fp64_to_int(fp64)
v2: use mix

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
cbf090b809 glsl: Add "built-in" functions to do uint_to_fp64(uint)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
a3551ee61f glsl: Add "built-in" functions to do fp64_to_uint(fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
4a93401546 glsl: Add "built-in" functions to do mul(fp64, fp64)
v2: use mix
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
f111d72596 glsl: Add "built-in" functions to do add(fp64, fp64)
v2: use mix and findMSB to optimise.
v3: [Sagar] Fix zFrac0 == 0u case in __normalizeRoundAndPackFloat64

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
c036fc97a2 glsl: Add "built-in" functions to do lt(fp64, fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
3e4d5ea7b8 glsl: Add utility function to extract 64-bit sign
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:40 -08:00
Elie Tournier
ec6e823a99 glsl: Add "built-in" functions to do eq/ne(fp64, fp64) 2019-01-09 16:42:40 -08:00
Elie Tournier
c802cdde9d glsl: Add "built-in" function to do sign(fp64)
v2: use mix.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
eac66f0248 glsl: Add "built-in" functions to do neg(fp64)
v2: use mix.

Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Elie Tournier
0428951b9d glsl: Add "built-in" function to do abs(fp64)
Signed-off-by: Elie Tournier <elie.tournier@collabora.com>
2019-01-09 16:42:40 -08:00
Matt Turner
b63a1f8e40 glsl: Create file to contain software fp64 functions
The following patches will add implementations of various
double-precision operations to this file.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-01-09 16:42:40 -08:00