Commit Graph

136055 Commits

Author SHA1 Message Date
Lionel Landwerlin
8955d179d3 anv: fix MI_PREDICATE_RESULT write
This register is only 32bits.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 1952fd8d2c ("anv: Implement VK_EXT_conditional_rendering for gen 7.5+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9428>
2021-03-05 16:19:20 +00:00
Alyssa Rosenzweig
718bfdb3da pan/bi: Implement fsin/fcos
Instead of lowering it in NIR, use the lookup tables as inputs to a
second-order Taylor expansion. shader-db results aren't amazing but keep
in mind this is without backend CSE yet.

total instructions in shared programs: 115913 -> 115707 (-0.18%)
instructions in affected programs: 3151 -> 2945 (-6.54%)
helped: 12
HURT: 0
Instructions are helped.

total nops in shared programs: 84045 -> 84041 (<.01%)
nops in affected programs: 1571 -> 1567 (-0.25%)
helped: 1
HURT: 7
Inconclusive result (value mean confidence interval includes 0).

total clauses in shared programs: 20498 -> 20489 (-0.04%)
clauses in affected programs: 188 -> 179 (-4.79%)
helped: 6
HURT: 0
Clauses are helped.

total quadwords in shared programs: 90395 -> 90291 (-0.12%)
quadwords in affected programs: 2287 -> 2183 (-4.55%)
helped: 12
HURT: 0
Quadwords are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
2021-03-05 15:15:10 +00:00
Alyssa Rosenzweig
253b795451 pan/bi: Allow negating constants
Useful for representing -0 in transcendental sequences matching the
blob.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
2021-03-05 15:15:10 +00:00
Alyssa Rosenzweig
362756ad09 pan/bi: Use replace_index in more places
Needed to respect abs/neg.

Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9420>
2021-03-05 15:15:10 +00:00
Pierre-Eric Pelloux-Prayer
c276bde34a radeonsi/sqtt: export shader code to RGP
With these changes the shader code is visible in RGP.

Vk pipeline feature is emulated using si_update_shaders: when shaders are
updated we compute a sha1 of their code and use it as a pipeline hash.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
729d3eb0e0 radeonsi/sqtt: don't always use WGP 0
Because it may be disabled. Instead use the cu mask to
pick the first active WGP.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
47eafb3f51 radeonsi/sqtt: remove duplicate token
V_008D18_REG_INCLUDE_CONTEXT was set twice.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
a27ea38d2a radeonsi/sqtt: keep a copy of the uploaded shader code
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
7f5a8db96d ac/rgp: move radv/sqtt functions to ac
pso_correlation and code_object_loader don't depend on drivers
specific logic so move them to the shared code.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
b2ef94943f ac/rtld: make ac_rtld_upload returns the code size
This will be useful to keep a copy of the uploaded code.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
e5b1e645e7 ac/rgp: make the max gap between shader code a warning
For radeonsi the shaders don't live in the same BOs, so they're
unlikely to be less that 0x1000 bytes apart.

So this commit bumps the threshold to 0x10000 and warns once
when hitting it.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Pierre-Eric Pelloux-Prayer
0e97d817f5 radeonsi: properly set SPI_SHADER_PGM_HI_ES
When not using S_00B324_MEM_BASE the value isn't properly truncated.

Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9277>
2021-03-05 13:10:11 +00:00
Iago Toral Quiroga
6e6e71ddf9 broadcom/compiler: fix flags check for ldvary merge
We were checking that the previous instruction doesn't write flags,
but we also need to check it doesn't read them.

Fixes: 1784dd22a3 ('broadcom/compiler: pipeline smooth ldvary sequences')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9431>
2021-03-05 12:55:47 +00:00
Iago Toral Quiroga
21c1853c55 broadcom/compiler: ldvary doesn't implicitly write to r3 since V3D 4.1
total instructions in shared programs: 13805979 -> 13786037 (-0.14%)
instructions in affected programs: 2263244 -> 2243302 (-0.88%)
helped: 10646
HURT: 1508
Instructions are helped.

total threads in shared programs: 412220 -> 412242 (<.01%)
threads in affected programs: 58 -> 80 (37.93%)
helped: 17
HURT: 6
Threads are helped.

total uniforms in shared programs: 3793200 -> 3790401 (-0.07%)
uniforms in affected programs: 131281 -> 128482 (-2.13%)
helped: 1547
HURT: 281
Uniforms are helped.

total max-temps in shared programs: 2326309 -> 2324834 (-0.06%)
max-temps in affected programs: 31836 -> 30361 (-4.63%)
helped: 1139
HURT: 153
Max-temps are helped.

total spills in shared programs: 5932 -> 5940 (0.13%)
spills in affected programs: 80 -> 88 (10.00%)
helped: 2
HURT: 3

total fills in shared programs: 13370 -> 13372 (0.01%)
fills in affected programs: 480 -> 482 (0.42%)
helped: 2
HURT: 3

total sfu-stalls in shared programs: 30829 -> 30685 (-0.47%)
sfu-stalls in affected programs: 2190 -> 2046 (-6.58%)
helped: 570
HURT: 533
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13836808 -> 13816722 (-0.15%)
inst-and-stalls in affected programs: 2276152 -> 2256066 (-0.88%)
helped: 10643
HURT: 1525
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9430>
2021-03-05 13:37:39 +01:00
Rhys Perry
524848707b radv: don't set sx_blend_opt_epsilon for V_028C70_COLOR_10_11_11
Matches radeonsi and PAL. From PAL:
// 1 is recommended, but doesn't provide sufficient precision

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4394
Fixes: ed94638156 ("radv: Enable RB+ where possible.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9427>
2021-03-05 11:16:40 +00:00
Iago Toral Quiroga
839007e490 broadcom/compiler: always restart ldvary pipelining when scheduling ldvary
When we were only able to pipeline smooth varyings, if we had to disable
ldvary pipelining in the middle of a sequence it would stay disabled for
the rest of the program, to prevent us from prioritizing scheduling of
ldvary instructions that we would not be able to pipeline effectively.
Now that we can pipeline all ldvary sequences we can change this.

This change re-enables ldvary pipelining upon finding the next
ldvary in the program in the hopes that we can continue pipelining
succesfully. To do this, we track the number of ldvary instructions we
emitted so far and compare that to the number of inputs in the fragment
shader we are scheduling. This also allows us to simplify our ldvary
tracking at nir to vir time, since that is all now handled in the QPU
scheduler.

total instructions in shared programs: 13817048 -> 13810783 (-0.05%)
instructions in affected programs: 810114 -> 803849 (-0.77%)
helped: 4843
HURT: 591
Instructions are helped.

total max-temps in shared programs: 2326612 -> 2326300 (-0.01%)
max-temps in affected programs: 4689 -> 4377 (-6.65%)
helped: 285
HURT: 7
Max-temps are helped.

total sfu-stalls in shared programs: 30942 -> 30865 (-0.25%)
sfu-stalls in affected programs: 207 -> 130 (-37.20%)
helped: 120
HURT: 42
Sfu-stalls are helped.

total inst-and-stalls in shared programs: 13847990 -> 13841648 (-0.05%)
inst-and-stalls in affected programs: 825378 -> 819036 (-0.77%)
helped: 4899
HURT: 590
Inst-and-stalls are helped.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9404>
2021-03-05 10:32:19 +01:00
Samuel Pitoiset
2169c4f763 radv: re-enable TC-compat HTILE for MSAA D32S8 images on GFX9+
Should help MSAA games. Note that it's broken on GFX8 because
the tiling doesn't match.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3868
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9284>
2021-03-05 08:44:40 +00:00
Xin He
97b196b921 virgl: use atomic operations when increase sub_ctx_id
Use atomic operations to avoid competition. In addition,
since sub_ctx_id 0 has been used by default, sub_ctx_id
should start from 1.

Signed-off-by: Xin He <hexin.op@bytedance.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9406>
2021-03-05 08:35:29 +00:00
Samuel Pitoiset
367a93830b radv: skip useless FCE when fast-clearing MSAA images with DCC enabled
The clear code is 0xCC which means CMASK isn't fast-cleared.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9392>
2021-03-05 08:11:28 +00:00
Samuel Pitoiset
6102507a74 radv: remove useless check about mips+layers for TC-compat HTILE images
radv_use_htile_for_image() prevents it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9405>
2021-03-05 08:10:19 +01:00
Samuel Pitoiset
438f65fb1e radv: cleanup enabling TC-compat HTILE for depth surfaces
It makes more sense to try to enable TC-compat if the image has HTILE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9405>
2021-03-05 08:09:42 +01:00
Mike Blumenkrantz
55b57db84d zink: add vk/spirv caps/extension for shader LAYER variable
this is required if gl_Layer is used outside of GEOMETRY stage

Fixes: c77df59c9e ("zink: export PIPE_CAP_TGSI_VS_LAYER_VIEWPORT")

Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9410>
2021-03-05 03:45:51 +00:00
Dave Airlie
1186fbcdf1 lavapipe: fix dynamic viewport/scissor pipeline emission
Just fixup the tests for when the pipeline vp/scissors
are emitted.

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
2021-03-05 03:34:47 +00:00
Dave Airlie
6bcd304278 lavapipe: fix pipeline vp/scissor mixup.
Not copying all the scissors caused
dEQP-VK.pipeline.extended_dynamic_state.two_draws_dynamic.2_viewports
to fail but thah test pointlessly relies on KHR_multiview (cts issue
filed).

Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: b38879f8c5 ("vallium: initial import of the vulkan frontend")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9422>
2021-03-05 03:34:47 +00:00
Iván Briano
194e477615 anv: don't advertise mipmaps for linear 3D surfaces on BDW
Prior to SKL, the mipmaps for 3D surfaces are laid out in a way
that make it impossible to represent in the way that
VkSubresourceLayout expects. Since we can't tell users how to make
sense of them, don't report them as available.

"Fixes" dEQP-VK.image.subresource_layout.3d.*

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9419>
2021-03-04 16:23:23 -08:00
Ian Romanick
2c4fd24c01 nir/algebraic: Apply addition property of equality to the other ordering too
Inequality comparison operations are not commutative, so `foo < bar` and
`bar < foo` both have to be explicitly listed.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

All Intel GPUs had similar results. (Ice Lake shown)
total instructions in shared programs: 20027051 -> 20026899 (<.01%)
instructions in affected programs: 37181 -> 37029 (-0.41%)
helped: 85
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 1.79 x̃: 1
helped stats (rel) min: 0.05% max: 6.78% x̄: 0.92% x̃: 0.68%
95% mean confidence interval for instructions value: -2.42 -1.15
95% mean confidence interval for instructions %-change: -1.23% -0.61%
Instructions are helped.

total cycles in shared programs: 979762793 -> 979753527 (<.01%)
cycles in affected programs: 2653905 -> 2644639 (-0.35%)
helped: 104
HURT: 50
helped stats (abs) min: 1 max: 1048 x̄: 119.99 x̃: 11
helped stats (rel) min: <.01% max: 9.88% x̄: 0.77% x̃: 0.20%
HURT stats (abs)   min: 1 max: 734 x̄: 64.26 x̃: 8
HURT stats (rel)   min: <.01% max: 3.06% x̄: 0.36% x̃: 0.10%
95% mean confidence interval for cycles value: -98.65 -21.68
95% mean confidence interval for cycles %-change: -0.66% -0.15%
Cycles are helped.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9374>
2021-03-04 22:50:53 +00:00
Ian Romanick
33031bdab6 nir/algebraic: Apply addition property of equality more conservatively
This allows a lot more CSE.  Depending on where the addition and the
comparison are scheduled, it may also reduce register pressure by
reducing the live range of the addends.

Across all the platforms, the shaders affected for spills or fills were
all fragment shaders from Dirt Rally.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 21043103 -> 21038804 (-0.02%)
instructions in affected programs: 892878 -> 888579 (-0.48%)
helped: 1549
HURT: 724
helped stats (abs) min: 1 max: 225 x̄: 4.14 x̃: 2
helped stats (rel) min: 0.05% max: 11.18% x̄: 1.04% x̃: 0.78%
HURT stats (abs)   min: 1 max: 71 x̄: 2.93 x̃: 1
HURT stats (rel)   min: 0.07% max: 6.90% x̄: 0.80% x̃: 0.56%
95% mean confidence interval for instructions value: -2.33 -1.45
95% mean confidence interval for instructions %-change: -0.50% -0.40%
Instructions are helped.

total cycles in shared programs: 855054155 -> 855757566 (0.08%)
cycles in affected programs: 58275918 -> 58979329 (1.21%)
helped: 1213
HURT: 1680
helped stats (abs) min: 1 max: 107405 x̄: 1684.00 x̃: 10
helped stats (rel) min: <.01% max: 38.09% x̄: 1.51% x̃: 0.25%
HURT stats (abs)   min: 1 max: 126632 x̄: 1634.59 x̃: 12
HURT stats (rel)   min: <.01% max: 85.91% x̄: 2.75% x̃: 0.49%
95% mean confidence interval for cycles value: -98.06 584.35
95% mean confidence interval for cycles %-change: 0.71% 1.22%
Inconclusive result (value mean confidence interval includes 0).

total spills in shared programs: 9843 -> 9771 (-0.73%)
spills in affected programs: 72 -> 0
helped: 5
HURT: 0

total fills in shared programs: 9600 -> 9451 (-1.55%)
fills in affected programs: 149 -> 0
helped: 5
HURT: 0

LOST:   14
GAINED: 9

Skylake
total instructions in shared programs: 18185074 -> 18183866 (<.01%)
instructions in affected programs: 575180 -> 573972 (-0.21%)
helped: 1286
HURT: 468
helped stats (abs) min: 1 max: 15 x̄: 1.55 x̃: 1
helped stats (rel) min: 0.03% max: 4.08% x̄: 0.67% x̃: 0.65%
HURT stats (abs)   min: 1 max: 8 x̄: 1.69 x̃: 1
HURT stats (rel)   min: 0.13% max: 7.69% x̄: 0.87% x̃: 0.45%
95% mean confidence interval for instructions value: -0.77 -0.60
95% mean confidence interval for instructions %-change: -0.30% -0.22%
Instructions are helped.

total cycles in shared programs: 960518105 -> 960608234 (<.01%)
cycles in affected programs: 42536073 -> 42626202 (0.21%)
helped: 1210
HURT: 1714
helped stats (abs) min: 1 max: 7015 x̄: 123.41 x̃: 10
helped stats (rel) min: <.01% max: 33.76% x̄: 1.32% x̃: 0.26%
HURT stats (abs)   min: 1 max: 14474 x̄: 139.71 x̃: 14
HURT stats (rel)   min: <.01% max: 58.94% x̄: 2.00% x̃: 0.44%
95% mean confidence interval for cycles value: 4.02 57.63
95% mean confidence interval for cycles %-change: 0.43% 0.82%
Cycles are HURT.

LOST:   16
GAINED: 42

Broadwell
total instructions in shared programs: 17856880 -> 17852158 (-0.03%)
instructions in affected programs: 564836 -> 560114 (-0.84%)
helped: 1243
HURT: 418
helped stats (abs) min: 1 max: 115 x̄: 4.36 x̃: 1
helped stats (rel) min: 0.03% max: 9.67% x̄: 0.90% x̃: 0.67%
HURT stats (abs)   min: 1 max: 8 x̄: 1.67 x̃: 1
HURT stats (rel)   min: 0.14% max: 7.69% x̄: 0.89% x̃: 0.46%
95% mean confidence interval for instructions value: -3.45 -2.23
95% mean confidence interval for instructions %-change: -0.51% -0.38%
Instructions are helped.

total cycles in shared programs: 1031140321 -> 1029856892 (-0.12%)
cycles in affected programs: 66986946 -> 65703517 (-1.92%)
helped: 1084
HURT: 1653
helped stats (abs) min: 1 max: 415168 x̄: 1835.32 x̃: 10
helped stats (rel) min: <.01% max: 57.16% x̄: 1.19% x̃: 0.28%
HURT stats (abs)   min: 1 max: 43930 x̄: 427.14 x̃: 12
HURT stats (rel)   min: <.01% max: 57.53% x̄: 1.32% x̃: 0.39%
95% mean confidence interval for cycles value: -915.76 -22.07
95% mean confidence interval for cycles %-change: 0.17% 0.47%
Inconclusive result (value mean confidence interval and %-change mean confidence interval disagree).

total spills in shared programs: 20891 -> 20335 (-2.66%)
spills in affected programs: 1567 -> 1011 (-35.48%)
helped: 70
HURT: 0

total fills in shared programs: 27307 -> 25905 (-5.13%)
fills in affected programs: 5381 -> 3979 (-26.05%)
helped: 71
HURT: 0

LOST:   17
GAINED: 20

Haswell
total instructions in shared programs: 16411850 -> 16409414 (-0.01%)
instructions in affected programs: 602666 -> 600230 (-0.40%)
helped: 1152
HURT: 781
helped stats (abs) min: 1 max: 103 x̄: 3.59 x̃: 1
helped stats (rel) min: 0.03% max: 8.61% x̄: 0.85% x̃: 0.65%
HURT stats (abs)   min: 1 max: 41 x̄: 2.18 x̃: 1
HURT stats (rel)   min: 0.12% max: 7.69% x̄: 0.88% x̃: 0.69%
95% mean confidence interval for instructions value: -1.74 -0.78
95% mean confidence interval for instructions %-change: -0.21% -0.10%
Instructions are helped.

total cycles in shared programs: 1035338781 -> 1036977801 (0.16%)
cycles in affected programs: 68961096 -> 70600116 (2.38%)
helped: 1246
HURT: 2206
helped stats (abs) min: 1 max: 392022 x̄: 1040.28 x̃: 14
helped stats (rel) min: <.01% max: 56.44% x̄: 2.32% x̃: 0.38%
HURT stats (abs)   min: 1 max: 68630 x̄: 1330.56 x̃: 18
HURT stats (rel)   min: <.01% max: 69.97% x̄: 3.31% x̃: 0.61%
95% mean confidence interval for cycles value: 90.43 859.17
95% mean confidence interval for cycles %-change: 1.02% 1.54%
Cycles are HURT.

total spills in shared programs: 17805 -> 17457 (-1.95%)
spills in affected programs: 1202 -> 854 (-28.95%)
helped: 34
HURT: 31

total fills in shared programs: 20939 -> 20387 (-2.64%)
fills in affected programs: 2702 -> 2150 (-20.43%)
helped: 34
HURT: 31

LOST:   24
GAINED: 45

Ivy Bridge and earlier Intel GPUs had similar results. (Ivy Bridge shown)
total instructions in shared programs: 15515912 -> 15516757 (<.01%)
instructions in affected programs: 396569 -> 397414 (0.21%)
helped: 578
HURT: 858
helped stats (abs) min: 1 max: 9 x̄: 1.32 x̃: 1
helped stats (rel) min: 0.04% max: 3.70% x̄: 0.65% x̃: 0.65%
HURT stats (abs)   min: 1 max: 11 x̄: 1.87 x̃: 1
HURT stats (rel)   min: 0.08% max: 12.90% x̄: 0.95% x̃: 0.53%
95% mean confidence interval for instructions value: 0.47 0.70
95% mean confidence interval for instructions %-change: 0.24% 0.37%
Instructions are HURT.

total cycles in shared programs: 584395455 -> 584466352 (0.01%)
cycles in affected programs: 20346570 -> 20417467 (0.35%)
helped: 1192
HURT: 1896
helped stats (abs) min: 1 max: 4108 x̄: 123.27 x̃: 14
helped stats (rel) min: <.01% max: 37.20% x̄: 2.27% x̃: 0.46%
HURT stats (abs)   min: 1 max: 3698 x̄: 114.89 x̃: 19
HURT stats (rel)   min: <.01% max: 70.28% x̄: 3.02% x̃: 0.71%
95% mean confidence interval for cycles value: 10.75 35.16
95% mean confidence interval for cycles %-change: 0.73% 1.23%
Cycles are HURT.

LOST:   20
GAINED: 12
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9374>
2021-03-04 22:50:53 +00:00
Kenneth Graunke
206495cac4 iris: Enable u_threaded_context
This implements most of the remaining u_threaded_context support.  Most
of the heavy lifting was done in the previous patches which fixed things
up for the new thread safety requirements.  Only a few things remain.

u_threaded_context support can be disabled via an environment variable:

   GALLIUM_THREAD=0

On Felix's Tigerlake with the GPU at fixed frequency, enabling
u_threaded_context improves performance of several games:

   - Civilization VI: +17%
   - Shadow of Mordor: +6%
   - Bioshock Infinite +6%
   - Xonotic: +6%

Various microbenchmarks improve substantially as well:

   - GfxBench5 gl_driver2: +58%
   - SynMark2 OglBatch6: +54%
   - Piglit drawoverhead: +25%

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
c133d0930f iris: Use thread safe slab allocators in transfer_map handling
pipe->transfer_map can be called from u_threaded_context's thread
rather than the driver thread.  We need to use two different slab
allocators, one for each thread.  transfer_unmap, on the other hand,
is only ever called from the driver thread.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
1b1c857248 iris: Make various classes inherit from u_threaded_context base classes
u_threaded_context requires various objects to inherit from a new
threaded_foo base class rather than directly from pipe_foo.  This
patch does most of the mechanical changes required for that.

It also initializes the new threaded_resource fields.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
3358c7125a iris: Use different shader uploaders for precompile vs. draw time
When we enable u_threaded_context, the pipe->create_*_state hooks
(precompile variants) are going to be called from one thread, while
iris_update_compiled_shaders (on-the-fly variants) are going to be
called from a driver thread.  BLORP shaders also happen from
clear, blit, and so on in the driver thread.

u_upload_mgr isn't thread-safe, so use an uploader for each purpose.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
ec0d61c14c iris: Support rebinding of stream output targets
This enables us to replace the backing storage of resources that have
been used as stream output targets, in case we're invalidating their
entire contents.  This can avoid stalls.  We simply hadn't supported it
because it was going to be tricky to re-emit 3DSTATE_SO_BUFFER without
screwing up "reset offset to zero" vs. "keep appending".  But that
should be working fine with the previous patch's refactor.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
08e04ddd2c iris: Rework zeroing of stream output buffer offsets
The previous mechanism was a bit fragile.  We stored the zero offset
in the pre-baked packet, and used an flag to override 0xFFFFFFFF
(append) offsets until our first emit - then prohibited anyone from
trying to re-emit the packet by flagging IRIS_DIRTY_SO_BUFFERS,
because that would re-emit the version with the zeroing of the offset.

Now, we always store 0xFFFFFFFF in the pre-baked packet, and use a
flag to override it to zero on the first emit.  That way, we can
re-emit that packet at any time, and it'll just keep appending.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:21 -08:00
Kenneth Graunke
e40fafa991 iris: Defer stream output target space allocation until set time
In the future, Marek is planning to make u_threaded_context call
create_stream_output_target() from a different thread than the main
driver thread, which means that we can't safely use uploaders there.

To prepare for this eventual future, just defer the allocation of
the offset BO 'til later.  It's a very small amount of overhead.

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:20 -08:00
Kenneth Graunke
5659460af4 iris: Defer uploading of surface states
With u_threaded_context, create_surface and create_sampler_view will
be called from a different thread than the driver thread.  They aren't
allowed to access the context, which means that they can't use the
uploaders there to upload our SURFACE_STATE entries.

Thanks to backing-storage replacement and iris_rebind_buffer, we already
reworked things to maintain CPU-side copies of the SURFACE_STATE entries
and added the ability to upload or re-upload them later.  So we can skip
the upload at object creation time, and add a simple resource-is-NULL
check at binding table upload time to ensure that they get uploaded by
the time we need them.  (They might get uploaded earlier due to rebinds
or clear color updates, but this is the last moment to do so.)

Reviewed-by: Zoltán Böszörményi <zboszor@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8964>
2021-03-04 13:59:20 -08:00
Eric Anholt
3bdd39f03c lima: avoid stomping over bound shader state when creating new shaders
It shouldn't affect bound program state, and the current context state
shouldn't be relevant for shader creation precompiles anyway (level load
isn't going to have the eventual set of sampler views bound when you go to
draw with that shader).

Closes: #4306
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
2021-03-04 18:34:35 +00:00
Eric Anholt
4ac3f85054 lima: upload the shader to a BO at shader creation
No need to conditionally upload later.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
2021-03-04 18:34:35 +00:00
Eric Anholt
5a550c8dc7 lima: don't look at dirty bits for setup of FS key
You always have to populate the key with the right texture swizzles, even
if textures haven't changed since binding a new shader.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
2021-03-04 18:34:35 +00:00
Eric Anholt
d4f706389c lima: stop encoding the texture format in the shader key
We can compose the swizzles at sampler view creation time, saving
recompiles on texture format changes.

Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9089>
2021-03-04 18:34:34 +00:00
Lionel Landwerlin
8023d6de20 anv: implement INTEL_DEBUG=submit
Name all the BOs!

v2: Fix 32bit build issue (Thanks Marge!)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5736>
2021-03-04 19:46:24 +02:00
Rohan Garg
c6eb84ff30 virgl: Add support for querying detailed memory info
This allows for virgl guests to expose GL_NVX_gpu_memory_info and
GL_ATI_meminfo when the extensions are supported on the host.

Signed-off-by: Rohan Garg <rohan.garg@collabora.com>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9337>
2021-03-04 17:14:14 +01:00
Jason Ekstrand
1e53e0d2c7 intel/mi_builder: Drop the gen_ prefix
mi_ is already a unique prefix in Mesa so the gen_ isn't really gaining
us anything except extra characters.  It's possible that MI_ may
conflict a tiny bit with GenXML but it doesn't seem to be a problem
today and we can deal with that in the future if it's ever an issue.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
2021-03-04 15:14:27 +00:00
Jason Ekstrand
6d522538b6 intel: Rename gen_mi_builder.h to mi_builder.h
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9393>
2021-03-04 15:14:27 +00:00
Danylo Piliaiev
7e25e5b56f ir3: disallow moving memory writes over discard
Writes to global memory should not be moved over discard,
otherwise we could have unintended side-effects or lack of
side-effects where they should be observed.

Fixes tests:
 dEQP-VK.rasterization.frag_side_effects.color_at_beginning.kill
 dEQP-VK.rasterization.frag_side_effects.color_at_end.kill

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9365>
2021-03-04 11:40:58 +00:00
Juan A. Suarez Romero
7b3b8524ef ci: Bump deqp to vk-gl-cts 1.2.5.2
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9369>
2021-03-04 11:09:35 +00:00
Danylo Piliaiev
72a9f315db ir3: make mark_kill_path exit early if instr is already seen
Would bring down its complexity in pathological cases.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9386>
2021-03-04 10:52:06 +00:00
Danylo Piliaiev
9dbb678f5a ir3: prevent duplication of instruction's dependencies
Otherwise mark_kill_path() is happy to take exponential time to finish.

It was possible to have such chains:
  ...
 stib.base0 imm[0.000000,0,0x0], ssa_233, ssa_234, false-deps:ssa_231, ssa_231
 stib.base0 imm[0.000000,0,0x0], ssa_237, ssa_238, false-deps:ssa_235, ssa_235
 stib.base0 imm[0.000000,0,0x0], ssa_241, ssa_242, false-deps:ssa_239, ssa_239
 stib.base0 imm[0.000000,0,0x0], ssa_245, ssa_246, false-deps:ssa_243, ssa_243
 stib.base0 imm[0.000000,0,0x0], ssa_249, ssa_250, false-deps:ssa_247, ssa_247
 stib.base0 imm[0.000000,0,0x0], ssa_105, ssa_253, false-deps:ssa_251, ssa_251
 stib.base0 imm[0.000000,0,0x0], ssa_109, ssa_256, false-deps:ssa_254, ssa_254
 stib.base0 imm[0.000000,0,0x0], ssa_113, ssa_259, false-deps:ssa_257, ssa_257
 stib.base0 imm[0.000000,0,0x0], ssa_117, ssa_262, false-deps:ssa_260, ssa_260
 stib.base0 imm[0.000000,0,0x0], ssa_265, ssa_266, false-deps:ssa_263, ssa_263
 stib.base0 imm[0.000000,0,0x0], ssa_269, ssa_270, false-deps:ssa_267, ssa_267
 stib.base0 imm[0.000000,0,0x0], ssa_273, ssa_274, false-deps:ssa_271, ssa_271
  ...

Fixes tests:
 dEQP-VK.geometry.layered.cube_array.36_36_12.secondary_cmd_buffer_inherit_framebuffer
 dEQP-VK.geometry.layered.3d.64_64_8.secondary_cmd_buffer_inherit_framebuffer
 dEQP-VK.geometry.layered.cube_array.64_64_12.secondary_cmd_buffer_inherit_framebuffer

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9386>
2021-03-04 10:52:06 +00:00
Samuel Pitoiset
517600b4d5 Revert "radv: stop using VM_ALWAYS_VALID on APUs"
Disabling VM_ALWAYS_VALID actually hurts more than it helps
after doing more testing. Managing the global BO list in userspace
is really costly and make a bunch of games CPU bound.

I think re-enabling VM_ALWAYS_VALID is a step in the right direction.

This reverts commit 6ac6e2fbfb.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9341>
2021-03-04 09:37:59 +00:00
Gert Wollny
e148d5ec99 r600/sfn: lower intrinsic_load_tess_coord to driver version
Fixes
  KHR-GL45.tessellation_shader.tessellation_shader_tessellation.TCS_TES
  KHR-GL45.tessellation_shader.tessellation_shader_tessellation.TES

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9373>
2021-03-04 09:14:03 +00:00
Gert Wollny
81b41e0c76 nir: Add r600 specific intrinsic for loading the tesselation coords
Only the XY pair is provided directly, the Z value has to be deducted
from the primitive type.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9373>
2021-03-04 09:14:03 +00:00