Commit Graph

102712 Commits

Author SHA1 Message Date
Samuel Pitoiset
be794fa26b radv: clean up radv_{set,load}_color_clear_regs() helpers
And replace _regs by _metadata because it makes more sense.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:58 +02:00
Samuel Pitoiset
d7b772abb4 radv: update the fast color clear values only if the image is bound
It's unnecessary to update the fast color clear values if the
fast cleared color image isn't currently bound.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 15:53:55 +02:00
Christian Gmeiner
efae127993 util/bitset: include util/macro.h
BITSET_FFS(x) macro makes use of ARRAY_SIZE(x) macro which is
defined in util/macro.h. Include it directy to make usage more
straightforward.

Fixes: 692bd4a1ab ("util: replace Elements() with ARRAY_SIZE()")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-15 11:26:30 +01:00
Lukas Rusak
4cfc4cef80 meson: fix private libs when building without glx
I noticed that the generated pkg-config files will include
glx and x11 dependencies even when x11 isn't a selected platform.

This fixes the private libs and was tested by building kmscube

V2:
  - check if gallium-xlib is being used for glx

Fixes: 108d257a16 "meson: build libEGL"
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2018-06-15 10:43:22 +01:00
Rhys Perry
30f1ab7a59 docs: document addition of GL_ARB_sample_locations for nvc0
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
66ca7e400b nvc0: add support for programmable sample locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
2018-06-14 20:09:45 -06:00
Rhys Perry
9f217facbd st/mesa: add support for ARB_sample_locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
51a221e378 gallium: add support for programmable sample locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Rhys Perry
67f40dadaa mesa: add support for ARB_sample_locations
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com> (v2)
Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v2)
2018-06-14 20:09:45 -06:00
Eric Anholt
cd2e673abc v3d: Fix polygon offset for Z16 buffers.
Fixes:
dEQP-GLES3.functional.polygon_offset.fixed16_displacement_with_units
dEQP-GLES3.functional.polygon_offset.fixed16_render_with_units
2018-06-14 17:03:16 -07:00
Eric Anholt
d91e06a065 v3d: Fix configuration setup of mixed f32 and f16 render targets.
Fixes dEQP-GLES3.functional.fragment_out.random.26 and 6 others.
2018-06-14 16:52:25 -07:00
Eric Anholt
6784aa9870 v3d: Don't set the first_ez_state to DISABLED if after only UNDECIDED draws.
We need to have the RCL start with EZ enabled, since those undecided draws
had EZ enabled.  But we do need to update from UNDECIDED to LT or GT as
necessary still.

Fixes many simulator assertion fails in deqp
fragment_ops/interaction/basic_shader/*
2018-06-14 16:52:25 -07:00
Eric Anholt
9080642449 v3d: Use the right size for v3d 4.x TEXTURE_SHADER_STATE BO.
This doesn't really matter, since they both get rounded up to 4096.
2018-06-14 16:52:25 -07:00
Eric Anholt
31548187cf v3d: Add static asserts for other packed packet sizes. 2018-06-14 16:52:25 -07:00
Eric Anholt
0eef4d7f8f v3d: Fix the size of the packed attribute state.
Fixes segfaults in dEQP-GLES3.functional.vertex_array_objects.all_attributes.
2018-06-14 16:52:25 -07:00
Eric Anholt
7d8fe50af3 v3d: Remove some unused context fields from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt
48011c42aa v3d: Remove unused QUNIFORM_STENCIL left over from vc4. 2018-06-14 16:52:25 -07:00
Eric Anholt
4564537222 v3d: Use our #define for max attributes in shader caps. 2018-06-14 16:52:25 -07:00
Eric Anholt
a40bc33b11 v3d: Fix undefined results for a swap_color_rb RT from a float shader output.
Fixes segfaults and undefined behavior in
dEQP-GLES3.functional.fragment_out.basic.fixed.srgb8_alpha8_lowp_float
2018-06-14 16:52:25 -07:00
Dave Airlie
600d34c822 radv: remove multisample bit from shader key.
This wasn't being used anywhere inside the shader from what I can see.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-15 09:33:20 +10:00
Kenneth Graunke
f6898f2b55 intel/compiler: Properly consider UBO loads that cross 32B boundaries.
The UBO push analysis pass incorrectly assumed that all values would fit
within a 32B chunk, and only recorded a bit for the 32B chunk containing
the starting offset.

For example, if a UBO contained the following, tightly packed:

   vec4 a;  // [0, 16)
   float b; // [16, 20)
   vec4 c;  // [20, 36)

then, c would start at offset 20 / 32 = 0 and end at 36 / 32 = 1,
which means that we ought to record two 32B chunks in the bitfield.

Similarly, dvec4s would suffer from the same problem.

v2: Rewrite the accounting, my calculations were wrong.
v3: Write a comment about partial values (requested by Jason).

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v3]
2018-06-14 14:58:59 -07:00
Ian Romanick
37bd9ccd21 glsl: Don't copy propagate elements from SSBO or shared variables either
Since SSBOs can be written by a different GPU thread, copy propagating a
read can cause the value to magically change.  SSBO reads are also very
expensive, so doing it twice will be slower.

The same shader was helped by this patch and the previous.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399119 -> 14399113 (<.01%)
instructions in affected programs: 683 -> 677 (-0.88%)
helped: 1
HURT: 0

total cycles in shared programs: 532973113 -> 532971865 (<.01%)
cycles in affected programs: 524666 -> 523418 (-0.24%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
2018-06-14 11:28:12 -07:00
Ian Romanick
461a5c899c glsl: Don't copy propagate from SSBO or shared variables either
Since SSBOs can be written by other GPU threads, copy propagating a read
can cause the value to magically change.  SSBO reads are also very
expensive, so doing it twice will be slower.

Haswell, Broadwell, and Skylake had similar results. (Skylake shown)
total instructions in shared programs: 14399120 -> 14399119 (<.01%)
instructions in affected programs: 684 -> 683 (-0.15%)
helped: 1
HURT: 0

total cycles in shared programs: 532978931 -> 532973113 (<.01%)
cycles in affected programs: 530484 -> 524666 (-1.10%)
helped: 1
HURT: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106774
2018-06-14 11:26:33 -07:00
Lukas Rusak
1d92d6486a meson: only build vl_winsys_dri.c when x11 platform is used
This seems to have been missed in the move from autotools

This fixes the following build issue:

../src/gallium/auxiliary/vl/vl_winsys_dri.c:34:10: fatal error: X11/Xlib-xcb.h: No such file or directory
 #include <X11/Xlib-xcb.h>
          ^~~~~~~~~~~~~~~~

Fixes: b1b65397d0
       ("meson: Build gallium auxiliary")
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-14 10:34:51 -07:00
Brian Paul
b9e6438adf st/mesa: add missing switch cases in glsl_to_tgsi_visitor::visit()
To silence compiler warning about unhandled switch cases.

Reviewed-by: Charmaine Lee <charmainel@vmware.com>
2018-06-14 11:29:51 -06:00
Bas Nieuwenhuizen
41dabdc475 radv: Fix output for sparse MRTs.
We need to init the cb_shader_format correctly with the changed
col_format, so this moves the col_format adjustment to before the
adjustment to before the cb_shader_mask gets generated.

Fixes: 06d3c65098 "radv: fix a GPU hang when MRTs are sparse"
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106903
CC: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-14 11:48:24 +02:00
Samuel Pitoiset
68dead112e radv: update the ZRANGE_PRECISION value for the TC-compat bug
On GFX8+, there is a bug that affects TC-compatible depth surfaces
when the ZRange is not reset after LateZ kills pixels.

The workaround is to always set DB_Z_INFO.ZRANGE_PRECISION to match
the last fast clear value. Because the value is set to 1 by default,
we only need to update it when clearing Z to 0.0.

We also need to set the depth clear regs and to update
ZRANGE_PRECISION when initializing a TC-compat depth image to 0.

Original patch from James Legg.

This fixes random CTS fails with
dEQP-VK.renderpass.suballocation.formats.d32_sfloat_s8_uint.input.*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105396
CC: <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-14 11:38:29 +02:00
Samuel Iglesias Gonsálvez
183adc51f8 anv: reduce maxFragmentInputComponents
If the application asks for the maximum number of fragment input
components (128), use all of them plus some builtins that are
passed in the VUE, then we exceed the maximum number of used VUE
slots (32) and we break one assert that checks this limit.

Also, with separate shader objects, we add CLIP_DIST0, CLIP_DIST1
builtins in brw_compute_vue_map() because we don't know if
gl_ClipDistance is going to be read/write by an adjacent stage.

Fixes VK-GL-CTS CL#2569.

Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2018-06-14 09:54:28 +02:00
Marek Olšák
6d671078a8 radeonsi/gfx9: fix si_get_buffer_from_descriptors for 48-bit pointers
This fixes:
GL45-CTS.pipeline_statistics_query_tests_ARB.functional_compute_shader_invocations

Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-06-13 22:00:12 -04:00
Marek Olšák
a4312742a5 radeonsi/gfx9: update & clean up a DPBB heuristic
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:43 -04:00
Marek Olšák
47b780be21 radeonsi/gfx9: set POPS_DRAIN_PS_ON_OVERLAP due to a hw bug
This may not be needed yet, but let's set it now.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:42 -04:00
Marek Olšák
a152ca70f2 radeonsi/gfx9: remove UINT_MAX array terminators in bin size tables
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:40 -04:00
Marek Olšák
cd0be6cdc8 radeonsi/gfx9: update bin sizes
This is based on our docs (recently updated), not amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:39 -04:00
Marek Olšák
2f51081a93 radeonsi/gfx9: update primitive binning code for EQAA
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:37 -04:00
Marek Olšák
22e994bb75 radeonsi: assume that rasterizer state is non-NULL in draw_vbo
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:36 -04:00
Marek Olšák
f3b3ee6974 radeonsi: micro-optimize prim checking and fix guardband with lines+adjacency
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:34 -04:00
Marek Olšák
d6974feb90 radeonsi: move the guardband registers into a separate state atom
They have a different frequency of updates and don't change when scissors
change.

I think this even fixes something in si_update_vs_viewport_state.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:31 -04:00
Marek Olšák
68b1c669e7 radeonsi/gfx9: implement the scissor bug workaround without performance drop
This might improve performance on Vega10 and Raven.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:27 -04:00
Marek Olšák
73b0d10152 radeonsi: don't set VGT_LS_HS_CONFIG if it doesn't change
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:25 -04:00
Marek Olšák
28ee825e19 radeonsi: move VGT_GS_OUT_PRIM_TYPE into si_shader_gs
same as amdvlk.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:23 -04:00
Marek Olšák
99e0ba6868 radeonsi: record CLIPVERTEX output usage properly for compatibility profiles
This was missed when adding CLIPVERTEX support into GS & tess.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:20 -04:00
Marek Olšák
47a57a709d radeonsi: fix FBFETCH with 2D MSAA arrays
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:17 -04:00
Marek Olšák
e5e57c3a5e ac: handle undefined EQAA samples in ac_apply_fmask_to_sample
RADV might wanna use this helper too.

Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
2018-06-13 22:00:12 -04:00
Marek Olšák
a2d4c8ff6d radeonsi: return real memory usage instead of per-process usage
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 21:47:36 -04:00
Marek Olšák
95ecde42eb ac/gpu_info: report real total memory sizes
The change from MIN2 to MAX2 is intentional.

Cc: 18.1 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 21:47:36 -04:00
Dave Airlie
f11b664f48 docs: mark virgl GL 4.0 features as complete.
virgl should now expose GL4.1 where it can.
2018-06-14 10:38:11 +10:00
Dave Airlie
7b6f2704eb virgl: add ARB_tessellation_shader support. (v2)
This should add all the pieces to enable tess shaders on virgl.

v2: fixup transform to handle tess and strip out precise.
set default for max patch varyings to work around issue when
tess gets enabled from v1 caps but v2 caps aren't in place. (Elie)

Reviewed-by: Elie Tournier <elie.tournier@collabora.com>
2018-06-14 10:36:31 +10:00
Dave Airlie
babd1d526b glsl: allow standalone semicolons outside main()
GLSL 4.60 offically added this but games and older CTS suites actually
had shaders that did this, we may as well enable it everywhere.

Adding stable because it appears apps in the wild do this.

Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: <mesa-stable@lists.freedesktop.org>
2018-06-14 10:21:51 +10:00
Samuel Pitoiset
51e23d3419 radv: don't fast clear HTILE for 16-bit depth surfaces on GFX8
This causes rendering issues in Shadow Warrior 2 with DXVK.

Cc: mesa-stable@lists.freedesktop.org
Fixes: ccc64f3133 ("radv: enable TC-compat HTILE for 16-bit depth surfaces on GFX8")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106912
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-06-13 20:30:04 +02:00
Andrew Galante
baf16b2ea3 configure.ac: Test for __atomic_add_fetch in atomic checks
Some platforms have 64-bit __atomic_load_n but not 64-bit
__atomic_add_fetch, so test for both of them.

Bug: https://bugs.gentoo.org/655616
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-06-13 10:09:46 -07:00