Commit Graph

93036 Commits

Author SHA1 Message Date
Andres Gomez
f1590363c9 bin/get-fixes-pick-list.sh: parse just the commit message
We were parsing the whole diff, although the candidates were
identified only by the commit message.

Now, we only use the commit message for parsing.

Signed-off-by: Andres Gomez <agomez@igalia.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-15 15:53:21 +03:00
Samuel Pitoiset
e8df89d2c5 gallium/radeon: fix initialization of new resource bindless fields
r600_resource objects are not calloc'd.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:56:21 +02:00
Lucas Stach
4026744fcb gbm: implement FD import with modifier
This implements a way to import FDs with modifiers on plain GBM devices,
without the need to go through EGL. This is mostly to the benefit of
gbm_gralloc, which can keep its dependencies low.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-06-15 10:43:36 +01:00
Lucas Stach
71b78b6b0c gbm: add API to import FD with modifier
This allows importing an FD with an explicit modifier passed through
userspace protocols.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Robert Foss <robert.foss@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
2017-06-15 10:43:23 +01:00
Emil Velikov
18d4a6f964 i965: add gen4_blorp_exec.h to the sources list
We tend to use the sources, as opposed to EXTRA_DIST to include the
headers.

Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2017-06-15 10:29:47 +01:00
Michel Dänzer
176e761513 gallium/util: Break recursion in pipe_resource_reference
The function calling itself recursively prevented it from being inlined, resulting
in a copy being generated in every compilation unit referencing it. This
bloated the text segment of the Gallium mega-driver *_dri.so by ~4%,
and might also have impacted performance.

Fixes: ecd6fce261 ("mesa/st: support lowering multi-planar YUV")
v2:
* Add comment above pipe_resource_next_reference [Samuel Pitoiset]
v3:
* Use loop to unreference the full chain of resources referenced via
  the next members [Timothy Arceri]
v4:
* Stop chasing ->next chain at the first sub-resource which isn't
  destroyed [Nicolai Hähnle]

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-15 11:24:59 +09:00
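The loop described in 176e761513 (v3/v4) can be sketched roughly as follows. This is a minimal illustration, not the Gallium code: `struct resource`, `resource_unref` and `resource_release_chain` are hypothetical names, and only the unreference half of a reference-assignment helper is modeled. The point is that walking the ->next chain iteratively, and stopping at the first sub-resource that is still referenced, removes the self-recursion that blocked inlining.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a resource: only the reference
 * count and the ->next link matter for this sketch. */
struct resource {
   int refcount;
   struct resource *next;
   int destroyed;   /* set instead of freeing, so the effect can be inspected */
};

/* Drop one reference; returns nonzero when the count reached zero. */
static int resource_unref(struct resource *res)
{
   assert(res->refcount > 0);
   return --res->refcount == 0;
}

/* Iterative release: walk the ->next chain in a loop instead of recursing,
 * and stop at the first sub-resource that is not destroyed. */
static void resource_release_chain(struct resource *res)
{
   while (res != NULL && resource_unref(res)) {
      struct resource *next = res->next;
      res->destroyed = 1;   /* a real driver would call its destroy hook here */
      res = next;
   }
}
```

With a chain a -> b -> c where c holds one extra reference, releasing a destroys a and b and leaves c alive with its count reduced by one.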
Samuel Pitoiset
1c00af4264 mesa: fix 'make check' by moving bindless functions to the right place
Fixes: 5f249b9f05 ("mapi: add GL_ARB_bindless_texture entry points")
Reported-by: Mark Janes <mark.a.janes@intel.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Aaron Watry <awatry@gmail.com>
Tested-by: Michel Dänzer <michel.daenzer@amd.com>
2017-06-15 10:44:38 +09:00
Jason Ekstrand
1d132712fe i965/miptree: Use the new simple alloc_tiled for CCS buffers
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
21d83f54b3 i965/bufmgr: Add a new, simpler, bo_alloc_tiled
ISL already has all of the complexity required to figure out the correct
surface pitch and size taking tile alignment into account.  When we get
a surface out of ISL, the pitch and size are already correct and using
brw_bo_alloc_tiled_2d doesn't actually gain us anything other than extra
asserts we have to do in order to ensure that the bufmgr code and ISL
agree.  This new helper doesn't try to be smart but just allocates the
BO you ask for and sets up the tiling.

Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
6ee0530c35 i965/bufmgr: Rename bo_alloc_tiled to bo_alloc_tiled_2d
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
862493f7cb i965: Use blorp for depth/stencil clears on gen6+
Acked-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
f762962f7f i965: Set step_rate = 0 for interleaved vertex buffers
Before, we weren't setting step rate so we got whatever old value
happened to be lying around.  This can lead to some interesting
rendering errors.  In particular, if you run the OpenGL ES CTS with
dEQP-GLES3.functional.instanced.types.mat2x4 immediately followed by one
of the dEQP-GLES3.functional.transform_feedback.* tests, the transform
feedback test gets stale instancing data from the other test and fails.
The only thing that is causing this to not be a problem today is that we
use meta for clears and meta is setting up vertex buffers via the VBO or
non-interleaved path and setting step_rate to 0 for us.  When blorp
depth/stencil clears are enabled, meta is no longer sitting between the
two tests and the stale data starts causing noticeable problems.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
b3569e7445 i965: Disable the interleaved vertex optimization when instancing
Instance divisor is a property of the vertex buffer and not the vertex
element so if we ever see anything other than 0, bail.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
7175561598 intel/blorp: Work around Sandy Bridge occlusion query issue
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
39a13c08dc i965/blorp: Set no_depth_or_stencil correctly
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
b14852997a i965: Remove some unneeded fields from brw_context
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
ea225d4da4 i965: Remove some of the remnants of meta
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
96f9d4de7d intel/isl: Properly set SeparateStencilBufferEnable on gen5-6
On gen5-6, SeparateStencilBufferEnable and HierarchicalDepthBufferEnable
come hand in hand and we have to set either both or neither.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
ee0e29dd02 i965/miptree: Choose the stencil layout in miptree_create_layout
This ensures that we get the correct layout for all stencil buffers, not
just those which are created as separate stencil for a depth buffer.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
6f6aa0f462 mesa: Add a BUFFER_BITS mask for depth+stencil
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Jason Ekstrand
83ab6327c1 i965/blorp: Set aux_usage to NONE for miplevels without HiZ
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2017-06-14 18:15:05 -07:00
Aaron Watry
e4d06e4c53 radeon/winsys: Limit max allocation size to 70% of VRAM
The CL CTS queries the max allocation size, and then attempts to
allocate buffers of that size. If not enough contiguous RAM/VRAM is
available, this causes errors in the radeon kernel module due to
inability to allocate the required memory.

It's a bit of a hack, but experimentally on my system, I can use ~3/4
of the card's VRAM for a single global/constant buffer allocation given
current GUI/compositor use.

For a 1GB Pitcairn (HD7850) this gets me from the reported clinfo values of:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           1500153446 (1.397GiB)
Max constant buffer size                        1500153446 (1.397GiB)

To:
Global memory size                              2143076352 (1.996GiB)
Max memory allocation                           751619276 (716MiB)
Max constant buffer size                        751619276 (716MiB)

Fixes: OpenCL CTS test/conformance/api/min_max_mem_alloc_size,
       OpenCL CTS test/conformance/api/min_max_constant_buffer_size

Signed-off-by: Aaron Watry <awatry@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 19:38:55 -05:00
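The clinfo figures quoted in e4d06e4c53 are consistent with a plain 70% factor: 751619276 is 70% of 1 GiB (1073741824 bytes), and the earlier 1500153446 is 70% of the reported global memory size 2143076352. A small sketch of that arithmetic follows; the function name is hypothetical and the integer form is an assumption made here, not necessarily how the winsys computes it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: cap a single allocation at 70% of a memory pool.
 * Integer math is used so the result truncates predictably. */
static uint64_t max_alloc_size(uint64_t pool_size)
{
   assert(pool_size <= UINT64_MAX / 7);   /* guard the multiplication */
   return pool_size * 7 / 10;
}
```

For example, `max_alloc_size(1073741824)` yields 751619276, matching the "Max memory allocation" value reported after the change.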
Kenneth Graunke
b6d56c747c i965: Use a line end cap width of 0.5 unless smooth lines enabled.
This updates the Gen4-5 code to use a line end cap width of 0.5
for non-smooth lines, and 1.0 for smooth lines - which is what we
do on Gen6+.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
6563d5287b i965: Use brw_get_line_width() in Gen4-5 SF_STATE code.
This unifies the Gen4-5 and Gen6+ line width calculations.

I believe it also fixes a bug - we weren't rounding the line width
to the nearest integer.  The GL 4.5 (and GL 2.1) specs "Wide Lines"
section says:

"The actual width of non-antialiased lines is determined by rounding
 the supplied width to the nearest integer, then clamping it to the
 implementation-dependent maximum non-antialiased line width."

We don't need to care about _NEW_MULTISAMPLE here because multisampling
doesn't exist on Gen4-5, so the state shouldn't change.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
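The "Wide Lines" rule quoted in 6563d5287b (round to the nearest integer, then clamp to the implementation maximum) can be sketched as below. This is only an illustration of the quoted sentence: `non_aa_line_width` is a hypothetical name, it is not the body of brw_get_line_width(), and any other provisions of the spec are left out.

```c
#include <assert.h>

/* Hypothetical helper modeling the quoted rule for non-antialiased lines:
 * round the supplied width to the nearest integer, then clamp it to the
 * implementation-dependent maximum. */
static float non_aa_line_width(float requested, float max_width)
{
   assert(requested >= 0.0f);

   /* Round half up; valid because the width is non-negative. */
   float rounded = (float)(int)(requested + 0.5f);

   return rounded > max_width ? max_width : rounded;
}
```

With a maximum of 7.0, a requested width of 2.4 becomes 2.0, 2.6 becomes 3.0, and 9.3 is clamped to 7.0.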
Kenneth Graunke
af373ea4a2 genxml: Fix Gen4-5 SF_STATE "Line Width" fixed point type.
It's a U3.1.  It became a U3.7 on Sandybridge.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
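To make the U3.1 versus U3.7 distinction in af373ea4a2 concrete: both are unsigned fixed-point formats with 3 integer bits, but with 1 and 7 fractional bits respectively, so the same width encodes to different raw values. The helper below is a hypothetical sketch written for this note, not genxml's packing code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode a non-negative float as unsigned fixed point
 * with the given number of integer and fractional bits, saturating at the
 * largest representable value. */
static uint32_t to_ufixed(float value, unsigned int_bits, unsigned frac_bits)
{
   const uint32_t max = (1u << (int_bits + frac_bits)) - 1;

   assert(value >= 0.0f);
   assert(int_bits + frac_bits < 32);

   /* Scale by 2^frac_bits and round to nearest. */
   uint32_t fixed = (uint32_t)(value * (float)(1u << frac_bits) + 0.5f);

   return fixed > max ? max : fixed;
}
```

A width of 1.5 encodes as 3 in U3.1 (`to_ufixed(1.5f, 3, 1)`) and as 192 in U3.7 (`to_ufixed(1.5f, 3, 7)`), which is why declaring the wrong type for the field changes the value the hardware sees.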
Kenneth Graunke
3793410369 i965: Stop using BRW_RASTRULE_LOWER_RIGHT on Gen4-5.
This effectively reverts Robert Ellison's 2009 commit
cc8afbd386.

I'm not seeing any GL spec text indicating that UPPER won't work.
On Gen6+, this bit moved to 3DSTATE_WM as a single bit, controlling
UPPER_LEFT vs. UPPER_RIGHT.  There is no way to request LOWER_RIGHT,
so UPPER_RIGHT is the best you can do.

In the G45 docs, it's marked as "Reserved" as well, but we just
decided to use it anyway.

This patch unifies the behavior between Gen4-5 and Gen6+.

Note that this is separate from point sprite texcoord behavior.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
6d4e031d9a i965: When gl_PointSize is unwritten, default to 1.0 on Gen4-5.
Modern GL specifications say that the point size should be 1.0 when
gl_PointSize is unwritten and the last enabled stage is a geometry
or tessellation shader.  If it's a vertex shader, though, both the
GL specs and ES 3.0 spec say that it's undefined - so since Gen4-5
only support vertex shaders, there's no actual requirement to do this.

Since there is a cost associated (an extra dirty bit, which may cause
SF_STATE to be emitted more often), it may not be a good idea.

The real benefit is that it makes all generations behave identically.
And that seems somewhat nice...

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Kenneth Graunke
3d34e27522 i965: Make Gen4-5 SF_STATE use the point size calculations from Gen6+.
Apparently, Nanhai made the Gen4-5 point size calculations round to the
nearest integer in commit 8d5231a358,
"according to spec".  When Eric first ported the driver to Sandybridge,
he did not implement this rounding.

In the GL 2.1 and 3.0 specs "Basic Point Rasterization" section, it does
say "If antialiasing and point sprites are disabled, the actual width is
determined by rounding the supplied width to the nearest integer, then
clamping it to the implementation-dependent maximum non-antialiased point
width."

In contrast, GL 3.1 and later do not appear to contain this rounding.

It might be reasonable to round, given that we only implement GL 2.1.
Of course, if we were to do that, we should actually implement the AA
vs. non-AA distinction.  Brian added an XXX comment reminding us to fix
this 10 years ago, but it never happened.

I think a better plan is to follow the newer, unrounded behavior.  This
is what we do on Gen6+ and it passes all the relevant conformance tests.

Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2017-06-14 15:56:21 -07:00
Jason Ekstrand
d9261275cc i965: Do an end-of-pipe sync after flushes
According to the docs, a simple CS stall is insufficient to ensure that
the memory from the flush is visible and an end-of-pipe sync is needed.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:42 -07:00
Jason Ekstrand
314ec7b46f i965/blorp: Do an end-of-pipe sync around CCS ops
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:40 -07:00
Jason Ekstrand
96e7b7ac54 i965: Do an end-of-pipe sync prior to STATE_BASE_ADDRESS
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:39 -07:00
Topi Pohjolainen
7b607aae3f i965: Add an end-of-pipe sync helper
v2 (Jason Ekstrand):
 - Take a flags parameter to control the flushes
 - Refactoring

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:22 -07:00
Jason Ekstrand
b771d9a136 i965: Unify the two emit_pipe_control functions
These two functions contain almost identical logic except for one SNB
workaround required for render target cache flushes.  They may as well
call into the same code so we only have to handle the work-arounds in
one place.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:21 -07:00
Jason Ekstrand
a8ea68bc93 i965: Take a uint64_t immediate in emit_pipe_control_write
It's a 64-bit value.  Splitting it up just makes the function arguments
awkward.

Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:19 -07:00
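The interface change in a8ea68bc93 amounts to passing the immediate as one 64-bit value and splitting it into dwords only where the packet is written. A minimal sketch of that idea, with a hypothetical `pack_immediate` helper that is not the driver's actual function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the caller hands over one 64-bit immediate and the
 * split into low/high dwords happens in a single place. */
static void pack_immediate(uint32_t dw[2], uint64_t imm)
{
   assert(dw != NULL);
   dw[0] = (uint32_t)(imm & 0xffffffffu);   /* low 32 bits */
   dw[1] = (uint32_t)(imm >> 32);           /* high 32 bits */
}
```

Callers then pass a single argument such as `0x1122334455667788` instead of a pre-split pair.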
Jason Ekstrand
86da08367b i965: Flush around state base address
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-06-14 15:11:06 -07:00
Kenneth Graunke
244c2a5d2c i965: Print "force dual color blending" in FS recompile debug output.
I forgot to add this when introducing the new key field.  It doesn't
happen often - just with the Unigine workarounds.  But we may as well
have it, so we get an accurate picture of why recompiles happen.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2017-06-14 14:30:11 -07:00
Eric Le Bihan
2154defcd6 Fix khrplatform.h not installed if EGL is disabled.
KHR/khrplatform.h is required by the EGL, GLES and VG headers, but is
only installed if Mesa3d is compiled with EGL support.

This patch installs this header file unconditionally.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=77240
Signed-off-by: Eric Le Bihan <eric.le.bihan.dev@free.fr>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 16:55:13 +01:00
Ville Syrjälä
c1eedb43f3 i915: Fix wpos_tex vs. -1 comparison
wpos_tex used to be a GLuint so assigning -1 to it and
later comparing with -1 worked correctly, but commit
c349031c27 ("i915: Fix texcoord vs. varying collision in
fragment programs") changed wpos_tex to uint8_t and hence
broke the comparison. To fix this define a more explicit
invalid value for wpos_tex.

gcc warns us:
i915_fragprog.c:1255:57: warning: comparison is always true due to limited range of data type [-Wtype-limits]
    if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                                         ^

And clang says:
i915_fragprog.c:1255:57: warning: comparison of constant -1 with expression of type 'uint8_t' (aka 'unsigned char') is always true [-Wtautological-constant-out-of-range-compare]
   if (inputsRead & VARYING_BITS_TEX_ANY || p->wpos_tex != -1) {
                                            ~~~~~~~~~~~ ^  ~~

Cc: Chih-Wei Huang <cwhuang@android-x86.org>
Cc: Eric Anholt <eric@anholt.net>
Cc: Ian Romanick <ian.d.romanick@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: c349031c27 ("i915: Fix texcoord vs. varying collision in fragment programs")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 18:22:52 +03:00
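The comparison problem described in c1eedb43f3 follows from C's integer promotions: a uint8_t holding (uint8_t)-1 is 255, and when compared with -1 it is promoted to int, so the two never compare equal. The sketch below reproduces the effect and shows an explicit sentinel as one way to fix it. `WPOS_TEX_INVALID` and the helper names are hypothetical and chosen for this illustration; the sentinel is passed as a parameter in the buggy variant only to keep compilers from flagging the tautology.

```c
#include <assert.h>
#include <stdint.h>

#define WPOS_TEX_INVALID 0xff   /* hypothetical explicit "unset" marker */

/* Buggy pattern: wpos_tex is promoted to int (range 0..255), so comparing
 * it against a sentinel of -1 always reports "set". */
static int is_set_buggy(uint8_t wpos_tex, int sentinel)
{
   return wpos_tex != sentinel;
}

/* Fixed pattern: compare against a value the type can actually hold. */
static int is_set_fixed(uint8_t wpos_tex)
{
   return wpos_tex != WPOS_TEX_INVALID;
}

/* Store a valid texture unit; the sentinel itself is not a legal unit. */
static void set_wpos_tex(uint8_t *slot, unsigned unit)
{
   assert(unit < WPOS_TEX_INVALID);
   *slot = (uint8_t)unit;
}
```

With a slot initialized to `(uint8_t)-1`, `is_set_buggy(slot, -1)` returns 1 even though nothing was assigned, while `is_set_fixed(slot)` returns 0.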
Samuel Pitoiset
5f8b654b47 tgsi/scan: add missing 'static' to tgsi_is_bindless_image_file()
This should fix compilation errors in some situations.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101418
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 15:30:39 +02:00
Chuck Atkins
ad69b037b1 configure.ac: Reduce zlib requirement from 1.2.8 to 1.2.3.
Testing with zlib versions 1.2.{3,4,5,6,7,8} showed no difference in
functionality, correctness, or zlib API usage and 1.2.3 is the oldest
version available in still actively deployed production Linux
distributions (RHEL/CentOS 6 and SuSE 11).

Built 17.1.1 against the system-supplied zlib-devel packages for 1.2.3
in EL6 and 1.2.7 on EL7. I then swapped out the zlib version at runtime
via LD_LIBRARY_PATH with ones built from the release tarballs from
zlib.net.

Testwise - I ran the piglit shader profile with --quick added to the
tests since I figured that would exercise the shader cache, which would
in turn use zlib.

Signed-off-by: Chuck Atkins <chuck.atkins@kitware.com>
Cc: 17.1 <mesa-stable@lists.freedesktop.org>
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
[Emil Velikov: add hunk about version/piglit testing]
Acked-by: Emil Velikov <emil.velikov@collabora.com>
2017-06-14 12:03:22 +01:00
Samuel Pitoiset
65d1e4d1eb radeonsi: enable ARB_bindless_texture
This has only been tested on RX480.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
285ec4463b radeonsi: add support for loading bindless images
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
950b5ffa31 radeonsi: add support for loading bindless samplers
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
0c2834c5b2 radeonsi: invalidate buffers which are made resident if needed
When a buffer becomes resident, check if it has been invalidated,
if so update the descriptor and the dirty flag.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
811756dfd0 radeonsi: upload new descriptors when resident buffers are invalidated
When texture buffers are invalidated the addr in the resident
descriptor has to be updated but we can't create a new descriptor
because the resident handle has to be the same.

Instead, use the WRITE_DATA packet, which allows updating memory
directly, but graphics/compute have to be idle in case the GPU is
reading the descriptor.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
48fe8a6210 radeonsi: only decompress resident textures/images when used
When the current bound shaders don't use any bindless textures
or images, it's useless to decompress the resident resources.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
2c3a7d5840 radeonsi: track use of bindless samplers/images from tgsi_shader_info
This adds some new helper functions to know if the current draw
call (or dispatch compute) is using bindless samplers/images,
based on TGSI analysis.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
e1813a8635 radeonsi: decompress resident textures/images before graphics/compute
Similar to the existing decompression code path except that it
loops over the list of resident textures/images.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
d7e1a66bb5 radeonsi: decompress DCC for resident textures/images
Analogous to bound textures/images. We should also update the
resident descriptors and disable COMPRESSION_EN to avoid
useless DCC fetches, but I am postponing this optimization to a
separate series.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00
Samuel Pitoiset
a45e198e2d radeonsi: only add descriptors in presence of resident handles
This won't help much except for applications that use a ton
of resident handles. Though, this will reduce the winsys
overhead a little bit.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2017-06-14 10:04:36 +02:00