The spec is quite clear this is not allowed:
From Section 4.4. (Layout Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers can appear in several forms of declaration.
They can appear as part of an interface block definition or
block member, as shown in the grammar in the previous section.
They can also appear with just an interface-qualifier to establish
layouts of other declarations made with that qualifier:
layout-qualifier interface-qualifier ;
Or, they can appear with an individual variable declared with
an interface qualifier:
layout-qualifier interface-qualifier declaration ;"
From Section 4.10 (Memory Qualifiers) of the GLSL 4.60 spec:
"Layout qualifiers cannot be used on formal function parameters,
and layout qualification is not included in parameter matching."
However on the Nvidia binary driver they actually fail to compile
if image function params don't have a layout qualifier. This results
in applications such as No Mans Sky using layout qualifiers on params.
I've submitted a CTS test to expose this problem in the Nvidia driver
but until that is resolved this patch will help Mesa drivers work
around the issue.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This extension provides new GLSL built-in function
beginFragmentShaderOrderingIntel() that guarantees
(taking wording of GL_INTEL_fragment_shader_ordering
extension) that any memory transactions issued by
shader invocations from previous primitives mapped to
same xy window coordinates (and same sample when
per-sample shading is active), complete and are visible
to the shader invocation that called
beginFragmentShaderOrderingINTEL().
One advantage of INTEL_fragment_shader_ordering over
ARB_fragment_shader_interlock is that it provides a
function that operates as a memory barrie (instead
of a defining a critcial section) that can be called
under arbitary control flow from any function (in
contrast the begin/end of ARB_fragment_shader_interlock
may only be called once, from main(), under no control
flow.
Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
because the closed driver exposes it.
It's equivalent to ARB_gpu_shader_int64.
In this patch, I did everything the same as we do for ARB_gpu_shader_int64.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The main purpose for having NV_fragment_shader_interlock
extension is because that extension is also for GLES31 while
the ARB extension is for GL only.
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Now that the elements version handles both cases, remove the
non-elements version.
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
This extension provides new GLSL built-in functions
beginInvocationInterlockARB() and endInvocationInterlockARB()
that delimit a critical section of fragment shader code. For
pairs of shader invocations with "overlapping" coverage in a
given pixel, the OpenGL implementation will guarantee that the
critical section of the fragment shader will be executed for
only one fragment at a time.
Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
- remove mtypes.h from most header files
- add main/menums.h for often used definitions
- remove main/core.h
v2: fix radv build
Reviewed-by: Brian Paul <brianp@vmware.com>
The compatibility and core tokens were not added until GLSL 1.50,
for GLSL 1.40 just assume all shaders built with a compat profile
are compat shaders.
Fixes rendering issues in Dawn of War II on radeonsi which has
enabled OpenGL 3.1 compat support.
Fixes: a0c8b49284 "mesa: enable OpenGL 3.1 with ARB_compatibility"
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105807
This fixes a bug in radeonsi where LLVM cannot handle the case where
a break exists but its not the last instruction in the block.
LLVM would fail with:
Terminator found in the middle of a basic block!
LLVM ERROR: Broken function found, compilation aborted!
Fixes: 96fe8834f5 "glsl_to_tgsi: do fewer optimizations with GLSLOptimizeConservatively"
Reviewed-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105317
This should end the drought of bits in the ast_type_qualifier object.
The bitset_t type works pretty much as a drop-in replacement for the
current uint64_t bitset.
The only catch is that the bitset_t type as defined in the previous
commit doesn't have a trivial constructor (because it has a
user-defined constructor), so it cannot be used as union member
without providing a user-defined constructor for the union (which
causes it in turn to be non-trivially constructible). This annoyance
could be easily addressed in C++11 by declaring the default
constructor of bitset_t to be the implicitly defined one -- IMO one
more reason to drop support for GCC 4.2-4.3.
The other minor change was required because glsl_parser_extras.cpp was
hard-coding the type of bitset temporaries as uint64_t, which (unlike
would have been the case if the uint64_t had been replaced with
e.g. an __int128) would otherwise have caused a build failure, because
the boolean conversion operator of bitset_t is marked explicit (if
C++11 is available), so the bitset won't be silently truncated down to
1 bit in order to use it to initialize the uint64_t temporaries
(yikes).
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
There are two callers of the constructor, and they are right next to
each other. Move the "#anon_struct" name handling to the parser so that
the conditional can be removed.
I've also deleted part of the comment (about the memory leak) because I
don't think it's quite accurate or relevant.
text data bss dec hex filename
8310399 269336 294072 8873807 87674f 32-bit i965_dri.so before
8310339 269336 294072 8873747 876713 32-bit i965_dri.so after
7845611 346552 420592 8612755 836b93 64-bit i965_dri.so before
7845579 346552 420592 8612723 836b73 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
There are two issues with the current implementation. First, it relies
on the layout(local_size_*) happening in the same shader as the main
function, and secondly it doesn't work for variable group sizes.
In both cases, the simplest fix is to move the setup of these derived
values to a later time, similar to how the gl_VertexID workarounds are
done. There already exist system values defined for both of the derived
values, so we use them unconditionally, and lower them after linking is
performed.
While we're at it, we move to using gl_LocalGroupSizeARB instead of
gl_WorkGroupSize for variable group sizes.
Also the dead code elimination avoidance can be removed, since there
can be situations where gl_LocalGroupSizeARB is needed but has not been
inserted for the shader with main function. As a result, the lowering
code has to insert its own copies of the system values if needed.
Reported-by: Stephane Chevigny <stephane.chevigny@polymtl.ca>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103393
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
c7affbf687 enabled GLSLOptimizeConservatively on some
drivers. The idea was to speed up compile times by running
the GLSL IR passes only once each time do_common_optimization()
is called. However loop unrolling can create a big mess and
with large loops can actually case compile times to increase
significantly due to a bunch of redundant if statements being
propagated to other IRs.
Here we make sure to clean things up before moving on.
There was no measureable difference in shader-db compile times,
but it makes compile times of some piglit tests go from a couple
of seconds to basically instant.
The shader-db results seemed positive also:
Totals:
SGPRS: 2829456 -> 2828376 (-0.04 %)
VGPRS: 1720793 -> 1721457 (0.04 %)
Spilled SGPRs: 7707 -> 7707 (0.00 %)
Spilled VGPRs: 33 -> 33 (0.00 %)
Private memory VGPRs: 3140 -> 2060 (-34.39 %)
Scratch size: 3308 -> 2180 (-34.10 %) dwords per thread
Code Size: 79441464 -> 79214616 (-0.29 %) bytes
LDS: 436 -> 436 (0.00 %) blocks
Max Waves: 558670 -> 558571 (-0.02 %)
Wait states: 0 -> 0 (0.00 %)
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Having this separate just makes the code harder to follow, and
requires an extra walk of the IR.
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
As a result, unnamed structs defined in different places of the program
are considered the same types if they have the same fields in the same
order.
This will simplify matching of global variables whose type is an unnamed
struct.
It also fixes a memory leak when the same shader containing unnamed
structs is compiled over and over again: instead of creating a new type
each time, the existing type is re-used.
Finally, this does have the effect that some previously rejected programs
are now accepted, such as:
struct {
float a;
} s1;
struct {
float a;
} s2;
s2 = s1;
C/C++ do not allow that, but GLSL does seem to want to treat unnamed
structs with the same fields as the same type at least during linking
(and apparently, some applications require it), so it seems odd to treat
them as different types elsewhere.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This changes the logic during the conversion of the declaration list
struct S {
...
} v;
from AST to IR, but should not change the end result.
When assigning the type of v, instead of looking `S' up in the symbol
table, we read the type from the member variable of ast_struct_specifier.
This change is necessary for the subsequent change to how anonymous types
are handled.
v2: remove a type override when redefining a structure; should be
the same type in that case anyway
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This adds bindless_sampler and bound_sampler (and respectively
bindless_image and bound_image) to the parser.
v3: - add an extra space in apply_bindless_qualifier_to_variable()
- fix indentation in merge_qualifier()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This moves the hashing of shader source for the cache lookup to before
the preprocessor. In our experience, shaders are unlikely to hash the
same after preprocessing if they didn't hash the same before, so we can
skip preprocessing for cache hits.
Improves Deus Ex start-up times with a warm cache from ~30 seconds to
~22 seconds.
Also fixes the leaking of state.
V2: fix indentation
v3: add the value of MESA_EXTENSION_OVERRIDE to the hash of the shader.
Tested-by (v2): Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Eric Anholt <eric@anholt.net>
Due to a max limit of 65,536 entries on the index table that we use to
decide if we can skip compiling individual shaders, it is very likely
we will have collisions.
To avoid doing too much work when the linked program may be in the
cache this patch delays calling the optimisations until link time.
Improves cold cache start-up times on Deus Ex by ~20 seconds.
When deleting the cache index to simulate a worst case scenario
of collisions in the index, warm cache start-up time improves by
~45 seconds.
V2: fix indentation, make sure to call optimisations on cache
fallback, make sure optimisations get called for XFB.
Tested-by: Grazvydas Ignotas <notasas@gmail.com>
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
This will allow to hash additional data into the cache keys or even
change the hashing algorithm easily, should we decide to do so.
v2: don't try to compute key (and crash) if cache is disabled
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Unused/unchecked by any of the callers.
v2: Fix the glsl cases that have crept in since v1
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Grazvydas Ignotas <notasas@gmail.com>
Because we optimistically skip compiling shaders if we have seen them
before we may need to compile them later at link time if they haven't
yet been use in a specific combination to create a program.
Rather than always recompiling we take advantage of the
gl_compile_status enum introduced in the previous patch to only
compile when we have previously skipped compilation.
This helps with regressions in app start-up times on cold cache
runs, compared with no cache.
Deus Ex: Mankind Divided start-up times:
cache disabled: ~3m15s
cold cache master: ~4m23s
cold cache with this patch: ~3m33s
Acked-by: Marek Olšák <marek.olsak@amd.com>
This will allow us to tell if a shader really has been compiled or
if the shader cache has just seen it before.
Acked-by: Marek Olšák <marek.olsak@amd.com>
Previously, when q.subroutine was set to 1, a new subroutine
declaration was added to the AST, while 0 meant a subroutine
definition has been detected by the parser.
Thus, setting the q.subroutine flag in both situations is
obviously wrong because a new type identifier is added instead
of trying to match the declaration. To fix it up, introduce
ast_type_qualifier::is_subroutine_decl() to differentiate
declarations and definitions easily.
This fixes a regression with:
arb_shader_subroutine/compiler/direct-call.vert
Cc: Mark Janes <mark.a.janes@intel.com>
Fixes: be8aa76afd ("glsl: remove unecessary flags.q.subroutine_def")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100026
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This bit is definitely not necessary because subroutine_list
can be used instead. This frees one more bit in the flags.q
struct which is nice because arb_bindless_texture will need
4 bits for the new layout qualifiers.
No piglit regressions found (including compiler tests) with
"-t subroutine".
v2: set the subroutine flag for validating illegal flags
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The scenario is:
glShaderSource
glCompileShader <-- deferred due to cache hit of shader
glShaderSource <-- with new source code
glAttachShader
glLinkProgram <-- no cache hit for program
At this point we need to compile the original source when we
fallback.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The hash key for glsl metadata is a hash of the hashes of each GLSL
source string.
This commit uses the put_key/get_key support in the cache put the SHA-1
hash of the source string for each successfully compiled shader into the
cache. This allows for early, optimistic returns from glCompileShader
(if the identical source string had been successfully compiled in the past),
in the hope that the final, linked shader will be found in the cache.
This is based on the intial patch by Carl.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Passes the newly added piglit test for this extension on i965.
V2: Fix comments by Ilia.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
Previously if you used MESA_GL_VERSION_OVERRIDE=3.3COMPAT, Mesa exposed
an OpenGL 3.3 compatibility profile context (with various unimplemented
features and bugs), but still refused to compile shaders with
#version 330 compatibility
This patch simply adds a small bit of plumbing to let that through.
Of course the same caveats apply: compatibility profile is still not
supported (and will not be supported), so there are no guarantees that
anything will work.
Tested-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
define __STDC_FORMAT_MACROS and include <inttypes.h> (same as
ir_builder_print_visitor.cpp already does).
Otherwise, some mingw build errors out (since
8e7e1ae036 and
bbce1c538d presumably) with:
src/compiler/glsl/ir_print_visitor.cpp:479:40: error: expected ‘)’ before ‘PRIu64’
case GLSL_TYPE_UINT64:fprintf(f, "%" PRIu64, ir->value.u64[i]); break;
(Note even with that fix I get other format specifier warnings:
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: unknown conversion type character ‘a’ in format [-Wformat=]
fprintf(f, "%a", ir->value.f[i]);
^
src/compiler/glsl/ir_print_visitor.cpp:473:47:
warning: too many arguments for format [-Wformat-extra-args]
but it still compiles at least)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>