It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.
v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.
v2 (Jason Ekstrand):
- Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
it's i965-specific.
- Handle adding the params in brw_nir_lower_legacy_clipping
- Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Right now, all keys have two things in common: a program string ID and a
sampler_prog_key_data. I'd like to add another thing or two and need a
place to put it. This commit adds a new brw_base_prog_key struct which
contains those two common bits.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
One cannot write the URB arbitrarily and therefore the message
has to be carefully constructed. The clever tricks originate
from Kenneth and Jason, I'm just writing the patch.
Fixes GPU hangs on ICL with Vulkan CTS.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Move the scalar-region conversion from the IR to the generator, so it
doesn't affect the Gen11 path. We need the non-scalar regioning
for a later lowering pass that we are adding.
v2: Better commit message (Matt)
Reviewed-by: Matt Turner <mattst88@gmail.com>
We add two new texture sources for bindless surface and sampler handles.
Bindless surface handles are expected to be pre-shifted so that the
20-bit surface state table index is in the top 20 bits of the 32-bit
handle. This lets us avoid any extra shifts in the shader. Bindless
sampler handles are 32-byte aligned byte offsets from general state base
address. We use 32-byte aligned instead of 16-byte aligned to avoid
having to use more indirect messages than needed. It means we can't
tightly pack samplers but that's probably not a big deal.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
From "Alpha Coverage" section of SKL PRM Volume 7:
"If Pixel Shader outputs oMask, AlphaToCoverage is disabled in
hardware, regardless of the state setting for this feature."
From OpenGL spec 4.6, "15.2 Shader Execution":
"The built-in integer array gl_SampleMask can be used to change
the sample coverage for a fragment from within the shader."
From OpenGL spec 4.6, "17.3.1 Alpha To Coverage":
"If SAMPLE_ALPHA_TO_COVERAGE is enabled, a temporary coverage value
is generated where each bit is determined by the alpha value at the
corresponding sample location. The temporary coverage value is then
ANDed with the fragment coverage value to generate a new fragment
coverage value."
Similar wording could be found in Vulkan spec 1.1.100
"25.6. Multisample Coverage"
Thus we need to compute alpha to coverage dithering manually in shader
and replace sample mask store with the bitwise-AND of sample mask and
alpha to coverage dithering.
The following formula is used to compute final sample mask:
m = int(16.0 * clamp(src0_alpha, 0.0, 1.0))
dither_mask = 0x1111 * ((0xfea80 >> (m & ~3)) & 0xf) |
0x0808 * (m & 2) | 0x0100 * (m & 1)
sample_mask = sample_mask & dither_mask
Credits to Francisco Jerez <currojerez@riseup.net> for creating it.
It gives a number of ones proportional to the alpha for 2, 4, 8 or 16
least significant bits of the result.
GEN6 hardware does not have issue with simultaneous usage of sample mask
and alpha to coverage however due to the wrong sending order of oMask
and src0_alpha it is still affected by it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109743
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
The sampler will be ignored since the underlying 'ld_mcs' operation
won't use it, so just fill the field with 0 instead of the texture to
make it clearer that's the case.
This will also avoid is_high_sampler() to kick in unnecessarily, in
case we are using the operation for a texture with index >= 16.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This helps us to compact original instruction:
mul(8) g3<1>D g6<8,8,1>UD 0x00000006UD { align1 1Q };
So now we emit:
mul(8) g3<1>UD g6<8,8,1>UD 0x00000006UD { align1 1Q compacted };
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
There's no reason for us to emit it a pile of times and then have a
whole pass to clean it up. Just emit it once like we really want.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This generalizes the unlit centroid workaround so it's less code and now
supports SIMD32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
While we're here, we change to using horiz_offset() instead of abusing
half().
v2 (Jason Ekstrand):
- Use horiz_offset() instead of half()
Reviewed-by: Matt Turner <mattst88@gmail.com>
This prevents a crash in some arb_enhanced_layouts tests that would be
caused by the next commit.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When using multiple RT write messages to the same RT such as for
dual-source blending or all RT writes in SIMD32, we have to set the
"Last Render Target Select" bit on all write messages that target the
last RT but only set EOT on the last RT write in the shader.
Special-casing for dual-source blend works today because that is the
only case which requires multiple RT write messages per RT. When we
start doing SIMD32, this will become much more common so we add a
dedicated bit for it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This replaces the special magic opcodes which implicitly read inputs
with explicit use of the ATTR file.
v2 (Jason Ekstrand):
- Break into multiple patches
- Change the units of the FS ATTR to be in logical scalars
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
This should shut up some Valgrind errors during pre-regalloc
scheduling. The errors were harmless since they could only have led
to the estimation of the bank conflict penalty of an instruction
pre-regalloc, which is inaccurate at that point of the program
compilation, but no less accurate than the intended "return 0"
fall-back path. The scheduling pass is normally re-run after regalloc
with a well-defined grf_used value and accurate bank conflict
information.
Fixes: acf98ff933 "intel/fs: Teach instruction scheduler about GRF bank conflict cycles."
Reported-by: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Originally we tried to handle this case based on slots_valid. However,
there are a number of ways that this can go wrong. For one, we throw
away any trailing slots which either aren't written or are set to
VARYING_SLOT_PAD. Second, even if PSIZ is a valid slot, we may not
actually write anything there. Between the lot of these, it was
possible to end up in a case where we tried to do a regular URB write
but ended up with a length of 1 which is invalid. This commit moves it
to the end and makes it based on a new boolean flag urb_written.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Cc: mesa-stable@lists.freedesktop.org
It's 8 for everything except compute shaders. For compute shaders,
there's no need to duplicate the computation and it's just a possible
source of error.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Instead of requiring the caller of brw_compile_vs to figure it out, just
grow the param array on-demand.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This moves us away to the array of pointers model and onto a model where
each param is represented by a generic uint32_t handle. We reserve 2^16
of these handles for builtins that get generated by somewhere inside the
compiler and have well-defined meanings. Generic params have handles
whose meanings are defined by the driver.
The primary downside to this new approach is that it moves a little bit
of the work that we would normally do at compile time to draw time. On
my laptop this hurts OglBatch6 by no more than 1% and doesn't seem to
have any measurable affect on OglBatch7. So, while this may come back
to bite us, it doesn't look too bad.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
NIR calls these system values but they come in from the VF unit as
vertex data. It's terribly convenient to just be able to treat them as
such in the back-end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Commit e1af20f18a changed the shader_info
from being embedded into being just a pointer. The idea was that
sharing the shader_info between NIR and GLSL would be easier if it were
a pointer pointing to the same shader_info struct. This, however, has
caused a few problems:
1) There are many things which generate NIR without GLSL. This means
we have to support both NIR shaders which come from GLSL and ones
that don't and need to have an info elsewhere.
2) The solution to (1) raises all sorts of ownership issues which have
to be resolved with ralloc_parent checks.
3) Ever since 00620782c9, we've been
using nir_gather_info to fill out the final shader_info. Thanks to
cloning and the above ownership issues, the nir_shader::info may not
point back to the gl_shader anymore and so we have to do a copy of
the shader_info from NIR back to GLSL anyway.
All of these issues go away if we just embed the shader_info in the
nir_shader. There's a little downside of having to copy it back after
calling nir_gather_info but, as explained above, we have to do that
anyway.
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>