This adds a NIR pass that decides which portions of UBOS we should
upload as push constants, rather than pull constants.
v2: Switch to uint16_t for the UBO block number, because we may
have a lot of them in Vulkan (suggested by Jason). Add more
comments about bitfield trickery (requested by Matt).
v3: Skip vec4 stages for now...I haven't finished wiring up support
in the vec4 backend, and so pushing the data but not using it
will just be wasteful.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Soon, we're going to start providing UBO data to shaders as push
constants, rather than requiring them to issue pull loads. The
3DSTATE_CONSTANT_* commands require 32 byte aligned pointers.
So, we need to increase this from 16 to 32.
Reviewed-by: Matt Turner <mattst88@gmail.com>
By default, 3DSTATE_CONSTANT_* Constant Buffer 0 is relative to dynamic
state base address. This makes it unusable for pushing UBOs. I'd like
to be able to use all four push buffers.
There is a bit in the INSTPM register (or CS_DEBUG_MODE2 on Skylake)
which controls whether buffer 0 is relative to dynamic state base
address, or simply a normal pointer. Setting that gives us full
flexibility.
We can't currently write this on Haswell and earlier, and will need
to update the kernel command parser, and then do the whole version
checking song and dance.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When writing a region of a buffer via glBufferSubData(), we can write
the data asynchronously if the destination doesn't contain any data.
Even if it's busy, the data was undefined, so the new data is fine too.
Removes all stall avoidance blits on BufferSubData calls in
"Total War: WARHAMMER" on my Skylake GT4.
Decreases the number of stall avoidance blits in Manhattan 3.1:
- Skylake GT4: -18.3544% +/- 6.76483% (n=13)
- Apollolake: -12.1095% +/- 5.24458% (n=13)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This doesn't do anything yet, but soon we'll want to know whether an
access to a buffer section may write that data, or simply reads it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Merge the code with gen6+ 3DSTATE_GS, and delete brw_gs_state.c,
together with brw_gs_unit_state.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Since we always call brw_batch_emit anyways, we can hopefully make things
simpler by calling it only once, and then branching inside its body. This
can be helpful when bringing the gen4-5 code into this function.
Additionally, check for GEN_GEN == 6 instead of < 7 in cases that won't apply
to lower gens.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This function only emits a particular case of 3DSTATE_GS. Instead, we can do
that inside genX(upload_gs_state), and later reuse part of that code for
emitting gen4-5 state.
There's the additional benefit of allowing us to remove gen6_gs_state.c, which
was only left because of this function.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Use set_blend_entry_bits and set_depth_stencil_bits to fill most of the
color calc struct, and then manually update the rest.
v2:
- Always check for depth_irb (Ken)
- Always set Backface Stencil Ref (Ken)
- Always set alpha reference value (Ken)
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
gen6+ uses _mesa_base_format_has_channel() to check for the alpha
channel, while gen4-5 use ctx->DrawBuffer->Visual.alphaBits. By using
_mesa_base_format_has_channel() here we keep the same behavior accross
all gen.
While initially both ways of checking the alpha channel seemed correct
to me, this change also seems to fix fbo-blending-formats piglit test on
gen4.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add a helper function to reuse code that fills blend entry related
state, and make genX(upload_blend_state) use it. This function can later
be used by gen4-5 color calc state to set the blend related bits.
Signed-off-by: Rafael Antognolli <rafael.antognolli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Gen4-5 basically glue DEPTH_STENCIL_STATE, COLOR_CALC_STATE, and
BLEND_STATE together into a single COLOR_CALC_STATE structure.
By making a helper function, we'll be able to reuse it when filling
out Gen4-5 COLOR_CALC_STATE without replicating any actual logic.
We use generation-defined typedef to handle the polymorphism.
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
If we allow the size to be more than 2^32, then we should compute it
in 64bit arithmetic otherwise we might run into overflow issues.
CID: 1412892, 1412891
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The compact flag doesn't make sense on local variables, since the
packing on them is up to the driver. This fixes nir_validate assertions
in some cases, particularly when lower_io_to_temporaries is used on
per-vertex inputs/outputs.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
While normally we give variables whose name field is NULL a temporary
name when called from nir_print_shader(), when we were calling from
nir_print_instr() we never bothered, meaning that we just segfaulted
when trying to print out instructions with such a variable. Since
nir_print_instr() is meant to be called while debugging, we don't need
to bother too much about giving a consistent name, but we don't want to
crash in the middle of debugging.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It's a bit rare, but blorp can trigger a urb reconfiguration. When
that happens, we need to re-upload the URB config. Previoulsy blorp
would set BRW_NEW_URB_SIZE, but this is a pretty big hammer as it
would cause back-to-black blorp operations to reconfigure both times.
Using BRW_NEW_BLORP is a small, more accurate hammer.
v2 (idr): Sort BRW_NEW_ tokens to match brw_recalculate_urb_fence and
gen6_urb.
v3 (idr): Don't whack BRW_NEW_URB_SIZE in blorp. Suggested by Jason.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Add support for 32-bit RGBX/RGBA formats which are required for Android.
The original patch (commit ccdcf91104) was reverted (commit
c0c6ca40a2) in mesa as it broke GLX resulting in swapped colors. Based
on further investigation by Chad Versace, moving the RGBX/RGBA configs
to the end is enough to prevent breaking GLX.
The handling of RGBA/RGBX in dri_fill_st_visual is a fix from Marek
Olšák.
Cc: Eric Anholt <eric@anholt.net>
Cc: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Previous check-ins without testing with USE_SIMD16_FRONTEND have
introduced regressions. This fixes the build, not the regressions.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Core will ensure hot tiles are loaded for read and write render targets,
and will skip all output merger for read-only render targets.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
Forwarding from the ES prolog to the ES just barely exceeds the current
maximum array size when 16 vertex attributes are used. Give it a decent
bump to account for merged shaders having up to 32 user SGPRs.
Fixes a crash in GL45-CTS.multi_bind.draw_bind_vertex_buffers.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If the application hasn't done any drawing since the last call, we
would reuse the same back buffer which was used for the previous swap,
which may not have completed yet. This could result in various issues
such as tearing or application hangs.
In the normal case, the behaviour is unchanged.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97957
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101683
Cc: mesa-stable@lists.freedesktop.org
[Michel Dänzer: Make Thomas' fix from bugzilla actually work as
intended, write commit log]
Any form of CCS on gen9+ only works on Y-tiled images. The only caller
of create_for_bo which uses Y-tiled BOs is create_for_dri_image.
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
Reviewed-by: Chad Versace <chadversary@chromium.org>