While we're at it, make sure we error out if it's not supported when
required.
This brings us a bit closer to being able to test on SwiftShader, which
doesn't currently support KHR_external_memory_fd.
TGL will have separate tables for src0 and src1, so the shared function
will no longer make sense.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
The EU compaction unit test fuzzes the compaction code by flipping bits.
We use a simple skip_bits() function with a list of reserved bits to
ignore, but for more complex cases like invalid combinations of register
file:type, we need either machinery to check validity or for these
functions to simply inform us whether a combination was valid.
enum brw_reg_type a 4-bit field in brw_reg, so rather than expanding it
with an "INVALID" value, just return -1 and let the caller check for
that.
Scott suggested redefining unreachable() within the unit test to
longjmp() which would allow driver code like this to still use it and
allow the test to handle expected failures like this. If that plan works
out, I plan to revert this.
Mostly for vertex formats, but they are supported as texture formats too
(untested however).
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Rob Clark <robdclark@gmail.com>
If src/dst addresses are dw aligned and size is > 4 then we align
byte count to dw as well.
PAL implementation works like this.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Previously, the scheduler tried to move up instructions from below depending
VMEM instructions only to move them down again when scheduling the VMEM
instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
These got lost due to some refactoring.
Due to the way our scheduler works currently, for now
we add back the reorder flag for divergent loads only.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This patch changes VMEM scheduling in a way that they can only
be moved upwards by previous VMEM instructions but not downwards.
This way, it improves the order of VMEM instructions in relation
to their users.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Previously, we allowed all shaders to reduce the number of max_waves to as low as 5.
Restricting this on shaders with low register demand, increases the total number of waves
while the VMEM def-use distances hardly change.
This patch also changes the max number of move operations per MEM instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
This shaves around 4-5% off of a CPU-limited example running with the
Dawn WebGPU implementation.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
In 0e4a75f917, Ken added a flag brw_stage_prog_data which indicates
whether any UBO pulls ever occur. Unfortunately, he neglected to set
the bit in the vec4 back-end. This was fine at the time because the
optimization was intended for iris which does not support gen7 and using
the vec4 back-end on Gen8+ requires an environment variable. We want to
use this in Vulkan which does support Gen7 so we want the information
from the vec4 back-end as well as scalar.
Fixes: 0e4a75f917 "intel/compiler: Record whether any pull constant..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
RADV_PERFTEST=outooforder has been removed a while ago. This fixes
dumping the options into hang reports.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is actually a non-threaded implementation. I'd summarize this
as event-based submission.
When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submission
we no other dependencies we start to execute them immediately.
Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This does not fully do wait-before-submit, to be done in a follow
up patch.
For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.
Reviewed-by: Dave Airlie <airlied@redhat.com>
This is a function with timeout support for reading from the pipe
between processes used for secure compile.
Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This will be used in the following patch to support timeouts for
reading the pipe between processes.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>