Delete unnecessary virtual functions, we need just two. Refactor code
so the 'default behavior' logic (stderr and/or creating file) is not
duplicated.
Rename the virtuals so overrides don't hide the common convenience
functions. Finally, provide a variant of dump_instructions() with
a `FILE *` parameter.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23457>
In order to avoid switching pull constants to push constants and then
having to back to pull, compute the push ranges up-front. This way we
know by the time we emit code exactly what ranges are pushable. This is
a bit inefficient in the case where the "normal" push constants get
compacted. However, most apps don't use giant piles of dead uniforms
combined with substantial UBO use so this should be ok.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The way we handle spilling for fp64 in vec4 is to emit a series of MOVs
which swizzles the data around and then a pair of 32-bit spills. This
works great except that the next time we go to pick a spill reg, the
compiler isn't smart enough to figure out that the register has already
been spilled. Normally we do this by looking at the sources of spill
instructions (or destinations of fills) but, because it's separated from
the actual value by a MOV, we can't see it. This commit adds a new
opcode VEC4_OPCODE_MOV_FOR_SCRATCH which is identical to MOV in
semantics except that it lets RA know not to spill again.
Fixes: 82c69426a5 "i965/vec4: support basic spilling of 64-bit registers"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
We were using nir_tex_instr::dest_type to a glsl_type, then passing it
to emit_texture(), only to just check the number of components. Just
pass the number of components directly. This lets us delete
brw_glsl_base_type_for_nir_type, which was asserting with
nir_texop_all_samples_equal because it didn't handle bool32.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7989>
v2: Restore the gen == 10 hunk in brw_compile_vs (around line 2940).
This function is also used for scalar VS compiles. Squash in:
intel/vec4: Reindent after removing Gen8+ support
intel/vec4: Silence unused parameter warning in try_immediate_source
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> [v1]
Reviewed-by: Matt Turner <mattst88@gmail.com> [v1]
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6826>
These should be more accurate than the current cycle counts, since
among other things they consider the effect of post-scheduling passes
like the software scoreboard on TGL. In addition it will enable us to
clean up some of the now redundant cycle-count estimation
functionality in the instruction scheduler.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This introduces an analysis pass intended to estimate several
performance statistics of the shader, including cycle count latency
and throughput values, based on static modeling. It has instruction
performance information more comprehensive than the current scheduling
pass for all platforms between Gen4-11, and works on both the FS and
VEC4 back-end.
The most immediate purpose of this pass is to implement a heuristic
meant to determine whether using SIMD32 dispatch for a fragment shader
can be expected to help more than it hurts. In addition this will
allow the effect of passes run after scheduling (e.g. the TGL software
scoreboard pass and the VEC4 dependency control pass) to be visible in
shader-db statistics.
But that isn't the end of the story, other potential applications of
this pass (not part of this MR) I've been playing around with are:
- Implement a similar SIMD16 heuristic allowing the identification of
inefficient SIMD16 fragment shaders.
- Implement similar SIMD16 and SIMD32 heuristics for the compute
shader stage -- Currently compute shader builds always use the
SIMD16 shader if available and never use the SIMD32 shader unless
strictly necessary, which is suboptimal under certain conditions.
- Hook up to the instruction scheduler in order to improve the
accuracy of its timing information.
- Use as heuristic in order to drive the selection of scheduling
modes (Matt was experimenting with that).
- Plug to the TGL software scoreboard pass in order to implement a
more effective SBID token allocation algorithm, since in general
the optimal token allocation depends on the timings of all
instructions in the program.
- Use its bottleneck detection functionality in order to implement a
heuristic computing a more optimal bound for the number of fragment
shader threads executed in parallel (by adjusting the
MaximumNumberofThreadsPerPSD control of 3DSTATE_PS).
As a follow-up I'm planning to submit updated timing information for
Gen12 platforms -- Everything else required to support Gen12 like SWSB
handling is already included in this patch, but there were some IP
concerns regarding the TGL timing parameters since they cannot
currently be obtained with the documentation and hardware which is
publicly available. The timing parameters for any previous Gen7-11
platforms can be obtained by anyone by sampling the timestamp register
using e.g. shader_time, though I have some more convenient
instrumentation coming up.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
I think we're unanimous in wanting to drop nir_lower_to_source_mods.
It's a bit of complexity to handle in the backend, but perhaps more
importantly, would be even more complexity to handle in nir_search.
And, it turns out that since we made other compiler improvements in the
last few years, they no longer appear to buy us anything of value.
Summarizing the results from shader-db from this patch:
- Icelake (scalar mode)
Instruction counts:
- 411 helped, 598 hurt (out of 139,470 shaders)
- 99.2% of shaders remain unaffected. The average increase in
instruction count in hurt programs is 1.78 instructions.
- total instructions in shared programs: 17214951 -> 17215206 (<.01%)
- instructions in affected programs: 1143879 -> 1144134 (0.02%)
Cycles:
- 1042 helped, 1357 hurt
- total cycles in shared programs: 365613294 -> 365882263 (0.07%)
- cycles in affected programs: 138155497 -> 138424466 (0.19%)
- Haswell (both scalar and vector modes)
Instruction counts:
- 73 helped, 1680 hurt (out of 139,470 shaders)
- 98.7% of shaders remain unaffected. The average increase in
instruction count in hurt programs is 1.9 instructions.
- total instructions in shared programs: 14199527 -> 14202262 (0.02%)
- instructions in affected programs: 446499 -> 449234 (0.61%)
Cycles:
- 5253 helped, 5559 hurt
- total cycles in shared programs: 359996545 -> 360038731 (0.01%)
- cycles in affected programs: 155897127 -> 155939313 (0.03%)
Given that ~99% of shader-db remains unaffected, and the affected
programs are hurt by about 1-2 instructions - which are all cheap
ALU instructions - this is unlikely to be measurable in terms of
any real performance impact that would affect users.
So, drop them and simplify the backend, and hopefully enable other
future simplifications in NIR.
Reviewed-by: Eric Anholt <eric@anholt.net> [v1]
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4616>
This involves wrapping vec4_live_variables in a BRW_ANALYSIS object
and hooking it up to invalidate_analysis() so it's properly
invalidated. Seems like a lot of churn but it's fairly
straightforward. The vec4_visitor invalidate_ and
calculate_live_intervals() methods are no longer necessary after this
change.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This moves the following methods that are currently defined in
vec4_visitor (even though they are side products of the liveness
analysis computation) and are already implemented in
brw_vec4_live_variables.cpp:
> int var_range_start(unsigned v, unsigned n) const;
> int var_range_end(unsigned v, unsigned n) const;
> bool virtual_grf_interferes(int a, int b) const;
> int *virtual_grf_start;
> int *virtual_grf_end;
It makes sense for them to be part of the vec4_live_variables object,
because they have the same lifetime as other liveness analysis results
and because this will allow some extra validation to happen wherever
they are accessed in order to make sure that we only ever use
up-to-date liveness analysis results.
The naming of the virtual_grf_start/end arrays was rather misleading,
they were indexed by variable rather than by vgrf, this renames them
start/end to match the FS liveness analysis pass. The churn in the
definition of var_range_start/end is just in order to avoid a
collision between the start/end arrays and local variables declared
with the same name.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
The invalidate_analysis() method knows what analysis passes there are
in the back-end and calls their invalidate() method to report changes
in the IR. For the moment it just calls invalidate_live_intervals()
(which will eventually be fully replaced by this function) if anything
changed.
This makes all optimization passes invalidate DEPENDENCY_EVERYTHING,
which is clearly far from ideal -- The dependency classes passed to
invalidate_analysis() will be refined in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
brw_vec4.h (in particular vec4_visitor) is logically a user of the
live variables analysis pass, not the other way around.
brw_vec4_live_variables.h requires the definition of some VEC4 IR data
structures to compile, but those can be obtained directly from
brw_ir_vec4.h without including brw_vec4.h.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4012>
This commit is all annoying plumbing work which just adds support for a
new brw_compile_stats struct. This struct provides a binary driver
readable form of the same statistics we dump out to stderr when we
INTEL_DEBUG is set with a shader stage.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.
v2 (Jason Ekstrand):
- Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
it's i965-specific.
- Handle adding the params in brw_nir_lower_legacy_clipping
- Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
If a 3-source instruction uses immediate values 1.0 and -1.0, just load
1.0 into a register. Use the negation source modifier to get -1.0.
This has trivial impact now, but it prevents a few thousand regressions
on vec4 platforms with "nir/algebraic: Recognize open-coded flrp(-1, 1,
a) and flrp(1, -1, a)"
All Gen6 and Gen7 platforms had similar results. (Haswell shown)
total instructions in shared programs: 13487412 -> 13487406 (<.01%)
instructions in affected programs: 541 -> 535 (-1.11%)
helped: 6
HURT: 0
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 0.36% max: 2.08% x̄: 1.65% x̃: 1.80%
95% mean confidence interval for instructions value: -1.00 -1.00
95% mean confidence interval for instructions %-change: -2.33% -0.97%
Instructions are helped.
total cycles in shared programs: 376402564 -> 376402500 (<.01%)
cycles in affected programs: 10348 -> 10284 (-0.62%)
helped: 10
HURT: 1
helped stats (abs) min: 2 max: 26 x̄: 7.00 x̃: 2
helped stats (rel) min: 0.13% max: 2.05% x̄: 0.89% x̃: 0.79%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.29% max: 0.29% x̄: 0.29% x̃: 0.29%
95% mean confidence interval for cycles value: -11.72 0.08
95% mean confidence interval for cycles %-change: -1.20% -0.36%
Inconclusive result (value mean confidence interval includes 0).
No shader-db changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Effectivley unused since dd7135d55d ("intel/compiler: Use the flrp
lowering pass for all stages on Gen4 and Gen5"). I had intended to
remove this code as part of that series, but I forgot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The next patch replaces an unsigned bitfield with a plain unsigned,
which triggers gcc to begin warning on signed/unsigned comparisons.
Keeping this patch separate from the actual move allows bisectablity and
generates no additional warnings temporarily.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Everywhere we handle SSBO intrinsics, we have exactly the same pattern
for computing the index so we may as well make a helper for it. We also
add a get_nir_src_imm to vec4 and use it for SSBO offsets.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
A recent commit (see below) triggered some cases where conditional
modifier propagation and dead code elimination would cause a MAD
instruction like the following to be generated:
mad.l.f0 null, ...
Matt pointed out that fs_visitor::fixup_3src_null_dest() fixes cases
like this in the scalar backend. This commit basically ports that code
to the vec4 backend.
NOTE: I have sent a couple tests to the piglit list that reproduce this
bug *without* the commit mentioned below. This commit fixes those
tests.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Fixes: ee63933a7 ("nir: Distribute binary operations with constants into bcsel")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105704
These days, we're just passing a pointer to a prog_data field, which
we already have access to. We can just use it directly.
(In the past, it was a pointer to a separate value.)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The only thing still using it is INVOCATION_ID for geometry shaders.
That's easily enough inlined into the nir_intrinsic_load_invocation_id
handling code.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're already doing this in the FS back-end. This just does the same
thing in the vec4 back-end.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
[ Francisco Jerez: Drop useless vec4_visitor dependencies. Demote to
static stand-alone function. Don't write unused components in the
result. Use vec4_builder interface for register allocation. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Mostly a dummy git mv with a couple of noticable parts:
- With the earlier header cleanups, nothing in src/intel depends
files from src/mesa/drivers/dri/i965/
- Both Autoconf and Android builds are addressed. Thanks to Mauro and
Tapani for the fixups in the latter
- brw_util.[ch] is not really compiler specific, so it's moved to i965.
v2:
- move brw_eu_defines.h instead of brw_defines.h
- remove no-longer applicable includes
- add missing vulkan/ prefix in the Android build (thanks Tapani)
v3:
- don't list brw_defines.h in src/intel/Makefile.sources (Jason)
- rebase on top of the oa patches
[Emil Velikov: commit message, various small fixes througout]
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>