Commit Graph

62 Commits

Author SHA1 Message Date
Nanley Chery
2ae70329f5 intel: Move the D16 workarounds out of ISL
Implement the workarounds in anv and iris instead.

Before this commit, ISL unconditionally modified workaround registers
while filling out depth stencil state. To account for this, drivers
unconditionally stalled prior to emitting depth stencil packets. This
hurt performance.

By having the drivers perform the workarounds, they can choose when to
modify the relevant registers. The drivers now avoid emitting the
workaround for NULL depth buffers. This reduces stalls and leads to
better performance.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (the ISL/Anv bits)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (the Iris bits)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11454>
2021-08-20 17:50:35 +00:00
Kenneth Graunke
2616e15c01 iris: Rename bo->gtt_offset to bo->address
This is the virtual memory address of the buffer object.  Calling it the
BO's address is a lot more obvious than calling it an offset in one of
the now many graphics translation tables.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12206>
2021-08-11 08:05:00 +00:00
Nanley Chery
34dbbfdd14 anv,iris: Port the D16 workaround stalls to BLORP
Commit cd40110420 added stalls before register writes that occur when
drivers emit depth stencil packets. However, it only did so for
non-BLORP draw calls. Since those packets are sometimes emitted during
BLORP calls, add stalls there too.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4574
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10939>
2021-05-25 20:55:27 +00:00
Mark Janes
19ea0974da iris: Use const uploader for blorp vertex data
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10759>
2021-05-11 16:03:20 -07:00
Kenneth Graunke
bfe2c5f667 iris: only flush the render cache for aux changes, not format changes
Reviewed-by: Felix DeGrood <felix.j.degrood@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10217>
2021-04-15 04:54:40 +00:00
Anuj Phogat
9da8a55b08 intel: Rename GEN_GEN macro to GFX_VER
Commands used to do the changes:
export SEARCH_PATH="src/intel src/gallium/drivers/iris src/mesa/drivers/dri/i965"
grep -E "GEN_GEN" -rIl $SEARCH_PATH | xargs sed -ie "s/GEN_GEN/GFX_VER/g"

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9936>
2021-04-02 18:33:06 +00:00
Vinson Lee
7e412a5d33 iris: Fix typos.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9382>
2021-03-15 00:01:30 +00:00
Anuj Phogat
692472a376 intel: Rename "gen_" prefix used in common code to "intel_"
This patch renames functions, structures, enums etc. with "gen_"
prefix defined in common code.

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Anuj Phogat
733b0ee8cb intel: Rename files with gen_ prefix in common code to intel_
Changes in this patch include:
- Rename all files in src/intel/common path
- Update the filenames used in source and build files

Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9413>
2021-03-10 22:23:51 +00:00
Kenneth Graunke
f7d4ebbf86 iris: add hooks to call INTEL_MEASURE
These hooks were written in the initial IRIS_MEASURE implementation.
Minor changes by Mark Janes <markjanes@swizzler.org> to adapt to the
INTEL_MEASURE reimplementation.

Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7354>
2021-02-01 17:24:57 -08:00
Mark Janes
0b6209b908 blorp: add hook for INTEL_MEASURE
Saves the snapshot type within the blorp parameters.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7354>
2021-02-01 17:24:57 -08:00
Kenneth Graunke
939bc0c588 iris: Reconfigure the URB only if it's necessary or possibly useful
Reconfiguring the URB partitioning is likely to cause shader stalls,
as the dividing line between each stage's section of memory is moving.
(Technically, 3DSTATE_URB_* are pipelined commands, but that mostly
means that the command streamer doesn't need to stall.)  So it should
be beneficial to update the URB configuration less often.

If the previous URB configuration already has enough space for our
current shader's needs, we can just continue using it, assuming we
are able to allocate the maximum number of URB entries per stage.
However, if we ran out of URB space and had to limit the number of
URB entrties for a stage, and the per-entry size is larger than we
need, we should reconfigure it to try and improve concurrency.

So, we begin tracking the last URB configuration in the context,
and compare against that when updating shader variants.

Cuts 36% of the URB reconfigurations (excluding BLORP) from a
Shadow of Mordor trace, and 46% from a GFXBench Manhattan 3.0 trace.

One nice thing is that this removes the need to look at the old
prog_data when updating shaders, which should make it possible to
unbind shader variants without causing spurious URB updates.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8721>
2021-01-27 18:30:54 +00:00
Kenneth Graunke
02fe825a61 isl, anv, iris: Add a centralized helper to select MOCS based on usage
On Gen12+, we can enable additional caches in certain usage situations.
This routes that decision making to a central place in ISL, based on
surface usage flags, and updates both drivers to use it.  (i965 doesn't
need to change because it doesn't support Gen12.)

We continue handling the "external" decision via an anv_mocs() wrapper
for now, since we store that flag in anv_bo, which isl doesn't know
about.  (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
actually sure that would be cleaner.)

This patch should not have any functional nor performance effects, as
we continue selecting the exact same MOCS values for now.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
2020-10-19 19:18:11 +00:00
Kenneth Graunke
bfc1fd22cd iris: Delete useless #define
When I was bringing up the driver, I had BLORP use #ifdefs for
softpin-mode vs. relocation-mode.  That all got reworked during
review, but apparently this #define is still kicking around,
even though nothing uses it.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5610>
2020-06-23 10:25:31 -07:00
Francisco Jerez
8252bb0ec6 OPTIONAL: iris: Perform BLORP buffer barriers outside of iris_blorp_exec() hook.
The iris_blorp_exec() hook needs to be executed under a single
indivisible sync region, which means that in cases where we need to
emit a PIPE_CONTROL for a buffer barrier we won't be able to track the
subsequent commands separately from the previous commands, which will
prevent us from optimizing out subsequent PIPE_CONTROLs if we
encounter the same buffers again.  In particular I've encountered this
situation in some SynMark test-cases which perform lots of BLORP
operations with the same buffer bound as both source and destination
(in order to generate mipmaps): In such a scenario if the source
requires flushing we'd also end up flushing for the destination
redundantly, even though a single PIPE_CONTROL would have been
sufficient.

This avoids a 4.5% FPS regression in SynMark OglHdrBloom and a 3.5%
FPS regression in SynMark OglMultithread.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
2020-06-03 23:12:22 +00:00
Francisco Jerez
b928188493 iris: Open-code iris_cache_flush_for_read() and iris_cache_flush_for_depth().
These have become one-liners now so they can be easily inlined.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
2020-06-03 23:12:22 +00:00
Francisco Jerez
74c774dce9 iris: Remove render cache hash table-based synchronization.
The render cache hash table is now *mostly* redundant with the more
general seqno matrix-based cache tracking mechanism.  Most hash table
operations are now gone except for the format mismatch checks done in
iris_cache_flush_for_render().  Redundant code removed as a separate
patch for bisectability.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
2020-06-03 23:12:22 +00:00
Francisco Jerez
aa78d05a23 iris: Remove depth cache set tracking and synchronization.
The depth cache set is now redundant with the more general seqno
matrix-based cache tracking mechanism.  Removed as a separate patch
for bisectability.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
2020-06-03 23:12:22 +00:00
Francisco Jerez
eb5d1c2722 iris: Annotate all BO uses with domain and sequence number information.
Probably the most annoying patch to review from the whole series --
Mark every buffer object use as accessed through some caching domain
with the sequence number of the current synchronization section of the
batch.  The additional argument of iris_use_pinned_bo() makes sure I'd
have gotten a compile error if I had missed any buffer added to the
batch validation list.

There are only a few exceptions where a buffer is left untracked while
adding it to the validation list, justified below:

 - Batch buffers: These are strictly read-only for the moment.

 - BLORP buffer objects: Their seqnos are bumped manually at the end
   of iris_blorp_exec() instead, in order to avoid plumbing domain
   information through BLORP address combining.

 - Scratch buffers: The contents of these are strictly thread-local.

 - Shader images and SSBOs: Accesses of these buffers are explicitly
   synchronized at the API level.

v2: Opt out of tracking more aggressively (Ken): In addition to the
    above, surface states, binding tables, instructions and most
    dynamic states are now left untracked, which means a *lot* more BO
    uses marked IRIS_DOMAIN_NONE which need to be reviewed extremely
    carefully, since the cache tracker won't be able to provide any
    coherency guarantees for them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3875>
2020-06-03 23:12:22 +00:00
Francisco Jerez
46183a999b iris: Extend iris_context dirty state flags to 128 bits.
We're nearly out of dirty bits, and some patches pending review on
GitLab no longer apply due to that.  Make room for them by splitting
off shader stage-specific bits into a separate stage_dirty mask.

An alternative would be to split compute-related bits into a separate
mask, but that would prevent the '<< stage' indexing done in various
parts of the driver from working.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5279>
2020-06-03 22:22:19 +00:00
Lionel Landwerlin
07781f0afe iris: store workaround address
This will allow to select a different address later, leaving the
beginning of the buffer to some other use.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3203>
2020-05-20 15:58:22 +00:00
Lionel Landwerlin
0ff5b9e692 blorp: rename workaround address function
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3203>
2020-05-20 15:58:22 +00:00
Mike Blumenkrantz
91375f13ce iris: move iris_vtable to iris_screen
instead of inlining this into every context, now a struct is used in the screen
struct to reduce memory usage and simplify a couple of the methods

Closes: https://gitlab.freedesktop.org/kwg/mesa/-/issues/6
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4376>
2020-04-29 16:59:45 +00:00
Rafael Antognolli
a7de6f1321 iris: Split aux map initialization from invalidation.
We can write the aux map address only once during the batch
initialization, and then only invalidate it once we modify it.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4005>
2020-03-02 22:28:11 +00:00
Kenneth Graunke
4bac2fa3c6 iris: Fix BLORP vertex buffers to respect ISL MOCS settings
Fixes: a4da6008b6 ("iris: Use mocs from isl_dev.")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3720>
2020-02-21 16:44:53 -08:00
Jason Ekstrand
09e4c33085 intel/blorp: Always emit URB config on Gen7+
Previously, i965/iris tried to reuse the currently programmed URB config
if it was good enough for BLORP, rather than reprogramming it each time.
However, this will make some things harder on Gen12+ and we've not seen
any performance impact from emitting URB more frequently in ANV.

This makes the blorp <-> driver interface a bit simpler on Gen7+ because
now all the driver has to do is to provide the L3$ config rather than
trying to hand off URB re-config to blorp.

Cc: "20.0" mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3454>
2020-01-30 18:46:20 -06:00
Jason Ekstrand
a500a6b7f1 blorp: Pass the VB size to the VF cache workaround
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-12-05 10:59:10 -06:00
Kenneth Graunke
3fdf2bb313 iris: Disable VF cache partial address workaround on Gen11+
The vertex cache uses the full 48-bit address on Gen11+.  See the
documentation for 3DSTATE_VERTEX_BUFFERS, which describes the
workaround and lists it as pre-Icelake.

Interestingly, the docs don't mention index buffers as needing a
workaround at all.  So either we've been overzealous, or the docs
never got updated to record that.  Which begs the question of whether
the issue there was fixed, if there was one...

Cuts 40% of the PIPE_CONTROLs from Civilization VI's benchmark; appears
that it improves performance by about 1-2% on Icelake 8x8 (not frequency
locked).
2019-11-26 12:13:34 -08:00
Jordan Justen
2e6a7ced4d iris/gen12: Write GFX_AUX_TABLE base address register
Rework:
 * Move last_aux_map_state to iris_batch. (Nanley, Ken)

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-28 00:09:14 -07:00
Kenneth Graunke
0b7ecfdda5 iris: Implement the Broadwell NP Z PMA Stall Fix
This should help avoid stalls in the pixel mask array in certain
non-promoted depth cases.  It especially helps for Z16, as each bit
in the PMA corresponds to two pixels when using Z16, as opposed to
the usual one pixel.

Improves performance in GFXBench5 TRex by 22% (n=1).
2019-10-08 21:53:12 -07:00
Kenneth Graunke
706c9f2d60 iris: Skip double-disabling TCS/TES/GS after BLORP operations
BLORP always turns off TCS/TES/GS.  If regular drawing also has them
disabled (the overwhelmingly common case), then leaving them disabled
is just fine by us and we can skip dirtying them, as that would just
re-disable them a second time on the next draw.

If they are actually enabled, however, we do need to flag them.

Cuts 52% of the 3DSTATE_HS packets in an Aztec Ruins trace.
2019-09-19 07:56:15 -07:00
Kenneth Graunke
325e25d689 iris: Add support for the always_flush_cache=true debug option.
This can be useful for debugging missing flushes.
2019-09-09 11:55:27 -07:00
Francisco Jerez
026773397b iris/gen9: Optimize slice and subslice load balancing behavior.
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-08-12 13:17:58 -07:00
Kenneth Graunke
10560f8506 iris: Minor tidying 2019-07-03 22:24:44 -07:00
Kenneth Graunke
262787b9bc iris: Drop bo != NULL check from blorp 48b invalidate function.
There is always a BO.
2019-06-21 20:50:42 -05:00
Kenneth Graunke
5da37a826b Revert "iris: Don't check VF address high bits when there is no buffer."
This reverts commit db8f57a5cb.

This is bonkers.  There will always be a BO.
2019-06-21 20:50:42 -05:00
Kenneth Graunke
db8f57a5cb iris: Don't check VF address high bits when there is no buffer.
If there is no buffer, then it doesn't matter.  Leave the old stale
high bits in place (for next time) and don't bother invalidating.

Cuts 5.6% of the flushes in the Civilization VI demo on Kabylake GT2.
2019-06-20 13:32:16 -05:00
Kenneth Graunke
d4a4384b31 iris: Implement INTEL_DEBUG=pc for pipe control logging.
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush.  That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on.  It should make it easier to determine whether we're doing
too many flushes and why.
2019-06-20 13:32:15 -05:00
Kenneth Graunke
b5fa3abfc2 iris: Don't flag IRIS_DIRTY_URB after BLORP operations unless it changed
We already flag IRIS_DIRTY_URB when we change it, but we were
additionally flagging it on every BLORP operation, even if we didn't.
2019-05-26 17:45:18 -07:00
Kenneth Graunke
7d2b54e393 iris: Record state sizes for INTEL_DEBUG=bat decoding.
Felix noticed a crash when using INTEL_DEBUG=bat decoding.  It turned
out that we were sometimes placing variable length data near the end
of a buffer, and with the decoder guessing random lengths rather than
having an actual count, it was walking off the end and crashing.  So
this does more than improve the decoder output.

Unfortunately, this is a bit more complicated than i965's handling,
because we don't have a single state buffer.  Various places upload
data via u_upload_mgr, and so there isn't a central place to record
the size.  We don't need to catch every single place, however, since
it's only important to record variable length packets (like viewports
and binding tables).

State data also lives arbitrarily long, rather than being discarded on
every batch like i965, so we don't know when to clear out old entries
either.  (We also don't have a callback when an upload buffer is
released.)  So, this tracking may space leak over time.  That's probably
okay though, as this is only a debugging feature and it's a slow leak.
We may also get lucky and overwrite existing entries as we reuse BOs,
though I find this unlikely to happen.

The fact that the decoder works in terms of offsets from a state base
address is also not ideal, as dynamic state base address and surface
state base address differ for iris.  However, because dynamic state
addresses start from the top of a 4GB region, and binding tables start
from addresses [0, 64K), it's highly unlikely that we'll get overlap.

We can always improve this, but for now it's better than what we had.
2019-05-23 08:07:08 -07:00
Sagar Ghuge
bbef6c2d5f iris: Flag fewer dirty bits in BLORP
v2: 1) Skip flagging IRIS_DIRTY_DEPTH_BUFFER if
       BLORP_BATCH_NO_EMIT_DEPTH_STENCIL is set (Kenneth Graunke)
    2) Add missing flags (Kenneth Graunke)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-11 22:46:39 -07:00
Sagar Ghuge
bca28deb46 iris: Track last VS URB entry size
Return immediately if last VS URB entry size is good enough for BLORP
operation

v2: Fix comments (Caio)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke<kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-08 10:01:39 -08:00
Sagar Ghuge
d0a8fba69a iris: Refactor code to share 3DSTATE_URB_* packet
v2: 1) Set IRIS_DIRTY_URB bit (Caio)
    2) Get rid of unnecessary function (Caio)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-03-08 10:01:39 -08:00
Kenneth Graunke
3bcb1a7fcd iris: Don't whack SO dirty bits when finishing a BLORP op
Re-emitting 3DSTATE_SO_BUFFERS can be hazardous, as it could zero
offsets.  Plus, it's just not necessary - BLORP doesn't change these.
2019-02-21 10:26:11 -08:00
Kenneth Graunke
61798e3c88 iris: CS stall on VF cache invalidate workarounds
See commit 31e4c9ce40 in i965.
2019-02-21 10:26:11 -08:00
Dave Airlie
823609b1a3 iris/WIP: add broadwell support
This adds all the state changes, MOCS changes,
2019-02-21 10:26:11 -08:00
Kenneth Graunke
154e3e45bb iris: better MOCS 2019-02-21 10:26:11 -08:00
Kenneth Graunke
5dbd6df9f7 iris: Do the 48-bit vertex buffer address invalidation workaround 2019-02-21 10:26:10 -08:00
Kenneth Graunke
525d89cafc iris: delete dead code 2019-02-21 10:26:09 -08:00
Kenneth Graunke
2b956a093a iris: totally untested icelake support 2019-02-21 10:26:08 -08:00