Commit Graph

26 Commits

Author SHA1 Message Date
Sagar Ghuge
7ecfbd4f6d nir: Add alpha_to_coverage lowering pass
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.

v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask

v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler

v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-21 11:27:29 -07:00
Caio Marcelo de Oliveira Filho
c847bfaaf5 intel/fs/gen12: Add tests for scoreboard pass
Tests the combinations of cases of RAW, WAW and WAR hazards involving
both inorder and outoforder instructions.  Also tests that
dependencies combine and propagate correctly through control
flow (loops and conditionals).

v2: Add an extra test illustrating that the non-logical CFG edge
    between then-block and else-block is being taking into
    account.  (Curro)

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2019-10-17 10:02:35 -07:00
Francisco Jerez
265c7c8971 intel/fs/gen12: Introduce software scoreboard lowering pass.
Gen12+ hardware lacks the register scoreboard logic that used to
guarantee data coherency between register reads and writes in previous
generations.  This lowering pass runs after register allocation in
order to make up for it.

It works by performing global dataflow analysis in order to determine
the set of potential dependencies of every instruction in the shader,
and then inserts any required SWSB annotations and additional SYNC
instructions in order to guarantee data coherency.

v2: Drop unnecessary _safe list iteration (Caio).

v3: Temporarily workaround potential WaR hazard between FPU
    instruction and subsequent out-of-order write, pending
    clarification from the hardware team.  Drop redundant tracking of
    implicit access of acc0-1, since the hardware guarantees coherency
    of these (but not the other accumulators...).

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-11 12:24:16 -07:00
Francisco Jerez
25dd67099d intel/eu: Rework opcode description tables to allow efficient look-up by either HW or IR opcode.
This rewrites the current opcode description tables as a more compact
flat data structure.  The purpose is to allow efficient constant-time
look-up by either HW or IR opcode, which will allow us to drop the
hard-coded correspondence between HW and IR opcodes -- See the next
commits for the rationale.

brw_eu.c is now built as C++ source so we can take advantage of
pointers to member in order to make the look-up function work
regardless of the opcode_desc member used as look-up key.

v2: Optimize devinfo struct comparison (Caio)

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-11 12:24:16 -07:00
Eric Engestrom
178811d8f6 meson: drop unused dep_{thread,dl}
Unused as of last commit.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Eric Engestrom
d2d85b950d meson: replace libmesa_util with idep_mesautil
This automates the include_directories and dependencies tracking so that
all users of libmesa_util don't need to add them manually.

Next commit will remove the ones that were only added for that reason.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Tested-by: Vinson Lee <vlee@freedesktop.org>
2019-08-03 00:08:37 +00:00
Iago Toral Quiroga
3e377c68f8 intel/compiler: add a NIR pass to lower conversions
Some conversions are not directly supported in hardware and need to be
split in two conversion instructions going through an intermediary type.
Doing this at the NIR level simplifies a bit the complexity in the backend.

v2:
 - Consider fp16 rounding conversion opcodes
 - Properly handle swizzles on conversion sources.

v3
 - Run the pass earlier, right after nir_opt_algebraic_late (Jason)
 - NIR alu output types already have the bit-size (Jason)
 - Use 'is_conversion' to identify conversion operations (Jason)

v4:
 - Be careful about the intermediate types we use so we don't lose
   range and avoid incorrect rounding semantics (Jason)

Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com> (v1)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-04-18 11:05:18 +02:00
Kenneth Graunke
fad7801afd i965: Move program key debugging to the compiler.
The i965 driver has a bunch of code to compare two sets of program keys
and print out the differences.  This can be useful for debugging why a
shader needed to be recompiled on the fly due to non-orthogonal state
dependencies.  anv doesn't do recompiles, so we didn't need to share
this in the past - but I'd like to use it in iris.

This moves the bulk of the code to the compiler where it can be reused.
To make that possible, we need to decouple it from i965 - we can't get
at the brw program cache directly, nor use brw_context to print things.
Instead, we use compiler->shader_perf_log(), and simply pass in keys.

We put all of this debugging code in brw_debug_recompile.c, and only
export a single function, for simplicity.  I also tidied the code a
bit while moving it, now that it all lives in one file.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-04-16 09:01:15 -07:00
Jason Ekstrand
9d437f9482 intel/fs: Drop the fs_surface_builder
All of the actual abstraction (except possibly setting size_written)
happens as part of the logical opcodes.  The only thing that the surface
builder is providing at this point is extra levels of functions to call
through.  I'm going to be adding bindless image support soon and all the
extra abstraction here is just getting in the way.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-02-28 16:58:20 -06:00
Francisco Jerez
2c99c7a56c intel/fs: Remove existing lower_conversions pass.
It's redundant with the functionality provided by lower_regioning now.

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Francisco Jerez
efa4e4bc5f intel/fs: Introduce regioning lowering pass.
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling CHV/BXT restrictions manually all over the
visitor.  The same mechanism is leveraged to lower unsupported type
conversions easily, which obsoletes the lower_conversions pass.

v2: Give conditional modifiers the same treatment as predicates for
    SEL instructions in lower_dst_modifiers() (Iago).  Special-case a
    couple of other instructions with inconsistent conditional mod
    semantics in lower_dst_modifiers() (Curro).

Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2019-01-09 12:03:09 -08:00
Ian Romanick
440c051340 i965/vec4/dce: Don't narrow the write mask if the flags are used
In an instruction sequence like

            cmp(8).ge.f0.0 vgrf17:D, vgrf2.xxxx:D, vgrf9.xxxx:D
    (+f0.0) sel(8) vgrf1:UD, vgrf8.xyzw:UD, vgrf1.xyzw:UD

The other fields of vgrf17 may be unused, but the CMP still needs to
generate the other flag bits.

To my surprise, nothing in shader-db or any test suite appears to hit
this.  However, I have a change to brw_vec4_cmod_propagation that
creates cases where this can happen.  This fix prevents a couple dozen
regressions in that patch.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 5df88c20 ("i965/vec4: Rewrite dead code elimination to use live in/out.")
2018-12-17 13:47:06 -08:00
Dylan Baker
a999798daa meson: Add tests to suites
Meson test has a concepts of suites, which allow tests to be grouped
together. This allows for a subtest of tests to be run only (say only
the tests for nir). A test can be added to more than one suite, but for
the most part I've only added a test to a single suite, though I've
added a compiler group that includes nir, glsl, and glcpp tests.

To use this you'll need to invoke meson test directly, instead of ninja
test (which always runs all targets). it can be invoked as:
`meson test -C builddir --suite $suitename` (meson test has addition
options that are pretty useful).

Tested-By: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2018-11-20 09:09:22 -08:00
Jason Ekstrand
6339aba775 intel/compiler: Lower SSBO and shared loads/stores in NIR
We have a bunch of code to do this in the back-end compiler but it's
fairly specific to typed surface messages and the way we emit them.
This breaks it out into NIR were it's easier to do things a bit more
generally.  It also means we can easily share the code between the vec4
and FS back-ends if we wish.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2018-11-15 19:59:49 -06:00
Jason Ekstrand
37f7983bcc intel/compiler: Do image load/store lowering to NIR
This commit moves our storage image format conversion codegen into NIR
instead of doing it in the back-end.  This has the advantage of letting
us run it through NIR's optimizer which is pretty effective at shrinking
things down.  In the common case of rgba8, the number of instructions
emitted after NIR is done with it is half of what it was with the
lowering happening in the back-end.  On the downside, the back-end's
lowering is able to directly use predicates and the NIR lowering has to
use IFs.

Shader-db results on Kaby Lake:

    total instructions in shared programs: 15166910 -> 15166872 (<.01%)
    instructions in affected programs: 5895 -> 5857 (-0.64%)
    helped: 15
    HURT: 0

Clearly, we don't have that much image_load_store happening in the
shaders in shader-db....

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2018-08-29 14:04:02 -05:00
Mathieu Bridon
2ee1c86d71 meson: Build with Python 3
Now that all the build scripts are compatible with both Python 2 and 3,
we can flip the switch and tell Meson to use the latter.

Since Meson already depends on Python 3 anyway, this means we don't need
two different Python stacks to build Mesa.

Signed-off-by: Mathieu Bridon <bochecha@daitauha.fr>
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2018-08-10 15:15:09 -07:00
Matt Turner
f3833f1ca7 intel/compiler: Use gen_get_device_info() in test_eu_validate
Previously the unit test filled out a minimal devinfo struct. A previous
patch caused the test to begin assert failing because the devinfo was
not complete. Avoid this by using the real mechanism to create devinfo.

Note that we have to drop icl from the table, since we now rely on the
name -> PCI ID translation done by gen_device_name_to_pci_device_id(),
and ICL's PCI IDs are not upstream yet.

Fixes: f89e735719 ("intel/compiler: Check for unsupported register sizes.")
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com>
2018-03-16 13:20:21 -07:00
Dylan Baker
2083a14179 meson: Use dependencies for nir
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.

This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker
4ccb981673 meson: Use consistent style for tests
Don't use intermediate variables, use consistent whitespace.

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Dylan Baker
fbf192a67e meson: Use consistent style
Currently the meosn build has a mix of two styles:
arg : [foo, ...
       bar],

and
arg : [
  foo, ...,
  bar,
]

For consistency let's pick one. I've picked the later style, which I
think is more readable, and is more common in the mesa code base.

v2: - fix commit message

Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
2018-01-11 15:40:02 -08:00
Francisco Jerez
af2c320190 intel/fs: Implement GRF bank conflict mitigation pass.
Unnecessary GRF bank conflicts increase the issue time of ternary
instructions (the overwhelmingly most common of which is MAD) by
roughly 50%, leading to reduced ALU throughput.  This pass attempts to
minimize the number of bank conflicts by rearranging the layout of the
GRF space post-register allocation.  It's in general not possible to
eliminate all of them without introducing extra copies, which are
typically more expensive than the bank conflict itself.

In a shader-db run on SKL this helps roughly 46k shaders:

   total conflicts in shared programs: 1008981 -> 600461 (-40.49%)
   conflicts in affected programs: 816222 -> 407702 (-50.05%)
   helped: 46234
   HURT: 72

The running time of shader-db itself on SKL seems to be increased by
roughly 2.52%±1.13% with n=20 due to the additional work done by the
compiler back-end.

On earlier generations the pass is somewhat less effective in relative
terms because the hardware incurs a bank conflict anytime the last two
sources of the instruction are duplicate (e.g. while trying to square
a value using MAD), which is impossible to avoid without introducing
copies.  E.g. for a shader-db run on SNB:

   total conflicts in shared programs: 944636 -> 623185 (-34.03%)
   conflicts in affected programs: 853258 -> 531807 (-37.67%)
   helped: 31052
   HURT: 19

And on BDW:

   total conflicts in shared programs: 1418393 -> 987539 (-30.38%)
   conflicts in affected programs: 1179787 -> 748933 (-36.52%)
   helped: 47592
   HURT: 70

On SKL GT4e this improves performance of GpuTest Volplosion by 3.64%
±0.33% with n=16.

NOTE: This patch intentionally disregards some i965 coding conventions
      for the sake of reviewability.  This is addressed by the next
      squash patch which introduces an amount of (for the most part
      boring) boilerplate that might distract reviewers from the
      non-trivial algorithmic details of the pass.

The following patch is squashed in:

SQUASH: intel/fs/bank_conflicts: Roll back to the nineties.

Acked-by: Matt Turner <mattst88@gmail.com>
2017-12-07 15:56:06 -08:00
Matt Turner
821ec473a8 i965: Rename intel_asm_annotation -> brw_disasm_info
It was the only file named intel_* in the compiler.

Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-11-17 12:14:38 -08:00
Rob Clark
2207af032b meson: extract out variable for nir_algebraic.py
Also needed in freedreno/ir3.

Signed-off-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Engestrom <eric.engestrom@imgtec.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2017-10-24 15:33:40 -04:00
Jason Ekstrand
b1d1b7222a intel/compiler: Make brw_nir_lower_intrinsics compute-specific
It's already only ever called from brw_compile_cs and only handles
compute intrinsics.  Let's just make it CS-specific.  We can always
make it handle other stages again later if we want.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2017-10-12 22:39:30 -07:00
Dylan Baker
7a5a986ddd meson: convert gtest to an internal dependency
In truth gtest is an external dependency that upstream expects you to
"vendor" into your own tree. As such, it makes sense to treat it more
like a dependency than an internal library, and collect it's
requirements together in a dependency object.

v2: - include with -isystem instead of setting compiler args (Eric)

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2017-10-03 10:02:08 -07:00
Dylan Baker
d1992255bb meson: Add build Intel "anv" vulkan driver
This allows building and installing the Intel "anv" Vulkan driver using
meson and ninja, the driver has been tested against the CTS and has
seems to pass the same series of tests (they both segfault when the CTS
tries to run wayland wsi tests).

There are still a mess of TODO, XXX, and FIXME comments in here. Those
are mostly for meson bugs I'm trying to fix, or for additional things to
implement for other drivers/features.

I have configured all intermediate libraries and optional tools to not
build by default, meaning they will only be built if they're pulled in
as a dependency of a target that will actually be installed) this allows
us to avoid massive if chains, while ensuring that only the bits that
need to be built are.

v2: - enable anv, x11, and wayland by default
    - add configure option to disable valgrind
v3: - fix typo in meson_options (Nicholas)
v4: - Remove dead code (Eric)
    - Remove change to generator that was from v0 (Eric)
    - replace if chain with loop (Eric)
    - Fix typos (Eric)
    - define HAVE_DLOPEN for both libdl and builtin dl cases (Eric)
v5: - rebase on util string buffer implementation

Signed-off-by: Dylan Baker <dylanx.c.baker@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net> (v4)
2017-09-27 09:12:19 -07:00