Commit Graph

116816 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
b8c4fb235e pan/midgard: Implement SIMD-aware dead code elimination
We would like to eliminate not just entire dead instructions, but also
dead components, which increases scheduler flexibility (since some
vector instructions can become scalar after eliminating dead
components). This also will allow better RA in the future.

Results are meh.

total instructions in shared programs: 3453 -> 3451 (-0.06%)
instructions in affected programs: 60 -> 58 (-3.33%)
helped: 2
HURT: 0

total bundles in shared programs: 1826 -> 1824 (-0.11%)
bundles in affected programs: 33 -> 31 (-6.06%)
helped: 2
HURT: 0

total quadwords in shared programs: 3144 -> 3144 (0.00%)
quadwords in affected programs: 0 -> 0
helped: 0
HURT: 0

total registers in shared programs: 321 -> 321 (0.00%)
registers in affected programs: 45 -> 45 (0.00%)
helped: 11
HURT: 11
helped stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
helped stats (rel) min: 16.67% max: 50.00% x̄: 39.70% x̃: 50.00%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 100.00% max: 100.00% x̄: 100.00% x̃: 100.00%
95% mean confidence interval for registers value: -0.45 0.45
95% mean confidence interval for registers %-change: -1.87% 62.18%
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 445 -> 447 (0.45%)
threads in affected programs: 2 -> 4 (100.00%)
helped: 1
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
6c4b97011b pan/midgard: Create dependency graph bytewise
This allows for vec16 dependencies in the scheduler, not that we have
any yet (thankfully).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
825f11e739 pan/midgard: Handle nontrivial masks in texture RA
The texture instruction has a mask we need to take into account.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
d1d3411ba5 pan/midgard: Implement per-byte liveness tracking
Now that we have notion of byte masks, liveness tracking can be updated
to reflect this extra granularity without loss of correctness.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
43fd730fc4 pan/midgard: Simplify mir_bytemask_of_read_components
There are easy ways to iterate sources!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
e9202ff3cb pan/midgard: Report byte masks for read components
Read component masks don't have a particular type associated, since the
type of the ALU operation may not match the type of the operands in
question. So let's generate byte masks instead, and update the rest of
the compiler to use byte masks when analyzing reads.

Preparation for mixed types.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
d079631248 pan/midgard: Add helpers for manipulating byte masks
There are essentially two formats of masks in play beginning with this
commit: masks per-channel and masks per-byte. The former make sense
within a given fixed-size instruction; the latter are
typesize-independent. It turns out you need the latter to meaningfully
manipulate instructions containing multiple sizes (which is quite
possible with ALU operations).

Similarly, we have mir_srcsize. We calculate the size of the source by
analyzing the size of the instruction itself and stepping down if there
is a half-modifier.

Finally, we have mir_round_bytemask_down, for when we want to take a
byte mask and "round it down" to a given component size, so that we can
use it as a component mask.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
e981b69484 pan/midgard: Implement OP_IS_STORE with table
..rather than open-coding.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
8e31b14858 pan/midgard: Tableize load/store ops
This will allow us to encode properties about the load/store ops like we
do for ALU ops. We include now properties about whether we have a store,
and if there are special cases on the load/store op. We also tag each
instruction by its natural size... this is probably not totally right,
but it's a start.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
5952add9a9 pan/midgard: Factor out mir_get_alu_src
This helper is used in a bunch of places ... might as well make that
common.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
f77ea9798d pan/midgard/disasm: Fix printing 8-bit/16-bit masks
The trick is realizing even with a destination override, the masks are encoded in the same mode as the
instruction itself, rather than stepping down. The override means that
the smaller type is used, but the mask is parsed as if it were the
higher type. Overriding down is down by printed by blinding doing this. Overriding up can be thought of as printing in the upper size, but shifting the alphabet to use the upper half, i.e. shifting xyzw to become abcd.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
d49fdca229 pan/midgard: Identify 64-bit atomic opcodes
They are symmetric to their 32-bit counterparts, just shifted.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-20 12:02:31 +00:00
Alyssa Rosenzweig
6601570ead pan/midgard: Debug mir_insert_instruction_after_scheduled
Add some comments explaining what's going on in a more natural flow in
order to solve the actual bug.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Fixes: 2d914ebe81 ("pan/midgard: Fix memory corruption in register spilling")
2019-10-20 12:02:31 +00:00
Christian Gmeiner
a6de05a968 etnaviv: keep track of buffer valid ranges for PIPE_BUFFER
This allows a write to proceed to an uninitialized part of a buffer
even when the GPU is using the previously-initialized portions.

Such a situation can be triggered with the following API usage example:

  glBufferSubData(..., offset, size, data1);
  glDrawArrays(...);
  // append new vertex data
  glBufferSubData(..., offset+size, size, data2);
  glDrawArrays(...);

Same is done for freedreno, nouveau and radeon.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-10-20 09:03:06 +00:00
Christian Gmeiner
eab6d75066 etnaviv: store updated usage in pipe_transfer object
Store the changed usage in the newly created transfer object.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2019-10-20 09:03:06 +00:00
Christian Gmeiner
cd4528563f etnaviv: fix code style
Fixes: 1194afdfe3 ("etnaviv: rework the stream flush to always go through the context flush")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-10-20 10:20:22 +02:00
Lionel Landwerlin
b30e01aef5 anv: fix memory leak on device destroy
v2: handle vma destruction if vkCreateDevice fails (Jordan)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1959
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-20 08:02:22 +00:00
Christian Gmeiner
f834656a41 etnaviv: fix compile warnings
Fixes: e5cc66dfad ("etnaviv: Rework locking")
Fixes: 1456aa61cc ("etnaviv: Rework resource status tracking")
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Jonathan Marek <jonathan@marek.ca>
2019-10-20 08:28:18 +02:00
Eric Anholt
d8741ad251 mesa: Redefine the RG formats as array formats.
This is the layout used in the GL API, and maps directly to PIPE
formats with no endianness trickery.  As with the LA change, this
fixes big-endian fetching from texbos.  Also cleans up some endian
shenanigans in shader images.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-20 04:39:48 +00:00
Eric Anholt
4f384ddf5f gallium: Drop the unused PIPE_FORMAT_A*L* formats.
Now that Mesa is also using an array format for LA, nothing was using
these.  (And, clearly, no HW driver had exposed them).

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-20 04:39:48 +00:00
Eric Anholt
6a819cabe8 mesa: Replace MESA_FORMAT_L8A8/A8L8 UNORM/SNORM/SRGB with an array format.
The array format is what the GL API wants (fixing texbos on
big-endian), and matches directly to gallium's corresponding array
format.  The only driver exposing A8L8 was radeon/r200 in big-endian,
where the HW's underlying format was trying to read as array and we
needed to flip things around to make our packed format come out right
(note that while the radeon format tables had both AL and LA,
ChooseTextureFormat would only pick one of them based on endianness).

v2: Don't make r200/radeon use endian swaps.
v3: Rebase on dropping the r200 _be/_le format table removal patch
v4: reword commit message to explain why we can drop both formats
    from radeon.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2019-10-20 04:39:48 +00:00
Eric Anholt
236b478b2e mesa: Replace the LA16_UNORM packed formats with one array format.
The array format is what the GL API wants (and we made a mistake in
the format returned for texbos on big-endian!), and it's exactly what
the gallium-side PIPE_FORMAT_L16A16 is.  The only downside is that
dri_util tries to fall back to sampling RG16 using LA16, which doesn't
have a match for big-endian any more.  No HW drivers supported A16L16
anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-20 04:39:48 +00:00
Eric Anholt
1165e3f360 radeon: Drop the unused first arg of OUT_BATCH_RELOC.
This was a trap when trying to figure out how to fit data bits into
the reloc.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-20 04:39:48 +00:00
Eric Anholt
2a548cf92f radeon: Fill in the TXOFFSET field containing the tile bits in our relocs.
The first arg to OUT_BATCH_RELOC is ignored, we actually wanted these
in the third arg.  They're always 0 so far, so it didn't matter.

v2: Reword commit message that I don't end up using the tile bits, but
    keep the commit as a cleanup anyway.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2019-10-20 04:39:48 +00:00
Eric Anholt
ecddabfa76 r100/r200: factor out txformat/txfilter setup from the TFP path.
No matter what, we deref the texFormat from the table, except for a
mistake in cpp=4 where we pulled a 0 out of the table either way.

v2: Rebase on dropping r200 table deduplication patch.

Reviewed-by: Marek Olšák <marek.olsak@amd.com> (v1)
2019-10-20 04:39:48 +00:00
Vasily Khoruzhick
7ceafa4b40 lima: fix PP stack size
PP stack size should be set to maximum PP stack size, not to stack size of
last shader.

Fixes: 27e7603c34 ("lima: fix ppir spill stack allocation")
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
2019-10-19 18:15:18 -07:00
Marijn Suijten
224b267282 freedreno/a5xx: enable a510
Kernel support for this GPU is added by the following series:
https://patchwork.kernel.org/project/linux-arm-msm/list/?series=187609
In particular https://patchwork.kernel.org/patch/11189953/

Tested on Sony Xperia X and X Compact.

Signed-off-by: Marijn Suijten <marijns95@gmail.com>
Tested-by: AngeloGioacchino Del Regno <kholk11@gmail.com>
2019-10-19 16:48:24 +02:00
Prodea Alexandru-Liviu
48d617118a Appveyor/Meson: Add build test of osmesa gallium
Signed-off-by: Prodea Alexandru-Liviu <liviuprodea@yahoo.com>
Acked-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-10-19 14:44:44 +00:00
Lionel Landwerlin
3f8f52b241 anv: fix vkUpdateDescriptorSets with inline uniform blocks
With inline uniform blocks descriptor, the meaning of descriptorCount
is a number of bytes to copy into the descriptor. Don't try to use
that size as an index into the descriptor table.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 43f40dc7cb ("anv: Implement VK_EXT_inline_uniform_block")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1195
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-19 13:16:40 +03:00
Rob Clark
1cea76274e freedreno/ir3: handle imad24_ir3 case in UBO lowering
Similiar to iadd, we can fold an added constant value from an imad24_ir3
into the load_uniform's constant offset.  This avoids some cases where
the addition of imad24_ir3 could otherwise be a regression in instr
count.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
d9424e5821 freedreno/ir3: add imul24 opcode
This maps to mul.s24

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
c7b8f16bee freedreno/ir3: optimize immed 2nd src to mad
We can't encode immed sources for cat3 (mad) instructions, but we can
use const in first or third src.  We handled this case already, but we
weren't considering that we could lower immed to const.

For manhattan:

  total instructions in shared programs: 35202 -> 34718 (-1.37%)
  instructions in affected programs: 14931 -> 14447 (-3.24%)
  helped: 90
  HURT: 0
  total full in shared programs: 2451 -> 2359 (-3.75%)
  full in affected programs: 653 -> 561 (-14.09%)
  helped: 69
  HURT: 2

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 15:08:54 -07:00
Rob Clark
666b6236f7 freedreno/ir3: add rule to generate imad24
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
5e08f070f0 nir: add nir_lower_amul pass
Lower amul to either imul or imul24, depending on whether 24b is enough
bits to calculate an offset within the thing being dereferenced.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-18 15:08:54 -07:00
Rob Clark
1bdde31392 nir: add address calc related opt rules
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
6320e37d4b nir: add amul instruction
Used for address/offset calculation (ie. array derefs), where we can
potentially use less than 32b for the multiply of array idx by element
size.  For backends that support `imul24`, this gives a lowering pass
an easy way to find multiplies that potentially can be converted to
`imul24`.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
0568761f8e nir: Add a new ALU nir_op_imul24
Some hardware can do 24b multiply in a single instruction, but not 32b.
However in most cases 24b is sufficient for address/offset calculation.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Eduardo Lima Mitev
bc2ccdc45a freedreno/ir3: Handle newly added opcode nir_op_imad24_ir3
Simply emit an ir3_MAD_S24 instruction in the backend.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Eduardo Lima Mitev
32e5fbf47c nir: Add a new ALU nir_op_imad24_ir3
ir3 compiler has a signed integer multiply-add instruction (MAD_S24)
that is used for different offset calculations in the backend.
Since we intend to move some of these calculations to NIR, we need
a new ALU op that can directly represent it.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
6ad442acae freedreno/ir3: rename mul.s/mul.u
to mul.s24/mul.u24, to better reflect that these are 24b multiply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
ad8167c1e0 nir/search: fix the PoT helpers
Otherwise, if the base type is (for example) uint32, we would
incorrectly think that PoT optimizations could not apply.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Jason Ekstsrand <jason@jleksrand.net>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
2019-10-18 15:08:54 -07:00
Rob Clark
f30c256ec0 freedreno/ir3: enable pre-fs texture fetch for a6xx
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
72048dd799 turnip: add support for pre-fs texture fetch
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
a5afcc76d5 freedreno/a6xx: add support for pre-fs texture fetch
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Hyunjun Ko
e9450ad27d freedreno/ir3: Add support for texture sampling pre-dispatch
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Eduardo Lima Mitev
2a0d45ae6c freedreno/ir3: Add a NIR pass to select tex instructions eligible for pre-fetch
The pass should run once at the end of shader compilation, for a4xx
onwards. It iterates texture sampling instructions and mark those
eligibile for pre-dispatch by changing the tex op from 'tex' to
'tex_prefetch'. An instruction is eligibile if:

* The coordinate is a vector where all its components come from a
  shader input.
* The order of the components match exactly that of the input (no
  swizzles).
* The instruction is in the 'main' function, and in the outer
  most-block.

The first two restrictions were arrived to empirically, so more
testing could tighten or loosen it.

The 3rd restriction is there to allow moving the instructions
eligible for pre-dispatch to the beginning of the shader, so
that we don't block the registers holding the result for too
long.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
7d4213fe88 freedreno/ir3: force i/j pixel to r0.x
It seems that pre-fs texture fetch only works if ij_pix ends up in r0.x.
I've tried unknown zero bits, to no avail, and blob also seems to force
r0.x when this feature is used.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
07e9bf564f freedreno/ir3: add pre-dispatch tex fetch to disasm
Useful to see in disassembly listing texture fetches that were moved to
pre-dispatch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
2b93eb9c76 freedreno/ir3: add dummy bary.f(ei) for pre-fs-fetch
If the only use of varyings is a pre-shader texture-fetch, we still need
to issue a bary.f with the end-input flag, otherwise we'll block further
VS invocations, as the hw will think varying storage is still busy.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00
Rob Clark
392a309a55 freedreno/ir3: fixup register footprint to account for prefetch
It is possible that the result of a pre-fs texture fetch is an output
(or partially an output) of the FS.  Sine the meta:tex_prefetch
instructions are dropped before the assembler, we need to account for
this when we fixup the register footprint.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-18 21:11:54 +00:00