Commit Graph

13589 Commits

Author SHA1 Message Date
Zoë Blade
05e7f7f438 Fix a few typos
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Marek Olšák
db2415189a radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISION
Required because of a VI hw bug.

Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:57:07 +02:00
Marek Olšák
bed98eef9a radeonsi: remove deprecated and useless registers
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
393b0e0531 radeonsi: remove useless includes
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
d8269be1ce gallium/radeon: print winsys info with R600_DEBUG=info
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
ecc7f2ed91 gallium/radeon: don't crash when getting out-of-bounds TEMP references
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-23 16:14:39 +02:00
Dave Airlie
8a41cd2407 softpipe: fix stencil write to use an integer value
This fixes a number of regressions since
61393bdcdc
u_tile: fix stencil texturing tests under softpipe

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-23 08:32:30 +10:00
Rob Clark
cb24d3b7ad freedreno: misc minor cleanups
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
1b58d8c2bf freedreno/a4xx: (partial) gl_FragCoord.zw
The bit to enable .z is still commented out, as it is triggering gpu
hangs in 0ad.  But at least gl_FragCoord.w works now, and we know what
bits we are *supposed* to set for .z (with that uncommented all piglit
fragcoord tests are passing).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
a869183123 freedreno/a4xx: primitive-restart
This was the missing bit to get dolphin-emu working on a4xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
632ea2a113 freedreno/nir: sysval fixes
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
13527df143 freedreno/a4xx: wire up integer texture sampling
Similar to a3xx, the compiler needs to know the return type of the sam,
etc, instructions.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
48a651e98c freedreno/a4xx: formats updates/fixes
Update formats table with new formats that Ilia has figured out, and fix
sampling from srgb texture and integer vbo's.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
21ceedfd8b freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:27 -04:00
Emil Velikov
86919352e3 android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
depreciated for quite some time.

Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-22 14:23:28 +01:00
Emil Velikov
413bc0a618 ilo: remove unused include from Android.mk
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
2015-04-22 14:18:47 +01:00
Ilia Mirkin
0904774af1 freedreno/a3xx: enable polymode setting with non-fill modes
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin
6357601628 freedreno/a3xx: fix integer and 32-bit float border colors
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin
6895c3554e freedreno/a3xx: add support for float R/RG render targets
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Rob Clark
95e68adcd9 freedreno/ir3/nir: few little fixes
isaml needs to scale up coords based on LoD.  Also fix bogus bary.f
varying # when there are non-bary frag shader inputs.  And use sub.s of
a positive immediate rather than add.s of negative (since CP is better
about figuring out that those can be collapsed into the cat2 instr).

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark
efbf14e893 freedreno/ir3/nir: lower if/else
For now, completely flatten if/else blocks.  That will almost certainly
change once we have flow control.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark
e5e11b5baf freedreno/a4xx: support for large shaders
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:50 -04:00
Rob Clark
20ea698c49 freedreno: update generated headers
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:44 -04:00
Rob Clark
57f0d3b3c6 freedreno/ir3/nir: UBO support
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:36 -04:00
Rob Clark
87807e5cc5 freedreno/ir3: move out helper
We'll also want it in NIR f/e for implementing UBO support.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:28 -04:00
Rob Clark
70b2f872ea freedreno/a4xx: sysvals and UBOs
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.

Also, fix scheduling for mem instructions.  This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:18 -04:00
Marek Olšák
b79c620663 radeonsi: add a debug option to compile shaders when they're created
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-16 18:36:29 +02:00
Emil Velikov
a7d018accf radeonsi: remove bogus r600-- triple
As mentioned by Michel Dänzer for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--), as we setup the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage - radeon_llvm_compile()

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-16 14:15:19 +01:00
Glenn Kennard
17d69862a9 r600g/sb: Skip empty ALU clause while scheduling
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.

Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-16 12:43:20 +10:00
Eric Anholt
b229e6c7de vc4: Don't try to use color load/stores to blit across format changes.
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888.  Fixes
fbo-copyteximage regressions.
2015-04-15 16:50:23 -07:00
Eric Anholt
cff2e08c4c vc4: Don't try to use color load/stores to do depth/stencil blits.
Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which
does blits to work around baselevel/lastlevel).
2015-04-15 16:50:23 -07:00
Eric Anholt
3a728d4dfb vc4: Update the shadow texture for public textures on every draw.
We don't know who else has written to it, so we'd better update it every
time.  This makes the gears spin in X again.
2015-04-15 16:50:23 -07:00
Eric Anholt
bd957b1b79 vc4: Hook up VC4_DEBUG=perf to some useful printfs. 2015-04-15 16:50:22 -07:00
Tom Stellard
e0994e0f97 radeon/llvm: Improve codegen for KILL_IF
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:12 +00:00
Tom Stellard
c6d79ed289 radeon/llvm: Run LLVM's instruction combining pass
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.

shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine.  They're mostly caused by the scheduler
making different decisions than before.

shader-db stats (bonaire):

979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave

Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave

Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)

*** BY PERCENTAGE ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

*** BY UNIT ***

Max Increase:

SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Max Decrease:

SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:05 +00:00
Tom Stellard
2569c7109d radeonsi: Add header and footer to shader stat dump
This makes it easier to parse.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:36:59 +00:00
Eric Anholt
1be329e64c vc4: Add a blitter path using just the render thread.
This accelerates the path for generating the shadow tiled texture when
asked to sample from a raster texture (typical in glamor).
2015-04-13 23:20:46 -07:00
Eric Anholt
76d56752cc vc4: Allow submitting jobs with no bin CL in validation.
For blitting, we want to fire off an RCL-only job.  This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
2015-04-13 23:20:45 -07:00
Eric Anholt
43b20795b7 vc4: Move the blit code to a separate file.
There will be other blit code showing up, and it seems like the place
you'd look.
2015-04-13 23:20:45 -07:00
Eric Anholt
e214a59635 vc4: Separate out a bit of code for submitting jobs to the kernel.
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
2015-04-13 23:20:45 -07:00
Eric Anholt
44b63cf5c0 vc4: When asked to sample from a raster texture, make a shadow tiled copy.
So, it turns out my simulator doesn't *quite* match the hardware.  And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that.  Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.

Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
2015-04-13 22:34:06 -07:00
Eric Anholt
d04b07f8e2 vc4: Fix off-by-one in branch target validation. 2015-04-13 22:34:06 -07:00
Eric Anholt
7fa2f2e366 vc4: Use NIR-level lowering for idiv.
This fixes the idiv tests in piglit.
2015-04-13 21:36:40 -07:00
Eric Anholt
84ebaff1b7 vc4: Add a bunch of type conversions.
These are required to get piglit's idiv tests working.  The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
2015-04-13 21:36:40 -07:00
Eric Anholt
adae027260 vc4: Use the blit interface for updating shadow textures.
This lets us plug in a better blit implementation and have it impact the
shadow update, too.
2015-04-13 10:39:24 -07:00
Eric Anholt
39b6f7e76c vc4: Remove dead fields from vc4_surface. 2015-04-13 10:39:24 -07:00
Eric Anholt
5100221ff7 vc4: Skip sending down the clear colors if not clearing. 2015-04-13 10:39:24 -07:00
Eric Anholt
725620f21d vc4: Sync with kernel changes to relax BCL versus RCL validation.
There was no reason to tie the two packets' values together.
2015-04-13 10:39:23 -07:00
Eric Anholt
cb88d2cfcb vc4: Fix another space allocation mistake.
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
2015-04-13 10:39:02 -07:00
Eric Anholt
8eb9304ee7 vc4: Add missed accounting for the size of the semaphore.
This wouldn't have mattered except in the worst case scenario RCL setup.
2015-04-13 10:33:30 -07:00