Zoë Blade
05e7f7f438
Fix a few typos
...
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2015-04-27 17:28:29 +03:00
Marek Olšák
db2415189a
radeonsi: set an optimal value for DB_Z_INFO.ZRANGE_PRECISION
...
Required because of a VI hw bug.
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:57:07 +02:00
Marek Olšák
bed98eef9a
radeonsi: remove deprecated and useless registers
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
393b0e0531
radeonsi: remove useless includes
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
d8269be1ce
gallium/radeon: print winsys info with R600_DEBUG=info
...
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-27 15:56:27 +02:00
Marek Olšák
ecc7f2ed91
gallium/radeon: don't crash when getting out-of-bounds TEMP references
...
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-23 16:14:39 +02:00
Dave Airlie
8a41cd2407
softpipe: fix stencil write to use an integer value
...
This fixes a number of regressions since
61393bdcdc
u_tile: fix stencil texturing tests under softpipe
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89960
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-23 08:32:30 +10:00
Rob Clark
cb24d3b7ad
freedreno: misc minor cleanups
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
1b58d8c2bf
freedreno/a4xx: (partial) gl_FragCoord.zw
...
The bit to enable .z is still commented out, as it is triggering gpu
hangs in 0ad. But at least gl_FragCoord.w works now, and we know what
bits we are *supposed* to set for .z (with that uncommented all piglit
fragcoord tests are passing).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
a869183123
freedreno/a4xx: primitive-restart
...
This was the missing bit to get dolphin-emu working on a4xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
632ea2a113
freedreno/nir: sysval fixes
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
13527df143
freedreno/a4xx: wire up integer texture sampling
...
Similar to a3xx, the compiler needs to know the return type of the sam,
etc, instructions.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
48a651e98c
freedreno/a4xx: formats updates/fixes
...
Update the formats table with new formats that Ilia has figured out, and fix
sampling from sRGB textures and integer VBOs.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:28 -04:00
Rob Clark
21ceedfd8b
freedreno: update generated headers
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-22 13:20:27 -04:00
Emil Velikov
86919352e3
android: use LOCAL_SHARED_LIBRARIES over TARGET_OUT_HEADERS
...
... to manage the LIBDRM*_CFLAGS. The former is the recommended approach
by the Android build system developers while the latter has been
deprecated for quite some time.
Cc: "10.4 10.5" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-22 14:23:28 +01:00
Emil Velikov
413bc0a618
ilo: remove unused include from Android.mk
...
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Chih-Wei Huang <cwhuang@linux.org.tw>
2015-04-22 14:18:47 +01:00
Ilia Mirkin
0904774af1
freedreno/a3xx: enable polymode setting with non-fill modes
...
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin
6357601628
freedreno/a3xx: fix integer and 32-bit float border colors
...
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Ilia Mirkin
6895c3554e
freedreno/a3xx: add support for float R/RG render targets
...
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-18 17:35:23 -04:00
Rob Clark
95e68adcd9
freedreno/ir3/nir: few little fixes
...
isaml needs to scale up coords based on LoD. Also fix bogus bary.f
varying # when there are non-bary frag shader inputs. And use sub.s of
a positive immediate rather than add.s of negative (since CP is better
about figuring out that those can be collapsed into the cat2 instr).
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
Rob Clark
efbf14e893
freedreno/ir3/nir: lower if/else
...
For now, completely flatten if/else blocks. That will almost certainly
change once we have flow control.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 11:40:14 -04:00
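Note on the if/else lowering above: flattening means both sides of the branch are always evaluated and the result is picked with a select, so no flow control is required. The following is only a minimal C sketch of that idea, not the ir3 code; the function names are made up for illustration, and it assumes neither side of the branch has side effects.

```c
#include <stdint.h>

/* Branching form: only one side of the if/else is evaluated. */
static int32_t with_branch(int32_t cond, int32_t a, int32_t b)
{
    if (cond)
        return a + b;
    else
        return a - b;
}

/* Flattened form: both sides are evaluated unconditionally and the
 * condition only chooses which result to keep. The ternary stands in
 * for a select instruction (hypothetical; not the actual ir3 opcode). */
static int32_t flattened(int32_t cond, int32_t a, int32_t b)
{
    int32_t then_val = a + b;   /* "then" block, always executed */
    int32_t else_val = a - b;   /* "else" block, always executed */
    return cond ? then_val : else_val;
}
```

Both forms return the same value for any input; the flattened one trades extra ALU work for the absence of branches.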
Rob Clark
e5e11b5baf
freedreno/a4xx: support for large shaders
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:50 -04:00
Rob Clark
20ea698c49
freedreno: update generated headers
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:44 -04:00
Rob Clark
57f0d3b3c6
freedreno/ir3/nir: UBO support
...
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:36 -04:00
Rob Clark
87807e5cc5
freedreno/ir3: move out helper
...
We'll also want it in NIR f/e for implementing UBO support.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:28 -04:00
Rob Clark
70b2f872ea
freedreno/a4xx: sysvals and UBOs
...
Basically just sync up the cmdstream emit parts to match the changes
already done on a3xx.
Also, fix scheduling for mem instructions. This is needed on a4xx, and
I am a bit surprised it isn't needed for a3xx.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
2015-04-17 10:40:18 -04:00
Marek Olšák
b79c620663
radeonsi: add a debug option to compile shaders when they're created
...
Tested-by: Tom Stellard <thomas.stellard@amd.com>
2015-04-16 18:36:29 +02:00
Emil Velikov
a7d018accf
radeonsi: remove bogus r600-- triple
...
As mentioned by Michel Dänzer, for LLVM >= 3.6 we create the
LLVMTargetMachine (with triple amdgcn--) as we set up the radeonsi
context. For older LLVM or hardware (r600) the triple is always r600--
and is created at a later stage, in radeon_llvm_compile().
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
2015-04-16 14:15:19 +01:00
Glenn Kennard
17d69862a9
r600g/sb: Skip empty ALU clause while scheduling
...
Fixes assert triggered by
ext_transform_feedback-intervening-read output use_gs
piglit test.
Signed-off-by: Glenn Kennard <glenn.kennard@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-04-16 12:43:20 +10:00
Eric Anholt
b229e6c7de
vc4: Don't try to use color load/stores to blit across format changes.
...
We could potentially support the right combination of 8888 to 565, but the
important thing for now is to not mix up our orderings of 8888. Fixes
fbo-copyteximage regressions.
2015-04-15 16:50:23 -07:00
Eric Anholt
cff2e08c4c
vc4: Don't try to use color load/stores to do depth/stencil blits.
...
Fixes regressions in fbo-generatemipmap-formats on depth/stencil (which
does blits to work around baselevel/lastlevel).
2015-04-15 16:50:23 -07:00
Eric Anholt
3a728d4dfb
vc4: Update the shadow texture for public textures on every draw.
...
We don't know who else has written to it, so we'd better update it every
time. This makes the gears spin in X again.
2015-04-15 16:50:23 -07:00
Eric Anholt
bd957b1b79
vc4: Hook up VC4_DEBUG=perf to some useful printfs.
2015-04-15 16:50:22 -07:00
Tom Stellard
e0994e0f97
radeon/llvm: Improve codegen for KILL_IF
...
Rather than emitting one kill instruction per component of KILL_IF's src
reg, we now or the components of the src register together and use the
result as a condition for just one kill instruction.
shader-db stats (bonaire):
979 shaders
Totals:
SGPRS: 34872 -> 34848 (-0.07 %)
VGPRS: 20696 -> 20676 (-0.10 %)
Code Size: 749032 -> 748452 (-0.08 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave
Totals from affected shaders:
SGPRS: 1184 -> 1160 (-2.03 %)
VGPRS: 600 -> 580 (-3.33 %)
Code Size: 13200 -> 12620 (-4.39 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Increases:
SGPRS: 2 (0.00 %)
VGPRS: 0 (0.00 %)
Code Size: 0 (0.00 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Decreases:
SGPRS: 5 (0.01 %)
VGPRS: 5 (0.01 %)
Code Size: 25 (0.03 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
*** BY PERCENTAGE ***
Max Increase:
SGPRS: 32 -> 40 (25.00 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 116 -> 96 (-17.24 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
*** BY UNIT ***
Max Increase:
SGPRS: 64 -> 72 (12.50 %)
VGPRS: 0 -> 0 (0.00 %)
Code Size: 0 -> 0 (0.00 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 32 -> 24 (-25.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 424 -> 356 (-16.04 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:12 +00:00
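Note on the KILL_IF change above: the message describes replacing one kill per source component with a single kill whose condition is the OR of all components. The toy C model below only illustrates that difference; the real change operates on LLVM IR. The names emit_kill and kills_emitted are hypothetical, and the "component < 0" test assumes the usual TGSI KILL_IF semantics.

```c
#include <stdbool.h>

/* Counts how many kill instructions this toy model "emits". */
static int kills_emitted;

/* Hypothetical stand-in for emitting one kill instruction. */
static bool emit_kill(bool cond)
{
    kills_emitted++;
    return cond;
}

/* Old scheme: one kill instruction per component of the source. */
static bool kill_if_per_component(const float src[4])
{
    bool killed = false;
    for (int i = 0; i < 4; i++)
        killed |= emit_kill(src[i] < 0.0f);
    return killed;
}

/* New scheme: OR the per-component conditions, then emit one kill. */
static bool kill_if_combined(const float src[4])
{
    bool any_negative = false;
    for (int i = 0; i < 4; i++)
        any_negative |= (src[i] < 0.0f);
    return emit_kill(any_negative);
}
```

Both variants discard the same fragments; the combined one needs a single kill instead of four.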
Tom Stellard
c6d79ed289
radeon/llvm: Run LLVM's instruction combining pass
...
This should improve code quality in general and will help with some
future changes to how we emit kill instructions.
shader-db shows a few regressions, but these don't seem to be the result
of deficiencies in instcombine. They're mostly caused by the scheduler
making different decisions than before.
shader-db stats (bonaire):
979 shaders
Totals:
SGPRS: 35056 -> 34872 (-0.52 %)
VGPRS: 20624 -> 20696 (0.35 %)
Code Size: 764372 -> 749032 (-2.01 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 12288 -> 12288 (0.00 %) bytes per wave
Totals from affected shaders:
SGPRS: 13264 -> 13072 (-1.45 %)
VGPRS: 8248 -> 8316 (0.82 %)
Code Size: 486320 -> 470992 (-3.15 %) bytes
LDS: 11 -> 11 (0.00 %) blocks
Scratch: 11264 -> 11264 (0.00 %) bytes per wave
Increases:
SGPRS: 6 (0.01 %)
VGPRS: 20 (0.02 %)
Code Size: 14 (0.01 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
Decreases:
SGPRS: 32 (0.03 %)
VGPRS: 8 (0.01 %)
Code Size: 244 (0.25 %)
LDS: 0 (0.00 %)
Scratch: 0 (0.00 %)
*** BY PERCENTAGE ***
Max Increase:
SGPRS: 32 -> 48 (50.00 %)
VGPRS: 12 -> 20 (66.67 %)
Code Size: 216 -> 224 (3.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 40 -> 32 (-20.00 %)
VGPRS: 16 -> 12 (-25.00 %)
Code Size: 368 -> 280 (-23.91 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
*** BY UNIT ***
Max Increase:
SGPRS: 32 -> 48 (50.00 %)
VGPRS: 28 -> 36 (28.57 %)
Code Size: 39320 -> 40132 (2.07 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Max Decrease:
SGPRS: 72 -> 64 (-11.11 %)
VGPRS: 48 -> 40 (-16.67 %)
Code Size: 6272 -> 5852 (-6.70 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Scratch: 0 -> 0 (0.00 %) bytes per wave
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:37:05 +00:00
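Note on reading the shader-db tables above: the figures in parentheses appear to be plain relative changes, 100 * (after - before) / before, rounded to two decimals. A small C sketch of that arithmetic follows; the helper name pct_change is made up, and treating a zero baseline as 0.00 % is an assumption based on the "0 -> 0 (0.00 %)" rows.

```c
/* Relative change in percent between two shader-db totals.
 * A zero baseline is reported as 0.0 (assumption, see note above). */
static double pct_change(double before, double after)
{
    if (before == 0.0)
        return 0.0;
    return 100.0 * (after - before) / before;
}
```

For example, the "Code Size: 764372 -> 749032" total works out to roughly -2.007, which matches the reported -2.01 % after rounding.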
Tom Stellard
2569c7109d
radeonsi: Add header and footer to shader stat dump
...
This makes it easier to parse.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2015-04-14 13:36:59 +00:00
Eric Anholt
1be329e64c
vc4: Add a blitter path using just the render thread.
...
This accelerates the path for generating the shadow tiled texture when
asked to sample from a raster texture (typical in glamor).
2015-04-13 23:20:46 -07:00
Eric Anholt
76d56752cc
vc4: Allow submitting jobs with no bin CL in validation.
...
For blitting, we want to fire off an RCL-only job. This takes a bit of
tweaking in our validation and the simulator support (and corresponding
new code in the kernel).
2015-04-13 23:20:45 -07:00
Eric Anholt
43b20795b7
vc4: Move the blit code to a separate file.
...
There will be other blit code showing up, and it seems like the place
you'd look.
2015-04-13 23:20:45 -07:00
Eric Anholt
e214a59635
vc4: Separate out a bit of code for submitting jobs to the kernel.
...
I want to be able to have multiple jobs being set up at the same time (for
example, a render job to do a little fixup blit in the course of doing a
render to the main FBO).
2015-04-13 23:20:45 -07:00
Eric Anholt
44b63cf5c0
vc4: When asked to sample from a raster texture, make a shadow tiled copy.
...
So, it turns out my simulator doesn't *quite* match the hardware. And the
errata about raster textures tells you most of what's wrong, but there's
still stuff wrong after that. Instead, if we're asked to sample from
raster, we'll just blit it to a tiled temporary.
Raster textures should only be screen scanout, and word is that it's
faster to copy to tiled using the tiling engine first than to texture from
an entire raster texture, anyway.
2015-04-13 22:34:06 -07:00
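Note on the shadow tiled copy above: a raster texture stores pixels row by row, while a tiled texture stores each small block of pixels contiguously. The commit does the conversion with a GPU blit; the CPU loop below is only a sketch of the address remapping involved. The 4x4 tile size and the function names are hypothetical and do not describe the real VC4 tiling formats, and width and height are assumed to be multiples of the tile size.

```c
#include <stddef.h>
#include <stdint.h>

#define TILE_W 4  /* hypothetical tile width in pixels */
#define TILE_H 4  /* hypothetical tile height in pixels */

/* Index of pixel (x, y) in a layout where each tile is stored
 * contiguously and tiles are ordered row by row. */
static size_t tiled_offset(uint32_t x, uint32_t y, uint32_t width)
{
    uint32_t tiles_per_row = width / TILE_W;
    uint32_t tile_index = (y / TILE_H) * tiles_per_row + (x / TILE_W);
    uint32_t within_tile = (y % TILE_H) * TILE_W + (x % TILE_W);
    return (size_t)tile_index * (TILE_W * TILE_H) + within_tile;
}

/* Copy a raster (row-major) image into the tiled layout. */
static void raster_to_tiled(uint32_t *dst, const uint32_t *src,
                            uint32_t width, uint32_t height)
{
    for (uint32_t y = 0; y < height; y++)
        for (uint32_t x = 0; x < width; x++)
            dst[tiled_offset(x, y, width)] = src[(size_t)y * width + x];
}
```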
Eric Anholt
d04b07f8e2
vc4: Fix off-by-one in branch target validation.
2015-04-13 22:34:06 -07:00
Eric Anholt
7fa2f2e366
vc4: Use NIR-level lowering for idiv.
...
This fixes the idiv tests in piglit.
2015-04-13 21:36:40 -07:00
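Note on the idiv lowering above: hardware without an integer divide instruction needs division rewritten in terms of simpler operations. The C sketch below shows one generic way to do that, shift-and-subtract long division with the sign handled separately. It is not claimed to be the algorithm the NIR lowering pass uses; the function names are made up, and the result for a zero divisor is left unspecified.

```c
#include <stdint.h>

/* Unsigned division using only shifts, compares and subtracts.
 * The remainder is kept in 64 bits so the shift cannot overflow. */
static uint32_t udiv_lowered(uint32_t n, uint32_t d)
{
    uint32_t q = 0;
    uint64_t r = 0;
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((n >> bit) & 1u);  /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= 1u << bit;
        }
    }
    return q;
}

/* Signed division: divide the magnitudes, then restore the sign.
 * Truncates toward zero, like C's / operator. */
static int32_t idiv_lowered(int32_t n, int32_t d)
{
    uint32_t un = n < 0 ? 0u - (uint32_t)n : (uint32_t)n;
    uint32_t ud = d < 0 ? 0u - (uint32_t)d : (uint32_t)d;
    uint32_t uq = udiv_lowered(un, ud);
    return ((n < 0) != (d < 0)) ? (int32_t)(0u - uq) : (int32_t)uq;
}
```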
Eric Anholt
84ebaff1b7
vc4: Add a bunch of type conversions.
...
These are required to get piglit's idiv tests working. The
unsigned<->float conversions are wrong, but are good enough to get
piglit's small ranges of values working.
2015-04-13 21:36:40 -07:00
Eric Anholt
adae027260
vc4: Use the blit interface for updating shadow textures.
...
This lets us plug in a better blit implementation and have it impact the
shadow update, too.
2015-04-13 10:39:24 -07:00
Eric Anholt
39b6f7e76c
vc4: Remove dead fields from vc4_surface.
2015-04-13 10:39:24 -07:00
Eric Anholt
5100221ff7
vc4: Skip sending down the clear colors if not clearing.
2015-04-13 10:39:24 -07:00
Eric Anholt
725620f21d
vc4: Sync with kernel changes to relax BCL versus RCL validation.
...
There was no reason to tie the two packets' values together.
2015-04-13 10:39:23 -07:00
Eric Anholt
cb88d2cfcb
vc4: Fix another space allocation mistake.
...
We're over-allocating our BCL in vc4_draw.c, so this never mattered.
However, new RCL-only blit support might end up here without having set up
any BCL contents.
2015-04-13 10:39:02 -07:00
Eric Anholt
8eb9304ee7
vc4: Add missed accounting for the size of the semaphore.
...
This wouldn't have mattered except in the worst case scenario RCL setup.
2015-04-13 10:33:30 -07:00