Commit Graph

55600 Commits

Author SHA1 Message Date
Christoph Bumiller
1ed507ca46 nv50/ir/opt: CALLs cannot load 2013-03-12 12:55:35 +01:00
Christoph Bumiller
c893b94060 nv50/ir: add support for indirect BRA,CALL 2013-03-12 12:55:34 +01:00
Christoph Bumiller
efe55075b5 nvc0/ir/emit: implement move to and logic ops on predicates 2013-03-12 12:55:34 +01:00
Christoph Bumiller
ce7610f7d5 nvc0/ir/emit: implement surface related ops 2013-03-12 12:55:34 +01:00
Christoph Bumiller
3741b7d844 nv50/ir: initialize CodeEmitters' specialized target fields 2013-03-12 12:55:34 +01:00
Christoph Bumiller
b0fc2f13ec nv50/ir/opt: make optimization aware of atomics, barriers, surface ops 2013-03-12 12:55:34 +01:00
Christoph Bumiller
22b762f9b4 nv50/ir: add various new OPs that will be needed for compute 2013-03-12 12:55:34 +01:00
Francisco Jerez
c82714c593 nv50/ir: Rename "mkLoad" to "mkLoadv" for consistency. 2013-03-12 12:55:34 +01:00
Christoph Bumiller
cc30ce8160 nv50/ir: fix comparison of system values 2013-03-12 12:55:34 +01:00
Francisco Jerez
4ddfdcea04 nv50/ir/tgsi: Translate grid-related system parameters. 2013-03-12 12:55:34 +01:00
Francisco Jerez
8446c31d0e nv50/ir/tgsi: Accept COMPUTE programs. 2013-03-12 12:55:34 +01:00
Christoph Bumiller
e9294e11b4 nv50/ir/ra: make sure all used function inputs get assigned a reg
A live range [0, 0) counts as empty. For function inputs this can
be a problem, so insert a nop at the beginning to make it [0, 1).
This is a bit of a hack but also the most simple solution.
2013-03-12 12:55:34 +01:00
Christoph Bumiller
ee431b12ec nv50/ir/ra: also add pre-existing MERGE,SPLIT to constraint list 2013-03-12 12:55:34 +01:00
Christoph Bumiller
f1dfa414f4 nv50/ir/ra: fix confusion with conditional RegisterSet::occupy 2013-03-12 12:55:34 +01:00
Christoph Bumiller
d995f44f0b nv50/ir/ra: swap copyCompound args if src is compound and dst isn't 2013-03-12 12:55:33 +01:00
Francisco Jerez
95ad9bca2f nv50/ir/ra: Fix maxGPR calculation for programs with multiple functions. 2013-03-12 12:55:33 +01:00
Francisco Jerez
ca04e71024 nv50/ir/ra: Fix traversal before the beginning of the active list in buildRIG. 2013-03-12 12:55:33 +01:00
Francisco Jerez
fe17d8a7c0 nv50/ir/ra: Fix RegisterSet::occupy(const Value *v). 2013-03-12 12:55:33 +01:00
Francisco Jerez
49ded0e132 nv50/ir/ra: Fix argument const-ness in RegisterSet::idToUnits and idToBytes 2013-03-12 12:55:33 +01:00
Francisco Jerez
5959d4247a nv50/ir/opt: Fix tryPropagateBranch for BBs with several exit branches.
Comments and "if (bf->cfg.incidentCount() == 1)" condition added
by Christoph Bumiller.
2013-03-12 12:55:33 +01:00
Francisco Jerez
572bf83ec0 nv50/ir: Clean up references to function values before destroying them. 2013-03-12 12:55:33 +01:00
Francisco Jerez
12f65e38c0 nouveau: Bail out from nouveau_fence_wait if flushing the pushbuf fails. 2013-03-12 12:55:33 +01:00
Vinson Lee
543d032885 mesa: Use correct functions for enum conversion.
Fixes mixing enum types defects reported by Coverity.

Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2013-03-11 23:44:10 -07:00
Rob Clark
6173cc19c4 freedreno: gallium driver for adreno
Currently works on a220.  Others in the a2xx family look pretty similar
and should be pretty straightforward to support with the same driver.

The a3xx has a new shader ISA, and while many registers appear similar,
the register addresses have been completely shuffled around.  I am not
sure yet whether it is best to support with the same driver, but
different compiler, or whether it should be split into a different
driver.

v1: original
v2: build file updates from review comments, and remove GPL licensed
    header files from msm kernel
v3: smarter temp/pred register assignment, fix clear and depth/stencil
    format issues, resource_transfer fixes, scissor fixes

Signed-off-by: Rob Clark <robdclark@gmail.com>
2013-03-11 21:53:24 -04:00
José Fonseca
44a8e51354 d3d1x: Remove.
Unused/unmaintained.

Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
2013-03-12 00:35:06 +00:00
José Fonseca
7db60f049f nv50: Remove nv0_ir_from_sm4.*
Unused, depends on d3d1x.

Reviewed-by: Christoph Bumiller <e0425955@student.tuwien.ac.at>
2013-03-12 00:35:06 +00:00
Roland Scheidegger
5c41d1c222 gallivm: clean up passing derivatives around
Previously, the derivatives were calculated and passed in a packed form
to the sample code (for implicit derivatives, explicit derivatives were
packed to the same format).
There's several reasons why this wasn't such a good idea:
1) the derivatives may not even be needed (not as bad as it sounds since
llvm will just throw the calculations needed for them away but still)
2) the special packing format really shouldn't be part of the sampler
interface
3) depending what the sample code actually does the derivatives will
be processed differently, hence there is no "ideal" packing. For cube
maps with explicit derivatives (which we don't do yet) for instance the
packing looked downright useless, and for non-isotropic filtering we'd
need different calculations too.

So, instead just pass the derivatives as is (for explicit derivatives),
or let the rho calculating sample code calculate them itself. This still
does exactly the same packing stuff for implicit derivatives for now,
though explicit ones are handled in a more straightforward manner (quick
estimates show performance should be quite similar, though it is much
easier to follow and also does the rho calculation per-pixel until the
end, which we eventually need for spec compliance anyway).

No piglit changes.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-12 00:24:22 +01:00
Chad Versace
b7262ac7ea i965: Fix typo in doxygen hyperlink
s/brw_state_upload/brw_upload_state/

Found because the link was broken.

Signed-off-by: Chad Versace <chad.versace@linux.intel.com>
2013-03-11 16:01:19 -07:00
Eric Anholt
11b8df0c01 mesa: Reduce memory usage for reg alloc with many graph nodes (part 2).
After the previous fix that almost removes an allocation of 4*n^2
bytes, we can use a bitset to reduce another allocation from n^2 bytes
to n^2/8 bytes.

Between the previous commit and this one, the peak heap size for an
oglconform ARB_fragment_program max instructions test on i965 goes from
4GB to 255MB.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=55825
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:54 -07:00
Eric Anholt
6aa3afbfd6 mesa: Reduce the memory usage for reg alloc with many graph nodes (part 1)
We were allocating an adjacency_list entry for every possible
interference that could get created, but that usually doesn't happen.
We can save a lot of memory by resizing the array on demand.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:54 -07:00
Eric Anholt
5daf867f6c i965/fs: Improve CSE performance by expiring some available expressions.
We're already walking the list, and we can easily know when something
has no reason to be in the list any longer, so take a brief extra step
to reduce our worst-case runtime (an oglconform test that emits the
maximum instructions in a fragment program).  I don't actually know what
the worst-case runtime was, because it was too long and I got bored.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:54 -07:00
Eric Anholt
f179f419d1 i965/fs: Improve live variables calculation performance.
We can execute way fewer instructions by doing our boolean manipulation
on an "int" of bits at a time, while also reducing our working set size.

Reduces compile time of L4D2's slowest shader from 4s to 1.1s
(-72.4% +/- 0.2%, n=10)

v2: Remove redundant masking (noted by Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:54 -07:00
Eric Anholt
4dc7e6dcbf i965/fs: Also do the gen4 SEND dependency workaround against other SENDs.
We were handling the the dependency workaround for the first written reg
of a send preceding the one we're fixing up, but didn't consider the other
regs.  Thus if you had two sampler calls that got allocated to the same
set of regs, one might, rarely, ovewrite the other.  This was occurring in
XBMC's GLSL shaders.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=44567
NOTE: This is a candidate for the stable branches.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Eric Anholt
4c1fdae0a0 i965/fs: Switch to using sampler LD messages for uniform pull constants.
When forcing the compiler to always generate pull constants instead of
push constants (in order to have an easy to use testcase), improves
performance of my old GLSL demo 23.3553% +/- 1.42968% (n=7).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=60866
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Eric Anholt
1323772543 i965/fs: Fix broken rendering in large shaders with UBO loads.
The lowering process creates a new vgrf on gen7 that should be represented
in live interval analysis.  As-is, it was getting a conflicting allocation
with gl_FragDepth in the dolphin emulator, producing broken rendering.

NOTE: This is a candidate for the 9.1 branch.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Eric Anholt
c588cd2031 i965/fs: Add a comment about about an implementation detail.
I was going to fix the code above like the previous commit, but we already
had that covered (otherwise all our uniform access would have been broken,
unlike just pull constants).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Eric Anholt
f10f5e4980 i965/fs: Fix register allocation for uniform pull constants in 16-wide.
We were allowing a compressed instruction to write a register that
contained the last use of a uniform pull constant (either UBO load or push
constant spillover), so it would get half its values smashed.

Since we need to see the actual instruction to decide this, move the
pre-gen6 pixel_x/y logic here, which should improve the performance of
register allocation since virtual_grf_interferes() is called more than
once per instruction.

NOTE: This is a candidate for the stable branches.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=61317
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Eric Anholt
f09a8e17e5 intel: Remove some unused debug flags.
I was looking at the list to see what might be interesting to document for
application developers, and it turns out some are completely dead.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2013-03-11 12:11:53 -07:00
Zack Rusin
7295fad204 draw/gs: Correctly iterate the emitted primitives
We were assuming that each emitted primitive had the same
number of vertices. That is incorrect. Emitted primitives
can have arbirtrary number of vertices. Simply increment
index on iteration to fix it.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-07 20:16:07 -08:00
Zack Rusin
e5406f7058 tgsi/exec: Correctly reset NumOutputs before parsing the shader
Whenever we're binding the shaders we're incrementing NumOutputs,
assuming the parser spots an output decleration, but we were never
reseting the variable. That means that each subsequent bind of
a geometry shader would add its number of output to the number
of output bound by all previously ran shaders and our indexes
would get completely messed up.

Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
2013-03-07 20:16:00 -08:00
Roland Scheidegger
9060c835fd draw/llvm: another quick hack for drawing with no position output
Also need to skip things if we have no cv value but pos value
(happens with geometry shaders enabled).
Needs a round of cleanup, though.
2013-03-11 17:07:51 +01:00
Roland Scheidegger
ef17cc9cb6 softpipe: don't use samplers with prebaked sampler and sampler_view state
This is needed for handling the dx10-style sample opcodes.
This also simplifies the logic by getting rid of sampler variants
completely (sampler_views though OTOH have sort of variants because
some of their state is different depending on the shader stage they
are bound to).
No significant performance difference (openarena run:
840 frames in 459.8 seconds vs. 840 frames in 460.5 seconds).

v2: fix reference counting bug spotted by Jose.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-11 17:07:51 +01:00
Roland Scheidegger
f33c744fb9 tgsi: emit code for SVIEWINFO and SAMPLE_I
Can handle them since the single sampler interface was introduced.

v2: simplify txf/sample_i handling a bit according to Brian's feedback.

Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-11 17:07:51 +01:00
Roland Scheidegger
7b3a0bb45d tgsi: fix wrong reg used for unit for TGSI_OPCODE_TXF
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
2013-03-11 17:07:51 +01:00
Tom Stellard
a0676968b9 r600g/llvm: Fix build 2013-03-11 11:10:51 -04:00
Marek Olšák
e4e655fd11 r600g: add debug options disabling various copy-buffer-related features
This will be invaluable for debugging and bug reports.
2013-03-11 13:44:46 +01:00
Marek Olšák
4b69c1a92d mesa: don't allocate a texture if width or height is 0 in CopyTexImage
NOTE: This is a candidate for the stable branches.

Reviewed-by: Brian Paul <brianp@vmware.com>
2013-03-11 13:44:14 +01:00
Marek Olšák
68ed4c9c89 gallium/util: attempt to fix blitting multisample texture arrays
We don't have a test for this yet, but obviously the swizzle was wrong.
2013-03-11 13:43:36 +01:00
Marek Olšák
52efa01de0 r600g: allocate FMASK right after the texture, so that it's aligned with it
This avoids the kernel CS checker errors with MSAA textures.

Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2013-03-11 13:43:36 +01:00
Marek Olšák
2c339f8015 r600g: remove r600.h, move the stuff elsewhere (mostly to r600_pipe.h)
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
2013-03-11 13:43:36 +01:00