i965 needs to know whether stencil writes are enabled in several places,
and gets the test wrong sometimes. While we could create a function to
compute this, it seems generally useful enough to warrant a new piece of
derived state. Also, all the plumbing is already in place.
NOTE: This is a candidate for stable branches.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Paul Berry <stereotype441@gmail.com>
The ar_ge_as_at variable was just very very confusing since the condition
was actually the other way around (as_at_ge_ar). So change the condition
(and the selects depending on it) to match the variable name.
And also change the chosen major axis in case the coord values are the
same. OpenGL doesn't care one bit which one is chosen in this case but
it looks like dx10 would require z chosen over y, and y chosen over x
(previously did x chosen over y, y chosen over z). Since it's all the
same effort just honor dx10's wishes. (Though actually, for some prefered
orderings, we could save one (or two with derivatives) selects since the
tnewx and tnewz (and the corresponding dmax values) are the same.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The way we were allocating registers before, packing into low register
numbers for Ironlake, resulted in an overly-constrained dependency graph
for instruction scheduling. Improves GLBenchmark 2.1 performance by
4.5% +/- 0.7% (n=26). No difference on my old GLSL demo (n=20). No
difference on nexuiz (n=15).
v2: Fix off-by-one bug that made the change only work for 16-wide on i965.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
GCC 4.8 now warns about typedefs that are local to a scope and not
used anywhere within that scope. This produced spurious warnings with
the STATIC_ASSERT() macro (which used a typedef to provoke a compile
error in the event of an assertion failure).
This patch switches to a simpler technique that avoids the warning.
v2: Avoid GCC-specific syntax. Also update p_compiler.h.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The former just checks that the given block is valid by checking
the header and footer.
The later sets the memory block's tag. With extra debug code, we
can use that for monitoring/checking particular allocations.
Reviewed-by: José Fonseca <jfonseca@vmware.com>
This is trivial now, though need to make sure we pass all the necessary
derivative values (which is 3 each for ddx/ddy not 2).
Passes piglit arb_shader_texture_lod-texgradcube test.
v2: add the forgotten abs() for all incoming derivatives (discovered
by new piglit arb_shader_texture_lod-texgradcube test, though more by
luck as it was failing only for exactly one pixel...).
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
This proved to be tricky, the problem is that after selection/mirroring
we cannot calculate reasonable derivatives (if not all pixels in a quad
end up on the same face the derivatives could get "randomly" exceedingly
large).
However, it is actually quite easy to simply calculate the derivatives
before selection/mirroring and then transform them similar to
the cube coordinates (they only need selection/projection, but not
mirroring as we're not interested in the sign bit, of course). While
there is a tiny bit more work to do (need to calculate derivs for 3
coords instead of 2, and additional selects) it also simplifies things
somewhat for the coord selection itself (as we save some broadcast aos
shuffles, and we don't need to calculate the average vector) - hence if
derivatives aren't needed this should actually be faster.
Also, this has the benefit that this will (trivially) work for explicit
derivatives too, which we completely ignored before that (will be in a
separate commit for better trackability).
Note that while the way for getting rho looks very different, it should
result in "nearly" the same values as before (the "nearly" is only because
before the code would choose the face based on an "average" vector and hence
the derivatives calculated according to this face, where now (for implicit
derivatives) the derivatives are projected on the face selected for the
first (top-left) pixel in a quad, so not necessarly the same face).
The transformation done might not quite be state-of-the-art, calculating
length(dx,dy) as max(dx,dy) certainly isn't neither but this stays the
same as before (that is I think a better transform would _somehow_ take
the "derivative major axis" into account so that derivative changes in
the major axis wouldn't get ignored).
Should solve some accuracy problems with cubemaps (can easily be seen with
the cubemap demo when switching wrapping/filtering), though we still don't
do seamless filtering to fix it completely (so not per-sample but per-pixel
is certainly better than per-quad and already sufficient for accurate
results with nearest tex filter).
As for performance, it seems to be a tiny bit faster too (maybe 3% or so
with cubemap demo). Which I'd have expected with nearest/nearest filtering
where this will be less instructions, but the difference seems to actually
be larger with linear/linear_mipmap_linear where it is slightly more
instructions, probably the code appears less serialized allowing better
scheduling (on a sandy bridge cpu). It actually seems to be now at least
as fast as the old path using a conditional when using 128bit vectors too
(that is probably more a result of testing with a newer cpu though), for now
that old path is still there but unused.
No piglit regressions.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Using a different packing for the single coord case should save a shuffle.
Plus some minor style fixes.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Should be way faster of course on cpus supporting this (includes AMD
Bulldozer and Jaguar cores, Intel Ivy Bridge and up (except budget models)).
Passes piglit fbo-blending-formats GL_ARB_texture_float -auto on Ivy Bridge.
Reviewed-by: Brian Paul <brianp@vmware.com>
When geometry shaders are present, one needs to be able to create
an empty geometry shader with stream output that needs to be
resolved later and attached to the currently bound vertex shader.
Lets add support for it to llvmpipe and draw. draw allows attaching
independent stream output info to any vertex shader and llvmpipe
resolves at draw time which vertex shader the given empty geometry
shader should be linked to.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We need to reset the internal state of the so buffers or we'll
keep appending even though we're not supposed to.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
I think this was there before and got accidently
removed during a merge. Same code as for the GS
context, which is also using an enum instead of
hardcoded numbers.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
It's quite helpful during the rendering when we know
exactly the count of the vertices available in the
buffer.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
We were flushing with incorrect number of primitives. TGSI exec
can only work with a single primitive at a time. Plus the fetching
with multiple primitives on llvm paths wasn't copying the last
element.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Reviewed-by: José Fonseca <jfonseca@vmware.com>
The functions are prototyped in u_transfer.h and are related to the
other functions in u_transfer.c.
The next patch will re-use the u_resource.c file for new code.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The fallbacks count is the number of drawing calls that use a "draw"
module fallback, such as polygon stipple.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
NOTE: Changed the semantic index for the drawtex coordinate to
be the texture unit index instead of always 0.
Not sure if this is correct but since the value seems to depend
on the unit it would make sense to use different varying slots.