In commit 4be146b1, I neglected to add the new property to the strings
array. This leads to the string '(null)' to be printed instead when
converting a GS shader to text.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Make sure to normalize the z coordinates as well as the x/y ones when
there are mipmaps present. Fixes 3d mipmap generation, which now uses
the blit path.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
These instructions can come in either through IMUL_HI/UMUL_HI TGSI
opcodes, or from OP_DIV constant folding.
Also make sure that the constant foldings which delete the original
instruction still get counted as having done something.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Retrieving the high 32 bits of a signed multiply is rather annoying. It
appears that the simplest way to do this is to compute the absolute
value of the arguments, and perform a u32 x u32 -> u64 operation. If the
arguments' signs differ, then negate the result. Since there is no u64
support in the cvt instruction, we have the perform the 2's complement
negation "by hand".
This logic can come into use by the IMUL_HI instruction (very unlikely
to be seen), as well as from constant folding of division by a constant.
Fixes dolphin's divisions by 255.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Modern applications frequencly use both UNORM buffers and FLOAT buffers
with color clamping disabled. (FLOAT with clamping explicitly enabled
and SNORM buffers appear to be less common.) We don't need to emit
saturates in the fragment shader in either of the common cases.
Mesa sets ctx->Color._ClampFragmentColor to false if all the color
buffers are UNORM. Also, for GL_FIXED_ONLY mode (the default in
legacy OpenGL), it will be false if any FLOAT buffers are bound.
Since the common case is false, that should be our default.
Thanks to Roland Scheidegger for pointing out some faulty logic
in v1 of this patch (unnecessary code and incorrect explanations).
v2: Drop superfluous code and reword commit message.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
I think a3xx and later should support (it is part of GLES3), but this
isn't needed for the time being and still needs to be reversed.
Signed-off-by: Rob Clark <robclark@freedesktop.org>
Separating the software fallbacks from the rest of the meta path (which
is usually hardware accelerated) gives callers better control over their
blitting options.
For example, i965 might want to try meta blit, hardware blits, then
swrast as a last resort. Splitting it makes that possible.
This updates all callers to maintain the existing behavior (even in the
few cases where it isn't desirable behavior - later patches can change
that).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Cc: "10.2" <mesa-stable@lists.freedesktop.org>
SCons is required for Windows. Add links to flex/bison for Windows.
Reorder items and improve formatting.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2ea923cf57 had the side effect of IR counting
now being done after IR optimization instead of before. Some quick analysis
shows that there's roughly 1.5 times more IR instructions before optimization
than after, hence the effective shader cache size got quite a bit smaller.
Could counter this with an increase of the instruction limit but it probably
makes more sense to count them after optimizations, so move that code.
Reviewed-by: Brian Paul <brianp@vmware.com>
Fixes build error introduced with commit
4b04152db0.
CC test_eu_compact.o
test_eu_compact.c: In function ‘test_compact_instruction’:
test_eu_compact.c:54:3: error: implicit declaration of function ‘brw_disasm’ [-Werror=implicit-function-declaration]
brw_disasm(stderr, &src, brw->gen, false);
^
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=78888
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
We already have a perfectly good copy of the program key, and nobody is
going to modify it. The only reason we copied it was because the
brw_wm_compile structure embedded the key rather than pointing to it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Instead, just pass the key and prog_data as separate parameters.
This moves it up a level - one step further toward getting rid of it.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away, but we still need a memory context that lives
for the duration of the compile.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Previously, the memory context situation was a bit of a mess:
fs_visitor allocated its own memory context, and freed it in the
destructor. However, some data produced by fs_visitor (such as the list
of instructions) needs to live beyond when fs_visitor is "done", so the
caller can pass it to fs_generator.
Everything worked out because brw_wm_fs_emit's fs_visitor variables
happen to not go out of scope until the end of the function. But that
meant that moving the declaration of, say, the SIMD16 fs_visitor
instance, could cause everything to explode.
Using a memory context that exists for the duration of the compile is
clearer, and should be equivalent.
Ultimately, we don't want to use 'c', but this matches the behavior of
fs_generator and gen8_fs_generator, so it'll be simple to change later.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
We throw away the data generated during compilation on the success path,
so we really ought to on the failure path as well. The caller has no
access to it anyway, so it's purely leaked.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also a bit shorter.
Marking the key pointer as const will also deter people from changing
it in these classes, as that's absolutely not OK.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
'c' is going away. This is also shorter.
Marking the key pointer as const will also deter people from changing
it in fs_visitor, as it's absolutely not OK to modify it there.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This data is created by fs_visitor and only used when emitting code,
so keeping it in fs_visitor makes sense. I decided it would be
reasonable to group these all together in a struct, since they're
highly related.
v2: s/nr_payload_regs/payload.num_regs/ in some comments (chrisf).
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
As far as I can tell, there's no point in allocating an extra register
and generating a MOV---we can just use the copy provided as part of our
thread payload directly. It's already in the right format.
Of course, there are zero Piglit tests for this. We don't actually ship
the extension (GL_ARB_gpu_shader5) that exposes this functionality
either.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
This is actually for gl_SampleMaskIn, which is quite different than
gl_SampleMask. Renaming should help avoid confusion.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
Nothing outside of fs_visitor uses it, so we may as well keep it
internal.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
With this one use gone, c->last_scratch is now only used inside
fs_visitor. The rest of the driver uses prog_data->total_scratch.
We already compute similar prog_data fields in fs_visitor, so this
seems reasonable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
The if (!allocated_without_spills) block is an obvious spot for this
performance warning message.
In the Vec4 backend, scratch is also used for indirect access of
temporary arrays. The FS backend doesn't implement that yet, but
if it did, this message would be inaccurate, since scratch access
wouldn't necessarily mean spilling. Moving it preemptively fixes that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
"Disassemble" is an accurate description of what this function does.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
We're going to use "disassemble" for the function that disassembles
the whole program.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
dump_prog_cache has interpreted compacted instructions as full size
instructions, decoding garbage and complaining about invalid values.
We can just use brw_dump_compile to handle this correctly in less code.
The output format changes slightly, but it's still perfectly acceptable.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Looping over the instructions and calling brw_disasm doesn't handle
compacted instructions. In most cases, this hasn't been a problem since
we don't compact prior to Sandybridge.
However, Sandybridge's transform feedback GS program should already be
compacted, and so this ought to fix decoding of that.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
UNION appears to expect that all of its sources are conditionally
defined. Otherwise it inserts an unpredicated mov instruction which
overwrites the desired result. This fixes tests that use UMUL_HI, and
much less directly, unsigned integer division by a constant, which uses
this functionality in a peephole pass.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: "10.1 10.2" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Ben Skeggs <bskeggs@redhat.com>
Gallium already gives us height==1 for these, so the texture state is
already setup correctly to emulate 1D textures as a Nx1 2D texture. We
just need to supply the .y coord.
Signed-off-by: Rob Clark <robclark@freedesktop.org>