Reduces compiled size of brw_wm_surface_state.o another 1.9%.
Overall, this brw_wm_surface_state reduction series cuts
firefox-talos-gfx runtime by 0.68% +/- 0.42% (n=6).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
It turns out that gcc is just awful at generating code for
brw_structs.h style state setup, and using bitshifting on u32s
generates better code while being similarly readable (and more
verifiable compared to the specs, using the INTEL_MASK macro).
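
A minimal sketch of the shift-and-mask style described above. FIELD_MASK and
SET_FIELD are hypothetical helpers written in the spirit of INTEL_MASK, and the
bit positions used for the two fields are illustrative assumptions, not values
taken from the hardware specs.

```c
#include <stdint.h>

/* Hypothetical helper: mask covering bits high..low inclusive.
 * Only valid for fields narrower than 32 bits. */
#define FIELD_MASK(high, low) \
    ((((uint32_t)1 << ((high) - (low) + 1)) - 1) << (low))

/* Hypothetical helper: place value into bits high..low, dropping overflow. */
#define SET_FIELD(value, high, low) \
    (((uint32_t)(value) << (low)) & FIELD_MASK(high, low))

/* Pack two fields into one u32 with plain shifts instead of a bitfield
 * struct. The field positions (31:29 and 26:18) are assumed for
 * illustration only. */
static uint32_t pack_surface_dword0(uint32_t type, uint32_t format)
{
    return SET_FIELD(type, 31, 29) | SET_FIELD(format, 26, 18);
}
```

Writing the bit ranges as (high, low) pairs keeps each line directly comparable
to a "bits 31:29" style entry in a register description.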
It's only used in the old fragment program path, to avoid projection
when w is always 1. We do want to do this in the new path pre-gen6
too, but we'll probably do it through the ir.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Oddly, this increases compiled code size. (Marking the 'if' as likely
also increases code size, but not as much.)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Interestingly, the compiler wasn't doing this for us at -O2, so we
were doing the computation for every non-_ReallyEnabled unit.
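
A rough sketch of the idea, assuming the change amounts to moving the per-unit
computation behind the enabled check by hand. The struct, the function names and
the call counter are all hypothetical and exist only to make the effect visible.

```c
#include <stddef.h>

/* Hypothetical stand-in for a texture unit. */
struct tex_unit {
    int really_enabled;
    int param;
};

/* Counts how often the per-unit computation actually runs. */
static int expensive_calls;

/* Hypothetical per-unit computation. */
static int expensive_setup(int param)
{
    expensive_calls++;
    return param * 2 + 1;
}

/* The computation sits after the enabled check, so disabled units cost
 * only the test itself. */
static int sum_enabled(const struct tex_unit *units, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        if (!units[i].really_enabled)
            continue;
        total += expensive_setup(units[i].param);
    }
    return total;
}
```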
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
- all asics need to emit CONTEXT_CONTROL
- all r6xx asics need to emit 3D_START_CMDBUF
The ddx and r600c already do this. r600g should as well.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
We are getting inconsistent methods for endian detection (same answer when
it works, just doesn't work on some platforms) depending on whether __GLIBC__
is defined, which of course depends on include ordering before p_config.h.
Just make p_config.h include limits.h to solve this.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
On my original R600 card this at least lets gnome shell run for a while longer
and the piglit r300-readcache test case works a lot more reliably.
Still a few more stability issues running a piglit test run though.
Signed-off-by: Dave Airlie <airlied@redhat.com>
The spec doesn't state it should be an error, but we have this piglit test,
useprogram-inside-begin, that passes with this commit. No idea what's correct.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
The conditional rendering should be able to kill CopyPixels.
I assume the render condition has no effect on resource_copy_region.
This fixes piglit:
- NV_conditional_render/copypixels
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
Always default to DEFAULT_*_FORMATS for mandatory GL formats.
(st_choose_format must not fail for those)
Use DEFAULT_RGBA when alpha is required instead of RGB.
Use DEFAULT_RGB otherwise.
These are more or less the remaining differences between the old code and
the new one.
Reviewed-by: Brian Paul <brianp@vmware.com>
The problem is: The second time the function is called with a new
internal format, strb->format is usually not PIPE_FORMAT_NONE.
RenderbufferStorage(... GL_RGBA8 ...);
RenderbufferStorage(... GL_RGBA16 ...); // had no effect on the format
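
The failure mode can be sketched as follows. This is a simplified model with
hypothetical types and function names, and it assumes the fix is to choose the
format again on every call; the real state tracker code is not shown here.

```c
/* Hypothetical stand-ins for the pipe formats involved. */
enum fmt { FMT_NONE, FMT_RGBA8, FMT_RGBA16 };

struct renderbuffer {
    enum fmt format;
};

/* Buggy variant: the format is only chosen while it is still unset, so a
 * second call with a different internal format is silently ignored. */
static void storage_buggy(struct renderbuffer *rb, enum fmt requested)
{
    if (rb->format == FMT_NONE)
        rb->format = requested;
}

/* Assumed fix: pick the format again each time storage is respecified. */
static void storage_fixed(struct renderbuffer *rb, enum fmt requested)
{
    rb->format = requested;
}
```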
Broken with: fd6f2d6e57
Test: piglit/fbo-storage-completeness
NOTE: This is a candidate for the 7.10 branch.
(if fd6f2d6e57 is cherry-picked as well)
Reviewed-by: Brian Paul <brianp@vmware.com>
Lowered indirect addressing can create lots of immediates.
Fixes piglit/glsl-fs-uniform-array-7 on r300g.
NOTE: This is a candidate for the 7.10 branch.
Reviewed-by: Brian Paul <brianp@vmware.com>
From now on, depth test is always enabled in hardware.
If depth test is disabled in Gallium, the hardware Z function is set to ALWAYS.
If there is no zbuffer set, the colorbuffer0 memory is set as a zbuffer
to silence the CS checker.
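
A small sketch of the state translation described above. The enum and struct
names are hypothetical, and depth write handling and the zbuffer fallback are
left out on purpose.

```c
#include <stdbool.h>

/* Hypothetical compare functions. */
enum zfunc { ZFUNC_NEVER, ZFUNC_LESS, ZFUNC_EQUAL, ZFUNC_LEQUAL, ZFUNC_ALWAYS };

/* Hypothetical API-side (Gallium-like) depth state. */
struct api_depth_state {
    bool enabled;
    enum zfunc func;
};

/* Hypothetical hardware-side depth state. */
struct hw_depth_state {
    bool z_enable;
    enum zfunc z_func;
};

/* The hardware depth test stays on; a disabled API depth test becomes a
 * compare function that always passes. */
static struct hw_depth_state translate_depth(struct api_depth_state api)
{
    struct hw_depth_state hw;
    hw.z_enable = true;
    hw.z_func = api.enabled ? api.func : ZFUNC_ALWAYS;
    return hw;
}
```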
This fixes piglit:
- occlusion-query-discard
- NV_conditional_render/bitmap
- NV_conditional_render/drawpixels
- NV_conditional_render/vertex_array
We want to check for Success, otherwise it will fail even with the right visual.
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Antoine Labour <piman@chromium.org>
Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
Signed-off-by: Brian Paul <brianp@vmware.com>
When using _mesa_layout_parameters, the params copied into the 'layout'
output in pass 1 don't update StateFlags (because they are simply
memcpy'ed).
This patch fixes the problem by ensuring that the StateFlags field of the
output gl_prog_param_list is the same as the input's.
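
A simplified model of the problem, using a hypothetical struct rather than the
real gl_prog_param_list: copying the parameter values with memcpy moves the data
but not the summary flags, so those have to be carried over explicitly.

```c
#include <string.h>

#define MAX_PARAMS 8

/* Hypothetical, cut-down parameter list. */
struct param_list {
    unsigned num;
    unsigned state_flags;
    float values[MAX_PARAMS][4];
};

/* Copy the parameters from src to dst. Returns 0 on success, -1 if src
 * claims more entries than the list can hold. */
static int layout_copy(struct param_list *dst, const struct param_list *src)
{
    if (src->num > MAX_PARAMS)
        return -1;

    /* The raw copy moves the values only. */
    memcpy(dst->values, src->values, src->num * sizeof(src->values[0]));
    dst->num = src->num;

    /* The step that a plain memcpy of the values skips. */
    dst->state_flags = src->state_flags;
    return 0;
}
```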
NOTE: This is a candidate for the 7.10 branch.
Signed-off-by: Brian Paul <brianp@vmware.com>
At glLinkProgram time, a fail() call in FS compile in 8-wide (the one
that's required to succeed, though we may relax that at some point for
pre-Ironlake performance) will now report out as a link error.
We now have:
- brw_fs.cpp handles calling out to everything and optimization.
- brw_fs_visitor.cpp handles translating to our LIR.
- brw_fs_emit.cpp handles emitting from our LIR to native code.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
There's an assumption here that fixed GRFs will never intersect with
the allocated GRFs. That's true today, though it might change some
day if we decide to register-allocate the regs containing push
constants once they're dead.
This fixes a Lightsmark regression introduced in 0f7325b890, caused by
the texture instructions now containing g0 references instead of having
that be implied. Performance is improved 15.2% +/- 3.6% (n=3).
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=34968