It can be trivially derived from the number of already declared system
values. This allows ureg users not to worry about which index to choose.
Reviewed-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Edward O'Callaghan <eocallaghan@alterapraxis.com>
The piglit copyteximage check has recently been augmented to test this, but
apparently it hasn't been fixed in Mesa so far.
This language also already appears in the OpenGL 2.1 spec (Ian).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
For gen9 SURFTYPE_CUBE, the RENDER_SURFACE_STATE's Depth,
MinimumArrayElement, and RenderTargetViewExtent is in units of full
cubes and so must be divided by 6.
Fixes 'dEQP-VK.pipeline.image.view_type.cube_array.cube_array.*'.
Now all of 'dEQP-VK.pipeline.image.*' passes.
Drop the temporary variables for RENDER_SURFACE_STATE's Depth and
RenderTargetViewExtent. Instead, assign them in-place.
This simplifies the next commit, which fixes gen9 cube surfaces.
SKL needs this to make sure we flush the push constants. It gets a
little tricky, since we also need to emit binding tables before push
constants, since that may affect the push constants (dynamic buffer
offsets and storage image parameters). This patch splits emitting
binding tables from emitting the pointers so that we can emit push
constants after binding tables but before emitting binding table
pointers.
We don't need these for GLSL or ARB, but we need them for SPIR-V
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Both were defined as returning bool but the gpu_shader5 functions are
defined to return int. Also, we had the parameters for usub borrwo
backwards in the folding expression.
Reviewed-by: Matt Turner <mattst88@gmail.com>
The draw groups are now split up into groups of 32 if there's a
non-packed stride, or in groups of 400-500 if the draw data is packed.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
This shifts all indirect draws to go through the new function. If the
driver doesn't have support for multi draws, we break those up and
perform N draws. Otherwise, we pass everything through for just a single
draw call.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This makes it possible to support indirect multidraws as well as having
the number of such draws to come from a separate GPU resource.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
All indirect draws are passed to the new draw function. By default
there's a fallback implementation which pipes it right back to
draw_prims, but eventually both the fallback and draw_prim's support for
indirect drawing should be removed.
This should allow a backend to properly support ARB_multi_draw_indirect
and ARB_indirect_parameters.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
The sse path was pretty much disabled for practical purposes because the
largest allowed fb size was 128x128. So, adapt it for 64bit plane calculations.
This is actually not that difficult, though a problem is that we can't do
a signed 32x32->64bit mul, only unsigned, so need to fix that up. Overall,
the code still looks reasonable, though it's not like changes there in
setup really make much of a difference in the end...
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
eo, just like dcdx and dcdy, cannot overflow 32bit.
Store it as unsigned though just in case (it cannot be negative, but
in theory twice as big as dcdx or dcdy so this gives it one more bit).
This doesn't really change anything, albeit it might help minimally on
32bit archs.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Back in the day (before 24678700ed) the values
were not actually in a struct but even then I can't see why we didn't simply
align the values. Especially since it's trivial to do so.
(Not that it actually matters since the code is pretty much unused for now.)
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Otherwise, clipped lines would have undefined stippling reset bit if line
stippling is enabled.
(Untested, and I just assume copying over the bits from the original line
is actually the right thing to do.)
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
The unfilled stage was not filling in the prim header, and the line stage
then decided to reset the stipple counter or not based on the uninitialized
data. This causes some failures in conform linestipple test (albeit quite
randomly happening depending on environment).
So fill in the prim header in the unfilled stage - I am not entirely sure
if anybody really needs determinant after that stage, but there's at least
later stages (wide line for instance) which copy over the determinant as well.
Reviewed-by: Jose Fonseca <jfonseca@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
This was added in 54f583a20 since then error handling has improved.
The test this was added to fix now fails earlier since 01822706ec
Reviewed-by: Matt Turner <mattst88@gmail.com>
When looping through VkBufferImageCopy regions, for each region we
incremented the offset into the VkBuffer assuming the format size was 4.
Fixes CTS tests dEQP-VK.pipeline.image.view_type.cube_array.3d.* on
Skylake.
In lp_build_conv() and lp_build_conv_auto(), there is a special case of
conversion when sse2 is present. That code path is suitable without any
changes to altivec, because all the functions that are called in that
code path already support altivec.
This patch increase the FPS in POWER arch across the board
between 10%-25%
I checked ipers, glxgears, glxspheres64, openarena, xonotic and glmark2.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
For tiled 3D surfaces, the array pitch must aligned to the tile height.
From the Skylake BSpec >> RENDER_SURFACE_STATE >> Surface QPitch:
Tile Mode != Linear: This field must be set to an integer multiple of
the tile height
Fixes CTS tests 'dEQP-VK.pipeline.image.view_type.3d.format.r8g8b8a8_unorm.*'.
Fixes Crucible tests 'func.miptree.r8g8b8a8-unorm.aspect-color.view-3d.*'.
The function will be extended to dump all binaries shaders will consist of,
so si_shader* makes sense here.
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>