Commit Graph

84113 Commits

Author SHA1 Message Date
Jason Ekstrand
38c1909c0a i965/blorp/gen6: Move constant disables higher up
This is what gen7-8 do and it's a bit cleaner.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
e0bc2cb145 i965/blorp: Don't clear an empty region
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
e4d6ffbbf6 i965/blorp: Move the non-static blorp state setup helpers to another file
We're about to start replacing blorp state setup code with packing structs
and we want to feel free to delete files as we go.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Topi Pohjolainen <topi.pohjolainen@intel.com>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
50768a3879 i965/blorp: Make gen6 VS and GS disable helpers static
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
949a892026 i965: Roll intel_reg.h into brw_defines.h
More than half of the stuff in intel_reg.h had nothing whatsoever to do
with registers and really belongs in brw_defines.h anyway.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
8455f9430f i965: Stop including brw_defines.h in brw_state.h
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
4c3acf94da i965/state: Move is_drawing_lines/points to gen6_clip_state.c
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
04f3594cd5 genxml/gen9: Make 3DSTATE_SBE::AttributeActiveComponentFormat an array
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
bfdff28d68 genxml: Add a uint MOCS field to VERTEX_BUFFER_STATE
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
373613fa4b genxml: Make a couple of VERTEX_BUFFER_STATE fields boolean
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
29f1f945a6 genxml: Make VERTEX_ELEMENT_STATE::Valid a bool
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
eb2589cba6 genxml/gen6: Make SAMPLER_STATE look a bit more like gen7
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
2a84e40dae genxml: Add a uint MOCS field to DEPTH_BUFFER packets
This is easier than dealing with structs all the time

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
3f1022b029 genxml/gen6: Make "Depth Clear Value" a uint
The actual data storred is in float, UNORM24, or UNORM16 depending on the
actual depth format.

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
be62e7645e genxml/gen6: Add the 3D_Prim_Topo_Type enum
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
cca95a7bd6 genxml/gen6: Fix the length of 3DSTATE_WM
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
3ddb6f6e2a genxml/gen6: Add a Surface Base Address field to HIER_DEPTH_BUFFER
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Jason Ekstrand
be52e16dbc genxml/gen6: Add uint MOCS fields for most things
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-08-19 03:11:29 -07:00
Kenneth Graunke
7d0554f341 nir: Rely on the fact that bcsel takes a well formed boolean.
According to Connor, it's safe to assume that the first operand of
bcsel, as well as the operand of b2f and b2i, must be well formed
booleans.

https://lists.freedesktop.org/archives/mesa-dev/2016-August/125658.html

With the previous improvements to a@bool handling, this now has no
change in shader-db instruction counts on Broadwell.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-08-19 02:05:23 -07:00
Francisco Jerez
7ceb42ccc5 i965/sched: Change the scheduling heuristics to favor early program termination.
This uses the unblocked time of the exit assigned to each available
node to attempt to unblock exit nodes as early as possible,
potentially reducing the runtime of the shader when an exit branch is
taken.  There is a natural trade-off between terminating the program
as early as possible and reducing the worst-case latency of the
program as a whole (since this will typically move exit-unblocking
nodes closer to its dependencies potentially causing additional stalls
of the execution pipeline), but in practice the bandwidth and ALU
cycle savings from terminating the program earlier tend to outweigh
the slight increase in worst-case program execution latency, so it
makes sense to prefer nodes likely to unblock an earlier exit
regardless of the latency benefits of other available nodes.

I haven't observed any benchmark regressions from this change after
testing on VLV, HSW, BDW, BSW and SKL.  The FPS of the GfxBench
Manhattan benchmark increases by 10%-20% and the FPS of Unigine Valley
improves by roughly 5% depending on the platform and settings.

The change to the register pressure-sensitive heuristic is rather
conservative and gives precedence to the existing heuristic in order
to avoid increasing register pressure and causing spill count and SIMD
width regressions in shader-db.  It may make sense to revisit this
with additional performance data.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
4147ca75d5 i965/sched: Assign a preferred exit node to each node of the dependency graph.
This adds a bit of metadata to schedule_node that will be used to
compare available nodes in the scheduling heuristic code based on
which of them unblocks the earliest successor exit node.  Note that
assigning exit nodes wouldn't be necessary in a bottom-up scheduler
because we could achieve the same effect by scheduling the exit nodes
themselves appropriately.

No shader-db changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
b295d7ca32 i965/sched: Calculate the critical path of scheduling nodes non-recursively.
The critical path of each node is calculated by induction based on the
critical paths of its children, which can be done in a post-order
depth-first traversal of the dependency graph.  The current code
implements graph traversal by iterating over all nodes of the graph
and then recursing into its children -- But it turns out that
recursion is unnecessary because the lexical order of instructions in
the block is already a good enough reverse post-order of the
dependency graph (if it weren't a reverse post-order some instruction
would have been located before one of its dependencies in the original
ordering of the basic block, which is impossible), so we just need to
walk the instruction list in reverse to achieve the same result more
efficiently.

No shader-db changes.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
b2b621a0ec i965/fs: Switch to per-subspan discard jumps.
ANY4H is more efficient than ANY8H and ANY16H because it makes sure
that whenever a whole subspan hits a discard statement it gets
disabled by the EU until the end of the program, regardless of whether
the discard condition is uniform across all channels of the SIMD8-16
thread.  OTOH ANY8H/ANY16H would cause the rest of the program to be
executed for *all* channels if only one of the channels hadn't taken
the discard branch, potentially increasing the bandwidth and ALU usage
of the program unnecessarily.

This change increases the FPS by over 3x of a simple micro-benchmark
that discards a bunch of fragments and then does a single costly
texturing operation.  I've just re-verified the FPS change on HSW and
SKL, but I expect all platforms from Gen6 up to get a similar benefit.

Note that we could potentially be more aggressive and use the NORMAL
predicate to discard individual channels, but that would need to
happen post-scheduling because the scheduler currently doesn't care to
reorder HALT instructions with respect to other instructions, and the
NORMAL predicate would cause the results of subsequent derivative
computations to become undefined -- If the scheduler didn't reorder
HALT instructions it would actually be safe to switch to NORMAL
because the behavior of derivative computations after a non-uniform
discard statement is undefined by the GLSL spec, but that would make
the optimization implemented by one of the following commits somewhat
more difficult.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:05:00 -07:00
Francisco Jerez
01b321f242 i965/fs: Drop bogus writemasking disable bit from HALT instructions.
This may have been the reason people ran into problems with
non-uniform HALT instructions and ended up using the inefficient
ANY16H/ANY8H predicates instead of ANY4H or NORMAL in order to prevent
non-uniform discard.  The HALT instruction is able to handle
non-uniform execution masks just fine.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 20:04:59 -07:00
Ilia Mirkin
27e59ed477 mesa: avoid valgrind warning due to opaque only being set sometimes
Valgrind complains with a "Conditional jump or move depends on
uninitialised value(s)" warning due to opaque being conditionally
initialized. However in the punchthrough_alpha == true case, it is
always initialized, so just flip the condition around to silence the
warning.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Chris Forbes <chrisf@ijw.co.nz>
2016-08-18 22:48:55 -04:00
Ilia Mirkin
59bb821180 vbo: remove unnecessary max_basevertex computation
The max basevertex is already computed and added into max_index by the
caller, _tnl_draw_prims.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-18 20:26:34 -04:00
Ilia Mirkin
659dc10d32 vbo: add basevertex when looking up elements for vbo splitting
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97351
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2016-08-18 20:26:22 -04:00
Marek Olšák
07ccec002b radeonsi: initialize and finalize the LLVM function pass manager
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
2016-08-18 21:36:03 +02:00
Emil Velikov
d61d259518 isl: automake: use VISIBILITY_CFLAGS to restrict symbol visibility
v2: Add VISIBILITY_CFLAGS to AM_CFLAGS (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 15:06:19 +01:00
mil Velikov
ebd5dc8826 anv: remove dummy VK_DEBUG_MARKER_EXT entry points
The vkCmdDbgMarker{Begin,End} symbols are exported, yet the json does no
advertise that the driver supports the extension. Furthermore the
functions are empty stubs.

Remove those until we get a proper implementation and json notation.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 15:05:32 +01:00
Emil Velikov
49394e8d77 anv: do not export the Vulkan API
With version 1 of the Loader interface there is an internal/private symbol
(vk_icdGetInstanceProcAddr) which is used to retrieve all the API from the
Vulkan entrypoints from the ICD. Implying that exposing the Vulkan API is not
recommended.

Version 2 goes a step further explicitly forbiding the ICD from exposing Vulkan
symbols (and adding a negotiation API)

As a reference:
 - Nvidia 367.35
Missing negotiation API - version 1.
Exposes only vk_icdGetInstanceProcAddr.

 - AMD 16.30.3.306809
Have negotiation API - version 2,
Exposes vk_icdGetInstanceProcAddr.
Exposes a couple of Vulkan entry points - seems to be in violation with the spec.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 14:55:42 +01:00
Emil Velikov
1cdb6ca40b anv: automake: build with -Bsymbolic
Explicitly suggested in the Loader interface version 2 section, but it's good
idea either way. It essentially, ensures that our symbols are not interposed.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:33 +01:00
Emil Velikov
40e4fff563 anv: automake: use VISIBILITY_CFLAGS to restrict symbol visibility
Hide the internal symbols and annotate the vk_icdGetInstanceProcAddr as public
since the loader needs it (since v1 of the loader interface).

v2: Add VISIBILITY_CFLAGS to AM_CFLAGS (Ken)

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v1)
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:30 +01:00
Emil Velikov
b0d56f2f4f anv: remove internal 'validate' layer
Presently the layer has only a single entry point. As mentioned by Jason the
function does not validate anything that isn't checked elsewhere, thus we can
drop the whole thing.

Cc: "12.0" <mesa-stable@lists.freedesktop.org>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2016-08-18 14:53:24 +01:00
Kenneth Graunke
3a9e6102b4 nir/search: Extend 'a@bool' to handle a couple of system values.
load_front_face and load_helper_invocation produce booleans.

On Broadwell:

total instructions in shared programs: 11638956 -> 11638011 (-0.01%)
instructions in affected programs: 115093 -> 114148 (-0.82%)
helped: 628
HURT: 14

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:27:27 -07:00
Kenneth Graunke
e8543feba7 nir/search: Fold src_is_bool()/alu_instr_is_bool() into src_is_type().
I don't want src_is_bool() and src_is_type(x, nir_type_bool) to behave
differently.  Having the logic spread out over three functions makes it
harder to decide where to put new logic, as well.

So, combine them all.  It's a bit simpler because there's now only one
recursive function rather than a pair of mutually recursive functions.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:27:15 -07:00
Kenneth Graunke
241870fe5b nir/search: Introduce a src_is_type() helper for 'a@type' handling.
Currently, 'a@type' can only match if 'a' is produced by an ALU
instruction.  This is rather limited - there are other cases we
can easily detect which we should handle.

Extending the code in-place would be fairly messy, so we introduce
a new src_is_type() helper.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-08-18 01:26:47 -07:00
Kenneth Graunke
d14dd727f4 i965: Fix barrier count shift in scalar TCS backend.
The "Barrier Count" field goes in 14:9 of m0.2.  The vec4 backend
correctly shifts by 9, but the scalar backend only shifted by 8.

It's not like this changed - I think I just made a typo when writing
the original scalar TCS backend code.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:47:00 -07:00
Kenneth Graunke
159f037755 i965: Fix execution size of scalar TCS barrier setup code.
Previously, the scalar TCS backend was generating:

mov(8)   g17<1>UD     0x00000000UD    { align1 WE_all 1Q compacted };
and(8)   g17.2<1>UD   g0.2<0,1,0>UD   0x0001e000UD  { align1 WE_all 1Q };
shl(8)   g17.2<1>UD   g17.2<8,8,1>UD  0x0000000bUD  { align1 WE_all 1Q };
or(8)    g17.2<1>UD   g17.2<8,8,1>UD  0x00008200UD  { align1 WE_all 1Q };
send(8)  null<1>UW    g17<8,8,1>UD
         gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

This is rubbish - g17.2<8,8,1>UD spans two registers, and is an illegal
region.  Not to mention it clobbers 8 channels of data when we only
wanted to touch m0.2.

Instead, we want:

mov(8)   g17<1>UD     0x00000000UD    { align1 WE_all 1Q compacted };
and(1)   g17.2<1>UD   g0.2<0,1,0>UD   0x0001e000UD  { align1 WE_all };
shl(1)   g17.2<1>UD   g17.2<0,1,0>UD  0x0000000bUD  { align1 WE_all };
or(1)    g17.2<1>UD   g17.2<0,1,0>UD  0x00008200UD  { align1 WE_all };
send(8)  null<1>UW    g17<8,8,1>UD
         gateway (barrier msg) mlen 1 rlen 0 { align1 WE_all 1Q };

Using component() accomplishes this.

Fixes GL44-CTS.tessellation_shader.tessellation_shader_tc_barriers.
barrier_guarded_read_write_calls on Skylake.  Probably fixes other
barrier issues on Gen8+.

v2: Use a group(1, 0) builder so inst->exec_size is set correctly
    (thanks to Francisco Jerez for catching that it was incorrect).

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v1]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-08-18 00:47:00 -07:00
Kenneth Graunke
9e778837ff i965: Implement the WaPreventHSTessLevelsInterference workaround.
Fixes several GL44-CTS.tessellation_shader (and GL45 and ES31) subcases:
- vertex_spacing
- tessellation_shader_point_mode.points_verification
- tessellation_shader_quads_tessellation.inner_tessellation_level_rounding

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:46:55 -07:00
Kenneth Graunke
d8971128ac nir/builder: Add bany_inequal and bany helpers.
The first simply picks the bany_inequal[234] opcodes based on the SSA
def's number of components.  The latter implicitly compares with zero
to achieve the same semantics of GLSL's any().

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
2016-08-18 00:46:04 -07:00
Kenneth Graunke
01e99cba04 mesa: Fix uf10_to_f32() scale factor in the E == 0 and M != 0 case.
GL_EXT_packed_float, 2.1.B Unsigned 10-Bit Floating-Point Numbers:

        0.0,                      if E == 0 and M == 0,
        2^-14 * (M / 32),         if E == 0 and M != 0,
        2^(E-15) * (1 + M/32),    if 0 < E < 31,
        INF,                      if E == 31 and M == 0, or
        NaN,                      if E == 31 and M != 0,

In the second case (E == 0 and M != 0), we were multiplying the mantissa
by 2^-20, when we should have been multiplying by 2^-19 (which is
2^(-14 + -5), or 2^-14 * 2^-5, or 2^-14 / 32).

The previous section defines the formula for 11-bit numbers, which is:

        2^-14 * (M / 64),         if E == 0 and M != 0,

In other words, we had accidentally copy and pasted the 11-bit code
to the 10-bit case, and neglected to change the exponent.

Fixes dEQP-GLES3.functional.pbo.renderbuffer.r11f_g11f_b10f_triangles
when run with surface dimensions of 1536x1152 or 1920x1080.

Cc: mesa-stable@lists.freedesktop.org
References: https://code.google.com/p/chrome-os-partner/issues/detail?id=56244
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Stephane Marchesin <stephane.marchesin@gmail.com>
Reviewed-by: Antia Puentes <apuentes@igalia.com>
2016-08-17 17:26:11 -07:00
Tim Rowley
0ff57446e3 swr: [rasterizer core] only use Viewport/Scissors during SwrDraw* operations
Add explicit rects for:

- SwrClearRenderTarget
- SwrDiscardRect
- SwrInvalidateTiles
- SwrStoreTiles

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
6209dbf5a4 swr: [rasterizer common] reorder SWR_FORMAT_INFO
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
2a25ce7472 swr: [rasterizer core] make dirtytile list point directly to macrotilequeues
Speeds up high geometry HPC workloads.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
550503e776 swr: [rasterizer core] portability - remove use of INT64
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
d70f96fd67 swr: [rasterizer core] viewport transform disabled fix
When viewport transform is disabled (ie. screen space coords are passed
in directly), the W component should be interpreted as RHW.

Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
812b45d049 swr: [rasterizer core] clamp scissor rects to current tile rect
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
93fb768c7e swr: [rasterizer core] align stats structures
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00
Tim Rowley
9a25987b4a swr: [rasterizer core] use AVX2 permute to simplify PaTriList
Signed-off-by: Tim Rowley <timothy.o.rowley@intel.com>
2016-08-17 17:08:55 -05:00