third_party_mesa3d

Author	SHA1	Message	Date
Rhys Perry	e151398de6	aco: fix stack buffer overflow in apply_sgprs() Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Fixes: `cef7879719` ('aco: rewrite apply_sgprs()') Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2361 Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3442>	2020-01-20 11:13:11 +00:00
Tapani Pälli	9b2ccd6a0e	anv: add assert for isl_mod_info in choose_isl_tiling_flags CID: 1457859 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469>	2020-01-20 12:12:29 +02:00
Tapani Pälli	8eebdd594b	anv: fix assert in GetImageDrmFormatModifierPropertiesEXT CID: 1457861 Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3469>	2020-01-20 12:11:43 +02:00
Tapani Pälli	31feae1c21	isl/gen12: add reminder comment about missing WA with 3D surfaces Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3441> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3441>	2020-01-20 08:06:19 +02:00
Icecream95	d8a3501f1b	panfrost: Dynamically allocate shader variants This fixes a crash in LZDoom where over 16 shader variants are needed for a few shaders in some maps, and should also save a few kilobytes of RAM as most of the time only one or two variants of the 8 previously allocated are actually needed. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>	2020-01-18 11:47:34 -05:00
Alyssa Rosenzweig	bef716b56c	panfrost: Expose some functionality with dEQP flag These features are stable enough that they don't need to be hidden. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3464>	2020-01-18 14:57:52 +00:00
Alyssa Rosenzweig	4af8d5b064	pan/midgard: Fix recursive csel scheduling Corner case causing invalid scheduling on shaders with nested csels, i.e. GLSL code resembling: (foo ? bool1 : bool2) ? x : y By explicitly disallowing csels this is fixed. Fixes INSTR_INVALID_ENC on a glamor shader (noticeable with slowdown and visual corruption when scrolling "too far" on GTK apps). Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3463>	2020-01-18 14:40:05 +00:00
Alyssa Rosenzweig	564a782ff7	panfrost: Identify un/pack colour opcodes We still need to identify formats in the disassembler, but this will at least get the opcode name clear. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462>	2020-01-18 14:18:48 +00:00
Alyssa Rosenzweig	13c32e5fed	pan/midgard: Bytemasks should round up, not round down Otherwise we'll lost components in DCE. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3462>	2020-01-18 14:18:48 +00:00
Icecream95	5e8386c606	panfrost: Compact the bo_access readers array Previously, the array bo_access->readers was only cleared when there were no unsignaled fences, which in some situations never happened. That resulted in the array having thousands of NULL pointers, but only a handful of active readers. With this patch, all the unsignaled readers are moved to the front of the array, effectively building a new array only containing the active readers in-place. This results in the readers array usually only having a couple of elements. Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3419>	2020-01-18 13:58:43 +00:00
Erik Faye-Lund	c0ba9000d2	zink: support arrays of samplers Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	a9023ec566	zink: support sampling non-float textures Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	3e1acff560	zink: store image-type per texture Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	5fc1562a72	zink: avoid incorrect vector-construction Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	8112240d29	zink: support offset-variants of texturing Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	f1a5bcdc16	zink: implement nir_texop_txs Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3275>	2020-01-18 10:45:38 +00:00
Erik Faye-Lund	7ee94d1b21	docs: fixup indentation The most canonical indentation-style here is two spaces, which is what the standard boilerplate in all documents use. So let's normalize to that. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:32 +01:00
Erik Faye-Lund	2ef989473a	docs: remove pointless, stray newline Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:29 +01:00
Erik Faye-Lund	199572b65b	docs: use [1] instead of asterisk for footnote While we're at it, make it a link as well. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:25 +01:00
Erik Faye-Lund	063a28642e	docs: remove trailing newlines Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:22 +01:00
Erik Faye-Lund	9954120b38	docs: remove leading spaces There's no good reason to have leading space in these pre-formatted blocks. It looks strange, so let's get rid of it. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:18 +01:00
Erik Faye-Lund	c871862744	docs: remove trailing header This header has been there since the document was added, but contains nothing. So let's get rid of it. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:14 +01:00
Erik Faye-Lund	37daddd3e4	docs: use figure/figcaption instead of tables Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:39:07 +01:00
Erik Faye-Lund	f5983a6eed	docs: do not use definition-list for sub-topics The dl-tag isn't a neat tool for defining sub-headings, it's a semantic tool for defining definitions and their meaning. Let's insetad use normal sub-headings instead. To make the last few paragraphs stand out from the above, let's add a sub-heading for those as well. Reviewed-by: Eric Engestrom <eric@engestrom.ch> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3443>	2020-01-18 11:38:56 +01:00
Rob Clark	95187083c4	freedreno/a6xx: add PROG_FB_RAST stateobj For the handful of registers that depend on the union of program/ framebuffer/rasterizer state. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Rob Clark	6dc9b292d0	freedreno/a6xx: move dynamic program state to streaming stateobj Move the program state which we can't pre-bake to a streaming state object, rather than emitting directly in the draw cmdstream. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Rob Clark	d2fd6469c3	freedreno/a6xx: drop a few more per-draw registers Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Rob Clark	4d8f42c851	freedreno/a6xx: separate rast stateobj for prim restart This lets us move PC_PRIMITIVE_CNTL into the rasterizr stateobj, rather than unconditionally emitting it directly in the cmdstream on every draw. This also starts adding some tracking about previous draw state, so that following patches can limit some of the register writes we currently emit on every draw. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Rob Clark	0e063b3079	freedreno/a6xx: cleanup rasterizer state All but one of the reg values is only used in the stateobj, so we can inline the register value setup and stateobj construction. While we are at it, switch over to the new register builders. Prep work for next patch. Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Rob Clark	fba7e6f896	freedreno/a6xx: limit scratch/debug markers to debug builds The overhead does seem to matter when you have a high enough # of draw calls that effect few bins/pixels, because these writes would happen unconditionally (ie. not part of a state-group). Possibly we could keep these if we moved them into a state-group so the register writes would be no-ops on bins with no geometry. OTOH I usually end up adding in a WFI when using them scratch reg values to track down a crash. (So add a WFI to mitigate the annoyance of needing to use a debug build to get scratch regs to locate the position of a crash/hang in the cmdstream.) Signed-off-by: Rob Clark <robdclark@chromium.org> Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3435>	2020-01-17 15:43:51 -08:00
Jordan Justen	5d7381c645	iris: Fix some indentation in iris_init_render_context Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>	2020-01-17 15:28:07 -08:00
C Stout	c1104e4cee	util/vector: Fix u_vector_foreach when head rolls over Also add unit tests for u_vector. Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3453>	2020-01-17 22:21:00 +00:00
Francisco Jerez	b54b67e067	intel/fs: Switch to standard vector layout for barycentrics at optimization time. This involves permuting the registers of barycentric vectors to have the standard X[0-n] Y[0-n] layout at NIR translation time. Barycentrics are converted to the format expected by the PLN instruction in the lower_barycentrics() pass run after the optimization loop. Main reason is correctness of SIMD32 fragment shaders. The shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used during NIR translation are busted for SIMD32. This leads to serious corruption at present with INTEL_DEBUG=do32, especially on Gen11+ where these helpers are hit more frequently due to the lack of a hardware PLN instruction. Of course one could have chosen to fix those helpers instead, but there is another far more subtle issue that was reported during review of the SIMD32 fragment shader codegen changes: The SIMD splitting pass currently handles SIMD32 barycentric vectors as if they had the standard X[0-n] Y[0-n] layout, even though they are interleaved for the PLN instruction, which causes incorrect execution masks to be applied to the MOVs unzipping barycentric vectors in cases where a LINTERP instruction occurs under non-uniform control flow. I'm not aware of any conformance regressions due to the latter issue at present, but for our peace of mind let's move the conversion to the PLN layout into the lower_barycentrics() pass run after lower_simd_width(). This leads to the following shader-db improvements (including SIMD32 shaders) in combination with the previous back-end preparation changes -- Without them (especially the copy propagation changes) this would lead to a massive number of regressions. On ICL: total instructions in shared programs: 20662316 -> 20466903 (-0.95%) instructions in affected programs: 10538474 -> 10343061 (-1.85%) helped: 68775 HURT: 6 total spills in shared programs: 8938 -> 8748 (-2.13%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8965 -> 8663 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 146 GAINED: 43 On SKL: total instructions in shared programs: 18725867 -> 18614912 (-0.59%) instructions in affected programs: 3876590 -> 3765635 (-2.86%) helped: 27492 HURT: 2 LOST: 191 GAINED: 417 On SNB: total instructions in shared programs: 14573613 -> 13980646 (-4.07%) instructions in affected programs: 5199074 -> 4606107 (-11.41%) helped: 29998 HURT: 0 LOST: 21 GAINED: 30 Results are somewhat less impressive but still significant without SIMD32 fragment shaders enabled. On ICL: total instructions in shared programs: 16148728 -> 16061659 (-0.54%) instructions in affected programs: 6114788 -> 6027719 (-1.42%) helped: 42046 HURT: 6 total spills in shared programs: 8218 -> 8028 (-2.31%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8953 -> 8651 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 0 GAINED: 3 On SKL: total instructions in shared programs: 14927994 -> 14926738 (-0.01%) instructions in affected programs: 168850 -> 167594 (-0.74%) helped: 711 HURT: 2 On SNB: total instructions in shared programs: 10770538 -> 10734403 (-0.34%) instructions in affected programs: 2702172 -> 2666037 (-1.34%) helped: 17818 HURT: 0 All of the hurt shaders are either spilling slightly more or emitting additional NOP instructions due to the SIMD16 POW workaround for Gen8-9 combined with differences in scheduling. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:23:12 -08:00
Francisco Jerez	79bd252d6e	intel/fs: Introduce barycentric layout lowering pass. The goal is to represent barycentrics with the standard vector layout during optimization and particularly SIMD lowering. Instead of emitting the barycentric layout conversions at NIR translation time, do it later as a lowering pass. For the moment this is only applied to PI messages, but we'll give the same treatment to LINTERP instructions too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:59 -08:00
Francisco Jerez	44d7d66adc	intel/fs: Split fetch_payload_reg() into separate helper for barycentrics. We're about to change the layout of barycentric vectors, which will involve permuting the GRFs of barycentrics fetched from the thread payload. Make room for this in a function separate from the generic fetch_payload_reg(), since the permutation will only be applicable to barycentric vectors. This allows simplifying fetch_payload_reg(), since there was no need for handling multiple-component payload registers except for barycentrics. This causes some minor shader-db noise due to the new helper emitting a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up shortly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:51 -08:00
Francisco Jerez	9c9e80103c	intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround. This prevents regressions on SNB due to the redundant MOVs lying around in cases where fetch_payload_reg() returns a VGRF (currently only in SIMD32 but soon in pretty much all cases). The MOVs can't be register-coalesced due to their source being a FIXED_GRF, and they can't be copy-propagated either due to the unlit centroid workaround partial writes. They can be copy-propagated just fine into a SEL instruction though. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13996898 -> 14001982 (0.04%) instructions in affected programs: 197461 -> 202545 (2.57%) helped: 0 HURT: 1251 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:39 -08:00
Francisco Jerez	0dd18d70ae	intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics. This is mainly meant to avoid shader-db regressions on SNB as we start using VGRFs for barycentrics more frequently. Currently the aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16 mode barycentric vectors are typically 4 GRFs. This is not a problem on Gen4-5, because on those platforms all VGRF allocations are pair-aligned in SIMD16 mode. However on Gen6 we end up using either the fast or the slow path of LINTERP rather non-deterministically based on the behavior of the register allocator. Fix it by repurposing aligned_pairs_class to hold PLN-aligned registers of whatever the natural size of a barycentric vector is in the current dispatch width. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13983257 -> 14527274 (3.89%) instructions in affected programs: 1766255 -> 2310272 (30.80%) helped: 0 HURT: 11608 LOST: 26 GAINED: 13 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:34 -08:00
Francisco Jerez	0db4455c1f	intel/fs/gen6: Constrain barycentric source of LINTERP during bank conflict mitigation. This avoids regressions on SNB due to the bank conflict mitigation pass moving a VGRF-allocated barycentric vector to a misaligned location, which would prevent the PLN instruction from being used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:29 -08:00
Francisco Jerez	369aef851d	intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use. Previously we would hardcode fs_visitor::delta_xy barycentrics to be allocated from aligned_pairs_class on hardware with PLN source alignment restrictions (pre-Gen7). Instead allocate any registers consumed by LINTERP from aligned_pairs_class, even if some barycentric vector had ended up in a temporary. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13983257 -> 14527274 (3.89%) instructions in affected programs: 1766255 -> 2310272 (30.80%) helped: 0 HURT: 11608 LOST: 26 GAINED: 13 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:20 -08:00
Francisco Jerez	54b1b71e73	intel/fs: Allow limited copy propagation of a LOAD_PAYLOAD into another. This is particularly useful in cases where register coalaesce is unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy -- E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric vector in order to transform it into the PLN layout. This prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On SKL: total instructions in shared programs: 18596672 -> 18976097 (2.04%) instructions in affected programs: 7937041 -> 8316466 (4.78%) helped: 39 HURT: 67427 LOST: 466 GAINED: 220 On SNB: total instructions in shared programs: 13993866 -> 14202963 (1.49%) instructions in affected programs: 7611309 -> 7820406 (2.75%) helped: 624 HURT: 52943 LOST: 6 GAINED: 18 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:09 -08:00
Francisco Jerez	8eb4f2092a	intel/fs: Add support for copy-propagating a block of multiple FIXED_GRFs. In cases where a LOAD_PAYLOAD instruction copies a single block of sequential GRF registers into the destination (see is_identity_payload()), splitting the block copy into a number of ACP entries (one for each LOAD_PAYLOAD source) is undesirable, because that prevents copy propagation into any instructions which read multiple components at once with the same source (the barycentric source of the LINTERP instruction is going to be the overwhelmingly most common example). Technically it would also be possible to do this for VGRF sources, but there is little benefit from that since register coalesce already covers many of those cases -- There is no way for a block of FIXED_GRFs to be coalesced into a VGRF though. This prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On SKL: total instructions in shared programs: 18595160 -> 18828562 (1.26%) instructions in affected programs: 13374946 -> 13608348 (1.75%) helped: 7 HURT: 108977 total spills in shared programs: 9116 -> 9106 (-0.11%) spills in affected programs: 404 -> 394 (-2.48%) helped: 7 HURT: 9 total fills in shared programs: 8994 -> 9176 (2.02%) fills in affected programs: 898 -> 1080 (20.27%) helped: 7 HURT: 9 LOST: 469 GAINED: 220 On SNB: total instructions in shared programs: 13996898 -> 14096222 (0.71%) instructions in affected programs: 8088546 -> 8187870 (1.23%) helped: 2 HURT: 66520 total spills in shared programs: 2985 -> 2961 (-0.80%) spills in affected programs: 632 -> 608 (-3.80%) helped: 2 HURT: 0 total fills in shared programs: 3144 -> 3128 (-0.51%) fills in affected programs: 1515 -> 1499 (-1.06%) helped: 2 HURT: 0 LOST: 0 GAINED: 4 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:41 -08:00
Francisco Jerez	e328fbd9f8	intel/fs: Add partial support for copy-propagating FIXED_GRFs. This will be useful for eliminating redundant copies from the FS thread payload, particularly in SIMD32 programs. For the moment we only allow FIXED_GRFs with identity strides in order to avoid dealing with composing the arbitrary bidimensional strides that FIXED_GRF regions potentially have, which are rarely used at the IR level anyway. This enables the following commit allowing block-propagation of FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On ICL: total instructions in shared programs: 20484665 -> 20529650 (0.22%) instructions in affected programs: 6031235 -> 6076220 (0.75%) helped: 5 HURT: 42073 total spills in shared programs: 8748 -> 8925 (2.02%) spills in affected programs: 186 -> 363 (95.16%) helped: 5 HURT: 9 total fills in shared programs: 8663 -> 8960 (3.43%) fills in affected programs: 647 -> 944 (45.90%) helped: 5 HURT: 9 On SKL: total instructions in shared programs: 18937442 -> 19128162 (1.01%) instructions in affected programs: 8378187 -> 8568907 (2.28%) helped: 39 HURT: 68176 LOST: 1 GAINED: 4 On SNB: total instructions in shared programs: 14094685 -> 14243499 (1.06%) instructions in affected programs: 7751062 -> 7899876 (1.92%) helped: 623 HURT: 53586 LOST: 7 GAINED: 25 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:33 -08:00
Francisco Jerez	5153d06d92	intel/fs: Extend copy propagation dataflow analysis to copies with FIXED_GRF source. This involves indexing the ACP tables used internally by fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead of register number. Both are nearly equivalent for virtual GRFs (barring the single bit of entropy lost in the hash), and this makes handling FIXED_GRFs straightforward. Because we're only going to support FIXED_GRFs for the source of a copy, this change is only strictly necessary during the second pass that checks for source interference, but we also apply the same change to the first pass for consistency. Note that this shouldn't change the behavior of the copy propagation pass until we start inserting FIXED_GRF entries into the ACP. Even then FIXED_GRF writes are extremely rare so this change will hardly ever have an effect, but they aren't completely non-existing so we need to handle them for correctness. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:27 -08:00
Francisco Jerez	ab0d1b3b3d	intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers. This reworks the current fs_inst::is_copy_payload() method into a number of classification helpers with well-defined semantics. This will be useful later on in order to optimize LOAD_PAYLOAD instructions more aggressively in cases where we can determine it's safe to do so. The closest equivalent of the present fs_inst::is_copy_payload() method is the is_coalescing_payload() helper introduced here. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:19 -08:00
Francisco Jerez	1873202f44	intel/fs: Generalize fs_reg::is_contiguous() to register files other than VGRF. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:20:59 -08:00
Francisco Jerez	d9a57c85cc	intel/fs: Try to vectorize header setup in lower_load_payload(). In cases where LOAD_PAYLOAD is provided a pair of contiguous registers as header sources, try to use a single SIMD16 instruction in order to initialize them. This is unlikely to affect the overall cycle count of the shader, since the compressed instruction has twice the issue time, except due to the reduced pressure on the instruction cache. Main motivation is avoiding instruction-count regressions in combination with the following copy propagation improvements, which will allow the SIMD16 g0-1 header setup emitted for framebuffer writes to be copy-propagated into its LOAD_PAYLOAD, leading to the emission of two SIMD8 MOV instructions instead of a single SIMD16 MOV. Reverting this commit on top of the copy propagation changes would lead to the following shader-db regressions on SKL and other platforms: total instructions in shared programs: 14926738 -> 14935415 (0.06%) instructions in affected programs: 1892445 -> 1901122 (0.46%) helped: 0 HURT: 8676 Without the following copy propagation changes this doesn't have any effect on shader-db on Gen7+, because we would typically set up the FB write header with a separate SIMD16 MOV that isn't currently copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs result of LOAD_PAYLOAD lowering would get register-coalesced away under normal circumstances. However that wasn't the case for MRF LOAD_PAYLOAD destinations on Gen6 and earlier, because register coalesce only kicks in for GRFs, leaving a number of redundant SIMD8 MOVs lying around. On SNB this leads to the following shader-db improvements: total instructions in shared programs: 10770538 -> 10734681 (-0.33%) instructions in affected programs: 2700655 -> 2664798 (-1.33%) helped: 17791 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:20:46 -08:00
Marek Olšák	3ba16d36c9	st/dri: do FLUSH_VERTICES before calling flush_resource	2020-01-17 15:04:35 -05:00
Marek Olšák	bec9c90b5e	gallium: add st_context_iface::flush_resource to call FLUSH_VERTICES	2020-01-17 15:04:35 -05:00
Lionel Landwerlin	ddb80f9276	anv: enable VK_KHR_swapchain_mutable_format Enable new tests in dEQP-VK.image.swapchain_mutable.* Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>	2020-01-17 18:27:29 +00:00
Jason Ekstrand	4bdf8547f4	vulkan/wsi: Implement VK_KHR_swapchain_mutable_format This is only the core WSI code for the extension. It adds the image format list and the flags to vkCreateImage as well as handling things properly in the modifier queries. Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3434>	2020-01-17 18:27:29 +00:00

1 2 3 4 5 ...

119331 Commits