third_party_mesa3d

Author	SHA1	Message	Date
Matt Turner	40f0ade68e	intel/compiler: Handle invalid compacted immediates 16-bit immediates need to be replicated through the 32-bit immediate field, so we should never see one that isn't. This does happen however in the fuzzer unit test, so returning false allows the fuzzer to reject this case. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:21 +00:00
Matt Turner	205cb8a139	intel/compiler: Handle invalid inputs to brw_reg_type_to_*() Necessary to handle these cases when we test fuzzed instructions. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:21 +00:00
Matt Turner	741cf9a104	intel/compiler: Split hw_type tables Previously we were sharing tables between generations that were nearly identical (i.e., Gen8 3-src adds HF support) and used a small bit of code to handle the differences. This is kind of a mess if you want to reject 64-bit types on platforms that don't support 64-bit types, so split the tables, allowing each generation's table to list exactly what it supports. Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:21 +00:00
Matt Turner	0b70d46f7a	intel/compiler: Add a INVALID_{,HW_}REG_TYPE macros Since the enum brw_reg_type is packed, comparisons with -1 don't work directly, necessitating the cast. Add a macro to avoid this confusion. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	ab7c25b9aa	intel/compiler: Add NF some more places Necessary to handle these cases when we test fuzzed instructions. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	8634286c5d	intel/compiler: Limit compaction unit tests to specific gens Two of the tests emit instructions with MRF destinations, and MRFs aren't present on Gen7+. I think we were just lucky that this didn't cause a problem earlier since we were running the tests on Gen7-9. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	713c123bfa	intel/compiler: Don't disassemble align1 3-src operands on Gen < 10 Since the platforms don't support align1 3-src instructions, the contents of these operands are not going to be meaningful. Just don't print them to avoid hitting some assertions in brw_inst functions. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	49c21802cb	intel/compiler: Split has_64bit_types into float/int Gen7 has 64-bit floats but not 64-bit ints. Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	bb47aa2124	intel/compiler: Extract GEN_* macros into separate file Will be used by the instruction compaction unit test. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Matt Turner	c69f3ece61	intel/compiler: Use ARRAY_SIZE() Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2635>	2020-01-22 00:19:20 +00:00
Caio Marcelo de Oliveira Filho	45164fc8c5	intel/fs: Don't emit control barrier if only one thread is used When there's only one hardware thread (i.e. the dispatch width greater or equal to the workgroup size), there's no need to use a barrier to ensure all the invocations reach the same point in the shader, because they are already running lock-step. Results for SKL running Iris for shader-db tests with compute shaders total sends in shared programs: 18361 -> 18339 (-0.12%) sends in affected programs: 904 -> 882 (-2.43%) helped: 9 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 2.44 x̃: 2 helped stats (rel) min: 0.84% max: 21.43% x̄: 7.82% x̃: 2.67% 95% mean confidence interval for sends value: -3.31 -1.58 95% mean confidence interval for sends %-change: -14.67% -0.97% Sends are helped. Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped. Results for ICL and TGL are similar to SKL. Results for BDW are similar to SKL except for DeusEx shader that has a workgroup size 16 but in BDW picks the SIMD8. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Caio Marcelo de Oliveira Filho	4f431e870c	intel/fs: Don't emit fence for shared memory if only one thread is used When there's only one hardware thread (i.e. the dispatch width greater or equal to the workgroup size), there's no need to synchronize shared memory access (SLM) since all the requests from a single thread are already synchronized. In such case, we just add a scheduling fence. To be able to identify that case for all platforms, move the handling of platforms prior to Gen11 (which don't have a separate SLM fence) after the optimization. Results for SKL running Iris for shader-db tests with compute shaders total sends in shared programs: 18395 -> 18361 (-0.18%) sends in affected programs: 938 -> 904 (-3.62%) helped: 9 HURT: 0 helped stats (abs) min: 1 max: 5 x̄: 3.78 x̃: 4 helped stats (rel) min: 1.56% max: 26.32% x̄: 10.33% x̃: 2.60% 95% mean confidence interval for sends value: -4.85 -2.71 95% mean confidence interval for sends %-change: -19.12% -1.54% Sends are helped. Shaders from Aztec Ruins, Car Chase, Manhattan and DeusEx are helped. Results for ICL and TGL are similar to SKL. Results for BDW are similar to SKL except for DeusEx shader that has a workgroup size 16 but in BDW picks the SIMD8. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Caio Marcelo de Oliveira Filho	ff5b74ef32	intel/fs: Add workgroup_size() helper Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Caio Marcelo de Oliveira Filho	18e72ee210	intel/fs: Add FS_OPCODE_SCHEDULING_FENCE Like a SHADER_OPCODE_MEMORY_FENCE but doesn't generate any assembly code. Will be used when the compiler shouldn't reorder certain instructions but there's no need to generate code for the HW to do it -- as the ordering will be guaranteed by other means. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3226>	2020-01-21 23:41:35 +00:00
Francisco Jerez	b54b67e067	intel/fs: Switch to standard vector layout for barycentrics at optimization time. This involves permuting the registers of barycentric vectors to have the standard X[0-n] Y[0-n] layout at NIR translation time. Barycentrics are converted to the format expected by the PLN instruction in the lower_barycentrics() pass run after the optimization loop. Main reason is correctness of SIMD32 fragment shaders. The shuffle_from_pln_layout() and shuffle_to_pln_layout() helpers used during NIR translation are busted for SIMD32. This leads to serious corruption at present with INTEL_DEBUG=do32, especially on Gen11+ where these helpers are hit more frequently due to the lack of a hardware PLN instruction. Of course one could have chosen to fix those helpers instead, but there is another far more subtle issue that was reported during review of the SIMD32 fragment shader codegen changes: The SIMD splitting pass currently handles SIMD32 barycentric vectors as if they had the standard X[0-n] Y[0-n] layout, even though they are interleaved for the PLN instruction, which causes incorrect execution masks to be applied to the MOVs unzipping barycentric vectors in cases where a LINTERP instruction occurs under non-uniform control flow. I'm not aware of any conformance regressions due to the latter issue at present, but for our peace of mind let's move the conversion to the PLN layout into the lower_barycentrics() pass run after lower_simd_width(). This leads to the following shader-db improvements (including SIMD32 shaders) in combination with the previous back-end preparation changes -- Without them (especially the copy propagation changes) this would lead to a massive number of regressions. On ICL: total instructions in shared programs: 20662316 -> 20466903 (-0.95%) instructions in affected programs: 10538474 -> 10343061 (-1.85%) helped: 68775 HURT: 6 total spills in shared programs: 8938 -> 8748 (-2.13%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8965 -> 8663 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 146 GAINED: 43 On SKL: total instructions in shared programs: 18725867 -> 18614912 (-0.59%) instructions in affected programs: 3876590 -> 3765635 (-2.86%) helped: 27492 HURT: 2 LOST: 191 GAINED: 417 On SNB: total instructions in shared programs: 14573613 -> 13980646 (-4.07%) instructions in affected programs: 5199074 -> 4606107 (-11.41%) helped: 29998 HURT: 0 LOST: 21 GAINED: 30 Results are somewhat less impressive but still significant without SIMD32 fragment shaders enabled. On ICL: total instructions in shared programs: 16148728 -> 16061659 (-0.54%) instructions in affected programs: 6114788 -> 6027719 (-1.42%) helped: 42046 HURT: 6 total spills in shared programs: 8218 -> 8028 (-2.31%) spills in affected programs: 376 -> 186 (-50.53%) helped: 9 HURT: 5 total fills in shared programs: 8953 -> 8651 (-3.37%) fills in affected programs: 965 -> 663 (-31.30%) helped: 9 HURT: 6 LOST: 0 GAINED: 3 On SKL: total instructions in shared programs: 14927994 -> 14926738 (-0.01%) instructions in affected programs: 168850 -> 167594 (-0.74%) helped: 711 HURT: 2 On SNB: total instructions in shared programs: 10770538 -> 10734403 (-0.34%) instructions in affected programs: 2702172 -> 2666037 (-1.34%) helped: 17818 HURT: 0 All of the hurt shaders are either spilling slightly more or emitting additional NOP instructions due to the SIMD16 POW workaround for Gen8-9 combined with differences in scheduling. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:23:12 -08:00
Francisco Jerez	79bd252d6e	intel/fs: Introduce barycentric layout lowering pass. The goal is to represent barycentrics with the standard vector layout during optimization and particularly SIMD lowering. Instead of emitting the barycentric layout conversions at NIR translation time, do it later as a lowering pass. For the moment this is only applied to PI messages, but we'll give the same treatment to LINTERP instructions too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:59 -08:00
Francisco Jerez	44d7d66adc	intel/fs: Split fetch_payload_reg() into separate helper for barycentrics. We're about to change the layout of barycentric vectors, which will involve permuting the GRFs of barycentrics fetched from the thread payload. Make room for this in a function separate from the generic fetch_payload_reg(), since the permutation will only be applicable to barycentric vectors. This allows simplifying fetch_payload_reg(), since there was no need for handling multiple-component payload registers except for barycentrics. This causes some minor shader-db noise due to the new helper emitting a LOAD_PAYLOAD instruction unconditionally, but it will be cleaned up shortly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:51 -08:00
Francisco Jerez	9c9e80103c	intel/fs/gen6: Use SEL instead of bashing thread payload for unlit centroid workaround. This prevents regressions on SNB due to the redundant MOVs lying around in cases where fetch_payload_reg() returns a VGRF (currently only in SIMD32 but soon in pretty much all cases). The MOVs can't be register-coalesced due to their source being a FIXED_GRF, and they can't be copy-propagated either due to the unlit centroid workaround partial writes. They can be copy-propagated just fine into a SEL instruction though. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13996898 -> 14001982 (0.04%) instructions in affected programs: 197461 -> 202545 (2.57%) helped: 0 HURT: 1251 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:39 -08:00
Francisco Jerez	0dd18d70ae	intel/fs/gen6: Generalize aligned_pairs_class to SIMD16 aligned barycentrics. This is mainly meant to avoid shader-db regressions on SNB as we start using VGRFs for barycentrics more frequently. Currently the aligned_pairs_class is only useful in SIMD8 mode, because in SIMD16 mode barycentric vectors are typically 4 GRFs. This is not a problem on Gen4-5, because on those platforms all VGRF allocations are pair-aligned in SIMD16 mode. However on Gen6 we end up using either the fast or the slow path of LINTERP rather non-deterministically based on the behavior of the register allocator. Fix it by repurposing aligned_pairs_class to hold PLN-aligned registers of whatever the natural size of a barycentric vector is in the current dispatch width. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13983257 -> 14527274 (3.89%) instructions in affected programs: 1766255 -> 2310272 (30.80%) helped: 0 HURT: 11608 LOST: 26 GAINED: 13 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:34 -08:00
Francisco Jerez	0db4455c1f	intel/fs/gen6: Constrain barycentric source of LINTERP during bank conflict mitigation. This avoids regressions on SNB due to the bank conflict mitigation pass moving a VGRF-allocated barycentric vector to a misaligned location, which would prevent the PLN instruction from being used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:29 -08:00
Francisco Jerez	369aef851d	intel/fs/gen4-6: Allocate registers from aligned_pairs_class based on LINTERP use. Previously we would hardcode fs_visitor::delta_xy barycentrics to be allocated from aligned_pairs_class on hardware with PLN source alignment restrictions (pre-Gen7). Instead allocate any registers consumed by LINTERP from aligned_pairs_class, even if some barycentric vector had ended up in a temporary. On SNB this prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series: total instructions in shared programs: 13983257 -> 14527274 (3.89%) instructions in affected programs: 1766255 -> 2310272 (30.80%) helped: 0 HURT: 11608 LOST: 26 GAINED: 13 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:20 -08:00
Francisco Jerez	54b1b71e73	intel/fs: Allow limited copy propagation of a LOAD_PAYLOAD into another. This is particularly useful in cases where register coalesce is unlikely to succeed because the LOAD_PAYLOAD isn't a plain copy -- E.g. when a LOAD_PAYLOAD is shuffling the contents of a barycentric vector in order to transform it into the PLN layout. This prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On SKL: total instructions in shared programs: 18596672 -> 18976097 (2.04%) instructions in affected programs: 7937041 -> 8316466 (4.78%) helped: 39 HURT: 67427 LOST: 466 GAINED: 220 On SNB: total instructions in shared programs: 13993866 -> 14202963 (1.49%) instructions in affected programs: 7611309 -> 7820406 (2.75%) helped: 624 HURT: 52943 LOST: 6 GAINED: 18 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:22:09 -08:00
Francisco Jerez	8eb4f2092a	intel/fs: Add support for copy-propagating a block of multiple FIXED_GRFs. In cases where a LOAD_PAYLOAD instruction copies a single block of sequential GRF registers into the destination (see is_identity_payload()), splitting the block copy into a number of ACP entries (one for each LOAD_PAYLOAD source) is undesirable, because that prevents copy propagation into any instructions which read multiple components at once with the same source (the barycentric source of the LINTERP instruction is going to be the overwhelmingly most common example). Technically it would also be possible to do this for VGRF sources, but there is little benefit from that since register coalesce already covers many of those cases -- There is no way for a block of FIXED_GRFs to be coalesced into a VGRF though. This prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On SKL: total instructions in shared programs: 18595160 -> 18828562 (1.26%) instructions in affected programs: 13374946 -> 13608348 (1.75%) helped: 7 HURT: 108977 total spills in shared programs: 9116 -> 9106 (-0.11%) spills in affected programs: 404 -> 394 (-2.48%) helped: 7 HURT: 9 total fills in shared programs: 8994 -> 9176 (2.02%) fills in affected programs: 898 -> 1080 (20.27%) helped: 7 HURT: 9 LOST: 469 GAINED: 220 On SNB: total instructions in shared programs: 13996898 -> 14096222 (0.71%) instructions in affected programs: 8088546 -> 8187870 (1.23%) helped: 2 HURT: 66520 total spills in shared programs: 2985 -> 2961 (-0.80%) spills in affected programs: 632 -> 608 (-3.80%) helped: 2 HURT: 0 total fills in shared programs: 3144 -> 3128 (-0.51%) fills in affected programs: 1515 -> 1499 (-1.06%) helped: 2 HURT: 0 LOST: 0 GAINED: 4 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:41 -08:00
Francisco Jerez	e328fbd9f8	intel/fs: Add partial support for copy-propagating FIXED_GRFs. This will be useful for eliminating redundant copies from the FS thread payload, particularly in SIMD32 programs. For the moment we only allow FIXED_GRFs with identity strides in order to avoid dealing with composing the arbitrary bidimensional strides that FIXED_GRF regions potentially have, which are rarely used at the IR level anyway. This enables the following commit allowing block-propagation of FIXED_GRF LOAD_PAYLOAD copies, and prevents the following shader-db regressions (including SIMD32 programs) in combination with the interpolation rework part of this series. On ICL: total instructions in shared programs: 20484665 -> 20529650 (0.22%) instructions in affected programs: 6031235 -> 6076220 (0.75%) helped: 5 HURT: 42073 total spills in shared programs: 8748 -> 8925 (2.02%) spills in affected programs: 186 -> 363 (95.16%) helped: 5 HURT: 9 total fills in shared programs: 8663 -> 8960 (3.43%) fills in affected programs: 647 -> 944 (45.90%) helped: 5 HURT: 9 On SKL: total instructions in shared programs: 18937442 -> 19128162 (1.01%) instructions in affected programs: 8378187 -> 8568907 (2.28%) helped: 39 HURT: 68176 LOST: 1 GAINED: 4 On SNB: total instructions in shared programs: 14094685 -> 14243499 (1.06%) instructions in affected programs: 7751062 -> 7899876 (1.92%) helped: 623 HURT: 53586 LOST: 7 GAINED: 25 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:33 -08:00
Francisco Jerez	5153d06d92	intel/fs: Extend copy propagation dataflow analysis to copies with FIXED_GRF source. This involves indexing the ACP tables used internally by fs_copy_prop_dataflow::setup_initial_values() by reg_space() instead of register number. Both are nearly equivalent for virtual GRFs (barring the single bit of entropy lost in the hash), and this makes handling FIXED_GRFs straightforward. Because we're only going to support FIXED_GRFs for the source of a copy, this change is only strictly necessary during the second pass that checks for source interference, but we also apply the same change to the first pass for consistency. Note that this shouldn't change the behavior of the copy propagation pass until we start inserting FIXED_GRF entries into the ACP. Even then FIXED_GRF writes are extremely rare so this change will hardly ever have an effect, but they aren't completely non-existing so we need to handle them for correctness. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:27 -08:00
Francisco Jerez	ab0d1b3b3d	intel/fs: Rework fs_inst::is_copy_payload() into multiple classification helpers. This reworks the current fs_inst::is_copy_payload() method into a number of classification helpers with well-defined semantics. This will be useful later on in order to optimize LOAD_PAYLOAD instructions more aggressively in cases where we can determine it's safe to do so. The closest equivalent of the present fs_inst::is_copy_payload() method is the is_coalescing_payload() helper introduced here. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:21:19 -08:00
Francisco Jerez	1873202f44	intel/fs: Generalize fs_reg::is_contiguous() to register files other than VGRF. No functional nor shader-db changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:20:59 -08:00
Francisco Jerez	d9a57c85cc	intel/fs: Try to vectorize header setup in lower_load_payload(). In cases where LOAD_PAYLOAD is provided a pair of contiguous registers as header sources, try to use a single SIMD16 instruction in order to initialize them. This is unlikely to affect the overall cycle count of the shader, since the compressed instruction has twice the issue time, except due to the reduced pressure on the instruction cache. Main motivation is avoiding instruction-count regressions in combination with the following copy propagation improvements, which will allow the SIMD16 g0-1 header setup emitted for framebuffer writes to be copy-propagated into its LOAD_PAYLOAD, leading to the emission of two SIMD8 MOV instructions instead of a single SIMD16 MOV. Reverting this commit on top of the copy propagation changes would lead to the following shader-db regressions on SKL and other platforms: total instructions in shared programs: 14926738 -> 14935415 (0.06%) instructions in affected programs: 1892445 -> 1901122 (0.46%) helped: 0 HURT: 8676 Without the following copy propagation changes this doesn't have any effect on shader-db on Gen7+, because we would typically set up the FB write header with a separate SIMD16 MOV that isn't currently copy-propagated into the LOAD_PAYLOAD, so the individual SIMD8 MOVs result of LOAD_PAYLOAD lowering would get register-coalesced away under normal circumstances. However that wasn't the case for MRF LOAD_PAYLOAD destinations on Gen6 and earlier, because register coalesce only kicks in for GRFs, leaving a number of redundant SIMD8 MOVs lying around. On SNB this leads to the following shader-db improvements: total instructions in shared programs: 10770538 -> 10734681 (-0.33%) instructions in affected programs: 2700655 -> 2664798 (-1.33%) helped: 17791 HURT: 0 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-17 13:20:46 -08:00
Kenneth Graunke	0a1c47074b	intel/compiler: Fix illegal mutation in get_nir_image_intrinsic_image get_nir_image_intrinsic_image() was incorrectly mutating the value held by the register which holds the intrinsic's first source (image index). If this happened to be the register for an SSA def which is also used elsewhere in the program, this meant that we would clobber that value in subsequent uses. Note that this only affects i965, because neither anv nor iris use the binding table start sections, so nothing is ever added here. Fixes KHR-GL46.compute_shader.resources-max on i965 with Eric Anholt's MR !3240 applied. That MR reorders SSBOs and ABOs, so that test uses image 0 and SSBO 0, causing this code to brilliantly add binding table index 45 to both the image (correct) and the SSBO (bzzt, wrong!). Fixes: `09f1de97a7` ("anv,i965: Lower away image derefs in the driver") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3404>	2020-01-15 19:25:35 +00:00
Jason Ekstrand	721666e52a	anv,nir: Lower quad_broadcast with dynamic index in NIR This is required for the subgroupBroadcastDynamicId feature that was added in Vulkan 1.2. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2020-01-15 08:34:57 -06:00
Eric Anholt	3d9a3d0be0	i965: Reuse the new core glsl_count_dword_slots(). The only difference I could see was treating interfaces like structs. Maintain that case. Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3297>	2020-01-14 23:55:00 +00:00
Caio Marcelo de Oliveira Filho	edf6a40cb2	intel/fs: Only use SLM fence in compute shaders Fixes: `b390ff3517` ("intel/fs: Add support for SLM fence in Gen11") Fixes: `e142061399` ("intel/fs: Implement scoped_memory_barrier") Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2020-01-14 10:55:48 -08:00
Rhys Perry	1ffacc3ce1	nir/lower_gs_intrinsics: add option for per-stream counts Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2422>	2020-01-14 12:11:14 +00:00
Jason Ekstrand	d3737002ee	nir/lower_atomics_to_ssbo: Also lower barriers This is more correct for a pass which is supposed to completely lower away atomic counters. It also lets us stop supporting atomic counter barriers in most of the drivers. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	e40b11bbcb	nir: Rename nir_intrinsic_barrier to control_barrier This is a more explicit name now that we don't want it to be doing any memory barrier stuff for us. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	bd3ab75aef	intel/nir: Stop adding redundant barriers Now that both GLSL and SPIR-V are adding shared and tcs_patch barriers (as appropriate) prior to the nir_intrinsic_barrier, we don't need to do it ourselves in the back-end. This reverts commit 26e950a5de01564e3b5f2148ae994454ae5205fe. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	60097cc840	nir: Add a new memory_barrier_tcs_patch intrinsic Right now, it's implemented as a no-op for everyone. For most drivers, it's a switch case in the NIR -> whatever which just breaks. For ir3, they already have code to delete tessellation barriers so we just add a case to also delete memory_barrier_tcs_patch. Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Reviewed-by: Eric Anholt <eric@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:47 +00:00
Jason Ekstrand	ada49bae5e	intel/vec4: Support scoped_memory_barrier Fixes: `06aecb14c0` "anv: Implement VK_KHR_vulkan_memory_model" Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3307>	2020-01-13 17:23:46 +00:00
Francisco Jerez	c20dc9b836	intel/fs: Make implied_mrf_writes() an fs_inst method. This will be convenient in a later commit enabling SIMD32 fragment shaders, and happens to fix the calculation for MATH instructions which is currently inaccurate for SIMD-lowered instructions on Gen4-5 platforms (all of them on Gen4 in SIMD16 mode), since it was based on the shader's dispatch width rather than on the actual execution size of the instruction. This causes some shader-db noise on Gen4 due to the more compact register allocation interacting with the SEND dependency workarounds, but otherwise no major changes. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:02:30 -08:00
Francisco Jerez	591f146fd2	intel/fs/cse: Fix non-deterministic behavior due to inaccurate liveness calculation. The liveness calculation done by the local CSE pass in order to prune AEB entries whose sources are no longer live is currently inaccurate, because the live intervals are calculated once at the beginning of the pass, so they don't take into account any of the copy instructions inserted by the CSE pass as it makes progress. However the IP counter used in that calculation is based on the start_ip of the basic block, which is updated automatically whenever any instructions are inserted into the CFG. This causes the IP counter and liveness intervals to get out of sync in programs with multiple basic blocks, causing the CSE pass to toss AEB entries prematurely, which can lead to missed optimization opportunities rather non-deterministically. On BDW this leads to the following shader-db changes: total instructions in shared programs: 14952488 -> 14951763 (-0.00%) instructions in affected programs: 45416 -> 44691 (-1.60%) helped: 40 HURT: 4 total spills in shared programs: 20989 -> 20970 (-0.09%) spills in affected programs: 103 -> 84 (-18.45%) helped: 3 HURT: 0 total fills in shared programs: 24981 -> 24926 (-0.22%) fills in affected programs: 127 -> 72 (-43.31%) helped: 3 HURT: 0 In addition it avoids a number of regressions in combination with some of the optimization changes I'm working on for SIMD32, which would have made CSE more effective... Causing it to be less effective elsewhere in the program astonishingly. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:02:06 -08:00
Francisco Jerez	cc0ea482ad	intel/fs: Fix nir_intrinsic_load_barycentric_at_sample for SIMD32. For uniform sample ID, only the first channel of msg_data will be initialized. We need to pass that component only to the SEND message for SIMD lowering to unzip the descriptor source correctly. Fixes several dozens of conformance test failures with SIMD32 fragment shaders enabled, including: dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_sample.dynamic_sample_number.* Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:01:52 -08:00
Francisco Jerez	0703eab012	intel/fs/gen8+: Fix r127 dst/src overlap RA workaround for EOT message payload. The problem occurred when the return payload of a SIMD8 SEND instruction was re-used as source payload of an EOT SEND message. In such cases the interference edge added by that workaround between the payload and grf127_send_hack_node would have no effect, because the payload would be allocated to a fixed range of registers containing r127 by the special handling of EOT message payloads in the same function. This would cause things to blow up if the source payload of the first SIMD8 message ended up being allocated to a range which happened to overlap the destination. Fix it by avoiding r127 altogether in the allocation of EOT message payloads. The problem can be reproduced on ICL with the fp-indirections2 Piglit test-case in combination with the other optimizer changes of this series. Fixes: `232ed89802` "i965/fs: Register allocator shoudn't use grf127 for sends dest" Cc: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:00:42 -08:00
Francisco Jerez	0a6e46d44d	intel/fs/gen11+: Handle ROR/ROL in lower_simd_width(). Prevents invalid code from being emitted for ROR/ROL instructions in SIMD32 shaders. The problem can be reproduced with the following tests while forcing SIMD32 to be used for fragment shaders: piglit.shaders.glsl-rotate-left piglit.shaders.glsl-rotate-right However the issue could occur in production already with compute shaders and a workgroup size large enough to trigger SIMD32 dispatch. Fixes: `83fdec0f0d` "intel/compiler: Enable the emission of ROR/ROL instructions" Cc: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>	2020-01-10 11:00:24 -08:00
Jason Ekstrand	b788cccfe2	intel/disasm: Fix decoding of src0 of SENDS There is no instruction field for the register file for src0 because it's always GRF. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3309> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3309>	2020-01-08 14:14:16 +00:00
Jason Ekstrand	803fad43c3	intel/nir: Add a memory barrier before barrier() Our barrier instruction does not implicitly do a memory fence but the GLSL barrier() intrinsic is supposed to. The easiest back-portable solution is to just add the NIR barriers. We'll sort this out more properly in later commits. Cc: mesa-stable@lists.freedesktop.org Closes: #2138 Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2020-01-07 21:52:19 -06:00
Kenneth Graunke	d0d28c783d	iris: Set nir_shader_compiler_options::unify_interfaces. This is technically enabling the option in the common intel backend code, but only the st/nir linker uses the option, so it's iris-only. Fixes Piglit's spec/glsl-1.50/execution/geometry/clip-distance-vs-gs-out Closes: #2274 Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3249>	2020-01-03 00:41:50 +00:00
Caio Marcelo de Oliveira Filho	766fdeccf9	intel/vec4: Fix lowering of multiplication by 16-bit constant Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for older platforms that don't support MUL with 32x32 types and use vec4. Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-17 10:45:22 -08:00
Caio Marcelo de Oliveira Filho	2137be22fa	intel/fs: Fix lowering of dword multiplication by 16-bit constant Existing code was ignoring whether the type of the immediate source was signed or not. If the source was signed, it would ignore small negative values but it also would wrongly accept values between INT16_MAX and UINT16_MAX, causing the actual value to later be reinterpreted as a negative number (under 16-bits). Fixes tests/shaders/glsl-mul-const.shader_test in Piglit for platforms that don't support MUL with 32x32 types, including ICL and TGL. Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2186 Cc: <mesa-stable@lists.freedesktop.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>	2019-12-17 10:45:22 -08:00
Caio Marcelo de Oliveira Filho	c06ba83589	intel/fs: Lower 64-bit MOVs after lower_load_payload() Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3070>	2019-12-14 21:12:21 +00:00
Eric Engestrom	d600b19640	intel/compiler: replace `0` pointer with `NULL` Signed-off-by: Eric Engestrom <eric.engestrom@intel.com> Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>	2019-12-13 20:16:20 +00:00

1357 Commits