1 //===- AMDGPURegisterBankInfo.cpp -------------------------------*- C++ -*-==//
2 //
3 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 // See https://llvm.org/LICENSE.txt for license information.
5 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6 //
7 //===----------------------------------------------------------------------===//
8 /// \file
9 /// This file implements the targeting of the RegisterBankInfo class for
10 /// AMDGPU.
11 ///
12 /// \par
13 ///
14 /// AMDGPU has unique register bank constraints that require special high level
15 /// strategies to deal with. There are two main true physical register banks:
16 /// VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a
17 /// sort of pseudo-register bank needed to represent SGPRs used in a vector
18 /// boolean context. There is also the AGPR bank, which is a special purpose
19 /// physical register bank present on some subtargets.
20 ///
21 /// Copying from VGPR to SGPR is generally illegal, unless the value is known to
22 /// be uniform. It is generally not valid to legalize operands by inserting
23 /// copies as on other targets. Operations which require uniform, SGPR operands
24 /// generally require scalarization by repeatedly executing the instruction,
25 /// activating each set of lanes using a unique set of input values. This is
26 /// referred to as a waterfall loop.
27 ///
28 /// \par Booleans
29 ///
30 /// Booleans (s1 values) require special consideration. A vector compare result
31 /// is naturally a bitmask with one bit per lane, in a 32 or 64-bit
32 /// register. These are represented with the VCC bank. During selection, we need
33 /// to be able to unambiguously go back from a register class to a register
34 /// bank. To distinguish whether an SGPR should use the SGPR or VCC register
35 /// bank, we need to know the use context type. An s1 value in an SGPR always
36 /// means a VCC bank value; any wider SGPR value uses the SGPR bank. A scalar
37 /// compare sets SCC, which is a 1-bit unaddressable register and must be copied to
38 /// a 32-bit virtual register. Taken together, this means we need to adjust the
39 /// type of boolean operations to be regbank legal. All SALU booleans need to be
40 /// widened to 32-bits, and all VALU booleans need to be s1 values.
41 ///
42 /// A noteworthy exception to the s1-means-vcc rule is for legalization artifact
43 /// casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc
44 /// bank. A non-boolean source (such as a truncate from a 1-bit load from
45 /// memory) will require a copy to the VCC bank, which entails clearing the
46 /// high bits and inserting a compare.
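///
/// As an aside, the per-lane value semantics here can be modelled in plain
/// C++: a VCC-bank boolean materialized as a 32-bit value is a select between
/// the extended "true" and "false" constants. This mirrors the select-based
/// lowering used by ApplyRegBankMapping below; the helper name is made up for
/// illustration.
/// \code{.cpp}
///   int extendVccBool(bool Lane, bool IsSext) {
///     // G_SEXT produces -1 for true; G_ZEXT/G_ANYEXT produce 1.
///     return Lane ? (IsSext ? -1 : 1) : 0;
///   }
/// \endcode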
47 ///
48 /// \par Constant bus restriction
49 ///
50 /// VALU instructions have a limitation known as the constant bus
51 /// restriction. Most VALU instructions can use SGPR operands, but may read at
52 /// most 1 SGPR or constant literal value (this to 2 in gfx10 for most
53 /// instructions). This is one unique SGPR, so the same SGPR may be used for
54 /// multiple operands. From a register bank perspective, any combination of
55 /// operands should be legal as an SGPR, but this is contextually dependent on
56 /// the SGPR operands all being the same register. It is therefore optimal to
57 /// choose the SGPR with the most uses to minimize the number of copies.
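///
/// As a sketch only (not code used by this pass), the legality side of the
/// restriction amounts to counting distinct scalar operands against the number
/// of available constant bus slots; the function and parameter names below are
/// illustrative assumptions.
/// \code{.cpp}
///   #include <set>
///   #include <vector>
///   // Legal if the number of *distinct* scalar reads fits the constant bus
///   // (1 slot before gfx10, 2 on gfx10 for most VALU instructions).
///   bool fitsConstantBus(const std::vector<unsigned> &ScalarOperandRegs,
///                        unsigned BusSlots) {
///     std::set<unsigned> Unique(ScalarOperandRegs.begin(),
///                               ScalarOperandRegs.end());
///     return Unique.size() <= BusSlots;
///   }
/// \endcode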
58 ///
59 /// We avoid trying to solve this problem in RegBankSelect. Any VALU G_*
60 /// operation should have its source operands all mapped to VGPRs (except for
61 /// VCC), inserting copies from any SGPR operands. This is the most trivial legal
62 /// mapping. Anything beyond the simplest 1:1 instruction selection would be too
63 /// complicated to solve here. Every optimization pattern or instruction
64 /// selected to multiple outputs would have to enforce this rule, and there
65 /// would be additional complexity in tracking this rule for every G_*
66 /// operation. By forcing all inputs to VGPRs, it also simplifies the task of
67 /// picking the optimal operand combination from a post-isel optimization pass.
68 ///
69 //===----------------------------------------------------------------------===//
70 
71 #include "AMDGPURegisterBankInfo.h"
72 
73 #include "AMDGPU.h"
74 #include "AMDGPUGlobalISelUtils.h"
75 #include "AMDGPUInstrInfo.h"
76 #include "GCNSubtarget.h"
77 #include "SIMachineFunctionInfo.h"
78 #include "SIRegisterInfo.h"
#include "llvm/CodeGen/GlobalISel/GenericMachineInstrs.h"
#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
 84 #include "llvm/IR/IntrinsicsAMDGPU.h"
85 
86 #define GET_TARGET_REGBANK_IMPL
87 #include "AMDGPUGenRegisterBank.inc"
88 
89 // This file will be TableGen'ed at some point.
90 #include "AMDGPUGenRegisterBankInfo.def"
91 
92 using namespace llvm;
93 using namespace MIPatternMatch;
94 
95 namespace {
96 
97 // Observer to apply a register bank to new registers created by LegalizerHelper.
98 class ApplyRegBankMapping final : public GISelChangeObserver {
99 private:
100  const AMDGPURegisterBankInfo &RBI;
  MachineRegisterInfo &MRI;
 102  const RegisterBank *NewBank;
  SmallVector<MachineInstr *, 4> NewInsts;
 104 
105 public:
106  ApplyRegBankMapping(const AMDGPURegisterBankInfo &RBI_,
107  MachineRegisterInfo &MRI_, const RegisterBank *RB)
108  : RBI(RBI_), MRI(MRI_), NewBank(RB) {}
109 
110  ~ApplyRegBankMapping() {
111  for (MachineInstr *MI : NewInsts)
112  applyBank(*MI);
113  }
114 
115  /// Set any registers that don't have a set register class or bank to SALU.
116  void applyBank(MachineInstr &MI) {
117  const unsigned Opc = MI.getOpcode();
118  if (Opc == AMDGPU::G_ANYEXT || Opc == AMDGPU::G_ZEXT ||
119  Opc == AMDGPU::G_SEXT) {
120  // LegalizerHelper wants to use the basic legalization artifacts when
121  // widening etc. We don't handle selection with vcc in artifact sources,
122  // so we need to use a select instead to handle these properly.
123  Register DstReg = MI.getOperand(0).getReg();
124  Register SrcReg = MI.getOperand(1).getReg();
125  const RegisterBank *SrcBank = RBI.getRegBank(SrcReg, MRI, *RBI.TRI);
126  if (SrcBank == &AMDGPU::VCCRegBank) {
127  const LLT S32 = LLT::scalar(32);
128  assert(MRI.getType(SrcReg) == LLT::scalar(1));
129  assert(MRI.getType(DstReg) == S32);
130  assert(NewBank == &AMDGPU::VGPRRegBank);
131 
132  // Replace the extension with a select, which really uses the boolean
133  // source.
  MachineIRBuilder B(MI);
 135  auto True = B.buildConstant(S32, Opc == AMDGPU::G_SEXT ? -1 : 1);
136  auto False = B.buildConstant(S32, 0);
137  B.buildSelect(DstReg, SrcReg, True, False);
138  MRI.setRegBank(True.getReg(0), *NewBank);
139  MRI.setRegBank(False.getReg(0), *NewBank);
140  MI.eraseFromParent();
141  }
142 
143  assert(!MRI.getRegClassOrRegBank(DstReg));
144  MRI.setRegBank(DstReg, *NewBank);
145  return;
146  }
147 
148 #ifndef NDEBUG
149  if (Opc == AMDGPU::G_TRUNC) {
150  Register DstReg = MI.getOperand(0).getReg();
151  const RegisterBank *DstBank = RBI.getRegBank(DstReg, MRI, *RBI.TRI);
152  assert(DstBank != &AMDGPU::VCCRegBank);
153  }
154 #endif
155 
156  for (MachineOperand &Op : MI.operands()) {
157  if (!Op.isReg())
158  continue;
159 
160  // We may see physical registers if building a real MI
161  Register Reg = Op.getReg();
162  if (Reg.isPhysical() || MRI.getRegClassOrRegBank(Reg))
163  continue;
164 
165  const RegisterBank *RB = NewBank;
166  if (MRI.getType(Reg) == LLT::scalar(1)) {
167  assert(NewBank == &AMDGPU::VGPRRegBank &&
168  "s1 operands should only be used for vector bools");
169  assert((MI.getOpcode() != AMDGPU::G_TRUNC &&
170  MI.getOpcode() != AMDGPU::G_ANYEXT) &&
171  "not expecting legalization artifacts here");
172  RB = &AMDGPU::VCCRegBank;
173  }
174 
175  MRI.setRegBank(Reg, *RB);
176  }
177  }
178 
179  void erasingInstr(MachineInstr &MI) override {}
180 
181  void createdInstr(MachineInstr &MI) override {
182  // At this point, the instruction was just inserted and has no operands.
183  NewInsts.push_back(&MI);
184  }
185 
186  void changingInstr(MachineInstr &MI) override {}
187  void changedInstr(MachineInstr &MI) override {
188  // FIXME: In principle we should probably add the instruction to NewInsts,
189  // but the way the LegalizerHelper uses the observer, we will always see the
190  // registers we need to set the regbank on also referenced in a new
191  // instruction.
192  }
193 };
194 
195 }
AMDGPURegisterBankInfo::AMDGPURegisterBankInfo(const GCNSubtarget &ST)
 197  : Subtarget(ST), TRI(Subtarget.getRegisterInfo()),
198  TII(Subtarget.getInstrInfo()) {
199 
200  // HACK: Until this is fully tablegen'd.
201  static llvm::once_flag InitializeRegisterBankFlag;
202 
203  static auto InitializeRegisterBankOnce = [this]() {
204  assert(&getRegBank(AMDGPU::SGPRRegBankID) == &AMDGPU::SGPRRegBank &&
205  &getRegBank(AMDGPU::VGPRRegBankID) == &AMDGPU::VGPRRegBank &&
206  &getRegBank(AMDGPU::AGPRRegBankID) == &AMDGPU::AGPRRegBank);
207  (void)this;
208  };
209 
210  llvm::call_once(InitializeRegisterBankFlag, InitializeRegisterBankOnce);
211 }
212 
213 static bool isVectorRegisterBank(const RegisterBank &Bank) {
214  unsigned BankID = Bank.getID();
215  return BankID == AMDGPU::VGPRRegBankID || BankID == AMDGPU::AGPRRegBankID;
216 }
217 
unsigned AMDGPURegisterBankInfo::copyCost(const RegisterBank &Dst,
 219  const RegisterBank &Src,
220  unsigned Size) const {
221  // TODO: Should there be a UniformVGPRRegBank which can use readfirstlane?
222  if (Dst.getID() == AMDGPU::SGPRRegBankID &&
223  (isVectorRegisterBank(Src) || Src.getID() == AMDGPU::VCCRegBankID)) {
  return std::numeric_limits<unsigned>::max();
 225  }
226 
227  // Bool values are tricky, because the meaning is based on context. The SCC
228  // and VCC banks are for the natural scalar and vector conditions produced by
229  // a compare.
230  //
231  // Legalization doesn't know about the necessary context, so an s1 use may
232  // have been a truncate from an arbitrary value, in which case a copy (lowered
233  // as a compare with 0) needs to be inserted.
234  if (Size == 1 &&
235  (Dst.getID() == AMDGPU::SGPRRegBankID) &&
236  (isVectorRegisterBank(Src) ||
237  Src.getID() == AMDGPU::SGPRRegBankID ||
238  Src.getID() == AMDGPU::VCCRegBankID))
  return std::numeric_limits<unsigned>::max();
 240 
241  // There is no direct copy between AGPRs.
242  if (Dst.getID() == AMDGPU::AGPRRegBankID &&
243  Src.getID() == AMDGPU::AGPRRegBankID)
244  return 4;
245 
246  return RegisterBankInfo::copyCost(Dst, Src, Size);
247 }
248 
unsigned AMDGPURegisterBankInfo::getBreakDownCost(
 250  const ValueMapping &ValMapping,
251  const RegisterBank *CurBank) const {
252  // Check if this is a breakdown for G_LOAD to move the pointer from SGPR to
253  // VGPR.
254  // FIXME: Is there a better way to do this?
255  if (ValMapping.NumBreakDowns >= 2 || ValMapping.BreakDown[0].Length >= 64)
256  return 10; // This is expensive.
257 
258  assert(ValMapping.NumBreakDowns == 2 &&
259  ValMapping.BreakDown[0].Length == 32 &&
260  ValMapping.BreakDown[0].StartIdx == 0 &&
261  ValMapping.BreakDown[1].Length == 32 &&
262  ValMapping.BreakDown[1].StartIdx == 32 &&
263  ValMapping.BreakDown[0].RegBank == ValMapping.BreakDown[1].RegBank);
264 
265  // 32-bit extract of a 64-bit value is just access of a subregister, so free.
266  // TODO: Cost of 0 hits assert, though it's not clear it's what we really
267  // want.
268 
269  // TODO: 32-bit insert to a 64-bit SGPR may incur a non-free copy due to SGPR
270  // alignment restrictions, but this probably isn't important.
271  return 1;
272 }
273 
274 const RegisterBank &
AMDGPURegisterBankInfo::getRegBankFromRegClass(const TargetRegisterClass &RC,
 276  LLT Ty) const {
277  if (&RC == &AMDGPU::SReg_1RegClass)
278  return AMDGPU::VCCRegBank;
279 
280  // We promote real scalar booleans to SReg_32. Any SGPR using s1 is really a
281  // VCC-like use.
282  if (TRI->isSGPRClass(&RC)) {
283  // FIXME: This probably came from a copy from a physical register, which
284  // should be inferable from the copied to-type. We don't have many boolean
285  // physical register constraints so just assume a normal SGPR for now.
286  if (!Ty.isValid())
287  return AMDGPU::SGPRRegBank;
288 
289  return Ty == LLT::scalar(1) ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
290  }
291 
292  return TRI->isAGPRClass(&RC) ? AMDGPU::AGPRRegBank : AMDGPU::VGPRRegBank;
293 }
294 
295 template <unsigned NumOps>
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::addMappingFromTable(
 298  const MachineInstr &MI, const MachineRegisterInfo &MRI,
299  const std::array<unsigned, NumOps> RegSrcOpIdx,
300  ArrayRef<OpRegBankEntry<NumOps>> Table) const {
301 
302  InstructionMappings AltMappings;
303 
  SmallVector<const ValueMapping *, 10> Operands(MI.getNumOperands());
 305 
306  unsigned Sizes[NumOps];
307  for (unsigned I = 0; I < NumOps; ++I) {
308  Register Reg = MI.getOperand(RegSrcOpIdx[I]).getReg();
309  Sizes[I] = getSizeInBits(Reg, MRI, *TRI);
310  }
311 
312  for (unsigned I = 0, E = MI.getNumExplicitDefs(); I != E; ++I) {
313  unsigned SizeI = getSizeInBits(MI.getOperand(I).getReg(), MRI, *TRI);
314  Operands[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SizeI);
315  }
316 
317  // getInstrMapping's default mapping uses ID 1, so start at 2.
318  unsigned MappingID = 2;
319  for (const auto &Entry : Table) {
320  for (unsigned I = 0; I < NumOps; ++I) {
321  int OpIdx = RegSrcOpIdx[I];
322  Operands[OpIdx] = AMDGPU::getValueMapping(Entry.RegBanks[I], Sizes[I]);
323  }
324 
325  AltMappings.push_back(&getInstructionMapping(MappingID++, Entry.Cost,
  getOperandsMapping(Operands),
 327  Operands.size()));
328  }
329 
330  return AltMappings;
331 }
332 
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsic(
 335  const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
336  switch (MI.getIntrinsicID()) {
337  case Intrinsic::amdgcn_readlane: {
338  static const OpRegBankEntry<3> Table[2] = {
339  // Perfectly legal.
340  { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
341 
342  // Need a readfirstlane for the index.
343  { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
344  };
345 
346  const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
347  return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, makeArrayRef(Table));
348  }
349  case Intrinsic::amdgcn_writelane: {
350  static const OpRegBankEntry<4> Table[4] = {
351  // Perfectly legal.
352  { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
353 
354  // Need readfirstlane of first op
355  { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
356 
357  // Need readfirstlane of second op
358  { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 },
359 
360  // Need readfirstlane of both ops
361  { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 3 }
362  };
363 
364  // rsrc, voffset, offset
365  const std::array<unsigned, 4> RegSrcOpIdx = { { 0, 2, 3, 4 } };
366  return addMappingFromTable<4>(MI, MRI, RegSrcOpIdx, makeArrayRef(Table));
367  }
368  default:
  return RegisterBankInfo::getInstrAlternativeMappings(MI);
 370  }
371 }
372 
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappingsIntrinsicWSideEffects(
 375  const MachineInstr &MI, const MachineRegisterInfo &MRI) const {
376 
377  switch (MI.getIntrinsicID()) {
378  case Intrinsic::amdgcn_s_buffer_load: {
379  static const OpRegBankEntry<2> Table[4] = {
380  // Perfectly legal.
381  { { AMDGPU::SGPRRegBankID, AMDGPU::SGPRRegBankID }, 1 },
382 
383  // Only need 1 register in loop
384  { { AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 300 },
385 
386  // Have to waterfall the resource.
387  { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID }, 1000 },
388 
389  // Have to waterfall the resource, and the offset.
390  { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 1500 }
391  };
392 
393  // rsrc, offset
394  const std::array<unsigned, 2> RegSrcOpIdx = { { 2, 3 } };
395  return addMappingFromTable<2>(MI, MRI, RegSrcOpIdx, makeArrayRef(Table));
396  }
397  case Intrinsic::amdgcn_ds_ordered_add:
398  case Intrinsic::amdgcn_ds_ordered_swap: {
399  // VGPR = M0, VGPR
400  static const OpRegBankEntry<3> Table[2] = {
401  // Perfectly legal.
402  { { AMDGPU::VGPRRegBankID, AMDGPU::SGPRRegBankID, AMDGPU::VGPRRegBankID }, 1 },
403 
404  // Need a readfirstlane for m0
405  { { AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID, AMDGPU::VGPRRegBankID }, 2 }
406  };
407 
408  const std::array<unsigned, 3> RegSrcOpIdx = { { 0, 2, 3 } };
409  return addMappingFromTable<3>(MI, MRI, RegSrcOpIdx, makeArrayRef(Table));
410  }
411  case Intrinsic::amdgcn_s_sendmsg:
412  case Intrinsic::amdgcn_s_sendmsghalt: {
413  // FIXME: Should have no register for immediate
414  static const OpRegBankEntry<1> Table[2] = {
415  // Perfectly legal.
416  { { AMDGPU::SGPRRegBankID }, 1 },
417 
418  // Need readlane
419  { { AMDGPU::VGPRRegBankID }, 3 }
420  };
421 
422  const std::array<unsigned, 1> RegSrcOpIdx = { { 2 } };
423  return addMappingFromTable<1>(MI, MRI, RegSrcOpIdx, makeArrayRef(Table));
424  }
425  default:
  return RegisterBankInfo::getInstrAlternativeMappings(MI);
 427  }
428 }
429 
430 // FIXME: Returns uniform if there's no source value information. This is
431 // probably wrong.
432 static bool isScalarLoadLegal(const MachineInstr &MI) {
433  if (!MI.hasOneMemOperand())
434  return false;
435 
436  const MachineMemOperand *MMO = *MI.memoperands_begin();
437  const unsigned AS = MMO->getAddrSpace();
438  const bool IsConst = AS == AMDGPUAS::CONSTANT_ADDRESS ||
  AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT;
 440  // Require 4-byte alignment.
441  return MMO->getAlign() >= Align(4) &&
442  // Can't do a scalar atomic load.
443  !MMO->isAtomic() &&
444  // Don't use scalar loads for volatile accesses to non-constant address
445  // spaces.
446  (IsConst || !MMO->isVolatile()) &&
447  // Memory must be known constant, or not written before this load.
448  (IsConst || MMO->isInvariant() || (MMO->getFlags() & MONoClobber)) &&
  AMDGPUInstrInfo::isUniformMMO(MMO);
 450 }
451 
RegisterBankInfo::InstructionMappings
AMDGPURegisterBankInfo::getInstrAlternativeMappings(
 454  const MachineInstr &MI) const {
455 
456  const MachineFunction &MF = *MI.getParent()->getParent();
457  const MachineRegisterInfo &MRI = MF.getRegInfo();
458 
459 
460  InstructionMappings AltMappings;
461  switch (MI.getOpcode()) {
462  case TargetOpcode::G_CONSTANT: {
463  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
464  if (Size == 1) {
465  static const OpRegBankEntry<1> Table[3] = {
466  { { AMDGPU::VGPRRegBankID }, 1 },
467  { { AMDGPU::SGPRRegBankID }, 1 },
468  { { AMDGPU::VCCRegBankID }, 1 }
469  };
470 
471  return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
472  }
473 
474  [[fallthrough]];
475  }
476  case TargetOpcode::G_FCONSTANT:
477  case TargetOpcode::G_FRAME_INDEX:
478  case TargetOpcode::G_GLOBAL_VALUE: {
479  static const OpRegBankEntry<1> Table[2] = {
480  { { AMDGPU::VGPRRegBankID }, 1 },
481  { { AMDGPU::SGPRRegBankID }, 1 }
482  };
483 
484  return addMappingFromTable<1>(MI, MRI, {{ 0 }}, Table);
485  }
486  case TargetOpcode::G_AND:
487  case TargetOpcode::G_OR:
488  case TargetOpcode::G_XOR: {
489  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
490 
491  if (Size == 1) {
492  // s_{and|or|xor}_b32 set scc when the result of the 32-bit op is not 0.
493  const InstructionMapping &SCCMapping = getInstructionMapping(
494  1, 1, getOperandsMapping(
495  {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
496  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32),
497  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32)}),
498  3); // Num Operands
499  AltMappings.push_back(&SCCMapping);
500 
501  const InstructionMapping &VCCMapping0 = getInstructionMapping(
502  2, 1, getOperandsMapping(
503  {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
504  AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size),
505  AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size)}),
506  3); // Num Operands
507  AltMappings.push_back(&VCCMapping0);
508  return AltMappings;
509  }
510 
511  if (Size != 64)
512  break;
513 
514  const InstructionMapping &SSMapping = getInstructionMapping(
515  1, 1, getOperandsMapping(
516  {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
517  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
518  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
519  3); // Num Operands
520  AltMappings.push_back(&SSMapping);
521 
522  const InstructionMapping &VVMapping = getInstructionMapping(
523  2, 2, getOperandsMapping(
524  {AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
525  AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
526  AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
527  3); // Num Operands
528  AltMappings.push_back(&VVMapping);
529  break;
530  }
531  case TargetOpcode::G_LOAD:
532  case TargetOpcode::G_ZEXTLOAD:
533  case TargetOpcode::G_SEXTLOAD: {
534  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
535  LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
536  unsigned PtrSize = PtrTy.getSizeInBits();
537  unsigned AS = PtrTy.getAddressSpace();
538 
  if ((AS != AMDGPUAS::LOCAL_ADDRESS && AS != AMDGPUAS::REGION_ADDRESS &&
 540  AS != AMDGPUAS::PRIVATE_ADDRESS) &&
  isScalarLoadLegal(MI)) {
 542  const InstructionMapping &SSMapping = getInstructionMapping(
543  1, 1, getOperandsMapping(
544  {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
545  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize)}),
546  2); // Num Operands
547  AltMappings.push_back(&SSMapping);
548  }
549 
550  const InstructionMapping &VVMapping = getInstructionMapping(
551  2, 1,
  getOperandsMapping(
 553  {AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
554  AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize)}),
555  2); // Num Operands
556  AltMappings.push_back(&VVMapping);
557 
558  // It may be possible to have a vgpr = load sgpr mapping here, because
559  // the mubuf instructions support this kind of load, but probably for only
560  // gfx7 and older. However, the addressing mode matching in the instruction
561  // selector should be able to do a better job of detecting and selecting
562  // these kinds of loads from the vgpr = load vgpr mapping.
563 
564  return AltMappings;
565 
566  }
567  case TargetOpcode::G_SELECT: {
568  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
569  const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
570  getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
571  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
572  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
573  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size)}),
574  4); // Num Operands
575  AltMappings.push_back(&SSMapping);
576 
577  const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
578  getOperandsMapping({AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
579  AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
580  AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size),
581  AMDGPU::getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size)}),
582  4); // Num Operands
583  AltMappings.push_back(&VVMapping);
584 
585  return AltMappings;
586  }
587  case TargetOpcode::G_UADDE:
588  case TargetOpcode::G_USUBE:
589  case TargetOpcode::G_SADDE:
590  case TargetOpcode::G_SSUBE: {
591  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
592  const InstructionMapping &SSMapping = getInstructionMapping(1, 1,
  getOperandsMapping(
 594  {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
595  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1),
596  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
597  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size),
598  AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1)}),
599  5); // Num Operands
600  AltMappings.push_back(&SSMapping);
601 
602  const InstructionMapping &VVMapping = getInstructionMapping(2, 1,
603  getOperandsMapping({AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
604  AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1),
605  AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
606  AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size),
607  AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1)}),
608  5); // Num Operands
609  AltMappings.push_back(&VVMapping);
610  return AltMappings;
611  }
612  case AMDGPU::G_BRCOND: {
613  assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
614 
615  // TODO: Change type to 32 for scalar
616  const InstructionMapping &SMapping = getInstructionMapping(
617  1, 1, getOperandsMapping(
618  {AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 1), nullptr}),
619  2); // Num Operands
620  AltMappings.push_back(&SMapping);
621 
622  const InstructionMapping &VMapping = getInstructionMapping(
623  1, 1, getOperandsMapping(
624  {AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1), nullptr }),
625  2); // Num Operands
626  AltMappings.push_back(&VMapping);
627  return AltMappings;
628  }
629  case AMDGPU::G_INTRINSIC:
  return getInstrAlternativeMappingsIntrinsic(MI, MRI);
 631  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS:
  return getInstrAlternativeMappingsIntrinsicWSideEffects(MI, MRI);
 633  default:
 634  break;
 635  }
  return RegisterBankInfo::getInstrAlternativeMappings(MI);
 637 }
638 
void AMDGPURegisterBankInfo::split64BitValueForMapping(
  MachineIRBuilder &B,
  SmallVector<Register, 2> &Regs,
 642  LLT HalfTy,
643  Register Reg) const {
644  assert(HalfTy.getSizeInBits() == 32);
645  MachineRegisterInfo *MRI = B.getMRI();
646  Register LoLHS = MRI->createGenericVirtualRegister(HalfTy);
647  Register HiLHS = MRI->createGenericVirtualRegister(HalfTy);
648  const RegisterBank *Bank = getRegBank(Reg, *MRI, *TRI);
649  MRI->setRegBank(LoLHS, *Bank);
650  MRI->setRegBank(HiLHS, *Bank);
651 
652  Regs.push_back(LoLHS);
653  Regs.push_back(HiLHS);
654 
655  B.buildInstr(AMDGPU::G_UNMERGE_VALUES)
656  .addDef(LoLHS)
657  .addDef(HiLHS)
658  .addUse(Reg);
659 }
660 
661 /// Replace the current type each register in \p Regs has with \p NewTy
static void setRegsToType(MachineRegisterInfo &MRI, ArrayRef<Register> Regs,
 663  LLT NewTy) {
 664  for (Register Reg : Regs) {
  assert(MRI.getType(Reg).getSizeInBits() == NewTy.getSizeInBits());
 666  MRI.setType(Reg, NewTy);
667  }
668 }
669 
static LLT getHalfSizedType(LLT Ty) {
  if (Ty.isVector()) {
  assert(Ty.getElementCount().isKnownMultipleOf(2));
  return LLT::scalarOrVector(Ty.getElementCount().divideCoefficientBy(2),
 674  Ty.getElementType());
675  }
676 
677  assert(Ty.getScalarSizeInBits() % 2 == 0);
678  return LLT::scalar(Ty.getScalarSizeInBits() / 2);
679 }
680 
681 // Build one or more V_READFIRSTLANE_B32 instructions to move the given vector
682 // source value into a scalar register.
Register AMDGPURegisterBankInfo::buildReadFirstLane(MachineIRBuilder &B,
  MachineRegisterInfo &MRI,
 685  Register Src) const {
686  LLT Ty = MRI.getType(Src);
687  const RegisterBank *Bank = getRegBank(Src, MRI, *TRI);
688 
689  if (Bank == &AMDGPU::SGPRRegBank)
690  return Src;
691 
692  unsigned Bits = Ty.getSizeInBits();
693  assert(Bits % 32 == 0);
694 
695  if (Bank != &AMDGPU::VGPRRegBank) {
696  // We need to copy from AGPR to VGPR
697  Src = B.buildCopy(Ty, Src).getReg(0);
698  MRI.setRegBank(Src, AMDGPU::VGPRRegBank);
699  }
700 
701  LLT S32 = LLT::scalar(32);
702  unsigned NumParts = Bits / 32;
703  SmallVector<Register, 8> SrcParts;
704  SmallVector<Register, 8> DstParts;
705 
706  if (Bits == 32) {
707  SrcParts.push_back(Src);
708  } else {
709  auto Unmerge = B.buildUnmerge(S32, Src);
710  for (unsigned i = 0; i < NumParts; ++i)
711  SrcParts.push_back(Unmerge.getReg(i));
712  }
713 
714  for (unsigned i = 0; i < NumParts; ++i) {
715  Register SrcPart = SrcParts[i];
716  Register DstPart = MRI.createVirtualRegister(&AMDGPU::SReg_32RegClass);
717  MRI.setType(DstPart, NumParts == 1 ? Ty : S32);
718 
719  const TargetRegisterClass *Constrained =
720  constrainGenericRegister(SrcPart, AMDGPU::VGPR_32RegClass, MRI);
721  (void)Constrained;
722  assert(Constrained && "Failed to constrain readfirstlane src reg");
723 
724  B.buildInstr(AMDGPU::V_READFIRSTLANE_B32, {DstPart}, {SrcPart});
725 
726  DstParts.push_back(DstPart);
727  }
728 
729  if (Bits == 32)
730  return DstParts[0];
731 
732  Register Dst = B.buildMerge(Ty, DstParts).getReg(0);
733  MRI.setRegBank(Dst, AMDGPU::SGPRRegBank);
734  return Dst;
735 }
736 
737 /// Legalize instruction \p MI where operands in \p OpIndices must be SGPRs. If
738 /// any of the required SGPR operands are VGPRs, perform a waterfall loop to
739 /// execute the instruction for each unique combination of values in all lanes
740 /// in the wave. The block will be split such that rest of the instructions are
741 /// moved to a new block.
742 ///
743 /// Essentially performs this loop:
744 //
745 /// Save Execution Mask
746 /// For (Lane : Wavefront) {
747 /// Enable Lane, Disable all other lanes
748 /// SGPR = read SGPR value for current lane from VGPR
749 /// VGPRResult[Lane] = use_op SGPR
750 /// }
751 /// Restore Execution Mask
752 ///
 753 /// There is additional complexity from comparing the operand values to
 754 /// identify the unique values actually used.
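///
/// A rough scalar model of the loop's effect, assuming a hypothetical array of
/// per-lane values standing in for a divergent VGPR operand (names are
/// illustrative, not part of this file):
/// \code{.cpp}
///   #include <functional>
///   #include <map>
///   #include <vector>
///   // Each unique operand value is processed exactly once, with the set of
///   // lanes holding that value active.
///   void waterfall(const std::vector<int> &LaneValues,
///                  const std::function<void(int, const std::vector<unsigned> &)>
///                      &UseOp) {
///     std::map<int, std::vector<unsigned>> Groups;
///     for (unsigned Lane = 0; Lane != LaneValues.size(); ++Lane)
///       Groups[LaneValues[Lane]].push_back(Lane);
///     for (const auto &Group : Groups)
///       UseOp(Group.first, Group.second);
///   }
/// \endcode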
bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
  MachineIRBuilder &B,
  iterator_range<MachineBasicBlock::iterator> Range,
 758  SmallSet<Register, 4> &SGPROperandRegs,
759  MachineRegisterInfo &MRI) const {
760 
761  // Track use registers which have already been expanded with a readfirstlane
762  // sequence. This may have multiple uses if moving a sequence.
763  DenseMap<Register, Register> WaterfalledRegMap;
764 
765  MachineBasicBlock &MBB = B.getMBB();
766  MachineFunction *MF = &B.getMF();
767 
768  const TargetRegisterClass *WaveRC = TRI->getWaveMaskRegClass();
769  const unsigned MovExecOpc =
770  Subtarget.isWave32() ? AMDGPU::S_MOV_B32 : AMDGPU::S_MOV_B64;
771  const unsigned MovExecTermOpc =
772  Subtarget.isWave32() ? AMDGPU::S_MOV_B32_term : AMDGPU::S_MOV_B64_term;
773 
774  const unsigned XorTermOpc = Subtarget.isWave32() ?
775  AMDGPU::S_XOR_B32_term : AMDGPU::S_XOR_B64_term;
776  const unsigned AndSaveExecOpc = Subtarget.isWave32() ?
777  AMDGPU::S_AND_SAVEEXEC_B32 : AMDGPU::S_AND_SAVEEXEC_B64;
778  const unsigned ExecReg = Subtarget.isWave32() ?
779  AMDGPU::EXEC_LO : AMDGPU::EXEC;
780 
781 #ifndef NDEBUG
782  const int OrigRangeSize = std::distance(Range.begin(), Range.end());
783 #endif
784 
785  Register SaveExecReg = MRI.createVirtualRegister(WaveRC);
786  Register InitSaveExecReg = MRI.createVirtualRegister(WaveRC);
787 
788  // Don't bother using generic instructions/registers for the exec mask.
789  B.buildInstr(TargetOpcode::IMPLICIT_DEF)
790  .addDef(InitSaveExecReg);
791 
792  Register PhiExec = MRI.createVirtualRegister(WaveRC);
793  Register NewExec = MRI.createVirtualRegister(WaveRC);
794 
795  // To insert the loop we need to split the block. Move everything before this
796  // point to a new block, and insert a new empty block before this instruction.
  MachineBasicBlock *LoopBB = MF->CreateMachineBasicBlock();
  MachineBasicBlock *BodyBB = MF->CreateMachineBasicBlock();
 799  MachineBasicBlock *RemainderBB = MF->CreateMachineBasicBlock();
 800  MachineBasicBlock *RestoreExecBB = MF->CreateMachineBasicBlock();
  MachineFunction::iterator MBBI(MBB);
 802  ++MBBI;
803  MF->insert(MBBI, LoopBB);
804  MF->insert(MBBI, BodyBB);
805  MF->insert(MBBI, RestoreExecBB);
806  MF->insert(MBBI, RemainderBB);
807 
808  LoopBB->addSuccessor(BodyBB);
809  BodyBB->addSuccessor(RestoreExecBB);
810  BodyBB->addSuccessor(LoopBB);
811 
812  // Move the rest of the block into a new block.
813  RemainderBB->transferSuccessorsAndUpdatePHIs(&MBB);
814  RemainderBB->splice(RemainderBB->begin(), &MBB, Range.end(), MBB.end());
815 
816  MBB.addSuccessor(LoopBB);
817  RestoreExecBB->addSuccessor(RemainderBB);
818 
819  B.setInsertPt(*LoopBB, LoopBB->end());
820 
821  B.buildInstr(TargetOpcode::PHI)
822  .addDef(PhiExec)
823  .addReg(InitSaveExecReg)
824  .addMBB(&MBB)
825  .addReg(NewExec)
826  .addMBB(BodyBB);
827 
828  const DebugLoc &DL = B.getDL();
829 
830  MachineInstr &FirstInst = *Range.begin();
831 
832  // Move the instruction into the loop body. Note we moved everything after
833  // Range.end() already into a new block, so Range.end() is no longer valid.
834  BodyBB->splice(BodyBB->end(), &MBB, Range.begin(), MBB.end());
835 
836  // Figure out the iterator range after splicing the instructions.
837  MachineBasicBlock::iterator NewBegin = FirstInst.getIterator();
838  auto NewEnd = BodyBB->end();
839 
840  B.setMBB(*LoopBB);
841 
842  LLT S1 = LLT::scalar(1);
843  Register CondReg;
844 
845  assert(std::distance(NewBegin, NewEnd) == OrigRangeSize);
846 
847  for (MachineInstr &MI : make_range(NewBegin, NewEnd)) {
848  for (MachineOperand &Op : MI.uses()) {
849  if (!Op.isReg() || Op.isDef())
850  continue;
851 
852  Register OldReg = Op.getReg();
853  if (!SGPROperandRegs.count(OldReg))
854  continue;
855 
856  // See if we already processed this register in another instruction in the
857  // sequence.
858  auto OldVal = WaterfalledRegMap.find(OldReg);
859  if (OldVal != WaterfalledRegMap.end()) {
860  Op.setReg(OldVal->second);
861  continue;
862  }
863 
864  Register OpReg = Op.getReg();
865  LLT OpTy = MRI.getType(OpReg);
866 
867  const RegisterBank *OpBank = getRegBank(OpReg, MRI, *TRI);
868  if (OpBank != &AMDGPU::VGPRRegBank) {
869  // Insert copy from AGPR to VGPR before the loop.
870  B.setMBB(MBB);
871  OpReg = B.buildCopy(OpTy, OpReg).getReg(0);
872  MRI.setRegBank(OpReg, AMDGPU::VGPRRegBank);
873  B.setMBB(*LoopBB);
874  }
875 
876  Register CurrentLaneReg = buildReadFirstLane(B, MRI, OpReg);
877 
878  // Build the comparison(s).
879  unsigned OpSize = OpTy.getSizeInBits();
880  bool Is64 = OpSize % 64 == 0;
881  unsigned PartSize = Is64 ? 64 : 32;
882  LLT PartTy = LLT::scalar(PartSize);
883  unsigned NumParts = OpSize / PartSize;
884  SmallVector<Register, 8> OpParts;
885  SmallVector<Register, 8> CurrentLaneParts;
886 
887  if (NumParts == 1) {
888  OpParts.push_back(OpReg);
889  CurrentLaneParts.push_back(CurrentLaneReg);
890  } else {
891  auto UnmergeOp = B.buildUnmerge(PartTy, OpReg);
892  auto UnmergeCurrentLane = B.buildUnmerge(PartTy, CurrentLaneReg);
893  for (unsigned i = 0; i < NumParts; ++i) {
894  OpParts.push_back(UnmergeOp.getReg(i));
895  CurrentLaneParts.push_back(UnmergeCurrentLane.getReg(i));
896  MRI.setRegBank(OpParts[i], AMDGPU::VGPRRegBank);
897  MRI.setRegBank(CurrentLaneParts[i], AMDGPU::SGPRRegBank);
898  }
899  }
900 
901  for (unsigned i = 0; i < NumParts; ++i) {
902  auto CmpReg = B.buildICmp(CmpInst::ICMP_EQ, S1, CurrentLaneParts[i],
903  OpParts[i]).getReg(0);
904  MRI.setRegBank(CmpReg, AMDGPU::VCCRegBank);
905 
906  if (!CondReg) {
907  CondReg = CmpReg;
908  } else {
909  CondReg = B.buildAnd(S1, CondReg, CmpReg).getReg(0);
910  MRI.setRegBank(CondReg, AMDGPU::VCCRegBank);
911  }
912  }
913 
914  Op.setReg(CurrentLaneReg);
915 
916  // Make sure we don't re-process this register again.
917  WaterfalledRegMap.insert(std::make_pair(OldReg, Op.getReg()));
918  }
919  }
920 
921  // The ballot becomes a no-op during instruction selection.
922  CondReg = B.buildIntrinsic(Intrinsic::amdgcn_ballot,
923  {LLT::scalar(Subtarget.isWave32() ? 32 : 64)},
924  false)
925  .addReg(CondReg)
926  .getReg(0);
927  MRI.setRegClass(CondReg, WaveRC);
928 
929  // Update EXEC, save the original EXEC value to VCC.
930  B.buildInstr(AndSaveExecOpc)
931  .addDef(NewExec)
932  .addReg(CondReg, RegState::Kill);
933 
934  MRI.setSimpleHint(NewExec, CondReg);
935 
936  B.setInsertPt(*BodyBB, BodyBB->end());
937 
938  // Update EXEC, switch all done bits to 0 and all todo bits to 1.
939  B.buildInstr(XorTermOpc)
940  .addDef(ExecReg)
941  .addReg(ExecReg)
942  .addReg(NewExec);
943 
944  // XXX - s_xor_b64 sets scc to 1 if the result is nonzero, so can we use
945  // s_cbranch_scc0?
946 
947  // Loop back to V_READFIRSTLANE_B32 if there are still variants to cover.
948  B.buildInstr(AMDGPU::SI_WATERFALL_LOOP).addMBB(LoopBB);
949 
950  // Save the EXEC mask before the loop.
951  BuildMI(MBB, MBB.end(), DL, TII->get(MovExecOpc), SaveExecReg)
952  .addReg(ExecReg);
953 
954  // Restore the EXEC mask after the loop.
955  B.setMBB(*RestoreExecBB);
956  B.buildInstr(MovExecTermOpc)
957  .addDef(ExecReg)
958  .addReg(SaveExecReg);
959 
960  // Set the insert point after the original instruction, so any new
961  // instructions will be in the remainder.
962  B.setInsertPt(*RemainderBB, RemainderBB->begin());
963 
964  return true;
965 }
966 
967 // Return any unique registers used by \p MI at \p OpIndices that need to be
968 // handled in a waterfall loop. Returns these registers in \p
969 // SGPROperandRegs. Returns true if there are any operands to handle and a
970 // waterfall loop is necessary.
bool AMDGPURegisterBankInfo::collectWaterfallOperands(
 972  SmallSet<Register, 4> &SGPROperandRegs, MachineInstr &MI,
973  MachineRegisterInfo &MRI, ArrayRef<unsigned> OpIndices) const {
974  for (unsigned Op : OpIndices) {
975  assert(MI.getOperand(Op).isUse());
976  Register Reg = MI.getOperand(Op).getReg();
977  const RegisterBank *OpBank = getRegBank(Reg, MRI, *TRI);
978  if (OpBank->getID() != AMDGPU::SGPRRegBankID)
979  SGPROperandRegs.insert(Reg);
980  }
981 
982  // No operands need to be replaced, so no need to loop.
983  return !SGPROperandRegs.empty();
984 }
985 
bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
  MachineIRBuilder &B, MachineInstr &MI, MachineRegisterInfo &MRI,
 988  ArrayRef<unsigned> OpIndices) const {
989  // Use a set to avoid extra readfirstlanes in the case where multiple operands
990  // are the same register.
991  SmallSet<Register, 4> SGPROperandRegs;
992 
993  if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, OpIndices))
994  return false;
995 
996  MachineBasicBlock::iterator I = MI.getIterator();
997  return executeInWaterfallLoop(B, make_range(I, std::next(I)),
998  SGPROperandRegs, MRI);
999 }
1000 
bool AMDGPURegisterBankInfo::executeInWaterfallLoop(
  MachineInstr &MI, MachineRegisterInfo &MRI,
 1003  ArrayRef<unsigned> OpIndices) const {
  MachineIRBuilder B(MI);
 1005  return executeInWaterfallLoop(B, MI, MRI, OpIndices);
1006 }
1007 
1008 // Legalize an operand that must be an SGPR by inserting a readfirstlane.
void AMDGPURegisterBankInfo::constrainOpWithReadfirstlane(
 1010  MachineInstr &MI, MachineRegisterInfo &MRI, unsigned OpIdx) const {
1011  Register Reg = MI.getOperand(OpIdx).getReg();
1012  const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
1013  if (Bank == &AMDGPU::SGPRRegBank)
1014  return;
1015 
  MachineIRBuilder B(MI);
 1017 
  Reg = buildReadFirstLane(B, MRI, Reg);
 1019  MI.getOperand(OpIdx).setReg(Reg);
1020 }
1021 
1022 /// Split \p Ty into 2 pieces. The first will have \p FirstSize bits, and the
1023 /// rest will be in the remainder.
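/// For example, splitting <3 x s32> at 64 bits yields {<2 x s32>, s32}.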
1024 static std::pair<LLT, LLT> splitUnequalType(LLT Ty, unsigned FirstSize) {
1025  unsigned TotalSize = Ty.getSizeInBits();
1026  if (!Ty.isVector())
1027  return {LLT::scalar(FirstSize), LLT::scalar(TotalSize - FirstSize)};
1028 
1029  LLT EltTy = Ty.getElementType();
1030  unsigned EltSize = EltTy.getSizeInBits();
1031  assert(FirstSize % EltSize == 0);
1032 
1033  unsigned FirstPartNumElts = FirstSize / EltSize;
1034  unsigned RemainderElts = (TotalSize - FirstSize) / EltSize;
1035 
1036  return {LLT::scalarOrVector(ElementCount::getFixed(FirstPartNumElts), EltTy),
1037  LLT::scalarOrVector(ElementCount::getFixed(RemainderElts), EltTy)};
1038 }
1039 
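// For example, widen96To128(<3 x s32>) produces <4 x s32>, and widen96To128(s96)
// produces s128.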
1040 static LLT widen96To128(LLT Ty) {
1041  if (!Ty.isVector())
1042  return LLT::scalar(128);
1043 
1044  LLT EltTy = Ty.getElementType();
1045  assert(128 % EltTy.getSizeInBits() == 0);
1046  return LLT::fixed_vector(128 / EltTy.getSizeInBits(), EltTy);
1047 }
1048 
bool AMDGPURegisterBankInfo::applyMappingLoad(MachineInstr &MI,
 1050  const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1051  MachineRegisterInfo &MRI) const {
1052  Register DstReg = MI.getOperand(0).getReg();
1053  const LLT LoadTy = MRI.getType(DstReg);
1054  unsigned LoadSize = LoadTy.getSizeInBits();
1055  const unsigned MaxNonSmrdLoadSize = 128;
1056 
1057  const RegisterBank *DstBank =
1058  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1059  if (DstBank == &AMDGPU::SGPRRegBank) {
1060  // There are some special cases that we need to look at for 32 bit and 96
 1061  // bit SGPR loads; otherwise we have nothing to do.
1062  if (LoadSize != 32 && LoadSize != 96)
1063  return false;
1064 
1065  MachineMemOperand *MMO = *MI.memoperands_begin();
1066  const unsigned MemSize = 8 * MMO->getSize();
1067  // Scalar loads of size 8 or 16 bit with proper alignment may be widened to
 1068  // 32 bit. Check to see if we need to widen the memory access: 8 or 16 bit
1069  // scalar loads should have a load size of 32 but memory access size of less
1070  // than 32.
1071  if (LoadSize == 32 &&
1072  (MemSize == 32 || LoadTy.isVector() || !isScalarLoadLegal(MI)))
1073  return false;
1074 
1075  Register PtrReg = MI.getOperand(1).getReg();
1076 
1077  ApplyRegBankMapping O(*this, MRI, &AMDGPU::SGPRRegBank);
1078  MachineIRBuilder B(MI, O);
1079 
1080  if (LoadSize == 32) {
1081  // This is an extending load from a sub-dword size. Widen the memory
1082  // access size to 4 bytes and clear the extra high bits appropriately
1083  const LLT S32 = LLT::scalar(32);
1084  if (MI.getOpcode() == AMDGPU::G_SEXTLOAD) {
1085  // Must extend the sign bit into higher bits for a G_SEXTLOAD
1086  auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1087  B.buildSExtInReg(MI.getOperand(0), WideLoad, MemSize);
1088  } else if (MI.getOpcode() == AMDGPU::G_ZEXTLOAD) {
1089  // Must extend zero into higher bits with an AND for a G_ZEXTLOAD
1090  auto WideLoad = B.buildLoadFromOffset(S32, PtrReg, *MMO, 0);
1091  B.buildZExtInReg(MI.getOperand(0), WideLoad, MemSize);
1092  } else
1093  // We do not need to touch the higher bits for regular loads.
1094  B.buildLoadFromOffset(MI.getOperand(0), PtrReg, *MMO, 0);
1095  } else {
1096  // 96-bit loads are only available for vector loads. We need to split this
 1097  // into a 64-bit part and a 32-bit part (unless we can widen to a 128-bit load).
1098  if (MMO->getAlign() < Align(16)) {
1099  MachineFunction *MF = MI.getParent()->getParent();
1100  ApplyRegBankMapping ApplyBank(*this, MRI, DstBank);
1101  MachineIRBuilder B(MI, ApplyBank);
1102  LegalizerHelper Helper(*MF, ApplyBank, B);
1103  LLT Part64, Part32;
1104  std::tie(Part64, Part32) = splitUnequalType(LoadTy, 64);
1105  if (Helper.reduceLoadStoreWidth(cast<GAnyLoad>(MI), 0, Part64) !=
  LegalizerHelper::Legalized)
 1107  return false;
1108  return true;
1109  } else {
1110  LLT WiderTy = widen96To128(LoadTy);
1111  auto WideLoad = B.buildLoadFromOffset(WiderTy, PtrReg, *MMO, 0);
1112  if (WiderTy.isScalar())
1113  B.buildTrunc(MI.getOperand(0), WideLoad);
1114  else {
1115  B.buildDeleteTrailingVectorElements(MI.getOperand(0).getReg(),
1116  WideLoad);
1117  }
1118  }
1119  }
1120 
1121  MI.eraseFromParent();
1122  return true;
1123  }
1124 
1125  // 128-bit loads are supported for all instruction types.
1126  if (LoadSize <= MaxNonSmrdLoadSize)
1127  return false;
1128 
1129  SmallVector<Register, 16> DefRegs(OpdMapper.getVRegs(0));
1130  SmallVector<Register, 1> SrcRegs(OpdMapper.getVRegs(1));
1131 
1132  if (SrcRegs.empty())
1133  SrcRegs.push_back(MI.getOperand(1).getReg());
1134 
1135  assert(LoadSize % MaxNonSmrdLoadSize == 0);
1136 
1137  // RegBankSelect only emits scalar types, so we need to reset the pointer
1138  // operand to a pointer type.
1139  Register BasePtrReg = SrcRegs[0];
1140  LLT PtrTy = MRI.getType(MI.getOperand(1).getReg());
1141  MRI.setType(BasePtrReg, PtrTy);
1142 
1143  unsigned NumSplitParts = LoadTy.getSizeInBits() / MaxNonSmrdLoadSize;
1144  const LLT LoadSplitTy = LoadTy.divide(NumSplitParts);
1145  ApplyRegBankMapping Observer(*this, MRI, &AMDGPU::VGPRRegBank);
1146  MachineIRBuilder B(MI, Observer);
1147  LegalizerHelper Helper(B.getMF(), Observer, B);
1148 
1149  if (LoadTy.isVector()) {
1150  if (Helper.fewerElementsVector(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1151  return false;
1152  } else {
1153  if (Helper.narrowScalar(MI, 0, LoadSplitTy) != LegalizerHelper::Legalized)
1154  return false;
1155  }
1156 
1157  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
1158  return true;
1159 }
1160 
bool AMDGPURegisterBankInfo::applyMappingDynStackAlloc(
 1162  MachineInstr &MI,
1163  const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
1164  MachineRegisterInfo &MRI) const {
1165  const MachineFunction &MF = *MI.getMF();
1166  const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
1167  const auto &TFI = *ST.getFrameLowering();
1168 
1169  // Guard in case the stack growth direction ever changes with scratch
1170  // instructions.
1171  if (TFI.getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown)
1172  return false;
1173 
1174  Register Dst = MI.getOperand(0).getReg();
1175  Register AllocSize = MI.getOperand(1).getReg();
1176  Align Alignment = assumeAligned(MI.getOperand(2).getImm());
1177 
1178  const RegisterBank *SizeBank = getRegBank(AllocSize, MRI, *TRI);
1179 
1180  // TODO: Need to emit a wave reduction to get the maximum size.
1181  if (SizeBank != &AMDGPU::SGPRRegBank)
1182  return false;
1183 
1184  LLT PtrTy = MRI.getType(Dst);
1185  LLT IntPtrTy = LLT::scalar(PtrTy.getSizeInBits());
1186 
  const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
 1188  Register SPReg = Info->getStackPtrOffsetReg();
1189  ApplyRegBankMapping ApplyBank(*this, MRI, &AMDGPU::SGPRRegBank);
1190  MachineIRBuilder B(MI, ApplyBank);
1191 
1192  auto WaveSize = B.buildConstant(LLT::scalar(32), ST.getWavefrontSizeLog2());
1193  auto ScaledSize = B.buildShl(IntPtrTy, AllocSize, WaveSize);
1194 
1195  auto SPCopy = B.buildCopy(PtrTy, SPReg);
1196  if (Alignment > TFI.getStackAlign()) {
1197  auto PtrAdd = B.buildPtrAdd(PtrTy, SPCopy, ScaledSize);
1198  B.buildMaskLowPtrBits(Dst, PtrAdd,
1199  Log2(Alignment) + ST.getWavefrontSizeLog2());
1200  } else {
1201  B.buildPtrAdd(Dst, SPCopy, ScaledSize);
1202  }
1203 
1204  MI.eraseFromParent();
1205  return true;
1206 }
1207 
bool AMDGPURegisterBankInfo::applyMappingImage(
  MachineInstr &MI, const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper,
 1210  MachineRegisterInfo &MRI, int RsrcIdx) const {
1211  const int NumDefs = MI.getNumExplicitDefs();
1212 
1213  // The reported argument index is relative to the IR intrinsic call arguments,
1214  // so we need to shift by the number of defs and the intrinsic ID.
1215  RsrcIdx += NumDefs + 1;
1216 
1217  // Insert copies to VGPR arguments.
1218  applyDefaultMapping(OpdMapper);
1219 
1220  // Fixup any SGPR arguments.
1221  SmallVector<unsigned, 4> SGPRIndexes;
1222  for (int I = NumDefs, NumOps = MI.getNumOperands(); I != NumOps; ++I) {
1223  if (!MI.getOperand(I).isReg())
1224  continue;
1225 
1226  // If this intrinsic has a sampler, it immediately follows rsrc.
1227  if (I == RsrcIdx || I == RsrcIdx + 1)
1228  SGPRIndexes.push_back(I);
1229  }
1230 
1231  executeInWaterfallLoop(MI, MRI, SGPRIndexes);
1232  return true;
1233 }
1234 
static Register getSrcRegIgnoringCopies(const MachineRegisterInfo &MRI,
 1236  Register Reg) {
  MachineInstr *Def = getDefIgnoringCopies(Reg, MRI);
 1238  if (!Def)
1239  return Reg;
1240 
1241  // TODO: Guard against this being an implicit def
1242  return Def->getOperand(0).getReg();
1243 }
1244 
1245 // Analyze a combined offset from an llvm.amdgcn.s.buffer intrinsic and store
1246 // the three offsets (voffset, soffset and instoffset)
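// For example, when the combined offset is a known constant, voffset is tied to
// a zero register and the constant is split between soffset and the immediate
// instoffset (see the first case below).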
static unsigned setBufferOffsets(MachineIRBuilder &B,
 1248  const AMDGPURegisterBankInfo &RBI,
1249  Register CombinedOffset, Register &VOffsetReg,
1250  Register &SOffsetReg, int64_t &InstOffsetVal,
1251  Align Alignment) {
1252  const LLT S32 = LLT::scalar(32);
1253  MachineRegisterInfo *MRI = B.getMRI();
1254 
1255  if (Optional<int64_t> Imm = getIConstantVRegSExtVal(CombinedOffset, *MRI)) {
1256  uint32_t SOffset, ImmOffset;
1257  if (AMDGPU::splitMUBUFOffset(*Imm, SOffset, ImmOffset, &RBI.Subtarget,
1258  Alignment)) {
1259  VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1260  SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1261  InstOffsetVal = ImmOffset;
1262 
1263  B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1264  B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1265  return SOffset + ImmOffset;
1266  }
1267  }
1268 
1269  Register Base;
1270  unsigned Offset;
1271 
1272  std::tie(Base, Offset) =
1273  AMDGPU::getBaseWithConstantOffset(*MRI, CombinedOffset);
1274 
1275  uint32_t SOffset, ImmOffset;
1276  if ((int)Offset > 0 && AMDGPU::splitMUBUFOffset(Offset, SOffset, ImmOffset,
1277  &RBI.Subtarget, Alignment)) {
1278  if (RBI.getRegBank(Base, *MRI, *RBI.TRI) == &AMDGPU::VGPRRegBank) {
1279  VOffsetReg = Base;
1280  SOffsetReg = B.buildConstant(S32, SOffset).getReg(0);
1281  B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1282  InstOffsetVal = ImmOffset;
1283  return 0; // XXX - Why is this 0?
1284  }
1285 
1286  // If we have SGPR base, we can use it for soffset.
1287  if (SOffset == 0) {
1288  VOffsetReg = B.buildConstant(S32, 0).getReg(0);
1289  B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1290  SOffsetReg = Base;
1291  InstOffsetVal = ImmOffset;
1292  return 0; // XXX - Why is this 0?
1293  }
1294  }
1295 
1296  // Handle the variable sgpr + vgpr case.
1297  MachineInstr *Add = getOpcodeDef(AMDGPU::G_ADD, CombinedOffset, *MRI);
1298  if (Add && (int)Offset >= 0) {
1299  Register Src0 = getSrcRegIgnoringCopies(*MRI, Add->getOperand(1).getReg());
1300  Register Src1 = getSrcRegIgnoringCopies(*MRI, Add->getOperand(2).getReg());
1301 
1302  const RegisterBank *Src0Bank = RBI.getRegBank(Src0, *MRI, *RBI.TRI);
1303  const RegisterBank *Src1Bank = RBI.getRegBank(Src1, *MRI, *RBI.TRI);
1304 
1305  if (Src0Bank == &AMDGPU::VGPRRegBank && Src1Bank == &AMDGPU::SGPRRegBank) {
1306  VOffsetReg = Src0;
1307  SOffsetReg = Src1;
1308  return 0;
1309  }
1310 
1311  if (Src0Bank == &AMDGPU::SGPRRegBank && Src1Bank == &AMDGPU::VGPRRegBank) {
1312  VOffsetReg = Src1;
1313  SOffsetReg = Src0;
1314  return 0;
1315  }
1316  }
1317 
1318  // Ensure we have a VGPR for the combined offset. This could be an issue if we
1319  // have an SGPR offset and a VGPR resource.
1320  if (RBI.getRegBank(CombinedOffset, *MRI, *RBI.TRI) == &AMDGPU::VGPRRegBank) {
1321  VOffsetReg = CombinedOffset;
1322  } else {
1323  VOffsetReg = B.buildCopy(S32, CombinedOffset).getReg(0);
1324  B.getMRI()->setRegBank(VOffsetReg, AMDGPU::VGPRRegBank);
1325  }
1326 
1327  SOffsetReg = B.buildConstant(S32, 0).getReg(0);
1328  B.getMRI()->setRegBank(SOffsetReg, AMDGPU::SGPRRegBank);
1329  return 0;
1330 }
1331 
bool AMDGPURegisterBankInfo::applyMappingSBufferLoad(
 1333  const OperandsMapper &OpdMapper) const {
1334  MachineInstr &MI = OpdMapper.getMI();
1335  MachineRegisterInfo &MRI = OpdMapper.getMRI();
1336 
1337  const LLT S32 = LLT::scalar(32);
1338  Register Dst = MI.getOperand(0).getReg();
1339  LLT Ty = MRI.getType(Dst);
1340 
1341  const RegisterBank *RSrcBank =
1342  OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1343  const RegisterBank *OffsetBank =
1344  OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1345  if (RSrcBank == &AMDGPU::SGPRRegBank &&
1346  OffsetBank == &AMDGPU::SGPRRegBank)
1347  return true; // Legal mapping
1348 
1349  // FIXME: 96-bit case was widened during legalize. We need to narrow it back
1350  // here but don't have an MMO.
1351 
1352  unsigned LoadSize = Ty.getSizeInBits();
1353  int NumLoads = 1;
1354  if (LoadSize == 256 || LoadSize == 512) {
1355  NumLoads = LoadSize / 128;
1356  Ty = Ty.divide(NumLoads);
1357  }
1358 
1359  // Use the alignment to ensure that the required offsets will fit into the
1360  // immediate offsets.
1361  const Align Alignment = NumLoads > 1 ? Align(16 * NumLoads) : Align(1);

  MachineIRBuilder B(MI);
1364  MachineFunction &MF = B.getMF();
1365 
1366  Register SOffset;
1367  Register VOffset;
1368  int64_t ImmOffset = 0;
1369 
1370  unsigned MMOOffset = setBufferOffsets(B, *this, MI.getOperand(2).getReg(),
1371  VOffset, SOffset, ImmOffset, Alignment);
1372 
1373  // TODO: 96-bit loads were widened to 128-bit results. Shrink the result if we
1374  // can, but we need to track an MMO for that.
1375  const unsigned MemSize = (Ty.getSizeInBits() + 7) / 8;
1376  const Align MemAlign(4); // FIXME: ABI type alignment?
  MachineMemOperand *BaseMMO = MF.getMachineMemOperand(
  MachinePointerInfo(),
  MachineMemOperand::MOLoad | MachineMemOperand::MODereferenceable |
  MachineMemOperand::MOInvariant,
 1381  MemSize, MemAlign);
1382  if (MMOOffset != 0)
1383  BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset, MemSize);
1384 
1385  // If only the offset is divergent, emit a MUBUF buffer load instead. We can
1386  // assume that the buffer is unswizzled.
1387 
1388  Register RSrc = MI.getOperand(1).getReg();
1389  Register VIndex = B.buildConstant(S32, 0).getReg(0);
1390  B.getMRI()->setRegBank(VIndex, AMDGPU::VGPRRegBank);
1391 
1392  SmallVector<Register, 4> LoadParts(NumLoads);
1393 
1394  MachineBasicBlock::iterator MII = MI.getIterator();
1395  MachineInstrSpan Span(MII, &B.getMBB());
1396 
1397  for (int i = 0; i < NumLoads; ++i) {
1398  if (NumLoads == 1) {
1399  LoadParts[i] = Dst;
1400  } else {
1401  LoadParts[i] = MRI.createGenericVirtualRegister(Ty);
1402  MRI.setRegBank(LoadParts[i], AMDGPU::VGPRRegBank);
1403  }
1404 
1405  MachineMemOperand *MMO = BaseMMO;
1406  if (i != 0)
1407  BaseMMO = MF.getMachineMemOperand(BaseMMO, MMOOffset + 16 * i, MemSize);
1408 
1409  B.buildInstr(AMDGPU::G_AMDGPU_BUFFER_LOAD)
1410  .addDef(LoadParts[i]) // vdata
1411  .addUse(RSrc) // rsrc
1412  .addUse(VIndex) // vindex
1413  .addUse(VOffset) // voffset
1414  .addUse(SOffset) // soffset
1415  .addImm(ImmOffset + 16 * i) // offset(imm)
1416  .addImm(0) // cachepolicy, swizzled buffer(imm)
1417  .addImm(0) // idxen(imm)
1418  .addMemOperand(MMO);
1419  }
1420 
1421  // TODO: If only the resource is a VGPR, it may be better to execute the
1422  // scalar load in the waterfall loop if the resource is expected to frequently
1423  // be dynamically uniform.
1424  if (RSrcBank != &AMDGPU::SGPRRegBank) {
1425  // Remove the original instruction to avoid potentially confusing the
1426  // waterfall loop logic.
1427  B.setInstr(*Span.begin());
1428  MI.eraseFromParent();
1429 
1430  SmallSet<Register, 4> OpsToWaterfall;
1431 
1432  OpsToWaterfall.insert(RSrc);
1433  executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
1434  OpsToWaterfall, MRI);
1435  }
1436 
1437  if (NumLoads != 1) {
1438  if (Ty.isVector())
1439  B.buildConcatVectors(Dst, LoadParts);
1440  else
1441  B.buildMerge(Dst, LoadParts);
1442  }
1443 
1444  // We removed the instruction earlier with a waterfall loop.
1445  if (RSrcBank == &AMDGPU::SGPRRegBank)
1446  MI.eraseFromParent();
1447 
1448  return true;
1449 }
1450 
bool AMDGPURegisterBankInfo::applyMappingBFE(const OperandsMapper &OpdMapper,
 1452  bool Signed) const {
1453  MachineInstr &MI = OpdMapper.getMI();
1454  MachineRegisterInfo &MRI = OpdMapper.getMRI();
1455 
1456  // Insert basic copies
1457  applyDefaultMapping(OpdMapper);
1458 
1459  Register DstReg = MI.getOperand(0).getReg();
1460  LLT Ty = MRI.getType(DstReg);
1461 
1462  const LLT S32 = LLT::scalar(32);
1463 
1464  unsigned FirstOpnd = MI.getOpcode() == AMDGPU::G_INTRINSIC ? 2 : 1;
1465  Register SrcReg = MI.getOperand(FirstOpnd).getReg();
1466  Register OffsetReg = MI.getOperand(FirstOpnd + 1).getReg();
1467  Register WidthReg = MI.getOperand(FirstOpnd + 2).getReg();
1468 
1469  const RegisterBank *DstBank =
1470  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1471  if (DstBank == &AMDGPU::VGPRRegBank) {
1472  if (Ty == S32)
1473  return true;
1474 
 1475  // There is no 64-bit vgpr bitfield extract instruction, so the operation
1476  // is expanded to a sequence of instructions that implement the operation.
1477  ApplyRegBankMapping ApplyBank(*this, MRI, &AMDGPU::VGPRRegBank);
1478  MachineIRBuilder B(MI, ApplyBank);
1479 
1480  const LLT S64 = LLT::scalar(64);
1481  // Shift the source operand so that extracted bits start at bit 0.
1482  auto ShiftOffset = Signed ? B.buildAShr(S64, SrcReg, OffsetReg)
1483  : B.buildLShr(S64, SrcReg, OffsetReg);
1484  auto UnmergeSOffset = B.buildUnmerge({S32, S32}, ShiftOffset);
1485 
1486  // A 64-bit bitfield extract uses the 32-bit bitfield extract instructions
1487  // if the width is a constant.
1488  if (auto ConstWidth = getIConstantVRegValWithLookThrough(WidthReg, MRI)) {
1489  // Use the 32-bit bitfield extract instruction if the width is a constant.
1490  // Depending on the width size, use either the low or high 32-bits.
1491  auto Zero = B.buildConstant(S32, 0);
1492  auto WidthImm = ConstWidth->Value.getZExtValue();
1493  if (WidthImm <= 32) {
1494  // Use bitfield extract on the lower 32-bit source, and then sign-extend
1495  // or clear the upper 32-bits.
1496  auto Extract =
1497  Signed ? B.buildSbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg)
1498  : B.buildUbfx(S32, UnmergeSOffset.getReg(0), Zero, WidthReg);
1499  auto Extend =
1500  Signed ? B.buildAShr(S32, Extract, B.buildConstant(S32, 31)) : Zero;
1501  B.buildMerge(DstReg, {Extract, Extend});
1502  } else {
1503  // Use bitfield extract on upper 32-bit source, and combine with lower
1504  // 32-bit source.
1505  auto UpperWidth = B.buildConstant(S32, WidthImm - 32);
1506  auto Extract =
1507  Signed
1508  ? B.buildSbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth)
1509  : B.buildUbfx(S32, UnmergeSOffset.getReg(1), Zero, UpperWidth);
1510  B.buildMerge(DstReg, {UnmergeSOffset.getReg(0), Extract});
1511  }
1512  MI.eraseFromParent();
1513  return true;
1514  }
1515 
1516  // Expand to Src >> Offset << (64 - Width) >> (64 - Width) using 64-bit
1517  // operations.
1518  auto ExtShift = B.buildSub(S32, B.buildConstant(S32, 64), WidthReg);
1519  auto SignBit = B.buildShl(S64, ShiftOffset, ExtShift);
1520  if (Signed)
1521  B.buildAShr(S64, SignBit, ExtShift);
1522  else
1523  B.buildLShr(S64, SignBit, ExtShift);
1524  MI.eraseFromParent();
1525  return true;
1526  }
1527 
1528  // The scalar form packs the offset and width in a single operand.
1529 
1530  ApplyRegBankMapping ApplyBank(*this, MRI, &AMDGPU::SGPRRegBank);
1531  MachineIRBuilder B(MI, ApplyBank);
1532 
1533  // Ensure the high bits are clear to insert the offset.
1534  auto OffsetMask = B.buildConstant(S32, maskTrailingOnes<unsigned>(6));
1535  auto ClampOffset = B.buildAnd(S32, OffsetReg, OffsetMask);
1536 
1537  // Zeros out the low bits, so don't bother clamping the input value.
1538  auto ShiftWidth = B.buildShl(S32, WidthReg, B.buildConstant(S32, 16));
1539 
 1540  // Pack the offset and width of the BFE into the format expected by the
 1541  // S_BFE_I32 / S_BFE_U32 second source operand: bits [5:0] contain the
 1542  // offset and bits [22:16] the width.
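  // For example, an offset of 8 and a width of 16 pack to 0x00100008.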
1543  auto MergedInputs = B.buildOr(S32, ClampOffset, ShiftWidth);
1544 
1545  // TODO: It might be worth using a pseudo here to avoid scc clobber and
1546  // register class constraints.
1547  unsigned Opc = Ty == S32 ? (Signed ? AMDGPU::S_BFE_I32 : AMDGPU::S_BFE_U32) :
1548  (Signed ? AMDGPU::S_BFE_I64 : AMDGPU::S_BFE_U64);
1549 
1550  auto MIB = B.buildInstr(Opc, {DstReg}, {SrcReg, MergedInputs});
1551  if (!constrainSelectedInstRegOperands(*MIB, *TII, *TRI, *this))
1552  llvm_unreachable("failed to constrain BFE");
1553 
1554  MI.eraseFromParent();
1555  return true;
1556 }
1557 
bool AMDGPURegisterBankInfo::applyMappingMAD_64_32(
 1559  const OperandsMapper &OpdMapper) const {
1560  MachineInstr &MI = OpdMapper.getMI();
1561  MachineRegisterInfo &MRI = OpdMapper.getMRI();
1562 
1563  // Insert basic copies.
1564  applyDefaultMapping(OpdMapper);
1565 
1566  Register Dst0 = MI.getOperand(0).getReg();
1567  Register Dst1 = MI.getOperand(1).getReg();
1568  Register Src0 = MI.getOperand(2).getReg();
1569  Register Src1 = MI.getOperand(3).getReg();
1570  Register Src2 = MI.getOperand(4).getReg();
1571 
1572  if (MRI.getRegBankOrNull(Src0) == &AMDGPU::VGPRRegBank)
1573  return true;
1574 
1575  bool IsUnsigned = MI.getOpcode() == AMDGPU::G_AMDGPU_MAD_U64_U32;
1576  LLT S1 = LLT::scalar(1);
1577  LLT S32 = LLT::scalar(32);
1578 
1579  bool DstOnValu = MRI.getRegBankOrNull(Src2) == &AMDGPU::VGPRRegBank;
1580  bool Accumulate = true;
1581 
1582  if (!DstOnValu) {
1583  if (mi_match(Src2, MRI, m_ZeroInt()))
1584  Accumulate = false;
1585  }
1586 
1587  // Keep the multiplication on the SALU.
1588  MachineIRBuilder B(MI);
1589 
1590  Register DstHi;
1591  Register DstLo = B.buildMul(S32, Src0, Src1).getReg(0);
1592  bool MulHiInVgpr = false;
1593 
1594  MRI.setRegBank(DstLo, AMDGPU::SGPRRegBank);
1595 
1596  if (Subtarget.hasSMulHi()) {
1597  DstHi = IsUnsigned ? B.buildUMulH(S32, Src0, Src1).getReg(0)
1598  : B.buildSMulH(S32, Src0, Src1).getReg(0);
1599  MRI.setRegBank(DstHi, AMDGPU::SGPRRegBank);
1600  } else {
1601  Register VSrc0 = B.buildCopy(S32, Src0).getReg(0);
1602  Register VSrc1 = B.buildCopy(S32, Src1).getReg(0);
1603 
1604  MRI.setRegBank(VSrc0, AMDGPU::VGPRRegBank);
1605  MRI.setRegBank(VSrc1, AMDGPU::VGPRRegBank);
1606 
1607  DstHi = IsUnsigned ? B.buildUMulH(S32, VSrc0, VSrc1).getReg(0)
1608  : B.buildSMulH(S32, VSrc0, VSrc1).getReg(0);
1609  MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1610 
1611  if (!DstOnValu) {
1612  DstHi = buildReadFirstLane(B, MRI, DstHi);
1613  } else {
1614  MulHiInVgpr = true;
1615  }
1616  }
1617 
1618  // Accumulate and produce the "carry-out" bit.
1619  //
1620  // The "carry-out" is defined as bit 64 of the result when computed as a
1621  // big integer. For unsigned multiply-add, this matches the usual definition
1622  // of carry-out. For signed multiply-add, bit 64 is the sign bit of the
1623  // result, which is determined as:
1624  // sign(Src0 * Src1) + sign(Src2) + carry-out from unsigned 64-bit add
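  // Below, this sum is computed modulo 2, i.e. as an xor of the three bits,
  // which is all that is needed to produce bit 64.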
1625  LLT CarryType = DstOnValu ? S1 : S32;
1626  const RegisterBank &CarryBank =
1627  DstOnValu ? AMDGPU::VCCRegBank : AMDGPU::SGPRRegBank;
1628  const RegisterBank &DstBank =
1629  DstOnValu ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank;
1630  Register Carry;
1631  Register Zero;
1632 
1633  if (!IsUnsigned) {
1634  Zero = B.buildConstant(S32, 0).getReg(0);
1635  MRI.setRegBank(Zero,
1636  MulHiInVgpr ? AMDGPU::VGPRRegBank : AMDGPU::SGPRRegBank);
1637 
1638  Carry = B.buildICmp(CmpInst::ICMP_SLT, MulHiInVgpr ? S1 : S32, DstHi, Zero)
1639  .getReg(0);
1640  MRI.setRegBank(Carry, MulHiInVgpr ? AMDGPU::VCCRegBank
1641  : AMDGPU::SGPRRegBank);
1642 
1643  if (DstOnValu && !MulHiInVgpr) {
1644  Carry = B.buildTrunc(S1, Carry).getReg(0);
1645  MRI.setRegBank(Carry, AMDGPU::VCCRegBank);
1646  }
1647  }
1648 
1649  if (Accumulate) {
1650  if (DstOnValu) {
1651  DstLo = B.buildCopy(S32, DstLo).getReg(0);
1652  DstHi = B.buildCopy(S32, DstHi).getReg(0);
1653  MRI.setRegBank(DstLo, AMDGPU::VGPRRegBank);
1654  MRI.setRegBank(DstHi, AMDGPU::VGPRRegBank);
1655  }
1656 
1657  auto Unmerge = B.buildUnmerge(S32, Src2);
1658  Register Src2Lo = Unmerge.getReg(0);
1659  Register Src2Hi = Unmerge.getReg(1);
1660  MRI.setRegBank(Src2Lo, DstBank);
1661  MRI.setRegBank(Src2Hi, DstBank);
1662 
1663  if (!IsUnsigned) {
1664  auto Src2Sign = B.buildICmp(CmpInst::ICMP_SLT, CarryType, Src2Hi, Zero);
1665  MRI.setRegBank(Src2Sign.getReg(0), CarryBank);
1666 
1667  Carry = B.buildXor(CarryType, Carry, Src2Sign).getReg(0);
1668  MRI.setRegBank(Carry, CarryBank);
1669  }
1670 
1671  auto AddLo = B.buildUAddo(S32, CarryType, DstLo, Src2Lo);
1672  DstLo = AddLo.getReg(0);
1673  Register CarryLo = AddLo.getReg(1);
1674  MRI.setRegBank(DstLo, DstBank);
1675  MRI.setRegBank(CarryLo, CarryBank);
1676 
1677  auto AddHi = B.buildUAdde(S32, CarryType, DstHi, Src2Hi, CarryLo);
1678  DstHi = AddHi.getReg(0);
1679  MRI.setRegBank(DstHi, DstBank);
1680 
1681  Register CarryHi = AddHi.getReg(1);
1682  MRI.setRegBank(CarryHi, CarryBank);
1683 
1684  if (IsUnsigned) {
1685  Carry = CarryHi;
1686  } else {
1687  Carry = B.buildXor(CarryType, Carry, CarryHi).getReg(0);
1688  MRI.setRegBank(Carry, CarryBank);
1689  }
1690  } else {
1691  if (IsUnsigned) {
1692  Carry = B.buildConstant(CarryType, 0).getReg(0);
1693  MRI.setRegBank(Carry, CarryBank);
1694  }
1695  }
1696 
1697  B.buildMerge(Dst0, {DstLo, DstHi});
1698 
1699  if (DstOnValu) {
1700  B.buildCopy(Dst1, Carry);
1701  } else {
1702  B.buildTrunc(Dst1, Carry);
1703  }
1704 
1705  MI.eraseFromParent();
1706  return true;
1707 }
1708 
1709 // Return a suitable opcode for extending the operands of Opc when widening.
1710 static unsigned getExtendOp(unsigned Opc) {
1711  switch (Opc) {
1712  case TargetOpcode::G_ASHR:
1713  case TargetOpcode::G_SMIN:
1714  case TargetOpcode::G_SMAX:
1715  return TargetOpcode::G_SEXT;
1716  case TargetOpcode::G_LSHR:
1717  case TargetOpcode::G_UMIN:
1718  case TargetOpcode::G_UMAX:
1719  return TargetOpcode::G_ZEXT;
1720  default:
1721  return TargetOpcode::G_ANYEXT;
1722  }
1723 }
1724 
1725 // Emit a legalized extension from <2 x s16> to 2 32-bit components, avoiding
1726 // any illegal vector extend or unmerge operations.
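// For example, sign extending <0x8000, 0x0001> (element 0 sits in the low half
// of the s32 bitcast) produces the pair (0xFFFF8000, 0x00000001), while zero
// extending it produces (0x00008000, 0x00000001).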
1727 static std::pair<Register, Register>
1728 unpackV2S16ToS32(MachineIRBuilder &B, Register Src, unsigned ExtOpcode) {
1729  const LLT S32 = LLT::scalar(32);
1730  auto Bitcast = B.buildBitcast(S32, Src);
1731 
1732  if (ExtOpcode == TargetOpcode::G_SEXT) {
1733  auto ExtLo = B.buildSExtInReg(S32, Bitcast, 16);
1734  auto ShiftHi = B.buildAShr(S32, Bitcast, B.buildConstant(S32, 16));
1735  return std::make_pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1736  }
1737 
1738  auto ShiftHi = B.buildLShr(S32, Bitcast, B.buildConstant(S32, 16));
1739  if (ExtOpcode == TargetOpcode::G_ZEXT) {
1740  auto ExtLo = B.buildAnd(S32, Bitcast, B.buildConstant(S32, 0xffff));
1741  return std::make_pair(ExtLo.getReg(0), ShiftHi.getReg(0));
1742  }
1743 
1744  assert(ExtOpcode == TargetOpcode::G_ANYEXT);
1745  return std::make_pair(Bitcast.getReg(0), ShiftHi.getReg(0));
1746 }
1747 
1748 // For cases where only a single copy is inserted for matching register banks,
1749 // replace the register in the instruction operand with the new virtual register.
1750 static bool substituteSimpleCopyRegs(
1751  const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx) {
1752  SmallVector<unsigned, 1> SrcReg(OpdMapper.getVRegs(OpIdx));
1753  if (!SrcReg.empty()) {
1754  assert(SrcReg.size() == 1);
1755  OpdMapper.getMI().getOperand(OpIdx).setReg(SrcReg[0]);
1756  return true;
1757  }
1758 
1759  return false;
1760 }
1761 
1762 /// Handle register layout difference for f16 images for some subtargets.
1763 Register AMDGPURegisterBankInfo::handleD16VData(MachineIRBuilder &B,
1764  MachineRegisterInfo &MRI,
1765  Register Reg) const {
1766  if (!Subtarget.hasUnpackedD16VMem())
1767  return Reg;
1768 
1769  const LLT S16 = LLT::scalar(16);
1770  LLT StoreVT = MRI.getType(Reg);
1771  if (!StoreVT.isVector() || StoreVT.getElementType() != S16)
1772  return Reg;
1773 
1774  auto Unmerge = B.buildUnmerge(S16, Reg);
1775 
1776 
1777  SmallVector<Register, 4> WideRegs;
1778  for (int I = 0, E = Unmerge->getNumOperands() - 1; I != E; ++I)
1779  WideRegs.push_back(Unmerge.getReg(I));
1780 
1781  const LLT S32 = LLT::scalar(32);
1782  int NumElts = StoreVT.getNumElements();
1783 
1784  return B.buildMerge(LLT::fixed_vector(NumElts, S32), WideRegs).getReg(0);
1785 }
1786 
1787 static std::pair<Register, unsigned>
1788 getBaseWithConstantOffset(MachineRegisterInfo &MRI, Register Reg) {
1789  int64_t Const;
1790  if (mi_match(Reg, MRI, m_ICst(Const)))
1791  return std::make_pair(Register(), Const);
1792 
1793  Register Base;
1794  if (mi_match(Reg, MRI, m_GAdd(m_Reg(Base), m_ICst(Const))))
1795  return std::make_pair(Base, Const);
1796 
1797  // TODO: Handle G_OR used for add case
1798  return std::make_pair(Reg, 0);
1799 }
1800 
1801 std::pair<Register, unsigned>
1802 AMDGPURegisterBankInfo::splitBufferOffsets(MachineIRBuilder &B,
1803  Register OrigOffset) const {
1804  const unsigned MaxImm = 4095;
1805  Register BaseReg;
1806  unsigned ImmOffset;
1807  const LLT S32 = LLT::scalar(32);
1808 
1809  // TODO: Use AMDGPU::getBaseWithConstantOffset() instead.
1810  std::tie(BaseReg, ImmOffset) = getBaseWithConstantOffset(*B.getMRI(),
1811  OrigOffset);
1812 
1813  unsigned C1 = 0;
1814  if (ImmOffset != 0) {
1815  // If the immediate value is too big for the immoffset field, keep only the
1816  // low 12 bits (the value modulo 4096) in the immoffset field so that the
1817  // value that is copied/added for the voffset field is a multiple of 4096 and
1818  // stands more chance of being CSEd with the copy/add for another similar load/store.
1819  // However, do not do that rounding down to a multiple of 4096 if that is a
1820  // negative number, as it appears to be illegal to have a negative offset
1821  // in the vgpr, even if adding the immediate offset makes it positive.
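  // For example, with ImmOffset = 0x1234 the multiple of 4096 (0x1000) is
  // added to the base register and 0x234 remains in the immediate field, while
  // ImmOffset = 4095 fits in the immediate field as-is.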
1822  unsigned Overflow = ImmOffset & ~MaxImm;
1823  ImmOffset -= Overflow;
1824  if ((int32_t)Overflow < 0) {
1825  Overflow += ImmOffset;
1826  ImmOffset = 0;
1827  }
1828 
1829  C1 = ImmOffset;
1830  if (Overflow != 0) {
1831  if (!BaseReg)
1832  BaseReg = B.buildConstant(S32, Overflow).getReg(0);
1833  else {
1834  auto OverflowVal = B.buildConstant(S32, Overflow);
1835  BaseReg = B.buildAdd(S32, BaseReg, OverflowVal).getReg(0);
1836  }
1837  }
1838  }
1839 
1840  if (!BaseReg)
1841  BaseReg = B.buildConstant(S32, 0).getReg(0);
1842 
1843  return {BaseReg, C1};
1844 }
1845 
1846 bool AMDGPURegisterBankInfo::buildVCopy(MachineIRBuilder &B, Register DstReg,
1847  Register SrcReg) const {
1848  MachineRegisterInfo &MRI = *B.getMRI();
1849  LLT SrcTy = MRI.getType(SrcReg);
1850  if (SrcTy.getSizeInBits() == 32) {
1851  // Use a v_mov_b32 here to make the exec dependency explicit.
1852  B.buildInstr(AMDGPU::V_MOV_B32_e32)
1853  .addDef(DstReg)
1854  .addUse(SrcReg);
1855  return constrainGenericRegister(DstReg, AMDGPU::VGPR_32RegClass, MRI) &&
1856  constrainGenericRegister(SrcReg, AMDGPU::SReg_32RegClass, MRI);
1857  }
1858 
1859  Register TmpReg0 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1860  Register TmpReg1 = MRI.createVirtualRegister(&AMDGPU::VGPR_32RegClass);
1861 
1862  B.buildInstr(AMDGPU::V_MOV_B32_e32)
1863  .addDef(TmpReg0)
1864  .addUse(SrcReg, 0, AMDGPU::sub0);
1865  B.buildInstr(AMDGPU::V_MOV_B32_e32)
1866  .addDef(TmpReg1)
1867  .addUse(SrcReg, 0, AMDGPU::sub1);
1868  B.buildInstr(AMDGPU::REG_SEQUENCE)
1869  .addDef(DstReg)
1870  .addUse(TmpReg0)
1871  .addImm(AMDGPU::sub0)
1872  .addUse(TmpReg1)
1873  .addImm(AMDGPU::sub1);
1874 
1875  return constrainGenericRegister(SrcReg, AMDGPU::SReg_64RegClass, MRI) &&
1876  constrainGenericRegister(DstReg, AMDGPU::VReg_64RegClass, MRI);
1877 }
1878 
1879 /// Utility function for pushing dynamic vector indexes with a constant offset
1880 /// into waterfall loops.
1881 static void reinsertVectorIndexAdd(MachineIRBuilder &B,
1882  MachineInstr &IdxUseInstr,
1883  unsigned OpIdx,
1884  unsigned ConstOffset) {
1885  MachineRegisterInfo &MRI = *B.getMRI();
1886  const LLT S32 = LLT::scalar(32);
1887  Register WaterfallIdx = IdxUseInstr.getOperand(OpIdx).getReg();
1888  B.setInsertPt(*IdxUseInstr.getParent(), IdxUseInstr.getIterator());
1889 
1890  auto MaterializedOffset = B.buildConstant(S32, ConstOffset);
1891 
1892  auto Add = B.buildAdd(S32, WaterfallIdx, MaterializedOffset);
1893  MRI.setRegBank(MaterializedOffset.getReg(0), AMDGPU::SGPRRegBank);
1894  MRI.setRegBank(Add.getReg(0), AMDGPU::SGPRRegBank);
1895  IdxUseInstr.getOperand(OpIdx).setReg(Add.getReg(0));
1896 }
1897 
1898 /// Implement extending a 32-bit value to a 64-bit value. \p Lo32Reg is the
1899 /// original 32-bit source value (to be inserted in the low part of the combined
1900 /// 64-bit result), and \p Hi32Reg is the high half of the combined 64-bit
1901 /// value.
1902 static void extendLow32IntoHigh32(MachineIRBuilder &B,
1903  Register Hi32Reg, Register Lo32Reg,
1904  unsigned ExtOpc,
1905  const RegisterBank &RegBank,
1906  bool IsBooleanSrc = false) {
1907  if (ExtOpc == AMDGPU::G_ZEXT) {
1908  B.buildConstant(Hi32Reg, 0);
1909  } else if (ExtOpc == AMDGPU::G_SEXT) {
1910  if (IsBooleanSrc) {
1911  // If we know the original source was an s1, the high half is the same as
1912  // the low.
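      // For example, an s1 "true" sign-extends to all ones, so both 32-bit
      // halves end up identical.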
1913  B.buildCopy(Hi32Reg, Lo32Reg);
1914  } else {
1915  // Replicate sign bit from 32-bit extended part.
1916  auto ShiftAmt = B.buildConstant(LLT::scalar(32), 31);
1917  B.getMRI()->setRegBank(ShiftAmt.getReg(0), RegBank);
1918  B.buildAShr(Hi32Reg, Lo32Reg, ShiftAmt);
1919  }
1920  } else {
1921  assert(ExtOpc == AMDGPU::G_ANYEXT && "not an integer extension");
1922  B.buildUndef(Hi32Reg);
1923  }
1924 }
1925 
1926 bool AMDGPURegisterBankInfo::foldExtractEltToCmpSelect(
1927  MachineInstr &MI, MachineRegisterInfo &MRI,
1928  const OperandsMapper &OpdMapper) const {
1929 
1930  Register VecReg = MI.getOperand(1).getReg();
1931  Register Idx = MI.getOperand(2).getReg();
1932 
1933  const RegisterBank &IdxBank =
1934  *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
1935 
1936  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
1937 
1938  LLT VecTy = MRI.getType(VecReg);
1939  unsigned EltSize = VecTy.getScalarSizeInBits();
1940  unsigned NumElem = VecTy.getNumElements();
1941 
1942  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
1943  IsDivergentIdx, &Subtarget))
1944  return false;
1945 
1946  MachineIRBuilder B(MI);
1947  LLT S32 = LLT::scalar(32);
1948 
1949  const RegisterBank &DstBank =
1950  *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
1951  const RegisterBank &SrcBank =
1952  *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
1953 
1954  const RegisterBank &CCBank =
1955  (DstBank == AMDGPU::SGPRRegBank &&
1956  SrcBank == AMDGPU::SGPRRegBank &&
1957  IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
1958  : AMDGPU::VCCRegBank;
1959  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
1960 
1961  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
1962  Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
1963  MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
1964  }
1965 
1966  LLT EltTy = VecTy.getScalarType();
1967  SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
1968  unsigned NumLanes = DstRegs.size();
1969  if (!NumLanes)
1970  NumLanes = 1;
1971  else
1972  EltTy = MRI.getType(DstRegs[0]);
1973 
1974  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
1975  SmallVector<Register, 2> Res(NumLanes);
1976  for (unsigned L = 0; L < NumLanes; ++L)
1977  Res[L] = UnmergeToEltTy.getReg(L);
1978 
1979  for (unsigned I = 1; I < NumElem; ++I) {
1980  auto IC = B.buildConstant(S32, I);
1981  MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
1982  auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
1983  MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
1984 
1985  for (unsigned L = 0; L < NumLanes; ++L) {
1986  auto S = B.buildSelect(EltTy, Cmp,
1987  UnmergeToEltTy.getReg(I * NumLanes + L), Res[L]);
1988 
1989  for (unsigned N : { 0, 2, 3 })
1990  MRI.setRegBank(S->getOperand(N).getReg(), DstBank);
1991 
1992  Res[L] = S->getOperand(0).getReg();
1993  }
1994  }
1995 
1996  for (unsigned L = 0; L < NumLanes; ++L) {
1997  Register DstReg = (NumLanes == 1) ? MI.getOperand(0).getReg() : DstRegs[L];
1998  B.buildCopy(DstReg, Res[L]);
1999  MRI.setRegBank(DstReg, DstBank);
2000  }
2001 
2002  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2003  MI.eraseFromParent();
2004 
2005  return true;
2006 }
2007 
2008 // Insert a cross regbank copy for a register if it already has a bank that
2009 // differs from the one we want to set.
2010 static Register constrainRegToBank(MachineRegisterInfo &MRI,
2011  MachineIRBuilder &B, Register &Reg,
2012  const RegisterBank &Bank) {
2013  const RegisterBank *CurrBank = MRI.getRegBankOrNull(Reg);
2014  if (CurrBank && *CurrBank != Bank) {
2015  Register Copy = B.buildCopy(MRI.getType(Reg), Reg).getReg(0);
2016  MRI.setRegBank(Copy, Bank);
2017  return Copy;
2018  }
2019 
2020  MRI.setRegBank(Reg, Bank);
2021  return Reg;
2022 }
2023 
2024 bool AMDGPURegisterBankInfo::foldInsertEltToCmpSelect(
2025  MachineInstr &MI, MachineRegisterInfo &MRI,
2026  const OperandsMapper &OpdMapper) const {
2027 
2028  Register VecReg = MI.getOperand(1).getReg();
2029  Register Idx = MI.getOperand(3).getReg();
2030 
2031  const RegisterBank &IdxBank =
2032  *OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2033 
2034  bool IsDivergentIdx = IdxBank != AMDGPU::SGPRRegBank;
2035 
2036  LLT VecTy = MRI.getType(VecReg);
2037  unsigned EltSize = VecTy.getScalarSizeInBits();
2038  unsigned NumElem = VecTy.getNumElements();
2039 
2040  if (!SITargetLowering::shouldExpandVectorDynExt(EltSize, NumElem,
2041  IsDivergentIdx, &Subtarget))
2042  return false;
2043 
2044  MachineIRBuilder B(MI);
2045  LLT S32 = LLT::scalar(32);
2046 
2047  const RegisterBank &DstBank =
2048  *OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2049  const RegisterBank &SrcBank =
2050  *OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2051  const RegisterBank &InsBank =
2052  *OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2053 
2054  const RegisterBank &CCBank =
2055  (DstBank == AMDGPU::SGPRRegBank &&
2056  SrcBank == AMDGPU::SGPRRegBank &&
2057  InsBank == AMDGPU::SGPRRegBank &&
2058  IdxBank == AMDGPU::SGPRRegBank) ? AMDGPU::SGPRRegBank
2059  : AMDGPU::VCCRegBank;
2060  LLT CCTy = (CCBank == AMDGPU::SGPRRegBank) ? S32 : LLT::scalar(1);
2061 
2062  if (CCBank == AMDGPU::VCCRegBank && IdxBank == AMDGPU::SGPRRegBank) {
2063  Idx = B.buildCopy(S32, Idx)->getOperand(0).getReg();
2064  MRI.setRegBank(Idx, AMDGPU::VGPRRegBank);
2065  }
2066 
2067  LLT EltTy = VecTy.getScalarType();
2068  SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2069  unsigned NumLanes = InsRegs.size();
2070  if (!NumLanes) {
2071  NumLanes = 1;
2072  InsRegs.push_back(MI.getOperand(2).getReg());
2073  } else {
2074  EltTy = MRI.getType(InsRegs[0]);
2075  }
2076 
2077  auto UnmergeToEltTy = B.buildUnmerge(EltTy, VecReg);
2078  SmallVector<Register, 16> Ops(NumElem * NumLanes);
2079 
2080  for (unsigned I = 0; I < NumElem; ++I) {
2081  auto IC = B.buildConstant(S32, I);
2082  MRI.setRegBank(IC->getOperand(0).getReg(), AMDGPU::SGPRRegBank);
2083  auto Cmp = B.buildICmp(CmpInst::ICMP_EQ, CCTy, Idx, IC);
2084  MRI.setRegBank(Cmp->getOperand(0).getReg(), CCBank);
2085 
2086  for (unsigned L = 0; L < NumLanes; ++L) {
2087  Register Op0 = constrainRegToBank(MRI, B, InsRegs[L], DstBank);
2088  Register Op1 = UnmergeToEltTy.getReg(I * NumLanes + L);
2089  Op1 = constrainRegToBank(MRI, B, Op1, DstBank);
2090 
2091  Register Select = B.buildSelect(EltTy, Cmp, Op0, Op1).getReg(0);
2092  MRI.setRegBank(Select, DstBank);
2093 
2094  Ops[I * NumLanes + L] = Select;
2095  }
2096  }
2097 
2098  LLT MergeTy = LLT::fixed_vector(Ops.size(), EltTy);
2099  if (MergeTy == MRI.getType(MI.getOperand(0).getReg())) {
2100  B.buildBuildVector(MI.getOperand(0), Ops);
2101  } else {
2102  auto Vec = B.buildBuildVector(MergeTy, Ops);
2103  MRI.setRegBank(Vec->getOperand(0).getReg(), DstBank);
2104  B.buildBitcast(MI.getOperand(0).getReg(), Vec);
2105  }
2106 
2107  MRI.setRegBank(MI.getOperand(0).getReg(), DstBank);
2108  MI.eraseFromParent();
2109 
2110  return true;
2111 }
2112 
2113 void AMDGPURegisterBankInfo::applyMappingImpl(
2114  const OperandsMapper &OpdMapper) const {
2115  MachineInstr &MI = OpdMapper.getMI();
2116  unsigned Opc = MI.getOpcode();
2117  MachineRegisterInfo &MRI = OpdMapper.getMRI();
2118  switch (Opc) {
2119  case AMDGPU::G_PHI: {
2120  Register DstReg = MI.getOperand(0).getReg();
2121  LLT DstTy = MRI.getType(DstReg);
2122  if (DstTy != LLT::scalar(1))
2123  break;
2124 
2125  const LLT S32 = LLT::scalar(32);
2126  const RegisterBank *DstBank =
2127  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2128  if (DstBank == &AMDGPU::VCCRegBank) {
2129  applyDefaultMapping(OpdMapper);
2130  // The standard handling only considers the result register bank for
2131  // phis. For VCC, blindly inserting a copy when the phi is lowered will
2132  // produce an invalid copy. We can only copy with some kind of compare to
2133  // get a vector boolean result. Insert a register bank copy that will be
2134  // correctly lowered to a compare.
2135  MachineIRBuilder B(*MI.getParent()->getParent());
2136 
2137  for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
2138  Register SrcReg = MI.getOperand(I).getReg();
2139  const RegisterBank *SrcBank = getRegBank(SrcReg, MRI, *TRI);
2140 
2141  if (SrcBank != &AMDGPU::VCCRegBank) {
2142  MachineBasicBlock *SrcMBB = MI.getOperand(I + 1).getMBB();
2143  B.setInsertPt(*SrcMBB, SrcMBB->getFirstTerminator());
2144 
2145  auto Copy = B.buildCopy(LLT::scalar(1), SrcReg);
2146  MRI.setRegBank(Copy.getReg(0), AMDGPU::VCCRegBank);
2147  MI.getOperand(I).setReg(Copy.getReg(0));
2148  }
2149  }
2150 
2151  return;
2152  }
2153 
2154  // Phi handling is strange and only considers the bank of the destination.
2155  substituteSimpleCopyRegs(OpdMapper, 0);
2156 
2157  // Promote SGPR/VGPR booleans to s32
2158  MachineFunction *MF = MI.getParent()->getParent();
2159  ApplyRegBankMapping ApplyBank(*this, MRI, DstBank);
2160  MachineIRBuilder B(MI, ApplyBank);
2161  LegalizerHelper Helper(*MF, ApplyBank, B);
2162 
2163  if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2164  llvm_unreachable("widen scalar should have succeeded");
2165 
2166  return;
2167  }
2168  case AMDGPU::G_ICMP:
2169  case AMDGPU::G_UADDO:
2170  case AMDGPU::G_USUBO:
2171  case AMDGPU::G_UADDE:
2172  case AMDGPU::G_SADDE:
2173  case AMDGPU::G_USUBE:
2174  case AMDGPU::G_SSUBE: {
2175  unsigned BoolDstOp = Opc == AMDGPU::G_ICMP ? 0 : 1;
2176  Register DstReg = MI.getOperand(BoolDstOp).getReg();
2177 
2178  const RegisterBank *DstBank =
2179  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2180  if (DstBank != &AMDGPU::SGPRRegBank)
2181  break;
2182 
2183  const bool HasCarryIn = MI.getNumOperands() == 5;
2184 
2185  // If this is a scalar compare, promote the result to s32, as the selection
2186  // will end up using a copy to a 32-bit vreg.
2187  const LLT S32 = LLT::scalar(32);
2188  Register NewDstReg = MRI.createGenericVirtualRegister(S32);
2189  MRI.setRegBank(NewDstReg, AMDGPU::SGPRRegBank);
2190  MI.getOperand(BoolDstOp).setReg(NewDstReg);
2191  MachineIRBuilder B(MI);
2192 
2193  if (HasCarryIn) {
2194  Register NewSrcReg = MRI.createGenericVirtualRegister(S32);
2195  MRI.setRegBank(NewSrcReg, AMDGPU::SGPRRegBank);
2196  B.buildZExt(NewSrcReg, MI.getOperand(4).getReg());
2197  MI.getOperand(4).setReg(NewSrcReg);
2198  }
2199 
2200  MachineBasicBlock *MBB = MI.getParent();
2201  B.setInsertPt(*MBB, std::next(MI.getIterator()));
2202 
2203  // If we had a constrained VCC result register, a copy was inserted to VCC
2204  // from SGPR.
2205  SmallVector<Register, 1> DefRegs(OpdMapper.getVRegs(0));
2206  if (DefRegs.empty())
2207  DefRegs.push_back(DstReg);
2208  B.buildTrunc(DefRegs[0], NewDstReg);
2209  return;
2210  }
2211  case AMDGPU::G_SELECT: {
2212  Register DstReg = MI.getOperand(0).getReg();
2213  LLT DstTy = MRI.getType(DstReg);
2214 
2215  SmallVector<Register, 1> CondRegs(OpdMapper.getVRegs(1));
2216  if (CondRegs.empty())
2217  CondRegs.push_back(MI.getOperand(1).getReg());
2218  else {
2219  assert(CondRegs.size() == 1);
2220  }
2221 
2222  const RegisterBank *CondBank = getRegBank(CondRegs[0], MRI, *TRI);
2223  if (CondBank == &AMDGPU::SGPRRegBank) {
2224  MachineIRBuilder B(MI);
2225  const LLT S32 = LLT::scalar(32);
2226  Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2227  MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2228 
2229  MI.getOperand(1).setReg(NewCondReg);
2230  B.buildZExt(NewCondReg, CondRegs[0]);
2231  }
2232 
2233  if (DstTy.getSizeInBits() != 64)
2234  break;
2235 
2236  MachineIRBuilder B(MI);
2237  LLT HalfTy = getHalfSizedType(DstTy);
2238 
2239  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2240  SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2241  SmallVector<Register, 2> Src2Regs(OpdMapper.getVRegs(3));
2242 
2243  // All inputs are SGPRs, nothing special to do.
2244  if (DefRegs.empty()) {
2245  assert(Src1Regs.empty() && Src2Regs.empty());
2246  break;
2247  }
2248 
2249  if (Src1Regs.empty())
2250  split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2251  else {
2252  setRegsToType(MRI, Src1Regs, HalfTy);
2253  }
2254 
2255  if (Src2Regs.empty())
2256  split64BitValueForMapping(B, Src2Regs, HalfTy, MI.getOperand(3).getReg());
2257  else
2258  setRegsToType(MRI, Src2Regs, HalfTy);
2259 
2260  setRegsToType(MRI, DefRegs, HalfTy);
2261 
2262  B.buildSelect(DefRegs[0], CondRegs[0], Src1Regs[0], Src2Regs[0]);
2263  B.buildSelect(DefRegs[1], CondRegs[0], Src1Regs[1], Src2Regs[1]);
2264 
2265  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2266  MI.eraseFromParent();
2267  return;
2268  }
2269  case AMDGPU::G_BRCOND: {
2270  Register CondReg = MI.getOperand(0).getReg();
2271  // FIXME: Should use legalizer helper, but should change bool ext type.
2272  const RegisterBank *CondBank =
2273  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2274 
2275  if (CondBank == &AMDGPU::SGPRRegBank) {
2276  MachineIRBuilder B(MI);
2277  const LLT S32 = LLT::scalar(32);
2278  Register NewCondReg = MRI.createGenericVirtualRegister(S32);
2279  MRI.setRegBank(NewCondReg, AMDGPU::SGPRRegBank);
2280 
2281  MI.getOperand(0).setReg(NewCondReg);
2282  B.buildZExt(NewCondReg, CondReg);
2283  return;
2284  }
2285 
2286  break;
2287  }
2288  case AMDGPU::G_AND:
2289  case AMDGPU::G_OR:
2290  case AMDGPU::G_XOR: {
2291  // 64-bit and/or/xor is only available on the SALU, so split into 2 32-bit
2292  // ops if there is a VGPR input.
2293  Register DstReg = MI.getOperand(0).getReg();
2294  LLT DstTy = MRI.getType(DstReg);
2295 
2296  if (DstTy.getSizeInBits() == 1) {
2297  const RegisterBank *DstBank =
2298  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2299  if (DstBank == &AMDGPU::VCCRegBank)
2300  break;
2301 
2302  MachineFunction *MF = MI.getParent()->getParent();
2303  ApplyRegBankMapping ApplyBank(*this, MRI, DstBank);
2304  MachineIRBuilder B(MI, ApplyBank);
2305  LegalizerHelper Helper(*MF, ApplyBank, B);
2306 
2307  if (Helper.widenScalar(MI, 0, LLT::scalar(32)) !=
2308  LegalizerHelper::Legalized)
2309  llvm_unreachable("widen scalar should have succeeded");
2310  return;
2311  }
2312 
2313  if (DstTy.getSizeInBits() != 64)
2314  break;
2315 
2316  LLT HalfTy = getHalfSizedType(DstTy);
2317  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2318  SmallVector<Register, 2> Src0Regs(OpdMapper.getVRegs(1));
2319  SmallVector<Register, 2> Src1Regs(OpdMapper.getVRegs(2));
2320 
2321  // All inputs are SGPRs, nothing special to do.
2322  if (DefRegs.empty()) {
2323  assert(Src0Regs.empty() && Src1Regs.empty());
2324  break;
2325  }
2326 
2327  assert(DefRegs.size() == 2);
2328  assert(Src0Regs.size() == Src1Regs.size() &&
2329  (Src0Regs.empty() || Src0Regs.size() == 2));
2330 
2331  // Depending on where the source registers came from, the generic code may
2332  // have decided to split the inputs already or not. If not, we still need to
2333  // extract the values.
2334  MachineIRBuilder B(MI);
2335 
2336  if (Src0Regs.empty())
2337  split64BitValueForMapping(B, Src0Regs, HalfTy, MI.getOperand(1).getReg());
2338  else
2339  setRegsToType(MRI, Src0Regs, HalfTy);
2340 
2341  if (Src1Regs.empty())
2342  split64BitValueForMapping(B, Src1Regs, HalfTy, MI.getOperand(2).getReg());
2343  else
2344  setRegsToType(MRI, Src1Regs, HalfTy);
2345 
2346  setRegsToType(MRI, DefRegs, HalfTy);
2347 
2348  B.buildInstr(Opc, {DefRegs[0]}, {Src0Regs[0], Src1Regs[0]});
2349  B.buildInstr(Opc, {DefRegs[1]}, {Src0Regs[1], Src1Regs[1]});
2350 
2351  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2352  MI.eraseFromParent();
2353  return;
2354  }
2355  case AMDGPU::G_ABS: {
2356  Register SrcReg = MI.getOperand(1).getReg();
2357  const RegisterBank *SrcBank = MRI.getRegBankOrNull(SrcReg);
2358 
2359  // There is no VALU abs instruction so we need to replace it with a sub and
2360  // max combination.
2361  if (SrcBank && SrcBank == &AMDGPU::VGPRRegBank) {
2362  MachineFunction *MF = MI.getParent()->getParent();
2363  ApplyRegBankMapping Apply(*this, MRI, &AMDGPU::VGPRRegBank);
2364  MachineIRBuilder B(MI, Apply);
2365  LegalizerHelper Helper(*MF, Apply, B);
2366 
2367  if (Helper.lowerAbsToMaxNeg(MI) != LegalizerHelper::Legalized)
2368  llvm_unreachable("lowerAbsToMaxNeg should have succeeded");
2369  return;
2370  }
2371  [[fallthrough]];
2372  }
2373  case AMDGPU::G_ADD:
2374  case AMDGPU::G_SUB:
2375  case AMDGPU::G_MUL:
2376  case AMDGPU::G_SHL:
2377  case AMDGPU::G_LSHR:
2378  case AMDGPU::G_ASHR:
2379  case AMDGPU::G_SMIN:
2380  case AMDGPU::G_SMAX:
2381  case AMDGPU::G_UMIN:
2382  case AMDGPU::G_UMAX: {
2383  Register DstReg = MI.getOperand(0).getReg();
2384  LLT DstTy = MRI.getType(DstReg);
2385 
2386  // 16-bit operations are VALU only, but can be promoted to 32-bit SALU.
2387  // Packed 16-bit operations need to be scalarized and promoted.
2388  if (DstTy != LLT::scalar(16) && DstTy != LLT::fixed_vector(2, 16))
2389  break;
2390 
2391  const RegisterBank *DstBank =
2392  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2393  if (DstBank == &AMDGPU::VGPRRegBank)
2394  break;
2395 
2396  const LLT S32 = LLT::scalar(32);
2397  MachineBasicBlock *MBB = MI.getParent();
2398  MachineFunction *MF = MBB->getParent();
2399  ApplyRegBankMapping ApplySALU(*this, MRI, &AMDGPU::SGPRRegBank);
2400  MachineIRBuilder B(MI, ApplySALU);
2401 
2402  if (DstTy.isVector()) {
2403  Register WideSrc0Lo, WideSrc0Hi;
2404  Register WideSrc1Lo, WideSrc1Hi;
2405 
2406  unsigned ExtendOp = getExtendOp(MI.getOpcode());
2407  std::tie(WideSrc0Lo, WideSrc0Hi)
2408  = unpackV2S16ToS32(B, MI.getOperand(1).getReg(), ExtendOp);
2409  std::tie(WideSrc1Lo, WideSrc1Hi)
2410  = unpackV2S16ToS32(B, MI.getOperand(2).getReg(), ExtendOp);
2411  auto Lo = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Lo, WideSrc1Lo});
2412  auto Hi = B.buildInstr(MI.getOpcode(), {S32}, {WideSrc0Hi, WideSrc1Hi});
2413  B.buildBuildVectorTrunc(DstReg, {Lo.getReg(0), Hi.getReg(0)});
2414  MI.eraseFromParent();
2415  } else {
2416  LegalizerHelper Helper(*MF, ApplySALU, B);
2417 
2418  if (Helper.widenScalar(MI, 0, S32) != LegalizerHelper::Legalized)
2419  llvm_unreachable("widen scalar should have succeeded");
2420 
2421  // FIXME: s16 shift amounts should be legal.
2422  if (Opc == AMDGPU::G_SHL || Opc == AMDGPU::G_LSHR ||
2423  Opc == AMDGPU::G_ASHR) {
2424  B.setInsertPt(*MBB, MI.getIterator());
2425  if (Helper.widenScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2426  llvm_unreachable("widen scalar should have succeeded");
2427  }
2428  }
2429 
2430  return;
2431  }
2432  case AMDGPU::G_SEXT_INREG: {
2433  SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2434  if (SrcRegs.empty())
2435  break; // Nothing to repair
2436 
2437  const LLT S32 = LLT::scalar(32);
2438  MachineIRBuilder B(MI);
2439  ApplyRegBankMapping O(*this, MRI, &AMDGPU::VGPRRegBank);
2440  GISelObserverWrapper Observer(&O);
2441  B.setChangeObserver(Observer);
2442 
2443  // Don't use LegalizerHelper's narrowScalar. It produces unwanted G_SEXTs
2444  // we would need to further expand, and doesn't let us directly set the
2445  // result registers.
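  // For example, a 64-bit G_SEXT_INREG with Amt = 8 becomes
  // DstLo = sext_inreg(freeze(SrcLo), 8) and DstHi = ashr(DstLo, 31), while
  // Amt = 40 becomes DstLo = SrcLo and DstHi = sext_inreg(SrcHi, 8).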
2446  SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2447 
2448  int Amt = MI.getOperand(2).getImm();
2449  if (Amt <= 32) {
2450  // Downstream users have expectations for the high bit behavior, so freeze
2451  // incoming undefined bits.
2452  if (Amt == 32) {
2453  // The low bits are unchanged.
2454  B.buildFreeze(DstRegs[0], SrcRegs[0]);
2455  } else {
2456  auto Freeze = B.buildFreeze(S32, SrcRegs[0]);
2457  // Extend in the low bits and propagate the sign bit to the high half.
2458  B.buildSExtInReg(DstRegs[0], Freeze, Amt);
2459  }
2460 
2461  B.buildAShr(DstRegs[1], DstRegs[0], B.buildConstant(S32, 31));
2462  } else {
2463  // The low bits are unchanged, and the extension happens in the high bits.
2464  // No freeze is required.
2465  B.buildCopy(DstRegs[0], SrcRegs[0]);
2466  B.buildSExtInReg(DstRegs[1], SrcRegs[1], Amt - 32);
2467  }
2468 
2469  Register DstReg = MI.getOperand(0).getReg();
2470  MRI.setRegBank(DstReg, AMDGPU::VGPRRegBank);
2471  MI.eraseFromParent();
2472  return;
2473  }
2474  case AMDGPU::G_CTPOP:
2475  case AMDGPU::G_BITREVERSE: {
2476  const RegisterBank *DstBank =
2477  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2478  if (DstBank == &AMDGPU::SGPRRegBank)
2479  break;
2480 
2481  Register SrcReg = MI.getOperand(1).getReg();
2482  const LLT S32 = LLT::scalar(32);
2483  LLT Ty = MRI.getType(SrcReg);
2484  if (Ty == S32)
2485  break;
2486 
2487  ApplyRegBankMapping ApplyVALU(*this, MRI, &AMDGPU::VGPRRegBank);
2488  MachineIRBuilder B(MI, ApplyVALU);
2489 
2490  MachineFunction &MF = B.getMF();
2491  LegalizerHelper Helper(MF, ApplyVALU, B);
2492 
2493  if (Helper.narrowScalar(MI, 1, S32) != LegalizerHelper::Legalized)
2494  llvm_unreachable("narrowScalar should have succeeded");
2495  return;
2496  }
2497  case AMDGPU::G_AMDGPU_FFBH_U32:
2498  case AMDGPU::G_AMDGPU_FFBL_B32:
2499  case AMDGPU::G_CTLZ_ZERO_UNDEF:
2500  case AMDGPU::G_CTTZ_ZERO_UNDEF: {
2501  const RegisterBank *DstBank =
2502  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2503  if (DstBank == &AMDGPU::SGPRRegBank)
2504  break;
2505 
2506  Register SrcReg = MI.getOperand(1).getReg();
2507  const LLT S32 = LLT::scalar(32);
2508  LLT Ty = MRI.getType(SrcReg);
2509  if (Ty == S32)
2510  break;
2511 
2512  // We can narrow this more efficiently than Helper can by using ffbh/ffbl
2513  // which return -1 when the input is zero:
2514  // (ctlz_zero_undef hi:lo) -> (umin (ffbh hi), (add (ffbh lo), 32))
2515  // (cttz_zero_undef hi:lo) -> (umin (add (ffbl hi), 32), (ffbl lo))
2516  // (ffbh hi:lo) -> (umin (ffbh hi), (uaddsat (ffbh lo), 32))
2517  // (ffbl hi:lo) -> (umin (uaddsat (ffbl hi), 32), (ffbl lo))
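    // For example, ctlz_zero_undef of the 64-bit value 0x10: ffbh(hi) = -1,
    // ffbh(lo) + 32 = 27 + 32 = 59, and the unsigned min selects 59.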
2518  ApplyRegBankMapping ApplyVALU(*this, MRI, &AMDGPU::VGPRRegBank);
2519  MachineIRBuilder B(MI, ApplyVALU);
2520  SmallVector<Register, 2> SrcRegs(OpdMapper.getVRegs(1));
2521  unsigned NewOpc = Opc == AMDGPU::G_CTLZ_ZERO_UNDEF
2522  ? (unsigned)AMDGPU::G_AMDGPU_FFBH_U32
2523  : Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2524  ? (unsigned)AMDGPU::G_AMDGPU_FFBL_B32
2525  : Opc;
2526  unsigned Idx = NewOpc == AMDGPU::G_AMDGPU_FFBH_U32;
2527  auto X = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx]});
2528  auto Y = B.buildInstr(NewOpc, {S32}, {SrcRegs[Idx ^ 1]});
2529  unsigned AddOpc =
2530  Opc == AMDGPU::G_CTLZ_ZERO_UNDEF || Opc == AMDGPU::G_CTTZ_ZERO_UNDEF
2531  ? AMDGPU::G_ADD
2532  : AMDGPU::G_UADDSAT;
2533  Y = B.buildInstr(AddOpc, {S32}, {Y, B.buildConstant(S32, 32)});
2534  Register DstReg = MI.getOperand(0).getReg();
2535  B.buildUMin(DstReg, X, Y);
2536  MI.eraseFromParent();
2537  return;
2538  }
2539  case AMDGPU::G_SEXT:
2540  case AMDGPU::G_ZEXT:
2541  case AMDGPU::G_ANYEXT: {
2542  Register SrcReg = MI.getOperand(1).getReg();
2543  LLT SrcTy = MRI.getType(SrcReg);
2544  const bool Signed = Opc == AMDGPU::G_SEXT;
2545 
2546  assert(OpdMapper.getVRegs(1).empty());
2547 
2548  MachineIRBuilder B(MI);
2549  const RegisterBank *SrcBank =
2550  OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2551 
2552  Register DstReg = MI.getOperand(0).getReg();
2553  LLT DstTy = MRI.getType(DstReg);
2554  if (DstTy.isScalar() &&
2555  SrcBank != &AMDGPU::SGPRRegBank &&
2556  SrcBank != &AMDGPU::VCCRegBank &&
2557  // FIXME: Should handle any type that round to s64 when irregular
2558  // breakdowns supported.
2559  DstTy.getSizeInBits() == 64 &&
2560  SrcTy.getSizeInBits() <= 32) {
2561  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2562 
2563  // Extend to 32-bit, and then extend the low half.
2564  if (Signed) {
2565  // TODO: Should really be buildSExtOrCopy
2566  B.buildSExtOrTrunc(DefRegs[0], SrcReg);
2567  } else if (Opc == AMDGPU::G_ZEXT) {
2568  B.buildZExtOrTrunc(DefRegs[0], SrcReg);
2569  } else {
2570  B.buildAnyExtOrTrunc(DefRegs[0], SrcReg);
2571  }
2572 
2573  extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank);
2574  MRI.setRegBank(DstReg, *SrcBank);
2575  MI.eraseFromParent();
2576  return;
2577  }
2578 
2579  if (SrcTy != LLT::scalar(1))
2580  return;
2581 
2582  // It is not legal to have a legalization artifact with a VCC source. Rather
2583  // than introduce a copy, directly emit the select that such a copy would
2584  // have been selected to.
2585  if (SrcBank == &AMDGPU::VCCRegBank) {
2586  SmallVector<Register, 2> DefRegs(OpdMapper.getVRegs(0));
2587 
2588  const RegisterBank *DstBank = &AMDGPU::VGPRRegBank;
2589 
2590  unsigned DstSize = DstTy.getSizeInBits();
2591  // 64-bit select is SGPR only
2592  const bool UseSel64 = DstSize > 32 &&
2593  SrcBank->getID() == AMDGPU::SGPRRegBankID;
2594 
2595  // TODO: Should s16 select be legal?
2596  LLT SelType = UseSel64 ? LLT::scalar(64) : LLT::scalar(32);
2597  auto True = B.buildConstant(SelType, Signed ? -1 : 1);
2598  auto False = B.buildConstant(SelType, 0);
2599 
2600  MRI.setRegBank(True.getReg(0), *DstBank);
2601  MRI.setRegBank(False.getReg(0), *DstBank);
2602  MRI.setRegBank(DstReg, *DstBank);
2603 
2604  if (DstSize > 32) {
2605  B.buildSelect(DefRegs[0], SrcReg, True, False);
2606  extendLow32IntoHigh32(B, DefRegs[1], DefRegs[0], Opc, *SrcBank, true);
2607  } else if (DstSize < 32) {
2608  auto Sel = B.buildSelect(SelType, SrcReg, True, False);
2609  MRI.setRegBank(Sel.getReg(0), *DstBank);
2610  B.buildTrunc(DstReg, Sel);
2611  } else {
2612  B.buildSelect(DstReg, SrcReg, True, False);
2613  }
2614 
2615  MI.eraseFromParent();
2616  return;
2617  }
2618 
2619  break;
2620  }
2621  case AMDGPU::G_EXTRACT_VECTOR_ELT: {
2622  SmallVector<Register, 2> DstRegs(OpdMapper.getVRegs(0));
2623 
2624  assert(OpdMapper.getVRegs(1).empty() && OpdMapper.getVRegs(2).empty());
2625 
2626  Register DstReg = MI.getOperand(0).getReg();
2627  Register SrcReg = MI.getOperand(1).getReg();
2628 
2629  const LLT S32 = LLT::scalar(32);
2630  LLT DstTy = MRI.getType(DstReg);
2631  LLT SrcTy = MRI.getType(SrcReg);
2632 
2633  if (foldExtractEltToCmpSelect(MI, MRI, OpdMapper))
2634  return;
2635 
2636  MachineIRBuilder B(MI);
2637 
2638  const ValueMapping &DstMapping
2639  = OpdMapper.getInstrMapping().getOperandMapping(0);
2640  const RegisterBank *DstBank = DstMapping.BreakDown[0].RegBank;
2641  const RegisterBank *SrcBank =
2642  OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2643  const RegisterBank *IdxBank =
2644  OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2645 
2646  Register BaseIdxReg;
2647  unsigned ConstOffset;
2648  std::tie(BaseIdxReg, ConstOffset) =
2649  AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(2).getReg());
2650 
2651  // See if the index is an add of a constant which will be foldable by moving
2652  // the base register of the index later if this is going to be executed in a
2653  // waterfall loop. This is essentially to reassociate the add of a constant
2654  // with the readfirstlane.
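    // For example, if the index is (%base + 4) with a divergent %base, the +4
    // is stripped here so the waterfall loop only needs readfirstlane(%base),
    // and the add of 4 is re-emitted inside the loop by reinsertVectorIndexAdd.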
2655  bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2656  ConstOffset > 0 &&
2657  ConstOffset < SrcTy.getNumElements();
2658 
2659  // Move the base register. We'll re-insert the add later.
2660  if (ShouldMoveIndexIntoLoop)
2661  MI.getOperand(2).setReg(BaseIdxReg);
2662 
2663  // If this is a VGPR result only because the index was a VGPR result, the
2664  // actual indexing will be done on the SGPR source vector, which will
2665  // produce a scalar result. We need to copy to the VGPR result inside the
2666  // waterfall loop.
2667  const bool NeedCopyToVGPR = DstBank == &AMDGPU::VGPRRegBank &&
2668  SrcBank == &AMDGPU::SGPRRegBank;
2669  if (DstRegs.empty()) {
2670  applyDefaultMapping(OpdMapper);
2671 
2672  executeInWaterfallLoop(MI, MRI, { 2 });
2673 
2674  if (NeedCopyToVGPR) {
2675  // We don't want a phi for this temporary reg.
2676  Register TmpReg = MRI.createGenericVirtualRegister(DstTy);
2677  MRI.setRegBank(TmpReg, AMDGPU::SGPRRegBank);
2678  MI.getOperand(0).setReg(TmpReg);
2679  B.setInsertPt(*MI.getParent(), ++MI.getIterator());
2680 
2681  // Use a v_mov_b32 here to make the exec dependency explicit.
2682  buildVCopy(B, DstReg, TmpReg);
2683  }
2684 
2685  // Re-insert the constant offset add inside the waterfall loop.
2686  if (ShouldMoveIndexIntoLoop)
2687  reinsertVectorIndexAdd(B, MI, 2, ConstOffset);
2688 
2689  return;
2690  }
2691 
2692  assert(DstTy.getSizeInBits() == 64);
2693 
2694  LLT Vec32 = LLT::fixed_vector(2 * SrcTy.getNumElements(), 32);
2695 
2696  auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2697  auto One = B.buildConstant(S32, 1);
2698 
2699  MachineBasicBlock::iterator MII = MI.getIterator();
2700 
2701  // Split the vector index into 32-bit pieces. Prepare to move all of the
2702  // new instructions into a waterfall loop if necessary.
2703  //
2704  // Don't put the bitcast or constant in the loop.
2705  MachineInstrSpan Span(MII, &B.getMBB());
2706 
2707  // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
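    // For example, OrigIdx = 3 selects 32-bit elements 6 and 7, the two halves
    // of 64-bit element 3.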
2708  auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2709  auto IdxHi = B.buildAdd(S32, IdxLo, One);
2710 
2711  auto Extract0 = B.buildExtractVectorElement(DstRegs[0], CastSrc, IdxLo);
2712  auto Extract1 = B.buildExtractVectorElement(DstRegs[1], CastSrc, IdxHi);
2713 
2714  MRI.setRegBank(DstReg, *DstBank);
2715  MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2716  MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2717  MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2718  MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2719 
2720  SmallSet<Register, 4> OpsToWaterfall;
2721  if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 2 })) {
2722  MI.eraseFromParent();
2723  return;
2724  }
2725 
2726  // Remove the original instruction to avoid potentially confusing the
2727  // waterfall loop logic.
2728  B.setInstr(*Span.begin());
2729  MI.eraseFromParent();
2730  executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2731  OpsToWaterfall, MRI);
2732 
2733  if (NeedCopyToVGPR) {
2734  MachineBasicBlock *LoopBB = Extract1->getParent();
2735  Register TmpReg0 = MRI.createGenericVirtualRegister(S32);
2736  Register TmpReg1 = MRI.createGenericVirtualRegister(S32);
2737  MRI.setRegBank(TmpReg0, AMDGPU::SGPRRegBank);
2738  MRI.setRegBank(TmpReg1, AMDGPU::SGPRRegBank);
2739 
2740  Extract0->getOperand(0).setReg(TmpReg0);
2741  Extract1->getOperand(0).setReg(TmpReg1);
2742 
2743  B.setInsertPt(*LoopBB, ++Extract1->getIterator());
2744 
2745  buildVCopy(B, DstRegs[0], TmpReg0);
2746  buildVCopy(B, DstRegs[1], TmpReg1);
2747  }
2748 
2749  if (ShouldMoveIndexIntoLoop)
2750  reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2751 
2752  return;
2753  }
2754  case AMDGPU::G_INSERT_VECTOR_ELT: {
2755  SmallVector<Register, 2> InsRegs(OpdMapper.getVRegs(2));
2756 
2757  Register DstReg = MI.getOperand(0).getReg();
2758  LLT VecTy = MRI.getType(DstReg);
2759 
2760  assert(OpdMapper.getVRegs(0).empty());
2761  assert(OpdMapper.getVRegs(3).empty());
2762 
2763  if (substituteSimpleCopyRegs(OpdMapper, 1))
2764  MRI.setType(MI.getOperand(1).getReg(), VecTy);
2765 
2766  if (foldInsertEltToCmpSelect(MI, MRI, OpdMapper))
2767  return;
2768 
2769  const RegisterBank *IdxBank =
2770  OpdMapper.getInstrMapping().getOperandMapping(3).BreakDown[0].RegBank;
2771 
2772  Register SrcReg = MI.getOperand(1).getReg();
2773  Register InsReg = MI.getOperand(2).getReg();
2774  LLT InsTy = MRI.getType(InsReg);
2775  (void)InsTy;
2776 
2777  Register BaseIdxReg;
2778  unsigned ConstOffset;
2779  std::tie(BaseIdxReg, ConstOffset) =
2780  AMDGPU::getBaseWithConstantOffset(MRI, MI.getOperand(3).getReg());
2781 
2782  // See if the index is an add of a constant which will be foldable by moving
2783  // the base register of the index later if this is going to be executed in a
2784  // waterfall loop. This is essentially to reassociate the add of a constant
2785  // with the readfirstlane.
2786  bool ShouldMoveIndexIntoLoop = IdxBank != &AMDGPU::SGPRRegBank &&
2787  ConstOffset > 0 &&
2788  ConstOffset < VecTy.getNumElements();
2789 
2790  // Move the base register. We'll re-insert the add later.
2791  if (ShouldMoveIndexIntoLoop)
2792  MI.getOperand(3).setReg(BaseIdxReg);
2793 
2794 
2795  if (InsRegs.empty()) {
2796  executeInWaterfallLoop(MI, MRI, { 3 });
2797 
2798  // Re-insert the constant offset add inside the waterfall loop.
2799  if (ShouldMoveIndexIntoLoop) {
2800  MachineIRBuilder B(MI);
2801  reinsertVectorIndexAdd(B, MI, 3, ConstOffset);
2802  }
2803 
2804  return;
2805  }
2806 
2807 
2808  assert(InsTy.getSizeInBits() == 64);
2809 
2810  const LLT S32 = LLT::scalar(32);
2811  LLT Vec32 = LLT::fixed_vector(2 * VecTy.getNumElements(), 32);
2812 
2813  MachineIRBuilder B(MI);
2814  auto CastSrc = B.buildBitcast(Vec32, SrcReg);
2815  auto One = B.buildConstant(S32, 1);
2816 
2817  // Split the vector index into 32-bit pieces. Prepare to move all of the
2818  // new instructions into a waterfall loop if necessary.
2819  //
2820  // Don't put the bitcast or constant in the loop.
2821  MachineInstrSpan Span(MachineBasicBlock::iterator(&MI), &B.getMBB());
2822 
2823  // Compute 32-bit element indices, (2 * OrigIdx, 2 * OrigIdx + 1).
2824  auto IdxLo = B.buildShl(S32, BaseIdxReg, One);
2825  auto IdxHi = B.buildAdd(S32, IdxLo, One);
2826 
2827  auto InsLo = B.buildInsertVectorElement(Vec32, CastSrc, InsRegs[0], IdxLo);
2828  auto InsHi = B.buildInsertVectorElement(Vec32, InsLo, InsRegs[1], IdxHi);
2829 
2830  const RegisterBank *DstBank =
2831  OpdMapper.getInstrMapping().getOperandMapping(0).BreakDown[0].RegBank;
2832  const RegisterBank *SrcBank =
2833  OpdMapper.getInstrMapping().getOperandMapping(1).BreakDown[0].RegBank;
2834  const RegisterBank *InsSrcBank =
2835  OpdMapper.getInstrMapping().getOperandMapping(2).BreakDown[0].RegBank;
2836 
2837  MRI.setRegBank(InsReg, *InsSrcBank);
2838  MRI.setRegBank(CastSrc.getReg(0), *SrcBank);
2839  MRI.setRegBank(InsLo.getReg(0), *DstBank);
2840  MRI.setRegBank(InsHi.getReg(0), *DstBank);
2841  MRI.setRegBank(One.getReg(0), AMDGPU::SGPRRegBank);
2842  MRI.setRegBank(IdxLo.getReg(0), AMDGPU::SGPRRegBank);
2843  MRI.setRegBank(IdxHi.getReg(0), AMDGPU::SGPRRegBank);
2844 
2845 
2846  SmallSet<Register, 4> OpsToWaterfall;
2847  if (!collectWaterfallOperands(OpsToWaterfall, MI, MRI, { 3 })) {
2848  B.setInsertPt(B.getMBB(), MI);
2849  B.buildBitcast(DstReg, InsHi);
2850  MI.eraseFromParent();
2851  return;
2852  }
2853 
2854  B.setInstr(*Span.begin());
2855  MI.eraseFromParent();
2856 
2857  // Figure out the point after the waterfall loop before mangling the control
2858  // flow.
2859  executeInWaterfallLoop(B, make_range(Span.begin(), Span.end()),
2860  OpsToWaterfall, MRI);
2861 
2862  // The insertion point is now right after the original instruction.
2863  //
2864  // Keep the bitcast to the original vector type out of the loop. Doing this
2865  // saves an extra phi we don't need inside the loop.
2866  B.buildBitcast(DstReg, InsHi);
2867 
2868  // Re-insert the constant offset add inside the waterfall loop.
2869  if (ShouldMoveIndexIntoLoop)
2870  reinsertVectorIndexAdd(B, *IdxLo, 1, ConstOffset);
2871 
2872  return;
2873  }
2874  case AMDGPU::G_AMDGPU_BUFFER_LOAD:
2875  case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
2876  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
2877  case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
2878  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
2879  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
2880  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
2881  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
2882  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
2883  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
2884  case AMDGPU::G_AMDGPU_BUFFER_STORE:
2885  case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
2886  case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
2887  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
2888  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16:
2889  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
2890  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16: {
2891  applyDefaultMapping(OpdMapper);
2892  executeInWaterfallLoop(MI, MRI, {1, 4});
2893  return;
2894  }
2895  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
2896  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
2897  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
2898  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
2899  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
2900  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
2901  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
2902  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
2903  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
2904  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
2905  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
2906  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC: {
2907  applyDefaultMapping(OpdMapper);
2908  executeInWaterfallLoop(MI, MRI, {2, 5});
2909  return;
2910  }
2911  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
2912  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
2913  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
2914  applyDefaultMapping(OpdMapper);
2915  executeInWaterfallLoop(MI, MRI, {2, 5});
2916  return;
2917  }
2918  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
2919  applyDefaultMapping(OpdMapper);
2920  executeInWaterfallLoop(MI, MRI, {3, 6});
2921  return;
2922  }
2923  case AMDGPU::G_AMDGPU_S_BUFFER_LOAD: {
2924  applyMappingSBufferLoad(OpdMapper);
2925  return;
2926  }
2927  case AMDGPU::G_INTRINSIC: {
2928  switch (MI.getIntrinsicID()) {
2929  case Intrinsic::amdgcn_readlane: {
2930  substituteSimpleCopyRegs(OpdMapper, 2);
2931 
2932  assert(OpdMapper.getVRegs(0).empty());
2933  assert(OpdMapper.getVRegs(3).empty());
2934 
2935  // Make sure the index is an SGPR. It doesn't make sense to run this in a
2936  // waterfall loop, so assume it's a uniform value.
2937  constrainOpWithReadfirstlane(MI, MRI, 3); // Index
2938  return;
2939  }
2940  case Intrinsic::amdgcn_writelane: {
2941  assert(OpdMapper.getVRegs(0).empty());
2942  assert(OpdMapper.getVRegs(2).empty());
2943  assert(OpdMapper.getVRegs(3).empty());
2944 
2945  substituteSimpleCopyRegs(OpdMapper, 4); // VGPR input val
2946  constrainOpWithReadfirstlane(MI, MRI, 2); // Source value
2947  constrainOpWithReadfirstlane(MI, MRI, 3); // Index
2948  return;
2949  }
2950  case Intrinsic::amdgcn_interp_p1:
2951  case Intrinsic::amdgcn_interp_p2:
2952  case Intrinsic::amdgcn_interp_mov:
2953  case Intrinsic::amdgcn_interp_p1_f16:
2954  case Intrinsic::amdgcn_interp_p2_f16:
2955  case Intrinsic::amdgcn_lds_param_load: {
2956  applyDefaultMapping(OpdMapper);
2957 
2958  // Readfirstlane for the m0 value, which is always the last operand.
2959  // FIXME: Should this be a waterfall loop instead?
2960  constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
2961  return;
2962  }
2963  case Intrinsic::amdgcn_interp_inreg_p10:
2964  case Intrinsic::amdgcn_interp_inreg_p2:
2965  case Intrinsic::amdgcn_interp_inreg_p10_f16:
2966  case Intrinsic::amdgcn_interp_inreg_p2_f16:
2967  applyDefaultMapping(OpdMapper);
2968  return;
2969  case Intrinsic::amdgcn_permlane16:
2970  case Intrinsic::amdgcn_permlanex16: {
2971  // Doing a waterfall loop over these wouldn't make any sense.
2972  substituteSimpleCopyRegs(OpdMapper, 2);
2973  substituteSimpleCopyRegs(OpdMapper, 3);
2974  constrainOpWithReadfirstlane(MI, MRI, 4);
2975  constrainOpWithReadfirstlane(MI, MRI, 5);
2976  return;
2977  }
2978  case Intrinsic::amdgcn_sbfe:
2979  applyMappingBFE(OpdMapper, true);
2980  return;
2981  case Intrinsic::amdgcn_ubfe:
2982  applyMappingBFE(OpdMapper, false);
2983  return;
2984  case Intrinsic::amdgcn_ballot:
2985  // Use default handling and insert copy to vcc source.
2986  break;
2987  }
2988  break;
2989  }
2990  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
2991  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
2992  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
2993  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
2994  const AMDGPU::RsrcIntrinsic *RSrcIntrin
2995  = AMDGPU::lookupRsrcIntrinsic(MI.getIntrinsicID());
2996  assert(RSrcIntrin && RSrcIntrin->IsImage);
2997  // Non-images can have complications from operands that allow both SGPR
2998  // and VGPR. For now it's too complicated to figure out the final opcode
2999  // to derive the register bank from the MCInstrDesc.
3000  applyMappingImage(MI, OpdMapper, MRI, RSrcIntrin->RsrcArg);
3001  return;
3002  }
3003  case AMDGPU::G_AMDGPU_INTRIN_BVH_INTERSECT_RAY: {
3004  unsigned N = MI.getNumExplicitOperands() - 2;
3005  applyDefaultMapping(OpdMapper);
3006  executeInWaterfallLoop(MI, MRI, { N });
3007  return;
3008  }
3009  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS: {
3010  auto IntrID = MI.getIntrinsicID();
3011  switch (IntrID) {
3012  case Intrinsic::amdgcn_ds_ordered_add:
3013  case Intrinsic::amdgcn_ds_ordered_swap: {
3014  // This is only allowed to execute with 1 lane, so readfirstlane is safe.
3015  assert(OpdMapper.getVRegs(0).empty());
3016  substituteSimpleCopyRegs(OpdMapper, 3);
3017  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3018  return;
3019  }
3020  case Intrinsic::amdgcn_ds_gws_init:
3021  case Intrinsic::amdgcn_ds_gws_barrier:
3022  case Intrinsic::amdgcn_ds_gws_sema_br: {
3023  // Only the first lane executes, so readfirstlane is safe.
3024  substituteSimpleCopyRegs(OpdMapper, 1);
3025  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3026  return;
3027  }
3028  case Intrinsic::amdgcn_ds_gws_sema_v:
3029  case Intrinsic::amdgcn_ds_gws_sema_p:
3030  case Intrinsic::amdgcn_ds_gws_sema_release_all: {
3031  // Only the first lane executes, so readfirstlane is safe.
3032  constrainOpWithReadfirstlane(MI, MRI, 1); // M0
3033  return;
3034  }
3035  case Intrinsic::amdgcn_ds_append:
3036  case Intrinsic::amdgcn_ds_consume: {
3037  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3038  return;
3039  }
3040  case Intrinsic::amdgcn_s_sendmsg:
3041  case Intrinsic::amdgcn_s_sendmsghalt: {
3042  // FIXME: Should this use a waterfall loop?
3043  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3044  return;
3045  }
3046  case Intrinsic::amdgcn_s_setreg: {
3047  constrainOpWithReadfirstlane(MI, MRI, 2);
3048  return;
3049  }
3050  case Intrinsic::amdgcn_raw_buffer_load_lds: {
3051  applyDefaultMapping(OpdMapper);
3052  constrainOpWithReadfirstlane(MI, MRI, 1); // rsrc
3053  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3054  constrainOpWithReadfirstlane(MI, MRI, 5); // soffset
3055  return;
3056  }
3057  case Intrinsic::amdgcn_struct_buffer_load_lds: {
3058  applyDefaultMapping(OpdMapper);
3059  constrainOpWithReadfirstlane(MI, MRI, 1); // rsrc
3060  constrainOpWithReadfirstlane(MI, MRI, 2); // M0
3061  constrainOpWithReadfirstlane(MI, MRI, 6); // soffset
3062  return;
3063  }
3064  case Intrinsic::amdgcn_global_load_lds: {
3065  applyDefaultMapping(OpdMapper);
3066  constrainOpWithReadfirstlane(MI, MRI, 2);
3067  return;
3068  }
3069  case Intrinsic::amdgcn_lds_direct_load: {
3070  applyDefaultMapping(OpdMapper);
3071  // Readfirstlane for the m0 value, which is always the last operand.
3072  constrainOpWithReadfirstlane(MI, MRI, MI.getNumOperands() - 1); // Index
3073  return;
3074  }
3075  case Intrinsic::amdgcn_exp_row:
3076  applyDefaultMapping(OpdMapper);
3077  constrainOpWithReadfirstlane(MI, MRI, 8); // M0
3078  return;
3079  default: {
3080  if (const AMDGPU::RsrcIntrinsic *RSrcIntrin =
3081  AMDGPU::lookupRsrcIntrinsic(IntrID)) {
3082  // Non-images can have complications from operands that allow both SGPR
3083  // and VGPR. For now it's too complicated to figure out the final opcode
3084  // to derive the register bank from the MCInstrDesc.
3085  if (RSrcIntrin->IsImage) {
3086  applyMappingImage(MI, OpdMapper, MRI, RSrcIntrin->RsrcArg);
3087  return;
3088  }
3089  }
3090 
3091  break;
3092  }
3093  }
3094  break;
3095  }
3096  case AMDGPU::G_SI_CALL: {
3097  // Use a set to avoid extra readfirstlanes in the case where multiple
3098  // operands are the same register.
3099  SmallSet<Register, 4> SGPROperandRegs;
3100 
3101  if (!collectWaterfallOperands(SGPROperandRegs, MI, MRI, {1}))
3102  break;
3103 
3104  // Move all copies to physical SGPRs that are used by the call instruction
3105  // into the loop block. Search backwards from the call for these copies,
3106  // stopping at the ADJCALLSTACKUP.
3107  unsigned FrameSetupOpcode = AMDGPU::ADJCALLSTACKUP;
3108  unsigned FrameDestroyOpcode = AMDGPU::ADJCALLSTACKDOWN;
3109 
3110  // Move all non-copies before the copies, so that a complete range can be
3111  // moved into the waterfall loop.
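    // For example, if a non-copy instruction was scheduled between two of the
    // argument copies, it is hoisted above the first copy so that the copies
    // and the call form one contiguous range for the waterfall loop.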
3112  SmallVector<MachineInstr *, 4> NonCopyInstrs;
3113  // Count of NonCopyInstrs found until the current LastCopy.
3114  unsigned NonCopyInstrsLen = 0;
3115  MachineBasicBlock::iterator Start(&MI);
3116  MachineBasicBlock::iterator LastCopy = Start;
3117  MachineBasicBlock *MBB = MI.getParent();
3118  const SIMachineFunctionInfo *Info =
3119  MBB->getParent()->getInfo<SIMachineFunctionInfo>();
3120  while (Start->getOpcode() != FrameSetupOpcode) {
3121  --Start;
3122  bool IsCopy = false;
3123  if (Start->getOpcode() == AMDGPU::COPY) {
3124  auto &Dst = Start->getOperand(0);
3125  if (Dst.isReg()) {
3126  Register Reg = Dst.getReg();
3127  if (Reg.isPhysical() && MI.readsRegister(Reg, TRI)) {
3128  IsCopy = true;
3129  } else {
3130  // Also move the copy from the scratch rsrc descriptor into the loop
3131  // to allow it to be optimized away.
3132  auto &Src = Start->getOperand(1);
3133  if (Src.isReg()) {
3134  Reg = Src.getReg();
3135  IsCopy = Info->getScratchRSrcReg() == Reg;
3136  }
3137  }
3138  }
3139  }
3140 
3141  if (IsCopy) {
3142  LastCopy = Start;
3143  NonCopyInstrsLen = NonCopyInstrs.size();
3144  } else {
3145  NonCopyInstrs.push_back(&*Start);
3146  }
3147  }
3148  NonCopyInstrs.resize(NonCopyInstrsLen);
3149 
3150  for (auto *NonCopy : reverse(NonCopyInstrs)) {
3151  MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3152  }
3153  Start = LastCopy;
3154 
3155  // Do the same for the copies after the call.
3156  NonCopyInstrs.clear();
3157  NonCopyInstrsLen = 0;
3158  MachineBasicBlock::iterator End(&MI);
3159  LastCopy = End;
3160  while (End->getOpcode() != FrameDestroyOpcode) {
3161  ++End;
3162  bool IsCopy = false;
3163  if (End->getOpcode() == AMDGPU::COPY) {
3164  auto &Src = End->getOperand(1);
3165  if (Src.isReg()) {
3166  Register Reg = Src.getReg();
3167  IsCopy = Reg.isPhysical() && MI.modifiesRegister(Reg, TRI);
3168  }
3169  }
3170 
3171  if (IsCopy) {
3172  LastCopy = End;
3173  NonCopyInstrsLen = NonCopyInstrs.size();
3174  } else {
3175  NonCopyInstrs.push_back(&*End);
3176  }
3177  }
3178  NonCopyInstrs.resize(NonCopyInstrsLen);
3179 
3180  End = LastCopy;
3181  ++LastCopy;
3182  for (auto *NonCopy : reverse(NonCopyInstrs)) {
3183  MBB->splice(LastCopy, MBB, NonCopy->getIterator());
3184  }
3185 
3186  ++End;
3187  MachineIRBuilder B(*Start);
3188  executeInWaterfallLoop(B, make_range(Start, End), SGPROperandRegs, MRI);
3189  break;
3190  }
3191  case AMDGPU::G_LOAD:
3192  case AMDGPU::G_ZEXTLOAD:
3193  case AMDGPU::G_SEXTLOAD: {
3194  if (applyMappingLoad(MI, OpdMapper, MRI))
3195  return;
3196  break;
3197  }
3198  case AMDGPU::G_DYN_STACKALLOC:
3199  applyMappingDynStackAlloc(MI, OpdMapper, MRI);
3200  return;
3201  case AMDGPU::G_SBFX:
3202  applyMappingBFE(OpdMapper, /*Signed*/ true);
3203  return;
3204  case AMDGPU::G_UBFX:
3205  applyMappingBFE(OpdMapper, /*Signed*/ false);
3206  return;
3207  case AMDGPU::G_AMDGPU_MAD_U64_U32:
3208  case AMDGPU::G_AMDGPU_MAD_I64_I32:
3209  applyMappingMAD_64_32(OpdMapper);
3210  return;
3211  default:
3212  break;
3213  }
3214 
3215  return applyDefaultMapping(OpdMapper);
3216 }
3217 
3218 // vgpr, sgpr -> vgpr
3219 // vgpr, agpr -> vgpr
3220 // agpr, agpr -> agpr
3221 // agpr, sgpr -> vgpr
3222 static unsigned regBankUnion(unsigned RB0, unsigned RB1) {
3223  if (RB0 == AMDGPU::InvalidRegBankID)
3224  return RB1;
3225  if (RB1 == AMDGPU::InvalidRegBankID)
3226  return RB0;
3227 
3228  if (RB0 == AMDGPU::SGPRRegBankID && RB1 == AMDGPU::SGPRRegBankID)
3229  return AMDGPU::SGPRRegBankID;
3230 
3231  if (RB0 == AMDGPU::AGPRRegBankID && RB1 == AMDGPU::AGPRRegBankID)
3232  return AMDGPU::AGPRRegBankID;
3233 
3234  return AMDGPU::VGPRRegBankID;
3235 }
3236 
3237 static unsigned regBankBoolUnion(unsigned RB0, unsigned RB1) {
3238  if (RB0 == AMDGPU::InvalidRegBankID)
3239  return RB1;
3240  if (RB1 == AMDGPU::InvalidRegBankID)
3241  return RB0;
3242 
3243  // vcc, vcc -> vcc
3244  // vcc, sgpr -> vcc
3245  // vcc, vgpr -> vcc
3246  if (RB0 == AMDGPU::VCCRegBankID || RB1 == AMDGPU::VCCRegBankID)
3247  return AMDGPU::VCCRegBankID;
3248 
3249  // vcc, vgpr -> vgpr
3250  return regBankUnion(RB0, RB1);
3251 }
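// A self-contained sketch of the lattice the two helpers above implement
// (hypothetical names: BankID, bankUnion, boolBankUnion -- not the real
// AMDGPU::*RegBankID constants). SGPR is the "bottom", any mixed combination
// degrades to VGPR, and VCC absorbs everything in the boolean variant:
//
//   #include <cassert>
//
//   enum BankID { Invalid, SGPR, VGPR, AGPR, VCC };
//
//   BankID bankUnion(BankID A, BankID B) {
//     if (A == Invalid) return B;
//     if (B == Invalid) return A;
//     if (A == SGPR && B == SGPR) return SGPR;
//     if (A == AGPR && B == AGPR) return AGPR;
//     return VGPR; // Any mixed combination is forced to VGPR.
//   }
//
//   BankID boolBankUnion(BankID A, BankID B) {
//     if (A == Invalid) return B;
//     if (B == Invalid) return A;
//     if (A == VCC || B == VCC) return VCC; // VCC wins for boolean values.
//     return bankUnion(A, B);
//   }
//
//   int main() {
//     assert(bankUnion(SGPR, SGPR) == SGPR);   // uniform stays scalar
//     assert(bankUnion(SGPR, VGPR) == VGPR);   // divergence is contagious
//     assert(bankUnion(AGPR, SGPR) == VGPR);   // agpr, sgpr -> vgpr
//     assert(boolBankUnion(SGPR, VCC) == VCC); // booleans collapse to vcc
//   }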
3252 
3253 unsigned AMDGPURegisterBankInfo::getMappingType(const MachineRegisterInfo &MRI,
3254  const MachineInstr &MI) const {
3255  unsigned RegBank = AMDGPU::InvalidRegBankID;
3256 
3257  for (const MachineOperand &MO : MI.operands()) {
3258  if (!MO.isReg())
3259  continue;
3260  Register Reg = MO.getReg();
3261  if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3262  RegBank = regBankUnion(RegBank, Bank->getID());
3263  if (RegBank == AMDGPU::VGPRRegBankID)
3264  break;
3265  }
3266  }
3267 
3268  return RegBank;
3269 }
3270 
3271 bool AMDGPURegisterBankInfo::isSALUMapping(const MachineInstr &MI) const {
3272  const MachineFunction &MF = *MI.getParent()->getParent();
3273  const MachineRegisterInfo &MRI = MF.getRegInfo();
3274  for (const MachineOperand &MO : MI.operands()) {
3275  if (!MO.isReg())
3276  continue;
3277  Register Reg = MO.getReg();
3278  if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3279  if (Bank->getID() != AMDGPU::SGPRRegBankID)
3280  return false;
3281  }
3282  }
3283  return true;
3284 }
3285 
3286 const RegisterBankInfo::InstructionMapping &
3287 AMDGPURegisterBankInfo::getDefaultMappingSOP(const MachineInstr &MI) const {
3288  const MachineFunction &MF = *MI.getParent()->getParent();
3289  const MachineRegisterInfo &MRI = MF.getRegInfo();
3290  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3291 
3292  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3293  const MachineOperand &SrcOp = MI.getOperand(i);
3294  if (!SrcOp.isReg())
3295  continue;
3296 
3297  unsigned Size = getSizeInBits(SrcOp.getReg(), MRI, *TRI);
3298  OpdsMapping[i] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3299  }
3300  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3301  MI.getNumOperands());
3302 }
3303 
3304 const RegisterBankInfo::InstructionMapping &
3305 AMDGPURegisterBankInfo::getDefaultMappingVOP(const MachineInstr &MI) const {
3306  const MachineFunction &MF = *MI.getParent()->getParent();
3307  const MachineRegisterInfo &MRI = MF.getRegInfo();
3308  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3309 
3310  // Even though we technically could use SGPRs, this would require knowledge of
3311  // the constant bus restriction. Force all sources to VGPR (except for VCC).
3312  //
3313  // TODO: Unary ops are trivially OK, so accept SGPRs?
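  // For example, when a G_UADDO reaches this path, its s1 carry-out def is
  // assigned the VCC bank by the loop below, while its 32-bit result and
  // sources are assigned VGPR.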
3314  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
3315  const MachineOperand &Src = MI.getOperand(i);
3316  if (!Src.isReg())
3317  continue;
3318 
3319  unsigned Size = getSizeInBits(Src.getReg(), MRI, *TRI);
3320  unsigned BankID = Size == 1 ? AMDGPU::VCCRegBankID : AMDGPU::VGPRRegBankID;
3321  OpdsMapping[i] = AMDGPU::getValueMapping(BankID, Size);
3322  }
3323 
3324  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3325  MI.getNumOperands());
3326 }
3327 
3328 const RegisterBankInfo::InstructionMapping &
3329 AMDGPURegisterBankInfo::getDefaultMappingAllVGPR(const MachineInstr &MI) const {
3330  const MachineFunction &MF = *MI.getParent()->getParent();
3331  const MachineRegisterInfo &MRI = MF.getRegInfo();
3332  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3333 
3334  for (unsigned I = 0, E = MI.getNumOperands(); I != E; ++I) {
3335  const MachineOperand &Op = MI.getOperand(I);
3336  if (!Op.isReg())
3337  continue;
3338 
3339  unsigned Size = getSizeInBits(Op.getReg(), MRI, *TRI);
3340  OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3341  }
3342 
3343  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping),
3344  MI.getNumOperands());
3345 }
3346 
3347 const RegisterBankInfo::InstructionMapping &
3348 AMDGPURegisterBankInfo::getImageMapping(const MachineRegisterInfo &MRI,
3349  const MachineInstr &MI,
3350  int RsrcIdx) const {
3351  // The reported argument index is relative to the IR intrinsic call arguments,
3352  // so we need to shift by the number of defs and the intrinsic ID.
3353  RsrcIdx += MI.getNumExplicitDefs() + 1;
3354 
3355  const int NumOps = MI.getNumOperands();
3356  SmallVector<const ValueMapping *, 8> OpdsMapping(NumOps);
3357 
3358  // TODO: Should packed/unpacked D16 difference be reported here as part of
3359  // the value mapping?
3360  for (int I = 0; I != NumOps; ++I) {
3361  if (!MI.getOperand(I).isReg())
3362  continue;
3363 
3364  Register OpReg = MI.getOperand(I).getReg();
3365  // We replace some dead address operands with $noreg
3366  if (!OpReg)
3367  continue;
3368 
3369  unsigned Size = getSizeInBits(OpReg, MRI, *TRI);
3370 
3371  // FIXME: Probably need a new intrinsic register bank searchable table to
3372  // handle arbitrary intrinsics easily.
3373  //
3374  // If this has a sampler, it immediately follows rsrc.
3375  const bool MustBeSGPR = I == RsrcIdx || I == RsrcIdx + 1;
3376 
3377  if (MustBeSGPR) {
3378  // If this must be an SGPR, we must report whatever it is as legal.
3379  unsigned NewBank = getRegBankID(OpReg, MRI, AMDGPU::SGPRRegBankID);
3380  OpdsMapping[I] = AMDGPU::getValueMapping(NewBank, Size);
3381  } else {
3382  // Some operands must be VGPR, and these are easy to copy to.
3383  OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3384  }
3385  }
3386 
3387  return getInstructionMapping(1, 1, getOperandsMapping(OpdsMapping), NumOps);
3388 }
3389 
3390 /// Return the mapping for a pointer argument.
3391 const RegisterBankInfo::ValueMapping *
3392 AMDGPURegisterBankInfo::getValueMappingForPtr(const MachineRegisterInfo &MRI,
3393  Register PtrReg) const {
3394  LLT PtrTy = MRI.getType(PtrReg);
3395  unsigned Size = PtrTy.getSizeInBits();
3396  if (Subtarget.useFlatForGlobal() ||
3397  !AMDGPU::isFlatGlobalAddrSpace(PtrTy.getAddressSpace()))
3398  return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3399 
3400  // If we're using MUBUF instructions for global memory, an SGPR base register
3401  // is possible. Otherwise this needs to be a VGPR.
3402  const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3403  return AMDGPU::getValueMapping(PtrBank->getID(), Size);
3404 }
3405 
3406 const RegisterBankInfo::InstructionMapping &
3407 AMDGPURegisterBankInfo::getInstrMappingForLoad(const MachineInstr &MI) const {
3408 
3409  const MachineFunction &MF = *MI.getParent()->getParent();
3410  const MachineRegisterInfo &MRI = MF.getRegInfo();
3411  SmallVector<const ValueMapping*, 2> OpdsMapping(2);
3412  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3413  Register PtrReg = MI.getOperand(1).getReg();
3414  LLT PtrTy = MRI.getType(PtrReg);
3415  unsigned AS = PtrTy.getAddressSpace();
3416  unsigned PtrSize = PtrTy.getSizeInBits();
3417 
3418  const ValueMapping *ValMapping;
3419  const ValueMapping *PtrMapping;
3420 
3421  const RegisterBank *PtrBank = getRegBank(PtrReg, MRI, *TRI);
3422 
3423  if (PtrBank == &AMDGPU::SGPRRegBank && AMDGPU::isFlatGlobalAddrSpace(AS)) {
3424  if (isScalarLoadLegal(MI)) {
3425  // We have a uniform instruction so we want to use an SMRD load
3426  ValMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3427  PtrMapping = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, PtrSize);
3428  } else {
3429  ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3430 
3431  // If we're using MUBUF instructions for global memory, an SGPR base
3432  // register is possible. Otherwise this needs to be a VGPR.
3433  unsigned PtrBankID = Subtarget.useFlatForGlobal() ?
3434  AMDGPU::VGPRRegBankID : AMDGPU::SGPRRegBankID;
3435 
3436  PtrMapping = AMDGPU::getValueMapping(PtrBankID, PtrSize);
3437  }
3438  } else {
3439  ValMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3440  PtrMapping = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, PtrSize);
3441  }
3442 
3443  OpdsMapping[0] = ValMapping;
3444  OpdsMapping[1] = PtrMapping;
3445  const RegisterBankInfo::InstructionMapping &Mapping = getInstructionMapping(
3446  1, 1, getOperandsMapping(OpdsMapping), MI.getNumOperands());
3447  return Mapping;
3448 
3449  // FIXME: Do we want to add a mapping for FLAT load, or should we just
3450  // handle that during instruction selection?
3451 }
3452 
3453 unsigned
3454 AMDGPURegisterBankInfo::getRegBankID(Register Reg,
3455  const MachineRegisterInfo &MRI,
3456  unsigned Default) const {
3457  const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3458  return Bank ? Bank->getID() : Default;
3459 }
3460 
3461 const RegisterBankInfo::ValueMapping *
3462 AMDGPURegisterBankInfo::getSGPROpMapping(Register Reg,
3463  const MachineRegisterInfo &MRI,
3464  const TargetRegisterInfo &TRI) const {
3465  // Lie and claim anything is legal, even though this needs to be an SGPR;
3466  // applyMapping will have to deal with it as a waterfall loop.
3467  unsigned Bank = getRegBankID(Reg, MRI, AMDGPU::SGPRRegBankID);
3468  unsigned Size = getSizeInBits(Reg, MRI, TRI);
3469  return AMDGPU::getValueMapping(Bank, Size);
3470 }
3471 
3472 const RegisterBankInfo::ValueMapping *
3473 AMDGPURegisterBankInfo::getVGPROpMapping(Register Reg,
3474  const MachineRegisterInfo &MRI,
3475  const TargetRegisterInfo &TRI) const {
3476  unsigned Size = getSizeInBits(Reg, MRI, TRI);
3477  return AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3478 }
3479 
3480 const RegisterBankInfo::ValueMapping *
3481 AMDGPURegisterBankInfo::getAGPROpMapping(Register Reg,
3482  const MachineRegisterInfo &MRI,
3483  const TargetRegisterInfo &TRI) const {
3484  unsigned Size = getSizeInBits(Reg, MRI, TRI);
3485  return AMDGPU::getValueMapping(AMDGPU::AGPRRegBankID, Size);
3486 }
3487 
3488 ///
3489 /// This function must return a legal mapping, because
3490 /// AMDGPURegisterBankInfo::getInstrAlternativeMappings() is not called
3491 /// in RegBankSelect::Mode::Fast. Any mapping that would cause a
3492 /// VGPR to SGPR copy to be generated is illegal.
3493 ///
3494 // Operands that must be SGPRs must accept potentially divergent VGPRs as
3495 // legal. These will be dealt with in applyMappingImpl.
3496 //
3497 const RegisterBankInfo::InstructionMapping &
3498 AMDGPURegisterBankInfo::getInstrMapping(const MachineInstr &MI) const {
3499  const MachineFunction &MF = *MI.getParent()->getParent();
3500  const MachineRegisterInfo &MRI = MF.getRegInfo();
3501 
3502  if (MI.isCopy() || MI.getOpcode() == AMDGPU::G_FREEZE) {
3503  // The default logic bothers to analyze impossible alternative mappings. We
3504  // want the most straightforward mapping, so just directly handle this.
3505  const RegisterBank *DstBank = getRegBank(MI.getOperand(0).getReg(), MRI,
3506  *TRI);
3507  const RegisterBank *SrcBank = getRegBank(MI.getOperand(1).getReg(), MRI,
3508  *TRI);
3509  assert(SrcBank && "src bank should have been assigned already");
3510  if (!DstBank)
3511  DstBank = SrcBank;
3512 
3513  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3514  if (cannotCopy(*DstBank, *SrcBank, Size))
3515  return getInvalidInstructionMapping();
3516 
3517  const ValueMapping &ValMap = getValueMapping(0, Size, *DstBank);
3518  unsigned OpdsMappingSize = MI.isCopy() ? 1 : 2;
3519  SmallVector<const ValueMapping *, 1> OpdsMapping(OpdsMappingSize);
3520  OpdsMapping[0] = &ValMap;
3521  if (MI.getOpcode() == AMDGPU::G_FREEZE)
3522  OpdsMapping[1] = &ValMap;
3523 
3524  return getInstructionMapping(
3525  1, /*Cost*/ 1,
3526  /*OperandsMapping*/ getOperandsMapping(OpdsMapping), OpdsMappingSize);
3527  }
3528 
3529  if (MI.isRegSequence()) {
3530  // If any input is a VGPR, the result must be a VGPR. The default handling
3531  // assumes any copy between banks is legal.
3532  unsigned BankID = AMDGPU::SGPRRegBankID;
3533 
3534  for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3535  auto OpBank = getRegBankID(MI.getOperand(I).getReg(), MRI);
3536  // It doesn't make sense to use vcc or scc banks here, so just ignore
3537  // them.
3538  if (OpBank != AMDGPU::SGPRRegBankID) {
3539  BankID = AMDGPU::VGPRRegBankID;
3540  break;
3541  }
3542  }
3543  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3544 
3545  const ValueMapping &ValMap = getValueMapping(0, Size, getRegBank(BankID));
3546  return getInstructionMapping(
3547  1, /*Cost*/ 1,
3548  /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3549  }
3550 
3551  // The default handling is broken and doesn't handle illegal VGPR->SGPR copies
3552  // properly.
3553  //
3554  // TODO: There are additional exec masking dependencies to analyze.
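  // For example, a G_PHI whose incoming values live in both the SGPR and the
  // VGPR bank must produce a VGPR result; producing an SGPR result would
  // require an illegal VGPR to SGPR copy on the VGPR edge.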
3555  if (MI.getOpcode() == TargetOpcode::G_PHI) {
3556  unsigned ResultBank = AMDGPU::InvalidRegBankID;
3557  Register DstReg = MI.getOperand(0).getReg();
3558 
3559  // Sometimes the result may have already been assigned a bank.
3560  if (const RegisterBank *DstBank = getRegBank(DstReg, MRI, *TRI))
3561  ResultBank = DstBank->getID();
3562 
3563  for (unsigned I = 1, E = MI.getNumOperands(); I != E; I += 2) {
3564  Register Reg = MI.getOperand(I).getReg();
3565  const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI);
3566 
3567  // FIXME: Assuming VGPR for any undetermined inputs.
3568  if (!Bank || Bank->getID() == AMDGPU::VGPRRegBankID) {
3569  ResultBank = AMDGPU::VGPRRegBankID;
3570  break;
3571  }
3572 
3573  // FIXME: Need to promote SGPR case to s32
3574  unsigned OpBank = Bank->getID();
3575  ResultBank = regBankBoolUnion(ResultBank, OpBank);
3576  }
3577 
3578  assert(ResultBank != AMDGPU::InvalidRegBankID);
3579 
3580  unsigned Size = MRI.getType(DstReg).getSizeInBits();
3581 
3582  const ValueMapping &ValMap =
3583  getValueMapping(0, Size, getRegBank(ResultBank));
3584  return getInstructionMapping(
3585  1, /*Cost*/ 1,
3586  /*OperandsMapping*/ getOperandsMapping({&ValMap}), 1);
3587  }
3588 
3589  const RegisterBankInfo::InstructionMapping &Mapping = getInstrMappingImpl(MI);
3590  if (Mapping.isValid())
3591  return Mapping;
3592 
3593  SmallVector<const ValueMapping*, 8> OpdsMapping(MI.getNumOperands());
3594 
3595  switch (MI.getOpcode()) {
3596  default:
3597  return getInvalidInstructionMapping();
3598 
3599  case AMDGPU::G_AND:
3600  case AMDGPU::G_OR:
3601  case AMDGPU::G_XOR: {
3602  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3603  if (Size == 1) {
3604  const RegisterBank *DstBank
3605  = getRegBank(MI.getOperand(0).getReg(), MRI, *TRI);
3606 
3607  unsigned TargetBankID = AMDGPU::InvalidRegBankID;
3608  unsigned BankLHS = AMDGPU::InvalidRegBankID;
3609  unsigned BankRHS = AMDGPU::InvalidRegBankID;
3610  if (DstBank) {
3611  TargetBankID = DstBank->getID();
3612  if (DstBank == &AMDGPU::VCCRegBank) {
3613  TargetBankID = AMDGPU::VCCRegBankID;
3614  BankLHS = AMDGPU::VCCRegBankID;
3615  BankRHS = AMDGPU::VCCRegBankID;
3616  } else {
3617  BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3618  AMDGPU::SGPRRegBankID);
3619  BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3620  AMDGPU::SGPRRegBankID);
3621  }
3622  } else {
3623  BankLHS = getRegBankID(MI.getOperand(1).getReg(), MRI,
3624  AMDGPU::VCCRegBankID);
3625  BankRHS = getRegBankID(MI.getOperand(2).getReg(), MRI,
3626  AMDGPU::VCCRegBankID);
3627 
3628  // Both inputs should be true booleans to produce a boolean result.
3629  if (BankLHS == AMDGPU::VGPRRegBankID || BankRHS == AMDGPU::VGPRRegBankID) {
3630  TargetBankID = AMDGPU::VGPRRegBankID;
3631  } else if (BankLHS == AMDGPU::VCCRegBankID || BankRHS == AMDGPU::VCCRegBankID) {
3632  TargetBankID = AMDGPU::VCCRegBankID;
3633  BankLHS = AMDGPU::VCCRegBankID;
3634  BankRHS = AMDGPU::VCCRegBankID;
3635  } else if (BankLHS == AMDGPU::SGPRRegBankID && BankRHS == AMDGPU::SGPRRegBankID) {
3636  TargetBankID = AMDGPU::SGPRRegBankID;
3637  }
3638  }
3639 
3640  OpdsMapping[0] = AMDGPU::getValueMapping(TargetBankID, Size);
3641  OpdsMapping[1] = AMDGPU::getValueMapping(BankLHS, Size);
3642  OpdsMapping[2] = AMDGPU::getValueMapping(BankRHS, Size);
3643  break;
3644  }
3645 
3646  if (Size == 64) {
3647 
3648  if (isSALUMapping(MI)) {
3649  OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::SGPRRegBankID, Size);
3650  OpdsMapping[1] = OpdsMapping[2] = OpdsMapping[0];
3651  } else {
3652  OpdsMapping[0] = getValueMappingSGPR64Only(AMDGPU::VGPRRegBankID, Size);
3653  unsigned Bank1 = getRegBankID(MI.getOperand(1).getReg(), MRI /*, DefaultBankID*/);
3654  OpdsMapping[1] = AMDGPU::getValueMapping(Bank1, Size);
3655 
3656  unsigned Bank2 = getRegBankID(MI.getOperand(2).getReg(), MRI /*, DefaultBankID*/);
3657  OpdsMapping[2] = AMDGPU::getValueMapping(Bank2, Size);
3658  }
3659 
3660  break;
3661  }
3662 
3663  [[fallthrough]];
3664  }
3665  case AMDGPU::G_PTR_ADD:
3666  case AMDGPU::G_PTRMASK:
3667  case AMDGPU::G_ADD:
3668  case AMDGPU::G_SUB:
3669  case AMDGPU::G_MUL:
3670  case AMDGPU::G_SHL:
3671  case AMDGPU::G_LSHR:
3672  case AMDGPU::G_ASHR:
3673  case AMDGPU::G_UADDO:
3674  case AMDGPU::G_USUBO:
3675  case AMDGPU::G_UADDE:
3676  case AMDGPU::G_SADDE:
3677  case AMDGPU::G_USUBE:
3678  case AMDGPU::G_SSUBE:
3679  case AMDGPU::G_SMIN:
3680  case AMDGPU::G_SMAX:
3681  case AMDGPU::G_UMIN:
3682  case AMDGPU::G_UMAX:
3683  case AMDGPU::G_ABS:
3684  case AMDGPU::G_SHUFFLE_VECTOR:
3685  case AMDGPU::G_SBFX:
3686  case AMDGPU::G_UBFX:
3687  if (isSALUMapping(MI))
3688  return getDefaultMappingSOP(MI);
3689  [[fallthrough]];
3690 
3691  case AMDGPU::G_SADDSAT: // FIXME: Could lower sat ops for SALU
3692  case AMDGPU::G_SSUBSAT:
3693  case AMDGPU::G_UADDSAT:
3694  case AMDGPU::G_USUBSAT:
3695  case AMDGPU::G_FADD:
3696  case AMDGPU::G_FSUB:
3697  case AMDGPU::G_FPTOSI:
3698  case AMDGPU::G_FPTOUI:
3699  case AMDGPU::G_FMUL:
3700  case AMDGPU::G_FMA:
3701  case AMDGPU::G_FMAD:
3702  case AMDGPU::G_FSQRT:
3703  case AMDGPU::G_FFLOOR:
3704  case AMDGPU::G_FCEIL:
3705  case AMDGPU::G_FRINT:
3706  case AMDGPU::G_SITOFP:
3707  case AMDGPU::G_UITOFP:
3708  case AMDGPU::G_FPTRUNC:
3709  case AMDGPU::G_FPEXT:
3710  case AMDGPU::G_FEXP2:
3711  case AMDGPU::G_FLOG2:
3712  case AMDGPU::G_FMINNUM:
3713  case AMDGPU::G_FMAXNUM:
3714  case AMDGPU::G_FMINNUM_IEEE:
3715  case AMDGPU::G_FMAXNUM_IEEE:
3716  case AMDGPU::G_FCANONICALIZE:
3717  case AMDGPU::G_INTRINSIC_TRUNC:
3718  case AMDGPU::G_STRICT_FADD:
3719  case AMDGPU::G_STRICT_FSUB:
3720  case AMDGPU::G_STRICT_FMUL:
3721  case AMDGPU::G_STRICT_FMA:
3722  case AMDGPU::G_BSWAP: // TODO: Somehow expand for scalar?
3723  case AMDGPU::G_FSHR: // TODO: Expand for scalar
3724  case AMDGPU::G_AMDGPU_FMIN_LEGACY:
3725  case AMDGPU::G_AMDGPU_FMAX_LEGACY:
3726  case AMDGPU::G_AMDGPU_RCP_IFLAG:
3727  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE0:
3728  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE1:
3729  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE2:
3730  case AMDGPU::G_AMDGPU_CVT_F32_UBYTE3:
3731  case AMDGPU::G_AMDGPU_CVT_PK_I16_I32:
3732  case AMDGPU::G_AMDGPU_SMED3:
3733  return getDefaultMappingVOP(MI);
3734  case AMDGPU::G_UMULH:
3735  case AMDGPU::G_SMULH: {
3736  if (Subtarget.hasScalarMulHiInsts() && isSALUMapping(MI))
3737  return getDefaultMappingSOP(MI);
3738  return getDefaultMappingVOP(MI);
3739  }
3740  case AMDGPU::G_AMDGPU_MAD_U64_U32:
3741  case AMDGPU::G_AMDGPU_MAD_I64_I32: {
3742  // Three possible mappings:
3743  //
3744  // - Default SOP
3745  // - Default VOP
3746  // - Scalar multiply: src0 and src1 are SGPRs, the rest is VOP.
3747  //
3748  // This allows instruction selection to keep the multiplication part of the
3749  // instruction on the SALU.
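  // Operand layout for reference: [0] = s64 result, [1] = s1 carry def,
  // [2] and [3] = s32 multiply sources, [4] = s64 addend. The loop below
  // therefore only gives up on the scalar-multiply mapping when operand 2 or 3
  // is not an SGPR.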
3750  bool AllSalu = true;
3751  bool MulSalu = true;
3752  for (unsigned i = 0; i < 5; ++i) {
3753  Register Reg = MI.getOperand(i).getReg();
3754  if (const RegisterBank *Bank = getRegBank(Reg, MRI, *TRI)) {
3755  if (Bank->getID() != AMDGPU::SGPRRegBankID) {
3756  AllSalu = false;
3757  if (i == 2 || i == 3) {
3758  MulSalu = false;
3759  break;
3760  }
3761  }
3762  }
3763  }
3764 
3765  if (AllSalu)
3766  return getDefaultMappingSOP(MI);
3767 
3768  // If the multiply-add is full-rate in VALU, use that even if the
3769  // multiplication part is scalar. Accumulating separately on the VALU would
3770  // take two instructions.
3771  if (!MulSalu || Subtarget.hasFullRate64Ops())
3772  return getDefaultMappingVOP(MI);
3773 
3774  // Keep the multiplication on the SALU, then accumulate on the VALU.
3775  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
3776  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
3777  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
3778  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
3779  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 64);
3780  break;
3781  }
3782  case AMDGPU::G_IMPLICIT_DEF: {
3783  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3784  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3785  break;
3786  }
3787  case AMDGPU::G_FCONSTANT:
3788  case AMDGPU::G_CONSTANT:
3789  case AMDGPU::G_GLOBAL_VALUE:
3790  case AMDGPU::G_BLOCK_ADDR:
3791  case AMDGPU::G_READCYCLECOUNTER: {
3792  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3793  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
3794  break;
3795  }
3796  case AMDGPU::G_FRAME_INDEX: {
3797  // TODO: This should be the same as other constants, but eliminateFrameIndex
3798  // currently assumes VALU uses.
3799  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3800  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3801  break;
3802  }
3803  case AMDGPU::G_DYN_STACKALLOC: {
3804  // Result is always uniform, and a wave reduction is needed for the source.
3805  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
3806  unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3807  OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, 32);
3808  break;
3809  }
3810  case AMDGPU::G_AMDGPU_WAVE_ADDRESS: {
3811  // This case is weird because we expect a physical register in the source,
3812  // but need to set a bank anyway.
3813  //
3814  // We could select the result to SGPR or VGPR, but for the one current use
3815  // it's more practical to always use VGPR.
3816  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
3817  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 32);
3818  break;
3819  }
3820  case AMDGPU::G_INSERT: {
3821  unsigned BankID = getMappingType(MRI, MI);
3822  unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3823  unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
3824  unsigned EltSize = getSizeInBits(MI.getOperand(2).getReg(), MRI, *TRI);
3825  OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
3826  OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
3827  OpdsMapping[2] = AMDGPU::getValueMapping(BankID, EltSize);
3828  OpdsMapping[3] = nullptr;
3829  break;
3830  }
3831  case AMDGPU::G_EXTRACT: {
3832  unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3833  unsigned DstSize = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
3834  unsigned SrcSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
3835  OpdsMapping[0] = AMDGPU::getValueMapping(BankID, DstSize);
3836  OpdsMapping[1] = AMDGPU::getValueMapping(BankID, SrcSize);
3837  OpdsMapping[2] = nullptr;
3838  break;
3839  }
3840  case AMDGPU::G_BUILD_VECTOR:
3841  case AMDGPU::G_BUILD_VECTOR_TRUNC: {
3842  LLT DstTy = MRI.getType(MI.getOperand(0).getReg());
3843  if (DstTy == LLT::fixed_vector(2, 16)) {
3844  unsigned DstSize = DstTy.getSizeInBits();
3845  unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
3846  unsigned Src0BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3847  unsigned Src1BankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
3848  unsigned DstBankID = regBankUnion(Src0BankID, Src1BankID);
3849 
3850  OpdsMapping[0] = AMDGPU::getValueMapping(DstBankID, DstSize);
3851  OpdsMapping[1] = AMDGPU::getValueMapping(Src0BankID, SrcSize);
3852  OpdsMapping[2] = AMDGPU::getValueMapping(Src1BankID, SrcSize);
3853  break;
3854  }
3855 
3856  [[fallthrough]];
3857  }
3858  case AMDGPU::G_MERGE_VALUES:
3859  case AMDGPU::G_CONCAT_VECTORS: {
3860  unsigned Bank = getMappingType(MRI, MI);
3861  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3862  unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
3863 
3864  OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
3865  // Op1 and Dst should use the same register bank.
3866  for (unsigned i = 1, e = MI.getNumOperands(); i != e; ++i)
3867  OpdsMapping[i] = AMDGPU::getValueMapping(Bank, SrcSize);
3868  break;
3869  }
3870  case AMDGPU::G_BITREVERSE:
3871  case AMDGPU::G_BITCAST:
3872  case AMDGPU::G_INTTOPTR:
3873  case AMDGPU::G_PTRTOINT:
3874  case AMDGPU::G_FABS:
3875  case AMDGPU::G_FNEG: {
3876  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3877  unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3878  OpdsMapping[0] = OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
3879  break;
3880  }
3881  case AMDGPU::G_AMDGPU_FFBH_U32:
3882  case AMDGPU::G_AMDGPU_FFBL_B32:
3883  case AMDGPU::G_CTLZ_ZERO_UNDEF:
3884  case AMDGPU::G_CTTZ_ZERO_UNDEF: {
3885  unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
3886  unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3887  OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
3888  OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(BankID, Size);
3889  break;
3890  }
3891  case AMDGPU::G_CTPOP: {
3892  unsigned Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
3893  unsigned BankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
3894  OpdsMapping[0] = AMDGPU::getValueMapping(BankID, 32);
3895 
3896  // This should really be getValueMappingSGPR64Only, but allowing the generic
3897  // code to handle the register split just makes using LegalizerHelper more
3898  // difficult.
3899  OpdsMapping[1] = AMDGPU::getValueMapping(BankID, Size);
3900  break;
3901  }
3902  case AMDGPU::G_TRUNC: {
3903  Register Dst = MI.getOperand(0).getReg();
3904  Register Src = MI.getOperand(1).getReg();
3905  unsigned Bank = getRegBankID(Src, MRI);
3906  unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
3907  unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
3908  OpdsMapping[0] = AMDGPU::getValueMapping(Bank, DstSize);
3909  OpdsMapping[1] = AMDGPU::getValueMapping(Bank, SrcSize);
3910  break;
3911  }
3912  case AMDGPU::G_ZEXT:
3913  case AMDGPU::G_SEXT:
3914  case AMDGPU::G_ANYEXT:
3915  case AMDGPU::G_SEXT_INREG: {
3916  Register Dst = MI.getOperand(0).getReg();
3917  Register Src = MI.getOperand(1).getReg();
3918  unsigned DstSize = getSizeInBits(Dst, MRI, *TRI);
3919  unsigned SrcSize = getSizeInBits(Src, MRI, *TRI);
3920 
3921  unsigned DstBank;
3922  const RegisterBank *SrcBank = getRegBank(Src, MRI, *TRI);
3923  assert(SrcBank);
3924  switch (SrcBank->getID()) {
3925  case AMDGPU::SGPRRegBankID:
3926  DstBank = AMDGPU::SGPRRegBankID;
3927  break;
3928  default:
3929  DstBank = AMDGPU::VGPRRegBankID;
3930  break;
3931  }
3932 
3933  // Scalar extend can use 64-bit BFE, but VGPRs require extending to
3934  // 32-bits, and then to 64.
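  // For example, s64 = G_SEXT s32 with SGPR operands can select a single
  // 64-bit scalar bitfield extract, while the VGPR form must first extend to
  // 32 bits and then build the 64-bit value from two 32-bit halves, which is
  // why the SGPR64-only value mappings are used below.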
3935  OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(DstBank, DstSize);
3936  OpdsMapping[1] = AMDGPU::getValueMappingSGPR64Only(SrcBank->getID(),
3937  SrcSize);
3938  break;
3939  }
3940  case AMDGPU::G_FCMP: {
3941  unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
3942  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
3943  OpdsMapping[1] = nullptr; // Predicate Operand.
3944  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3945  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3946  break;
3947  }
3948  case AMDGPU::G_IS_FPCLASS: {
3949  Register SrcReg = MI.getOperand(1).getReg();
3950  unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
3951  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3952  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
3953  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
3954  break;
3955  }
3956  case AMDGPU::G_STORE: {
3957  assert(MI.getOperand(0).isReg());
3958  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
3959 
3960  // FIXME: We need to specify a different reg bank once scalar stores are
3961  // supported.
3962  const ValueMapping *ValMapping =
3963  AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
3964  OpdsMapping[0] = ValMapping;
3965  OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
3966  break;
3967  }
3968  case AMDGPU::G_ICMP: {
3969  auto Pred = static_cast<CmpInst::Predicate>(MI.getOperand(1).getPredicate());
3970  unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
3971 
3972  // See if the result register has already been constrained to vcc, which may
3973  // happen due to control flow intrinsic lowering.
3974  unsigned DstBank = getRegBankID(MI.getOperand(0).getReg(), MRI,
3975  AMDGPU::SGPRRegBankID);
3976  unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI);
3977  unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI);
3978 
3979  bool CanUseSCC = DstBank == AMDGPU::SGPRRegBankID &&
3980  Op2Bank == AMDGPU::SGPRRegBankID &&
3981  Op3Bank == AMDGPU::SGPRRegBankID &&
3982  (Size == 32 || (Size == 64 &&
3983  (Pred == CmpInst::ICMP_EQ || Pred == CmpInst::ICMP_NE) &&
3984  Subtarget.hasScalarCompareEq64()));
3985 
3986  DstBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
3987  unsigned SrcBank = CanUseSCC ? AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
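  // E.g. a 32-bit equality compare of two uniform (SGPR) values whose result
  // has not already been constrained to vcc can use a scalar compare writing
  // SCC and stays entirely in the SGPR bank; if any input is divergent, the
  // compare must be a VALU compare writing the VCC bank.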
3988 
3989  // TODO: Use 32-bit for scalar output size.
3990  // SCC results will need to be copied to a 32-bit SGPR virtual register.
3991  const unsigned ResultSize = 1;
3992 
3993  OpdsMapping[0] = AMDGPU::getValueMapping(DstBank, ResultSize);
3994  OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, Size);
3995  OpdsMapping[3] = AMDGPU::getValueMapping(SrcBank, Size);
3996  break;
3997  }
3998  case AMDGPU::G_EXTRACT_VECTOR_ELT: {
3999  // A VGPR index can be used for a waterfall loop when indexing an SGPR vector.
4000  unsigned SrcBankID = getRegBankID(MI.getOperand(1).getReg(), MRI);
4001  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4002  unsigned SrcSize = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4003  unsigned IdxSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4004  unsigned IdxBank = getRegBankID(MI.getOperand(2).getReg(), MRI);
4005  unsigned OutputBankID = regBankUnion(SrcBankID, IdxBank);
4006 
4007  OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(OutputBankID, DstSize);
4008  OpdsMapping[1] = AMDGPU::getValueMapping(SrcBankID, SrcSize);
4009 
4010  // The index can be in either bank if the source vector is a VGPR.
4011  OpdsMapping[2] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4012  break;
4013  }
4014  case AMDGPU::G_INSERT_VECTOR_ELT: {
4015  unsigned OutputBankID = isSALUMapping(MI) ?
4016  AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4017 
4018  unsigned VecSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4019  unsigned InsertSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4020  unsigned IdxSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4021  unsigned InsertEltBankID = getRegBankID(MI.getOperand(2).getReg(), MRI);
4022  unsigned IdxBankID = getRegBankID(MI.getOperand(3).getReg(), MRI);
4023 
4024  OpdsMapping[0] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4025  OpdsMapping[1] = AMDGPU::getValueMapping(OutputBankID, VecSize);
4026 
4027  // This is a weird case, because we need to break down the mapping based on
4028  // the register bank of a different operand.
4029  if (InsertSize == 64 && OutputBankID == AMDGPU::VGPRRegBankID) {
4030  OpdsMapping[2] = AMDGPU::getValueMappingSplit64(InsertEltBankID,
4031  InsertSize);
4032  } else {
4033  assert(InsertSize == 32 || InsertSize == 64);
4034  OpdsMapping[2] = AMDGPU::getValueMapping(InsertEltBankID, InsertSize);
4035  }
4036 
4037  // The index can be in either bank if the source vector is a VGPR.
4038  OpdsMapping[3] = AMDGPU::getValueMapping(IdxBankID, IdxSize);
4039  break;
4040  }
4041  case AMDGPU::G_UNMERGE_VALUES: {
4042  unsigned Bank = getMappingType(MRI, MI);
4043 
4044  // Op1 and Dst should use the same register bank.
4045  // FIXME: Shouldn't this be the default? Why do we need to handle this?
4046  for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
4047  unsigned Size = getSizeInBits(MI.getOperand(i).getReg(), MRI, *TRI);
4048  OpdsMapping[i] = AMDGPU::getValueMapping(Bank, Size);
4049  }
4050  break;
4051  }
4052  case AMDGPU::G_AMDGPU_BUFFER_LOAD:
4053  case AMDGPU::G_AMDGPU_BUFFER_LOAD_UBYTE:
4054  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SBYTE:
4055  case AMDGPU::G_AMDGPU_BUFFER_LOAD_USHORT:
4056  case AMDGPU::G_AMDGPU_BUFFER_LOAD_SSHORT:
4057  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT:
4058  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_TFE:
4059  case AMDGPU::G_AMDGPU_BUFFER_LOAD_FORMAT_D16:
4060  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT:
4061  case AMDGPU::G_AMDGPU_TBUFFER_LOAD_FORMAT_D16:
4062  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT:
4063  case AMDGPU::G_AMDGPU_TBUFFER_STORE_FORMAT_D16:
4064  case AMDGPU::G_AMDGPU_BUFFER_STORE:
4065  case AMDGPU::G_AMDGPU_BUFFER_STORE_BYTE:
4066  case AMDGPU::G_AMDGPU_BUFFER_STORE_SHORT:
4067  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT:
4068  case AMDGPU::G_AMDGPU_BUFFER_STORE_FORMAT_D16: {
4069  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4070 
4071  // rsrc
4072  OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4073 
4074  // vindex
4075  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4076 
4077  // voffset
4078  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4079 
4080  // soffset
4081  OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4082 
4083  // Any remaining operands are immediates and were correctly null
4084  // initialized.
4085  break;
4086  }
4087  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SWAP:
4088  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_ADD:
4089  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SUB:
4090  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMIN:
4091  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMIN:
4092  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_SMAX:
4093  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_UMAX:
4094  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_AND:
4095  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_OR:
4096  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_XOR:
4097  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_INC:
4098  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_DEC:
4099  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FADD:
4100  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMIN:
4101  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_FMAX: {
4102  // vdata_out
4103  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4104 
4105  // vdata_in
4106  OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4107 
4108  // rsrc
4109  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4110 
4111  // vindex
4112  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4113 
4114  // voffset
4115  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4116 
4117  // soffset
4118  OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4119 
4120  // Any remaining operands are immediates and were correctly null
4121  // initialized.
4122  break;
4123  }
4124  case AMDGPU::G_AMDGPU_BUFFER_ATOMIC_CMPSWAP: {
4125  // vdata_out
4126  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4127 
4128  // vdata_in
4129  OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4130 
4131  // cmp
4132  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4133 
4134  // rsrc
4135  OpdsMapping[3] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4136 
4137  // vindex
4138  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4139 
4140  // voffset
4141  OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4142 
4143  // soffset
4144  OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4145 
4146  // Any remaining operands are immediates and were correctly null
4147  // initialized.
4148  break;
4149  }
4150  case AMDGPU::G_AMDGPU_S_BUFFER_LOAD: {
4151  // Lie and claim everything is legal, even though some need to be
4152  // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4153  OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4154  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4155 
4156  // We need to convert this to a MUBUF if either the resource or the offset
4157  // is a VGPR.
4158  unsigned RSrcBank = OpdsMapping[1]->BreakDown[0].RegBank->getID();
4159  unsigned OffsetBank = OpdsMapping[2]->BreakDown[0].RegBank->getID();
4160  unsigned ResultBank = regBankUnion(RSrcBank, OffsetBank);
4161 
4162  unsigned Size0 = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4163  OpdsMapping[0] = AMDGPU::getValueMapping(ResultBank, Size0);
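  // E.g. if the offset operand was assigned the VGPR bank, the result bank
  // computed above becomes VGPR, and applyMapping later has to rewrite this
  // into buffer loads executed inside a waterfall loop, as noted above.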
4164  break;
4165  }
4166  case AMDGPU::G_INTRINSIC: {
4167  switch (MI.getIntrinsicID()) {
4168  default:
4169  return getInvalidInstructionMapping();
4170  case Intrinsic::amdgcn_div_fmas:
4171  case Intrinsic::amdgcn_div_fixup:
4172  case Intrinsic::amdgcn_trig_preop:
4173  case Intrinsic::amdgcn_sin:
4174  case Intrinsic::amdgcn_cos:
4175  case Intrinsic::amdgcn_log_clamp:
4176  case Intrinsic::amdgcn_rcp:
4177  case Intrinsic::amdgcn_rcp_legacy:
4178  case Intrinsic::amdgcn_sqrt:
4179  case Intrinsic::amdgcn_rsq:
4180  case Intrinsic::amdgcn_rsq_legacy:
4181  case Intrinsic::amdgcn_rsq_clamp:
4182  case Intrinsic::amdgcn_fmul_legacy:
4183  case Intrinsic::amdgcn_fma_legacy:
4184  case Intrinsic::amdgcn_ldexp:
4185  case Intrinsic::amdgcn_frexp_mant:
4186  case Intrinsic::amdgcn_frexp_exp:
4187  case Intrinsic::amdgcn_fract:
4188  case Intrinsic::amdgcn_cvt_pkrtz:
4189  case Intrinsic::amdgcn_cvt_pknorm_i16:
4190  case Intrinsic::amdgcn_cvt_pknorm_u16:
4191  case Intrinsic::amdgcn_cvt_pk_i16:
4192  case Intrinsic::amdgcn_cvt_pk_u16:
4193  case Intrinsic::amdgcn_fmed3:
4194  case Intrinsic::amdgcn_cubeid:
4195  case Intrinsic::amdgcn_cubema:
4196  case Intrinsic::amdgcn_cubesc:
4197  case Intrinsic::amdgcn_cubetc:
4198  case Intrinsic::amdgcn_sffbh:
4199  case Intrinsic::amdgcn_fmad_ftz:
4200  case Intrinsic::amdgcn_mbcnt_lo:
4201  case Intrinsic::amdgcn_mbcnt_hi:
4202  case Intrinsic::amdgcn_mul_u24:
4203  case Intrinsic::amdgcn_mul_i24:
4204  case Intrinsic::amdgcn_mulhi_u24:
4205  case Intrinsic::amdgcn_mulhi_i24:
4206  case Intrinsic::amdgcn_lerp:
4207  case Intrinsic::amdgcn_sad_u8:
4208  case Intrinsic::amdgcn_msad_u8:
4209  case Intrinsic::amdgcn_sad_hi_u8:
4210  case Intrinsic::amdgcn_sad_u16:
4211  case Intrinsic::amdgcn_qsad_pk_u16_u8:
4212  case Intrinsic::amdgcn_mqsad_pk_u16_u8:
4213  case Intrinsic::amdgcn_mqsad_u32_u8:
4214  case Intrinsic::amdgcn_cvt_pk_u8_f32:
4215  case Intrinsic::amdgcn_alignbyte:
4216  case Intrinsic::amdgcn_perm:
4217  case Intrinsic::amdgcn_fdot2:
4218  case Intrinsic::amdgcn_sdot2:
4219  case Intrinsic::amdgcn_udot2:
4220  case Intrinsic::amdgcn_sdot4:
4221  case Intrinsic::amdgcn_udot4:
4222  case Intrinsic::amdgcn_sdot8:
4223  case Intrinsic::amdgcn_udot8:
4224  case Intrinsic::amdgcn_fdot2_bf16_bf16:
4225  case Intrinsic::amdgcn_fdot2_f16_f16:
4226  case Intrinsic::amdgcn_fdot2_f32_bf16:
4227  case Intrinsic::amdgcn_sudot4:
4228  case Intrinsic::amdgcn_sudot8:
4229  case Intrinsic::amdgcn_wmma_bf16_16x16x16_bf16:
4230  case Intrinsic::amdgcn_wmma_f16_16x16x16_f16:
4231  case Intrinsic::amdgcn_wmma_f32_16x16x16_bf16:
4232  case Intrinsic::amdgcn_wmma_f32_16x16x16_f16:
4233  case Intrinsic::amdgcn_wmma_i32_16x16x16_iu4:
4234  case Intrinsic::amdgcn_wmma_i32_16x16x16_iu8:
4235  return getDefaultMappingVOP(MI);
4236  case Intrinsic::amdgcn_sbfe:
4237  case Intrinsic::amdgcn_ubfe:
4238  if (isSALUMapping(MI))
4239  return getDefaultMappingSOP(MI);
4240  return getDefaultMappingVOP(MI);
4241  case Intrinsic::amdgcn_ds_swizzle:
4242  case Intrinsic::amdgcn_ds_permute:
4243  case Intrinsic::amdgcn_ds_bpermute:
4244  case Intrinsic::amdgcn_update_dpp:
4245  case Intrinsic::amdgcn_mov_dpp8:
4246  case Intrinsic::amdgcn_mov_dpp:
4247  case Intrinsic::amdgcn_strict_wwm:
4248  case Intrinsic::amdgcn_wwm:
4249  case Intrinsic::amdgcn_strict_wqm:
4250  case Intrinsic::amdgcn_wqm:
4251  case Intrinsic::amdgcn_softwqm:
4252  case Intrinsic::amdgcn_set_inactive:
4253  case Intrinsic::amdgcn_permlane64:
4254  return getDefaultMappingAllVGPR(MI);
4255  case Intrinsic::amdgcn_kernarg_segment_ptr:
4256  case Intrinsic::amdgcn_s_getpc:
4257  case Intrinsic::amdgcn_groupstaticsize:
4258  case Intrinsic::amdgcn_reloc_constant:
4259  case Intrinsic::returnaddress: {
4260  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4261  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4262  break;
4263  }
4264  case Intrinsic::amdgcn_wqm_vote: {
4265  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4266  OpdsMapping[0] = OpdsMapping[2]
4267  = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Size);
4268  break;
4269  }
4270  case Intrinsic::amdgcn_ps_live: {
4271  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4272  break;
4273  }
4274  case Intrinsic::amdgcn_div_scale: {
4275  unsigned Dst0Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4276  unsigned Dst1Size = MRI.getType(MI.getOperand(1).getReg()).getSizeInBits();
4277  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Dst0Size);
4278  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, Dst1Size);
4279 
4280  unsigned SrcSize = MRI.getType(MI.getOperand(3).getReg()).getSizeInBits();
4281  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4282  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4283  break;
4284  }
4285  case Intrinsic::amdgcn_class: {
4286  Register Src0Reg = MI.getOperand(2).getReg();
4287  Register Src1Reg = MI.getOperand(3).getReg();
4288  unsigned Src0Size = MRI.getType(Src0Reg).getSizeInBits();
4289  unsigned Src1Size = MRI.getType(Src1Reg).getSizeInBits();
4290  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4291  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, DstSize);
4292  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src0Size);
4293  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Src1Size);
4294  break;
4295  }
4296  case Intrinsic::amdgcn_icmp:
4297  case Intrinsic::amdgcn_fcmp: {
4298  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4299  // This is not VCCRegBank because this is not used in boolean contexts.
4300  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4301  unsigned OpSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4302  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4303  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, OpSize);
4304  break;
4305  }
4306  case Intrinsic::amdgcn_readlane: {
4307  // This must be an SGPR, but accept a VGPR.
4308  Register IdxReg = MI.getOperand(3).getReg();
4309  unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4310  unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4311  OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4312  [[fallthrough]];
4313  }
4314  case Intrinsic::amdgcn_readfirstlane: {
4315  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4316  unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4317  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4318  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4319  break;
4320  }
4321  case Intrinsic::amdgcn_writelane: {
4322  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4323  Register SrcReg = MI.getOperand(2).getReg();
4324  unsigned SrcSize = MRI.getType(SrcReg).getSizeInBits();
4325  unsigned SrcBank = getRegBankID(SrcReg, MRI, AMDGPU::SGPRRegBankID);
4326  Register IdxReg = MI.getOperand(3).getReg();
4327  unsigned IdxSize = MRI.getType(IdxReg).getSizeInBits();
4328  unsigned IdxBank = getRegBankID(IdxReg, MRI, AMDGPU::SGPRRegBankID);
4329  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4330 
4331  // These 2 must be SGPRs, but accept VGPRs. Readfirstlane will be inserted
4332  // to legalize.
4333  OpdsMapping[2] = AMDGPU::getValueMapping(SrcBank, SrcSize);
4334  OpdsMapping[3] = AMDGPU::getValueMapping(IdxBank, IdxSize);
4335  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, SrcSize);
4336  break;
4337  }
4338  case Intrinsic::amdgcn_if_break: {
4339  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4340  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4341  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4342  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4343  break;
4344  }
4345  case Intrinsic::amdgcn_permlane16:
4346  case Intrinsic::amdgcn_permlanex16: {
4347  unsigned Size = getSizeInBits(MI.getOperand(0).getReg(), MRI, *TRI);
4348  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4349  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4350  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4351  OpdsMapping[4] = getSGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4352  OpdsMapping[5] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4353  break;
4354  }
4355  case Intrinsic::amdgcn_mfma_f32_4x4x1f32:
4356  case Intrinsic::amdgcn_mfma_f32_4x4x4f16:
4357  case Intrinsic::amdgcn_mfma_i32_4x4x4i8:
4358  case Intrinsic::amdgcn_mfma_f32_4x4x2bf16:
4359  case Intrinsic::amdgcn_mfma_f32_16x16x1f32:
4360  case Intrinsic::amdgcn_mfma_f32_16x16x4f32:
4361  case Intrinsic::amdgcn_mfma_f32_16x16x4f16:
4362  case Intrinsic::amdgcn_mfma_f32_16x16x16f16:
4363  case Intrinsic::amdgcn_mfma_i32_16x16x4i8:
4364  case Intrinsic::amdgcn_mfma_i32_16x16x16i8:
4365  case Intrinsic::amdgcn_mfma_f32_16x16x2bf16:
4366  case Intrinsic::amdgcn_mfma_f32_16x16x8bf16:
4367  case Intrinsic::amdgcn_mfma_f32_32x32x1f32:
4368  case Intrinsic::amdgcn_mfma_f32_32x32x2f32:
4369  case Intrinsic::amdgcn_mfma_f32_32x32x4f16:
4370  case Intrinsic::amdgcn_mfma_f32_32x32x8f16:
4371  case Intrinsic::amdgcn_mfma_i32_32x32x4i8:
4372  case Intrinsic::amdgcn_mfma_i32_32x32x8i8:
4373  case Intrinsic::amdgcn_mfma_f32_32x32x2bf16:
4374  case Intrinsic::amdgcn_mfma_f32_32x32x4bf16:
4375  case Intrinsic::amdgcn_mfma_f32_32x32x4bf16_1k:
4376  case Intrinsic::amdgcn_mfma_f32_16x16x4bf16_1k:
4377  case Intrinsic::amdgcn_mfma_f32_4x4x4bf16_1k:
4378  case Intrinsic::amdgcn_mfma_f32_32x32x8bf16_1k:
4379  case Intrinsic::amdgcn_mfma_f32_16x16x16bf16_1k:
4380  case Intrinsic::amdgcn_mfma_f64_16x16x4f64:
4381  case Intrinsic::amdgcn_mfma_f64_4x4x4f64:
4382  case Intrinsic::amdgcn_mfma_i32_16x16x32_i8:
4383  case Intrinsic::amdgcn_mfma_i32_32x32x16_i8:
4384  case Intrinsic::amdgcn_mfma_f32_16x16x8_xf32:
4385  case Intrinsic::amdgcn_mfma_f32_32x32x4_xf32:
4386  case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_bf8:
4387  case Intrinsic::amdgcn_mfma_f32_16x16x32_bf8_fp8:
4388  case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_bf8:
4389  case Intrinsic::amdgcn_mfma_f32_16x16x32_fp8_fp8:
4390  case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_bf8:
4391  case Intrinsic::amdgcn_mfma_f32_32x32x16_bf8_fp8:
4392  case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_bf8:
4393  case Intrinsic::amdgcn_mfma_f32_32x32x16_fp8_fp8: {
4394  // Default for MAI intrinsics.
4395  // srcC can also be an immediate which can be folded later.
4396  // FIXME: Should we eventually add an alternative mapping with AGPR src
4397  // for srcA/srcB?
4398  //
4399  // vdst, srcA, srcB, srcC
4400  const SIMachineFunctionInfo *Info = MF.getInfo<SIMachineFunctionInfo>();
4401  OpdsMapping[0] =
4402  Info->mayNeedAGPRs()
4403  ? getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI)
4404  : getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4405  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4406  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4407  OpdsMapping[4] =
4408  Info->mayNeedAGPRs()
4409  ? getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI)
4410  : getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4411  break;
4412  }
4413  case Intrinsic::amdgcn_smfmac_f32_16x16x32_f16:
4414  case Intrinsic::amdgcn_smfmac_f32_32x32x16_f16:
4415  case Intrinsic::amdgcn_smfmac_f32_16x16x32_bf16:
4416  case Intrinsic::amdgcn_smfmac_f32_32x32x16_bf16:
4417  case Intrinsic::amdgcn_smfmac_i32_16x16x64_i8:
4418  case Intrinsic::amdgcn_smfmac_i32_32x32x32_i8:
4419  case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_bf8:
4420  case Intrinsic::amdgcn_smfmac_f32_16x16x64_bf8_fp8:
4421  case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_bf8:
4422  case Intrinsic::amdgcn_smfmac_f32_16x16x64_fp8_fp8:
4423  case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_bf8:
4424  case Intrinsic::amdgcn_smfmac_f32_32x32x32_bf8_fp8:
4425  case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_bf8:
4426  case Intrinsic::amdgcn_smfmac_f32_32x32x32_fp8_fp8: {
4427  // vdst, srcA, srcB, srcC, idx
4428  OpdsMapping[0] = getAGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4429  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4430  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4431  OpdsMapping[4] = getAGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4432  OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4433  break;
4434  }
4435  case Intrinsic::amdgcn_interp_p1:
4436  case Intrinsic::amdgcn_interp_p2:
4437  case Intrinsic::amdgcn_interp_mov:
4438  case Intrinsic::amdgcn_interp_p1_f16:
4439  case Intrinsic::amdgcn_interp_p2_f16:
4440  case Intrinsic::amdgcn_lds_param_load: {
4441  const int M0Idx = MI.getNumOperands() - 1;
4442  Register M0Reg = MI.getOperand(M0Idx).getReg();
4443  unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
4444  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4445 
4446  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4447  for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
4448  OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4449 
4450  // Must be SGPR, but we must take whatever the original bank is and fix it
4451  // later.
4452  OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
4453  break;
4454  }
4455  case Intrinsic::amdgcn_interp_inreg_p10:
4456  case Intrinsic::amdgcn_interp_inreg_p2:
4457  case Intrinsic::amdgcn_interp_inreg_p10_f16:
4458  case Intrinsic::amdgcn_interp_inreg_p2_f16: {
4459  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4460  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4461  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4462  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4463  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4464  break;
4465  }
4466  case Intrinsic::amdgcn_ballot: {
4467  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4468  unsigned SrcSize = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4469  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, DstSize);
4470  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, SrcSize);
4471  break;
4472  }
4473  }
4474  break;
4475  }
4476  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD:
4477  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_LOAD_D16:
4478  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE:
4479  case AMDGPU::G_AMDGPU_INTRIN_IMAGE_STORE_D16: {
4480  auto IntrID = MI.getIntrinsicID();
4481  const AMDGPU::RsrcIntrinsic *RSrcIntrin = AMDGPU::lookupRsrcIntrinsic(IntrID);
4482  assert(RSrcIntrin && "missing RsrcIntrinsic for image intrinsic");
4483  // Non-images can have complications from operands that allow both SGPR
4484  // and VGPR. For now it's too complicated to figure out the final opcode
4485  // to derive the register bank from the MCInstrDesc.
4486  assert(RSrcIntrin->IsImage);
4487  return getImageMapping(MRI, MI, RSrcIntrin->RsrcArg);
4488  }
4489  case AMDGPU::G_AMDGPU_INTRIN_BVH_INTERSECT_RAY: {
4490  unsigned N = MI.getNumExplicitOperands() - 2;
4491  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 128);
4492  OpdsMapping[N] = getSGPROpMapping(MI.getOperand(N).getReg(), MRI, *TRI);
4493  if (N == 3) {
4494  // Sequential form: all operands combined into VGPR256/VGPR512
4495  unsigned Size = MRI.getType(MI.getOperand(2).getReg()).getSizeInBits();
4496  if (Size > 256)
4497  Size = 512;
4498  OpdsMapping[2] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4499  } else {
4500  // NSA form
4501  for (unsigned I = 2; I < N; ++I) {
4502  unsigned Size = MRI.getType(MI.getOperand(I).getReg()).getSizeInBits();
4503  OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, Size);
4504  }
4505  }
4506  break;
4507  }
4508  case AMDGPU::G_INTRINSIC_W_SIDE_EFFECTS: {
4509  auto IntrID = MI.getIntrinsicID();
4510  switch (IntrID) {
4511  case Intrinsic::amdgcn_s_getreg:
4512  case Intrinsic::amdgcn_s_memtime:
4513  case Intrinsic::amdgcn_s_memrealtime:
4514  case Intrinsic::amdgcn_s_get_waveid_in_workgroup:
4515  case Intrinsic::amdgcn_s_sendmsg_rtn: {
4516  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4517  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4518  break;
4519  }
4520  case Intrinsic::amdgcn_global_atomic_fadd:
4521  case Intrinsic::amdgcn_global_atomic_csub:
4522  case Intrinsic::amdgcn_global_atomic_fmin:
4523  case Intrinsic::amdgcn_global_atomic_fmax:
4524  case Intrinsic::amdgcn_flat_atomic_fadd:
4525  case Intrinsic::amdgcn_flat_atomic_fmin:
4526  case Intrinsic::amdgcn_flat_atomic_fmax:
4527  case Intrinsic::amdgcn_global_atomic_fadd_v2bf16:
4528  case Intrinsic::amdgcn_flat_atomic_fadd_v2bf16:
4529  return getDefaultMappingAllVGPR(MI);
4530  case Intrinsic::amdgcn_ds_ordered_add:
4531  case Intrinsic::amdgcn_ds_ordered_swap:
4532  case Intrinsic::amdgcn_ds_fadd_v2bf16: {
4533  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4534  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4535  unsigned M0Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
4536  AMDGPU::SGPRRegBankID);
4537  OpdsMapping[2] = AMDGPU::getValueMapping(M0Bank, 32);
4538  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4539  break;
4540  }
4541  case Intrinsic::amdgcn_ds_append:
4542  case Intrinsic::amdgcn_ds_consume: {
4543  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4544  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4545  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4546  break;
4547  }
4548  case Intrinsic::amdgcn_exp_compr:
4549  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4550  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4551  break;
4552  case Intrinsic::amdgcn_exp:
4553  // FIXME: Could we support packed types here?
4554  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4555  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4556  OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4557  OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4558  break;
4559  case Intrinsic::amdgcn_exp_row:
4560  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4561  OpdsMapping[4] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4562  OpdsMapping[5] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4563  OpdsMapping[6] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4564  OpdsMapping[8] = getSGPROpMapping(MI.getOperand(8).getReg(), MRI, *TRI);
4565  break;
4566  case Intrinsic::amdgcn_s_sendmsg:
4567  case Intrinsic::amdgcn_s_sendmsghalt: {
4568  // This must be an SGPR, but accept a VGPR.
4569  unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
4570  AMDGPU::SGPRRegBankID);
4571  OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
4572  break;
4573  }
4574  case Intrinsic::amdgcn_s_setreg: {
4575  // This must be an SGPR, but accept a VGPR.
4576  unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
4577  AMDGPU::SGPRRegBankID);
4578  OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
4579  break;
4580  }
4581  case Intrinsic::amdgcn_end_cf: {
4582  unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4583  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4584  break;
4585  }
4586  case Intrinsic::amdgcn_else: {
4587  unsigned WaveSize = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4588  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4589  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
4590  OpdsMapping[3] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, WaveSize);
4591  break;
4592  }
4593  case Intrinsic::amdgcn_live_mask: {
4594  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4595  break;
4596  }
4597  case Intrinsic::amdgcn_wqm_demote:
4598  case Intrinsic::amdgcn_kill: {
4599  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VCCRegBankID, 1);
4600  break;
4601  }
4602  case Intrinsic::amdgcn_raw_buffer_load:
4603  case Intrinsic::amdgcn_raw_tbuffer_load: {
4604  // FIXME: Should make intrinsic ID the last operand of the instruction,
4605  // then this would be the same as store
4606  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4607  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4608  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4609  OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4610  break;
4611  }
4612  case Intrinsic::amdgcn_raw_buffer_load_lds: {
4613  OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4614  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4615  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4616  OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4617  break;
4618  }
4619  case Intrinsic::amdgcn_raw_buffer_store:
4620  case Intrinsic::amdgcn_raw_buffer_store_format:
4621  case Intrinsic::amdgcn_raw_tbuffer_store: {
4622  OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4623  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4624  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4625  OpdsMapping[4] = getSGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4626  break;
4627  }
4628  case Intrinsic::amdgcn_struct_buffer_load:
4629  case Intrinsic::amdgcn_struct_tbuffer_load: {
4630  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4631  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4632  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4633  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4634  OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4635  break;
4636  }
4637  case Intrinsic::amdgcn_struct_buffer_load_lds: {
4638  OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4639  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4640  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4641  OpdsMapping[5] = getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4642  OpdsMapping[6] = getSGPROpMapping(MI.getOperand(6).getReg(), MRI, *TRI);
4643  break;
4644  }
4645  case Intrinsic::amdgcn_struct_buffer_store:
4646  case Intrinsic::amdgcn_struct_tbuffer_store: {
4647  OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4648  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4649  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4650  OpdsMapping[4] = getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI);
4651  OpdsMapping[5] = getSGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI);
4652  break;
4653  }
4654  case Intrinsic::amdgcn_init_exec_from_input: {
4655  unsigned Size = getSizeInBits(MI.getOperand(1).getReg(), MRI, *TRI);
4656  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, Size);
4657  break;
4658  }
4659  case Intrinsic::amdgcn_ds_gws_init:
4660  case Intrinsic::amdgcn_ds_gws_barrier:
4661  case Intrinsic::amdgcn_ds_gws_sema_br: {
4662  OpdsMapping[1] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4663 
4664  // This must be an SGPR, but accept a VGPR.
4665  unsigned Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
4666  AMDGPU::SGPRRegBankID);
4667  OpdsMapping[2] = AMDGPU::getValueMapping(Bank, 32);
4668  break;
4669  }
4670  case Intrinsic::amdgcn_ds_gws_sema_v:
4671  case Intrinsic::amdgcn_ds_gws_sema_p:
4672  case Intrinsic::amdgcn_ds_gws_sema_release_all: {
4673  // This must be an SGPR, but accept a VGPR.
4674  unsigned Bank = getRegBankID(MI.getOperand(1).getReg(), MRI,
4675  AMDGPU::SGPRRegBankID);
4676  OpdsMapping[1] = AMDGPU::getValueMapping(Bank, 32);
4677  break;
4678  }
4679  case Intrinsic::amdgcn_global_load_lds: {
4680  OpdsMapping[1] = getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4681  OpdsMapping[2] = getSGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4682  break;
4683  }
4684  case Intrinsic::amdgcn_lds_direct_load: {
4685  const int M0Idx = MI.getNumOperands() - 1;
4686  Register M0Reg = MI.getOperand(M0Idx).getReg();
4687  unsigned M0Bank = getRegBankID(M0Reg, MRI, AMDGPU::SGPRRegBankID);
4688  unsigned DstSize = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4689 
4690  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, DstSize);
4691  for (int I = 2; I != M0Idx && MI.getOperand(I).isReg(); ++I)
4692  OpdsMapping[I] = AMDGPU::getValueMapping(AMDGPU::VGPRRegBankID, 32);
4693 
4694  // Must be SGPR, but we must take whatever the original bank is and fix it
4695  // later.
4696  OpdsMapping[M0Idx] = AMDGPU::getValueMapping(M0Bank, 32);
4697  break;
4698  }
4699  case Intrinsic::amdgcn_ds_add_gs_reg_rtn:
4700  case Intrinsic::amdgcn_ds_sub_gs_reg_rtn:
4701  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4702  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4703  break;
4704  case Intrinsic::amdgcn_ds_bvh_stack_rtn: {
4705  OpdsMapping[0] =
4706  getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI); // %vdst
4707  OpdsMapping[1] =
4708  getVGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI); // %addr
4709  OpdsMapping[3] =
4710  getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI); // %addr
4711  OpdsMapping[4] =
4712  getVGPROpMapping(MI.getOperand(4).getReg(), MRI, *TRI); // %data0
4713  OpdsMapping[5] =
4714  getVGPROpMapping(MI.getOperand(5).getReg(), MRI, *TRI); // %data1
4715  break;
4716  }
4717 
4718  default:
4719  return getInvalidInstructionMapping();
4720  }
4721  break;
4722  }
4723  case AMDGPU::G_SELECT: {
4724  unsigned Size = MRI.getType(MI.getOperand(0).getReg()).getSizeInBits();
4725  unsigned Op2Bank = getRegBankID(MI.getOperand(2).getReg(), MRI,
4726  AMDGPU::SGPRRegBankID);
4727  unsigned Op3Bank = getRegBankID(MI.getOperand(3).getReg(), MRI,
4728  AMDGPU::SGPRRegBankID);
4729  bool SGPRSrcs = Op2Bank == AMDGPU::SGPRRegBankID &&
4730  Op3Bank == AMDGPU::SGPRRegBankID;
4731 
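// The condition may stay scalar only when both value sources are SGPRs;
// otherwise it is treated as a VCC lane mask and the result must be VGPR.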
4732  unsigned CondBankDefault = SGPRSrcs ?
4733  AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4734  unsigned CondBank = getRegBankID(MI.getOperand(1).getReg(), MRI,
4735  CondBankDefault);
4736  if (CondBank == AMDGPU::SGPRRegBankID)
4737  CondBank = SGPRSrcs ? AMDGPU::SGPRRegBankID : AMDGPU::VCCRegBankID;
4738  else if (CondBank == AMDGPU::VGPRRegBankID)
4739  CondBank = AMDGPU::VCCRegBankID;
4740 
4741  unsigned Bank = SGPRSrcs && CondBank == AMDGPU::SGPRRegBankID ?
4742  AMDGPU::SGPRRegBankID : AMDGPU::VGPRRegBankID;
4743 
4744  assert(CondBank == AMDGPU::VCCRegBankID || CondBank == AMDGPU::SGPRRegBankID);
4745 
4746  // TODO: Should report 32-bit for scalar condition type.
4747  if (Size == 64) {
4748  OpdsMapping[0] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
4749  OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
4750  OpdsMapping[2] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
4751  OpdsMapping[3] = AMDGPU::getValueMappingSGPR64Only(Bank, Size);
4752  } else {
4753  OpdsMapping[0] = AMDGPU::getValueMapping(Bank, Size);
4754  OpdsMapping[1] = AMDGPU::getValueMapping(CondBank, 1);
4755  OpdsMapping[2] = AMDGPU::getValueMapping(Bank, Size);
4756  OpdsMapping[3] = AMDGPU::getValueMapping(Bank, Size);
4757  }
4758 
4759  break;
4760  }
4761 
4762  case AMDGPU::G_SI_CALL: {
4763  OpdsMapping[0] = AMDGPU::getValueMapping(AMDGPU::SGPRRegBankID, 64);
4764  // Lie and claim everything is legal, even though some need to be
4765  // SGPRs. applyMapping will have to deal with it as a waterfall loop.
4766  OpdsMapping[1] = getSGPROpMapping(MI.getOperand(1).getReg(), MRI, *TRI);
4767 
4768  // Allow anything for implicit arguments
4769  for (unsigned I = 4; I < MI.getNumOperands(); ++I) {
4770  if (MI.getOperand(I).isReg()) {
4771  Register Reg = MI.getOperand(I).getReg();
4772  auto OpBank = getRegBankID(Reg, MRI);
4773  unsigned Size = getSizeInBits(Reg, MRI, *TRI);
4774  OpdsMapping[I] = AMDGPU::getValueMapping(OpBank, Size);
4775  }
4776  }
4777  break;
4778  }
4779  case AMDGPU::G_LOAD:
4780  case AMDGPU::G_ZEXTLOAD:
4781  case AMDGPU::G_SEXTLOAD:
4782  return getInstrMappingForLoad(MI);
4783 
4784  case AMDGPU::G_ATOMICRMW_XCHG:
4785  case AMDGPU::G_ATOMICRMW_ADD:
4786  case AMDGPU::G_ATOMICRMW_SUB:
4787  case AMDGPU::G_ATOMICRMW_AND:
4788  case AMDGPU::G_ATOMICRMW_OR:
4789  case AMDGPU::G_ATOMICRMW_XOR:
4790  case AMDGPU::G_ATOMICRMW_MAX:
4791  case AMDGPU::G_ATOMICRMW_MIN:
4792  case AMDGPU::G_ATOMICRMW_UMAX:
4793  case AMDGPU::G_ATOMICRMW_UMIN:
4794  case AMDGPU::G_ATOMICRMW_FADD:
4795  case AMDGPU::G_AMDGPU_ATOMIC_CMPXCHG:
4796  case AMDGPU::G_AMDGPU_ATOMIC_INC:
4797  case AMDGPU::G_AMDGPU_ATOMIC_DEC:
4798  case AMDGPU::G_AMDGPU_ATOMIC_FMIN:
4799  case AMDGPU::G_AMDGPU_ATOMIC_FMAX: {
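// Atomic results and data operands are always VGPRs; the pointer mapping is
// chosen by getValueMappingForPtr, which only permits an SGPR base where the
// addressing mode can use one.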
4800  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4801  OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4802  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4803  break;
4804  }
4805  case AMDGPU::G_ATOMIC_CMPXCHG: {
4806  OpdsMapping[0] = getVGPROpMapping(MI.getOperand(0).getReg(), MRI, *TRI);
4807  OpdsMapping[1] = getValueMappingForPtr(MRI, MI.getOperand(1).getReg());
4808  OpdsMapping[2] = getVGPROpMapping(MI.getOperand(2).getReg(), MRI, *TRI);
4809  OpdsMapping[3] = getVGPROpMapping(MI.getOperand(3).getReg(), MRI, *TRI);
4810  break;
4811  }
4812  case AMDGPU::G_BRCOND: {
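// A uniform (SGPR) s1 condition stays scalar; any other bank is treated as a
// VCC lane mask.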
4813  unsigned Bank = getRegBankID(MI.getOperand(0).getReg(), MRI,
4814  AMDGPU::SGPRRegBankID);
4815  assert(MRI.getType(MI.getOperand(0).getReg()).getSizeInBits() == 1);
4816  if (Bank != AMDGPU::SGPRRegBankID)
4817  Bank = AMDGPU::VCCRegBankID;
4818 
4819  OpdsMapping[0] = AMDGPU::getValueMapping(Bank, 1);
4820  break;
4821  }
4822  case AMDGPU::G_FPTRUNC_ROUND_UPWARD:
4823  case AMDGPU::G_FPTRUNC_ROUND_DOWNWARD:
4824  return getDefaultMappingVOP(MI);
4825  }
4826 
4827  return getInstructionMapping(/*ID*/1, /*Cost*/1,
4828  getOperandsMapping(OpdsMapping),
4829  MI.getNumOperands());
4830 }