LLVM  13.0.0git
AMDGPURegisterBankInfo.cpp File Reference
#include "AMDGPURegisterBankInfo.h"
#include "AMDGPU.h"
#include "AMDGPUGlobalISelUtils.h"
#include "AMDGPUInstrInfo.h"
#include "GCNSubtarget.h"
#include "SIMachineFunctionInfo.h"
#include "SIRegisterInfo.h"
#include "llvm/CodeGen/GlobalISel/LegalizerHelper.h"
#include "llvm/CodeGen/GlobalISel/MIPatternMatch.h"
#include "llvm/CodeGen/GlobalISel/MachineIRBuilder.h"
#include "llvm/CodeGen/GlobalISel/RegisterBank.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "AMDGPUGenRegisterBank.inc"
#include "AMDGPUGenRegisterBankInfo.def"

Macros

#define GET_TARGET_REGBANK_IMPL
 

Functions

static bool isVectorRegisterBank (const RegisterBank &Bank)
 
static bool memOpHasNoClobbered (const MachineMemOperand *MMO)
 
static bool isScalarLoadLegal (const MachineInstr &MI)
 
static void setRegsToType (MachineRegisterInfo &MRI, ArrayRef< Register > Regs, LLT NewTy)
 Replace the current type each register in Regs has with NewTy.
 
static LLT getHalfSizedType (LLT Ty)
 
static std::pair< LLT, LLT > splitUnequalType (LLT Ty, unsigned FirstSize)
 Split Ty into 2 pieces.
 
static LLT widen96To128 (LLT Ty)
 
static Register getSrcRegIgnoringCopies (const MachineRegisterInfo &MRI, Register Reg)
 
static unsigned setBufferOffsets (MachineIRBuilder &B, const AMDGPURegisterBankInfo &RBI, Register CombinedOffset, Register &VOffsetReg, Register &SOffsetReg, int64_t &InstOffsetVal, Align Alignment)
 
static unsigned getExtendOp (unsigned Opc)
 
static std::pair< Register, Register > unpackV2S16ToS32 (MachineIRBuilder &B, Register Src, unsigned ExtOpcode)
 
static bool substituteSimpleCopyRegs (const AMDGPURegisterBankInfo::OperandsMapper &OpdMapper, unsigned OpIdx)
 
static std::pair< Register, unsigned > getBaseWithConstantOffset (MachineRegisterInfo &MRI, Register Reg)
 
static bool isZero (Register Reg, MachineRegisterInfo &MRI)
 
static unsigned extractCPol (unsigned CachePolicy)
 
static unsigned extractSWZ (unsigned CachePolicy)
 
static void reinsertVectorIndexAdd (MachineIRBuilder &B, MachineInstr &IdxUseInstr, unsigned OpIdx, unsigned ConstOffset)
 Utility function for pushing dynamic vector indexes with a constant offset into waterfall loops.
 
static void extendLow32IntoHigh32 (MachineIRBuilder &B, Register Hi32Reg, Register Lo32Reg, unsigned ExtOpc, const RegisterBank &RegBank, bool IsBooleanSrc=false)
 Implement extending a 32-bit value to a 64-bit value.
 
static Register constrainRegToBank (MachineRegisterInfo &MRI, MachineIRBuilder &B, Register &Reg, const RegisterBank &Bank)
 
static unsigned regBankUnion (unsigned RB0, unsigned RB1)
 
static unsigned regBankBoolUnion (unsigned RB0, unsigned RB1)
 

Detailed Description

This file implements the targeting of the RegisterBankInfo class for AMDGPU.

AMDGPU has unique register bank constraints that require special high-level strategies to deal with. There are two main true physical register banks: VGPR (vector) and SGPR (scalar). Additionally, the VCC register bank is a sort of pseudo-register bank needed to represent SGPRs used in a vector boolean context. There is also the AGPR bank, which is a special-purpose physical register bank present on some subtargets.

Copying from VGPR to SGPR is generally illegal, unless the value is known to be uniform. It is generally not valid to legalize operands by inserting copies as on other targets. Operations which require uniform, SGPR operands generally require scalarization by repeatedly executing the instruction, activating each set of lanes using a unique set of input values. This is referred to as a waterfall loop.
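The waterfall loop described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: a wave is modeled as a vector of per-lane values, the execution mask as a 64-bit integer, and the names runWaterfallLoop and ScalarOp are hypothetical and not part of LLVM. Each iteration picks the value held by the first remaining lane, activates every remaining lane holding the same value, and runs the operation once for that group.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Toy model of a waterfall loop. All names are hypothetical, not LLVM API.
// LaneValues[i] is the (possibly divergent) operand held by lane i.
// ScalarOp is called once per unique value, with that value and the mask of
// lanes sharing it. Returns the number of loop iterations executed.
unsigned runWaterfallLoop(
    const std::vector<uint32_t> &LaneValues,
    const std::function<void(uint32_t, uint64_t)> &ScalarOp) {
  assert(LaneValues.size() <= 64 && "model supports at most 64 lanes");
  const unsigned NumLanes = static_cast<unsigned>(LaneValues.size());
  uint64_t Remaining = NumLanes == 64 ? ~0ull : ((1ull << NumLanes) - 1);
  unsigned Iterations = 0;

  while (Remaining != 0) {
    // Read the value of the first lane that has not been handled yet.
    unsigned First = 0;
    while (((Remaining >> First) & 1) == 0)
      ++First;
    const uint32_t Uniform = LaneValues[First];

    // Activate every remaining lane that holds the same value.
    uint64_t Active = 0;
    for (unsigned L = 0; L < NumLanes; ++L)
      if (((Remaining >> L) & 1) && LaneValues[L] == Uniform)
        Active |= 1ull << L;

    ScalarOp(Uniform, Active); // the operand is uniform across Active lanes
    Remaining &= ~Active;
    ++Iterations;
  }
  return Iterations;
}
```

In this model the loop runs once per unique value, so a fully uniform input costs a single iteration while a fully divergent one costs one iteration per lane.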

Booleans

Booleans (s1 values) require special consideration. A vector compare result is naturally a bitmask with one bit per lane, in a 32-bit or 64-bit register. These are represented with the VCC bank. During selection, we need to be able to unambiguously go back from a register class to a register bank. To distinguish whether an SGPR should use the SGPR or VCC register bank, we need to know the use context type. An SGPR s1 value always means a VCC bank value, otherwise it will be the SGPR bank. A scalar compare sets SCC, which is a 1-bit unaddressable register. This will need to be copied to a 32-bit virtual register. Taken together, this means we need to adjust the type of boolean operations to be regbank legal. All SALU booleans need to be widened to 32 bits, and all VALU booleans need to be s1 values.

A noteworthy exception to the s1-means-vcc rule is for legalization artifact casts. G_TRUNC s1 results, and G_SEXT/G_ZEXT/G_ANYEXT sources are never vcc bank. A non-boolean source (such as a truncate from a 1-bit load from memory) will require a copy to the VCC bank which will require clearing the high bits and inserting a compare.
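The general boolean rules above can be summarized in a few lines of standalone code. This is a minimal sketch, not the LLVM implementation: the enums and the functions bankForRegClass and legalBoolWidth are hypothetical, and the model deliberately ignores the legalization artifact cast exception.

```cpp
#include <cassert>

// Hypothetical stand-ins for register banks and register class kinds.
enum class Bank { SGPR, VGPR, VCC };
enum class RegClassKind { Scalar, Vector };

// Going back from a register class to a bank: an s1 value in a scalar
// register class means the VCC bank, any other scalar value means SGPR.
Bank bankForRegClass(RegClassKind RC, unsigned SizeInBits) {
  if (RC == RegClassKind::Vector)
    return Bank::VGPR;
  return SizeInBits == 1 ? Bank::VCC : Bank::SGPR;
}

// Regbank-legal boolean width: SALU booleans are widened to 32 bits,
// VALU booleans stay s1.
unsigned legalBoolWidth(bool ProducedBySALU) {
  return ProducedBySALU ? 32u : 1u;
}
```

The point of the rule is that the pair (register class, type size) is enough to recover the bank during selection, which is why scalar booleans cannot stay s1.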

Constant bus restriction

VALU instructions have a limitation known as the constant bus restriction. Most VALU instructions can use SGPR operands, but may read at most 1 SGPR or constant literal value (this limit rises to 2 in gfx10 for most instructions). The limit counts unique SGPRs, so the same SGPR may be used for multiple operands. From a register bank perspective, any combination of operands should be legal as an SGPR, but this is contextually dependent on the SGPR operands all being the same register. It is therefore optimal to choose the SGPR with the most uses to minimize the number of copies.
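To make the "keep the SGPR with the most uses" heuristic concrete, here is a small standalone sketch. It assumes a simplified operand model (the Operand struct and operandsNeedingCopy are hypothetical names), ignores constant literals, and counts one rewritten operand per use of an SGPR that does not fit within the bus limit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <vector>

// Hypothetical operand model: a register number plus whether it is an SGPR.
struct Operand {
  bool IsSGPR;
  unsigned Reg;
};

// Returns how many operands would have to be rewritten to use a VGPR copy so
// that at most BusLimit unique SGPRs are read. The SGPRs with the most uses
// are kept, since keeping them avoids the most rewrites.
unsigned operandsNeedingCopy(const std::vector<Operand> &Ops,
                             unsigned BusLimit) {
  std::map<unsigned, unsigned> UsesPerSGPR;
  for (const Operand &Op : Ops)
    if (Op.IsSGPR)
      ++UsesPerSGPR[Op.Reg];

  std::vector<unsigned> Counts;
  for (const auto &Entry : UsesPerSGPR)
    Counts.push_back(Entry.second);
  std::sort(Counts.begin(), Counts.end(), std::greater<unsigned>());

  // Everything past the first BusLimit entries does not fit on the bus.
  unsigned Copies = 0;
  for (std::size_t I = BusLimit; I < Counts.size(); ++I)
    Copies += Counts[I];
  return Copies;
}
```

For example, with operands (s0, s0, s1) and a limit of 1, keeping s0 leaves a single operand to rewrite, whereas keeping s1 would leave two.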

We avoid trying to solve this problem in RegBankSelect. Any VALU G_* operation should have its source operands all mapped to VGPRs (except for VCC), inserting copies from any SGPR operands. This is the most trivial legal mapping. Anything beyond the simplest 1:1 instruction selection would be too complicated to solve here. Every optimization pattern or instruction selected to multiple outputs would have to enforce this rule, and there would be additional complexity in tracking this rule for every G_* operation. By forcing all inputs to VGPRs, it also simplifies the task of picking the optimal operand combination from a post-isel optimization pass.
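The trivial mapping described above is simple enough to sketch directly. The following standalone model is an illustration only: Bank, SourceMapping, and mapValuSources are hypothetical names, and a "copy" here is just a counter rather than an inserted instruction.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the register banks discussed above.
enum class Bank { SGPR, VGPR, VCC };

struct SourceMapping {
  std::vector<Bank> Banks;  // bank assigned to each source operand
  unsigned CopiesInserted;  // number of SGPR sources copied to VGPRs
};

// Map every source of a VALU operation to VGPR, leaving VCC sources alone.
SourceMapping mapValuSources(const std::vector<Bank> &Current) {
  SourceMapping Result{{}, 0};
  for (Bank B : Current) {
    if (B == Bank::SGPR) {
      Result.Banks.push_back(Bank::VGPR); // would need an SGPR-to-VGPR copy
      ++Result.CopiesInserted;
    } else {
      Result.Banks.push_back(B);
    }
  }
  return Result;
}
```

Because no SGPR source survives this mapping, the constant bus restriction cannot be violated, at the cost of copies that a later pass may fold away.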

Definition in file AMDGPURegisterBankInfo.cpp.

Macro Definition Documentation

◆ GET_TARGET_REGBANK_IMPL

#define GET_TARGET_REGBANK_IMPL

Definition at line 85 of file AMDGPURegisterBankInfo.cpp.

Function Documentation

◆ constrainRegToBank()

static Register constrainRegToBank ( MachineRegisterInfo &  MRI,
MachineIRBuilder &  B,
Register &  Reg,
const RegisterBank &  Bank 
)
static

◆ extendLow32IntoHigh32()

static void extendLow32IntoHigh32 ( MachineIRBuilder &  B,
Register  Hi32Reg,
Register  Lo32Reg,
unsigned  ExtOpc,
const RegisterBank &  RegBank,
bool  IsBooleanSrc = false 
)
static

Implement extending a 32-bit value to a 64-bit value.

Lo32Reg is the original 32-bit source value (to be inserted in the low part of the combined 64-bit result), and Hi32Reg is the high half of the combined 64-bit value.

Definition at line 1874 of file AMDGPURegisterBankInfo.cpp.

References assert(), B, and llvm::LLT::scalar().

Referenced by llvm::AMDGPURegisterBankInfo::applyMappingImpl().
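As an illustration of what the high half must contain for each kind of extension, here is a standalone sketch on plain integers. It does not use MachineIRBuilder and is not the function's actual body: ExtKind, computeHigh32, and combine64 are hypothetical names, and the any-extend case picks zero only because any value is acceptable there.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the extension opcode passed as ExtOpc.
enum class ExtKind { ZExt, SExt, AnyExt };

// Compute the high 32 bits that pair with Lo to form the extended value.
uint32_t computeHigh32(uint32_t Lo, ExtKind Kind) {
  switch (Kind) {
  case ExtKind::ZExt:
    return 0u; // zero-extension: high half is all zeros
  case ExtKind::SExt:
    // sign-extension: replicate bit 31 of the low half
    return (Lo & 0x80000000u) ? 0xFFFFFFFFu : 0u;
  case ExtKind::AnyExt:
    return 0u; // any value is valid; zero chosen for this model
  }
  return 0u;
}

// Combine the two halves into the 64-bit result.
uint64_t combine64(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

Splitting the work this way means only the high half needs a new value; the low half is the original 32-bit source, as the description above states.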

◆ extractCPol()

static unsigned extractCPol ( unsigned  CachePolicy)
static

◆ extractSWZ()

static unsigned extractSWZ ( unsigned  CachePolicy)
static

◆ getBaseWithConstantOffset()

static std::pair<Register, unsigned> getBaseWithConstantOffset ( MachineRegisterInfo &  MRI,
Register  Reg 
)
static

◆ getExtendOp()

static unsigned getExtendOp ( unsigned  Opc)
static

◆ getHalfSizedType()

static LLT getHalfSizedType ( LLT  Ty)
static

◆ getSrcRegIgnoringCopies()

static Register getSrcRegIgnoringCopies ( const MachineRegisterInfo &  MRI,
Register  Reg 
)
static

Definition at line 1318 of file AMDGPURegisterBankInfo.cpp.

References llvm::tgtok::Def, llvm::getDefIgnoringCopies(), MRI, and Reg.

◆ isScalarLoadLegal()

static bool isScalarLoadLegal ( const MachineInstr &  MI)
static

◆ isVectorRegisterBank()

static bool isVectorRegisterBank ( const RegisterBank &  Bank)
static

◆ isZero()

static bool isZero ( Register  Reg,
MachineRegisterInfo &  MRI 
)
static

◆ memOpHasNoClobbered()

static bool memOpHasNoClobbered ( const MachineMemOperand *  MMO)
static

Definition at line 431 of file AMDGPURegisterBankInfo.cpp.

References llvm::MachineMemOperand::getValue(), and I.

Referenced by isScalarLoadLegal().

◆ regBankBoolUnion()

static unsigned regBankBoolUnion ( unsigned  RB0,
unsigned  RB1 
)
static

◆ regBankUnion()

static unsigned regBankUnion ( unsigned  RB0,
unsigned  RB1 
)
static

◆ reinsertVectorIndexAdd()

static void reinsertVectorIndexAdd ( MachineIRBuilder &  B,
MachineInstr &  IdxUseInstr,
unsigned  OpIdx,
unsigned  ConstOffset 
)
static

◆ setBufferOffsets()

static unsigned setBufferOffsets ( MachineIRBuilder &  B,
const AMDGPURegisterBankInfo &  RBI,
Register  CombinedOffset,
Register &  VOffsetReg,
Register &  SOffsetReg,
int64_t &  InstOffsetVal,
Align  Alignment 
)
static

◆ setRegsToType()

static void setRegsToType ( MachineRegisterInfo &  MRI,
ArrayRef< Register >  Regs,
LLT  NewTy 
)
static

Replace the current type each register in Regs has with NewTy.

Definition at line 668 of file AMDGPURegisterBankInfo.cpp.

References assert(), llvm::LLT::getSizeInBits(), llvm::MachineRegisterInfo::getType(), MRI, Reg, and llvm::MachineRegisterInfo::setType().

Referenced by llvm::AMDGPURegisterBankInfo::applyMappingImpl().

◆ splitUnequalType()

static std::pair<LLT, LLT> splitUnequalType ( LLT  Ty,
unsigned  FirstSize 
)
static

Split Ty into 2 pieces.

The first will have FirstSize bits, and the rest will be in the remainder.

Definition at line 1114 of file AMDGPURegisterBankInfo.cpp.

References assert(), llvm::LLT::getElementType(), llvm::LLT::getSizeInBits(), llvm::LLT::isVector(), llvm::LLT::scalar(), and llvm::LLT::scalarOrVector().

Referenced by llvm::AMDGPURegisterBankInfo::applyMappingLoad().
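The split rule can be modeled without LLVM types. The sketch below is an assumption-laden illustration rather than the function's source: SimpleTy is a hypothetical stand-in for LLT (element count plus element width, where a count of 1 means scalar), and splitUnequal follows the description above by giving the first piece FirstSize bits and the second piece the remainder.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for LLT: NumElts == 1 models a scalar.
struct SimpleTy {
  unsigned NumElts;
  unsigned EltBits;
  unsigned sizeInBits() const { return NumElts * EltBits; }
  bool isVector() const { return NumElts > 1; }
};

inline bool operator==(const SimpleTy &A, const SimpleTy &B) {
  return A.NumElts == B.NumElts && A.EltBits == B.EltBits;
}

// Split Ty into a first piece of FirstSize bits and a remainder piece.
std::pair<SimpleTy, SimpleTy> splitUnequal(SimpleTy Ty, unsigned FirstSize) {
  const unsigned Total = Ty.sizeInBits();
  assert(FirstSize > 0 && FirstSize < Total && "split must leave a remainder");
  if (!Ty.isVector())
    return {SimpleTy{1, FirstSize}, SimpleTy{1, Total - FirstSize}};

  // For vectors, the split point must fall on an element boundary.
  assert(FirstSize % Ty.EltBits == 0 && "split inside a vector element");
  const unsigned FirstElts = FirstSize / Ty.EltBits;
  const unsigned RestElts = (Total - FirstSize) / Ty.EltBits;
  return {SimpleTy{FirstElts, Ty.EltBits}, SimpleTy{RestElts, Ty.EltBits}};
}
```

Under this model, a 96-bit scalar split at 64 bits gives a 64-bit and a 32-bit scalar, and a three-element vector of 32-bit elements split at 64 bits gives a two-element vector and a single 32-bit scalar.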

◆ substituteSimpleCopyRegs()

static bool substituteSimpleCopyRegs ( const AMDGPURegisterBankInfo::OperandsMapper &  OpdMapper,
unsigned  OpIdx 
)
static

Definition at line 1632 of file AMDGPURegisterBankInfo.cpp.

References assert().

Referenced by llvm::AMDGPURegisterBankInfo::applyMappingImpl().

◆ unpackV2S16ToS32()

static std::pair<Register, Register> unpackV2S16ToS32 ( MachineIRBuilder &  B,
Register  Src,
unsigned  ExtOpcode 
)
static

◆ widen96To128()

static LLT widen96To128 ( LLT  Ty)
static