LLVM 23.0.0git
#include "Target/AMDGPU/AMDGPUSubtarget.h"
Public Types | |
| enum | Generation { INVALID = 0 , R600 = 1 , R700 = 2 , EVERGREEN = 3 , NORTHERN_ISLANDS = 4 , SOUTHERN_ISLANDS = 5 , SEA_ISLANDS = 6 , VOLCANIC_ISLANDS = 7 , GFX9 = 8 , GFX10 = 9 , GFX11 = 10 , GFX12 = 11 , GFX13 = 12 } |
Public Member Functions | |
| AMDGPUSubtarget (Triple TT) | |
| std::pair< unsigned, unsigned > | getDefaultFlatWorkGroupSize (CallingConv::ID CC) const |
| std::pair< unsigned, unsigned > | getFlatWorkGroupSizes (const Function &F) const |
| std::optional< unsigned > | getReqdWorkGroupSize (const Function &F, unsigned Dim) const |
| bool | hasWavefrontsEvenlySplittingXDim (const Function &F, bool REquiresUniformYZ=false) const |
| std::pair< unsigned, unsigned > | getWavesPerEU (const Function &F) const |
| std::pair< unsigned, unsigned > | getWavesPerEU (const Function &F, std::pair< unsigned, unsigned > FlatWorkGroupSizes) const |
| Overload which uses the specified values for the flat work group sizes, rather than querying the function itself. | |
| std::pair< unsigned, unsigned > | getWavesPerEU (std::pair< unsigned, unsigned > FlatWorkGroupSizes, unsigned LDSBytes, const Function &F) const |
| Overload which uses the specified values for the flat workgroup sizes and LDS space rather than querying the function itself. | |
| std::pair< unsigned, unsigned > | getEffectiveWavesPerEU (std::pair< unsigned, unsigned > RequestedWavesPerEU, std::pair< unsigned, unsigned > FlatWorkGroupSizes, unsigned LDSBytes) const |
| Returns the target minimum/maximum number of waves per EU. | |
| unsigned | getMaxLocalMemSizeWithWaveCount (unsigned WaveCount, const Function &) const |
| Return the amount of LDS that can be used that will not restrict the occupancy lower than WaveCount. | |
| std::pair< unsigned, unsigned > | getOccupancyWithWorkGroupSizes (uint32_t LDSBytes, const Function &F) const |
| Subtarget's minimum/maximum occupancy, in number of waves per EU, that can be achieved when the only function running on a CU is F and each workgroup running the function requires LDSBytes bytes of LDS space. | |
| std::pair< unsigned, unsigned > | getOccupancyWithWorkGroupSizes (uint32_t LDSBytes, std::pair< unsigned, unsigned > FlatWorkGroupSizes) const |
| Overload which uses the specified values for the flat work group sizes, rather than querying the function itself. | |
| std::pair< unsigned, unsigned > | getOccupancyWithWorkGroupSizes (const MachineFunction &MF) const |
| Subtarget's minimum/maximum occupancy, in number of waves per EU, that can be achieved when the only function running on a CU is MF. | |
| bool | isAmdHsaOS () const |
| bool | isAmdPalOS () const |
| bool | isMesa3DOS () const |
| bool | isMesaKernel (const Function &F) const |
| bool | isAmdHsaOrMesa (const Function &F) const |
| bool | isGCN () const |
| bool | useRealTrue16Insts () const |
| Return true if real (non-fake) variants of True16 instructions using 16-bit registers should be code-generated. | |
| bool | hasMulI24 () const |
| bool | hasMulU24 () const |
| bool | hasSMulHi () const |
| bool | hasFminFmaxLegacy () const |
| unsigned | getWavefrontSize () const |
| unsigned | getWavefrontSizeLog2 () const |
| unsigned | getLocalMemorySize () const |
| Return the maximum number of bytes of LDS available for all workgroups running on the same WGP or CU. | |
| unsigned | getAddressableLocalMemorySize () const |
| Return the maximum number of bytes of LDS that can be allocated to a single workgroup. | |
| unsigned | getEUsPerCU () const |
| Number of SIMDs/EUs (execution units) per "CU" ("compute unit"), where the "CU" is the unit onto which workgroups are mapped. | |
| Align | getAlignmentForImplicitArgPtr () const |
| unsigned | getExplicitKernelArgOffset () const |
| Returns the offset in bytes from the start of the input buffer of the first explicit kernel argument. | |
| virtual unsigned | getMaxWorkGroupsPerCU (unsigned FlatWorkGroupSize) const =0 |
| virtual unsigned | getMinFlatWorkGroupSize () const =0 |
| virtual unsigned | getMaxFlatWorkGroupSize () const =0 |
| virtual unsigned | getWavesPerEUForWorkGroup (unsigned FlatWorkGroupSize) const =0 |
| virtual unsigned | getMinWavesPerEU () const =0 |
| unsigned | getMaxWavesPerEU () const |
| unsigned | getMaxWorkitemID (const Function &Kernel, unsigned Dimension) const |
| Return the maximum workitem ID value in the function, for the given (0, 1, 2) dimension. | |
| SmallVector< unsigned > | getMaxNumWorkGroups (const Function &F) const |
| Return the maximum number of work groups for the function. | |
| bool | isSingleLaneExecution (const Function &Kernel) const |
| Return true if only a single workitem can be active in a wave. | |
| bool | makeLIDRangeMetadata (Instruction *I) const |
| Creates value range metadata on a workitemid.* intrinsic call or load. | |
| unsigned | getImplicitArgNumBytes (const Function &F) const |
| uint64_t | getExplicitKernArgSize (const Function &F, Align &MaxAlign) const |
| unsigned | getKernArgSegmentSize (const Function &F, Align &MaxAlign) const |
| AMDGPUDwarfFlavour | getAMDGPUDwarfFlavour () const |
| virtual | ~AMDGPUSubtarget ()=default |
Static Public Member Functions | |
| static const AMDGPUSubtarget & | get (const MachineFunction &MF) |
| static const AMDGPUSubtarget & | get (const TargetMachine &TM, const Function &F) |
Protected Attributes | |
| bool | HasMulI24 = true |
| bool | HasMulU24 = true |
| bool | HasSMulHi = false |
| bool | HasFminFmaxLegacy = true |
| unsigned | EUsPerCU = 4 |
| unsigned | MaxWavesPerEU = 10 |
| unsigned | LocalMemorySize = 0 |
| unsigned | AddressableLocalMemorySize = 0 |
| char | WavefrontSizeLog2 = 0 |
Definition at line 30 of file AMDGPUSubtarget.h.
| Enumerator | |
|---|---|
| INVALID | |
| R600 | |
| R700 | |
| EVERGREEN | |
| NORTHERN_ISLANDS | |
| SOUTHERN_ISLANDS | |
| SEA_ISLANDS | |
| VOLCANIC_ISLANDS | |
| GFX9 | |
| GFX10 | |
| GFX11 | |
| GFX12 | |
| GFX13 | |
Definition at line 32 of file AMDGPUSubtarget.h.
| AMDGPUSubtarget::AMDGPUSubtarget | ( | Triple | TT | ) | inline |
Definition at line 64 of file AMDGPUSubtarget.h.
References llvm::move().
Referenced by llvm::GCNSubtarget::GCNSubtarget(), get(), get(), and llvm::R600Subtarget::R600Subtarget().
| virtual AMDGPUSubtarget::~AMDGPUSubtarget | ( | ) | virtual, default |
| const AMDGPUSubtarget & AMDGPUSubtarget::get | ( | const MachineFunction & | MF | ) | static |
Definition at line 413 of file AMDGPUSubtarget.cpp.
References AMDGPUSubtarget(), llvm::MachineFunction::getSubtarget(), llvm::MachineFunction::getTarget(), llvm::TargetMachine::getTargetTriple(), and llvm::Triple::isAMDGCN().
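| const AMDGPUSubtarget & AMDGPUSubtarget::get | ( | const TargetMachine & | TM, |
| const Function & | F ) | static |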
Definition at line 419 of file AMDGPUSubtarget.cpp.
References AMDGPUSubtarget(), F, llvm::TargetMachine::getSubtarget(), llvm::TargetMachine::getTargetTriple(), and llvm::Triple::isAMDGCN().
| unsigned AMDGPUSubtarget::getAddressableLocalMemorySize | ( | ) | const | inline |
Return the maximum number of bytes of LDS that can be allocated to a single workgroup.
For GFX10-GFX12 in WGP mode this is limited to 64k even though the WGP has 128k in total.
Definition at line 241 of file AMDGPUSubtarget.h.
References AddressableLocalMemorySize.
| Align AMDGPUSubtarget::getAlignmentForImplicitArgPtr | ( | ) | const | inline |
Definition at line 250 of file AMDGPUSubtarget.h.
References isAmdHsaOS().
Referenced by getKernArgSegmentSize().
| AMDGPUDwarfFlavour AMDGPUSubtarget::getAMDGPUDwarfFlavour | ( | ) | const |
Returns the DWARF register number mapping flavour corresponding to the WavefrontSize.
Definition at line 408 of file AMDGPUSubtarget.cpp.
References getWavefrontSize(), llvm::Wave32, and llvm::Wave64.
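As a rough illustration of the mapping suggested by the references above, the sketch below picks a flavour from the wavefront size. It is a standalone model, not the LLVM implementation: `DwarfFlavour` and `pickDwarfFlavour` are hypothetical names standing in for `AMDGPUDwarfFlavour` and the member function, and it assumes that only wave32 and wave64 need to be distinguished.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::AMDGPUDwarfFlavour.
enum class DwarfFlavour { Wave64, Wave32 };

// Assumed mapping: a wavefront size of 32 selects Wave32, 64 selects Wave64.
DwarfFlavour pickDwarfFlavour(unsigned WavefrontSize) {
  assert((WavefrontSize == 32 || WavefrontSize == 64) &&
         "sketch only models wave32 and wave64");
  return WavefrontSize == 32 ? DwarfFlavour::Wave32 : DwarfFlavour::Wave64;
}
```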
| std::pair< unsigned, unsigned > AMDGPUSubtarget::getDefaultFlatWorkGroupSize | ( | CallingConv::ID | CC | ) | const |
Definition at line 139 of file AMDGPUSubtarget.cpp.
References llvm::CallingConv::AMDGPU_ES, llvm::CallingConv::AMDGPU_GS, llvm::CallingConv::AMDGPU_HS, llvm::CallingConv::AMDGPU_LS, llvm::CallingConv::AMDGPU_PS, llvm::CallingConv::AMDGPU_VS, getMaxFlatWorkGroupSize(), and getWavefrontSize().
Referenced by getFlatWorkGroupSizes().
| std::pair< unsigned, unsigned > AMDGPUSubtarget::getEffectiveWavesPerEU | ( | std::pair< unsigned, unsigned > | RequestedWavesPerEU, |
| std::pair< unsigned, unsigned > | FlatWorkGroupSizes, | ||
| unsigned | LDSBytes ) const |
Returns the target minimum/maximum number of waves per EU.
This is based on the minimum/maximum number of RequestedWavesPerEU and further limited by the maximum achievable occupancy derived from the range of FlatWorkGroupSizes and number of LDSBytes per workgroup.
Definition at line 176 of file AMDGPUSubtarget.cpp.
References llvm::Default, getMaxWavesPerEU(), getOccupancyWithWorkGroupSizes(), and getWavesPerEUForWorkGroup().
Referenced by getWavesPerEU().
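The limiting step described above can be sketched in isolation. The following is a simplified model under stated assumptions, not the code in AMDGPUSubtarget.cpp: `effectiveWavesPerEU`, the `Range` alias, and the policy of falling back to the default range when a request is out of bounds are all hypothetical, and `Default` stands for the range that the flat workgroup sizes and LDS usage allow.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

using Range = std::pair<unsigned, unsigned>; // {minimum, maximum}

// Clamp a requested waves-per-EU range to what is achievable. An invalid
// request falls back to the default range (an assumption of this sketch).
Range effectiveWavesPerEU(Range Requested, Range Default, unsigned HardwareMax) {
  assert(Default.first <= Default.second && "default range must be ordered");
  bool Invalid = Requested.first < Default.first ||
                 Requested.first > Default.second ||
                 Requested.first > Requested.second ||
                 Requested.second > HardwareMax;
  if (Invalid)
    return Default;
  // The maximum can never exceed the achievable occupancy.
  Requested.second = std::min(Requested.second, Default.second);
  return Requested;
}
```

With a default range of {1, 5} and a hardware maximum of 10, a request of {2, 8} is narrowed to {2, 5}, while {6, 8} is rejected because its minimum is not achievable.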
| unsigned AMDGPUSubtarget::getEUsPerCU | ( | ) | const | inline |
Number of SIMDs/EUs (execution units) per "CU" ("compute unit"), where the "CU" is the unit onto which workgroups are mapped.
This takes WGP mode vs. CU mode into account.
Definition at line 248 of file AMDGPUSubtarget.h.
References EUsPerCU.
Referenced by getMaxLocalMemSizeWithWaveCount(), and getOccupancyWithWorkGroupSizes().
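| uint64_t AMDGPUSubtarget::getExplicitKernArgSize | ( | const Function & | F, |
| Align & | MaxAlign ) const |
Definition at line 361 of file AMDGPUSubtarget.cpp.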
References llvm::alignTo(), llvm::CallingConv::AMDGPU_KERNEL, assert(), DL, F, and llvm::CallingConv::SPIR_KERNEL.
Referenced by getKernArgSegmentSize().
| unsigned AMDGPUSubtarget::getExplicitKernelArgOffset | ( | ) | const | inline |
Returns the offset in bytes from the start of the input buffer of the first explicit kernel argument.
Definition at line 256 of file AMDGPUSubtarget.h.
References llvm::Triple::AMDHSA, llvm::Triple::AMDPAL, llvm_unreachable, llvm::Triple::Mesa3D, and llvm::Triple::UnknownOS.
Referenced by getKernArgSegmentSize(), and llvm::AMDGPUCallLowering::lowerFormalArgumentsKernel().
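| std::pair< unsigned, unsigned > AMDGPUSubtarget::getFlatWorkGroupSizes | ( | const Function & | F | ) | const |
Returns the subtarget's default pair of minimum/maximum flat work group sizes for function F, or the minimum/maximum flat work group sizes explicitly requested using the "amdgpu-flat-work-group-size" attribute attached to function F.
Definition at line 153 of file AMDGPUSubtarget.cpp.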
References llvm::Default, F, getDefaultFlatWorkGroupSize(), llvm::AMDGPU::getIntegerPairAttribute(), getMaxFlatWorkGroupSize(), and getMinFlatWorkGroupSize().
Referenced by getMaxLocalMemSizeWithWaveCount(), getMaxWorkitemID(), getOccupancyWithWorkGroupSizes(), getWavesPerEU(), and makeLIDRangeMetadata().
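To make the "requested or default" behaviour concrete, here is a minimal standalone sketch. It assumes the attribute value is a "min,max" string and that a malformed or out-of-range request falls back to the default pair; `flatWorkGroupSizes`, `Range`, and the `Limits` parameter are hypothetical and do not correspond to `llvm::AMDGPU::getIntegerPairAttribute()`.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

using Range = std::pair<unsigned, unsigned>; // {minimum, maximum}

// Parse "min,max"; return Default when the text is malformed or the request
// lies outside Limits (both fallbacks are assumptions of this sketch).
Range flatWorkGroupSizes(const std::string &Attr, Range Default, Range Limits) {
  assert(Limits.first <= Limits.second && "limits must be ordered");
  unsigned Min = 0, Max = 0;
  char Trailing = 0;
  // Exactly two integers must be read; trailing characters are rejected.
  if (std::sscanf(Attr.c_str(), "%u,%u%c", &Min, &Max, &Trailing) != 2)
    return Default;
  if (Min > Max || Min < Limits.first || Max > Limits.second)
    return Default;
  return {Min, Max};
}
```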
| unsigned AMDGPUSubtarget::getImplicitArgNumBytes | ( | const Function & | F | ) | const |
Definition at line 342 of file AMDGPUSubtarget.cpp.
References llvm::AMDGPU::AMDHSA_COV5, assert(), F, llvm::AMDGPU::getAMDHSACodeObjectVersion(), llvm::AMDGPU::isKernel(), and isMesaKernel().
Referenced by getKernArgSegmentSize().
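| unsigned AMDGPUSubtarget::getKernArgSegmentSize | ( | const Function & | F, |
| Align & | MaxAlign ) const |
Definition at line 386 of file AMDGPUSubtarget.cpp.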
References llvm::alignTo(), llvm::CallingConv::AMDGPU_KERNEL, F, getAlignmentForImplicitArgPtr(), getExplicitKernArgSize(), getExplicitKernelArgOffset(), getImplicitArgNumBytes(), and llvm::CallingConv::SPIR_KERNEL.
Referenced by llvm::AMDGPU::HSAMD::MetadataStreamerMsgPackV4::getHSAKernelProps().
| unsigned AMDGPUSubtarget::getLocalMemorySize | ( | ) | const | inline |
Return the maximum number of bytes of LDS available for all workgroups running on the same WGP or CU.
For GFX10-GFX12 in WGP mode this is 128k even though each workgroup is limited to 64k.
Definition at line 233 of file AMDGPUSubtarget.h.
References LocalMemorySize.
Referenced by getMaxLocalMemSizeWithWaveCount(), and getOccupancyWithWorkGroupSizes().
| virtual unsigned AMDGPUSubtarget::getMaxFlatWorkGroupSize | ( | ) | const | pure virtual |
Implemented in llvm::GCNSubtarget, and llvm::R600Subtarget.
Referenced by getDefaultFlatWorkGroupSize(), and getFlatWorkGroupSizes().
| unsigned AMDGPUSubtarget::getMaxLocalMemSizeWithWaveCount | ( | unsigned | WaveCount, |
| const Function & | F ) const |
Return the amount of LDS that can be used that will not restrict the occupancy lower than WaveCount.
Definition at line 39 of file AMDGPUSubtarget.cpp.
References F, getEUsPerCU(), getFlatWorkGroupSizes(), getLocalMemorySize(), getMaxLocalMemSizeWithWaveCount(), and getWavefrontSize().
Referenced by getMaxLocalMemSizeWithWaveCount().
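One way to picture this query is as a division of the CU's LDS among the workgroups that must be resident to reach the wave count. The sketch below is a simplified model under that assumption; `LdsModel` and `maxLdsForWaveCount` are hypothetical names, and the real function may account for details this model ignores.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical hardware description; the field names are illustrative.
struct LdsModel {
  unsigned LocalMemorySize; // LDS bytes shared by all workgroups on a CU
  unsigned EUsPerCU;        // execution units per CU
  unsigned WavefrontSize;   // workitems per wave
};

// LDS bytes each workgroup may use while WaveCount waves still fit per EU.
unsigned maxLdsForWaveCount(const LdsModel &HW, unsigned WaveCount,
                            unsigned MaxFlatWorkGroupSize) {
  assert(HW.WavefrontSize != 0 && "wavefront size must be non-zero");
  // Waves needed by one workgroup of the largest allowed size.
  unsigned WavesPerGroup = std::max(
      1u, (MaxFlatWorkGroupSize + HW.WavefrontSize - 1) / HW.WavefrontSize);
  // Workgroups that must be resident on the CU to reach WaveCount per EU.
  unsigned GroupsPerCU = std::max(1u, WaveCount * HW.EUsPerCU / WavesPerGroup);
  return HW.LocalMemorySize / GroupsPerCU;
}
```

For a model with 64 KiB of LDS, 4 EUs, and wave64, keeping 4 waves per EU with 256-workitem workgroups leaves 16384 bytes per workgroup.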
| SmallVector< unsigned > AMDGPUSubtarget::getMaxNumWorkGroups | ( | const Function & | F | ) | const |
Return the maximum number of work groups for the function.
Definition at line 428 of file AMDGPUSubtarget.cpp.
References F, and llvm::AMDGPU::getIntegerVecAttribute().
| unsigned AMDGPUSubtarget::getMaxWavesPerEU | ( | ) | const | inline |
Definition at line 293 of file AMDGPUSubtarget.h.
References MaxWavesPerEU.
Referenced by getEffectiveWavesPerEU(), getOccupancyWithWorkGroupSizes(), and getWavesPerEU().
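| virtual unsigned AMDGPUSubtarget::getMaxWorkGroupsPerCU | ( | unsigned | FlatWorkGroupSize | ) | const | pure virtual |
Returns the maximum number of work groups per compute unit for the given FlatWorkGroupSize.
Implemented in llvm::GCNSubtarget, and llvm::R600Subtarget.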
Referenced by getOccupancyWithWorkGroupSizes().
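| unsigned AMDGPUSubtarget::getMaxWorkitemID | ( | const Function & | Kernel, |
| unsigned | Dimension ) const |
Return the maximum workitem ID value in the function, for the given (0, 1, 2) dimension.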
Definition at line 257 of file AMDGPUSubtarget.cpp.
References getFlatWorkGroupSizes(), and getReqdWorkGroupSize().
Referenced by isSingleLaneExecution().
| virtual unsigned AMDGPUSubtarget::getMinFlatWorkGroupSize | ( | ) | const | pure virtual |
Implemented in llvm::GCNSubtarget, and llvm::R600Subtarget.
Referenced by getFlatWorkGroupSizes().
| virtual unsigned AMDGPUSubtarget::getMinWavesPerEU | ( | ) | const | pure virtual |
Implemented in llvm::GCNSubtarget, and llvm::R600Subtarget.
| std::pair< unsigned, unsigned > AMDGPUSubtarget::getOccupancyWithWorkGroupSizes | ( | const MachineFunction & | MF | ) | const |
Subtarget's minimum/maximum occupancy, in number of waves per EU, that can be achieved when the only function running on a CU is MF.
This notably depends on the range of allowed flat group sizes for the function, the amount of per-workgroup LDS space required by the function, and hardware characteristics.
Definition at line 132 of file AMDGPUSubtarget.cpp.
References llvm::MachineFunction::getFunction(), llvm::MachineFunction::getInfo(), and getOccupancyWithWorkGroupSizes().
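The dependence on workgroup size, LDS usage, and hardware limits can be illustrated with a deliberately simplified model. Everything in the sketch is an assumption made for illustration: `OccupancyModel`, `occupancyFor`, and the formula itself are hypothetical, and the real computation may include factors that are not modelled here. The sketch evaluates a single flat workgroup size; evaluating it at both ends of the allowed range would give a minimum/maximum pair.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical hardware limits; real values come from the subtarget.
struct OccupancyModel {
  unsigned LocalMemorySize;    // LDS bytes per CU
  unsigned EUsPerCU;           // execution units per CU
  unsigned WavefrontSize;      // workitems per wave
  unsigned MaxWavesPerEU;      // hardware cap on waves per EU
  unsigned MaxWorkGroupsPerCU; // hardware cap on resident workgroups
};

// Ceiling division helper.
static unsigned divideCeil(unsigned N, unsigned D) { return (N + D - 1) / D; }

// Waves per EU achievable for one flat workgroup size and LDS footprint.
unsigned occupancyFor(const OccupancyModel &HW, unsigned FlatWorkGroupSize,
                      unsigned LDSBytes) {
  assert(FlatWorkGroupSize != 0 && LDSBytes <= HW.LocalMemorySize);
  unsigned WavesPerGroup = divideCeil(FlatWorkGroupSize, HW.WavefrontSize);
  unsigned Groups = HW.MaxWorkGroupsPerCU;
  if (LDSBytes != 0) // LDS limits how many workgroups are resident at once.
    Groups = std::min(Groups, HW.LocalMemorySize / LDSBytes);
  unsigned WavesPerEU = divideCeil(Groups * WavesPerGroup, HW.EUsPerCU);
  return std::max(1u, std::min(WavesPerEU, HW.MaxWavesPerEU));
}
```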
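| std::pair< unsigned, unsigned > AMDGPUSubtarget::getOccupancyWithWorkGroupSizes | ( | uint32_t | LDSBytes, |
| const Function & | F ) const | inline |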
Subtarget's minimum/maximum occupancy, in number of waves per EU, that can be achieved when the only function running on a CU is F and each workgroup running the function requires LDSBytes bytes of LDS space.
This notably depends on the range of allowed flat group sizes for the function and hardware characteristics.
Definition at line 148 of file AMDGPUSubtarget.h.
References F, getFlatWorkGroupSizes(), and getOccupancyWithWorkGroupSizes().
Referenced by llvm::GCNSubtarget::computeOccupancy(), getEffectiveWavesPerEU(), getOccupancyWithWorkGroupSizes(), and getOccupancyWithWorkGroupSizes().
| std::pair< unsigned, unsigned > AMDGPUSubtarget::getOccupancyWithWorkGroupSizes | ( | uint32_t | LDSBytes, |
| std::pair< unsigned, unsigned > | FlatWorkGroupSizes ) const |
Overload which uses the specified values for the flat work group sizes, rather than querying the function itself.
FlatWorkGroupSizes should correspond to the function's value for getFlatWorkGroupSizes.
Definition at line 52 of file AMDGPUSubtarget.cpp.
References llvm::divideCeil(), getEUsPerCU(), getLocalMemorySize(), getMaxWavesPerEU(), getMaxWorkGroupsPerCU(), getWavefrontSize(), and std::swap().
| std::optional< unsigned > AMDGPUSubtarget::getReqdWorkGroupSize | ( | const Function & | F, |
| unsigned | Dim ) const |
Returns the required workgroup size for F in the Dim dimension, if it is known (from !reqd_work_group_size metadata). Otherwise, returns std::nullopt.
Definition at line 227 of file AMDGPUSubtarget.cpp.
References llvm::mdconst::extract(), and llvm::GlobalObject::getMetadata().
Referenced by getMaxWorkitemID(), and makeLIDRangeMetadata().
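The "known or std::nullopt" contract can be sketched without any LLVM types. In the standalone model below, `ReqdSizes` is a hypothetical stand-in for the kernel's !reqd_work_group_size metadata and `reqdWorkGroupSize` is an illustrative name, not the member function itself.

```cpp
#include <array>
#include <cassert>
#include <optional>

// Hypothetical stand-in for !reqd_work_group_size metadata: empty when the
// kernel carries no such metadata.
using ReqdSizes = std::optional<std::array<unsigned, 3>>;

// Returns the required size in dimension Dim (0, 1 or 2) if it is known.
std::optional<unsigned> reqdWorkGroupSize(const ReqdSizes &MD, unsigned Dim) {
  assert(Dim < 3 && "dimension must be 0, 1 or 2");
  if (!MD)
    return std::nullopt;
  return (*MD)[Dim];
}
```

Callers are expected to check the result before using it, for example with `has_value()` or `value_or()`.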
| unsigned AMDGPUSubtarget::getWavefrontSize | ( | ) | const | inline |
Definition at line 221 of file AMDGPUSubtarget.h.
References WavefrontSizeLog2.
Referenced by getAMDGPUDwarfFlavour(), getDefaultFlatWorkGroupSize(), llvm::AMDGPU::HSAMD::MetadataStreamerMsgPackV4::getHSAKernelProps(), getMaxLocalMemSizeWithWaveCount(), getOccupancyWithWorkGroupSizes(), hasWavefrontsEvenlySplittingXDim(), llvm::GCNSubtarget::isWave32(), llvm::GCNSubtarget::isWave64(), lowerFCMPIntrinsic(), and lowerICMPIntrinsic().
| unsigned AMDGPUSubtarget::getWavefrontSizeLog2 | ( | ) | const | inline |
Definition at line 225 of file AMDGPUSubtarget.h.
References WavefrontSizeLog2.
Referenced by llvm::GCNSubtarget::getKnownHighZeroBitsForFrameIndex(), and llvm::GCNSubtarget::initializeSubtargetDependencies().
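| std::pair< unsigned, unsigned > AMDGPUSubtarget::getWavesPerEU | ( | const Function & | F | ) | const |
Returns the subtarget's default pair of minimum/maximum number of waves per execution unit for function F, or the minimum/maximum number of waves per execution unit explicitly requested using the "amdgpu-waves-per-eu" attribute attached to function F.
Definition at line 203 of file AMDGPUSubtarget.cpp.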
References F, getFlatWorkGroupSizes(), llvm::AMDGPU::getIntegerPairAttribute(), and getWavesPerEU().
Referenced by llvm::GCNSubtarget::getMaxNumSGPRs(), llvm::GCNSubtarget::getMaxNumVGPRs(), and getWavesPerEU().
| std::pair< unsigned, unsigned > llvm::AMDGPUSubtarget::getWavesPerEU | ( | const Function & | F, |
| std::pair< unsigned, unsigned > | FlatWorkGroupSizes ) const |
Overload which uses the specified values for the flat work group sizes, rather than querying the function itself.
FlatWorkGroupSizes should correspond to the function's value for getFlatWorkGroupSizes.
References F.
| std::pair< unsigned, unsigned > AMDGPUSubtarget::getWavesPerEU | ( | std::pair< unsigned, unsigned > | FlatWorkGroupSizes, |
| unsigned | LDSBytes, | ||
| const Function & | F ) const |
Overload which uses the specified values for the flat workgroup sizes and LDS space rather than querying the function itself.
FlatWorkGroupSizes should correspond to the function's value for getFlatWorkGroupSizes and LDSBytes to the per-workgroup LDS allocation.
Definition at line 215 of file AMDGPUSubtarget.cpp.
References llvm::Default, F, getEffectiveWavesPerEU(), llvm::AMDGPU::getIntegerPairAttribute(), and getMaxWavesPerEU().
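| virtual unsigned AMDGPUSubtarget::getWavesPerEUForWorkGroup | ( | unsigned | FlatWorkGroupSize | ) | const | pure virtual |
Returns the number of waves per execution unit required to support the given FlatWorkGroupSize.
Implemented in llvm::GCNSubtarget, and llvm::R600Subtarget.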
Referenced by getEffectiveWavesPerEU().
| bool AMDGPUSubtarget::hasFminFmaxLegacy | ( | ) | const | inline |
Definition at line 217 of file AMDGPUSubtarget.h.
References HasFminFmaxLegacy.
| bool AMDGPUSubtarget::hasMulI24 | ( | ) | const | inline |
Definition at line 205 of file AMDGPUSubtarget.h.
References HasMulI24.
| bool AMDGPUSubtarget::hasMulU24 | ( | ) | const | inline |
Definition at line 209 of file AMDGPUSubtarget.h.
References HasMulU24.
| bool AMDGPUSubtarget::hasSMulHi | ( | ) | const | inline |
Definition at line 213 of file AMDGPUSubtarget.h.
References HasSMulHi.
| bool AMDGPUSubtarget::hasWavefrontsEvenlySplittingXDim | ( | const Function & | F, |
| bool | REquiresUniformYZ = false ) const |
Returns true if F will execute in a manner that leaves the X dimension of the workitem ID evenly tiling wavefronts, that is, if X / wavefrontsize is uniform. This is true if either the Y and Z block dimensions are known to always be 1 or if the X dimension will always be a power of 2. If RequiresUniformYZ is true, it also ensures that the Y and Z workitem IDs will be uniform (so, while a (32, 2, 1) launch with wavesize64 would ordinarily pass this test, it won't with RequiresUniformYZ).
This information is currently only gathered from the !reqd_work_group_size metadata on F, but this may be improved in the future.
Definition at line 235 of file AMDGPUSubtarget.cpp.
References llvm::mdconst::extract(), F, getWavefrontSize(), and llvm::isPowerOf2_32().
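The rule described above can be written down as a small predicate over a known (X, Y, Z) block size. This is a sketch of one way to model it, not the code in AMDGPUSubtarget.cpp: `evenlySplitsXDim` and `isPowerOf2` are hypothetical helpers, and the condition used for the uniform Y/Z case (X must be at least one full wave) is an assumption chosen to match the (32, 2, 1) example.

```cpp
#include <cassert>

// True when V is a non-zero power of two.
static bool isPowerOf2(unsigned V) { return V != 0 && (V & (V - 1)) == 0; }

// Hypothetical model of the rule for a known (X, Y, Z) block size.
bool evenlySplitsXDim(unsigned X, unsigned Y, unsigned Z, unsigned WaveSize,
                      bool RequiresUniformYZ = false) {
  assert(isPowerOf2(WaveSize) && "wave size is expected to be a power of 2");
  bool Is1D = Y <= 1 && Z <= 1;
  // Assumption: Y and Z stay uniform within a wave only if a full wave fits
  // inside one row of X.
  bool XLargeEnough = isPowerOf2(X) && (!RequiresUniformYZ || X >= WaveSize);
  return Is1D || XLargeEnough;
}
```

Under this model a (32, 2, 1) launch with a wave size of 64 passes by default and fails once uniform Y and Z IDs are required, while a (48, 2, 1) launch fails in both cases because 48 is not a power of 2.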
| bool AMDGPUSubtarget::isAmdHsaOrMesa | ( | const Function & | F | ) | const | inline |
Definition at line 181 of file AMDGPUSubtarget.h.
References F, isAmdHsaOS(), and isMesaKernel().
| bool AMDGPUSubtarget::isAmdHsaOS | ( | ) | const | inline |
Definition at line 167 of file AMDGPUSubtarget.h.
References llvm::Triple::AMDHSA.
Referenced by getAlignmentForImplicitArgPtr(), llvm::GCNSubtarget::getTrapHandlerAbi(), llvm::GCNSubtarget::initializeSubtargetDependencies(), isAmdHsaOrMesa(), and llvm::AMDGPUAsmPrinter::runOnMachineFunction().
| bool AMDGPUSubtarget::isAmdPalOS | ( | ) | const | inline |
Definition at line 171 of file AMDGPUSubtarget.h.
References llvm::Triple::AMDPAL.
Referenced by llvm::AMDGPUAsmPrinter::runOnMachineFunction().
| bool AMDGPUSubtarget::isGCN | ( | ) | const | inline |
Definition at line 185 of file AMDGPUSubtarget.h.
| bool AMDGPUSubtarget::isMesa3DOS | ( | ) | const | inline |
Definition at line 175 of file AMDGPUSubtarget.h.
References llvm::Triple::Mesa3D.
Referenced by llvm::GCNSubtarget::isMesaGfxShader(), and isMesaKernel().
| bool AMDGPUSubtarget::isMesaKernel | ( | const Function & | F | ) | const |
Definition at line 253 of file AMDGPUSubtarget.cpp.
References F, isMesa3DOS(), and llvm::AMDGPU::isShader().
Referenced by getImplicitArgNumBytes(), and isAmdHsaOrMesa().
| bool AMDGPUSubtarget::isSingleLaneExecution | ( | const Function & | Kernel | ) | const |
Return true if only a single workitem can be active in a wave.
Definition at line 265 of file AMDGPUSubtarget.cpp.
References getMaxWorkitemID(), and I.
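Since the only listed dependency is getMaxWorkitemID(), the check can plausibly be modelled as "no dimension allows a workitem ID above zero". The sketch below assumes exactly that; `isSingleLane` is a hypothetical name and takes the per-dimension maxima directly instead of a Function.

```cpp
#include <array>
#include <cassert>

// Hypothetical input: the maximum workitem ID in each of the X, Y and Z
// dimensions, as a query like getMaxWorkitemID() might report them.
bool isSingleLane(const std::array<unsigned, 3> &MaxWorkitemID) {
  for (unsigned MaxID : MaxWorkitemID)
    if (MaxID > 0) // a second workitem can exist in this dimension
      return false;
  return true;
}
```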
| bool AMDGPUSubtarget::makeLIDRangeMetadata | ( | Instruction * | I | ) | const |
Creates value range metadata on a workitemid.* intrinsic call or load.
Definition at line 274 of file AMDGPUSubtarget.cpp.
References llvm::MDBuilder::createRange(), llvm::dyn_cast(), F, getFlatWorkGroupSizes(), getReqdWorkGroupSize(), I, llvm::Lower, Range, and llvm::Upper.
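The kind of bound such metadata expresses can be shown with a small standalone sketch. It assumes a half-open range starting at zero whose upper bound is the required size in the queried dimension when known, and the maximum flat workgroup size otherwise; `IdRange` and `workitemIdRange` are hypothetical names, and the real function may derive different bounds for other intrinsics or loads.

```cpp
#include <cassert>
#include <optional>

// Half-open range [Lower, Upper) of values a workitem ID may take.
struct IdRange {
  unsigned Lower;
  unsigned Upper;
};

// Hypothetical helper: bound a workitem ID by the required size in its
// dimension when known, else by the maximum flat workgroup size.
IdRange workitemIdRange(std::optional<unsigned> ReqdSize,
                        unsigned MaxFlatWorkGroupSize) {
  assert(MaxFlatWorkGroupSize != 0 && "a workgroup has at least one item");
  unsigned Upper = ReqdSize.value_or(MaxFlatWorkGroupSize);
  return {0, Upper};
}
```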
| bool AMDGPUSubtarget::useRealTrue16Insts | ( | ) | const | inline |
Return true if real (non-fake) variants of True16 instructions using 16-bit registers should be code-generated.
Fake True16 instructions are identical to non-fake ones except that they take 32-bit registers as operands and always use their low halves.
Definition at line 201 of file AMDGPUSubtarget.h.
| unsigned AMDGPUSubtarget::AddressableLocalMemorySize = 0 | protected |
Definition at line 60 of file AMDGPUSubtarget.h.
Referenced by getAddressableLocalMemorySize(), llvm::GCNSubtarget::initializeSubtargetDependencies(), and llvm::R600Subtarget::R600Subtarget().
| unsigned AMDGPUSubtarget::EUsPerCU = 4 | protected |
Definition at line 57 of file AMDGPUSubtarget.h.
Referenced by llvm::GCNSubtarget::GCNSubtarget(), and getEUsPerCU().
| bool AMDGPUSubtarget::HasFminFmaxLegacy = true | protected |
Definition at line 55 of file AMDGPUSubtarget.h.
Referenced by hasFminFmaxLegacy(), and llvm::GCNSubtarget::initializeSubtargetDependencies().
| bool AMDGPUSubtarget::HasMulI24 = true | protected |
Definition at line 52 of file AMDGPUSubtarget.h.
Referenced by hasMulI24(), and llvm::R600Subtarget::initializeSubtargetDependencies().
| bool AMDGPUSubtarget::HasMulU24 = true | protected |
Definition at line 53 of file AMDGPUSubtarget.h.
Referenced by hasMulU24(), and llvm::R600Subtarget::initializeSubtargetDependencies().
| bool AMDGPUSubtarget::HasSMulHi = false | protected |
Definition at line 54 of file AMDGPUSubtarget.h.
Referenced by hasSMulHi(), and llvm::GCNSubtarget::initializeSubtargetDependencies().
| unsigned AMDGPUSubtarget::LocalMemorySize = 0 | protected |
Definition at line 59 of file AMDGPUSubtarget.h.
Referenced by getLocalMemorySize(), llvm::GCNSubtarget::initializeSubtargetDependencies(), and llvm::R600Subtarget::R600Subtarget().
| unsigned AMDGPUSubtarget::MaxWavesPerEU = 10 | protected |
Definition at line 58 of file AMDGPUSubtarget.h.
Referenced by llvm::GCNSubtarget::GCNSubtarget(), and getMaxWavesPerEU().
| char AMDGPUSubtarget::WavefrontSizeLog2 = 0 | protected |
Definition at line 61 of file AMDGPUSubtarget.h.
Referenced by getWavefrontSize(), getWavefrontSizeLog2(), and llvm::GCNSubtarget::initializeSubtargetDependencies().