LLVM 23.0.0git
AMDGPULowerModuleLDSPass.cpp
Go to the documentation of this file.
1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different from different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowering are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
88//
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any futher variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
102//
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
125//
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// said variables. This pass essentially specialises said variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUMemoryUtils.h"
181#include "AMDGPUTargetMachine.h"
182#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
188#include "llvm/ADT/SmallString.h"
192#include "llvm/IR/Constants.h"
193#include "llvm/IR/DerivedTypes.h"
194#include "llvm/IR/Dominators.h"
195#include "llvm/IR/IRBuilder.h"
196#include "llvm/IR/InlineAsm.h"
197#include "llvm/IR/Instructions.h"
198#include "llvm/IR/IntrinsicsAMDGPU.h"
199#include "llvm/IR/MDBuilder.h"
202#include "llvm/Pass.h"
204#include "llvm/Support/Debug.h"
205#include "llvm/Support/Format.h"
210
211#include <vector>
212
213#include <cstdio>
214
215#define DEBUG_TYPE "amdgpu-lower-module-lds"
216
217using namespace llvm;
218using namespace AMDGPU;
219
220namespace {
221
222cl::opt<bool> SuperAlignLDSGlobals(
223 "amdgpu-super-align-lds-globals",
224 cl::desc("Increase alignment of LDS if it is not on align boundary"),
225 cl::init(true), cl::Hidden);
226
227enum class LoweringKind { module, table, kernel, hybrid };
228cl::opt<LoweringKind> LoweringKindLoc(
229 "amdgpu-lower-module-lds-strategy",
230 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
231 cl::init(LoweringKind::hybrid),
233 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
234 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
236 LoweringKind::kernel, "kernel",
237 "Lower variables reachable from one kernel, otherwise abort"),
238 clEnumValN(LoweringKind::hybrid, "hybrid",
239 "Lower via mixture of above strategies")));
240
241template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
242 llvm::sort(V, [](const auto *L, const auto *R) {
243 return L->getName() < R->getName();
244 });
245 return {std::move(V)};
246}
247
248class AMDGPULowerModuleLDS {
249 const AMDGPUTargetMachine &TM;
250
251 static void
252 removeLocalVarsFromUsedLists(Module &M,
253 const DenseSet<GlobalVariable *> &LocalVars) {
254 // The verifier rejects used lists containing an inttoptr of a constant
255 // so remove the variables from these lists before replaceAllUsesWith
256 SmallPtrSet<Constant *, 8> LocalVarsSet;
257 for (GlobalVariable *LocalVar : LocalVars)
258 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
259
261 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
262
263 for (GlobalVariable *LocalVar : LocalVars)
264 LocalVar->removeDeadConstantUsers();
265 }
266
267 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
268 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
269 // that might call a function which accesses a field within it. This is
270 // presently approximated to 'all kernels' if there are any such functions
271 // in the module. This implicit use is redefined as an explicit use here so
272 // that later passes, specifically PromoteAlloca, account for the required
273 // memory without any knowledge of this transform.
274
275 // An operand bundle on llvm.donothing works because the call instruction
276 // survives until after the last pass that needs to account for LDS. It is
277 // better than inline asm as the latter survives until the end of codegen. A
278 // totally robust solution would be a function with the same semantics as
279 // llvm.donothing that takes a pointer to the instance and is lowered to a
280 // no-op after LDS is allocated, but that is not presently necessary.
281
282 // This intrinsic is eliminated shortly before instruction selection. It
283 // does not suffice to indicate to ISel that a given global which is not
284 // immediately used by the kernel must still be allocated by it. An
285 // equivalent target specific intrinsic which lasts until immediately after
286 // codegen would suffice for that, but one would still need to ensure that
287 // the variables are allocated in the anticipated order.
288 BasicBlock *Entry = &Func->getEntryBlock();
289 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
290
292 Func->getParent(), Intrinsic::donothing, {});
293
294 Value *UseInstance[1] = {
295 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
296
297 Builder.CreateCall(
298 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
299 }
300
301public:
302 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
303
304 struct LDSVariableReplacement {
305 GlobalVariable *SGV = nullptr;
306 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
307 };
308
309 // remap from lds global to a constantexpr gep to where it has been moved to
310 // for each kernel
311 // an array with an element for each kernel containing where the corresponding
312 // variable was remapped to
313
314 static Constant *getAddressesOfVariablesInKernel(
316 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
317 // Create a ConstantArray containing the address of each Variable within the
318 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
319 // does not allocate it
320
322 ArrayType *KernelOffsetsType = ArrayType::get(LocalPtrTy, Variables.size());
323
325 for (GlobalVariable *GV : Variables) {
326 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
327 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
328 Elements.push_back(ConstantGepIt->second);
329 } else {
330 Elements.push_back(PoisonValue::get(LocalPtrTy));
331 }
332 }
333 return ConstantArray::get(KernelOffsetsType, Elements);
334 }
335
336 static GlobalVariable *buildLookupTable(
338 ArrayRef<Function *> kernels,
340 if (Variables.empty()) {
341 return nullptr;
342 }
343 LLVMContext &Ctx = M.getContext();
344
345 const size_t NumberVariables = Variables.size();
346 const size_t NumberKernels = kernels.size();
347
349 ArrayType *KernelOffsetsType = ArrayType::get(LocalPtrTy, NumberVariables);
350
351 ArrayType *AllKernelsOffsetsType =
352 ArrayType::get(KernelOffsetsType, NumberKernels);
353
354 Constant *Missing = PoisonValue::get(KernelOffsetsType);
355 std::vector<Constant *> overallConstantExprElts(NumberKernels);
356 for (size_t i = 0; i < NumberKernels; i++) {
357 auto Replacement = KernelToReplacement.find(kernels[i]);
358 overallConstantExprElts[i] =
359 (Replacement == KernelToReplacement.end())
360 ? Missing
361 : getAddressesOfVariablesInKernel(
362 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
363 }
364
365 Constant *init =
366 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
367
368 return new GlobalVariable(
369 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
370 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
372 }
373
374 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
375 GlobalVariable *LookupTable,
376 GlobalVariable *GV, Use &U,
377 Value *OptionalIndex) {
378 // Table is a constant array of the same length as OrderedKernels
379 LLVMContext &Ctx = M.getContext();
380 Type *I32 = Type::getInt32Ty(Ctx);
381 auto *I = cast<Instruction>(U.getUser());
382
383 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
384
385 if (auto *Phi = dyn_cast<PHINode>(I)) {
386 BasicBlock *BB = Phi->getIncomingBlock(U);
387 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
388 } else {
389 Builder.SetInsertPoint(I);
390 }
391
392 SmallVector<Value *, 3> GEPIdx = {
393 ConstantInt::get(I32, 0),
394 tableKernelIndex,
395 };
396 if (OptionalIndex)
397 GEPIdx.push_back(OptionalIndex);
398
399 Value *Address = Builder.CreateInBoundsGEP(
400 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
401
402 Value *Loaded = Builder.CreateLoad(GV->getType(), Address);
403 U.set(Loaded);
404 }
405
406 void replaceUsesInInstructionsWithTableLookup(
407 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
408 GlobalVariable *LookupTable) {
409
410 LLVMContext &Ctx = M.getContext();
411 IRBuilder<> Builder(Ctx);
412 Type *I32 = Type::getInt32Ty(Ctx);
413
414 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
415 auto *GV = ModuleScopeVariables[Index];
416
417 for (Use &U : make_early_inc_range(GV->uses())) {
418 auto *I = dyn_cast<Instruction>(U.getUser());
419 if (!I)
420 continue;
421
422 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
423 ConstantInt::get(I32, Index));
424 }
425 }
426 }
427
428 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
429 Module &M, LDSUsesInfoTy &LDSUsesInfo,
430 DenseSet<GlobalVariable *> const &VariableSet) {
431
432 DenseSet<Function *> KernelSet;
433
434 if (VariableSet.empty())
435 return KernelSet;
436
437 for (Function &Func : M.functions()) {
438 if (Func.isDeclaration() || !isKernel(Func))
439 continue;
440 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
441 if (VariableSet.contains(GV)) {
442 KernelSet.insert(&Func);
443 break;
444 }
445 }
446 }
447
448 return KernelSet;
449 }
450
451 static GlobalVariable *
452 chooseBestVariableForModuleStrategy(const DataLayout &DL,
453 VariableFunctionMap &LDSVars) {
454 // Find the global variable with the most indirect uses from kernels
455
456 struct CandidateTy {
457 GlobalVariable *GV = nullptr;
458 size_t UserCount = 0;
459 size_t Size = 0;
460
461 CandidateTy() = default;
462
463 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
464 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
465
466 bool operator<(const CandidateTy &Other) const {
467 // Fewer users makes module scope variable less attractive
468 if (UserCount < Other.UserCount) {
469 return true;
470 }
471 if (UserCount > Other.UserCount) {
472 return false;
473 }
474
475 // Bigger makes module scope variable less attractive
476 if (Size < Other.Size) {
477 return false;
478 }
479
480 if (Size > Other.Size) {
481 return true;
482 }
483
484 // Arbitrary but consistent
485 return GV->getName() < Other.GV->getName();
486 }
487 };
488
489 CandidateTy MostUsed;
490
491 for (auto &K : LDSVars) {
492 GlobalVariable *GV = K.first;
493 if (K.second.size() <= 1) {
494 // A variable reachable by only one kernel is best lowered with kernel
495 // strategy
496 continue;
497 }
498 CandidateTy Candidate(GV, K.second.size(), GV->getGlobalSize(DL));
499 if (MostUsed < Candidate)
500 MostUsed = Candidate;
501 }
502
503 return MostUsed.GV;
504 }
505
506 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
507 uint32_t Address) {
508 // Write the specified address into metadata where it can be retrieved by
509 // the assembler. Format is a half open range, [Address Address+1)
510 LLVMContext &Ctx = M->getContext();
511 auto *IntTy =
512 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
513 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
514 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
515 GV->setMetadata(LLVMContext::MD_absolute_symbol,
516 MDNode::get(Ctx, {MinC, MaxC}));
517 }
518
519 DenseMap<Function *, Value *> tableKernelIndexCache;
520 Value *getTableLookupKernelIndex(Module &M, Function *F) {
521 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
522 // lowers to a read from a live in register. Emit it once in the entry
523 // block to spare deduplicating it later.
524 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
525 if (Inserted) {
526 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
527 IRBuilder<> Builder(&*InsertAt);
528
529 It->second = Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {});
530 }
531
532 return It->second;
533 }
534
535 static std::vector<Function *> assignLDSKernelIDToEachKernel(
536 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
537 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
538 // Associate kernels in the set with an arbitrary but reproducible order and
539 // annotate them with that order in metadata. This metadata is recognised by
540 // the backend and lowered to a SGPR which can be read from using
541 // amdgcn_lds_kernel_id.
542
543 std::vector<Function *> OrderedKernels;
544 if (!KernelsThatAllocateTableLDS.empty() ||
545 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
546
547 for (Function &Func : M->functions()) {
548 if (Func.isDeclaration())
549 continue;
550 if (!isKernel(Func))
551 continue;
552
553 if (KernelsThatAllocateTableLDS.contains(&Func) ||
554 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
555 assert(Func.hasName()); // else fatal error earlier
556 OrderedKernels.push_back(&Func);
557 }
558 }
559
560 // Put them in an arbitrary but reproducible order
561 OrderedKernels = sortByName(std::move(OrderedKernels));
562
563 // Annotate the kernels with their order in this vector
564 LLVMContext &Ctx = M->getContext();
565 IRBuilder<> Builder(Ctx);
566
567 if (OrderedKernels.size() > UINT32_MAX) {
568 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
569 reportFatalUsageError("unimplemented LDS lowering for > 2**32 kernels");
570 }
571
572 for (size_t i = 0; i < OrderedKernels.size(); i++) {
573 Metadata *AttrMDArgs[1] = {
574 ConstantAsMetadata::get(Builder.getInt32(i)),
575 };
576 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
577 MDNode::get(Ctx, AttrMDArgs));
578 }
579 }
580 return OrderedKernels;
581 }
582
583 static void partitionVariablesIntoIndirectStrategies(
584 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
585 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
586 DenseSet<GlobalVariable *> &ModuleScopeVariables,
587 DenseSet<GlobalVariable *> &TableLookupVariables,
588 DenseSet<GlobalVariable *> &KernelAccessVariables,
589 DenseSet<GlobalVariable *> &DynamicVariables) {
590
591 GlobalVariable *HybridModuleRoot =
592 LoweringKindLoc != LoweringKind::hybrid
593 ? nullptr
594 : chooseBestVariableForModuleStrategy(
595 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
596
597 DenseSet<Function *> const EmptySet;
598 DenseSet<Function *> const &HybridModuleRootKernels =
599 HybridModuleRoot
600 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
601 : EmptySet;
602
603 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
604 // Each iteration of this loop assigns exactly one global variable to
605 // exactly one of the implementation strategies.
606
607 GlobalVariable *GV = K.first;
609 assert(!K.second.empty());
610
611 if (AMDGPU::isDynamicLDS(*GV)) {
612 DynamicVariables.insert(GV);
613 continue;
614 }
615
616 switch (LoweringKindLoc) {
617 case LoweringKind::module:
618 ModuleScopeVariables.insert(GV);
619 break;
620
621 case LoweringKind::table:
622 TableLookupVariables.insert(GV);
623 break;
624
625 case LoweringKind::kernel:
626 if (K.second.size() == 1) {
627 KernelAccessVariables.insert(GV);
628 } else {
629 // FIXME: This should use DiagnosticInfo
631 "cannot lower LDS '" + GV->getName() +
632 "' to kernel access as it is reachable from multiple kernels");
633 }
634 break;
635
636 case LoweringKind::hybrid: {
637 if (GV == HybridModuleRoot) {
638 assert(K.second.size() != 1);
639 ModuleScopeVariables.insert(GV);
640 } else if (K.second.size() == 1) {
641 KernelAccessVariables.insert(GV);
642 } else if (K.second == HybridModuleRootKernels) {
643 ModuleScopeVariables.insert(GV);
644 } else {
645 TableLookupVariables.insert(GV);
646 }
647 break;
648 }
649 }
650 }
651
652 // All LDS variables accessed indirectly have now been partitioned into
653 // the distinct lowering strategies.
654 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
655 KernelAccessVariables.size() + DynamicVariables.size() ==
656 LDSToKernelsThatNeedToAccessItIndirectly.size());
657 }
658
659 static GlobalVariable *lowerModuleScopeStructVariables(
660 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
661 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
662 // Create a struct to hold the ModuleScopeVariables
663 // Replace all uses of those variables from non-kernel functions with the
664 // new struct instance Replace only the uses from kernel functions that will
665 // allocate this instance. That is a space optimisation - kernels that use a
666 // subset of the module scope struct and do not need to allocate it for
667 // indirect calls will only allocate the subset they use (they do so as part
668 // of the per-kernel lowering).
669 if (ModuleScopeVariables.empty()) {
670 return nullptr;
671 }
672
673 LLVMContext &Ctx = M.getContext();
674
675 LDSVariableReplacement ModuleScopeReplacement =
676 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
677 ModuleScopeVariables);
678
679 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
681 cast<Constant>(ModuleScopeReplacement.SGV),
682 PointerType::getUnqual(Ctx)))});
683
684 // module.lds will be allocated at zero in any kernel that allocates it
685 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
686
687 // historic
688 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
689
690 // Replace all uses of module scope variable from non-kernel functions
691 replaceLDSVariablesWithStruct(
692 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
693 Instruction *I = dyn_cast<Instruction>(U.getUser());
694 if (!I) {
695 return false;
696 }
697 Function *F = I->getFunction();
698 return !isKernel(*F);
699 });
700
701 // Replace uses of module scope variable from kernel functions that
702 // allocate the module scope variable, otherwise leave them unchanged
703 // Record on each kernel whether the module scope global is used by it
704
705 for (Function &Func : M.functions()) {
706 if (Func.isDeclaration() || !isKernel(Func))
707 continue;
708
709 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
710 replaceLDSVariablesWithStruct(
711 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
712 Instruction *I = dyn_cast<Instruction>(U.getUser());
713 if (!I) {
714 return false;
715 }
716 Function *F = I->getFunction();
717 return F == &Func;
718 });
719
720 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
721 }
722 }
723
724 return ModuleScopeReplacement.SGV;
725 }
726
728 lowerKernelScopeStructVariables(
729 Module &M, LDSUsesInfoTy &LDSUsesInfo,
730 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
731 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
732 GlobalVariable *MaybeModuleScopeStruct) {
733
734 // Create a struct for each kernel for the non-module-scope variables.
735
737 for (Function &Func : M.functions()) {
738 if (Func.isDeclaration() || !isKernel(Func))
739 continue;
740
741 DenseSet<GlobalVariable *> KernelUsedVariables;
742 // Allocating variables that are used directly in this struct to get
743 // alignment aware allocation and predictable frame size.
744 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
745 if (!AMDGPU::isDynamicLDS(*v)) {
746 KernelUsedVariables.insert(v);
747 }
748 }
749
750 // Allocating variables that are accessed indirectly so that a lookup of
751 // this struct instance can find them from nested functions.
752 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
753 if (!AMDGPU::isDynamicLDS(*v)) {
754 KernelUsedVariables.insert(v);
755 }
756 }
757
758 // Variables allocated in module lds must all resolve to that struct,
759 // not to the per-kernel instance.
760 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
761 for (GlobalVariable *v : ModuleScopeVariables) {
762 KernelUsedVariables.erase(v);
763 }
764 }
765
766 if (KernelUsedVariables.empty()) {
767 // Either used no LDS, or the LDS it used was all in the module struct
768 // or dynamically sized
769 continue;
770 }
771
772 // The association between kernel function and LDS struct is done by
773 // symbol name, which only works if the function in question has a
774 // name This is not expected to be a problem in practice as kernels
775 // are called by name making anonymous ones (which are named by the
776 // backend) difficult to use. This does mean that llvm test cases need
777 // to name the kernels.
778 if (!Func.hasName()) {
779 reportFatalUsageError("anonymous kernels cannot use LDS variables");
780 }
781
782 std::string VarName =
783 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
784
785 auto Replacement =
786 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
787
788 // If any indirect uses, create a direct use to ensure allocation
789 // TODO: Simpler to unconditionally mark used but that regresses
790 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
791 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
792 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
793 !Accesses->second.empty())
794 markUsedByKernel(&Func, Replacement.SGV);
795
796 // remove preserves existing codegen
797 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
798 KernelToReplacement[&Func] = Replacement;
799
800 // Rewrite uses within kernel to the new struct
801 replaceLDSVariablesWithStruct(
802 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
803 Instruction *I = dyn_cast<Instruction>(U.getUser());
804 return I && I->getFunction() == &Func;
805 });
806 }
807 return KernelToReplacement;
808 }
809
810 static GlobalVariable *
811 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
812 Function *func) {
813 // Create a dynamic lds variable with a name associated with the passed
814 // function that has the maximum alignment of any dynamic lds variable
815 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
816 // allocation, possibly after alignment padding. The representative variable
817 // created here has the maximum alignment of any other dynamic variable
818 // reachable by that kernel. All dynamic LDS variables are allocated at the
819 // same address in each kernel in order to provide the documented aliasing
820 // semantics. Setting the alignment here allows this IR pass to accurately
821 // predict the exact constant at which it will be allocated.
822
823 assert(isKernel(*func));
824
825 LLVMContext &Ctx = M.getContext();
826 const DataLayout &DL = M.getDataLayout();
827 Align MaxDynamicAlignment(1);
828
829 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
830 if (AMDGPU::isDynamicLDS(*GV)) {
831 MaxDynamicAlignment =
832 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
833 }
834 };
835
836 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
837 UpdateMaxAlignment(GV);
838 }
839
840 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
841 UpdateMaxAlignment(GV);
842 }
843
844 assert(func->hasName()); // Checked by caller
845 auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
847 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
848 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr,
850 N->setAlignment(MaxDynamicAlignment);
851
853 return N;
854 }
855
856 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
857 Module &M, LDSUsesInfoTy &LDSUsesInfo,
858 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
859 DenseSet<GlobalVariable *> const &DynamicVariables,
860 std::vector<Function *> const &OrderedKernels) {
861 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
862 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
863 LLVMContext &Ctx = M.getContext();
864 IRBuilder<> Builder(Ctx);
866
867 std::vector<Constant *> newDynamicLDS;
868
869 // Table is built in the same order as OrderedKernels
870 for (auto &func : OrderedKernels) {
871
872 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
873 assert(isKernel(*func));
874 if (!func->hasName()) {
875 reportFatalUsageError("anonymous kernels cannot use LDS variables");
876 }
877
879 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
880
881 KernelToCreatedDynamicLDS[func] = N;
882
883 markUsedByKernel(func, N);
884
885 newDynamicLDS.push_back(N);
886 } else {
887 newDynamicLDS.push_back(PoisonValue::get(LocalPtrTy));
888 }
889 }
890 assert(OrderedKernels.size() == newDynamicLDS.size());
891
892 ArrayType *t = ArrayType::get(LocalPtrTy, newDynamicLDS.size());
893 Constant *init = ConstantArray::get(t, newDynamicLDS);
894 GlobalVariable *table = new GlobalVariable(
895 M, t, true, GlobalValue::InternalLinkage, init,
896 "llvm.amdgcn.dynlds.offset.table", nullptr,
898
899 for (GlobalVariable *GV : DynamicVariables) {
900 for (Use &U : make_early_inc_range(GV->uses())) {
901 auto *I = dyn_cast<Instruction>(U.getUser());
902 if (!I)
903 continue;
904 if (isKernel(*I->getFunction()))
905 continue;
906
907 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
908 }
909 }
910 }
911 return KernelToCreatedDynamicLDS;
912 }
913
914 // Per-TU mode for link-time LDS resolution. Instead of computing a global
915 // layout, create per-function LDS struct declarations so the linker can
916 // assign offsets across TUs.
917 bool runOnModuleLinkTime(Module &M) {
918 bool Changed = superAlignLDSGlobals(M);
920
921 CallGraph CG(M);
922 FunctionVariableMap KernelLDSUses, FunctionLDSUses;
923 getUsesOfLDSByFunction(CG, M, KernelLDSUses, FunctionLDSUses);
924
925 if (KernelLDSUses.empty() && FunctionLDSUses.empty())
926 return Changed;
927
928 std::string ModuleId = getUniqueModuleId(&M);
929 assert(!ModuleId.empty() &&
930 "modules with LDS variables should have a unique ID");
931
932 FunctionVariableMap AllLDSUses;
933 for (auto &[F, Vars] : KernelLDSUses)
934 AllLDSUses[F].insert(Vars.begin(), Vars.end());
935 for (auto &[F, Vars] : FunctionLDSUses)
936 AllLDSUses[F].insert(Vars.begin(), Vars.end());
937
938 // Named barriers are handled by AMDGPULowerExecSync; filter them out.
939 for (auto &[F, Vars] : AllLDSUses) {
941 for (GlobalVariable *V : Vars)
943 Barriers.push_back(V);
944 for (GlobalVariable *V : Barriers)
945 Vars.erase(V);
946 }
947
948 // Build reverse map: LDS variable -> functions that use it.
950 for (auto &[F, Vars] : AllLDSUses) {
951 for (GlobalVariable *V : Vars)
952 VarToFuncs[V].push_back(F);
953 }
954
955 // A variable is function-scope iff it has local linkage and exactly one
956 // user function. Everything else is global-scope and must remain as a
957 // standalone external declaration so the linker can assign a single shared
958 // offset.
959 DenseSet<GlobalVariable *> GlobalScopeVars;
960 DenseSet<GlobalVariable *> InternalMultiUserVars;
961 for (auto &[V, Funcs] : VarToFuncs) {
962 if (!V->hasLocalLinkage() || Funcs.size() > 1) {
963 GlobalScopeVars.insert(V);
964 if (V->hasLocalLinkage())
965 InternalMultiUserVars.insert(V);
966 }
967 }
968
969 // Wrap function-scope LDS into per-function structs (unchanged logic,
970 // but global-scope variables are excluded from the set).
972 DenseSet<GlobalVariable *> AllReplacedVars;
973 for (auto &KV : AllLDSUses) {
974 Function *F = KV.first;
975 DenseSet<GlobalVariable *> FuncScopeVars;
976 for (GlobalVariable *V : KV.second) {
977 if (!GlobalScopeVars.count(V))
978 FuncScopeVars.insert(V);
979 }
980
981 if (FuncScopeVars.empty())
982 continue;
983
984 std::string StructName =
985 F->hasLocalLinkage()
986 ? ("__amdgpu_lds." + F->getName() + ModuleId).str()
987 : ("__amdgpu_lds." + F->getName()).str();
988 LDSVariableReplacement Replacement =
989 createLDSVariableReplacement(M, StructName, FuncScopeVars);
990
991 GlobalVariable *SGV = Replacement.SGV;
993 SGV->setInitializer(nullptr);
994 FuncToLdsStruct.push_back({F, SGV});
995
996 replaceLDSVariablesWithStruct(
997 M, FuncScopeVars, Replacement, [F](const Use &U) {
998 auto *I = dyn_cast<Instruction>(U.getUser());
999 return I && I->getFunction() == F;
1000 });
1001
1002 AllReplacedVars.insert(FuncScopeVars.begin(), FuncScopeVars.end());
1003 }
1004
1005 // Internal-linkage LDS variables used by multiple functions would collide
1006 // across TUs if promoted individually to external linkage (same name in
1007 // different TUs). Pack them into a single per-module struct with a
1008 // module-unique name so the linker treats them as one allocation unit.
1009 if (!InternalMultiUserVars.empty()) {
1010 std::string StructName = "__amdgpu_lds.__internal" + ModuleId;
1011 LDSVariableReplacement Replacement =
1012 createLDSVariableReplacement(M, StructName, InternalMultiUserVars);
1013
1014 GlobalVariable *SGV = Replacement.SGV;
1016 SGV->setInitializer(nullptr);
1017
1018 replaceLDSVariablesWithStruct(
1019 M, InternalMultiUserVars, Replacement,
1020 [](const Use &U) { return isa<Instruction>(U.getUser()); });
1021
1022 DenseSet<Function *> FuncsUsingInternalVars;
1023 for (GlobalVariable *V : InternalMultiUserVars) {
1024 for (Function *F : VarToFuncs[V])
1025 FuncsUsingInternalVars.insert(F);
1026 }
1027 for (Function *F : FuncsUsingInternalVars)
1028 FuncToLdsStruct.push_back({F, SGV});
1029
1030 AllReplacedVars.insert(InternalMultiUserVars.begin(),
1031 InternalMultiUserVars.end());
1032 }
1033
1034 // Convert global-scope LDS to external declarations. Their uses remain
1035 // intact and ISel generates R_AMDGPU_ABS32_LO relocations for them.
1036 for (GlobalVariable *V : GlobalScopeVars) {
1037 V->setInitializer(nullptr);
1038 V->setLinkage(GlobalValue::ExternalLinkage);
1039 }
1040
1041 // Emit amdgpu.lds.uses metadata for struct and global-scope LDS.
1042 {
1043 LLVMContext &Ctx = M.getContext();
1044 NamedMDNode *LdsMD = M.getOrInsertNamedMetadata("amdgpu.lds.uses");
1045
1046 for (auto &[F, SGV] : FuncToLdsStruct)
1047 LdsMD->addOperand(MDNode::get(
1049
1050 for (auto &[V, Funcs] : VarToFuncs) {
1051 if (GlobalScopeVars.count(V) && !InternalMultiUserVars.count(V)) {
1052 for (Function *F : Funcs) {
1053 LdsMD->addOperand(MDNode::get(
1055 }
1056 }
1057 }
1058 }
1059
1060 M.addModuleFlag(Module::Error, "amdgpu-link-time-lds", 1);
1061
1062 DenseSet<GlobalVariable *> AllLDSVarsForCleanup = AllReplacedVars;
1063 AllLDSVarsForCleanup.insert(GlobalScopeVars.begin(), GlobalScopeVars.end());
1064 removeLocalVarsFromUsedLists(M, AllLDSVarsForCleanup);
1065 for (GlobalVariable *GV : AllReplacedVars) {
1067 if (GV->use_empty())
1068 GV->eraseFromParent();
1069 }
1070
1071 return true;
1072 }
1073
1074 bool runOnModule(Module &M) {
1076 return runOnModuleLinkTime(M);
1077 return runOnModuleNormal(M);
1078 }
1079
1080 bool runOnModuleNormal(Module &M) {
1081 CallGraph CG = CallGraph(M);
1082 bool Changed = superAlignLDSGlobals(M);
1083
1085
1086 Changed = true; // todo: narrow this down
1087
1088 // For each kernel, what variables does it access directly or through
1089 // callees
1090 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1091
1092 // For each variable accessed through callees, which kernels access it
1093 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1094 for (auto &K : LDSUsesInfo.indirect_access) {
1095 Function *F = K.first;
1096 assert(isKernel(*F));
1097 for (GlobalVariable *GV : K.second) {
1098 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1099 }
1100 }
1101
1102 // Partition variables accessed indirectly into the different strategies
1103 DenseSet<GlobalVariable *> ModuleScopeVariables;
1104 DenseSet<GlobalVariable *> TableLookupVariables;
1105 DenseSet<GlobalVariable *> KernelAccessVariables;
1106 DenseSet<GlobalVariable *> DynamicVariables;
1107 partitionVariablesIntoIndirectStrategies(
1108 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1109 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1110 DynamicVariables);
1111
1112 // If the kernel accesses a variable that is going to be stored in the
1113 // module instance through a call then that kernel needs to allocate the
1114 // module instance
1115 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1116 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1117 ModuleScopeVariables);
1118 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1119 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1120 TableLookupVariables);
1121
1122 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1123 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1124 DynamicVariables);
1125
1126 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1127 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1128
1130 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1131 KernelsThatAllocateModuleLDS,
1132 MaybeModuleScopeStruct);
1133
1134 // Lower zero cost accesses to the kernel instances just created
1135 for (auto &GV : KernelAccessVariables) {
1136 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1137 assert(funcs.size() == 1); // Only one kernel can access it
1138 LDSVariableReplacement Replacement =
1139 KernelToReplacement[*(funcs.begin())];
1140
1142 Vec.insert(GV);
1143
1144 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1145 return isa<Instruction>(U.getUser());
1146 });
1147 }
1148
1149 // The ith element of this vector is kernel id i
1150 std::vector<Function *> OrderedKernels =
1151 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1152 KernelsThatIndirectlyAllocateDynamicLDS);
1153
1154 if (!KernelsThatAllocateTableLDS.empty()) {
1155 LLVMContext &Ctx = M.getContext();
1156 IRBuilder<> Builder(Ctx);
1157
1158 // The order must be consistent between lookup table and accesses to
1159 // lookup table
1160 auto TableLookupVariablesOrdered =
1161 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1162 TableLookupVariables.end()));
1163
1164 GlobalVariable *LookupTable = buildLookupTable(
1165 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1166 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1167 LookupTable);
1168 }
1169
1170 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1171 lowerDynamicLDSVariables(M, LDSUsesInfo,
1172 KernelsThatIndirectlyAllocateDynamicLDS,
1173 DynamicVariables, OrderedKernels);
1174
1175 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1176 // kernel. We may have inferred this wasn't used prior to the pass.
1177 // TODO: We could filter out subgraphs that do not access LDS globals.
1178 for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
1179 &KernelsThatAllocateTableLDS})
1180 for (Function *F : *KernelSet)
1181 removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});
1182
1183 // All kernel frames have been allocated. Calculate and record the
1184 // addresses.
1185 {
1186 const DataLayout &DL = M.getDataLayout();
1187
1188 for (Function &Func : M.functions()) {
1189 if (Func.isDeclaration() || !isKernel(Func))
1190 continue;
1191
1192 // All three of these are optional. The first variable is allocated at
1193 // zero. They are allocated by AMDGPUMachineFunctionInfo as one block.
1194 // Layout:
1195 //{
1196 // module.lds
1197 // alignment padding
1198 // kernel instance
1199 // alignment padding
1200 // dynamic lds variables
1201 //}
1202
1203 const bool AllocateModuleScopeStruct =
1204 MaybeModuleScopeStruct &&
1205 KernelsThatAllocateModuleLDS.contains(&Func);
1206
1207 auto Replacement = KernelToReplacement.find(&Func);
1208 const bool AllocateKernelScopeStruct =
1209 Replacement != KernelToReplacement.end();
1210
1211 const bool AllocateDynamicVariable =
1212 KernelToCreatedDynamicLDS.contains(&Func);
1213
1214 uint32_t Offset = 0;
1215
1216 if (AllocateModuleScopeStruct) {
1217 // Allocated at zero, recorded once on construction, not once per
1218 // kernel
1219 Offset += MaybeModuleScopeStruct->getGlobalSize(DL);
1220 }
1221
1222 if (AllocateKernelScopeStruct) {
1223 GlobalVariable *KernelStruct = Replacement->second.SGV;
1224 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1225 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1226 Offset += KernelStruct->getGlobalSize(DL);
1227 }
1228
1229 // If there is dynamic allocation, the alignment needed is included in
1230 // the static frame size. There may be no reference to the dynamic
1231 // variable in the kernel itself, so without including it here, that
1232 // alignment padding could be missed.
1233 if (AllocateDynamicVariable) {
1234 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1235 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1236 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1237 }
1238
1239 if (Offset != 0) {
1240 (void)TM; // TODO: Account for target maximum LDS
1241 std::string Buffer;
1242 raw_string_ostream SS{Buffer};
1243 SS << format("%u", Offset);
1244
1245 // Instead of explicitly marking kernels that access dynamic variables
1246 // using special case metadata, annotate with min-lds == max-lds, i.e.
1247 // that there is no more space available for allocating more static
1248 // LDS variables. That is the right condition to prevent allocating
1249 // more variables which would collide with the addresses assigned to
1250 // dynamic variables.
1251 if (AllocateDynamicVariable)
1252 SS << format(",%u", Offset);
1253
1254 Func.addFnAttr("amdgpu-lds-size", Buffer);
1255 }
1256 }
1257 }
1258
1259 for (auto &GV : make_early_inc_range(M.globals()))
1261 // probably want to remove from used lists
1263 if (GV.use_empty())
1264 GV.eraseFromParent();
1265 }
1266
1267 return Changed;
1268 }
1269
1270private:
1271 // Increase the alignment of LDS globals if necessary to maximise the chance
1272 // that we can use aligned LDS instructions to access them.
1273 static bool superAlignLDSGlobals(Module &M) {
1274 const DataLayout &DL = M.getDataLayout();
1275 bool Changed = false;
1276 if (!SuperAlignLDSGlobals) {
1277 return Changed;
1278 }
1279
1280 for (auto &GV : M.globals()) {
1282 // Only changing alignment of LDS variables
1283 continue;
1284 }
1285 if (!GV.hasInitializer()) {
1286 // cuda/hip extern __shared__ variable, leave alignment alone
1287 continue;
1288 }
1289
1290 if (GV.isAbsoluteSymbolRef()) {
1291 // If the variable is already allocated, don't change the alignment
1292 continue;
1293 }
1294
1295 Align Alignment = AMDGPU::getAlign(DL, &GV);
1296 uint64_t GVSize = GV.getGlobalSize(DL);
1297
1298 if (GVSize > 8) {
1299 // We might want to use a b96 or b128 load/store
1300 Alignment = std::max(Alignment, Align(16));
1301 } else if (GVSize > 4) {
1302 // We might want to use a b64 load/store
1303 Alignment = std::max(Alignment, Align(8));
1304 } else if (GVSize > 2) {
1305 // We might want to use a b32 load/store
1306 Alignment = std::max(Alignment, Align(4));
1307 } else if (GVSize > 1) {
1308 // We might want to use a b16 load/store
1309 Alignment = std::max(Alignment, Align(2));
1310 }
1311
1312 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1313 Changed = true;
1314 GV.setAlignment(Alignment);
1315 }
1316 }
1317 return Changed;
1318 }
1319
1320 static LDSVariableReplacement createLDSVariableReplacement(
1321 Module &M, std::string VarName,
1322 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1323 // Create a struct instance containing LDSVarsToTransform and map from those
1324 // variables to ConstantExprGEP
1325 // Variables may be introduced to meet alignment requirements. No aliasing
1326 // metadata is useful for these as they have no uses. Erased before return.
1327
1328 LLVMContext &Ctx = M.getContext();
1329 const DataLayout &DL = M.getDataLayout();
1330 assert(!LDSVarsToTransform.empty());
1331
1333 LayoutFields.reserve(LDSVarsToTransform.size());
1334 {
1335 // The order of fields in this struct depends on the order of
1336 // variables in the argument which varies when changing how they
1337 // are identified, leading to spurious test breakage.
1338 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1339 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1340
1341 for (GlobalVariable *GV : Sorted) {
1343 AMDGPU::getAlign(DL, GV));
1344 LayoutFields.emplace_back(F);
1345 }
1346 }
1347
1348 performOptimizedStructLayout(LayoutFields);
1349
1350 std::vector<GlobalVariable *> LocalVars;
1351 BitVector IsPaddingField;
1352 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1353 IsPaddingField.reserve(LDSVarsToTransform.size());
1354 {
1355 uint64_t CurrentOffset = 0;
1356 for (auto &F : LayoutFields) {
1357 GlobalVariable *FGV =
1358 static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
1359 Align DataAlign = F.Alignment;
1360
1361 uint64_t DataAlignV = DataAlign.value();
1362 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1363 uint64_t Padding = DataAlignV - Rem;
1364
1365 // Append an array of padding bytes to meet alignment requested
1366 // Note (o + (a - (o % a)) ) % a == 0
1367 // (offset + Padding ) % align == 0
1368
1369 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1370 LocalVars.push_back(new GlobalVariable(
1371 M, ATy, false, GlobalValue::InternalLinkage,
1373 AMDGPUAS::LOCAL_ADDRESS, false));
1374 IsPaddingField.push_back(true);
1375 CurrentOffset += Padding;
1376 }
1377
1378 LocalVars.push_back(FGV);
1379 IsPaddingField.push_back(false);
1380 CurrentOffset += F.Size;
1381 }
1382 }
1383
1384 std::vector<Type *> LocalVarTypes;
1385 LocalVarTypes.reserve(LocalVars.size());
1386 std::transform(
1387 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1388 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1389
1390 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1391
1392 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1393
1394 GlobalVariable *SGV = new GlobalVariable(
1395 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1397 false);
1398 SGV->setAlignment(StructAlign);
1399
1401 Type *I32 = Type::getInt32Ty(Ctx);
1402 for (size_t I = 0; I < LocalVars.size(); I++) {
1403 GlobalVariable *GV = LocalVars[I];
1404 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1405 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1406 if (IsPaddingField[I]) {
1407 assert(GV->use_empty());
1408 GV->eraseFromParent();
1409 } else {
1410 Map[GV] = GEP;
1411 }
1412 }
1413 assert(Map.size() == LDSVarsToTransform.size());
1414 return {SGV, std::move(Map)};
1415 }
1416
1417 template <typename PredicateTy>
1418 static void replaceLDSVariablesWithStruct(
1419 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1420 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1421 LLVMContext &Ctx = M.getContext();
1422 const DataLayout &DL = M.getDataLayout();
1423
1424 // A hack... we need to insert the aliasing info in a predictable order for
1425 // lit tests. Would like to have them in a stable order already, ideally the
1426 // same order they get allocated, which might mean an ordered set container
1427 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1428 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1429
1430 // Create alias.scope and their lists. Each field in the new structure
1431 // does not alias with all other fields.
1432 SmallVector<MDNode *> AliasScopes;
1433 SmallVector<Metadata *> NoAliasList;
1434 const size_t NumberVars = LDSVarsToTransform.size();
1435 if (NumberVars > 1) {
1436 MDBuilder MDB(Ctx);
1437 AliasScopes.reserve(NumberVars);
1439 for (size_t I = 0; I < NumberVars; I++) {
1441 AliasScopes.push_back(Scope);
1442 }
1443 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1444 }
1445
1446 // Replace uses of ith variable with a constantexpr to the corresponding
1447 // field of the instance that will be allocated by AMDGPUMachineFunctionInfo
1448 for (size_t I = 0; I < NumberVars; I++) {
1449 GlobalVariable *GV = LDSVarsToTransform[I];
1450 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1451
1453
1454 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1455 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1456 uint64_t Offset = APOff.getZExtValue();
1457
1458 Align A =
1459 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1460
1461 if (I)
1462 NoAliasList[I - 1] = AliasScopes[I - 1];
1463 MDNode *NoAlias =
1464 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1465 MDNode *AliasScope =
1466 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1467
1468 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1469 }
1470 }
1471
1472 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1473 const DataLayout &DL, MDNode *AliasScope,
1474 MDNode *NoAlias, unsigned MaxDepth = 5) {
1475 if (!MaxDepth || (A == 1 && !AliasScope))
1476 return;
1477
1478 ScopedNoAliasAAResult ScopedNoAlias;
1479
1480 for (User *U : Ptr->users()) {
1481 if (auto *I = dyn_cast<Instruction>(U)) {
1482 if (AliasScope && I->mayReadOrWriteMemory()) {
1483 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1484 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1485 : AliasScope);
1486 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1487
1488 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1489
1490 // Scoped aliases can originate from two different domains.
1491 // First domain would be from LDS domain (created by this pass).
1492 // All entries (LDS vars) into LDS struct will have same domain.
1493
1494 // Second domain could be existing scoped aliases that are the
1495 // results of noalias params and subsequent optimizations that
1496 // may alter thesse sets.
1497
1498 // We need to be careful how we create new alias sets, and
1499 // have right scopes and domains for loads/stores of these new
1500 // LDS variables. We intersect NoAlias set if alias sets belong
1501 // to the same domain. This is the case if we have memcpy using
1502 // LDS variables. Both src and dst of memcpy would belong to
1503 // LDS struct, they donot alias.
1504 // On the other hand, if one of the domains is LDS and other is
1505 // existing domain prior to LDS, we need to have a union of all
1506 // these aliases set to preserve existing aliasing information.
1507
1508 SmallPtrSet<const MDNode *, 16> ExistingDomains, LDSDomains;
1509 ScopedNoAlias.collectScopedDomains(NA, ExistingDomains);
1510 ScopedNoAlias.collectScopedDomains(NoAlias, LDSDomains);
1511 auto Intersection = set_intersection(ExistingDomains, LDSDomains);
1512 if (Intersection.empty()) {
1513 NA = NA ? MDNode::concatenate(NA, NoAlias) : NoAlias;
1514 } else {
1515 NA = NA ? MDNode::intersect(NA, NoAlias) : NoAlias;
1516 }
1517 I->setMetadata(LLVMContext::MD_noalias, NA);
1518 }
1519 }
1520
1521 if (auto *LI = dyn_cast<LoadInst>(U)) {
1522 LI->setAlignment(std::max(A, LI->getAlign()));
1523 continue;
1524 }
1525 if (auto *SI = dyn_cast<StoreInst>(U)) {
1526 if (SI->getPointerOperand() == Ptr)
1527 SI->setAlignment(std::max(A, SI->getAlign()));
1528 continue;
1529 }
1530 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1531 // None of atomicrmw operations can work on pointers, but let's
1532 // check it anyway in case it will or we will process ConstantExpr.
1533 if (AI->getPointerOperand() == Ptr)
1534 AI->setAlignment(std::max(A, AI->getAlign()));
1535 continue;
1536 }
1537 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1538 if (AI->getPointerOperand() == Ptr)
1539 AI->setAlignment(std::max(A, AI->getAlign()));
1540 continue;
1541 }
1542 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1543 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1544 APInt Off(BitWidth, 0);
1545 if (GEP->getPointerOperand() == Ptr) {
1546 Align GA;
1547 if (GEP->accumulateConstantOffset(DL, Off))
1548 GA = commonAlignment(A, Off.getLimitedValue());
1549 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1550 MaxDepth - 1);
1551 }
1552 continue;
1553 }
1554 if (auto *I = dyn_cast<Instruction>(U)) {
1555 if (I->getOpcode() == Instruction::BitCast ||
1556 I->getOpcode() == Instruction::AddrSpaceCast)
1557 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1558 }
1559 }
1560 }
1561};
1562
1563class AMDGPULowerModuleLDSLegacy : public ModulePass {
1564public:
1565 const AMDGPUTargetMachine *TM;
1566 static char ID;
1567
1568 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM = nullptr)
1569 : ModulePass(ID), TM(TM) {}
1570
1571 void getAnalysisUsage(AnalysisUsage &AU) const override {
1572 if (!TM)
1574 }
1575
1576 bool runOnModule(Module &M) override {
1577 if (!TM) {
1578 auto &TPC = getAnalysis<TargetPassConfig>();
1579 TM = &TPC.getTM<AMDGPUTargetMachine>();
1580 }
1581
1582 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1583 }
1584};
1585
1586} // namespace
1587char AMDGPULowerModuleLDSLegacy::ID = 0;
1588
1589char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1590
1591INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1592 "Lower uses of LDS variables from non-kernel functions",
1593 false, false)
1595INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1596 "Lower uses of LDS variables from non-kernel functions",
1598
1599ModulePass *
1601 return new AMDGPULowerModuleLDSLegacy(TM);
1602}
1603
1606 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1608}
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
aarch64 promote const
The AMDGPU TargetMachine interface definition for hw codegen targets.
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
This file implements the BitVector class.
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
This file provides interfaces used to build and manipulate a call graph, which is a very useful tool ...
#define clEnumValN(ENUMVAL, FLAGNAME, DESC)
This file contains the declarations for the subclasses of Constant, which represent the different fla...
DXIL Forward Handle Accesses
This file defines the DenseMap class.
This file defines the DenseSet and SmallDenseSet classes.
#define DEBUG_TYPE
Hexagon Common GEP
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
const std::string FatArchTraits< MachO::fat_arch >::StructName
This file provides an interface for laying out a sequence of fields as a struct in a way that attempt...
#define INITIALIZE_PASS_DEPENDENCY(depName)
Definition PassSupport.h:42
#define INITIALIZE_PASS_END(passName, arg, name, cfg, analysis)
Definition PassSupport.h:44
#define INITIALIZE_PASS_BEGIN(passName, arg, name, cfg, analysis)
Definition PassSupport.h:39
This file contains some templates that are useful if you are working with the STL at all.
This is the interface for a metadata-based scoped no-alias analysis.
This file defines generic set operations that may be used on set's of different types,...
This file defines the SmallString class.
Target-Independent Code Generator Pass Configuration Options pass.
Class for arbitrary precision integers.
Definition APInt.h:78
uint64_t getZExtValue() const
Get zero extended value.
Definition APInt.h:1563
Represent the analysis usage information of a pass.
AnalysisUsage & addRequired()
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
size_t size() const
size - Get the array size.
Definition ArrayRef.h:142
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:137
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
LLVM Basic Block Representation.
Definition BasicBlock.h:62
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
void reserve(unsigned N)
Definition BitVector.h:367
void push_back(bool Val)
Definition BitVector.h:485
The basic data container for the call graph of a Module of IR.
Definition CallGraph.h:72
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static ConstantAsMetadata * get(Constant *C)
Definition Metadata.h:537
static LLVM_ABI Constant * getPointerBitCastOrAddrSpaceCast(Constant *C, Type *Ty)
Create a BitCast or AddrSpaceCast for a pointer type depending on the address space.
static Constant * getGetElementPtr(Type *Ty, Constant *C, ArrayRef< Constant * > IdxList, GEPNoWrapFlags NW=GEPNoWrapFlags::none(), std::optional< ConstantRange > InRange=std::nullopt, Type *OnlyIfReducedTy=nullptr)
Getelementptr form.
Definition Constants.h:1445
This is an important base class in LLVM.
Definition Constant.h:43
LLVM_ABI void removeDeadConstantUsers() const
If there are any dead constant users dangling off of this constant, remove them.
A parsed version of the target data layout string in and methods for querying it.
Definition DataLayout.h:64
iterator find(const_arg_type_t< KeyT > Val)
Definition DenseMap.h:178
std::pair< iterator, bool > try_emplace(KeyT &&Key, Ts &&...Args)
Definition DenseMap.h:256
iterator end()
Definition DenseMap.h:81
bool contains(const_arg_type_t< KeyT > Val) const
Return true if the specified key is in the map, false otherwise.
Definition DenseMap.h:169
Implements a dense probed hash-table based set.
Definition DenseSet.h:279
LLVM_ABI void setMetadata(unsigned KindID, MDNode *Node)
Set a particular kind of metadata attachment.
LLVM_ABI bool isAbsoluteSymbolRef() const
Returns whether this is a reference to an absolute symbol.
Definition Globals.cpp:455
void setLinkage(LinkageTypes LT)
PointerType * getType() const
Global values are always pointers.
@ InternalLinkage
Rename collisions when linking (static functions).
Definition GlobalValue.h:60
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Type * getValueType() const
LLVM_ABI void setInitializer(Constant *InitVal)
setInitializer - Sets the initializer for this global variable, removing any existing initializer if ...
Definition Globals.cpp:542
bool hasInitializer() const
Definitions have initializers, declarations don't.
LLVM_ABI uint64_t getGlobalSize(const DataLayout &DL) const
Get the size of this global variable in bytes.
Definition Globals.cpp:569
LLVM_ABI void eraseFromParent()
eraseFromParent - This method unlinks 'this' from the containing module and deletes it.
Definition Globals.cpp:538
void setAlignment(Align Align)
Sets the alignment attribute of the GlobalVariable.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2835
bool runOnModule(Module &) override
ImmutablePasses are never run.
Definition Pass.h:302
This is an important class for using LLVM in a threaded context.
Definition LLVMContext.h:68
MDNode * createAnonymousAliasScope(MDNode *Domain, StringRef Name=StringRef())
Return metadata appropriate for an alias scope root node.
Definition MDBuilder.h:195
MDNode * createAnonymousAliasScopeDomain(StringRef Name=StringRef())
Return metadata appropriate for an alias scope domain node.
Definition MDBuilder.h:188
Metadata node.
Definition Metadata.h:1080
static LLVM_ABI MDNode * getMostGenericAliasScope(MDNode *A, MDNode *B)
static LLVM_ABI MDNode * concatenate(MDNode *A, MDNode *B)
Methods for metadata merging.
static MDTuple * get(LLVMContext &Context, ArrayRef< Metadata * > MDs)
Definition Metadata.h:1572
static LLVM_ABI MDNode * intersect(MDNode *A, MDNode *B)
Root of the metadata hierarchy.
Definition Metadata.h:64
ModulePass class - This class is used to implement unstructured interprocedural optimizations and ana...
Definition Pass.h:255
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
@ Error
Emits an error if two values disagree, otherwise the resulting value is that of the operands.
Definition Module.h:120
A tuple of MDNodes.
Definition Metadata.h:1760
LLVM_ABI void addOperand(MDNode *M)
A container for an operand bundle being viewed as a set of values rather than a set of uses.
static PointerType * getUnqual(Type *ElementType)
This constructs a pointer to an object of the specified type in the default address space (address sp...
static LLVM_ABI PointerType * get(Type *ElementType, unsigned AddressSpace)
This constructs a pointer to an object of the specified type in a numbered address space.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
A simple AA result which uses scoped-noalias metadata to answer queries.
static LLVM_ABI void collectScopedDomains(const MDNode *NoAlias, SmallPtrSetImpl< const MDNode * > &Domains)
Collect the set of scoped domains relevant to the noalias scopes.
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
size_type count(ConstPtrType Ptr) const
count - Return 1 if the specified pointer is in the set, 0 otherwise.
std::pair< iterator, bool > insert(PtrType Ptr)
Inserts Ptr if and only if there is no element in the container equal to Ptr.
SmallPtrSet - This class implements a set which is optimized for holding SmallSize or less elements.
reference emplace_back(ArgTypes &&... Args)
void reserve(size_type N)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
Class to represent struct types.
static LLVM_ABI StructType * create(LLVMContext &Context, StringRef Name)
This creates an identified struct.
Definition Type.cpp:689
Target-Independent Code Generator Pass Configuration Options.
Twine - A lightweight data structure for efficiently representing the concatenation of temporary valu...
Definition Twine.h:82
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:46
static LLVM_ABI IntegerType * getInt32Ty(LLVMContext &C)
Definition Type.cpp:313
LLVM_ABI unsigned getPointerAddressSpace() const
Get the address space of this pointer or pointer vector type.
static LLVM_ABI IntegerType * getInt8Ty(LLVMContext &C)
Definition Type.cpp:311
A Use represents the edge between a Value definition and its users.
Definition Use.h:35
static LLVM_ABI ValueAsMetadata * get(Value *V)
Definition Metadata.cpp:509
LLVM Value Representation.
Definition Value.h:75
iterator_range< user_iterator > users()
Definition Value.h:426
bool use_empty() const
Definition Value.h:346
LLVM_ABI bool replaceUsesWithIf(Value *New, llvm::function_ref< bool(Use &U)> ShouldReplace)
Go through the uses list for this definition and make each use point to "V" if the callback ShouldRep...
Definition Value.cpp:557
iterator_range< use_iterator > uses()
Definition Value.h:380
bool hasName() const
Definition Value.h:261
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:318
std::pair< iterator, bool > insert(const ValueT &V)
Definition DenseSet.h:202
size_type size() const
Definition DenseSet.h:87
bool contains(const_arg_type_t< ValueT > V) const
Check if the set contains the given element.
Definition DenseSet.h:175
bool erase(const ValueT &V)
Definition DenseSet.h:100
size_type count(const_arg_type_t< ValueT > V) const
Return 1 if the specified key is in the set, 0 otherwise.
Definition DenseSet.h:180
A raw_ostream that writes to an std::string.
Changed
@ LOCAL_ADDRESS
Address space for local memory.
@ CONSTANT_ADDRESS
Address space for constant memory (VTX2).
bool isDynamicLDS(const GlobalVariable &GV)
void removeFnAttrFromReachable(CallGraph &CG, Function *KernelRoot, ArrayRef< StringRef > FnAttrs)
Strip FnAttr attribute from any functions where we may have introduced its use.
LLVM_READNONE constexpr bool isKernel(CallingConv::ID CC)
void getUsesOfLDSByFunction(const CallGraph &CG, Module &M, FunctionVariableMap &kernels, FunctionVariableMap &Functions)
LDSUsesInfoTy getTransitiveUsesOfLDS(const CallGraph &CG, Module &M)
DenseMap< Function *, DenseSet< GlobalVariable * > > FunctionVariableMap
TargetExtType * isNamedBarrier(const GlobalVariable &GV)
bool isLDSVariableToLower(const GlobalVariable &GV)
bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M)
Align getAlign(const DataLayout &DL, const GlobalVariable *GV)
DenseMap< GlobalVariable *, DenseSet< Function * > > VariableFunctionMap
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
LLVM_ABI Function * getOrInsertDeclaration(Module *M, ID id, ArrayRef< Type * > OverloadTys={})
Look up the Function declaration of the intrinsic id in the Module M.
ValuesClass values(OptsTy... Options)
Helper to build a ValuesClass by forwarding a variable number of arguments as an initializer list to ...
initializer< Ty > init(const Ty &Val)
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
bool operator<(int64_t V1, const APSInt &V2)
Definition APSInt.h:360
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
iterator_range< early_inc_iterator_impl< detail::IterOfRange< RangeT > > > make_early_inc_range(RangeT &&Range)
Make a range that does early increment to allow mutation of the underlying range without disrupting i...
Definition STLExtras.h:634
LLVM_ABI std::string getUniqueModuleId(Module *M)
Produce a unique identifier for this module by taking the MD5 sum of the names of the module's strong...
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1636
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
char & AMDGPULowerModuleLDSLegacyPassID
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
S1Ty set_intersection(const S1Ty &S1, const S2Ty &S2)
set_intersection(A, B) - Return A ^ B
LLVM_ABI void removeFromUsedLists(Module &M, function_ref< bool(Constant *)> ShouldRemove)
Removes global values from the llvm.used and llvm.compiler.used arrays.
format_object< Ts... > format(const char *Fmt, const Ts &... Vals)
These are helper functions used to produce formatted output.
Definition Format.h:129
ModulePass * createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM=nullptr)
@ Other
Any other memory.
Definition ModRef.h:68
LLVM_ABI void appendToCompilerUsed(Module &M, ArrayRef< GlobalValue * > Values)
Adds global values to the llvm.compiler.used list.
LLVM_ABI std::pair< uint64_t, Align > performOptimizedStructLayout(MutableArrayRef< OptimizedStructLayoutField > Fields)
Compute a layout for a struct containing the given fields, making a best-effort attempt to minimize t...
constexpr unsigned BitWidth
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
Align commonAlignment(Align A, uint64_t Offset)
Returns the alignment that satisfies both alignments.
Definition Alignment.h:201
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
LLVM_ABI void reportFatalUsageError(Error Err)
Report a fatal error that does not indicate a bug in LLVM.
Definition Error.cpp:177
#define N
PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
const AMDGPUTargetMachine & TM
Definition AMDGPU.h:143
FunctionVariableMap direct_access
FunctionVariableMap indirect_access
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77