1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
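//
// As an illustrative sketch (variable and function names invented, not taken
// from any test), the input this pass deals with looks roughly like:
//
//   @var = internal addrspace(3) global i32 undef, align 4
//   define void @helper() {
//     store i32 1, ptr addrspace(3) @var, align 4
//     ret void
//   }
//   define amdgpu_kernel void @kern() {
//     call void @helper()
//     ret void
//   }
//
// The use from @kern can be allocated directly; the use from @helper is what
// the strategies described below have to resolve.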
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may be different in different kernels as they
31// do not alias, which permits only allocating variables they use
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
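//
// As a sketch of the result (field types and alignment here are illustrative,
// not taken from a real module), the module lowering produces something like:
//
//   %llvm.amdgcn.module.lds.t = type { i32, [4 x float] }
//   @llvm.amdgcn.module.lds = internal addrspace(3) global
//       %llvm.amdgcn.module.lds.t poison, align 16
//
// with non-kernel uses of the original variables rewritten to constant GEPs
// into that instance.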
76//
77// The "table" lowering implemented here has three components.
78// First, kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
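//
// A sketch of a rewritten non-kernel use (the array dimensions and the final
// index here are illustrative):
//
//   %id = call i32 @llvm.amdgcn.lds.kernel.id()
//   %slot = getelementptr inbounds [2 x [3 x i32]], ptr addrspace(4)
//       @llvm.amdgcn.lds.offset.table, i32 0, i32 %id, i32 1
//   %offset = load i32, ptr addrspace(4) %slot, align 4
//   %ptr = inttoptr i32 %offset to ptr addrspace(3)
//
// where the last index selects the position of the variable within each row
// of the table.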
88//
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
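//
// For instance (kernel and struct names invented), a use of such a variable
// from a function only reachable from kernel @k can be rewritten to:
//
//   getelementptr inbounds (%llvm.amdgcn.kernel.k.lds.t,
//       ptr addrspace(3) @llvm.amdgcn.kernel.k.lds, i32 0, i32 2)
//
// which folds to a fixed LDS address once that struct is assigned one.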
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
102//
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
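//
// As a sketch (name and alignment illustrative), each such kernel gets a
// representative variable along the lines of:
//
//   @llvm.amdgcn.k0.dynlds = external addrspace(3) global [0 x i8], align 4
//
// and its address is published in a per-kernel table that non-kernel
// functions index with the same kernel id intrinsic as above.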
125//
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can
129// reach it. This pass essentially specialises such variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
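//
// For example (illustrative), a struct allocated at LDS address zero carries
// metadata of the form:
//
//   @llvm.amdgcn.module.lds = internal addrspace(3) global
//       %llvm.amdgcn.module.lds.t poison, align 8, !absolute_symbol !0
//   !0 = !{i32 0, i32 1}  ; the half open range [0, 1)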
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUTargetMachine.h"
181#include "Utils/AMDGPUBaseInfo.h"
182#include "Utils/AMDGPUMemoryUtils.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
187#include "llvm/ADT/SetOperations.h"
188#include "llvm/Analysis/CallGraph.h"
189#include "llvm/CodeGen/TargetPassConfig.h"
190#include "llvm/IR/Constants.h"
191#include "llvm/IR/DerivedTypes.h"
192#include "llvm/IR/IRBuilder.h"
193#include "llvm/IR/InlineAsm.h"
194#include "llvm/IR/Instructions.h"
195#include "llvm/IR/IntrinsicsAMDGPU.h"
196#include "llvm/IR/MDBuilder.h"
197#include "llvm/IR/ReplaceConstant.h"
198#include "llvm/InitializePasses.h"
199#include "llvm/Pass.h"
200#include "llvm/Support/CommandLine.h"
201#include "llvm/Support/Debug.h"
202#include "llvm/Support/Format.h"
203#include "llvm/Support/OptimizedStructLayout.h"
204#include "llvm/Support/raw_ostream.h"
205#include "llvm/Transforms/Utils/BasicBlockUtils.h"
206#include "llvm/Transforms/Utils/ModuleUtils.h"
207
208#include <vector>
209
210#include <cstdio>
211
212#define DEBUG_TYPE "amdgpu-lower-module-lds"
213
214using namespace llvm;
215
216namespace {
217
218cl::opt<bool> SuperAlignLDSGlobals(
219 "amdgpu-super-align-lds-globals",
220 cl::desc("Increase alignment of LDS if it is not on align boundary"),
221 cl::init(true), cl::Hidden);
222
223enum class LoweringKind { module, table, kernel, hybrid };
224cl::opt<LoweringKind> LoweringKindLoc(
225 "amdgpu-lower-module-lds-strategy",
226 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
227 cl::init(LoweringKind::hybrid),
229 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
230 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
232 LoweringKind::kernel, "kernel",
233 "Lower variables reachable from one kernel, otherwise abort"),
234 clEnumValN(LoweringKind::hybrid, "hybrid",
235 "Lower via mixture of above strategies")));
236
237bool isKernelLDS(const Function *F) {
238 // Some weirdness here. AMDGPU::isKernelCC does not call into
239 // AMDGPU::isKernel with the calling conv, it instead calls into
240 // isModuleEntryFunction which returns true for more calling conventions
241 // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
242 // There's also a test that checks that the LDS lowering does not hit on
243 // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
244 // Putting LDS in the name of the function to draw attention to this.
245 return AMDGPU::isKernel(F->getCallingConv());
246}
247
248template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
249 llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
250 return L->getName() < R->getName();
251 });
252 return {std::move(V)};
253}
254
255class AMDGPULowerModuleLDS {
256 const AMDGPUTargetMachine &TM;
257
258 static void
259 removeLocalVarsFromUsedLists(Module &M,
260 const DenseSet<GlobalVariable *> &LocalVars) {
261 // The verifier rejects used lists containing an inttoptr of a constant
262 // so remove the variables from these lists before replaceAllUsesWith
263 SmallPtrSet<Constant *, 8> LocalVarsSet;
264 for (GlobalVariable *LocalVar : LocalVars)
265 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
266
267 removeFromUsedLists(
268 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
269
270 for (GlobalVariable *LocalVar : LocalVars)
271 LocalVar->removeDeadConstantUsers();
272 }
273
274 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
275 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
276 // that might call a function which accesses a field within it. This is
277 // presently approximated to 'all kernels' if there are any such functions
278 // in the module. This implicit use is redefined as an explicit use here so
279 // that later passes, specifically PromoteAlloca, account for the required
280 // memory without any knowledge of this transform.
281
282 // An operand bundle on llvm.donothing works because the call instruction
283 // survives until after the last pass that needs to account for LDS. It is
284 // better than inline asm as the latter survives until the end of codegen. A
285 // totally robust solution would be a function with the same semantics as
286 // llvm.donothing that takes a pointer to the instance and is lowered to a
287 // no-op after LDS is allocated, but that is not presently necessary.
288
289 // This intrinsic is eliminated shortly before instruction selection. It
290 // does not suffice to indicate to ISel that a given global which is not
291 // immediately used by the kernel must still be allocated by it. An
292 // equivalent target specific intrinsic which lasts until immediately after
293 // codegen would suffice for that, but one would still need to ensure that
294 // the variables are allocated in the anticipated order.
295 BasicBlock *Entry = &Func->getEntryBlock();
296 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
297
298 Function *Decl =
299 Intrinsic::getDeclaration(Func->getParent(), Intrinsic::donothing, {});
300
301 Value *UseInstance[1] = {
302 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
303
304 Builder.CreateCall(
305 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
306 }
307
308 static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
309 // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
310 // global may have uses from multiple different functions as a result.
311 // This pass specialises LDS variables with respect to the kernel that
312 // allocates them.
313
314 // This is semantically equivalent to (left unimplemented as it would be slow):
315 // for (auto &F : M.functions())
316 // for (auto &BB : F)
317 // for (auto &I : BB)
318 // for (Use &Op : I.operands())
319 // if (constantExprUsesLDS(Op))
320 // replaceConstantExprInFunction(I, Op);
321
322 SmallVector<Constant *> LDSGlobals;
323 for (auto &GV : M.globals())
324 if (AMDGPU::isLDSVariableToLower(GV))
325 LDSGlobals.push_back(&GV);
326
327 return convertUsersOfConstantsToInstructions(LDSGlobals);
328 }
329
330public:
331 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
332
333 using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
334
335 using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
336
337 static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
338 FunctionVariableMap &kernels,
339 FunctionVariableMap &functions) {
340
341 // Get uses from the current function, excluding uses by called functions
342 // Two output variables to avoid walking the globals list twice
343 for (auto &GV : M.globals()) {
344 if (!AMDGPU::isLDSVariableToLower(GV)) {
345 continue;
346 }
347
348 if (GV.isAbsoluteSymbolRef()) {
349 report_fatal_error(
350 "LDS variables with absolute addresses are unimplemented.");
351 }
352
353 for (User *V : GV.users()) {
354 if (auto *I = dyn_cast<Instruction>(V)) {
355 Function *F = I->getFunction();
356 if (isKernelLDS(F)) {
357 kernels[F].insert(&GV);
358 } else {
359 functions[F].insert(&GV);
360 }
361 }
362 }
363 }
364 }
365
366 struct LDSUsesInfoTy {
367 FunctionVariableMap direct_access;
368 FunctionVariableMap indirect_access;
369 };
370
371 static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
372
373 FunctionVariableMap direct_map_kernel;
374 FunctionVariableMap direct_map_function;
375 getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
376
377 // Collect variables that are used by functions whose address has escaped
378 DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
379 for (Function &F : M.functions()) {
380 if (!isKernelLDS(&F))
381 if (F.hasAddressTaken(nullptr,
382 /* IgnoreCallbackUses */ false,
383 /* IgnoreAssumeLikeCalls */ false,
384 /* IgnoreLLVMUsed */ true,
385 /* IgnoreArcAttachedCall */ false)) {
386 set_union(VariablesReachableThroughFunctionPointer,
387 direct_map_function[&F]);
388 }
389 }
390
391 auto functionMakesUnknownCall = [&](const Function *F) -> bool {
392 assert(!F->isDeclaration());
393 for (const CallGraphNode::CallRecord &R : *CG[F]) {
394 if (!R.second->getFunction()) {
395 return true;
396 }
397 }
398 return false;
399 };
400
401 // Work out which variables are reachable through function calls
402 FunctionVariableMap transitive_map_function = direct_map_function;
403
404 // If the function makes any unknown call, assume the worst case that it can
405 // access all variables accessed by functions whose address escaped
406 for (Function &F : M.functions()) {
407 if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
408 if (!isKernelLDS(&F)) {
409 set_union(transitive_map_function[&F],
410 VariablesReachableThroughFunctionPointer);
411 }
412 }
413 }
414
415 // Direct implementation of collecting all variables reachable from each
416 // function
417 for (Function &Func : M.functions()) {
418 if (Func.isDeclaration() || isKernelLDS(&Func))
419 continue;
420
421 DenseSet<Function *> seen; // catches cycles
422 SmallVector<Function *, 4> wip = {&Func};
423
424 while (!wip.empty()) {
425 Function *F = wip.pop_back_val();
426
427 // Can accelerate this by referring to transitive map for functions that
428 // have already been computed, with more care than this
429 set_union(transitive_map_function[&Func], direct_map_function[F]);
430
431 for (const CallGraphNode::CallRecord &R : *CG[F]) {
432 Function *ith = R.second->getFunction();
433 if (ith) {
434 if (!seen.contains(ith)) {
435 seen.insert(ith);
436 wip.push_back(ith);
437 }
438 }
439 }
440 }
441 }
442
443 // direct_map_kernel lists which variables are used by the kernel
444 // find the variables which are used through a function call
445 FunctionVariableMap indirect_map_kernel;
446
447 for (Function &Func : M.functions()) {
448 if (Func.isDeclaration() || !isKernelLDS(&Func))
449 continue;
450
451 for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
452 Function *ith = R.second->getFunction();
453 if (ith) {
454 set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
455 } else {
456 set_union(indirect_map_kernel[&Func],
457 VariablesReachableThroughFunctionPointer);
458 }
459 }
460 }
461
462 return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
463 }
464
465 struct LDSVariableReplacement {
466 GlobalVariable *SGV = nullptr;
467 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
468 };
469
470 // remap from lds global to a constantexpr gep to where it has been moved to
471 // for each kernel
472 // an array with an element for each kernel containing where the corresponding
473 // variable was remapped to
474
475 static Constant *getAddressesOfVariablesInKernel(
476 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
477 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
478 // Create a ConstantArray containing the address of each Variable within the
479 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
480 // does not allocate it
481 // TODO: Drop the ptrtoint conversion
482
483 Type *I32 = Type::getInt32Ty(Ctx);
484
485 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
486
487 SmallVector<Constant *> Elements;
488 for (size_t i = 0; i < Variables.size(); i++) {
489 GlobalVariable *GV = Variables[i];
490 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
491 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
492 auto elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
493 Elements.push_back(elt);
494 } else {
495 Elements.push_back(PoisonValue::get(I32));
496 }
497 }
498 return ConstantArray::get(KernelOffsetsType, Elements);
499 }
500
501 static GlobalVariable *buildLookupTable(
502 Module &M, ArrayRef<GlobalVariable *> Variables,
503 ArrayRef<Function *> kernels,
504 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
505 if (Variables.empty()) {
506 return nullptr;
507 }
508 LLVMContext &Ctx = M.getContext();
509
510 const size_t NumberVariables = Variables.size();
511 const size_t NumberKernels = kernels.size();
512
513 ArrayType *KernelOffsetsType =
514 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
515
516 ArrayType *AllKernelsOffsetsType =
517 ArrayType::get(KernelOffsetsType, NumberKernels);
518
519 Constant *Missing = PoisonValue::get(KernelOffsetsType);
520 std::vector<Constant *> overallConstantExprElts(NumberKernels);
521 for (size_t i = 0; i < NumberKernels; i++) {
522 auto Replacement = KernelToReplacement.find(kernels[i]);
523 overallConstantExprElts[i] =
524 (Replacement == KernelToReplacement.end())
525 ? Missing
526 : getAddressesOfVariablesInKernel(
527 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
528 }
529
530 Constant *init =
531 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
532
533 return new GlobalVariable(
534 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
535 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
536 AMDGPUAS::CONSTANT_ADDRESS);
537 }
538
539 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
540 GlobalVariable *LookupTable,
541 GlobalVariable *GV, Use &U,
542 Value *OptionalIndex) {
543 // Table is a constant array of the same length as OrderedKernels
544 LLVMContext &Ctx = M.getContext();
545 Type *I32 = Type::getInt32Ty(Ctx);
546 auto *I = cast<Instruction>(U.getUser());
547
548 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
549
550 if (auto *Phi = dyn_cast<PHINode>(I)) {
551 BasicBlock *BB = Phi->getIncomingBlock(U);
552 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
553 } else {
554 Builder.SetInsertPoint(I);
555 }
556
557 SmallVector<Value *, 3> GEPIdx = {
558 ConstantInt::get(I32, 0),
559 tableKernelIndex,
560 };
561 if (OptionalIndex)
562 GEPIdx.push_back(OptionalIndex);
563
564 Value *Address = Builder.CreateInBoundsGEP(
565 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
566
567 Value *loaded = Builder.CreateLoad(I32, Address);
568
569 Value *replacement =
570 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
571
572 U.set(replacement);
573 }
574
575 void replaceUsesInInstructionsWithTableLookup(
576 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
577 GlobalVariable *LookupTable) {
578
579 LLVMContext &Ctx = M.getContext();
580 IRBuilder<> Builder(Ctx);
581 Type *I32 = Type::getInt32Ty(Ctx);
582
583 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
584 auto *GV = ModuleScopeVariables[Index];
585
586 for (Use &U : make_early_inc_range(GV->uses())) {
587 auto *I = dyn_cast<Instruction>(U.getUser());
588 if (!I)
589 continue;
590
591 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
592 ConstantInt::get(I32, Index));
593 }
594 }
595 }
596
597 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
598 Module &M, LDSUsesInfoTy &LDSUsesInfo,
599 DenseSet<GlobalVariable *> const &VariableSet) {
600
601 DenseSet<Function *> KernelSet;
602
603 if (VariableSet.empty())
604 return KernelSet;
605
606 for (Function &Func : M.functions()) {
607 if (Func.isDeclaration() || !isKernelLDS(&Func))
608 continue;
609 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
610 if (VariableSet.contains(GV)) {
611 KernelSet.insert(&Func);
612 break;
613 }
614 }
615 }
616
617 return KernelSet;
618 }
619
620 static GlobalVariable *
621 chooseBestVariableForModuleStrategy(const DataLayout &DL,
622 VariableFunctionMap &LDSVars) {
623 // Find the global variable with the most indirect uses from kernels
624
625 struct CandidateTy {
626 GlobalVariable *GV = nullptr;
627 size_t UserCount = 0;
628 size_t Size = 0;
629
630 CandidateTy() = default;
631
632 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
633 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
634
635 bool operator<(const CandidateTy &Other) const {
636 // Fewer users makes module scope variable less attractive
637 if (UserCount < Other.UserCount) {
638 return true;
639 }
640 if (UserCount > Other.UserCount) {
641 return false;
642 }
643
644 // Bigger makes module scope variable less attractive
645 if (Size < Other.Size) {
646 return false;
647 }
648
649 if (Size > Other.Size) {
650 return true;
651 }
652
653 // Arbitrary but consistent
654 return GV->getName() < Other.GV->getName();
655 }
656 };
657
658 CandidateTy MostUsed;
659
660 for (auto &K : LDSVars) {
661 GlobalVariable *GV = K.first;
662 if (K.second.size() <= 1) {
663 // A variable reachable by only one kernel is best lowered with kernel
664 // strategy
665 continue;
666 }
667 CandidateTy Candidate(
668 GV, K.second.size(),
669 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
670 if (MostUsed < Candidate)
671 MostUsed = Candidate;
672 }
673
674 return MostUsed.GV;
675 }
676
677 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
678 uint32_t Address) {
679 // Write the specified address into metadata where it can be retrieved by
680 // the assembler. Format is a half open range, [Address Address+1)
681 LLVMContext &Ctx = M->getContext();
682 auto *IntTy =
683 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
684 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
685 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
686 GV->setMetadata(LLVMContext::MD_absolute_symbol,
687 MDNode::get(Ctx, {MinC, MaxC}));
688 }
689
690 DenseMap<Function *, Value *> tableKernelIndexCache;
691 Value *getTableLookupKernelIndex(Module &M, Function *F) {
692 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
693 // lowers to a read from a live in register. Emit it once in the entry
694 // block to spare deduplicating it later.
695 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
696 if (Inserted) {
697 Function *Decl =
698 Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_lds_kernel_id, {});
699
700 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
701 IRBuilder<> Builder(&*InsertAt);
702
703 It->second = Builder.CreateCall(Decl, {});
704 }
705
706 return It->second;
707 }
708
709 static std::vector<Function *> assignLDSKernelIDToEachKernel(
710 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
711 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
712 // Associate kernels in the set with an arbitrary but reproducible order and
713 // annotate them with that order in metadata. This metadata is recognised by
714 // the backend and lowered to a SGPR which can be read from using
715 // amdgcn_lds_kernel_id.
716
717 std::vector<Function *> OrderedKernels;
718 if (!KernelsThatAllocateTableLDS.empty() ||
719 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
720
721 for (Function &Func : M->functions()) {
722 if (Func.isDeclaration())
723 continue;
724 if (!isKernelLDS(&Func))
725 continue;
726
727 if (KernelsThatAllocateTableLDS.contains(&Func) ||
728 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
729 assert(Func.hasName()); // else fatal error earlier
730 OrderedKernels.push_back(&Func);
731 }
732 }
733
734 // Put them in an arbitrary but reproducible order
735 OrderedKernels = sortByName(std::move(OrderedKernels));
736
737 // Annotate the kernels with their order in this vector
738 LLVMContext &Ctx = M->getContext();
739 IRBuilder<> Builder(Ctx);
740
741 if (OrderedKernels.size() > UINT32_MAX) {
742 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
743 report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
744 }
745
746 for (size_t i = 0; i < OrderedKernels.size(); i++) {
747 Metadata *AttrMDArgs[1] = {
748 ConstantAsMetadata::get(ConstantInt::get(Type::getInt32Ty(Ctx), i)),
749 };
750 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
751 MDNode::get(Ctx, AttrMDArgs));
752 }
753 }
754 return OrderedKernels;
755 }
756
757 static void partitionVariablesIntoIndirectStrategies(
758 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
759 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
760 DenseSet<GlobalVariable *> &ModuleScopeVariables,
761 DenseSet<GlobalVariable *> &TableLookupVariables,
762 DenseSet<GlobalVariable *> &KernelAccessVariables,
763 DenseSet<GlobalVariable *> &DynamicVariables) {
764
765 GlobalVariable *HybridModuleRoot =
766 LoweringKindLoc != LoweringKind::hybrid
767 ? nullptr
768 : chooseBestVariableForModuleStrategy(
769 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
770
771 DenseSet<Function *> const EmptySet;
772 DenseSet<Function *> const &HybridModuleRootKernels =
773 HybridModuleRoot
774 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
775 : EmptySet;
776
777 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
778 // Each iteration of this loop assigns exactly one global variable to
779 // exactly one of the implementation strategies.
780
781 GlobalVariable *GV = K.first;
783 assert(K.second.size() != 0);
784
785 if (AMDGPU::isDynamicLDS(*GV)) {
786 DynamicVariables.insert(GV);
787 continue;
788 }
789
790 switch (LoweringKindLoc) {
791 case LoweringKind::module:
792 ModuleScopeVariables.insert(GV);
793 break;
794
795 case LoweringKind::table:
796 TableLookupVariables.insert(GV);
797 break;
798
799 case LoweringKind::kernel:
800 if (K.second.size() == 1) {
801 KernelAccessVariables.insert(GV);
802 } else {
803 report_fatal_error(
804 "cannot lower LDS '" + GV->getName() +
805 "' to kernel access as it is reachable from multiple kernels");
806 }
807 break;
808
809 case LoweringKind::hybrid: {
810 if (GV == HybridModuleRoot) {
811 assert(K.second.size() != 1);
812 ModuleScopeVariables.insert(GV);
813 } else if (K.second.size() == 1) {
814 KernelAccessVariables.insert(GV);
815 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
816 ModuleScopeVariables.insert(GV);
817 } else {
818 TableLookupVariables.insert(GV);
819 }
820 break;
821 }
822 }
823 }
824
825 // All LDS variables accessed indirectly have now been partitioned into
826 // the distinct lowering strategies.
827 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
828 KernelAccessVariables.size() + DynamicVariables.size() ==
829 LDSToKernelsThatNeedToAccessItIndirectly.size());
830 }
831
832 static GlobalVariable *lowerModuleScopeStructVariables(
833 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
834 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
835 // Create a struct to hold the ModuleScopeVariables
836 // Replace all uses of those variables from non-kernel functions with the
837 // new struct instance Replace only the uses from kernel functions that will
838 // allocate this instance. That is a space optimisation - kernels that use a
839 // subset of the module scope struct and do not need to allocate it for
840 // indirect calls will only allocate the subset they use (they do so as part
841 // of the per-kernel lowering).
842 if (ModuleScopeVariables.empty()) {
843 return nullptr;
844 }
845
846 LLVMContext &Ctx = M.getContext();
847
848 LDSVariableReplacement ModuleScopeReplacement =
849 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
850 ModuleScopeVariables);
851
852 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
853 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
854 cast<Constant>(ModuleScopeReplacement.SGV),
855 PointerType::getUnqual(Ctx)))});
856
857 // module.lds will be allocated at zero in any kernel that allocates it
858 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
859
860 // historic
861 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
862
863 // Replace all uses of module scope variable from non-kernel functions
864 replaceLDSVariablesWithStruct(
865 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
866 Instruction *I = dyn_cast<Instruction>(U.getUser());
867 if (!I) {
868 return false;
869 }
870 Function *F = I->getFunction();
871 return !isKernelLDS(F);
872 });
873
874 // Replace uses of module scope variable from kernel functions that
875 // allocate the module scope variable, otherwise leave them unchanged
876 // Record on each kernel whether the module scope global is used by it
877
878 for (Function &Func : M.functions()) {
879 if (Func.isDeclaration() || !isKernelLDS(&Func))
880 continue;
881
882 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
883 replaceLDSVariablesWithStruct(
884 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
885 Instruction *I = dyn_cast<Instruction>(U.getUser());
886 if (!I) {
887 return false;
888 }
889 Function *F = I->getFunction();
890 return F == &Func;
891 });
892
893 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
894 }
895 }
896
897 return ModuleScopeReplacement.SGV;
898 }
899
900 static DenseMap<Function *, LDSVariableReplacement>
901 lowerKernelScopeStructVariables(
902 Module &M, LDSUsesInfoTy &LDSUsesInfo,
903 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
904 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
905 GlobalVariable *MaybeModuleScopeStruct) {
906
907 // Create a struct for each kernel for the non-module-scope variables.
908
909 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
910 for (Function &Func : M.functions()) {
911 if (Func.isDeclaration() || !isKernelLDS(&Func))
912 continue;
913
914 DenseSet<GlobalVariable *> KernelUsedVariables;
915 // Allocating variables that are used directly in this struct to get
916 // alignment aware allocation and predictable frame size.
917 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
918 if (!AMDGPU::isDynamicLDS(*v)) {
919 KernelUsedVariables.insert(v);
920 }
921 }
922
923 // Allocating variables that are accessed indirectly so that a lookup of
924 // this struct instance can find them from nested functions.
925 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
926 if (!AMDGPU::isDynamicLDS(*v)) {
927 KernelUsedVariables.insert(v);
928 }
929 }
930
931 // Variables allocated in module lds must all resolve to that struct,
932 // not to the per-kernel instance.
933 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
934 for (GlobalVariable *v : ModuleScopeVariables) {
935 KernelUsedVariables.erase(v);
936 }
937 }
938
939 if (KernelUsedVariables.empty()) {
940 // Either used no LDS, or the LDS it used was all in the module struct
941 // or dynamically sized
942 continue;
943 }
944
945 // The association between kernel function and LDS struct is done by
946 // symbol name, which only works if the function in question has a
947 // name. This is not expected to be a problem in practice as kernels
948 // are called by name making anonymous ones (which are named by the
949 // backend) difficult to use. This does mean that llvm test cases need
950 // to name the kernels.
951 if (!Func.hasName()) {
952 report_fatal_error("Anonymous kernels cannot use LDS variables");
953 }
954
955 std::string VarName =
956 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
957
958 auto Replacement =
959 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
960
961 // If any indirect uses, create a direct use to ensure allocation
962 // TODO: Simpler to unconditionally mark used but that regresses
963 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
964 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
965 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
966 !Accesses->second.empty())
967 markUsedByKernel(&Func, Replacement.SGV);
968
969 // remove preserves existing codegen
970 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
971 KernelToReplacement[&Func] = Replacement;
972
973 // Rewrite uses within kernel to the new struct
974 replaceLDSVariablesWithStruct(
975 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
976 Instruction *I = dyn_cast<Instruction>(U.getUser());
977 return I && I->getFunction() == &Func;
978 });
979 }
980 return KernelToReplacement;
981 }
982
983 static GlobalVariable *
984 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
985 Function *func) {
986 // Create a dynamic lds variable with a name associated with the passed
987 // function that has the maximum alignment of any dynamic lds variable
988 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
989 // allocation, possibly after alignment padding. The representative variable
990 // created here has the maximum alignment of any other dynamic variable
991 // reachable by that kernel. All dynamic LDS variables are allocated at the
992 // same address in each kernel in order to provide the documented aliasing
993 // semantics. Setting the alignment here allows this IR pass to accurately
994 // predict the exact constant at which it will be allocated.
995
996 assert(isKernelLDS(func));
997
998 LLVMContext &Ctx = M.getContext();
999 const DataLayout &DL = M.getDataLayout();
1000 Align MaxDynamicAlignment(1);
1001
1002 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
1003 if (AMDGPU::isDynamicLDS(*GV)) {
1004 MaxDynamicAlignment =
1005 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
1006 }
1007 };
1008
1009 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
1010 UpdateMaxAlignment(GV);
1011 }
1012
1013 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
1014 UpdateMaxAlignment(GV);
1015 }
1016
1017 assert(func->hasName()); // Checked by caller
1018 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1019 GlobalVariable *N = new GlobalVariable(
1020 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
1021 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1022 false);
1023 N->setAlignment(MaxDynamicAlignment);
1024
1026 return N;
1027 }
1028
1029 /// Strip "amdgpu-no-lds-kernel-id" from any functions where we may have
1030 /// introduced its use. If AMDGPUAttributor ran prior to the pass, we inferred
1031 /// the lack of llvm.amdgcn.lds.kernel.id calls.
1032 void removeNoLdsKernelIdFromReachable(CallGraph &CG, Function *KernelRoot) {
1033 KernelRoot->removeFnAttr("amdgpu-no-lds-kernel-id");
1034
1035 SmallVector<Function *> Tmp({CG[KernelRoot]->getFunction()});
1036 if (!Tmp.back())
1037 return;
1038
1039 SmallPtrSet<Function *, 8> Visited;
1040 bool SeenUnknownCall = false;
1041
1042 do {
1043 Function *F = Tmp.pop_back_val();
1044
1045 for (auto &N : *CG[F]) {
1046 if (!N.second)
1047 continue;
1048
1049 Function *Callee = N.second->getFunction();
1050 if (!Callee) {
1051 if (!SeenUnknownCall) {
1052 SeenUnknownCall = true;
1053
1054 // If we see any indirect calls, assume nothing about potential
1055 // targets.
1056 // TODO: This could be refined to possible LDS global users.
1057 for (auto &N : *CG.getExternalCallingNode()) {
1058 Function *PotentialCallee = N.second->getFunction();
1059 if (!isKernelLDS(PotentialCallee))
1060 PotentialCallee->removeFnAttr("amdgpu-no-lds-kernel-id");
1061 }
1062
1063 continue;
1064 }
1065 }
1066
1067 Callee->removeFnAttr("amdgpu-no-lds-kernel-id");
1068 if (Visited.insert(Callee).second)
1069 Tmp.push_back(Callee);
1070 }
1071 } while (!Tmp.empty());
1072 }
1073
1074 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
1075 Module &M, LDSUsesInfoTy &LDSUsesInfo,
1076 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
1077 DenseSet<GlobalVariable *> const &DynamicVariables,
1078 std::vector<Function *> const &OrderedKernels) {
1079 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
1080 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
1081 LLVMContext &Ctx = M.getContext();
1082 IRBuilder<> Builder(Ctx);
1083 Type *I32 = Type::getInt32Ty(Ctx);
1084
1085 std::vector<Constant *> newDynamicLDS;
1086
1087 // Table is built in the same order as OrderedKernels
1088 for (auto &func : OrderedKernels) {
1089
1090 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
1091 assert(isKernelLDS(func));
1092 if (!func->hasName()) {
1093 report_fatal_error("Anonymous kernels cannot use LDS variables");
1094 }
1095
1096 GlobalVariable *N =
1097 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
1098
1099 KernelToCreatedDynamicLDS[func] = N;
1100
1101 markUsedByKernel(func, N);
1102
1103 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1104 Constant *GEP = ConstantExpr::getGetElementPtr(
1105 emptyCharArray, N, ConstantInt::get(I32, 0), true);
1106 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
1107 } else {
1108 newDynamicLDS.push_back(PoisonValue::get(I32));
1109 }
1110 }
1111 assert(OrderedKernels.size() == newDynamicLDS.size());
1112
1113 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
1114 Constant *init = ConstantArray::get(t, newDynamicLDS);
1115 GlobalVariable *table = new GlobalVariable(
1116 M, t, true, GlobalValue::InternalLinkage, init,
1117 "llvm.amdgcn.dynlds.offset.table", nullptr,
1118 GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);
1119
1120 for (GlobalVariable *GV : DynamicVariables) {
1121 for (Use &U : make_early_inc_range(GV->uses())) {
1122 auto *I = dyn_cast<Instruction>(U.getUser());
1123 if (!I)
1124 continue;
1125 if (isKernelLDS(I->getFunction()))
1126 continue;
1127
1128 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
1129 }
1130 }
1131 }
1132 return KernelToCreatedDynamicLDS;
1133 }
1134
1135 bool runOnModule(Module &M) {
1136 CallGraph CG = CallGraph(M);
1137 bool Changed = superAlignLDSGlobals(M);
1138
1139 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1140
1141 Changed = true; // todo: narrow this down
1142
1143 // For each kernel, what variables does it access directly or through
1144 // callees
1145 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1146
1147 // For each variable accessed through callees, which kernels access it
1148 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1149 for (auto &K : LDSUsesInfo.indirect_access) {
1150 Function *F = K.first;
1151 assert(isKernelLDS(F));
1152 for (GlobalVariable *GV : K.second) {
1153 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1154 }
1155 }
1156
1157 // Partition variables accessed indirectly into the different strategies
1158 DenseSet<GlobalVariable *> ModuleScopeVariables;
1159 DenseSet<GlobalVariable *> TableLookupVariables;
1160 DenseSet<GlobalVariable *> KernelAccessVariables;
1161 DenseSet<GlobalVariable *> DynamicVariables;
1162 partitionVariablesIntoIndirectStrategies(
1163 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1164 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1165 DynamicVariables);
1166
1167 // If the kernel accesses a variable that is going to be stored in the
1168 // module instance through a call then that kernel needs to allocate the
1169 // module instance
1170 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1171 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1172 ModuleScopeVariables);
1173 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1174 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1175 TableLookupVariables);
1176
1177 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1178 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1179 DynamicVariables);
1180
1181 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1182 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1183
1184 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1185 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1186 KernelsThatAllocateModuleLDS,
1187 MaybeModuleScopeStruct);
1188
1189 // Lower zero cost accesses to the kernel instances just created
1190 for (auto &GV : KernelAccessVariables) {
1191 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1192 assert(funcs.size() == 1); // Only one kernel can access it
1193 LDSVariableReplacement Replacement =
1194 KernelToReplacement[*(funcs.begin())];
1195
1196 DenseSet<GlobalVariable *> Vec;
1197 Vec.insert(GV);
1198
1199 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1200 return isa<Instruction>(U.getUser());
1201 });
1202 }
1203
1204 // The ith element of this vector is kernel id i
1205 std::vector<Function *> OrderedKernels =
1206 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1207 KernelsThatIndirectlyAllocateDynamicLDS);
1208
1209 if (!KernelsThatAllocateTableLDS.empty()) {
1210 LLVMContext &Ctx = M.getContext();
1211 IRBuilder<> Builder(Ctx);
1212
1213 // The order must be consistent between lookup table and accesses to
1214 // lookup table
1215 auto TableLookupVariablesOrdered =
1216 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1217 TableLookupVariables.end()));
1218
1219 GlobalVariable *LookupTable = buildLookupTable(
1220 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1221 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1222 LookupTable);
1223
1224 // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
1225 // kernel. We may have inferred this wasn't used prior to the pass.
1226 //
1227 // TODO: We could filter out subgraphs that do not access LDS globals.
1228 for (Function *F : KernelsThatAllocateTableLDS)
1229 removeNoLdsKernelIdFromReachable(CG, F);
1230 }
1231
1232 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1233 lowerDynamicLDSVariables(M, LDSUsesInfo,
1234 KernelsThatIndirectlyAllocateDynamicLDS,
1235 DynamicVariables, OrderedKernels);
1236
1237 // All kernel frames have been allocated. Calculate and record the
1238 // addresses.
1239 {
1240 const DataLayout &DL = M.getDataLayout();
1241
1242 for (Function &Func : M.functions()) {
1243 if (Func.isDeclaration() || !isKernelLDS(&Func))
1244 continue;
1245
1246 // All three of these are optional. The first variable is allocated at
1247 // zero. They are allocated by AMDGPUMachineFunction as one block.
1248 // Layout:
1249 //{
1250 // module.lds
1251 // alignment padding
1252 // kernel instance
1253 // alignment padding
1254 // dynamic lds variables
1255 //}
1256
1257 const bool AllocateModuleScopeStruct =
1258 MaybeModuleScopeStruct &&
1259 KernelsThatAllocateModuleLDS.contains(&Func);
1260
1261 auto Replacement = KernelToReplacement.find(&Func);
1262 const bool AllocateKernelScopeStruct =
1263 Replacement != KernelToReplacement.end();
1264
1265 const bool AllocateDynamicVariable =
1266 KernelToCreatedDynamicLDS.contains(&Func);
1267
1268 uint32_t Offset = 0;
1269
1270 if (AllocateModuleScopeStruct) {
1271 // Allocated at zero, recorded once on construction, not once per
1272 // kernel
1273 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1274 }
1275
1276 if (AllocateKernelScopeStruct) {
1277 GlobalVariable *KernelStruct = Replacement->second.SGV;
1278 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1279 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1280 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1281 }
1282
1283 // If there is dynamic allocation, the alignment needed is included in
1284 // the static frame size. There may be no reference to the dynamic
1285 // variable in the kernel itself, so without including it here, that
1286 // alignment padding could be missed.
1287 if (AllocateDynamicVariable) {
1288 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1289 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1290 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1291 }
1292
1293 if (Offset != 0) {
1294 (void)TM; // TODO: Account for target maximum LDS
1295 std::string Buffer;
1296 raw_string_ostream SS{Buffer};
1297 SS << format("%u", Offset);
1298
1299 // Instead of explicitly marking kernels that access dynamic variables
1300 // using special case metadata, annotate with min-lds == max-lds, i.e.
1301 // that there is no more space available for allocating more static
1302 // LDS variables. That is the right condition to prevent allocating
1303 // more variables which would collide with the addresses assigned to
1304 // dynamic variables.
1305 if (AllocateDynamicVariable)
1306 SS << format(",%u", Offset);
1307
1308 Func.addFnAttr("amdgpu-lds-size", Buffer);
1309 }
1310 }
1311 }
1312
1313 for (auto &GV : make_early_inc_range(M.globals()))
1314 if (AMDGPU::isLDSVariableToLower(GV)) {
1315 // probably want to remove from used lists
1316 GV.removeDeadConstantUsers();
1317 if (GV.use_empty())
1318 GV.eraseFromParent();
1319 }
1320
1321 return Changed;
1322 }
1323
1324private:
1325 // Increase the alignment of LDS globals if necessary to maximise the chance
1326 // that we can use aligned LDS instructions to access them.
1327 static bool superAlignLDSGlobals(Module &M) {
1328 const DataLayout &DL = M.getDataLayout();
1329 bool Changed = false;
1330 if (!SuperAlignLDSGlobals) {
1331 return Changed;
1332 }
1333
1334 for (auto &GV : M.globals()) {
1335 if (!AMDGPU::isLDSVariableToLower(GV)) {
1336 // Only changing alignment of LDS variables
1337 continue;
1338 }
1339 if (!GV.hasInitializer()) {
1340 // cuda/hip extern __shared__ variable, leave alignment alone
1341 continue;
1342 }
1343
1344 Align Alignment = AMDGPU::getAlign(DL, &GV);
1345 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1346
1347 if (GVSize > 8) {
1348 // We might want to use a b96 or b128 load/store
1349 Alignment = std::max(Alignment, Align(16));
1350 } else if (GVSize > 4) {
1351 // We might want to use a b64 load/store
1352 Alignment = std::max(Alignment, Align(8));
1353 } else if (GVSize > 2) {
1354 // We might want to use a b32 load/store
1355 Alignment = std::max(Alignment, Align(4));
1356 } else if (GVSize > 1) {
1357 // We might want to use a b16 load/store
1358 Alignment = std::max(Alignment, Align(2));
1359 }
1360
1361 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1362 Changed = true;
1363 GV.setAlignment(Alignment);
1364 }
1365 }
1366 return Changed;
1367 }
1368
1369 static LDSVariableReplacement createLDSVariableReplacement(
1370 Module &M, std::string VarName,
1371 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1372 // Create a struct instance containing LDSVarsToTransform and map from those
1373 // variables to ConstantExprGEP
1374 // Variables may be introduced to meet alignment requirements. No aliasing
1375 // metadata is useful for these as they have no uses. Erased before return.
1376
1377 LLVMContext &Ctx = M.getContext();
1378 const DataLayout &DL = M.getDataLayout();
1379 assert(!LDSVarsToTransform.empty());
1380
1381 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1382 LayoutFields.reserve(LDSVarsToTransform.size());
1383 {
1384 // The order of fields in this struct depends on the order of
1385 // variables in the argument which varies when changing how they
1386 // are identified, leading to spurious test breakage.
1387 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1388 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1389
1390 for (GlobalVariable *GV : Sorted) {
1391 OptimizedStructLayoutField F(GV,
1392 DL.getTypeAllocSize(GV->getValueType()),
1393 AMDGPU::getAlign(DL, GV));
1394 LayoutFields.emplace_back(F);
1395 }
1396 }
1397
1398 performOptimizedStructLayout(LayoutFields);
1399
1400 std::vector<GlobalVariable *> LocalVars;
1401 BitVector IsPaddingField;
1402 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1403 IsPaddingField.reserve(LDSVarsToTransform.size());
1404 {
1405 uint64_t CurrentOffset = 0;
1406 for (size_t I = 0; I < LayoutFields.size(); I++) {
1407 GlobalVariable *FGV = static_cast<GlobalVariable *>(
1408 const_cast<void *>(LayoutFields[I].Id));
1409 Align DataAlign = LayoutFields[I].Alignment;
1410
1411 uint64_t DataAlignV = DataAlign.value();
1412 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1413 uint64_t Padding = DataAlignV - Rem;
1414
1415 // Append an array of padding bytes to meet alignment requested
1416 // Note (o + (a - (o % a)) ) % a == 0
1417 // (offset + Padding ) % align == 0
1418
1419 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1420 LocalVars.push_back(new GlobalVariable(
1421 M, ATy, false, GlobalValue::InternalLinkage,
1422 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1423 AMDGPUAS::LOCAL_ADDRESS, false));
1424 IsPaddingField.push_back(true);
1425 CurrentOffset += Padding;
1426 }
1427
1428 LocalVars.push_back(FGV);
1429 IsPaddingField.push_back(false);
1430 CurrentOffset += LayoutFields[I].Size;
1431 }
1432 }
1433
1434 std::vector<Type *> LocalVarTypes;
1435 LocalVarTypes.reserve(LocalVars.size());
1436 std::transform(
1437 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1438 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1439
1440 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1441
1442 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1443
1444 GlobalVariable *SGV = new GlobalVariable(
1445 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1446 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1447 false);
1448 SGV->setAlignment(StructAlign);
1449
1450 DenseMap<GlobalVariable *, Constant *> Map;
1451 Type *I32 = Type::getInt32Ty(Ctx);
1452 for (size_t I = 0; I < LocalVars.size(); I++) {
1453 GlobalVariable *GV = LocalVars[I];
1454 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1455 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1456 if (IsPaddingField[I]) {
1457 assert(GV->use_empty());
1458 GV->eraseFromParent();
1459 } else {
1460 Map[GV] = GEP;
1461 }
1462 }
1463 assert(Map.size() == LDSVarsToTransform.size());
1464 return {SGV, std::move(Map)};
1465 }
1466
1467 template <typename PredicateTy>
1468 static void replaceLDSVariablesWithStruct(
1469 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1470 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1471 LLVMContext &Ctx = M.getContext();
1472 const DataLayout &DL = M.getDataLayout();
1473
1474 // A hack... we need to insert the aliasing info in a predictable order for
1475 // lit tests. Would like to have them in a stable order already, ideally the
1476 // same order they get allocated, which might mean an ordered set container
1477 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1478 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1479
1480 // Create alias.scope and their lists. Each field in the new structure
1481 // does not alias with all other fields.
1482 SmallVector<MDNode *> AliasScopes;
1483 SmallVector<Metadata *> NoAliasList;
1484 const size_t NumberVars = LDSVarsToTransform.size();
1485 if (NumberVars > 1) {
1486 MDBuilder MDB(Ctx);
1487 AliasScopes.reserve(NumberVars);
1488 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1489 for (size_t I = 0; I < NumberVars; I++) {
1490 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1491 AliasScopes.push_back(Scope);
1492 }
1493 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1494 }
1495
1496 // Replace uses of ith variable with a constantexpr to the corresponding
1497 // field of the instance that will be allocated by AMDGPUMachineFunction
1498 for (size_t I = 0; I < NumberVars; I++) {
1499 GlobalVariable *GV = LDSVarsToTransform[I];
1500 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1501
1502 GV->replaceUsesWithIf(GEP, Predicate);
1503
1504 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1505 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1506 uint64_t Offset = APOff.getZExtValue();
1507
1508 Align A =
1509 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1510
1511 if (I)
1512 NoAliasList[I - 1] = AliasScopes[I - 1];
1513 MDNode *NoAlias =
1514 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1515 MDNode *AliasScope =
1516 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1517
1518 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1519 }
1520 }
1521
1522 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1523 const DataLayout &DL, MDNode *AliasScope,
1524 MDNode *NoAlias, unsigned MaxDepth = 5) {
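// Nothing useful to propagate once the recursion budget is exhausted, or when
// the known alignment is the trivial Align(1) and there is no alias scope to
// attach.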
1525 if (!MaxDepth || (A == 1 && !AliasScope))
1526 return;
1527
1528 for (User *U : Ptr->users()) {
1529 if (auto *I = dyn_cast<Instruction>(U)) {
1530 if (AliasScope && I->mayReadOrWriteMemory()) {
1531 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1532 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1533 : AliasScope);
1534 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1535
1536 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1537 NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
1538 I->setMetadata(LLVMContext::MD_noalias, NA);
1539 }
1540 }
1541
1542 if (auto *LI = dyn_cast<LoadInst>(U)) {
1543 LI->setAlignment(std::max(A, LI->getAlign()));
1544 continue;
1545 }
1546 if (auto *SI = dyn_cast<StoreInst>(U)) {
1547 if (SI->getPointerOperand() == Ptr)
1548 SI->setAlignment(std::max(A, SI->getAlign()));
1549 continue;
1550 }
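// In both the load and store cases std::max only ever raises the recorded
// alignment; an access that already claims something stronger than A is left
// unchanged.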
1551 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1552 // None of the atomicrmw operations can work on pointers, but let's
1553 // check anyway in case that changes or we process a ConstantExpr.
1554 if (AI->getPointerOperand() == Ptr)
1555 AI->setAlignment(std::max(A, AI->getAlign()));
1556 continue;
1557 }
1558 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1559 if (AI->getPointerOperand() == Ptr)
1560 AI->setAlignment(std::max(A, AI->getAlign()));
1561 continue;
1562 }
1563 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1564 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1565 APInt Off(BitWidth, 0);
1566 if (GEP->getPointerOperand() == Ptr) {
1567 Align GA;
1568 if (GEP->accumulateConstantOffset(DL, Off))
1569 GA = commonAlignment(A, Off.getLimitedValue());
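// A GEP whose offset is not a compile-time constant leaves GA at its default
// of Align(1), so only the alias metadata (not a stronger alignment) is
// propagated into the recursive call.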
1570 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1571 MaxDepth - 1);
1572 }
1573 continue;
1574 }
1575 if (auto *I = dyn_cast<Instruction>(U)) {
1576 if (I->getOpcode() == Instruction::BitCast ||
1577 I->getOpcode() == Instruction::AddrSpaceCast)
1578 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1579 }
1580 }
1581 }
1582};
1583
1584class AMDGPULowerModuleLDSLegacy : public ModulePass {
1585public:
1586 const AMDGPUTargetMachine *TM;
1587 static char ID;
1588
1589 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
1590 : ModulePass(ID), TM(TM_) {
1591 initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
1592 }
1593
1594 void getAnalysisUsage(AnalysisUsage &AU) const override {
1595 if (!TM)
1596 AU.addRequired<TargetPassConfig>();
1597 }
1598
1599 bool runOnModule(Module &M) override {
1600 if (!TM) {
1601 auto &TPC = getAnalysis<TargetPassConfig>();
1602 TM = &TPC.getTM<AMDGPUTargetMachine>();
1603 }
1604
1605 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1606 }
1607};
1608
1609} // namespace
1610char AMDGPULowerModuleLDSLegacy::ID = 0;
1611
1612char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1613
1614INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1615 "Lower uses of LDS variables from non-kernel functions",
1616 false, false)
1617INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1618INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1619 "Lower uses of LDS variables from non-kernel functions",
1620 false, false)
1621
1622ModulePass *
1623llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1624 return new AMDGPULowerModuleLDSLegacy(TM);
1625}
1626
1627PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1628 ModuleAnalysisManager &) {
1629 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1630 : PreservedAnalyses::all();
1631}