1//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This pass eliminates local data store, LDS, uses from non-kernel functions.
10// LDS is contiguous memory allocated per kernel execution.
11//
12// Background.
13//
14// The programming model is global variables, or equivalently function local
15// static variables, accessible from kernels or other functions. For uses from
16// kernels this is straightforward - assign an integer to the kernel for the
17// memory required by all the variables combined, allocate them within that.
18// For uses from functions there are performance tradeoffs to choose between.
19//
20// This model means the GPU runtime can specify the amount of memory allocated.
21// If this is more than the kernel assumed, the excess can be made available
22// using a language specific feature, which IR represents as a variable with
23// no initializer. This feature is referred to here as "Dynamic LDS" and is
24// lowered slightly differently to the normal case.
25//
26// Consequences of this GPU feature:
27// - memory is limited and exceeding it halts compilation
28// - a global accessed by one kernel exists independent of other kernels
29// - a global exists independent of simultaneous execution of the same kernel
30// - the address of the global may differ between kernels as the per-kernel
31//   instances do not alias, which permits allocating only the variables each
31//   kernel uses
32// - if the address is allowed to differ, functions need help to find it
33//
34// Uses from kernels are implemented here by grouping them in a per-kernel
35// struct instance. This duplicates the variables, accurately modelling their
36// aliasing properties relative to a single global representation. It also
37// permits control over alignment via padding.
38//
39// Uses from functions are more complicated and the primary purpose of this
40// IR pass. Several different lowerings are chosen between to meet requirements
41// to avoid allocating any LDS where it is not necessary, as that impacts
42// occupancy and may fail the compilation, while not imposing overhead on a
43// feature whose primary advantage over global memory is performance. The basic
44// design goal is to avoid one kernel imposing overhead on another.
45//
46// Implementation.
47//
48// LDS variables with constant annotation or non-undef initializer are passed
49// through unchanged for simplification or error diagnostics in later passes.
50// Non-undef initializers are not yet implemented for LDS.
51//
52// LDS variables that are always allocated at the same address can be found
53// by lookup at that address. Otherwise runtime information/cost is required.
54//
55// The simplest strategy possible is to group all LDS variables in a single
56// struct and allocate that struct in every kernel such that the original
57// variables are always at the same address. LDS is however a limited resource
58// so this strategy is unusable in practice. It is not implemented here.
59//
60// Strategy | Precise allocation | Zero runtime cost | General purpose |
61// --------+--------------------+-------------------+-----------------+
62// Module | No | Yes | Yes |
63// Table | Yes | No | Yes |
64// Kernel | Yes | Yes | No |
65// Hybrid | Yes | Partial | Yes |
66//
67// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
68// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
69// for variables that are known reachable from a single kernel. "Hybrid" picks
70// between all three. When forced to choose between LDS and cycles we minimise
71// LDS use.
72
73// The "module" lowering implemented here finds LDS variables which are used by
74// non-kernel functions and creates a new struct with a field for each of those
75// LDS variables. Variables that are only used from kernels are excluded.
76//
77// The "table" lowering implemented here has three components.
78// First kernels are assigned a unique integer identifier which is available in
79// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
80// is passed through a specific SGPR, thus works with indirect calls.
81// Second, each kernel allocates LDS variables independent of other kernels and
82// writes the addresses it chose for each variable into an array in consistent
83// order. If the kernel does not allocate a given variable, it writes undef to
84// the corresponding array location. These arrays are written to a constant
85// table in the order matching the kernel unique integer identifier.
86// Third, uses from non-kernel functions are replaced with a table lookup using
87// the intrinsic function to find the address of the variable.
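//
// As an illustrative sketch only (the exact names, types and indices are
// generated by the pass and will differ), a non-kernel use of an LDS
// variable @x lowers to roughly:
//   %id  = call i32 @llvm.amdgcn.lds.kernel.id()
//   %gep = getelementptr inbounds [N x [M x i32]],
//          ptr addrspace(4) @llvm.amdgcn.lds.offset.table,
//          i32 0, i32 %id, i32 <index of @x>
//   %off = load i32, ptr addrspace(4) %gep
//   %ptr = inttoptr i32 %off to ptr addrspace(3)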
88//
89// "Kernel" lowering is only applicable for variables that are unambiguously
90// reachable from exactly one kernel. For those cases, accesses to the variable
91// can be lowered to ConstantExpr address of a struct instance specific to that
92// one kernel. This is zero cost in space and in compute. It will raise a fatal
93// error on any variable that might be reachable from multiple kernels and is
94// thus most easily used as part of the hybrid lowering strategy.
95//
96// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
97// lowering where it can. It lowers the variable accessed by the greatest
98// number of kernels using the module strategy as that is free for the first
99// variable. Any further variables that can be lowered with the module strategy
100// without incurring LDS memory overhead are. The remaining ones are lowered
101// via table.
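//
// A small worked example of the hybrid choice (illustrative variable and
// kernel counts, not taken from real code): if A is reachable from kernels
// {k0..k7}, B from a subset of those kernels, and C from two unrelated
// kernels, then A is chosen as the module-strategy root, B also joins the
// module struct because its kernels are a subset of A's, and C is lowered
// via the table.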
102//
103// Consequences
104// - No heuristics or user controlled magic numbers, hybrid is the right choice
105// - Kernels that don't use functions (or have had them all inlined) are not
106// affected by any lowering for kernels that do.
107// - Kernels that don't make indirect function calls are not affected by those
108// that do.
109// - Variables which are used by lots of kernels, e.g. those injected by a
110// language runtime in most kernels, are expected to have no overhead
111// - Implementations that instantiate templates per-kernel where those templates
112// use LDS are expected to hit the "Kernel" lowering strategy
113// - The runtime properties impose a cost in compiler implementation complexity
114//
115// Dynamic LDS implementation
116// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
117// same intrinsic to identify which kernel is at the root of the dynamic call
118// graph. This relies on the specified behaviour that all dynamic LDS variables
119// alias one another, i.e. are at the same address, with respect to a given
120// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
121// that allocates any dynamic LDS and builds a table of addresses out of those.
122// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
123// The corresponding optimisation for "kernel" lowering where the table lookup
124// is elided is not implemented.
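//
// In source IR, dynamic LDS typically appears as an external zero-sized
// array with no initializer, for example (illustrative only):
//   @dyn_shared = external addrspace(3) global [0 x i32], align 4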
125//
126//
127// Implementation notes / limitations
128// A single LDS global variable represents an instance per kernel that can reach
129// it. This pass essentially specialises those variables per kernel.
130// Handling ConstantExpr during the pass complicated this significantly so now
131// all ConstantExpr uses of LDS variables are expanded to instructions. This
132// may need amending when implementing non-undef initialisers.
133//
134// Lowering is split between this IR pass and the back end. This pass chooses
135// where given variables should be allocated and marks them with metadata,
136// MD_absolute_symbol. The backend places the variables in coincidentally the
137// same location and raises a fatal error if something has gone awry. This works
138// in practice because the only pass between this one and the backend that
139// changes LDS is PromoteAlloca and the changes it makes do not conflict.
140//
141// Addresses are written to constant global arrays based on the same metadata.
142//
143// The backend lowers LDS variables in the order of traversal of the function.
144// This is at odds with the deterministic layout required. The workaround is to
145// allocate the fixed-address variables immediately upon starting the function
146// where they can be placed as intended. This requires a means of mapping from
147// the function to the variables that it allocates. For the module scope lds,
148// this is via metadata indicating whether the variable is not required. If a
149// pass deletes that metadata, a fatal error on disagreement with the absolute
150// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
151// correspondence between the function and the variable. It requires the
152// kernel to have a name (which is only a limitation for tests in practice) and
153// for nothing to rename the corresponding symbols. This is a hazard if the pass
154// is run multiple times during debugging. Alternative schemes considered all
155// involve bespoke metadata.
156//
157// If the name correspondence can be replaced, multiple distinct kernels that
158// have the same memory layout can map to the same kernel id (as the address
159// itself is handled by the absolute symbol metadata) and that will allow more
160// uses of the "kernel" style faster lowering and reduce the size of the lookup
161// tables.
162//
163// There is a test that checks this does not fire for a graphics shader. This
164// lowering is expected to work for graphics if the isKernel test is changed.
165//
166// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
167// before codegen. Replacing this with an equivalent intrinsic which lasts until
168// shortly after the machine function lowering of LDS would help break the name
169// mapping. The other part needed is probably to amend PromoteAlloca to embed
170// the LDS variables it creates in the same struct created here. That avoids the
171// current hazard where a PromoteAlloca LDS variable might be allocated before
172// the kernel scope (and thus error on the address check). Given a new invariant
173// that no LDS variables exist outside of the structs managed here, and an
174// intrinsic that lasts until after the LDS frame lowering, it should be
175// possible to drop the name mapping and fold equivalent memory layouts.
176//
177//===----------------------------------------------------------------------===//
178
179#include "AMDGPU.h"
180#include "AMDGPUTargetMachine.h"
181#include "Utils/AMDGPUBaseInfo.h"
183#include "llvm/ADT/BitVector.h"
184#include "llvm/ADT/DenseMap.h"
185#include "llvm/ADT/DenseSet.h"
186#include "llvm/ADT/STLExtras.h"
188#include "llvm/ADT/SetVector.h"
191#include "llvm/IR/Constants.h"
192#include "llvm/IR/DerivedTypes.h"
193#include "llvm/IR/IRBuilder.h"
194#include "llvm/IR/InlineAsm.h"
195#include "llvm/IR/Instructions.h"
196#include "llvm/IR/IntrinsicsAMDGPU.h"
197#include "llvm/IR/MDBuilder.h"
200#include "llvm/Pass.h"
202#include "llvm/Support/Debug.h"
203#include "llvm/Support/Format.h"
208
209#include <tuple>
210#include <vector>
211
212#include <cstdio>
213
214#define DEBUG_TYPE "amdgpu-lower-module-lds"
215
216using namespace llvm;
217
218namespace {
219
220cl::opt<bool> SuperAlignLDSGlobals(
221 "amdgpu-super-align-lds-globals",
222 cl::desc("Increase alignment of LDS if it is not on align boundary"),
223 cl::init(true), cl::Hidden);
224
225enum class LoweringKind { module, table, kernel, hybrid };
226cl::opt<LoweringKind> LoweringKindLoc(
227 "amdgpu-lower-module-lds-strategy",
228 cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
229 cl::init(LoweringKind::hybrid),
230 cl::values(
231 clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
232 clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
233 clEnumValN(
234 LoweringKind::kernel, "kernel",
235 "Lower variables reachable from one kernel, otherwise abort"),
236 clEnumValN(LoweringKind::hybrid, "hybrid",
237 "Lower via mixture of above strategies")));
238
239bool isKernelLDS(const Function *F) {
240 // Some weirdness here. AMDGPU::isKernelCC does not call into
241 // AMDGPU::isKernel with the calling conv, it instead calls into
242 // isModuleEntryFunction which returns true for more calling conventions
243 // than AMDGPU::isKernel does. There's a FIXME on AMDGPU::isKernel.
244 // There's also a test that checks that the LDS lowering does not hit on
245 // a graphics shader, denoted amdgpu_ps, so stay with the limited case.
246 // Putting LDS in the name of the function to draw attention to this.
247 return AMDGPU::isKernel(F->getCallingConv());
248}
249
250template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
251 llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
252 return L->getName() < R->getName();
253 });
254 return {std::move(V)};
255}
256
257class AMDGPULowerModuleLDS {
258 const AMDGPUTargetMachine &TM;
259
260 static void
261 removeLocalVarsFromUsedLists(Module &M,
262 const DenseSet<GlobalVariable *> &LocalVars) {
263 // The verifier rejects used lists containing an inttoptr of a constant
264 // so remove the variables from these lists before replaceAllUsesWith
265 SmallPtrSet<Constant *, 8> LocalVarsSet;
266 for (GlobalVariable *LocalVar : LocalVars)
267 LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));
268
269 removeFromUsedLists(
270 M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });
271
272 for (GlobalVariable *LocalVar : LocalVars)
273 LocalVar->removeDeadConstantUsers();
274 }
275
276 static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
277 // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
278 // that might call a function which accesses a field within it. This is
279 // presently approximated to 'all kernels' if there are any such functions
280 // in the module. This implicit use is redefined as an explicit use here so
281 // that later passes, specifically PromoteAlloca, account for the required
282 // memory without any knowledge of this transform.
283
284 // An operand bundle on llvm.donothing works because the call instruction
285 // survives until after the last pass that needs to account for LDS. It is
286 // better than inline asm as the latter survives until the end of codegen. A
287 // totally robust solution would be a function with the same semantics as
288 // llvm.donothing that takes a pointer to the instance and is lowered to a
289 // no-op after LDS is allocated, but that is not presently necessary.
290
291 // This intrinsic is eliminated shortly before instruction selection. It
292 // does not suffice to indicate to ISel that a given global which is not
293 // immediately used by the kernel must still be allocated by it. An
294 // equivalent target specific intrinsic which lasts until immediately after
295 // codegen would suffice for that, but one would still need to ensure that
296 // the variables are allocated in the anticipated order.
297 BasicBlock *Entry = &Func->getEntryBlock();
298 IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());
299
300 Function *Decl =
301 Intrinsic::getDeclaration(Func->getParent(), Intrinsic::donothing, {});
302
303 Value *UseInstance[1] = {
304 Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};
305
306 Builder.CreateCall(
307 Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
308 }
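
  // Illustrative sketch of the emitted marker (the struct name depends on the
  // kernel; shown here for a hypothetical @llvm.amdgcn.kernel.k.lds):
  //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
  //     @llvm.amdgcn.kernel.k.lds) ]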
309
310 static bool eliminateConstantExprUsesOfLDSFromAllInstructions(Module &M) {
311 // Constants are uniqued within LLVM. A ConstantExpr referring to a LDS
312 // global may have uses from multiple different functions as a result.
313 // This pass specialises LDS variables with respect to the kernel that
314 // allocates them.
315
316 // This is semantically equivalent to (the unimplemented as slow):
317 // for (auto &F : M.functions())
318 // for (auto &BB : F)
319 // for (auto &I : BB)
320 // for (Use &Op : I.operands())
321 // if (constantExprUsesLDS(Op))
322 // replaceConstantExprInFunction(I, Op);
323
324 SmallVector<Constant *> LDSGlobals;
325 for (auto &GV : M.globals())
326 if (AMDGPU::isLDSVariableToLower(GV))
327 LDSGlobals.push_back(&GV);
328
329 return convertUsersOfConstantsToInstructions(LDSGlobals);
330 }
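
  // Illustrative sketch: a constant-expression use such as
  //   store i32 0, ptr addrspace(3) getelementptr ([4 x i32],
  //     ptr addrspace(3) @lds_var, i32 0, i32 1)
  // becomes an ordinary getelementptr instruction in the using function, so
  // each function ends up with its own replaceable use of @lds_var.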
331
332public:
333 AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}
334
335 using FunctionVariableMap = DenseMap<Function *, DenseSet<GlobalVariable *>>;
336
337 using VariableFunctionMap = DenseMap<GlobalVariable *, DenseSet<Function *>>;
338
339 static void getUsesOfLDSByFunction(CallGraph const &CG, Module &M,
340 FunctionVariableMap &kernels,
341 FunctionVariableMap &functions) {
342
343 // Get uses from the current function, excluding uses by called functions
344 // Two output variables to avoid walking the globals list twice
345 for (auto &GV : M.globals()) {
346 if (!AMDGPU::isLDSVariableToLower(GV)) {
347 continue;
348 }
349
350 if (GV.isAbsoluteSymbolRef()) {
352 "LDS variables with absolute addresses are unimplemented.");
353 }
354
355 for (User *V : GV.users()) {
356 if (auto *I = dyn_cast<Instruction>(V)) {
357 Function *F = I->getFunction();
358 if (isKernelLDS(F)) {
359 kernels[F].insert(&GV);
360 } else {
361 functions[F].insert(&GV);
362 }
363 }
364 }
365 }
366 }
367
368 struct LDSUsesInfoTy {
369 FunctionVariableMap direct_access;
370 FunctionVariableMap indirect_access;
371 };
372
373 static LDSUsesInfoTy getTransitiveUsesOfLDS(CallGraph const &CG, Module &M) {
374
375 FunctionVariableMap direct_map_kernel;
376 FunctionVariableMap direct_map_function;
377 getUsesOfLDSByFunction(CG, M, direct_map_kernel, direct_map_function);
378
379 // Collect variables that are used by functions whose address has escaped
380 DenseSet<GlobalVariable *> VariablesReachableThroughFunctionPointer;
381 for (Function &F : M.functions()) {
382 if (!isKernelLDS(&F))
383 if (F.hasAddressTaken(nullptr,
384 /* IgnoreCallbackUses */ false,
385 /* IgnoreAssumeLikeCalls */ false,
386 /* IgnoreLLVMUsed */ true,
387 /* IgnoreArcAttachedCall */ false)) {
388 set_union(VariablesReachableThroughFunctionPointer,
389 direct_map_function[&F]);
390 }
391 }
392
393 auto functionMakesUnknownCall = [&](const Function *F) -> bool {
394 assert(!F->isDeclaration());
395 for (const CallGraphNode::CallRecord &R : *CG[F]) {
396 if (!R.second->getFunction()) {
397 return true;
398 }
399 }
400 return false;
401 };
402
403 // Work out which variables are reachable through function calls
404 FunctionVariableMap transitive_map_function = direct_map_function;
405
406 // If the function makes any unknown call, assume the worst case that it can
407 // access all variables accessed by functions whose address escaped
408 for (Function &F : M.functions()) {
409 if (!F.isDeclaration() && functionMakesUnknownCall(&F)) {
410 if (!isKernelLDS(&F)) {
411 set_union(transitive_map_function[&F],
412 VariablesReachableThroughFunctionPointer);
413 }
414 }
415 }
416
417 // Direct implementation of collecting all variables reachable from each
418 // function
419 for (Function &Func : M.functions()) {
420 if (Func.isDeclaration() || isKernelLDS(&Func))
421 continue;
422
423 DenseSet<Function *> seen; // catches cycles
424 SmallVector<Function *, 4> wip = {&Func};
425
426 while (!wip.empty()) {
427 Function *F = wip.pop_back_val();
428
429 // Can accelerate this by referring to transitive map for functions that
430 // have already been computed, with more care than this
431 set_union(transitive_map_function[&Func], direct_map_function[F]);
432
433 for (const CallGraphNode::CallRecord &R : *CG[F]) {
434 Function *ith = R.second->getFunction();
435 if (ith) {
436 if (!seen.contains(ith)) {
437 seen.insert(ith);
438 wip.push_back(ith);
439 }
440 }
441 }
442 }
443 }
444
445 // direct_map_kernel lists which variables are used by the kernel
446 // find the variables which are used through a function call
447 FunctionVariableMap indirect_map_kernel;
448
449 for (Function &Func : M.functions()) {
450 if (Func.isDeclaration() || !isKernelLDS(&Func))
451 continue;
452
453 for (const CallGraphNode::CallRecord &R : *CG[&Func]) {
454 Function *ith = R.second->getFunction();
455 if (ith) {
456 set_union(indirect_map_kernel[&Func], transitive_map_function[ith]);
457 } else {
458 set_union(indirect_map_kernel[&Func],
459 VariablesReachableThroughFunctionPointer);
460 }
461 }
462 }
463
464 return {std::move(direct_map_kernel), std::move(indirect_map_kernel)};
465 }
466
467 struct LDSVariableReplacement {
468 GlobalVariable *SGV = nullptr;
469 DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
470 };
471
472 // remap from lds global to a constantexpr gep to where it has been moved to
473 // for each kernel
474 // an array with an element for each kernel containing where the corresponding
475 // variable was remapped to
476
477 static Constant *getAddressesOfVariablesInKernel(
478 LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
479 const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
480 // Create a ConstantArray containing the address of each Variable within the
481 // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
482 // does not allocate it
483 // TODO: Drop the ptrtoint conversion
484
485 Type *I32 = Type::getInt32Ty(Ctx);
486
487 ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());
488
489 SmallVector<Constant *> Elements;
490 for (size_t i = 0; i < Variables.size(); i++) {
491 GlobalVariable *GV = Variables[i];
492 auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
493 if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
494 auto elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
495 Elements.push_back(elt);
496 } else {
497 Elements.push_back(PoisonValue::get(I32));
498 }
499 }
500 return ConstantArray::get(KernelOffsetsType, Elements);
501 }
502
503 static GlobalVariable *buildLookupTable(
504 Module &M, ArrayRef<GlobalVariable *> Variables,
505 ArrayRef<Function *> kernels,
506 DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
507 if (Variables.empty()) {
508 return nullptr;
509 }
510 LLVMContext &Ctx = M.getContext();
511
512 const size_t NumberVariables = Variables.size();
513 const size_t NumberKernels = kernels.size();
514
515 ArrayType *KernelOffsetsType =
516 ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);
517
518 ArrayType *AllKernelsOffsetsType =
519 ArrayType::get(KernelOffsetsType, NumberKernels);
520
521 Constant *Missing = PoisonValue::get(KernelOffsetsType);
522 std::vector<Constant *> overallConstantExprElts(NumberKernels);
523 for (size_t i = 0; i < NumberKernels; i++) {
524 auto Replacement = KernelToReplacement.find(kernels[i]);
525 overallConstantExprElts[i] =
526 (Replacement == KernelToReplacement.end())
527 ? Missing
528 : getAddressesOfVariablesInKernel(
529 Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
530 }
531
532 Constant *init =
533 ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);
534
535 return new GlobalVariable(
536 M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
537 "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
538 AMDGPUAS::CONSTANT_ADDRESS);
539 }
540
541 void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
542 GlobalVariable *LookupTable,
543 GlobalVariable *GV, Use &U,
544 Value *OptionalIndex) {
545 // Table is a constant array of the same length as OrderedKernels
546 LLVMContext &Ctx = M.getContext();
547 Type *I32 = Type::getInt32Ty(Ctx);
548 auto *I = cast<Instruction>(U.getUser());
549
550 Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());
551
552 if (auto *Phi = dyn_cast<PHINode>(I)) {
553 BasicBlock *BB = Phi->getIncomingBlock(U);
554 Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
555 } else {
556 Builder.SetInsertPoint(I);
557 }
558
559 SmallVector<Value *, 3> GEPIdx = {
560 ConstantInt::get(I32, 0),
561 tableKernelIndex,
562 };
563 if (OptionalIndex)
564 GEPIdx.push_back(OptionalIndex);
565
566 Value *Address = Builder.CreateInBoundsGEP(
567 LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
568
569 Value *loaded = Builder.CreateLoad(I32, Address);
570
571 Value *replacement =
572 Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());
573
574 U.set(replacement);
575 }
576
577 void replaceUsesInInstructionsWithTableLookup(
578 Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
579 GlobalVariable *LookupTable) {
580
581 LLVMContext &Ctx = M.getContext();
582 IRBuilder<> Builder(Ctx);
583 Type *I32 = Type::getInt32Ty(Ctx);
584
585 for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
586 auto *GV = ModuleScopeVariables[Index];
587
588 for (Use &U : make_early_inc_range(GV->uses())) {
589 auto *I = dyn_cast<Instruction>(U.getUser());
590 if (!I)
591 continue;
592
593 replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
594 ConstantInt::get(I32, Index));
595 }
596 }
597 }
598
599 static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
600 Module &M, LDSUsesInfoTy &LDSUsesInfo,
601 DenseSet<GlobalVariable *> const &VariableSet) {
602
603 DenseSet<Function *> KernelSet;
604
605 if (VariableSet.empty())
606 return KernelSet;
607
608 for (Function &Func : M.functions()) {
609 if (Func.isDeclaration() || !isKernelLDS(&Func))
610 continue;
611 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
612 if (VariableSet.contains(GV)) {
613 KernelSet.insert(&Func);
614 break;
615 }
616 }
617 }
618
619 return KernelSet;
620 }
621
622 static GlobalVariable *
623 chooseBestVariableForModuleStrategy(const DataLayout &DL,
624 VariableFunctionMap &LDSVars) {
625 // Find the global variable with the most indirect uses from kernels
626
627 struct CandidateTy {
628 GlobalVariable *GV = nullptr;
629 size_t UserCount = 0;
630 size_t Size = 0;
631
632 CandidateTy() = default;
633
634 CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
635 : GV(GV), UserCount(UserCount), Size(AllocSize) {}
636
637 bool operator<(const CandidateTy &Other) const {
638 // Fewer users makes module scope variable less attractive
639 if (UserCount < Other.UserCount) {
640 return true;
641 }
642 if (UserCount > Other.UserCount) {
643 return false;
644 }
645
646 // Bigger makes module scope variable less attractive
647 if (Size < Other.Size) {
648 return false;
649 }
650
651 if (Size > Other.Size) {
652 return true;
653 }
654
655 // Arbitrary but consistent
656 return GV->getName() < Other.GV->getName();
657 }
658 };
659
660 CandidateTy MostUsed;
661
662 for (auto &K : LDSVars) {
663 GlobalVariable *GV = K.first;
664 if (K.second.size() <= 1) {
665 // A variable reachable by only one kernel is best lowered with kernel
666 // strategy
667 continue;
668 }
669 CandidateTy Candidate(
670 GV, K.second.size(),
671 DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
672 if (MostUsed < Candidate)
673 MostUsed = Candidate;
674 }
675
676 return MostUsed.GV;
677 }
678
679 static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
680 uint32_t Address) {
681 // Write the specified address into metadata where it can be retrieved by
682 // the assembler. Format is a half open range, [Address Address+1)
683 LLVMContext &Ctx = M->getContext();
684 auto *IntTy =
685 M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
686 auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
687 auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
688 GV->setMetadata(LLVMContext::MD_absolute_symbol,
689 MDNode::get(Ctx, {MinC, MaxC}));
690 }
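
  // Illustrative sketch: recording address 8 for a hypothetical @var yields
  //   @var = ... !absolute_symbol !n
  //   !n = !{i32 8, i32 9}
  // i.e. the half open range [8, 9).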
691
692 DenseMap<Function *, Value *> tableKernelIndexCache;
693 Value *getTableLookupKernelIndex(Module &M, Function *F) {
694 // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
695 // lowers to a read from a live in register. Emit it once in the entry
696 // block to spare deduplicating it later.
697 auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
698 if (Inserted) {
699 Function *Decl =
700 Intrinsic::getDeclaration(&M, Intrinsic::amdgcn_lds_kernel_id, {});
701
702 auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
703 IRBuilder<> Builder(&*InsertAt);
704
705 It->second = Builder.CreateCall(Decl, {});
706 }
707
708 return It->second;
709 }
710
711 static std::vector<Function *> assignLDSKernelIDToEachKernel(
712 Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
713 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
714 // Associate kernels in the set with an arbitrary but reproducible order and
715 // annotate them with that order in metadata. This metadata is recognised by
716 // the backend and lowered to a SGPR which can be read from using
717 // amdgcn_lds_kernel_id.
718
719 std::vector<Function *> OrderedKernels;
720 if (!KernelsThatAllocateTableLDS.empty() ||
721 !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
722
723 for (Function &Func : M->functions()) {
724 if (Func.isDeclaration())
725 continue;
726 if (!isKernelLDS(&Func))
727 continue;
728
729 if (KernelsThatAllocateTableLDS.contains(&Func) ||
730 KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
731 assert(Func.hasName()); // else fatal error earlier
732 OrderedKernels.push_back(&Func);
733 }
734 }
735
736 // Put them in an arbitrary but reproducible order
737 OrderedKernels = sortByName(std::move(OrderedKernels));
738
739 // Annotate the kernels with their order in this vector
740 LLVMContext &Ctx = M->getContext();
741 IRBuilder<> Builder(Ctx);
742
743 if (OrderedKernels.size() > UINT32_MAX) {
744 // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
745 report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
746 }
747
748 for (size_t i = 0; i < OrderedKernels.size(); i++) {
749 Metadata *AttrMDArgs[1] = {
750 ConstantAsMetadata::get(Builder.getInt32(i)),
751 };
752 OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
753 MDNode::get(Ctx, AttrMDArgs));
754 }
755 }
756 return OrderedKernels;
757 }
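
  // Illustrative sketch: the kernel assigned id 0 carries metadata roughly of
  // the form
  //   define amdgpu_kernel void @k0() !llvm.amdgcn.lds.kernel.id !n { ... }
  //   !n = !{i32 0}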
758
759 static void partitionVariablesIntoIndirectStrategies(
760 Module &M, LDSUsesInfoTy const &LDSUsesInfo,
761 VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
762 DenseSet<GlobalVariable *> &ModuleScopeVariables,
763 DenseSet<GlobalVariable *> &TableLookupVariables,
764 DenseSet<GlobalVariable *> &KernelAccessVariables,
765 DenseSet<GlobalVariable *> &DynamicVariables) {
766
767 GlobalVariable *HybridModuleRoot =
768 LoweringKindLoc != LoweringKind::hybrid
769 ? nullptr
770 : chooseBestVariableForModuleStrategy(
771 M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);
772
773 DenseSet<Function *> const EmptySet;
774 DenseSet<Function *> const &HybridModuleRootKernels =
775 HybridModuleRoot
776 ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
777 : EmptySet;
778
779 for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
780 // Each iteration of this loop assigns exactly one global variable to
781 // exactly one of the implementation strategies.
782
783 GlobalVariable *GV = K.first;
785 assert(K.second.size() != 0);
786
787 if (AMDGPU::isDynamicLDS(*GV)) {
788 DynamicVariables.insert(GV);
789 continue;
790 }
791
792 switch (LoweringKindLoc) {
793 case LoweringKind::module:
794 ModuleScopeVariables.insert(GV);
795 break;
796
797 case LoweringKind::table:
798 TableLookupVariables.insert(GV);
799 break;
800
801 case LoweringKind::kernel:
802 if (K.second.size() == 1) {
803 KernelAccessVariables.insert(GV);
804 } else {
806 "cannot lower LDS '" + GV->getName() +
807 "' to kernel access as it is reachable from multiple kernels");
808 }
809 break;
810
811 case LoweringKind::hybrid: {
812 if (GV == HybridModuleRoot) {
813 assert(K.second.size() != 1);
814 ModuleScopeVariables.insert(GV);
815 } else if (K.second.size() == 1) {
816 KernelAccessVariables.insert(GV);
817 } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
818 ModuleScopeVariables.insert(GV);
819 } else {
820 TableLookupVariables.insert(GV);
821 }
822 break;
823 }
824 }
825 }
826
827 // All LDS variables accessed indirectly have now been partitioned into
828 // the distinct lowering strategies.
829 assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
830 KernelAccessVariables.size() + DynamicVariables.size() ==
831 LDSToKernelsThatNeedToAccessItIndirectly.size());
832 }
833
834 static GlobalVariable *lowerModuleScopeStructVariables(
835 Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
836 DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
837 // Create a struct to hold the ModuleScopeVariables
838 // Replace all uses of those variables from non-kernel functions with the
839 // new struct instance Replace only the uses from kernel functions that will
840 // allocate this instance. That is a space optimisation - kernels that use a
841 // subset of the module scope struct and do not need to allocate it for
842 // indirect calls will only allocate the subset they use (they do so as part
843 // of the per-kernel lowering).
844 if (ModuleScopeVariables.empty()) {
845 return nullptr;
846 }
847
848 LLVMContext &Ctx = M.getContext();
849
850 LDSVariableReplacement ModuleScopeReplacement =
851 createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
852 ModuleScopeVariables);
853
854 appendToCompilerUsed(M, {static_cast<GlobalValue *>(
855 ConstantExpr::getPointerBitCastOrAddrSpaceCast(
856 cast<Constant>(ModuleScopeReplacement.SGV),
857 Type::getInt8PtrTy(Ctx)))});
858
859 // module.lds will be allocated at zero in any kernel that allocates it
860 recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);
861
862 // historic
863 removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
864
865 // Replace all uses of module scope variable from non-kernel functions
866 replaceLDSVariablesWithStruct(
867 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
868 Instruction *I = dyn_cast<Instruction>(U.getUser());
869 if (!I) {
870 return false;
871 }
872 Function *F = I->getFunction();
873 return !isKernelLDS(F);
874 });
875
876 // Replace uses of module scope variable from kernel functions that
877 // allocate the module scope variable, otherwise leave them unchanged
878 // Record on each kernel whether the module scope global is used by it
879
880 for (Function &Func : M.functions()) {
881 if (Func.isDeclaration() || !isKernelLDS(&Func))
882 continue;
883
884 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
885 replaceLDSVariablesWithStruct(
886 M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
887 Instruction *I = dyn_cast<Instruction>(U.getUser());
888 if (!I) {
889 return false;
890 }
891 Function *F = I->getFunction();
892 return F == &Func;
893 });
894
895 markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
896 }
897 }
898
899 return ModuleScopeReplacement.SGV;
900 }
901
902 static DenseMap<Function *, LDSVariableReplacement>
903 lowerKernelScopeStructVariables(
904 Module &M, LDSUsesInfoTy &LDSUsesInfo,
905 DenseSet<GlobalVariable *> const &ModuleScopeVariables,
906 DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
907 GlobalVariable *MaybeModuleScopeStruct) {
908
909 // Create a struct for each kernel for the non-module-scope variables.
910
911 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
912 for (Function &Func : M.functions()) {
913 if (Func.isDeclaration() || !isKernelLDS(&Func))
914 continue;
915
916 DenseSet<GlobalVariable *> KernelUsedVariables;
917 // Allocating variables that are used directly in this struct to get
918 // alignment aware allocation and predictable frame size.
919 for (auto &v : LDSUsesInfo.direct_access[&Func]) {
920 if (!AMDGPU::isDynamicLDS(*v)) {
921 KernelUsedVariables.insert(v);
922 }
923 }
924
925 // Allocating variables that are accessed indirectly so that a lookup of
926 // this struct instance can find them from nested functions.
927 for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
928 if (!AMDGPU::isDynamicLDS(*v)) {
929 KernelUsedVariables.insert(v);
930 }
931 }
932
933 // Variables allocated in module lds must all resolve to that struct,
934 // not to the per-kernel instance.
935 if (KernelsThatAllocateModuleLDS.contains(&Func)) {
936 for (GlobalVariable *v : ModuleScopeVariables) {
937 KernelUsedVariables.erase(v);
938 }
939 }
940
941 if (KernelUsedVariables.empty()) {
942 // Either used no LDS, or the LDS it used was all in the module struct
943 // or dynamically sized
944 continue;
945 }
946
947 // The association between kernel function and LDS struct is done by
948 // symbol name, which only works if the function in question has a
949 // name. This is not expected to be a problem in practice as kernels
950 // are called by name making anonymous ones (which are named by the
951 // backend) difficult to use. This does mean that llvm test cases need
952 // to name the kernels.
953 if (!Func.hasName()) {
954 report_fatal_error("Anonymous kernels cannot use LDS variables");
955 }
956
957 std::string VarName =
958 (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();
959
960 auto Replacement =
961 createLDSVariableReplacement(M, VarName, KernelUsedVariables);
962
963 // If any indirect uses, create a direct use to ensure allocation
964 // TODO: Simpler to unconditionally mark used but that regresses
965 // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
966 auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
967 if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
968 !Accesses->second.empty())
969 markUsedByKernel(&Func, Replacement.SGV);
970
971 // remove preserves existing codegen
972 removeLocalVarsFromUsedLists(M, KernelUsedVariables);
973 KernelToReplacement[&Func] = Replacement;
974
975 // Rewrite uses within kernel to the new struct
976 replaceLDSVariablesWithStruct(
977 M, KernelUsedVariables, Replacement, [&Func](Use &U) {
978 Instruction *I = dyn_cast<Instruction>(U.getUser());
979 return I && I->getFunction() == &Func;
980 });
981 }
982 return KernelToReplacement;
983 }
984
985 static GlobalVariable *
986 buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
987 Function *func) {
988 // Create a dynamic lds variable with a name associated with the passed
989 // function that has the maximum alignment of any dynamic lds variable
990 // reachable from this kernel. Dynamic LDS is allocated after the static LDS
991 // allocation, possibly after alignment padding. The representative variable
992 // created here has the maximum alignment of any other dynamic variable
993 // reachable by that kernel. All dynamic LDS variables are allocated at the
994 // same address in each kernel in order to provide the documented aliasing
995 // semantics. Setting the alignment here allows this IR pass to accurately
996 // predict the exact constant at which it will be allocated.
997
998 assert(isKernelLDS(func));
999
1000 LLVMContext &Ctx = M.getContext();
1001 const DataLayout &DL = M.getDataLayout();
1002 Align MaxDynamicAlignment(1);
1003
1004 auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
1005 if (AMDGPU::isDynamicLDS(*GV)) {
1006 MaxDynamicAlignment =
1007 std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
1008 }
1009 };
1010
1011 for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
1012 UpdateMaxAlignment(GV);
1013 }
1014
1015 for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
1016 UpdateMaxAlignment(GV);
1017 }
1018
1019 assert(func->hasName()); // Checked by caller
1020 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1021 GlobalVariable *N = new GlobalVariable(
1022 M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
1023 Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1024 false);
1025 N->setAlignment(MaxDynamicAlignment);
1026
1028 return N;
1029 }
1030
1031 DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
1032 Module &M, LDSUsesInfoTy &LDSUsesInfo,
1033 DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
1034 DenseSet<GlobalVariable *> const &DynamicVariables,
1035 std::vector<Function *> const &OrderedKernels) {
1036 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
1037 if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
1038 LLVMContext &Ctx = M.getContext();
1039 IRBuilder<> Builder(Ctx);
1040 Type *I32 = Type::getInt32Ty(Ctx);
1041
1042 std::vector<Constant *> newDynamicLDS;
1043
1044 // Table is built in the same order as OrderedKernels
1045 for (auto &func : OrderedKernels) {
1046
1047 if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
1048 assert(isKernelLDS(func));
1049 if (!func->hasName()) {
1050 report_fatal_error("Anonymous kernels cannot use LDS variables");
1051 }
1052
1053 GlobalVariable *N =
1054 buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);
1055
1056 KernelToCreatedDynamicLDS[func] = N;
1057
1058 markUsedByKernel(func, N);
1059
1060 auto emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
1061 auto GEP = ConstantExpr::getGetElementPtr(
1062 emptyCharArray, N, ConstantInt::get(I32, 0), true);
1063 newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
1064 } else {
1065 newDynamicLDS.push_back(PoisonValue::get(I32));
1066 }
1067 }
1068 assert(OrderedKernels.size() == newDynamicLDS.size());
1069
1070 ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
1071 Constant *init = ConstantArray::get(t, newDynamicLDS);
1072 GlobalVariable *table = new GlobalVariable(
1073 M, t, true, GlobalValue::InternalLinkage, init,
1074 "llvm.amdgcn.dynlds.offset.table", nullptr,
1075 GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);
1076
1077 for (GlobalVariable *GV : DynamicVariables) {
1078 for (Use &U : make_early_inc_range(GV->uses())) {
1079 auto *I = dyn_cast<Instruction>(U.getUser());
1080 if (!I)
1081 continue;
1082 if (isKernelLDS(I->getFunction()))
1083 continue;
1084
1085 replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
1086 }
1087 }
1088 }
1089 return KernelToCreatedDynamicLDS;
1090 }
1091
1092 bool runOnModule(Module &M) {
1093 CallGraph CG = CallGraph(M);
1094 bool Changed = superAlignLDSGlobals(M);
1095
1096 Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);
1097
1098 Changed = true; // todo: narrow this down
1099
1100 // For each kernel, what variables does it access directly or through
1101 // callees
1102 LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);
1103
1104 // For each variable accessed through callees, which kernels access it
1105 VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
1106 for (auto &K : LDSUsesInfo.indirect_access) {
1107 Function *F = K.first;
1108 assert(isKernelLDS(F));
1109 for (GlobalVariable *GV : K.second) {
1110 LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
1111 }
1112 }
1113
1114 // Partition variables accessed indirectly into the different strategies
1115 DenseSet<GlobalVariable *> ModuleScopeVariables;
1116 DenseSet<GlobalVariable *> TableLookupVariables;
1117 DenseSet<GlobalVariable *> KernelAccessVariables;
1118 DenseSet<GlobalVariable *> DynamicVariables;
1119 partitionVariablesIntoIndirectStrategies(
1120 M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
1121 ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
1122 DynamicVariables);
1123
1124 // If the kernel accesses a variable that is going to be stored in the
1125 // module instance through a call then that kernel needs to allocate the
1126 // module instance
1127 const DenseSet<Function *> KernelsThatAllocateModuleLDS =
1128 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1129 ModuleScopeVariables);
1130 const DenseSet<Function *> KernelsThatAllocateTableLDS =
1131 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1132 TableLookupVariables);
1133
1134 const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
1135 kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
1136 DynamicVariables);
1137
1138 GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
1139 M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);
1140
1141 DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
1142 lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
1143 KernelsThatAllocateModuleLDS,
1144 MaybeModuleScopeStruct);
1145
1146 // Lower zero cost accesses to the kernel instances just created
1147 for (auto &GV : KernelAccessVariables) {
1148 auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
1149 assert(funcs.size() == 1); // Only one kernel can access it
1150 LDSVariableReplacement Replacement =
1151 KernelToReplacement[*(funcs.begin())];
1152
1153 DenseSet<GlobalVariable *> Vec;
1154 Vec.insert(GV);
1155
1156 replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
1157 return isa<Instruction>(U.getUser());
1158 });
1159 }
1160
1161 // The ith element of this vector is kernel id i
1162 std::vector<Function *> OrderedKernels =
1163 assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
1164 KernelsThatIndirectlyAllocateDynamicLDS);
1165
1166 if (!KernelsThatAllocateTableLDS.empty()) {
1167 LLVMContext &Ctx = M.getContext();
1168 IRBuilder<> Builder(Ctx);
1169
1170 // The order must be consistent between lookup table and accesses to
1171 // lookup table
1172 auto TableLookupVariablesOrdered =
1173 sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
1174 TableLookupVariables.end()));
1175
1176 GlobalVariable *LookupTable = buildLookupTable(
1177 M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
1178 replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
1179 LookupTable);
1180 }
1181
1182 DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
1183 lowerDynamicLDSVariables(M, LDSUsesInfo,
1184 KernelsThatIndirectlyAllocateDynamicLDS,
1185 DynamicVariables, OrderedKernels);
1186
1187 // All kernel frames have been allocated. Calculate and record the
1188 // addresses.
1189 {
1190 const DataLayout &DL = M.getDataLayout();
1191
1192 for (Function &Func : M.functions()) {
1193 if (Func.isDeclaration() || !isKernelLDS(&Func))
1194 continue;
1195
1196 // All three of these are optional. The first variable is allocated at
1197 // zero. They are allocated by AMDGPUMachineFunction as one block.
1198 // Layout:
1199 //{
1200 // module.lds
1201 // alignment padding
1202 // kernel instance
1203 // alignment padding
1204 // dynamic lds variables
1205 //}
1206
1207 const bool AllocateModuleScopeStruct =
1208 MaybeModuleScopeStruct &&
1209 KernelsThatAllocateModuleLDS.contains(&Func);
1210
1211 auto Replacement = KernelToReplacement.find(&Func);
1212 const bool AllocateKernelScopeStruct =
1213 Replacement != KernelToReplacement.end();
1214
1215 const bool AllocateDynamicVariable =
1216 KernelToCreatedDynamicLDS.contains(&Func);
1217
1218 uint32_t Offset = 0;
1219
1220 if (AllocateModuleScopeStruct) {
1221 // Allocated at zero, recorded once on construction, not once per
1222 // kernel
1223 Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
1224 }
1225
1226 if (AllocateKernelScopeStruct) {
1227 GlobalVariable *KernelStruct = Replacement->second.SGV;
1228 Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
1229 recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
1230 Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
1231 }
1232
1233 // If there is dynamic allocation, the alignment needed is included in
1234 // the static frame size. There may be no reference to the dynamic
1235 // variable in the kernel itself, so without including it here, that
1236 // alignment padding could be missed.
1237 if (AllocateDynamicVariable) {
1238 GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
1239 Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
1240 recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
1241 }
1242
1243 if (Offset != 0) {
1244 (void)TM; // TODO: Account for target maximum LDS
1245 std::string Buffer;
1246 raw_string_ostream SS{Buffer};
1247 SS << format("%u", Offset);
1248
1249 // Instead of explicitly marking kernels that access dynamic variables
1250 // using special case metadata, annotate with min-lds == max-lds, i.e.
1251 // that there is no more space available for allocating more static
1252 // LDS variables. That is the right condition to prevent allocating
1253 // more variables which would collide with the addresses assigned to
1254 // dynamic variables.
1255 if (AllocateDynamicVariable)
1256 SS << format(",%u", Offset);
1257
1258 Func.addFnAttr("amdgpu-lds-size", Buffer);
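
        // Illustrative example: a kernel whose static frame ends at byte 400
        // gets "amdgpu-lds-size"="400"; if it also allocates dynamic LDS the
        // attribute becomes "amdgpu-lds-size"="400,400" (min == max, so no
        // further static LDS can be allocated behind it).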
1259 }
1260 }
1261 }
1262
1263 for (auto &GV : make_early_inc_range(M.globals()))
1264 if (AMDGPU::isLDSVariableToLower(GV)) {
1265 // probably want to remove from used lists
1266 GV.removeDeadConstantUsers();
1267 if (GV.use_empty())
1268 GV.eraseFromParent();
1269 }
1270
1271 return Changed;
1272 }
1273
1274private:
1275 // Increase the alignment of LDS globals if necessary to maximise the chance
1276 // that we can use aligned LDS instructions to access them.
1277 static bool superAlignLDSGlobals(Module &M) {
1278 const DataLayout &DL = M.getDataLayout();
1279 bool Changed = false;
1280 if (!SuperAlignLDSGlobals) {
1281 return Changed;
1282 }
1283
1284 for (auto &GV : M.globals()) {
1285 if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
1286 // Only changing alignment of LDS variables
1287 continue;
1288 }
1289 if (!GV.hasInitializer()) {
1290 // cuda/hip extern __shared__ variable, leave alignment alone
1291 continue;
1292 }
1293
1294 Align Alignment = AMDGPU::getAlign(DL, &GV);
1295 TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());
1296
1297 if (GVSize > 8) {
1298 // We might want to use a b96 or b128 load/store
1299 Alignment = std::max(Alignment, Align(16));
1300 } else if (GVSize > 4) {
1301 // We might want to use a b64 load/store
1302 Alignment = std::max(Alignment, Align(8));
1303 } else if (GVSize > 2) {
1304 // We might want to use a b32 load/store
1305 Alignment = std::max(Alignment, Align(4));
1306 } else if (GVSize > 1) {
1307 // We might want to use a b16 load/store
1308 Alignment = std::max(Alignment, Align(2));
1309 }
1310
1311 if (Alignment != AMDGPU::getAlign(DL, &GV)) {
1312 Changed = true;
1313 GV.setAlignment(Alignment);
1314 }
1315 }
1316 return Changed;
1317 }
1318
1319 static LDSVariableReplacement createLDSVariableReplacement(
1320 Module &M, std::string VarName,
1321 DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
1322 // Create a struct instance containing LDSVarsToTransform and map from those
1323 // variables to ConstantExprGEP
1324 // Variables may be introduced to meet alignment requirements. No aliasing
1325 // metadata is useful for these as they have no uses. Erased before return.
1326
1327 LLVMContext &Ctx = M.getContext();
1328 const DataLayout &DL = M.getDataLayout();
1329 assert(!LDSVarsToTransform.empty());
1330
1331 SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
1332 LayoutFields.reserve(LDSVarsToTransform.size());
1333 {
1334 // The order of fields in this struct depends on the order of
1335 // variables in the argument which varies when changing how they
1336 // are identified, leading to spurious test breakage.
1337 auto Sorted = sortByName(std::vector<GlobalVariable *>(
1338 LDSVarsToTransform.begin(), LDSVarsToTransform.end()));
1339
1340 for (GlobalVariable *GV : Sorted) {
1341 OptimizedStructLayoutField F(GV,
1342 DL.getTypeAllocSize(GV->getValueType()),
1343 AMDGPU::getAlign(DL, GV));
1344 LayoutFields.emplace_back(F);
1345 }
1346 }
1347
1348 performOptimizedStructLayout(LayoutFields);
1349
1350 std::vector<GlobalVariable *> LocalVars;
1351 BitVector IsPaddingField;
1352 LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
1353 IsPaddingField.reserve(LDSVarsToTransform.size());
1354 {
1355 uint64_t CurrentOffset = 0;
1356 for (size_t I = 0; I < LayoutFields.size(); I++) {
1357 GlobalVariable *FGV = static_cast<GlobalVariable *>(
1358 const_cast<void *>(LayoutFields[I].Id));
1359 Align DataAlign = LayoutFields[I].Alignment;
1360
1361 uint64_t DataAlignV = DataAlign.value();
1362 if (uint64_t Rem = CurrentOffset % DataAlignV) {
1363 uint64_t Padding = DataAlignV - Rem;
1364
1365 // Append an array of padding bytes to meet alignment requested
1366 // Note (o + (a - (o % a)) ) % a == 0
1367 // (offset + Padding ) % align == 0
1368
1369 Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
1370 LocalVars.push_back(new GlobalVariable(
1371 M, ATy, false, GlobalValue::InternalLinkage,
1372 PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
1373 AMDGPUAS::LOCAL_ADDRESS, false));
1374 IsPaddingField.push_back(true);
1375 CurrentOffset += Padding;
1376 }
1377
1378 LocalVars.push_back(FGV);
1379 IsPaddingField.push_back(false);
1380 CurrentOffset += LayoutFields[I].Size;
1381 }
1382 }
1383
1384 std::vector<Type *> LocalVarTypes;
1385 LocalVarTypes.reserve(LocalVars.size());
1386 std::transform(
1387 LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
1388 [](const GlobalVariable *V) -> Type * { return V->getValueType(); });
1389
1390 StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");
1391
1392 Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);
1393
1394 GlobalVariable *SGV = new GlobalVariable(
1395 M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
1396 VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
1397 false);
1398 SGV->setAlignment(StructAlign);
1399
1400 DenseMap<GlobalVariable *, Constant *> Map;
1401 Type *I32 = Type::getInt32Ty(Ctx);
1402 for (size_t I = 0; I < LocalVars.size(); I++) {
1403 GlobalVariable *GV = LocalVars[I];
1404 Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
1405 Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
1406 if (IsPaddingField[I]) {
1407 assert(GV->use_empty());
1408 GV->eraseFromParent();
1409 } else {
1410 Map[GV] = GEP;
1411 }
1412 }
1413 assert(Map.size() == LDSVarsToTransform.size());
1414 return {SGV, std::move(Map)};
1415 }
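
  // Illustrative sketch: two int32 LDS variables used only by kernel @k are
  // replaced by roughly
  //   %llvm.amdgcn.kernel.k.lds.t = type { i32, i32 }
  //   @llvm.amdgcn.kernel.k.lds = internal addrspace(3) global
  //       %llvm.amdgcn.kernel.k.lds.t poison, align 4
  // with each original use rewritten to a constant GEP into that instance.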
1416
1417 template <typename PredicateTy>
1418 static void replaceLDSVariablesWithStruct(
1419 Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
1420 const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
1421 LLVMContext &Ctx = M.getContext();
1422 const DataLayout &DL = M.getDataLayout();
1423
1424 // A hack... we need to insert the aliasing info in a predictable order for
1425 // lit tests. Would like to have them in a stable order already, ideally the
1426 // same order they get allocated, which might mean an ordered set container
1427 auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
1428 LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));
1429
1430 // Create alias.scope and their lists. Each field in the new structure
1431 // does not alias with all other fields.
1432 SmallVector<MDNode *> AliasScopes;
1433 SmallVector<Metadata *> NoAliasList;
1434 const size_t NumberVars = LDSVarsToTransform.size();
1435 if (NumberVars > 1) {
1436 MDBuilder MDB(Ctx);
1437 AliasScopes.reserve(NumberVars);
1438 MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
1439 for (size_t I = 0; I < NumberVars; I++) {
1440 MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
1441 AliasScopes.push_back(Scope);
1442 }
1443 NoAliasList.append(&AliasScopes[1], AliasScopes.end());
1444 }
1445
1446 // Replace uses of ith variable with a constantexpr to the corresponding
1447 // field of the instance that will be allocated by AMDGPUMachineFunction
1448 for (size_t I = 0; I < NumberVars; I++) {
1449 GlobalVariable *GV = LDSVarsToTransform[I];
1450 Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);
1451
1452 GV->replaceUsesWithIf(GEP, Predicate);
1453
1454 APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
1455 GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
1456 uint64_t Offset = APOff.getZExtValue();
1457
1458 Align A =
1459 commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);
1460
1461 if (I)
1462 NoAliasList[I - 1] = AliasScopes[I - 1];
1463 MDNode *NoAlias =
1464 NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
1465 MDNode *AliasScope =
1466 AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});
1467
1468 refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
1469 }
1470 }
1471
1472 static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
1473 const DataLayout &DL, MDNode *AliasScope,
1474 MDNode *NoAlias, unsigned MaxDepth = 5) {
1475 if (!MaxDepth || (A == 1 && !AliasScope))
1476 return;
1477
1478 for (User *U : Ptr->users()) {
1479 if (auto *I = dyn_cast<Instruction>(U)) {
1480 if (AliasScope && I->mayReadOrWriteMemory()) {
1481 MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
1482 AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
1483 : AliasScope);
1484 I->setMetadata(LLVMContext::MD_alias_scope, AS);
1485
1486 MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
1487 NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
1488 I->setMetadata(LLVMContext::MD_noalias, NA);
1489 }
1490 }
1491
1492 if (auto *LI = dyn_cast<LoadInst>(U)) {
1493 LI->setAlignment(std::max(A, LI->getAlign()));
1494 continue;
1495 }
1496 if (auto *SI = dyn_cast<StoreInst>(U)) {
1497 if (SI->getPointerOperand() == Ptr)
1498 SI->setAlignment(std::max(A, SI->getAlign()));
1499 continue;
1500 }
1501 if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
1502 // None of atomicrmw operations can work on pointers, but let's
1503 // check it anyway in case it will or we will process ConstantExpr.
1504 if (AI->getPointerOperand() == Ptr)
1505 AI->setAlignment(std::max(A, AI->getAlign()));
1506 continue;
1507 }
1508 if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
1509 if (AI->getPointerOperand() == Ptr)
1510 AI->setAlignment(std::max(A, AI->getAlign()));
1511 continue;
1512 }
1513 if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
1514 unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
1515 APInt Off(BitWidth, 0);
1516 if (GEP->getPointerOperand() == Ptr) {
1517 Align GA;
1518 if (GEP->accumulateConstantOffset(DL, Off))
1519 GA = commonAlignment(A, Off.getLimitedValue());
1520 refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
1521 MaxDepth - 1);
1522 }
1523 continue;
1524 }
1525 if (auto *I = dyn_cast<Instruction>(U)) {
1526 if (I->getOpcode() == Instruction::BitCast ||
1527 I->getOpcode() == Instruction::AddrSpaceCast)
1528 refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
1529 }
1530 }
1531 }
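// Editorial worked example of the alignment refinement above, values
// hypothetical: if the packed struct Replacement.SGV has align 16 and a field
// sits at byte offset 8, commonAlignment(16, 8) == 8, so loads and stores
// through that field's GEP can be raised to align 8; a GetElementPtrInst that
// adds a further constant offset of 4 recurses with commonAlignment(8, 4) == 4.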
1532};
1533
1534class AMDGPULowerModuleLDSLegacy : public ModulePass {
1535public:
1536 const AMDGPUTargetMachine *TM;
1537 static char ID;
1538
1539 AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
1540 : ModulePass(ID), TM(TM_) {
1541 initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
1542 }
1543
1544 void getAnalysisUsage(AnalysisUsage &AU) const override {
1545 if (!TM)
1546 AU.addRequired<TargetPassConfig>();
1547 }
1548
1549 bool runOnModule(Module &M) override {
1550 if (!TM) {
1551 auto &TPC = getAnalysis<TargetPassConfig>();
1552 TM = &TPC.getTM<AMDGPUTargetMachine>();
1553 }
1554
1555 return AMDGPULowerModuleLDS(*TM).runOnModule(M);
1556 }
1557};
1558
1559} // namespace
1560char AMDGPULowerModuleLDSLegacy::ID = 0;
1561
1562char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;
1563
1564INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1565 "Lower uses of LDS variables from non-kernel functions",
1566 false, false)
1567 INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
1568 INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
1569 "Lower uses of LDS variables from non-kernel functions",
1570 false, false)
1571
1572 ModulePass *
1573 llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
1574 return new AMDGPULowerModuleLDSLegacy(TM);
1575}
1576
1577 PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
1578 ModuleAnalysisManager &) {
1579 return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
1580 : PreservedAnalyses::all();
1581 }