AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until later, in the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) must be computable at compile time. The size of the areas
93// with a dotted background cannot be computed at compile-time if they are
94// present, so all three of fp, bp and sp must be set up in order to access
95// all contents in the frame areas, assuming all of the frame areas are
96// non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
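//
// As an illustrative sketch (register numbers and offsets are hypothetical),
// with the frame record at the bottom of the GPR callee-save area the first
// few SVE locals can be spilled and reloaded directly off the frame pointer:
//
//   str z8, [x29, #-1, mul vl]
//   str z9, [x29, #-2, mul vl]
//
// If fixed-size callee-saves sat between fp and the SVE area instead, an
// extra address computation (e.g. a 'sub' from fp into a scratch register)
// would be needed first, since the scalable LDR/STR forms cannot combine a
// plain byte offset with a 'mul vl' offset.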
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
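//
// As a rough illustration (not actual compiler output), a call made below a
// VLA might be bracketed like this:
//
//   sub sp, sp, #32        // make room for the outgoing stack arguments
//   stp x8, x9, [sp]       // store the outgoing arguments
//   bl  callee
//   add sp, sp, #32        // release the outgoing argument area
//
// whereas with a reserved call frame the same space is folded into the
// prologue's single SP decrement and no per-call adjustment is emitted.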
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
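//
// For example (the size is illustrative, not a recommendation), building with
// -aarch64-stack-hazard-size=1024 places a 1024-byte padding object between
// the GPR and FPR callee-saves and another between the GPR and FPR local
// areas, so that, where the object sorting succeeds, GPR and FPR stack slots
// end up separated by at least the padding size.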
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
293static cl::opt<unsigned>
294    StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and offsets for
339/// each object. If AssignOffsets is "Yes", the offsets get assigned (and SVE
340/// stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386/// Returns true if homogeneous prolog or epilog code can be emitted
387/// for the size optimization. If possible, a frame helper call is injected.
388/// When an Exit block is given, this check is for the epilog.
389bool AArch64FrameLowering::homogeneousPrologEpilog(
390 MachineFunction &MF, MachineBasicBlock *Exit) const {
391 if (!MF.getFunction().hasMinSize())
392 return false;
394 return false;
395 if (EnableRedZone)
396 return false;
397
398 // TODO: Windows is not supported yet.
399 if (needsWinCFI(MF))
400 return false;
401
402 // TODO: SVE is not supported yet.
403 if (isLikelyToHaveSVEStack(*this, MF))
404 return false;
405
406 // Bail on stack adjustment needed on return for simplicity.
407 const MachineFrameInfo &MFI = MF.getFrameInfo();
408 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
409 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
410 return false;
411 if (Exit && getArgumentStackToRestore(MF, *Exit))
412 return false;
413
414 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
416 return false;
417
418 // If there is an odd number of GPRs before LR and FP in the CSRs list,
419 // they will not be paired into one RegPairInfo, which is incompatible with
420 // the assumption made by the homogeneous prolog epilog pass.
421 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
422 unsigned NumGPRs = 0;
423 for (unsigned I = 0; CSRegs[I]; ++I) {
424 Register Reg = CSRegs[I];
425 if (Reg == AArch64::LR) {
426 assert(CSRegs[I + 1] == AArch64::FP);
427 if (NumGPRs % 2 != 0)
428 return false;
429 break;
430 }
431 if (AArch64::GPR64RegClass.contains(Reg))
432 ++NumGPRs;
433 }
434
435 return true;
436}
437
438/// Returns true if CSRs should be paired.
439bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
440 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
441}
442
443/// This is the biggest offset to the stack pointer we can encode in aarch64
444/// instructions (without using a separate calculation and a temp register).
445/// Note that the exceptions here are vector stores/loads, which cannot encode any
446/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
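/// For example, "ldur x0, [sp, #255]" still encodes directly, while
/// "ldur x0, [sp, #256]" exceeds the signed 9-bit unscaled range and needs the
/// offset materialized into a scratch register first (the scaled LDR/STR forms
/// reach further, but only for positive, suitably aligned offsets).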
447static const unsigned DefaultSafeSPDisplacement = 255;
448
449/// Look at each instruction that references stack frames and return the stack
450/// size limit beyond which some of these instructions will require a scratch
451/// register during their expansion later.
453 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
454 // range. We'll end up allocating an unnecessary spill slot a lot, but
455 // realistically that's not a big deal at this stage of the game.
456 for (MachineBasicBlock &MBB : MF) {
457 for (MachineInstr &MI : MBB) {
458 if (MI.isDebugInstr() || MI.isPseudo() ||
459 MI.getOpcode() == AArch64::ADDXri ||
460 MI.getOpcode() == AArch64::ADDSXri)
461 continue;
462
463 for (const MachineOperand &MO : MI.operands()) {
464 if (!MO.isFI())
465 continue;
466
468 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
470 return 0;
471 }
472 }
473 }
475}
476
481
482unsigned
483AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
484 const AArch64FunctionInfo *AFI,
485 bool IsWin64, bool IsFunclet) const {
486 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
487 "Tail call reserved stack must be aligned to 16 bytes");
488 if (!IsWin64 || IsFunclet) {
489 return AFI->getTailCallReservedStack();
490 } else {
491 if (AFI->getTailCallReservedStack() != 0 &&
492 !MF.getFunction().getAttributes().hasAttrSomewhere(
493 Attribute::SwiftAsync))
494 report_fatal_error("cannot generate ABI-changing tail call for Win64");
495 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
496
497 // Var args are stored here in the primary function.
498 FixedObjectSize += AFI->getVarArgsGPRSize();
499
500 if (MF.hasEHFunclets()) {
501 // Catch objects are stored here in the primary function.
502 const MachineFrameInfo &MFI = MF.getFrameInfo();
503 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
504 SmallSetVector<int, 8> CatchObjFrameIndices;
505 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
506 for (const WinEHHandlerType &H : TBME.HandlerArray) {
507 int FrameIndex = H.CatchObj.FrameIndex;
508 if ((FrameIndex != INT_MAX) &&
509 CatchObjFrameIndices.insert(FrameIndex)) {
510 FixedObjectSize = alignTo(FixedObjectSize,
511 MFI.getObjectAlign(FrameIndex).value()) +
512 MFI.getObjectSize(FrameIndex);
513 }
514 }
515 }
516 // To support EH funclets we allocate an UnwindHelp object
517 FixedObjectSize += 8;
518 }
519 return alignTo(FixedObjectSize, 16);
520 }
521}
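// A worked example with hypothetical numbers: a Win64 vararg function with no
// tail-call reserved stack, 24 bytes of register varargs and EH funclets (but
// no catch objects) yields 0 + 24 + 8 (UnwindHelp) = 32 bytes, which is
// already 16-byte aligned, so getFixedObjectSize() returns 32.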
522
524 if (!EnableRedZone)
525 return false;
526
527 // Don't use the red zone if the function explicitly asks us not to.
528 // This is typically used for kernel code.
529 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
530 const unsigned RedZoneSize =
532 if (!RedZoneSize)
533 return false;
534
535 const MachineFrameInfo &MFI = MF.getFrameInfo();
537 uint64_t NumBytes = AFI->getLocalStackSize();
538
539 // If neither NEON nor SVE is available, a COPY from one Q-reg to
540 // another requires a spill -> reload sequence. We can do that
541 // using a pre-decrementing store/post-decrementing load, but
542 // if we do so, we can't use the Red Zone.
543 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
544 !Subtarget.isNeonAvailable() &&
545 !Subtarget.hasSVE();
546
547 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
548 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
549}
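// As an illustrative sketch: when this returns true, a small leaf function can
// keep its locals below SP without ever adjusting it, e.g.
//
//   str w0, [sp, #-4]
//   ldr w0, [sp, #-4]
//   ret
//
// which is only safe because the ABI reserves a small area (typically 128
// bytes) below SP that is guaranteed not to be clobbered, e.g. by signal
// handlers.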
550
551/// hasFPImpl - Return true if the specified function should have a dedicated
552/// frame pointer register.
554 const MachineFrameInfo &MFI = MF.getFrameInfo();
555 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
557
558 // Win64 EH requires a frame pointer if funclets are present, as the locals
559 // are accessed off the frame pointer in both the parent function and the
560 // funclets.
561 if (MF.hasEHFunclets())
562 return true;
563 // Retain behavior of always omitting the FP for leaf functions when possible.
565 return true;
566 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
567 MFI.hasStackMap() || MFI.hasPatchPoint() ||
568 RegInfo->hasStackRealignment(MF))
569 return true;
570
571 // If we:
572 //
573 // 1. Have streaming mode changes
574 // OR:
575 // 2. Have a streaming body with SVE stack objects
576 //
577 // Then the value of VG restored when unwinding to this function may not match
578 // the value of VG used to set up the stack.
579 //
580 // This is a problem as the CFA can be described with an expression of the
581 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
582 //
583 // If the value of VG used in that expression does not match the value used to
584 // set up the stack, an incorrect address for the CFA will be computed, and
585 // unwinding will fail.
586 //
587 // We work around this issue by ensuring the frame-pointer can describe the
588 // CFA in either of these cases.
589 if (AFI.needsDwarfUnwindInfo(MF) &&
592 return true;
593 // With large callframes around we may need to use FP to access the scavenging
594 // emergency spillslot.
595 //
596 // Unfortunately some calls to hasFP() like machine verifier ->
597 // getReservedReg() -> hasFP in the middle of global isel are too early
598 // to know the max call frame size. Hopefully conservatively returning "true"
599 // in those cases is fine.
600 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
601 if (!MFI.isMaxCallFrameSizeComputed() ||
603 return true;
604
605 return false;
606}
607
608/// Should the Frame Pointer be reserved for the current function?
610 const TargetMachine &TM = MF.getTarget();
611 const Triple &TT = TM.getTargetTriple();
612
613 // These OSes require the frame chain is valid, even if the current frame does
614 // not use a frame pointer.
615 if (TT.isOSDarwin() || TT.isOSWindows())
616 return true;
617
618 // If the function has a frame pointer, it is reserved.
619 if (hasFP(MF))
620 return true;
621
622 // Frontend has requested to preserve the frame pointer.
624 return true;
625
626 return false;
627}
628
629/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
630/// not required, we reserve argument space for call sites in the function
631/// immediately on entry to the current function. This eliminates the need for
632/// add/sub sp brackets around call sites. Returns true if the call frame is
633/// included as part of the stack frame.
635 const MachineFunction &MF) const {
636 // The stack probing code for the dynamically allocated outgoing arguments
637 // area assumes that the stack is probed at the top - either by the prologue
638 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
639 // most recent variable-sized object allocation. Changing the condition here
640 // may need to be followed up by changes to the probe issuing logic.
641 return !MF.getFrameInfo().hasVarSizedObjects();
642}
643
647
648 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
649 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
650 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
651 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
652 DebugLoc DL = I->getDebugLoc();
653 unsigned Opc = I->getOpcode();
654 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
655 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
656
657 if (!hasReservedCallFrame(MF)) {
658 int64_t Amount = I->getOperand(0).getImm();
659 Amount = alignTo(Amount, getStackAlign());
660 if (!IsDestroy)
661 Amount = -Amount;
662
663 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
664 // doesn't have to pop anything), then the first operand will be zero too so
665 // this adjustment is a no-op.
666 if (CalleePopAmount == 0) {
667 // FIXME: in-function stack adjustment for calls is limited to 24-bits
668 // because there's no guaranteed temporary register available.
669 //
670 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
671 // 1) For offset <= 12-bit, we use LSL #0
672 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
673 // LSL #0, and the other uses LSL #12.
674 //
675 // Most call frames will be allocated at the start of a function so
676 // this is OK, but it is a limitation that needs dealing with.
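      // For instance (illustrative only), an adjustment of 0x11008 bytes could
      // be materialized as the pair:
      //   sub sp, sp, #0x11, lsl #12    // 0x11000
      //   sub sp, sp, #0x8              // remaining 0x8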
677 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
678
679 if (TLI->hasInlineStackProbe(MF) &&
681 // When stack probing is enabled, the decrement of SP may need to be
682 // probed. We only need to do this if the call site needs 1024 bytes of
683 // space or more, because a region smaller than that is allowed to be
684 // unprobed at an ABI boundary. We rely on the fact that SP has been
685 // probed exactly at this point, either by the prologue or most recent
686 // dynamic allocation.
688 "non-reserved call frame without var sized objects?");
689 Register ScratchReg =
690 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
691 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
692 } else {
693 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
694 StackOffset::getFixed(Amount), TII);
695 }
696 }
697 } else if (CalleePopAmount != 0) {
698 // If the calling convention demands that the callee pops arguments from the
699 // stack, we want to add it back if we have a reserved call frame.
700 assert(CalleePopAmount < 0xffffff && "call frame too large");
701 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
702 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
703 }
704 return MBB.erase(I);
705}
706
708 MachineBasicBlock &MBB) const {
709
710 MachineFunction &MF = *MBB.getParent();
711 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
712 const auto &TRI = *Subtarget.getRegisterInfo();
713 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
714
715 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
716
717 // Reset the CFA to `SP + 0`.
718 CFIBuilder.buildDefCFA(AArch64::SP, 0);
719
720 // Flip the RA sign state.
721 if (MFI.shouldSignReturnAddress(MF))
722 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
723 : CFIBuilder.buildNegateRAState();
724
725 // Shadow call stack uses X18, reset it.
726 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
727 CFIBuilder.buildSameValue(AArch64::X18);
728
729 // Emit .cfi_same_value for callee-saved registers.
730 const std::vector<CalleeSavedInfo> &CSI =
732 for (const auto &Info : CSI) {
733 MCRegister Reg = Info.getReg();
734 if (!TRI.regNeedsCFI(Reg, Reg))
735 continue;
736 CFIBuilder.buildSameValue(Reg);
737 }
738}
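// As an illustrative example, for a function that saved x19 and x20 this
// resets the unwind state at a block boundary roughly as the directives:
//
//   .cfi_def_cfa sp, 0
//   .cfi_same_value x19
//   .cfi_same_value x20
//
// i.e. the CFA is SP again and the callee-saves are treated as unmodified
// (plus a negate_ra_state and an x18 same_value when return-address signing or
// the shadow call stack are in use).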
739
741 switch (Reg.id()) {
742 default:
743 // The called routine is expected to preserve x19-x28;
744 // x29 and x30 are used as the frame pointer and link register, respectively.
745 return 0;
746
747 // GPRs
748#define CASE(n) \
749 case AArch64::W##n: \
750 case AArch64::X##n: \
751 return AArch64::X##n
752 CASE(0);
753 CASE(1);
754 CASE(2);
755 CASE(3);
756 CASE(4);
757 CASE(5);
758 CASE(6);
759 CASE(7);
760 CASE(8);
761 CASE(9);
762 CASE(10);
763 CASE(11);
764 CASE(12);
765 CASE(13);
766 CASE(14);
767 CASE(15);
768 CASE(16);
769 CASE(17);
770 CASE(18);
771#undef CASE
772
773 // FPRs
774#define CASE(n) \
775 case AArch64::B##n: \
776 case AArch64::H##n: \
777 case AArch64::S##n: \
778 case AArch64::D##n: \
779 case AArch64::Q##n: \
780 return HasSVE ? AArch64::Z##n : AArch64::Q##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800 CASE(19);
801 CASE(20);
802 CASE(21);
803 CASE(22);
804 CASE(23);
805 CASE(24);
806 CASE(25);
807 CASE(26);
808 CASE(27);
809 CASE(28);
810 CASE(29);
811 CASE(30);
812 CASE(31);
813#undef CASE
814 }
815}
816
817void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
818 MachineBasicBlock &MBB) const {
819 // Insertion point.
821
822 // Fake a debug loc.
823 DebugLoc DL;
824 if (MBBI != MBB.end())
825 DL = MBBI->getDebugLoc();
826
827 const MachineFunction &MF = *MBB.getParent();
828 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
829 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
830
831 BitVector GPRsToZero(TRI.getNumRegs());
832 BitVector FPRsToZero(TRI.getNumRegs());
833 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
834 for (MCRegister Reg : RegsToZero.set_bits()) {
835 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
836 // For GPRs, we only care to clear out the 64-bit register.
837 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
838 GPRsToZero.set(XReg);
839 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
840 // For FPRs,
841 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
842 FPRsToZero.set(XReg);
843 }
844 }
845
846 const AArch64InstrInfo &TII = *STI.getInstrInfo();
847
848 // Zero out GPRs.
849 for (MCRegister Reg : GPRsToZero.set_bits())
850 TII.buildClearRegister(Reg, MBB, MBBI, DL);
851
852 // Zero out FP/vector registers.
853 for (MCRegister Reg : FPRsToZero.set_bits())
854 TII.buildClearRegister(Reg, MBB, MBBI, DL);
855
856 if (HasSVE) {
857 for (MCRegister PReg :
858 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
859 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
860 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
861 AArch64::P15}) {
862 if (RegsToZero[PReg])
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
864 }
865 }
866}
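// For illustration (assuming SVE is available and x0, q0 and p0 were
// requested), the zeroing sequence built here looks roughly like:
//
//   mov    x0, xzr
//   movi   v0.2d, #0
//   pfalse p0.b
//
// The exact GPR/FPR clearing instructions come from buildClearRegister() and
// can differ by subtarget; only the PFALSE for predicate registers is emitted
// directly above.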
867
868bool AArch64FrameLowering::windowsRequiresStackProbe(
869 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
870 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
871 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
872 // TODO: When implementing stack protectors, take that into account
873 // for the probe threshold.
874 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
875 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
876}
877
879 const MachineBasicBlock &MBB) {
880 const MachineFunction *MF = MBB.getParent();
881 LiveRegs.addLiveIns(MBB);
882 // Mark callee saved registers as used so we will not choose them.
883 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
884 for (unsigned i = 0; CSRegs[i]; ++i)
885 LiveRegs.addReg(CSRegs[i]);
886}
887
889AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
890 bool HasCall) const {
891 MachineFunction *MF = MBB->getParent();
892
893 // If MBB is an entry block, use X9 as the scratch register.
894 // preserve_none functions may be using X9 to pass arguments,
895 // so prefer to pick an available register below.
896 if (&MF->front() == MBB &&
898 return AArch64::X9;
899
900 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 LivePhysRegs LiveRegs(TRI);
903 getLiveRegsForEntryMBB(LiveRegs, *MBB);
904 if (HasCall) {
905 LiveRegs.addReg(AArch64::X16);
906 LiveRegs.addReg(AArch64::X17);
907 LiveRegs.addReg(AArch64::X18);
908 }
909
910 // Prefer X9 since it was historically used for the prologue scratch reg.
911 const MachineRegisterInfo &MRI = MF->getRegInfo();
912 if (LiveRegs.available(MRI, AArch64::X9))
913 return AArch64::X9;
914
915 for (unsigned Reg : AArch64::GPR64RegClass) {
916 if (LiveRegs.available(MRI, Reg))
917 return Reg;
918 }
919 return AArch64::NoRegister;
920}
921
923 const MachineBasicBlock &MBB) const {
924 const MachineFunction *MF = MBB.getParent();
925 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
926 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
927 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
928 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
930
931 if (AFI->hasSwiftAsyncContext()) {
932 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
933 const MachineRegisterInfo &MRI = MF->getRegInfo();
936 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
937 // available.
938 if (!LiveRegs.available(MRI, AArch64::X16) ||
939 !LiveRegs.available(MRI, AArch64::X17))
940 return false;
941 }
942
943 // Certain stack probing sequences might clobber flags; in that case we can't use
944 // the block as a prologue if the flags register is a live-in.
946 MBB.isLiveIn(AArch64::NZCV))
947 return false;
948
949 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
950 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
951 return false;
952
953 // May need a scratch register (for the return value) if we require making a
954 // special call.
955 if (requiresSaveVG(*MF) ||
956 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
957 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
958 return false;
959
960 return true;
961}
962
964 const Function &F = MF.getFunction();
965 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
966 F.needsUnwindTableEntry();
967}
968
969bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
970 const MachineFunction &MF) const {
971 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
972 // and SEH_EpilogEnd instructions in the correct order.
974 return false;
977}
978
979// Given a load or a store instruction, generate the appropriate SEH unwind
980// code for it on Windows.
982AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
983 const TargetInstrInfo &TII,
984 MachineInstr::MIFlag Flag) const {
985 unsigned Opc = MBBI->getOpcode();
986 MachineBasicBlock *MBB = MBBI->getParent();
987 MachineFunction &MF = *MBB->getParent();
988 DebugLoc DL = MBBI->getDebugLoc();
989 unsigned ImmIdx = MBBI->getNumOperands() - 1;
990 int Imm = MBBI->getOperand(ImmIdx).getImm();
992 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
993 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
994
995 switch (Opc) {
996 default:
997 report_fatal_error("No SEH Opcode for this instruction");
998 case AArch64::STR_ZXI:
999 case AArch64::LDR_ZXI: {
1000 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1001 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1002 .addImm(Reg0)
1003 .addImm(Imm)
1004 .setMIFlag(Flag);
1005 break;
1006 }
1007 case AArch64::STR_PXI:
1008 case AArch64::LDR_PXI: {
1009 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1010 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1011 .addImm(Reg0)
1012 .addImm(Imm)
1013 .setMIFlag(Flag);
1014 break;
1015 }
1016 case AArch64::LDPDpost:
1017 Imm = -Imm;
1018 [[fallthrough]];
1019 case AArch64::STPDpre: {
1020 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1021 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1022 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1023 .addImm(Reg0)
1024 .addImm(Reg1)
1025 .addImm(Imm * 8)
1026 .setMIFlag(Flag);
1027 break;
1028 }
1029 case AArch64::LDPXpost:
1030 Imm = -Imm;
1031 [[fallthrough]];
1032 case AArch64::STPXpre: {
1033 Register Reg0 = MBBI->getOperand(1).getReg();
1034 Register Reg1 = MBBI->getOperand(2).getReg();
1035 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1036 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1037 .addImm(Imm * 8)
1038 .setMIFlag(Flag);
1039 else
1040 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1041 .addImm(RegInfo->getSEHRegNum(Reg0))
1042 .addImm(RegInfo->getSEHRegNum(Reg1))
1043 .addImm(Imm * 8)
1044 .setMIFlag(Flag);
1045 break;
1046 }
1047 case AArch64::LDRDpost:
1048 Imm = -Imm;
1049 [[fallthrough]];
1050 case AArch64::STRDpre: {
1051 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1052 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1053 .addImm(Reg)
1054 .addImm(Imm)
1055 .setMIFlag(Flag);
1056 break;
1057 }
1058 case AArch64::LDRXpost:
1059 Imm = -Imm;
1060 [[fallthrough]];
1061 case AArch64::STRXpre: {
1062 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1063 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1064 .addImm(Reg)
1065 .addImm(Imm)
1066 .setMIFlag(Flag);
1067 break;
1068 }
1069 case AArch64::STPDi:
1070 case AArch64::LDPDi: {
1071 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1072 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1073 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1074 .addImm(Reg0)
1075 .addImm(Reg1)
1076 .addImm(Imm * 8)
1077 .setMIFlag(Flag);
1078 break;
1079 }
1080 case AArch64::STPXi:
1081 case AArch64::LDPXi: {
1082 Register Reg0 = MBBI->getOperand(0).getReg();
1083 Register Reg1 = MBBI->getOperand(1).getReg();
1084
1085 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1086 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1087
1088 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1089 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1090 .addImm(Imm * 8)
1091 .setMIFlag(Flag);
1092 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1093 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1094 .addImm(SEHReg0)
1095 .addImm(SEHReg1)
1096 .addImm(Imm * 8)
1097 .setMIFlag(Flag);
1098 else
1099 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1100 .addImm(SEHReg0)
1101 .addImm(SEHReg1)
1102 .addImm(Imm * 8)
1103 .setMIFlag(Flag);
1104 break;
1105 }
1106 case AArch64::STRXui:
1107 case AArch64::LDRXui: {
1108 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1109 if (Reg >= 19)
1110 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1111 .addImm(Reg)
1112 .addImm(Imm * 8)
1113 .setMIFlag(Flag);
1114 else
1115 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1116 .addImm(Reg)
1117 .addImm(Imm * 8)
1118 .setMIFlag(Flag);
1119 break;
1120 }
1121 case AArch64::STRDui:
1122 case AArch64::LDRDui: {
1123 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1124 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1125 .addImm(Reg)
1126 .addImm(Imm * 8)
1127 .setMIFlag(Flag);
1128 break;
1129 }
1130 case AArch64::STPQi:
1131 case AArch64::LDPQi: {
1132 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1133 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1134 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1135 .addImm(Reg0)
1136 .addImm(Reg1)
1137 .addImm(Imm * 16)
1138 .setMIFlag(Flag);
1139 break;
1140 }
1141 case AArch64::LDPQpost:
1142 Imm = -Imm;
1143 [[fallthrough]];
1144 case AArch64::STPQpre: {
1145 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1146 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1147 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1148 .addImm(Reg0)
1149 .addImm(Reg1)
1150 .addImm(Imm * 16)
1151 .setMIFlag(Flag);
1152 break;
1153 }
1154 }
1155 auto I = MBB->insertAfter(MBBI, MIB);
1156 return I;
1157}
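// As an illustrative example, a prologue store such as
//   stp x19, x20, [sp, #-32]!
// (an STPXpre of two GPRs) gets a SEH_SaveRegP_X pseudo inserted right after
// it by the code above, which the asm printer later emits as a
// .seh_save_regp_x unwind directive; the matching epilogue LDPXpost maps to
// the same pseudo with the immediate negated.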
1158
1161 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1162 return false;
1163 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1164 // is enabled with streaming mode changes.
1165 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1166 if (ST.isTargetDarwin())
1167 return ST.hasSVE();
1168 return true;
1169}
1170
1171static bool isTargetWindows(const MachineFunction &MF) {
1173}
1174
1176 MachineFunction &MF) const {
1177 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1178 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1179
1180 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1181 DebugLoc DL; // Set debug location to unknown.
1183
1184 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1186 };
1187
1188 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1189 DebugLoc DL;
1190 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1191 if (MBBI != MBB.end())
1192 DL = MBBI->getDebugLoc();
1193
1194 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1196 };
1197
1198 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1199 EmitSignRA(MF.front());
1200 for (MachineBasicBlock &MBB : MF) {
1201 if (MBB.isEHFuncletEntry())
1202 EmitSignRA(MBB);
1203 if (MBB.isReturnBlock())
1204 EmitAuthRA(MBB);
1205 }
1206}
1207
1209 MachineBasicBlock &MBB) const {
1210 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1211 PrologueEmitter.emitPrologue();
1212}
1213
1215 MachineBasicBlock &MBB) const {
1216 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1217 EpilogueEmitter.emitEpilogue();
1218}
1219
1222 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1223}
1224
1226 return enableCFIFixup(MF) &&
1227 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1228}
1229
1230/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1231/// debug info. It's the same as what we use for resolving the code-gen
1232/// references for now. FIXME: This can go wrong when references are
1233/// SP-relative and simple call frames aren't used.
1236 Register &FrameReg) const {
1238 MF, FI, FrameReg,
1239 /*PreferFP=*/
1240 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1241 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1242 /*ForSimm=*/false);
1243}
1244
1247 int FI) const {
1248 // This function serves to provide a comparable offset from a single reference
1249 // point (the value of SP at function entry) that can be used for analysis,
1250 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1251 // correct for all objects in the presence of VLA-area objects or dynamic
1252 // stack re-alignment.
1253
1254 const auto &MFI = MF.getFrameInfo();
1255
1256 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1257 StackOffset ZPRStackSize = getZPRStackSize(MF);
1258 StackOffset PPRStackSize = getPPRStackSize(MF);
1259 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1260
1261 // For VLA-area objects, just emit an offset at the end of the stack frame.
1262 // Whilst not quite correct, these objects do live at the end of the frame and
1263 // so it is more useful for analysis for the offset to reflect this.
1264 if (MFI.isVariableSizedObjectIndex(FI)) {
1265 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1266 }
1267
1268 // This is correct in the absence of any SVE stack objects.
1269 if (!SVEStackSize)
1270 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1271
1272 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1273 bool FPAfterSVECalleeSaves =
1275 if (MFI.hasScalableStackID(FI)) {
1276 if (FPAfterSVECalleeSaves &&
1277 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1278 assert(!AFI->hasSplitSVEObjects() &&
1279 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1280 return StackOffset::getScalable(ObjectOffset);
1281 }
1282 StackOffset AccessOffset{};
1283 // The scalable vectors are below (lower address) the scalable predicates
1284 // with split SVE objects, so we must subtract the size of the predicates.
1285 if (AFI->hasSplitSVEObjects() &&
1286 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1287 AccessOffset = -PPRStackSize;
1288 return AccessOffset +
1289 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1290 ObjectOffset);
1291 }
1292
1293 bool IsFixed = MFI.isFixedObjectIndex(FI);
1294 bool IsCSR =
1295 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1296
1297 StackOffset ScalableOffset = {};
1298 if (!IsFixed && !IsCSR) {
1299 ScalableOffset = -SVEStackSize;
1300 } else if (FPAfterSVECalleeSaves && IsCSR) {
1301 ScalableOffset =
1303 }
1304
1305 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1306}
1307
1313
1314StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1315 int64_t ObjectOffset) const {
1316 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1317 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1318 const Function &F = MF.getFunction();
1319 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1320 unsigned FixedObject =
1321 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1322 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1323 int64_t FPAdjust =
1324 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1325 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1326}
1327
1328StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1329 int64_t ObjectOffset) const {
1330 const auto &MFI = MF.getFrameInfo();
1331 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1332}
1333
1334// TODO: This function currently does not work for scalable vectors.
1336 int FI) const {
1337 const AArch64RegisterInfo *RegInfo =
1338 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1339 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1340 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1341 ? getFPOffset(MF, ObjectOffset).getFixed()
1342 : getStackOffset(MF, ObjectOffset).getFixed();
1343}
1344
1346 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1347 bool ForSimm) const {
1348 const auto &MFI = MF.getFrameInfo();
1349 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1350 bool isFixed = MFI.isFixedObjectIndex(FI);
1351 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1352 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1353 FrameReg, PreferFP, ForSimm);
1354}
1355
1357 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1358 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1359 bool ForSimm) const {
1360 const auto &MFI = MF.getFrameInfo();
1361 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1362 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1363 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1364
1365 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1366 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1367 bool isCSR =
1368 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1369 bool isSVE = MFI.isScalableStackID(StackID);
1370
1371 StackOffset ZPRStackSize = getZPRStackSize(MF);
1372 StackOffset PPRStackSize = getPPRStackSize(MF);
1373 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1374
1375 // Use frame pointer to reference fixed objects. Use it for locals if
1376 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1377 // reliable as a base). Make sure useFPForScavengingIndex() does the
1378 // right thing for the emergency spill slot.
1379 bool UseFP = false;
1380 if (AFI->hasStackFrame() && !isSVE) {
1381 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1382 // there are scalable (SVE) objects in between the FP and the fixed-sized
1383 // objects.
1384 PreferFP &= !SVEStackSize;
1385
1386 // Note: Keeping the following as multiple 'if' statements rather than
1387 // merging to a single expression for readability.
1388 //
1389 // Argument access should always use the FP.
1390 if (isFixed) {
1391 UseFP = hasFP(MF);
1392 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1393 // References to the CSR area must use FP if we're re-aligning the stack
1394 // since the dynamically-sized alignment padding is between the SP/BP and
1395 // the CSR area.
1396 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1397 UseFP = true;
1398 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1399 // If the FPOffset is negative and we're producing a signed immediate, we
1400 // have to keep in mind that the available offset range for negative
1401 // offsets is smaller than for positive ones. If an offset is available
1402 // via the FP and the SP, use whichever is closest.
1403 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1404 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1405
1406 if (FPOffset >= 0) {
1407 // If the FPOffset is positive, that'll always be best, as the SP/BP
1408 // will be even further away.
1409 UseFP = true;
1410 } else if (MFI.hasVarSizedObjects()) {
1411 // If we have variable sized objects, we can use either FP or BP, as the
1412 // SP offset is unknown. We can use the base pointer if we have one and
1413 // FP is not preferred. If not, we're stuck with using FP.
1414 bool CanUseBP = RegInfo->hasBasePointer(MF);
1415 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1416 UseFP = PreferFP;
1417 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1418 UseFP = true;
1419 // else we can use BP and FP, but the offset from FP won't fit.
1420 // That will make us scavenge registers which we can probably avoid by
1421 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1422 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1423 // Funclets access the locals contained in the parent's stack frame
1424 // via the frame pointer, so we have to use the FP in the parent
1425 // function.
1426 (void) Subtarget;
1427 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1428 MF.getFunction().isVarArg()) &&
1429 "Funclets should only be present on Win64");
1430 UseFP = true;
1431 } else {
1432 // We have the choice between FP and (SP or BP).
1433 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1434 UseFP = true;
1435 }
1436 }
1437 }
1438
1439 assert(
1440 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1441 "In the presence of dynamic stack pointer realignment, "
1442 "non-argument/CSR objects cannot be accessed through the frame pointer");
1443
1444 bool FPAfterSVECalleeSaves =
1446
1447 if (isSVE) {
1448 StackOffset FPOffset = StackOffset::get(
1449 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1450 StackOffset SPOffset =
1451 SVEStackSize +
1452 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1453 ObjectOffset);
1454
1455 // With split SVE objects the ObjectOffset is relative to the split area
1456 // (i.e. the PPR area or ZPR area respectively).
1457 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1458 // If we're accessing an SVE vector with split SVE objects...
1459 // - From the FP we need to move down past the PPR area:
1460 FPOffset -= PPRStackSize;
1461 // - From the SP we only need to move up to the ZPR area:
1462 SPOffset -= PPRStackSize;
1463 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1464 // `SPOffset = ZPRStackSize + ...`.
1465 }
1466
1467 if (FPAfterSVECalleeSaves) {
1469 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1472 }
1473 }
1474
1475 // Always use the FP for SVE spills if available and beneficial.
1476 if (hasFP(MF) && (SPOffset.getFixed() ||
1477 FPOffset.getScalable() < SPOffset.getScalable() ||
1478 RegInfo->hasStackRealignment(MF))) {
1479 FrameReg = RegInfo->getFrameRegister(MF);
1480 return FPOffset;
1481 }
1482 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1483 : MCRegister(AArch64::SP);
1484
1485 return SPOffset;
1486 }
1487
1488 StackOffset SVEAreaOffset = {};
1489 if (FPAfterSVECalleeSaves) {
1490 // In this stack layout, the FP is in between the callee saves and other
1491 // SVE allocations.
1492 StackOffset SVECalleeSavedStack =
1494 if (UseFP) {
1495 if (isFixed)
1496 SVEAreaOffset = SVECalleeSavedStack;
1497 else if (!isCSR)
1498 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1499 } else {
1500 if (isFixed)
1501 SVEAreaOffset = SVEStackSize;
1502 else if (isCSR)
1503 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1504 }
1505 } else {
1506 if (UseFP && !(isFixed || isCSR))
1507 SVEAreaOffset = -SVEStackSize;
1508 if (!UseFP && (isFixed || isCSR))
1509 SVEAreaOffset = SVEStackSize;
1510 }
1511
1512 if (UseFP) {
1513 FrameReg = RegInfo->getFrameRegister(MF);
1514 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1515 }
1516
1517 // Use the base pointer if we have one.
1518 if (RegInfo->hasBasePointer(MF))
1519 FrameReg = RegInfo->getBaseRegister();
1520 else {
1521 assert(!MFI.hasVarSizedObjects() &&
1522 "Can't use SP when we have var sized objects.");
1523 FrameReg = AArch64::SP;
1524 // If we're using the red zone for this function, the SP won't actually
1525 // be adjusted, so the offsets will be negative. They're also all
1526 // within range of the signed 9-bit immediate instructions.
1527 if (canUseRedZone(MF))
1528 Offset -= AFI->getLocalStackSize();
1529 }
1530
1531 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1532}
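// An illustrative walk-through of the heuristics above (all numbers are
// hypothetical): for a fixed-size local with an FP-relative offset of -72 and
// an SP-relative offset of +24, with no SVE objects, VLAs or realignment, the
// FP offset fits the signed 9-bit range but the SP offset is closer, so
// PreferFP stays false and the object is addressed relative to SP as
// [sp, #24]. Had the SP offset been +300 with the FP offset still -72,
// PreferFP would be set and the access would instead go through the frame
// pointer.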
1533
1534static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1535 // Do not set a kill flag on values that are also marked as live-in. This
1536 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1537 // callee saved registers.
1538 // Omitting the kill flags is conservatively correct even if the live-in
1539 // is not used after all.
1540 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1541 return getKillRegState(!IsLiveIn);
1542}
1543
1545 MachineFunction &MF) {
1546 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1547 AttributeList Attrs = MF.getFunction().getAttributes();
1549 return Subtarget.isTargetMachO() &&
1550 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1551 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1553 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1554}
1555
1556static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1557 unsigned SpillCount, unsigned Reg1,
1558 unsigned Reg2, bool NeedsWinCFI,
1559 bool IsFirst,
1560 const TargetRegisterInfo *TRI) {
1561 // If we are generating register pairs for a Windows function that requires
1562 // EH support, then pair consecutive registers only. There are no unwind
1563 // opcodes for saves/restores of non-consecutive register pairs.
1564 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1565 // save_lrpair.
1566 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1567
1568 if (Reg2 == AArch64::FP)
1569 return true;
1570 if (!NeedsWinCFI)
1571 return false;
1572
1573 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1574 // This is handled by only allowing paired spills for registers spilled at
1575 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1576 // 8-bytes). We carve out an exception for {FP,LR}, which does not require
1577 // 16-byte alignment in the uop representation.
1578 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1579 return SpillExtendedVolatile
1580 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1581 (SpillCount % 2) == 0)
1582 : false;
1583
1584 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1585 // opcode. If this is the first register pair, it would end up with a
1586 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1587 // if LR is paired with something else than the first register.
1588 // The save_lrpair opcode requires the first register to be an odd one.
1589 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1590 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1591 return false;
1592 return true;
1593}
1594
1595/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1596/// WindowsCFI requires that only consecutive registers can be paired.
1597/// LR and FP need to be allocated together when the frame needs to save
1598/// the frame-record. This means any other register pairing with LR is invalid.
1599static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1600 unsigned SpillCount, unsigned Reg1,
1601 unsigned Reg2, bool UsesWinAAPCS,
1602 bool NeedsWinCFI, bool NeedsFrameRecord,
1603 bool IsFirst,
1604 const TargetRegisterInfo *TRI) {
1605 if (UsesWinAAPCS)
1606 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1607 Reg1, Reg2, NeedsWinCFI, IsFirst,
1608 TRI);
1609
1610 // If we need to store the frame record, don't pair any register
1611 // with LR other than FP.
1612 if (NeedsFrameRecord)
1613 return Reg2 == AArch64::LR;
1614
1615 return false;
1616}
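// For illustration: under WinCFI, {x19, x20} may be paired because the
// registers are consecutive and describable by a save_regp opcode, whereas
// {x19, x21} may not, since no unwind opcode covers a non-consecutive pair.
// On the non-Windows path, once a frame record must be stored, any candidate
// pair whose second register is LR is rejected, so LR is only ever stored
// together with FP.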
1617
1618namespace {
1619
1620struct RegPairInfo {
1621 Register Reg1;
1622 Register Reg2;
1623 int FrameIdx;
1624 int Offset;
1625 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1626 const TargetRegisterClass *RC;
1627
1628 RegPairInfo() = default;
1629
1630 bool isPaired() const { return Reg2.isValid(); }
1631
1632 bool isScalable() const { return Type == PPR || Type == ZPR; }
1633};
1634
1635} // end anonymous namespace
1636
1638 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1639 if (SavedRegs.test(PReg)) {
1640 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1641 return MCRegister(PNReg);
1642 }
1643 }
1644 return MCRegister();
1645}
1646
1647 // The multi-vector LD/ST instructions are only available on SME or SVE2p1 targets
1649 MachineFunction &MF) {
1651 return false;
1652
1653 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1654 bool IsLocallyStreaming =
1655 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1656
1657 // SME2 instructions can only be used safely while in streaming mode.
1658 // It is not safe to use them in streaming-compatible or locally
1659 // streaming functions.
1660 return Subtarget.hasSVE2p1() ||
1661 (Subtarget.hasSME2() &&
1662 (!IsLocallyStreaming && Subtarget.isStreaming()));
1663}
1664
1666 MachineFunction &MF,
1668 const TargetRegisterInfo *TRI,
1670 bool NeedsFrameRecord) {
1671
1672 if (CSI.empty())
1673 return;
1674
1675 bool IsWindows = isTargetWindows(MF);
1676 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1678 unsigned StackHazardSize = getStackHazardSize(MF);
1679 MachineFrameInfo &MFI = MF.getFrameInfo();
1681 unsigned Count = CSI.size();
1682 (void)CC;
1683 // MachO's compact unwind format relies on all registers being stored in
1684 // pairs.
1685 assert((!produceCompactUnwindFrame(AFL, MF) ||
1688 (Count & 1) == 0) &&
1689 "Odd number of callee-saved regs to spill!");
1690 int ByteOffset = AFI->getCalleeSavedStackSize();
1691 int StackFillDir = -1;
1692 int RegInc = 1;
1693 unsigned FirstReg = 0;
1694 if (NeedsWinCFI) {
1695 // For WinCFI, fill the stack from the bottom up.
1696 ByteOffset = 0;
1697 StackFillDir = 1;
1698 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1699 // backwards, to pair up registers starting from lower numbered registers.
1700 RegInc = -1;
1701 FirstReg = Count - 1;
1702 }
1703
1704 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1705 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedure-
1706 // call scratch, and x18 as platform reserved. However, clang has extended
1707 // calling conventions such as preserve_most and preserve_all which treat these
1708 // as CSRs. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1709 // uOPs, which have separate restrictions, so we need to check for that.
1710 //
1711 // NOTE: we currently do not account for the D registers as LLVM does not
1712 // support non-ABI compliant D register spills.
1713 bool SpillExtendedVolatile =
1714 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1715 const auto &Reg = CSI.getReg();
1716 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1717 });
1718
1719 int ZPRByteOffset = 0;
1720 int PPRByteOffset = 0;
1721 bool SplitPPRs = AFI->hasSplitSVEObjects();
1722 if (SplitPPRs) {
1723 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1724 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1725 } else if (!FPAfterSVECalleeSaves) {
1726 ZPRByteOffset =
1728 // Unused: Everything goes in ZPR space.
1729 PPRByteOffset = 0;
1730 }
1731
1732 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1733 Register LastReg = 0;
1734 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1735
1736 // When iterating backwards, the loop condition relies on unsigned wraparound.
1737 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1738 RegPairInfo RPI;
1739 RPI.Reg1 = CSI[i].getReg();
1740
1741 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1742 RPI.Type = RegPairInfo::GPR;
1743 RPI.RC = &AArch64::GPR64RegClass;
1744 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1745 RPI.Type = RegPairInfo::FPR64;
1746 RPI.RC = &AArch64::FPR64RegClass;
1747 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1748 RPI.Type = RegPairInfo::FPR128;
1749 RPI.RC = &AArch64::FPR128RegClass;
1750 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1751 RPI.Type = RegPairInfo::ZPR;
1752 RPI.RC = &AArch64::ZPRRegClass;
1753 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1754 RPI.Type = RegPairInfo::PPR;
1755 RPI.RC = &AArch64::PPRRegClass;
1756 } else if (RPI.Reg1 == AArch64::VG) {
1757 RPI.Type = RegPairInfo::VG;
1758 RPI.RC = &AArch64::FIXED_REGSRegClass;
1759 } else {
1760 llvm_unreachable("Unsupported register class.");
1761 }
1762
1763 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1764 ? PPRByteOffset
1765 : ZPRByteOffset;
1766
1767 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1768 if (HasCSHazardPadding &&
1769 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1771 ByteOffset += StackFillDir * StackHazardSize;
1772 LastReg = RPI.Reg1;
1773
1774 int Scale = TRI->getSpillSize(*RPI.RC);
1775 // Add the next reg to the pair if it is in the same register class.
1776 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1777 MCRegister NextReg = CSI[i + RegInc].getReg();
1778 bool IsFirst = i == FirstReg;
1779 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1780 switch (RPI.Type) {
1781 case RegPairInfo::GPR:
1782 if (AArch64::GPR64RegClass.contains(NextReg) &&
1784 SpillExtendedVolatile, SpillCount, RPI.Reg1, NextReg, IsWindows,
1785 NeedsWinCFI, NeedsFrameRecord, IsFirst, TRI))
1786 RPI.Reg2 = NextReg;
1787 break;
1788 case RegPairInfo::FPR64:
1789 if (AArch64::FPR64RegClass.contains(NextReg) &&
1790 !invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1791 RPI.Reg1, NextReg, NeedsWinCFI,
1792 IsFirst, TRI))
1793 RPI.Reg2 = NextReg;
1794 break;
1795 case RegPairInfo::FPR128:
1796 if (AArch64::FPR128RegClass.contains(NextReg))
1797 RPI.Reg2 = NextReg;
1798 break;
1799 case RegPairInfo::PPR:
1800 break;
1801 case RegPairInfo::ZPR:
1802 if (AFI->getPredicateRegForFillSpill() != 0 &&
1803 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1804 // Calculate the offset of the register pair to see if a pair instruction
1805 // can be used.
1806 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1807 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1808 RPI.Reg2 = NextReg;
1809 }
1810 break;
1811 case RegPairInfo::VG:
1812 break;
1813 }
1814 }
1815
1816 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1817 // list to come in sorted by frame index so that we can issue the store
1818 // pair instructions directly. Assert if we see anything otherwise.
1819 //
1820 // The order of the registers in the list is controlled by
1821 // getCalleeSavedRegs(), so they will always be in-order, as well.
1822 assert((!RPI.isPaired() ||
1823 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1824 "Out of order callee saved regs!");
1825
1826 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1827 RPI.Reg1 == AArch64::LR) &&
1828 "FrameRecord must be allocated together with LR");
1829
1830 // Windows AAPCS has FP and LR reversed.
1831 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1832 RPI.Reg2 == AArch64::LR) &&
1833 "FrameRecord must be allocated together with LR");
1834
1835 // MachO's compact unwind format relies on all registers being stored in
1836 // adjacent register pairs.
1837 assert((!produceCompactUnwindFrame(AFL, MF) ||
1840 (RPI.isPaired() &&
1841 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1842 RPI.Reg1 + 1 == RPI.Reg2))) &&
1843 "Callee-save registers not saved as adjacent register pair!");
1844
1845 RPI.FrameIdx = CSI[i].getFrameIdx();
1846 if (NeedsWinCFI &&
1847 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1848 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1849
1850 // Realign the scalable offset if necessary. This is relevant when
1851 // spilling predicates on Windows.
1852 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1853 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1854 }
1855
1856 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1857 assert(OffsetPre % Scale == 0);
1858
1859 if (RPI.isScalable())
1860 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1861 else
1862 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1863
1864 // Swift's async context is directly before FP, so allocate an extra
1865 // 8 bytes for it.
1866 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1867 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1868 (IsWindows && RPI.Reg2 == AArch64::LR)))
1869 ByteOffset += StackFillDir * 8;
1870
1871 // Round up size of non-pair to pair size if we need to pad the
1872 // callee-save area to ensure 16-byte alignment.
1873 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1874 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1875 ByteOffset % 16 != 0) {
1876 ByteOffset += 8 * StackFillDir;
1877 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1878 // A stack frame with a gap looks like this, bottom up:
1879 // d9, d8. x21, gap, x20, x19.
1880 // Set extra alignment on the x21 object to create the gap above it.
1881 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1882 NeedGapToAlignStack = false;
1883 }
1884
1885 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1886 assert(OffsetPost % Scale == 0);
1887 // If filling top down (default), we want the offset after incrementing it.
1888 // If filling bottom up (WinCFI), we need the original offset.
1889 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1890
1891 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1892 // Swift context can directly precede FP.
1893 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1894 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1895 (IsWindows && RPI.Reg2 == AArch64::LR)))
1896 Offset += 8;
1897 RPI.Offset = Offset / Scale;
1898
1899 assert((!RPI.isPaired() ||
1900 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1901 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1902 "Offset out of bounds for LDP/STP immediate");
1903
1904 auto isFrameRecord = [&] {
1905 if (RPI.isPaired())
1906 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1907 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1908 // Otherwise, look for the frame record as two unpaired registers. This is
1909 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1910 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1911 // On Windows, this check works out as current reg == FP, next reg == LR,
1912 // and on other platforms current reg == FP, previous reg == LR. This
1913 // works out as the correct pre-increment or post-increment offsets
1914 // respectively.
1915 return i > 0 && RPI.Reg1 == AArch64::FP &&
1916 CSI[i - 1].getReg() == AArch64::LR;
1917 };
1918
1919 // Save the offset to the frame record so that the FP register can point to the
1920 // innermost frame record (spilled FP and LR registers).
1921 if (NeedsFrameRecord && isFrameRecord())
1923
1924 RegPairs.push_back(RPI);
1925 if (RPI.isPaired())
1926 i += RegInc;
1927 }
1928 if (NeedsWinCFI) {
1929 // If we need an alignment gap in the stack, align the topmost stack
1930 // object. A stack frame with a gap looks like this, bottom up:
1931 // x19, d8. d9, gap.
1932 // Set extra alignment on the topmost stack object (the first element in
1933 // CSI, which goes top down), to create the gap above it.
1934 if (AFI->hasCalleeSaveStackFreeSpace())
1935 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1936 // We iterated bottom up over the registers; flip RegPairs back to top
1937 // down order.
1938 std::reverse(RegPairs.begin(), RegPairs.end());
1939 }
1940}
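// A standalone sketch (illustrative only, not LLVM API) of how the scaled
// LDP/STP immediate for one non-scalable pair falls out of the loop above,
// assuming the default top-down fill (StackFillDir == -1) and an 8-byte spill
// size: the byte offset is advanced by the slot size(s), then divided by the
// scale, and must land in the signed scaled-immediate range [-64, 63].
static int sketchPairImmediate(int &ByteOffset, bool Paired, int Scale = 8) {
  const int StackFillDir = -1; // default: fill the area top down
  ByteOffset += StackFillDir * (Paired ? 2 * Scale : Scale);
  // Non-WinCFI uses the post-update offset; it must stay within the signed
  // scaled-immediate range [-64, 63] checked by the assert above.
  return ByteOffset / Scale;
}

// E.g. with a 32-byte callee-save area (ByteOffset starts at 32), the first
// pair lands at immediate 2 ("stp x, y, [sp, #16]") and the second at 0.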
1941
1945 MachineFunction &MF = *MBB.getParent();
1946 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1948 bool NeedsWinCFI = needsWinCFI(MF);
1949 DebugLoc DL;
1951
1952 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1953
1955 // Refresh the reserved regs in case there are any potential changes since the
1956 // last freeze.
1957 MRI.freezeReservedRegs();
1958
1959 if (homogeneousPrologEpilog(MF)) {
1960 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1962
1963 for (auto &RPI : RegPairs) {
1964 MIB.addReg(RPI.Reg1);
1965 MIB.addReg(RPI.Reg2);
1966
1967 // Update register live in.
1968 if (!MRI.isReserved(RPI.Reg1))
1969 MBB.addLiveIn(RPI.Reg1);
1970 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1971 MBB.addLiveIn(RPI.Reg2);
1972 }
1973 return true;
1974 }
1975 bool PTrueCreated = false;
1976 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1977 Register Reg1 = RPI.Reg1;
1978 Register Reg2 = RPI.Reg2;
1979 unsigned StrOpc;
1980
1981 // Issue sequence of spills for cs regs. The first spill may be converted
1982 // to a pre-decrement store later by emitPrologue if the callee-save stack
1983 // area allocation can't be combined with the local stack area allocation.
1984 // For example:
1985 // stp x22, x21, [sp, #0] // addImm(+0)
1986 // stp x20, x19, [sp, #16] // addImm(+2)
1987 // stp fp, lr, [sp, #32] // addImm(+4)
1988 // Rationale: This sequence saves uop updates compared to a sequence of
1989 // pre-increment spills like stp xi,xj,[sp,#-16]!
1990 // Note: Similar rationale and sequence for restores in epilog.
1991 unsigned Size = TRI->getSpillSize(*RPI.RC);
1992 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1993 switch (RPI.Type) {
1994 case RegPairInfo::GPR:
1995 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1996 break;
1997 case RegPairInfo::FPR64:
1998 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1999 break;
2000 case RegPairInfo::FPR128:
2001 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2002 break;
2003 case RegPairInfo::ZPR:
2004 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2005 break;
2006 case RegPairInfo::PPR:
2007 StrOpc = AArch64::STR_PXI;
2008 break;
2009 case RegPairInfo::VG:
2010 StrOpc = AArch64::STRXui;
2011 break;
2012 }
2013
2014 Register X0Scratch;
2015 auto RestoreX0 = make_scope_exit([&] {
2016 if (X0Scratch != AArch64::NoRegister)
2017 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2018 .addReg(X0Scratch)
2020 });
2021
2022 if (Reg1 == AArch64::VG) {
2023 // Find an available register to store value of VG to.
2024 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2025 assert(Reg1 != AArch64::NoRegister);
2026 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2027 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2028 .addImm(31)
2029 .addImm(1)
2031 } else {
2033 if (any_of(MBB.liveins(),
2034 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2035 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2036 AArch64::X0, LiveIn.PhysReg);
2037 })) {
2038 X0Scratch = Reg1;
2039 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2040 .addReg(AArch64::X0)
2042 }
2043
2044 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2045 const uint32_t *RegMask =
2046 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2047 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2048 .addExternalSymbol(TLI.getLibcallName(LC))
2049 .addRegMask(RegMask)
2050 .addReg(AArch64::X0, RegState::ImplicitDefine)
2052 Reg1 = AArch64::X0;
2053 }
2054 }
2055
2056 LLVM_DEBUG({
2057 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2058 if (RPI.isPaired())
2059 dbgs() << ", " << printReg(Reg2, TRI);
2060 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2061 if (RPI.isPaired())
2062 dbgs() << ", " << RPI.FrameIdx + 1;
2063 dbgs() << ")\n";
2064 });
2065
2066 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2067 "Windows unwdinding requires a consecutive (FP,LR) pair");
2068 // Windows unwind codes require consecutive registers if registers are
2069 // paired. Make the switch here, so that the code below will save (x,x+1)
2070 // and not (x+1,x).
2071 unsigned FrameIdxReg1 = RPI.FrameIdx;
2072 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2073 if (NeedsWinCFI && RPI.isPaired()) {
2074 std::swap(Reg1, Reg2);
2075 std::swap(FrameIdxReg1, FrameIdxReg2);
2076 }
2077
2078 if (RPI.isPaired() && RPI.isScalable()) {
2079 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2082 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2083 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2084 "Expects SVE2.1 or SME2 target and a predicate register");
2085#ifdef EXPENSIVE_CHECKS
2086 auto IsPPR = [](const RegPairInfo &c) {
2087 return c.Reg1 == RegPairInfo::PPR;
2088 };
2089 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2090 auto IsZPR = [](const RegPairInfo &c) {
2091 return c.Type == RegPairInfo::ZPR;
2092 };
2093 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2094 assert(!(PPRBegin < ZPRBegin) &&
2095 "Expected callee save predicate to be handled first");
2096#endif
2097 if (!PTrueCreated) {
2098 PTrueCreated = true;
2099 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2101 }
2102 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2103 if (!MRI.isReserved(Reg1))
2104 MBB.addLiveIn(Reg1);
2105 if (!MRI.isReserved(Reg2))
2106 MBB.addLiveIn(Reg2);
2107 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2109 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2110 MachineMemOperand::MOStore, Size, Alignment));
2111 MIB.addReg(PnReg);
2112 MIB.addReg(AArch64::SP)
2113 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2114 // where 2*vscale is implicit
2117 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2118 MachineMemOperand::MOStore, Size, Alignment));
2119 if (NeedsWinCFI)
2120 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2121 } else { // Not a paired ZPR spill.
2122 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2123 if (!MRI.isReserved(Reg1))
2124 MBB.addLiveIn(Reg1);
2125 if (RPI.isPaired()) {
2126 if (!MRI.isReserved(Reg2))
2127 MBB.addLiveIn(Reg2);
2128 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2130 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2131 MachineMemOperand::MOStore, Size, Alignment));
2132 }
2133 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2134 .addReg(AArch64::SP)
2135 .addImm(RPI.Offset) // [sp, #offset*vscale],
2136 // where factor*vscale is implicit
2139 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2140 MachineMemOperand::MOStore, Size, Alignment));
2141 if (NeedsWinCFI)
2142 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2143 }
2144 // Update the StackIDs of the SVE stack slots.
2145 MachineFrameInfo &MFI = MF.getFrameInfo();
2146 if (RPI.Type == RegPairInfo::ZPR) {
2147 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2148 if (RPI.isPaired())
2149 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2150 } else if (RPI.Type == RegPairInfo::PPR) {
2152 if (RPI.isPaired())
2154 }
2155 }
2156 return true;
2157}
2158
2162 MachineFunction &MF = *MBB.getParent();
2164 DebugLoc DL;
2166 bool NeedsWinCFI = needsWinCFI(MF);
2167
2168 if (MBBI != MBB.end())
2169 DL = MBBI->getDebugLoc();
2170
2171 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2172 if (homogeneousPrologEpilog(MF, &MBB)) {
2173 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2175 for (auto &RPI : RegPairs) {
2176 MIB.addReg(RPI.Reg1, RegState::Define);
2177 MIB.addReg(RPI.Reg2, RegState::Define);
2178 }
2179 return true;
2180 }
2181
2182 // For performance reasons, restore SVE registers in increasing order.
2183 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2184 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2185 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2186 std::reverse(PPRBegin, PPREnd);
2187 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2188 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2189 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2190 std::reverse(ZPRBegin, ZPREnd);
2191
2192 bool PTrueCreated = false;
2193 for (const RegPairInfo &RPI : RegPairs) {
2194 Register Reg1 = RPI.Reg1;
2195 Register Reg2 = RPI.Reg2;
2196
2197 // Issue sequence of restores for cs regs. The last restore may be converted
2198 // to a post-increment load later by emitEpilogue if the callee-save stack
2199 // area allocation can't be combined with the local stack area allocation.
2200 // For example:
2201 // ldp fp, lr, [sp, #32] // addImm(+4)
2202 // ldp x20, x19, [sp, #16] // addImm(+2)
2203 // ldp x22, x21, [sp, #0] // addImm(+0)
2204 // Note: see comment in spillCalleeSavedRegisters()
2205 unsigned LdrOpc;
2206 unsigned Size = TRI->getSpillSize(*RPI.RC);
2207 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2208 switch (RPI.Type) {
2209 case RegPairInfo::GPR:
2210 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2211 break;
2212 case RegPairInfo::FPR64:
2213 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2214 break;
2215 case RegPairInfo::FPR128:
2216 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2217 break;
2218 case RegPairInfo::ZPR:
2219 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2220 break;
2221 case RegPairInfo::PPR:
2222 LdrOpc = AArch64::LDR_PXI;
2223 break;
2224 case RegPairInfo::VG:
2225 continue;
2226 }
2227 LLVM_DEBUG({
2228 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2229 if (RPI.isPaired())
2230 dbgs() << ", " << printReg(Reg2, TRI);
2231 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2232 if (RPI.isPaired())
2233 dbgs() << ", " << RPI.FrameIdx + 1;
2234 dbgs() << ")\n";
2235 });
2236
2237 // Windows unwind codes require consecutive registers if registers are
2238 // paired. Make the switch here, so that the code below will restore (x,x+1)
2239 // and not (x+1,x).
2240 unsigned FrameIdxReg1 = RPI.FrameIdx;
2241 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2242 if (NeedsWinCFI && RPI.isPaired()) {
2243 std::swap(Reg1, Reg2);
2244 std::swap(FrameIdxReg1, FrameIdxReg2);
2245 }
2246
2248 if (RPI.isPaired() && RPI.isScalable()) {
2249 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2251 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2252 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2253 "Expects SVE2.1 or SME2 target and a predicate register");
2254#ifdef EXPENSIVE_CHECKS
2255 assert(!(PPRBegin < ZPRBegin) &&
2256 "Expected callee save predicate to be handled first");
2257#endif
2258 if (!PTrueCreated) {
2259 PTrueCreated = true;
2260 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2262 }
2263 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2264 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2265 getDefRegState(true));
2267 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2268 MachineMemOperand::MOLoad, Size, Alignment));
2269 MIB.addReg(PnReg);
2270 MIB.addReg(AArch64::SP)
2271 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2272 // where 2*vscale is implicit
2275 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2276 MachineMemOperand::MOLoad, Size, Alignment));
2277 if (NeedsWinCFI)
2278 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2279 } else {
2280 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2281 if (RPI.isPaired()) {
2282 MIB.addReg(Reg2, getDefRegState(true));
2284 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2285 MachineMemOperand::MOLoad, Size, Alignment));
2286 }
2287 MIB.addReg(Reg1, getDefRegState(true));
2288 MIB.addReg(AArch64::SP)
2289 .addImm(RPI.Offset) // [sp, #offset*vscale]
2290 // where factor*vscale is implicit
2293 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2294 MachineMemOperand::MOLoad, Size, Alignment));
2295 if (NeedsWinCFI)
2296 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2297 }
2298 }
2299 return true;
2300}
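// A standalone sketch (illustrative only) of the sub-range reversal done near
// the top of the restore path above: only the contiguous PPR and ZPR runs of
// the pair list are flipped, so SVE registers are reloaded in increasing
// order while the rest of the list keeps its spill-time order. Plain ints
// stand in for RegPairInfo entries here.
#include <algorithm>
#include <vector>

static void sketchReverseRun(std::vector<int> &Pairs, bool (*InRun)(int)) {
  auto Begin = std::find_if(Pairs.begin(), Pairs.end(), InRun);
  auto End = std::find_if_not(Begin, Pairs.end(), InRun);
  std::reverse(Begin, End);
}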
2301
2302// Return the FrameID for a MMO.
2303static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2304 const MachineFrameInfo &MFI) {
2305 auto *PSV =
2307 if (PSV)
2308 return std::optional<int>(PSV->getFrameIndex());
2309
2310 if (MMO->getValue()) {
2311 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2312 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2313 FI++)
2314 if (MFI.getObjectAllocation(FI) == Al)
2315 return FI;
2316 }
2317 }
2318
2319 return std::nullopt;
2320}
2321
2322// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2323static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2324 const MachineFrameInfo &MFI) {
2325 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2326 return std::nullopt;
2327
2328 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2329}
2330
2331// Returns true if the LDST MachineInstr \p MI is a PPR access.
2332static bool isPPRAccess(const MachineInstr &MI) {
2333 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2334}
2335
2336// Check if a Hazard slot is needed for the current function, and if so create
2337// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2338// which can be used to determine if any hazard padding is needed.
2339void AArch64FrameLowering::determineStackHazardSlot(
2340 MachineFunction &MF, BitVector &SavedRegs) const {
2341 unsigned StackHazardSize = getStackHazardSize(MF);
2342 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2343 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2345 return;
2346
2347 // Stack hazards are only needed in streaming functions.
2348 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2349 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2350 return;
2351
2352 MachineFrameInfo &MFI = MF.getFrameInfo();
2353
2354 // Add a hazard slot if there are any FPR CSRs, or any FP-only
2355 // stack objects.
2356 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2357 return AArch64::FPR64RegClass.contains(Reg) ||
2358 AArch64::FPR128RegClass.contains(Reg) ||
2359 AArch64::ZPRRegClass.contains(Reg);
2360 });
2361 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2362 return AArch64::PPRRegClass.contains(Reg);
2363 });
2364 bool HasFPRStackObjects = false;
2365 bool HasPPRStackObjects = false;
2366 if (!HasFPRCSRs || SplitSVEObjects) {
2367 enum SlotType : uint8_t {
2368 Unknown = 0,
2369 ZPRorFPR = 1 << 0,
2370 PPR = 1 << 1,
2371 GPR = 1 << 2,
2373 };
2374
2375 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2376 // based on the kinds of accesses used in the function.
2377 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2378 for (auto &MBB : MF) {
2379 for (auto &MI : MBB) {
2380 std::optional<int> FI = getLdStFrameID(MI, MFI);
2381 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2382 continue;
2383 if (MFI.hasScalableStackID(*FI)) {
2384 SlotTypes[*FI] |=
2385 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2386 } else {
2387 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2388 ? SlotType::ZPRorFPR
2389 : SlotType::GPR;
2390 }
2391 }
2392 }
2393
2394 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2395 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2396 // For SplitSVEObjects, remember that this stack slot is a predicate; this
2397 // will be needed later when determining the frame layout.
2398 if (SlotTypes[FI] == SlotType::PPR) {
2400 HasPPRStackObjects = true;
2401 }
2402 }
2403 }
2404
2405 if (HasFPRCSRs || HasFPRStackObjects) {
2406 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2407 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2408 << StackHazardSize << "\n");
2410 }
2411
2412 if (!AFI->hasStackHazardSlotIndex())
2413 return;
2414
2415 if (SplitSVEObjects) {
2416 CallingConv::ID CC = MF.getFunction().getCallingConv();
2417 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2418 AFI->setSplitSVEObjects(true);
2419 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2420 return;
2421 }
2422
2423 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2424 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2425 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2426 "non-SVE CC function...\n");
2427
2428 // If another calling convention is explicitly set, FPRs can't be promoted to
2429 // ZPR callee-saves.
2431 LLVM_DEBUG(
2432 dbgs()
2433 << "Calling convention is not supported with SplitSVEObjects\n");
2434 return;
2435 }
2436
2437 if (!HasPPRCSRs && !HasPPRStackObjects) {
2438 LLVM_DEBUG(
2439 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2440 return;
2441 }
2442
2443 if (!HasFPRCSRs && !HasFPRStackObjects) {
2444 LLVM_DEBUG(
2445 dbgs()
2446 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2447 return;
2448 }
2449
2450 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2451 MF.getSubtarget<AArch64Subtarget>();
2453 "Expected SVE to be available for PPRs");
2454
2455 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2456 // With SplitSVEObjects, the CS hazard padding is placed between the
2457 // PPRs and ZPRs. If there were any FPR CSRs, there would be a hazard between
2458 // them and the GPR CSRs. Avoid this by promoting all FPR CSRs to ZPRs.
2459 BitVector FPRZRegs(SavedRegs.size());
2460 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2461 BitVector::reference RegBit = SavedRegs[Reg];
2462 if (!RegBit)
2463 continue;
2464 unsigned SubRegIdx = 0;
2465 if (AArch64::FPR64RegClass.contains(Reg))
2466 SubRegIdx = AArch64::dsub;
2467 else if (AArch64::FPR128RegClass.contains(Reg))
2468 SubRegIdx = AArch64::zsub;
2469 else
2470 continue;
2471 // Clear the bit for the FPR save.
2472 RegBit = false;
2473 // Mark that we should save the corresponding ZPR.
2474 Register ZReg =
2475 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2476 FPRZRegs.set(ZReg);
2477 }
2478 SavedRegs |= FPRZRegs;
2479
2480 AFI->setSplitSVEObjects(true);
2481 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2482 }
2483}
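// A standalone sketch (illustrative only; the enum values mirror the local
// SlotType above) of the per-frame-index classification: every load/store
// OR-s a flag into its slot, and only slots used purely by one kind of
// register (e.g. ZPR/FPR only) count towards needing hazard padding; slots
// with mixed GPR and FPR accesses do not, by themselves, trigger it.
#include <cstdint>
#include <vector>

enum SketchSlotType : uint8_t {
  SketchUnknown = 0,
  SketchZPRorFPR = 1 << 0,
  SketchPPR = 1 << 1,
  SketchGPR = 1 << 2,
};

static bool sketchHasPureFPRSlot(const std::vector<uint8_t> &SlotTypes) {
  for (uint8_t T : SlotTypes)
    if (T == SketchZPRorFPR) // mixed slots (e.g. ZPRorFPR | GPR) don't count
      return true;
  return false;
}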
2484
2486 BitVector &SavedRegs,
2487 RegScavenger *RS) const {
2488 // All calls are tail calls in GHC calling conv, and functions have no
2489 // prologue/epilogue.
2491 return;
2492
2493 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2494
2496 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2498 unsigned UnspilledCSGPR = AArch64::NoRegister;
2499 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2500
2501 MachineFrameInfo &MFI = MF.getFrameInfo();
2502 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2503
2504 MCRegister BasePointerReg =
2505 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2506
2507 unsigned ExtraCSSpill = 0;
2508 bool HasUnpairedGPR64 = false;
2509 bool HasPairZReg = false;
2510 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2511 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2512
2513 // Figure out which callee-saved registers to save/restore.
2514 for (unsigned i = 0; CSRegs[i]; ++i) {
2515 const MCRegister Reg = CSRegs[i];
2516
2517 // Add the base pointer register to SavedRegs if it is callee-save.
2518 if (Reg == BasePointerReg)
2519 SavedRegs.set(Reg);
2520
2521 // Don't save manually reserved registers set through +reserve-x#i,
2522 // even for callee-saved registers, as per GCC's behavior.
2523 if (UserReservedRegs[Reg]) {
2524 SavedRegs.reset(Reg);
2525 continue;
2526 }
2527
2528 bool RegUsed = SavedRegs.test(Reg);
2529 MCRegister PairedReg;
2530 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2531 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2532 AArch64::FPR128RegClass.contains(Reg)) {
2533 // Compensate for an odd number of GP CSRs.
2534 // For now, all the known cases of an odd number of CSRs involve GPRs.
2535 if (HasUnpairedGPR64)
2536 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2537 else
2538 PairedReg = CSRegs[i ^ 1];
2539 }
2540
2541 // If the function requires saving all the GP registers (SavedRegs),
2542 // and there is an odd number of GP CSRs at the same time (CSRegs),
2543 // PairedReg could be in a different register class from Reg, which would
2544 // lead to an FPR (usually D8) accidentally being marked as saved.
2545 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2546 PairedReg = AArch64::NoRegister;
2547 HasUnpairedGPR64 = true;
2548 }
2549 assert(PairedReg == AArch64::NoRegister ||
2550 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2551 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2552 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2553
2554 if (!RegUsed) {
2555 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2556 UnspilledCSGPR = Reg;
2557 UnspilledCSGPRPaired = PairedReg;
2558 }
2559 continue;
2560 }
2561
2562 // MachO's compact unwind format relies on all registers being stored in
2563 // pairs.
2564 // FIXME: the usual format is actually better if unwinding isn't needed.
2565 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2566 !SavedRegs.test(PairedReg)) {
2567 SavedRegs.set(PairedReg);
2568 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2569 !ReservedRegs[PairedReg])
2570 ExtraCSSpill = PairedReg;
2571 }
2572 // Check if there is a pair of ZRegs, so a PReg can be selected for spill/fill
2573 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2574 SavedRegs.test(CSRegs[i ^ 1]));
2575 }
2576
2577 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2579 // Find a suitable predicate register for the multi-vector spill/fill
2580 // instructions.
2581 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2582 if (PnReg.isValid())
2583 AFI->setPredicateRegForFillSpill(PnReg);
2584 // If no free callee-saved register has been found, assign one.
2585 if (!AFI->getPredicateRegForFillSpill() &&
2586 MF.getFunction().getCallingConv() ==
2588 SavedRegs.set(AArch64::P8);
2589 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2590 }
2591
2592 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2593 "Predicate cannot be a reserved register");
2594 }
2595
2597 !Subtarget.isTargetWindows()) {
2598 // For the Windows calling convention on a non-Windows OS, where X18 is treated
2599 // as reserved, back up X18 when entering non-Windows code (marked with the
2600 // Windows calling convention) and restore it when returning, regardless of
2601 // whether the individual function uses it - it might call other functions
2602 // that clobber it.
2603 SavedRegs.set(AArch64::X18);
2604 }
2605
2606 // Determine if a Hazard slot should be used and where it should go.
2607 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2608 // and ZPRs. Otherwise, it goes in the callee save area.
2609 determineStackHazardSlot(MF, SavedRegs);
2610
2611 // Calculate the callee-saved stack size.
2612 unsigned CSStackSize = 0;
2613 unsigned ZPRCSStackSize = 0;
2614 unsigned PPRCSStackSize = 0;
2616 for (unsigned Reg : SavedRegs.set_bits()) {
2617 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2618 assert(RC && "expected register class!");
2619 auto SpillSize = TRI->getSpillSize(*RC);
2620 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2621 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2622 if (IsZPR)
2623 ZPRCSStackSize += SpillSize;
2624 else if (IsPPR)
2625 PPRCSStackSize += SpillSize;
2626 else
2627 CSStackSize += SpillSize;
2628 }
2629
2630 // Save the number of saved regs, so we can easily update CSStackSize later to
2631 // account for any additional 64-bit GPR saves. Note: After this point
2632 // only 64-bit GPRs can be added to SavedRegs.
2633 unsigned NumSavedRegs = SavedRegs.count();
2634
2635 // If we have hazard padding in the CS area add that to the size.
2637 CSStackSize += getStackHazardSize(MF);
2638
2639 // Increase the callee-saved stack size if the function has streaming mode
2640 // changes, as we will need to spill the value of the VG register.
2641 if (requiresSaveVG(MF))
2642 CSStackSize += 8;
2643
2644 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2645 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2646 SavedRegs.set(AArch64::LR);
2647
2648 // The frame record needs to be created by saving the appropriate registers
2649 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2650 if (hasFP(MF) ||
2651 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2652 SavedRegs.set(AArch64::FP);
2653 SavedRegs.set(AArch64::LR);
2654 }
2655
2656 LLVM_DEBUG({
2657 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2658 for (unsigned Reg : SavedRegs.set_bits())
2659 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2660 dbgs() << "\n";
2661 });
2662
2663 // If any callee-saved registers are used, the frame cannot be eliminated.
2664 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2666 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2667 uint64_t SVEStackSize =
2668 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2669 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2670
2671 // The CSR spill slots have not been allocated yet, so estimateStackSize
2672 // won't include them.
2673 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2674
2675 // We may address some of the stack above the canonical frame address, either
2676 // for our own arguments or during a call. Include that in calculating whether
2677 // we have complicated addressing concerns.
2678 int64_t CalleeStackUsed = 0;
2679 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2680 int64_t FixedOff = MFI.getObjectOffset(I);
2681 if (FixedOff > CalleeStackUsed)
2682 CalleeStackUsed = FixedOff;
2683 }
2684
2685 // Conservatively always assume BigStack when there are SVE spills.
2686 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2687 CalleeStackUsed) > EstimatedStackSizeLimit;
2688 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2689 AFI->setHasStackFrame(true);
2690
2691 // Estimate if we might need to scavenge a register at some point in order
2692 // to materialize a stack offset. If so, either spill one additional
2693 // callee-saved register or reserve a special spill slot to facilitate
2694 // register scavenging. If we already spilled an extra callee-saved register
2695 // above to keep the number of spills even, we don't need to do anything else
2696 // here.
2697 if (BigStack) {
2698 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2699 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2700 << " to get a scratch register.\n");
2701 SavedRegs.set(UnspilledCSGPR);
2702 ExtraCSSpill = UnspilledCSGPR;
2703
2704 // MachO's compact unwind format relies on all registers being stored in
2705 // pairs, so if we need to spill one extra for BigStack, then we need to
2706 // store the pair.
2707 if (producePairRegisters(MF)) {
2708 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2709 // Failed to make a pair for compact unwind format, revert spilling.
2710 if (produceCompactUnwindFrame(*this, MF)) {
2711 SavedRegs.reset(UnspilledCSGPR);
2712 ExtraCSSpill = AArch64::NoRegister;
2713 }
2714 } else
2715 SavedRegs.set(UnspilledCSGPRPaired);
2716 }
2717 }
2718
2719 // If we didn't find an extra callee-saved register to spill, create
2720 // an emergency spill slot.
2721 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2723 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2724 unsigned Size = TRI->getSpillSize(RC);
2725 Align Alignment = TRI->getSpillAlign(RC);
2726 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2727 RS->addScavengingFrameIndex(FI);
2728 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2729 << " as the emergency spill slot.\n");
2730 }
2731 }
2732
2733 // Add the size of any additional 64-bit GPR saves.
2734 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2735
2736 // A Swift asynchronous context extends the frame record with a pointer
2737 // directly before FP.
2738 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2739 CSStackSize += 8;
2740
2741 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2742 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2743 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2744
2746 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2747 "Should not invalidate callee saved info");
2748
2749 // Round up to register pair alignment to avoid additional SP adjustment
2750 // instructions.
2751 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2752 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2753 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2754}
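// A standalone sketch (illustrative only) of the callee-save size bookkeeping
// above: spill sizes are summed, optional extras (hazard padding, a VG slot,
// the Swift async context) are added, and the total is rounded up to 16
// bytes. The "free space" flag simply records whether that rounding created
// an 8-byte gap.
#include <cstdint>

struct SketchCSSize {
  uint64_t Aligned;
  bool HasFreeSpace;
};

static SketchCSSize sketchCSStackSize(uint64_t SumOfSpillSizes,
                                      uint64_t Extras) {
  uint64_t CSStackSize = SumOfSpillSizes + Extras;
  uint64_t Aligned = (CSStackSize + 15) & ~uint64_t(15);
  return {Aligned, Aligned != CSStackSize};
}

// E.g. five 8-byte GPR saves with no extras (40 bytes) round up to 48,
// leaving the gap reported by hasCalleeSaveStackFreeSpace().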
2755
2757 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2758 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2759 unsigned &MaxCSFrameIndex) const {
2760 bool NeedsWinCFI = needsWinCFI(MF);
2761 unsigned StackHazardSize = getStackHazardSize(MF);
2762 // To match the canonical Windows frame layout, reverse the list of
2763 // callee saved registers to get them laid out by PrologEpilogInserter
2764 // in the right order. (PrologEpilogInserter allocates stack objects top
2765 // down. Windows canonical prologs store higher numbered registers at
2766 // the top, thus have the CSI array start from the highest registers.)
2767 if (NeedsWinCFI)
2768 std::reverse(CSI.begin(), CSI.end());
2769
2770 if (CSI.empty())
2771 return true; // Early exit if no callee saved registers are modified!
2772
2773 // Now that we know which registers need to be saved and restored, allocate
2774 // stack slots for them.
2775 MachineFrameInfo &MFI = MF.getFrameInfo();
2776 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2777
2778 bool UsesWinAAPCS = isTargetWindows(MF);
2779 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2780 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2781 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2782 if ((unsigned)FrameIdx < MinCSFrameIndex)
2783 MinCSFrameIndex = FrameIdx;
2784 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2785 MaxCSFrameIndex = FrameIdx;
2786 }
2787
2788 // Insert VG into the list of CSRs, immediately before LR if saved.
2789 if (requiresSaveVG(MF)) {
2790 CalleeSavedInfo VGInfo(AArch64::VG);
2791 auto It =
2792 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2793 if (It != CSI.end())
2794 CSI.insert(It, VGInfo);
2795 else
2796 CSI.push_back(VGInfo);
2797 }
2798
2799 Register LastReg = 0;
2800 int HazardSlotIndex = std::numeric_limits<int>::max();
2801 for (auto &CS : CSI) {
2802 MCRegister Reg = CS.getReg();
2803 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2804
2805 // Create a hazard slot as we switch between GPR and FPR CSRs.
2807 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2809 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2810 "Unexpected register order for hazard slot");
2811 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2812 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2813 << "\n");
2814 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2815 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2816 MinCSFrameIndex = HazardSlotIndex;
2817 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2818 MaxCSFrameIndex = HazardSlotIndex;
2819 }
2820
2821 unsigned Size = RegInfo->getSpillSize(*RC);
2822 Align Alignment(RegInfo->getSpillAlign(*RC));
2823 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2824 CS.setFrameIdx(FrameIdx);
2825
2826 if ((unsigned)FrameIdx < MinCSFrameIndex)
2827 MinCSFrameIndex = FrameIdx;
2828 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2829 MaxCSFrameIndex = FrameIdx;
2830
2831 // Grab 8 bytes below FP for the extended asynchronous frame info.
2832 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2833 Reg == AArch64::FP) {
2834 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2835 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2836 if ((unsigned)FrameIdx < MinCSFrameIndex)
2837 MinCSFrameIndex = FrameIdx;
2838 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2839 MaxCSFrameIndex = FrameIdx;
2840 }
2841 LastReg = Reg;
2842 }
2843
2844 // Add a hazard slot in the case where no FPR CSRs are present.
2846 HazardSlotIndex == std::numeric_limits<int>::max()) {
2847 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2848 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2849 << "\n");
2850 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2851 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2852 MinCSFrameIndex = HazardSlotIndex;
2853 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2854 MaxCSFrameIndex = HazardSlotIndex;
2855 }
2856
2857 return true;
2858}
2859
2861 const MachineFunction &MF) const {
2863 // If the function has streaming-mode changes, don't scavenge a
2864 // spill slot in the callee-save area, as that might require an
2865 // 'addvl' in the streaming-mode-changing call sequence when the
2866 // function doesn't use an FP.
2867 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2868 return false;
2869 // Don't allow register scavenging with hazard slots, in case it moves objects
2870 // into the wrong place.
2871 if (AFI->hasStackHazardSlotIndex())
2872 return false;
2873 return AFI->hasCalleeSaveStackFreeSpace();
2874}
2875
2876 /// Returns true if there are any SVE callee saves.
2878 int &Min, int &Max) {
2879 Min = std::numeric_limits<int>::max();
2880 Max = std::numeric_limits<int>::min();
2881
2882 if (!MFI.isCalleeSavedInfoValid())
2883 return false;
2884
2885 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2886 for (auto &CS : CSI) {
2887 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2888 AArch64::PPRRegClass.contains(CS.getReg())) {
2889 assert((Max == std::numeric_limits<int>::min() ||
2890 Max + 1 == CS.getFrameIdx()) &&
2891 "SVE CalleeSaves are not consecutive");
2892 Min = std::min(Min, CS.getFrameIdx());
2893 Max = std::max(Max, CS.getFrameIdx());
2894 }
2895 }
2896 return Min != std::numeric_limits<int>::max();
2897}
2898
2900 AssignObjectOffsets AssignOffsets) {
2901 MachineFrameInfo &MFI = MF.getFrameInfo();
2902 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2903
2904 SVEStackSizes SVEStack{};
2905
2906 // With SplitSVEObjects, we maintain separate stack offsets for predicates
2907 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled, predicates
2908 // are included in the SVE vector area.
2909 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2910 uint64_t &PPRStackTop =
2911 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2912
2913#ifndef NDEBUG
2914 // First process all fixed stack objects.
2915 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2916 assert(!MFI.hasScalableStackID(I) &&
2917 "SVE vectors should never be passed on the stack by value, only by "
2918 "reference.");
2919#endif
2920
2921 auto AllocateObject = [&](int FI) {
2923 ? ZPRStackTop
2924 : PPRStackTop;
2925
2926 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2927 // two, we'd need to align every object dynamically at runtime if the
2928 // alignment is larger than 16. This is not yet supported.
2929 Align Alignment = MFI.getObjectAlign(FI);
2930 if (Alignment > Align(16))
2932 "Alignment of scalable vectors > 16 bytes is not yet supported");
2933
2934 StackTop += MFI.getObjectSize(FI);
2935 StackTop = alignTo(StackTop, Alignment);
2936
2937 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2938 "SVE StackTop far too large?!");
2939
2940 int64_t Offset = -int64_t(StackTop);
2941 if (AssignOffsets == AssignObjectOffsets::Yes)
2942 MFI.setObjectOffset(FI, Offset);
2943
2944 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2945 };
2946
2947 // Then process all callee saved slots.
2948 int MinCSFrameIndex, MaxCSFrameIndex;
2949 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2950 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2951 AllocateObject(FI);
2952 }
2953
2954 // Ensure the CS area is 16-byte aligned.
2955 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2956 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2957
2958 // Create a buffer of SVE objects to allocate and sort it.
2959 SmallVector<int, 8> ObjectsToAllocate;
2960 // If we have a stack protector, and we've previously decided that we have SVE
2961 // objects on the stack and thus need it to go in the SVE stack area, then it
2962 // needs to go first.
2963 int StackProtectorFI = -1;
2964 if (MFI.hasStackProtectorIndex()) {
2965 StackProtectorFI = MFI.getStackProtectorIndex();
2966 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2967 ObjectsToAllocate.push_back(StackProtectorFI);
2968 }
2969
2970 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2971 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2972 continue;
2973 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2974 continue;
2975
2978 continue;
2979
2980 ObjectsToAllocate.push_back(FI);
2981 }
2982
2983 // Allocate all SVE locals and spills
2984 for (unsigned FI : ObjectsToAllocate)
2985 AllocateObject(FI);
2986
2987 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2988 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2989
2990 if (AssignOffsets == AssignObjectOffsets::Yes)
2991 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2992
2993 return SVEStack;
2994}
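// A standalone sketch (illustrative only) of the downward allocation done by
// AllocateObject above: the running "stack top" grows by the object size, is
// realigned (alignment is assumed to be a power of two, at most 16), and the
// object's offset is the negated running total; for scalable objects the
// offset is scaled by vscale at runtime.
#include <cstdint>

static int64_t sketchAllocateDown(uint64_t &StackTop, uint64_t Size,
                                  uint64_t Alignment) {
  StackTop += Size;
  StackTop = (StackTop + Alignment - 1) & ~(Alignment - 1);
  return -static_cast<int64_t>(StackTop);
}

// E.g. a 2-byte predicate followed by a 16-byte vector (16-byte aligned)
// lands at offsets -2 and -32; the second allocation realigns the running
// total from 18 up to 32.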
2995
2997 MachineFunction &MF, RegScavenger *RS) const {
2999 "Upwards growing stack unsupported");
3000
3002
3003 // If this function isn't doing Win64-style C++ EH, we don't need to do
3004 // anything.
3005 if (!MF.hasEHFunclets())
3006 return;
3007
3008 MachineFrameInfo &MFI = MF.getFrameInfo();
3009 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3010
3011 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3012 // object area right next to the UnwindHelp object.
3013 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3014 int64_t CurrentOffset =
3016 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3017 for (WinEHHandlerType &H : TBME.HandlerArray) {
3018 int FrameIndex = H.CatchObj.FrameIndex;
3019 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3020 CurrentOffset =
3021 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3022 CurrentOffset += MFI.getObjectSize(FrameIndex);
3023 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3024 }
3025 }
3026 }
3027
3028 // Create an UnwindHelp object.
3029 // The UnwindHelp object is allocated at the start of the fixed object area
3030 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3031 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3032 /*IsFunclet*/ false) &&
3033 "UnwindHelpOffset must be at the start of the fixed object area");
3034 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3035 /*IsImmutable=*/false);
3036 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3037
3038 MachineBasicBlock &MBB = MF.front();
3039 auto MBBI = MBB.begin();
3040 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3041 ++MBBI;
3042
3043 // We need to store -2 into the UnwindHelp object at the start of the
3044 // function.
3045 DebugLoc DL;
3046 RS->enterBasicBlockEnd(MBB);
3047 RS->backward(MBBI);
3048 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3049 assert(DstReg && "There must be a free register after frame setup");
3051 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3052 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3053 .addReg(DstReg, getKillRegState(true))
3054 .addFrameIndex(UnwindHelpFI)
3055 .addImm(0);
3056}
3057
3058namespace {
3059struct TagStoreInstr {
3061 int64_t Offset, Size;
3062 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3063 : MI(MI), Offset(Offset), Size(Size) {}
3064};
3065
3066class TagStoreEdit {
3067 MachineFunction *MF;
3068 MachineBasicBlock *MBB;
3069 MachineRegisterInfo *MRI;
3070 // Tag store instructions that are being replaced.
3072 // Combined memref arguments of the above instructions.
3074
3075 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3076 // FrameRegOffset + Size) with the address tag of SP.
3077 Register FrameReg;
3078 StackOffset FrameRegOffset;
3079 int64_t Size;
3080 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3081 // end.
3082 std::optional<int64_t> FrameRegUpdate;
3083 // MIFlags for any FrameReg updating instructions.
3084 unsigned FrameRegUpdateFlags;
3085
3086 // Use zeroing instruction variants.
3087 bool ZeroData;
3088 DebugLoc DL;
3089
3090 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3091 void emitLoop(MachineBasicBlock::iterator InsertI);
3092
3093public:
3094 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3095 : MBB(MBB), ZeroData(ZeroData) {
3096 MF = MBB->getParent();
3097 MRI = &MF->getRegInfo();
3098 }
3099 // Add an instruction to be replaced. Instructions must be added in
3100 // ascending order of Offset, and have to be adjacent.
3101 void addInstruction(TagStoreInstr I) {
3102 assert((TagStores.empty() ||
3103 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3104 "Non-adjacent tag store instructions.");
3105 TagStores.push_back(I);
3106 }
3107 void clear() { TagStores.clear(); }
3108 // Emit equivalent code at the given location, and erase the current set of
3109 // instructions. May skip if the replacement is not profitable. May invalidate
3110 // the input iterator and replace it with a valid one.
3111 void emitCode(MachineBasicBlock::iterator &InsertI,
3112 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3113};
3114
3115void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3116 const AArch64InstrInfo *TII =
3117 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3118
3119 const int64_t kMinOffset = -256 * 16;
3120 const int64_t kMaxOffset = 255 * 16;
3121
3122 Register BaseReg = FrameReg;
3123 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3124 if (BaseRegOffsetBytes < kMinOffset ||
3125 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3126 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3127 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3128 // is required for the offset of ST2G.
3129 BaseRegOffsetBytes % 16 != 0) {
3130 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3131 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3132 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3133 BaseReg = ScratchReg;
3134 BaseRegOffsetBytes = 0;
3135 }
3136
3137 MachineInstr *LastI = nullptr;
3138 while (Size) {
3139 int64_t InstrSize = (Size > 16) ? 32 : 16;
3140 unsigned Opcode =
3141 InstrSize == 16
3142 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3143 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3144 assert(BaseRegOffsetBytes % 16 == 0);
3145 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3146 .addReg(AArch64::SP)
3147 .addReg(BaseReg)
3148 .addImm(BaseRegOffsetBytes / 16)
3149 .setMemRefs(CombinedMemRefs);
3150 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3151 // final SP adjustment in the epilogue.
3152 if (BaseRegOffsetBytes == 0)
3153 LastI = I;
3154 BaseRegOffsetBytes += InstrSize;
3155 Size -= InstrSize;
3156 }
3157
3158 if (LastI)
3159 MBB->splice(InsertI, MBB, LastI);
3160}
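// A standalone sketch (illustrative only) of the chunking performed by
// emitUnrolled above, assuming Size is a positive multiple of 16 as tagged
// regions always are: the region is covered by 32-byte (ST2G/STZ2G) steps,
// with one 16-byte (STG/STZG) step when only 16 bytes remain; immediates are
// scaled by the 16-byte tag granule.
static unsigned sketchCountTagStores(long long Size) {
  unsigned NumInstrs = 0;
  while (Size) {
    long long InstrSize = (Size > 16) ? 32 : 16;
    ++NumInstrs;
    Size -= InstrSize;
  }
  return NumInstrs;
}

// E.g. an 80-byte region takes three instructions: 32 + 32 + 16 bytes.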
3161
3162void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3163 const AArch64InstrInfo *TII =
3164 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3165
3166 Register BaseReg = FrameRegUpdate
3167 ? FrameReg
3168 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3169 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3170
3171 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3172
3173 int64_t LoopSize = Size;
3174 // If the loop size is not a multiple of 32, split off one 16-byte store at
3175 // the end to fold the BaseReg update into.
3176 if (FrameRegUpdate && *FrameRegUpdate)
3177 LoopSize -= LoopSize % 32;
3178 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3179 TII->get(ZeroData ? AArch64::STZGloop_wback
3180 : AArch64::STGloop_wback))
3181 .addDef(SizeReg)
3182 .addDef(BaseReg)
3183 .addImm(LoopSize)
3184 .addReg(BaseReg)
3185 .setMemRefs(CombinedMemRefs);
3186 if (FrameRegUpdate)
3187 LoopI->setFlags(FrameRegUpdateFlags);
3188
3189 int64_t ExtraBaseRegUpdate =
3190 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
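  // Worked example: with FrameRegOffset == 0, Size == 528 and *FrameRegUpdate
  // == 544, LoopSize becomes 512 (528 rounded down to a multiple of 32) and
  // ExtraBaseRegUpdate == 544 - 0 - 528 == 16; the STGPostIndex below then
  // tags the remaining 16 bytes and post-increments BaseReg by 16 + 16 == 32,
  // leaving it at FrameReg + 544 as the folded SP update requires.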
3191 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3192 << ", Size=" << Size
3193 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3194 << ", FrameRegUpdate=" << FrameRegUpdate
3195 << ", FrameRegOffset.getFixed()="
3196 << FrameRegOffset.getFixed() << "\n");
3197 if (LoopSize < Size) {
3198 assert(FrameRegUpdate);
3199 assert(Size - LoopSize == 16);
3200 // Tag 16 more bytes at BaseReg and update BaseReg.
3201 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3202 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3203 "STG immediate out of range");
3204 BuildMI(*MBB, InsertI, DL,
3205 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3206 .addDef(BaseReg)
3207 .addReg(BaseReg)
3208 .addReg(BaseReg)
3209 .addImm(STGOffset / 16)
3210 .setMemRefs(CombinedMemRefs)
3211 .setMIFlags(FrameRegUpdateFlags);
3212 } else if (ExtraBaseRegUpdate) {
3213 // Update BaseReg.
3214 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3215 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3216 BuildMI(
3217 *MBB, InsertI, DL,
3218 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3219 .addDef(BaseReg)
3220 .addReg(BaseReg)
3221 .addImm(AddSubOffset)
3222 .addImm(0)
3223 .setMIFlags(FrameRegUpdateFlags);
3224 }
3225}
3226
3227// Check if *II is a register update that can be merged into the STGloop that
3228// ends at (Reg + Size). If so, *TotalOffset is set to the full adjustment
3229// performed by that instruction.
3230bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3231 int64_t Size, int64_t *TotalOffset) {
3232 MachineInstr &MI = *II;
3233 if ((MI.getOpcode() == AArch64::ADDXri ||
3234 MI.getOpcode() == AArch64::SUBXri) &&
3235 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3236 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3237 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3238 if (MI.getOpcode() == AArch64::SUBXri)
3239 Offset = -Offset;
3240 int64_t PostOffset = Offset - Size;
3241 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3242    // an STGPostIndex that tags the final 16 bytes. Which one is
3243 // chosen depends on the alignment of the loop size, but the difference
3244 // between the valid ranges for the two instructions is small, so we
3245 // conservatively assume that it could be either case here.
3246 //
3247 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3248 // instruction.
3249 const int64_t kMaxOffset = 4080 - 16;
3250 // Max offset of SUBXri.
3251 const int64_t kMinOffset = -4095;
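    // For example, folding "ADD sp, sp, #544" into a loop that tags Size ==
    // 528 bytes gives Offset == 544 and PostOffset == 16, which is 16-byte
    // aligned and within [kMinOffset, kMaxOffset], so the update is merged.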
3252 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3253 PostOffset % 16 == 0) {
3254 *TotalOffset = Offset;
3255 return true;
3256 }
3257 }
3258 return false;
3259}
3260
3261void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3262                  SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3263  MemRefs.clear();
3264 for (auto &TS : TSE) {
3265 MachineInstr *MI = TS.MI;
3266 // An instruction without memory operands may access anything. Be
3267 // conservative and return an empty list.
3268 if (MI->memoperands_empty()) {
3269 MemRefs.clear();
3270 return;
3271 }
3272 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3273 }
3274}
3275
3276void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3277 const AArch64FrameLowering *TFI,
3278 bool TryMergeSPUpdate) {
3279 if (TagStores.empty())
3280 return;
3281 TagStoreInstr &FirstTagStore = TagStores[0];
3282 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3283 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3284 DL = TagStores[0].MI->getDebugLoc();
3285
3286 Register Reg;
3287 FrameRegOffset = TFI->resolveFrameOffsetReference(
3288 *MF, FirstTagStore.Offset, false /*isFixed*/,
3289 TargetStackID::Default /*StackID*/, Reg,
3290 /*PreferFP=*/false, /*ForSimm=*/true);
3291 FrameReg = Reg;
3292 FrameRegUpdate = std::nullopt;
3293
3294 mergeMemRefs(TagStores, CombinedMemRefs);
3295
3296 LLVM_DEBUG({
3297 dbgs() << "Replacing adjacent STG instructions:\n";
3298 for (const auto &Instr : TagStores) {
3299 dbgs() << " " << *Instr.MI;
3300 }
3301 });
3302
3303 // Size threshold where a loop becomes shorter than a linear sequence of
3304 // tagging instructions.
3305 const int kSetTagLoopThreshold = 176;
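  // 176 bytes is 11 tag granules, i.e. an unrolled sequence of five ST2G plus
  // one STG instruction; for larger regions the loop form is emitted instead.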
3306 if (Size < kSetTagLoopThreshold) {
3307 if (TagStores.size() < 2)
3308 return;
3309 emitUnrolled(InsertI);
3310 } else {
3311 MachineInstr *UpdateInstr = nullptr;
3312 int64_t TotalOffset = 0;
3313 if (TryMergeSPUpdate) {
3314 // See if we can merge base register update into the STGloop.
3315      // This is done in AArch64LoadStoreOptimizer for "normal" stores, but
3316      // STGloop is too unusual for that pass to handle, it realistically
3317      // only occurs in the function epilogue, and it is expanded before that
3318      // pass runs anyway.
3319 if (InsertI != MBB->end() &&
3320 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3321 &TotalOffset)) {
3322 UpdateInstr = &*InsertI++;
3323 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3324 << *UpdateInstr);
3325 }
3326 }
3327
3328 if (!UpdateInstr && TagStores.size() < 2)
3329 return;
3330
3331 if (UpdateInstr) {
3332 FrameRegUpdate = TotalOffset;
3333 FrameRegUpdateFlags = UpdateInstr->getFlags();
3334 }
3335 emitLoop(InsertI);
3336 if (UpdateInstr)
3337 UpdateInstr->eraseFromParent();
3338 }
3339
3340 for (auto &TS : TagStores)
3341 TS.MI->eraseFromParent();
3342}
3343
3344bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3345 int64_t &Size, bool &ZeroData) {
3346 MachineFunction &MF = *MI.getParent()->getParent();
3347 const MachineFrameInfo &MFI = MF.getFrameInfo();
3348
3349 unsigned Opcode = MI.getOpcode();
3350 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3351 Opcode == AArch64::STZ2Gi);
3352
3353 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3354 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3355 return false;
3356 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3357 return false;
3358 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3359 Size = MI.getOperand(2).getImm();
3360 return true;
3361 }
3362
3363 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3364 Size = 16;
3365 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3366 Size = 32;
3367 else
3368 return false;
3369
3370 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3371 return false;
3372
3373 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3374 16 * MI.getOperand(2).getImm();
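  // For example, an STGi with frame index operand %stack.1 and immediate 2
  // yields Offset == ObjectOffset(%stack.1) + 32 and Size == 16.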
3375 return true;
3376}
3377
3378// Detect a run of memory tagging instructions for adjacent stack frame slots,
3379// and replace them with a shorter instruction sequence:
3380// * replace STG + STG with ST2G
3381// * replace STGloop + STGloop with STGloop
3382// This code needs to run when stack slot offsets are already known, but before
3383// FrameIndex operands in STG instructions are eliminated.
3384MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3385                                                const AArch64FrameLowering *TFI,
3386 RegScavenger *RS) {
3387 bool FirstZeroData;
3388 int64_t Size, Offset;
3389 MachineInstr &MI = *II;
3390 MachineBasicBlock *MBB = MI.getParent();
3391  MachineBasicBlock::iterator NextI = ++II;
3392  if (&MI == &MBB->instr_back())
3393 return II;
3394 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3395 return II;
3396
3397  SmallVector<TagStoreInstr, 8> Instrs;
3398  Instrs.emplace_back(&MI, Offset, Size);
3399
3400 constexpr int kScanLimit = 10;
3401 int Count = 0;
3402  for (MachineBasicBlock::iterator E = MBB->end();
3403       NextI != E && Count < kScanLimit; ++NextI) {
3404 MachineInstr &MI = *NextI;
3405 bool ZeroData;
3406 int64_t Size, Offset;
3407 // Collect instructions that update memory tags with a FrameIndex operand
3408 // and (when applicable) constant size, and whose output registers are dead
3409 // (the latter is almost always the case in practice). Since these
3410 // instructions effectively have no inputs or outputs, we are free to skip
3411 // any non-aliasing instructions in between without tracking used registers.
3412 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3413 if (ZeroData != FirstZeroData)
3414 break;
3415 Instrs.emplace_back(&MI, Offset, Size);
3416 continue;
3417 }
3418
3419 // Only count non-transient, non-tagging instructions toward the scan
3420 // limit.
3421 if (!MI.isTransient())
3422 ++Count;
3423
3424 // Just in case, stop before the epilogue code starts.
3425 if (MI.getFlag(MachineInstr::FrameSetup) ||
3426        MI.getFlag(MachineInstr::FrameDestroy))
3427      break;
3428
3429 // Reject anything that may alias the collected instructions.
3430 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3431 break;
3432 }
3433
3434 // New code will be inserted after the last tagging instruction we've found.
3435 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3436
3437  // All the gathered stack tag instructions are merged and placed after the
3438  // last tag store in the list. We must check whether the NZCV flags are
3439  // live at that insertion point; if they are, bail out, since an expanded
3440  // STG loop would clobber them.
3441
3442  // FIXME: Bailing out of the merge here is conservative: the liveness check
3443  // is performed even when the merged sequence contains no STG loops, in
3444  // which case it is unnecessary.
3445  LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3446  LiveRegs.addLiveOuts(*MBB);
3447 for (auto I = MBB->rbegin();; ++I) {
3448 MachineInstr &MI = *I;
3449 if (MI == InsertI)
3450 break;
3451 LiveRegs.stepBackward(*I);
3452 }
3453 InsertI++;
3454 if (LiveRegs.contains(AArch64::NZCV))
3455 return InsertI;
3456
3457 llvm::stable_sort(Instrs,
3458 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3459 return Left.Offset < Right.Offset;
3460 });
3461
3462 // Make sure that we don't have any overlapping stores.
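  // For example, stores covering [0, 16) and [16, 48) are accepted (gaps are
  // handled below), but a store at offset 8 following one that covers [0, 16)
  // overlaps it and aborts the merge.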
3463 int64_t CurOffset = Instrs[0].Offset;
3464 for (auto &Instr : Instrs) {
3465 if (CurOffset > Instr.Offset)
3466 return NextI;
3467 CurOffset = Instr.Offset + Instr.Size;
3468 }
3469
3470 // Find contiguous runs of tagged memory and emit shorter instruction
3471 // sequences for them when possible.
3472 TagStoreEdit TSE(MBB, FirstZeroData);
3473 std::optional<int64_t> EndOffset;
3474 for (auto &Instr : Instrs) {
3475 if (EndOffset && *EndOffset != Instr.Offset) {
3476 // Found a gap.
3477 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3478 TSE.clear();
3479 }
3480
3481 TSE.addInstruction(Instr);
3482 EndOffset = Instr.Offset + Instr.Size;
3483 }
3484
3485 const MachineFunction *MF = MBB->getParent();
3486 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3487 TSE.emitCode(
3488      InsertI, TFI, /*TryMergeSPUpdate = */
3489      !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3490
3491 return InsertI;
3492}
3493} // namespace
3494
3495void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3496    MachineFunction &MF, RegScavenger *RS = nullptr) const {
3497 for (auto &BB : MF)
3498 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3500 II = tryMergeAdjacentSTG(II, this, RS);
3501 }
3502
3503 // By the time this method is called, most of the prologue/epilogue code is
3504 // already emitted, whether its location was affected by the shrink-wrapping
3505 // optimization or not.
3506 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3507      shouldSignReturnAddressEverywhere(MF))
3508    emitPacRetPlusLeafHardening(MF);
3509}
3510
3511/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3512/// before the update. This is easily retrieved as it is exactly the offset
3513/// that is set in processFunctionBeforeFrameFinalized.
3514StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3515    const MachineFunction &MF, int FI, Register &FrameReg,
3516 bool IgnoreSPUpdates) const {
3517 const MachineFrameInfo &MFI = MF.getFrameInfo();
3518 if (IgnoreSPUpdates) {
3519 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3520 << MFI.getObjectOffset(FI) << "\n");
3521 FrameReg = AArch64::SP;
3522 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3523 }
3524
3525 // Go to common code if we cannot provide sp + offset.
3526 if (MFI.hasVarSizedObjects() ||
3529 return getFrameIndexReference(MF, FI, FrameReg);
3530
3531 FrameReg = AArch64::SP;
3532 return getStackOffset(MF, MFI.getObjectOffset(FI));
3533}
3534
3535/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3536/// the parent's frame pointer
3537unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3538    const MachineFunction &MF) const {
3539 return 0;
3540}
3541
3542/// Funclets only need to account for space for the callee saved registers,
3543/// as the locals are accounted for in the parent's stack frame.
3544unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3545    const MachineFunction &MF) const {
3546 // This is the size of the pushed CSRs.
3547 unsigned CSSize =
3548 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3549 // This is the amount of stack a funclet needs to allocate.
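  // For example, CSSize == 80 with a maximum call frame of 40 bytes yields
  // alignTo(120, 16) == 128 bytes for the funclet.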
3550 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3551 getStackAlign());
3552}
3553
3554namespace {
3555struct FrameObject {
3556 bool IsValid = false;
3557 // Index of the object in MFI.
3558 int ObjectIndex = 0;
3559 // Group ID this object belongs to.
3560 int GroupIndex = -1;
3561 // This object should be placed first (closest to SP).
3562 bool ObjectFirst = false;
3563 // This object's group (which always contains the object with
3564 // ObjectFirst==true) should be placed first.
3565 bool GroupFirst = false;
3566
3567  // Used to distinguish between FP and GPR accesses. The values are chosen so
3568  // that they sort FPR < Hazard < GPR and can be OR'd together.
3569 unsigned Accesses = 0;
3570 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3571};
3572
3573class GroupBuilder {
3574 SmallVector<int, 8> CurrentMembers;
3575 int NextGroupIndex = 0;
3576 std::vector<FrameObject> &Objects;
3577
3578public:
3579 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3580 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3581 void EndCurrentGroup() {
3582 if (CurrentMembers.size() > 1) {
3583 // Create a new group with the current member list. This might remove them
3584 // from their pre-existing groups. That's OK, dealing with overlapping
3585 // groups is too hard and unlikely to make a difference.
3586 LLVM_DEBUG(dbgs() << "group:");
3587 for (int Index : CurrentMembers) {
3588 Objects[Index].GroupIndex = NextGroupIndex;
3589 LLVM_DEBUG(dbgs() << " " << Index);
3590 }
3591 LLVM_DEBUG(dbgs() << "\n");
3592 NextGroupIndex++;
3593 }
3594 CurrentMembers.clear();
3595 }
3596};
3597
3598bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3599 // Objects at a lower index are closer to FP; objects at a higher index are
3600 // closer to SP.
3601 //
3602 // For consistency in our comparison, all invalid objects are placed
3603 // at the end. This also allows us to stop walking when we hit the
3604 // first invalid item after it's all sorted.
3605 //
3606 // If we want to include a stack hazard region, order FPR accesses < the
3607 // hazard object < GPRs accesses in order to create a separation between the
3608 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3609 //
3610 // Otherwise the "first" object goes first (closest to SP), followed by the
3611 // members of the "first" group.
3612 //
3613 // The rest are sorted by the group index to keep the groups together.
3614 // Higher numbered groups are more likely to be around longer (i.e. untagged
3615 // in the function epilogue and not at some earlier point). Place them closer
3616 // to SP.
3617 //
3618 // If all else equal, sort by the object index to keep the objects in the
3619 // original order.
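  // For example, an FPR-accessed object (Accesses == 1) sorts before the
  // hazard slot (Accesses == 2), which in turn sorts before GPR-accessed
  // objects (Accesses == 4).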
3620 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3621 A.GroupIndex, A.ObjectIndex) <
3622 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3623 B.GroupIndex, B.ObjectIndex);
3624}
3625} // namespace
3626
3627void AArch64FrameLowering::orderFrameObjects(
3628    const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3629  const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3630
3631 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3632 ObjectsToAllocate.empty())
3633 return;
3634
3635 const MachineFrameInfo &MFI = MF.getFrameInfo();
3636 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3637 for (auto &Obj : ObjectsToAllocate) {
3638 FrameObjects[Obj].IsValid = true;
3639 FrameObjects[Obj].ObjectIndex = Obj;
3640 }
3641
3642 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3643 // the same time.
3644 GroupBuilder GB(FrameObjects);
3645 for (auto &MBB : MF) {
3646 for (auto &MI : MBB) {
3647 if (MI.isDebugInstr())
3648 continue;
3649
3650 if (AFI.hasStackHazardSlotIndex()) {
3651 std::optional<int> FI = getLdStFrameID(MI, MFI);
3652 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3653 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3654            AArch64InstrInfo::isFpOrNEON(MI))
3655          FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3656 else
3657 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3658 }
3659 }
3660
3661 int OpIndex;
3662 switch (MI.getOpcode()) {
3663 case AArch64::STGloop:
3664 case AArch64::STZGloop:
3665 OpIndex = 3;
3666 break;
3667 case AArch64::STGi:
3668 case AArch64::STZGi:
3669 case AArch64::ST2Gi:
3670 case AArch64::STZ2Gi:
3671 OpIndex = 1;
3672 break;
3673 default:
3674 OpIndex = -1;
3675 }
3676
3677 int TaggedFI = -1;
3678 if (OpIndex >= 0) {
3679 const MachineOperand &MO = MI.getOperand(OpIndex);
3680 if (MO.isFI()) {
3681 int FI = MO.getIndex();
3682 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3683 FrameObjects[FI].IsValid)
3684 TaggedFI = FI;
3685 }
3686 }
3687
3688 // If this is a stack tagging instruction for a slot that is not part of a
3689 // group yet, either start a new group or add it to the current one.
3690 if (TaggedFI >= 0)
3691 GB.AddMember(TaggedFI);
3692 else
3693 GB.EndCurrentGroup();
3694 }
3695 // Groups should never span multiple basic blocks.
3696 GB.EndCurrentGroup();
3697 }
3698
3699 if (AFI.hasStackHazardSlotIndex()) {
3700 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3701 FrameObject::AccessHazard;
3702 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3703 for (auto &Obj : FrameObjects)
3704 if (!Obj.Accesses ||
3705 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3706 Obj.Accesses = FrameObject::AccessGPR;
3707 }
3708
3709 // If the function's tagged base pointer is pinned to a stack slot, we want to
3710 // put that slot first when possible. This will likely place it at SP + 0,
3711 // and save one instruction when generating the base pointer because IRG does
3712 // not allow an immediate offset.
3713 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3714 if (TBPI) {
3715 FrameObjects[*TBPI].ObjectFirst = true;
3716 FrameObjects[*TBPI].GroupFirst = true;
3717 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3718 if (FirstGroupIndex >= 0)
3719 for (FrameObject &Object : FrameObjects)
3720 if (Object.GroupIndex == FirstGroupIndex)
3721 Object.GroupFirst = true;
3722 }
3723
3724 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3725
3726 int i = 0;
3727 for (auto &Obj : FrameObjects) {
3728 // All invalid items are sorted at the end, so it's safe to stop.
3729 if (!Obj.IsValid)
3730 break;
3731 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3732 }
3733
3734 LLVM_DEBUG({
3735 dbgs() << "Final frame order:\n";
3736 for (auto &Obj : FrameObjects) {
3737 if (!Obj.IsValid)
3738 break;
3739 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3740 if (Obj.ObjectFirst)
3741 dbgs() << ", first";
3742 if (Obj.GroupFirst)
3743 dbgs() << ", group-first";
3744 dbgs() << "\n";
3745 }
3746 });
3747}
3748
3749/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3750/// least every ProbeSize bytes. Returns an iterator to the first instruction
3751/// after the loop. The difference between SP and TargetReg must be an exact
3752/// multiple of ProbeSize.
3753MachineBasicBlock::iterator
3754AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3755 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3756 Register TargetReg) const {
3757 MachineBasicBlock &MBB = *MBBI->getParent();
3758 MachineFunction &MF = *MBB.getParent();
3759 const AArch64InstrInfo *TII =
3760 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3761 DebugLoc DL = MBB.findDebugLoc(MBBI);
3762
3763 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3764 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3765 MF.insert(MBBInsertPoint, LoopMBB);
3766 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3767 MF.insert(MBBInsertPoint, ExitMBB);
3768
3769 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3770 // in SUB).
3771 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3772 StackOffset::getFixed(-ProbeSize), TII,
3773                  MachineInstr::FrameSetup);
3774  // STR XZR, [SP]
3775 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3776 .addReg(AArch64::XZR)
3777 .addReg(AArch64::SP)
3778 .addImm(0)
3779      .setMIFlags(MachineInstr::FrameSetup);
3780  // CMP SP, TargetReg
3781 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3782 AArch64::XZR)
3783 .addReg(AArch64::SP)
3784 .addReg(TargetReg)
3785      .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3786      .setMIFlags(MachineInstr::FrameSetup);
3787  // B.CC Loop
3788 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3789      .addImm(AArch64CC::NE)
3790      .addMBB(LoopMBB)
3791      .setMIFlags(MachineInstr::FrameSetup);
3792
3793 LoopMBB->addSuccessor(ExitMBB);
3794 LoopMBB->addSuccessor(LoopMBB);
3795 // Synthesize the exit MBB.
3796 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3797  ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3798  MBB.addSuccessor(LoopMBB);
3799 // Update liveins.
3800 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3801
3802 return ExitMBB->begin();
3803}
3804
3805void AArch64FrameLowering::inlineStackProbeFixed(
3806 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3807 StackOffset CFAOffset) const {
3808 MachineBasicBlock *MBB = MBBI->getParent();
3809 MachineFunction &MF = *MBB->getParent();
3810 const AArch64InstrInfo *TII =
3811 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3812 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3813 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3814 bool HasFP = hasFP(MF);
3815
3816 DebugLoc DL;
3817 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3818 int64_t NumBlocks = FrameSize / ProbeSize;
3819 int64_t ResidualSize = FrameSize % ProbeSize;
3820
3821 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3822 << NumBlocks << " blocks of " << ProbeSize
3823 << " bytes, plus " << ResidualSize << " bytes\n");
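  // For example, FrameSize == 10000 with a 4096-byte probe size gives
  // NumBlocks == 2 and ResidualSize == 1808.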
3824
3825  // Decrement SP by NumBlocks * ProbeSize bytes, either unrolled or with an
3826  // ordinary loop.
3827 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3828 for (int i = 0; i < NumBlocks; ++i) {
3829 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3830 // encodable in a SUB).
3831 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3832 StackOffset::getFixed(-ProbeSize), TII,
3833 MachineInstr::FrameSetup, false, false, nullptr,
3834 EmitAsyncCFI && !HasFP, CFAOffset);
3835 CFAOffset += StackOffset::getFixed(ProbeSize);
3836 // STR XZR, [SP]
3837 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3838 .addReg(AArch64::XZR)
3839 .addReg(AArch64::SP)
3840 .addImm(0)
3841          .setMIFlags(MachineInstr::FrameSetup);
3842    }
3843 } else if (NumBlocks != 0) {
3844    // SUB ScratchReg, SP, #(NumBlocks * ProbeSize) (or equivalent if that is
3845    // not encodable). ScratchReg may temporarily become the CFA register.
3846 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3847 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3848 MachineInstr::FrameSetup, false, false, nullptr,
3849 EmitAsyncCFI && !HasFP, CFAOffset);
3850 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3851 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3852 MBB = MBBI->getParent();
3853 if (EmitAsyncCFI && !HasFP) {
3854 // Set the CFA register back to SP.
3855 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3856 .buildDefCFARegister(AArch64::SP);
3857 }
3858 }
3859
3860 if (ResidualSize != 0) {
3861 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3862 // in SUB).
3863 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3864 StackOffset::getFixed(-ResidualSize), TII,
3865 MachineInstr::FrameSetup, false, false, nullptr,
3866 EmitAsyncCFI && !HasFP, CFAOffset);
3867 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3868 // STR XZR, [SP]
3869 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3870 .addReg(AArch64::XZR)
3871 .addReg(AArch64::SP)
3872 .addImm(0)
3873          .setMIFlags(MachineInstr::FrameSetup);
3874    }
3875 }
3876}
3877
3878void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3879 MachineBasicBlock &MBB) const {
3880 // Get the instructions that need to be replaced. We emit at most two of
3881 // these. Remember them in order to avoid complications coming from the need
3882 // to traverse the block while potentially creating more blocks.
3883 SmallVector<MachineInstr *, 4> ToReplace;
3884 for (MachineInstr &MI : MBB)
3885 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3886 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3887 ToReplace.push_back(&MI);
3888
3889 for (MachineInstr *MI : ToReplace) {
3890 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3891 Register ScratchReg = MI->getOperand(0).getReg();
3892 int64_t FrameSize = MI->getOperand(1).getImm();
3893 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3894 MI->getOperand(3).getImm());
3895 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3896 CFAOffset);
3897 } else {
3898 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3899 "Stack probe pseudo-instruction expected");
3900 const AArch64InstrInfo *TII =
3901 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3902 Register TargetReg = MI->getOperand(0).getReg();
3903 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3904 }
3905 MI->eraseFromParent();
3906 }
3907}
3908
3909struct StackAccess {
3910  enum AccessType {
3911    NotAccessed = 0, // Stack object not accessed by load/store instructions.
3912 GPR = 1 << 0, // A general purpose register.
3913 PPR = 1 << 1, // A predicate register.
3914 FPR = 1 << 2, // A floating point/Neon/SVE register.
3915 };
3916
3917 int Idx;
3918  StackOffset Offset;
3919  int64_t Size;
3920 unsigned AccessTypes;
3921
3923
3924 bool operator<(const StackAccess &Rhs) const {
3925 return std::make_tuple(start(), Idx) <
3926 std::make_tuple(Rhs.start(), Rhs.Idx);
3927 }
3928
3929 bool isCPU() const {
3930 // Predicate register load and store instructions execute on the CPU.
3931    return AccessTypes & (AccessType::GPR | AccessType::PPR);
3932  }
3933 bool isSME() const { return AccessTypes & AccessType::FPR; }
3934 bool isMixed() const { return isCPU() && isSME(); }
3935
3936 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3937 int64_t end() const { return start() + Size; }
3938
3939 std::string getTypeString() const {
3940 switch (AccessTypes) {
3941 case AccessType::FPR:
3942 return "FPR";
3943 case AccessType::PPR:
3944 return "PPR";
3945 case AccessType::GPR:
3946 return "GPR";
3947    case AccessType::NotAccessed:
3948      return "NA";
3949 default:
3950 return "Mixed";
3951 }
3952 }
3953
3954 void print(raw_ostream &OS) const {
3955 OS << getTypeString() << " stack object at [SP"
3956 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3957 if (Offset.getScalable())
3958 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3959 << " * vscale";
3960 OS << "]";
3961 }
3962};
3963
3964static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3965 SA.print(OS);
3966 return OS;
3967}
3968
3969void AArch64FrameLowering::emitRemarks(
3970 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3971
3972 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3974 return;
3975
3976 unsigned StackHazardSize = getStackHazardSize(MF);
3977 const uint64_t HazardSize =
3978 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3979
3980 if (HazardSize == 0)
3981 return;
3982
3983 const MachineFrameInfo &MFI = MF.getFrameInfo();
3984 // Bail if function has no stack objects.
3985 if (!MFI.hasStackObjects())
3986 return;
3987
3988 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3989
3990 size_t NumFPLdSt = 0;
3991 size_t NumNonFPLdSt = 0;
3992
3993 // Collect stack accesses via Load/Store instructions.
3994 for (const MachineBasicBlock &MBB : MF) {
3995 for (const MachineInstr &MI : MBB) {
3996 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3997 continue;
3998 for (MachineMemOperand *MMO : MI.memoperands()) {
3999 std::optional<int> FI = getMMOFrameID(MMO, MFI);
4000 if (FI && !MFI.isDeadObjectIndex(*FI)) {
4001 int FrameIdx = *FI;
4002
4003 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
4004 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
4005 StackAccesses[ArrIdx].Idx = FrameIdx;
4006 StackAccesses[ArrIdx].Offset =
4007 getFrameIndexReferenceFromSP(MF, FrameIdx);
4008 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
4009 }
4010
4011 unsigned RegTy = StackAccess::AccessType::GPR;
4012 if (MFI.hasScalableStackID(FrameIdx))
4013          RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
4014        else if (AArch64InstrInfo::isFpOrNEON(MI))
4015          RegTy = StackAccess::FPR;
4016
4017 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4018
4019 if (RegTy == StackAccess::FPR)
4020 ++NumFPLdSt;
4021 else
4022 ++NumNonFPLdSt;
4023 }
4024 }
4025 }
4026 }
4027
4028 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4029 return;
4030
4031 llvm::sort(StackAccesses);
4032 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4033    return S.AccessTypes == StackAccess::NotAccessed;
4034  });
4035
4036  SmallVector<const StackAccess *> MixedObjects;
4037  SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4038
4039 if (StackAccesses.front().isMixed())
4040 MixedObjects.push_back(&StackAccesses.front());
4041
4042 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4043 It != End; ++It) {
4044 const auto &First = *It;
4045 const auto &Second = *(It + 1);
4046
4047 if (Second.isMixed())
4048 MixedObjects.push_back(&Second);
4049
4050 if ((First.isSME() && Second.isCPU()) ||
4051 (First.isCPU() && Second.isSME())) {
4052 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4053 if (Distance < HazardSize)
4054 HazardPairs.emplace_back(&First, &Second);
4055 }
4056 }
4057
4058 auto EmitRemark = [&](llvm::StringRef Str) {
4059 ORE->emit([&]() {
4060 auto R = MachineOptimizationRemarkAnalysis(
4061 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4062 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4063 });
4064 };
4065
4066 for (const auto &P : HazardPairs)
4067 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4068
4069 for (const auto *Obj : MixedObjects)
4070 EmitRemark(
4071 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4072}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
MCRegister findFreePredicateReg(BitVector &SavedRegs)
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
#define H(x, y, z)
Definition MD5.cpp:56
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
SignReturnAddress getSignReturnAddressCondition() const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
size_t size() const
size - Get the array size.
Definition ArrayRef.h:142
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:137
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:730
A set of physical registers with utility functions to track liveness when walking backward/forward th...
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:41
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:298
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:149
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:337
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g. by passing things in registers).
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general-purpose registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
void stable_sort(R &&Range)
Definition STLExtras.h:2058
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
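A minimal sketch of the MachineInstrBuilder pattern behind BuildMI; emitCopyOfSP is a hypothetical helper, and the caller is assumed to supply the block, insertion point, debug location and instruction info, as the code in this file does.
#include "AArch64InstrInfo.h" // backend-local header: AArch64 opcodes/registers
#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"
using namespace llvm;

// Sketch only: materializes X0 = SP + #0 (ADDXri dst, src, imm12, shift)
// and tags the new instruction as frame setup.
static void emitCopyOfSP(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator MBBI, const DebugLoc &DL,
                         const TargetInstrInfo *TII) {
  BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::X0)
      .addReg(AArch64::SP)
      .addImm(0)
      .addImm(0)
      .setMIFlag(MachineInstr::FrameSetup);
}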
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
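A minimal sketch of make_scope_exit; scopeExitExample is an illustrative name, not part of this file.
#include "llvm/ADT/ScopeExit.h"

// Sketch only: the lambda runs when Guard is destroyed, so the flag is reset
// on every exit path from the function, including early returns.
static void scopeExitExample(bool &InProgress) {
  InProgress = true;
  auto Guard = llvm::make_scope_exit([&] { InProgress = false; });
  // ... work that may return early ...
}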
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
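A minimal sketch of dyn_cast as a combined type test and cast; isTailCallExample is an illustrative name, not part of this file.
#include "llvm/IR/Instructions.h"

// Sketch only: dyn_cast yields nullptr when V is not actually a CallInst,
// so the cast doubles as a type check.
static bool isTailCallExample(const llvm::Value *V) {
  if (const auto *CI = llvm::dyn_cast<llvm::CallInst>(V))
    return CI->isTailCall();
  return false;
}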
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1732
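A minimal sketch of the range form of any_of; hasNegativeOffset is an illustrative name, not part of this file.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Sketch only: the range wrapper avoids spelling out begin()/end().
static bool hasNegativeOffset(const llvm::SmallVectorImpl<int> &Offsets) {
  return llvm::any_of(Offsets, [](int O) { return O < 0; });
}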
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:406
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1622
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
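A minimal sketch of the emitFrameOffset signature listed above; allocate16Bytes is a hypothetical helper, and the block, insertion point, debug location and instruction info are assumed to be provided by the caller, as in this file's prologue emission.
#include "AArch64InstrInfo.h" // backend-local header declaring emitFrameOffset
using namespace llvm;

// Sketch only: moves SP down by a fixed 16 bytes and tags the resulting
// instruction(s) as frame setup.
static void allocate16Bytes(MachineBasicBlock &MBB,
                            MachineBasicBlock::iterator MBBI,
                            const DebugLoc &DL, const TargetInstrInfo *TII) {
  emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                  StackOffset::getFixed(-16), TII, MachineInstr::FrameSetup);
}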
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
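A minimal sketch of alignTo; alignToExample is an illustrative name, not part of this file.
#include <cstdint>
#include "llvm/Support/Alignment.h"

// Illustrative only: 40 bytes rounded up to a 16-byte boundary is 48.
static uint64_t alignToExample() {
  return llvm::alignTo(40, llvm::Align(16));
}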
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1758
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2120
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1897
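A minimal sketch exercising the find_if, erase_if and is_contained helpers listed above; rangeHelpersExample is an illustrative name, not part of this file.
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

// Sketch only: the three range helpers applied to a small vector.
static void rangeHelpersExample(llvm::SmallVectorImpl<int> &Vals) {
  auto It = llvm::find_if(Vals, [](int V) { return V > 10; }); // first match or end()
  (void)It;
  llvm::erase_if(Vals, [](int V) { return V == 0; });          // drop all zeros
  bool HasOne = llvm::is_contained(Vals, 1);                   // membership test
  (void)HasOne;
}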
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:869
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray