1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame at compile time, a constant offset must be
92// computable from one of the pointers (fp, bp, sp). The size
93// of the areas with a dotted background cannot be computed at compile time
94// if they are present, so all three of fp, bp and
95// sp must be set up to access all contents in the frame areas,
96// assuming all of the frame areas are non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
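As a concrete illustration of the two bullet points above (a hypothetical example, not LLVM code), a function combining a runtime-sized allocation with an over-aligned local is the classic case that needs both fp and bp:

```cpp
#include <alloca.h>
#include <cstdint>

// Hypothetical example, not LLVM code: an over-aligned local forces stack
// realignment (frame pointer), and combining it with a runtime-sized
// allocation additionally requires a base pointer to reach fixed locals.
bool needsFpAndBp(int n) {
  alignas(64) char alignedBuf[128];            // more-than-default alignment
  char *dyn = static_cast<char *>(alloca(n));  // variable-sized, moves sp
  dyn[0] = alignedBuf[0] = 0;
  return (reinterpret_cast<uintptr_t>(alignedBuf) % 64) == 0;
}
```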
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
118//
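The VL-scaled addressing above can be sketched numerically (an illustration, not an LLVM API): one "vl" unit is the byte size of a Z register, so the byte offset only becomes known once the hardware vector length is:

```cpp
#include <cstdint>

// Illustration only: resolve a "#<n> mul vl" offset for a given vector
// length in bits. One "vl" unit equals the size of a Z register in bytes.
int64_t resolveMulVl(int64_t n, unsigned vlBits) {
  int64_t vlBytes = vlBits / 8;
  return n * vlBytes; // e.g. "#-7 mul vl" at VL=512 bits is -448 bytes
}
```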
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
130//
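A minimal sketch of the reserved-call-frame sizing described above (a hypothetical helper, not the LLVM implementation): without VLAs the frame reserves the maximum outgoing-argument size once, rounded to the 16-byte stack alignment:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical sketch: size of a reserved call frame, or 0 when VLAs force
// sp adjustments around each individual call instead.
uint64_t reservedCallFrameBytes(bool hasVLAs,
                                const std::vector<uint64_t> &outgoingArgBytes) {
  if (hasVLAs)
    return 0; // arguments must be placed below the VLAs at each call site
  uint64_t maxBytes = 0;
  for (uint64_t s : outgoingArgBytes)
    maxBytes = std::max(maxBytes, s);
  return (maxBytes + 15) / 16 * 16; // keep sp 16-byte aligned
}
```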
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
147//
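The FPR > hazard slot > GPR ordering can be pictured with a toy sort (purely illustrative; the real logic lives in the frame-object ordering code later in this file):

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Toy illustration: rank 0 = FPR-accessed, 1 = hazard padding, 2 = GPR-accessed.
// Sorting by rank yields the FPR > hazard > GPR layout the comment describes.
std::vector<std::string>
hazardLayout(std::vector<std::pair<std::string, int>> objects) {
  std::stable_sort(objects.begin(), objects.end(),
                   [](const auto &a, const auto &b) { return a.second < b.second; });
  std::vector<std::string> order;
  for (const auto &o : objects)
    order.push_back(o.first);
  return order;
}
```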
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
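The CFI offsets above can be checked arithmetically: the CFA is fp + 16 = 0x10030, and each slot's address minus the CFA gives its .cfi_offset:

```cpp
// Worked check of the example above (addresses taken from the stack diagram).
long cfiOffset(long slotAddr) {
  const long fp = 0x10020;
  const long cfa = fp + 16; // .cfi_def_cfa w29, 16
  return slotAddr - cfa;
}
```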
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
293static cl::opt<unsigned>
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
310 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
312 ? AArch64InstrInfo::isTailCallReturnInst(*MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments, this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
334static bool produceCompactUnwindFrame(const AArch64FrameLowering &AFL,
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and the SVE stack size and offsets for
339/// each object. If AssignOffsets is "Yes", the offsets get assigned (and SVE
340/// stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386static bool isTargetWindows(const MachineFunction &MF) {
387 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
388}
389
395
396/// Returns true if a homogeneous prolog or epilog code can be emitted
397/// for the size optimization. If possible, a frame helper call is injected.
398/// When Exit block is given, this check is for epilog.
399bool AArch64FrameLowering::homogeneousPrologEpilog(
400 MachineFunction &MF, MachineBasicBlock *Exit) const {
401 if (!MF.getFunction().hasMinSize())
402 return false;
403 if (!EnableHomogeneousPrologEpilog)
404 return false;
405 if (EnableRedZone)
406 return false;
407
408 // TODO: Windows is not supported yet.
409 if (isTargetWindows(MF))
410 return false;
411
412 // TODO: SVE is not supported yet.
413 if (isLikelyToHaveSVEStack(*this, MF))
414 return false;
415
416 // Bail on stack adjustment needed on return for simplicity.
417 const MachineFrameInfo &MFI = MF.getFrameInfo();
418 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
419 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
420 return false;
421 if (Exit && getArgumentStackToRestore(MF, *Exit))
422 return false;
423
424 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
425 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
426 return false;
427
428 // If there are an odd number of GPRs before LR and FP in the CSRs list,
429 // they will not be paired into one RegPairInfo, which is incompatible with
430 // the assumption made by the homogeneous prolog epilog pass.
431 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
432 unsigned NumGPRs = 0;
433 for (unsigned I = 0; CSRegs[I]; ++I) {
434 Register Reg = CSRegs[I];
435 if (Reg == AArch64::LR) {
436 assert(CSRegs[I + 1] == AArch64::FP);
437 if (NumGPRs % 2 != 0)
438 return false;
439 break;
440 }
441 if (AArch64::GPR64RegClass.contains(Reg))
442 ++NumGPRs;
443 }
444
445 return true;
446}
447
448/// Returns true if CSRs should be paired.
449bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
450 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
451}
452
453/// This is the biggest offset to the stack pointer we can encode in aarch64
454/// instructions (without using a separate calculation and a temp register).
455/// Note that the exception here are vector stores/loads which cannot encode any
456/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
457static const unsigned DefaultSafeSPDisplacement = 255;
458
459/// Look at each instruction that references stack frames and return the stack
460/// size limit beyond which some of these instructions will require a scratch
461/// register during their expansion later.
462static unsigned estimateRSStackSizeLimit(MachineFunction &MF) {
463 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
464 // range. We'll end up allocating an unnecessary spill slot a lot, but
465 // realistically that's not a big deal at this stage of the game.
466 for (MachineBasicBlock &MBB : MF) {
467 for (MachineInstr &MI : MBB) {
468 if (MI.isDebugInstr() || MI.isPseudo() ||
469 MI.getOpcode() == AArch64::ADDXri ||
470 MI.getOpcode() == AArch64::ADDSXri)
471 continue;
472
473 for (const MachineOperand &MO : MI.operands()) {
474 if (!MO.isFI())
475 continue;
476
477 StackOffset Offset;
478 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
479 AArch64FrameOffsetCannotUpdate)
480 return 0;
481 }
482 }
483 }
484 return DefaultSafeSPDisplacement;
485}
486
491
492unsigned
493AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
494 const AArch64FunctionInfo *AFI,
495 bool IsWin64, bool IsFunclet) const {
496 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
497 "Tail call reserved stack must be aligned to 16 bytes");
498 if (!IsWin64 || IsFunclet) {
499 return AFI->getTailCallReservedStack();
500 } else {
501 if (AFI->getTailCallReservedStack() != 0 &&
502 !MF.getFunction().getAttributes().hasAttrSomewhere(
503 Attribute::SwiftAsync))
504 report_fatal_error("cannot generate ABI-changing tail call for Win64");
505 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
506
507 // Var args are stored here in the primary function.
508 FixedObjectSize += AFI->getVarArgsGPRSize();
509
510 if (MF.hasEHFunclets()) {
511 // Catch objects are stored here in the primary function.
512 const MachineFrameInfo &MFI = MF.getFrameInfo();
513 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
514 SmallSetVector<int, 8> CatchObjFrameIndices;
515 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
516 for (const WinEHHandlerType &H : TBME.HandlerArray) {
517 int FrameIndex = H.CatchObj.FrameIndex;
518 if ((FrameIndex != INT_MAX) &&
519 CatchObjFrameIndices.insert(FrameIndex)) {
520 FixedObjectSize = alignTo(FixedObjectSize,
521 MFI.getObjectAlign(FrameIndex).value()) +
522 MFI.getObjectSize(FrameIndex);
523 }
524 }
525 }
526 // To support EH funclets we allocate an UnwindHelp object
527 FixedObjectSize += 8;
528 }
529 return alignTo(FixedObjectSize, 16);
530 }
531}
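The accumulation in getFixedObjectSize can be traced with made-up numbers (an illustration with assumed sizes, not the function itself): each catch object is placed at its alignment and appended, and the total is padded to 16 bytes:

```cpp
#include <cstdint>

// Illustration with assumed sizes: tail-call reserved stack, one 8-byte
// vararg GPR save, one 16-byte catch object (align 8), and the 8-byte
// UnwindHelp slot, all rounded up to 16 at the end.
uint64_t roundUp(uint64_t v, uint64_t align) {
  return (v + align - 1) / align * align;
}
uint64_t exampleFixedObjectSize(uint64_t tailCallReserved) {
  uint64_t size = tailCallReserved;
  size += 8;                    // varargs GPR save area
  size = roundUp(size, 8) + 16; // catch object placed at its alignment
  size += 8;                    // UnwindHelp slot for EH funclets
  return roundUp(size, 16);     // final 16-byte alignment
}
```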
532
533bool AArch64FrameLowering::canUseRedZone(const MachineFunction &MF) const {
534 if (!EnableRedZone)
535 return false;
536
537 // Don't use the red zone if the function explicitly asks us not to.
538 // This is typically used for kernel code.
539 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
540 const unsigned RedZoneSize =
541 Subtarget.getTargetLowering()->getRedZoneSize(MF.getFunction());
542 if (!RedZoneSize)
543 return false;
544
545 const MachineFrameInfo &MFI = MF.getFrameInfo();
546 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
547 uint64_t NumBytes = AFI->getLocalStackSize();
548
549 // If neither NEON nor SVE are available, a COPY from one Q-reg to
550 // another requires a spill -> reload sequence. We can do that
551 // using a pre-decrementing store/post-decrementing load, but
552 // if we do so, we can't use the Red Zone.
553 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
554 !Subtarget.isNeonAvailable() &&
555 !Subtarget.hasSVE();
556
557 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
558 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
559}
560
561/// hasFPImpl - Return true if the specified function should have a dedicated
562/// frame pointer register.
563bool AArch64FrameLowering::hasFPImpl(const MachineFunction &MF) const {
564 const MachineFrameInfo &MFI = MF.getFrameInfo();
565 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
566 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
567
568 // Win64 EH requires a frame pointer if funclets are present, as the locals
569 // are accessed off the frame pointer in both the parent function and the
570 // funclets.
571 if (MF.hasEHFunclets())
572 return true;
573 // Retain behavior of always omitting the FP for leaf functions when possible.
574 if (MF.getTarget().Options.DisableFramePointerElim(MF))
575 return true;
576 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
577 MFI.hasStackMap() || MFI.hasPatchPoint() ||
578 RegInfo->hasStackRealignment(MF))
579 return true;
580
581 // If we:
582 //
583 // 1. Have streaming mode changes
584 // OR:
585 // 2. Have a streaming body with SVE stack objects
586 //
587 // Then the value of VG restored when unwinding to this function may not match
588 // the value of VG used to set up the stack.
589 //
590 // This is a problem as the CFA can be described with an expression of the
591 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
592 //
593 // If the value of VG used in that expression does not match the value used to
594 // set up the stack, an incorrect address for the CFA will be computed, and
595 // unwinding will fail.
596 //
597 // We work around this issue by ensuring the frame-pointer can describe the
598 // CFA in either of these cases.
599 if (AFI.needsDwarfUnwindInfo(MF) &&
602 return true;
603 // With large callframes around we may need to use FP to access the scavenging
604 // emergency spillslot.
605 //
606 // Unfortunately some calls to hasFP() like machine verifier ->
607 // getReservedReg() -> hasFP in the middle of global isel are too early
608 // to know the max call frame size. Hopefully conservatively returning "true"
609 // in those cases is fine.
610 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
611 if (!MFI.isMaxCallFrameSizeComputed() ||
612 MFI.getMaxCallFrameSize() > DefaultSafeSPDisplacement)
613 return true;
614
615 return false;
616}
617
618/// Should the Frame Pointer be reserved for the current function?
619bool AArch64FrameLowering::isFPReserved(const MachineFunction &MF) const {
620 const TargetMachine &TM = MF.getTarget();
621 const Triple &TT = TM.getTargetTriple();
622
623 // These OSes require the frame chain is valid, even if the current frame does
624 // not use a frame pointer.
625 if (TT.isOSDarwin() || TT.isOSWindows())
626 return true;
627
628 // If the function has a frame pointer, it is reserved.
629 if (hasFP(MF))
630 return true;
631
632 // Frontend has requested to preserve the frame pointer.
633 if (MF.getTarget().Options.FramePointerIsReserved(MF.getFunction()))
634 return true;
635
636 return false;
637}
638
639/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
640/// not required, we reserve argument space for call sites in the function
641/// immediately on entry to the current function. This eliminates the need for
642/// add/sub sp brackets around call sites. Returns true if the call frame is
643/// included as part of the stack frame.
645 const MachineFunction &MF) const {
646 // The stack probing code for the dynamically allocated outgoing arguments
647 // area assumes that the stack is probed at the top - either by the prologue
648 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
649 // most recent variable-sized object allocation. Changing the condition here
650 // may need to be followed up by changes to the probe issuing logic.
651 return !MF.getFrameInfo().hasVarSizedObjects();
652}
653
654MachineBasicBlock::iterator AArch64FrameLowering::eliminateCallFramePseudoInstr(
655 MachineFunction &MF, MachineBasicBlock &MBB,
656 MachineBasicBlock::iterator I) const {
657
658 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
659 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
660 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
661 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
662 DebugLoc DL = I->getDebugLoc();
663 unsigned Opc = I->getOpcode();
664 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
665 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
666
667 if (!hasReservedCallFrame(MF)) {
668 int64_t Amount = I->getOperand(0).getImm();
669 Amount = alignTo(Amount, getStackAlign());
670 if (!IsDestroy)
671 Amount = -Amount;
672
673 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
674 // doesn't have to pop anything), then the first operand will be zero too so
675 // this adjustment is a no-op.
676 if (CalleePopAmount == 0) {
677 // FIXME: in-function stack adjustment for calls is limited to 24-bits
678 // because there's no guaranteed temporary register available.
679 //
680 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
681 // 1) For offset <= 12-bit, we use LSL #0
682 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
683 // LSL #0, and the other uses LSL #12.
684 //
685 // Most call frames will be allocated at the start of a function so
686 // this is OK, but it is a limitation that needs dealing with.
687 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
688
689 if (TLI->hasInlineStackProbe(MF) &&
691 // When stack probing is enabled, the decrement of SP may need to be
692 // probed. We only need to do this if the call site needs 1024 bytes of
693 // space or more, because a region smaller than that is allowed to be
694 // unprobed at an ABI boundary. We rely on the fact that SP has been
695 // probed exactly at this point, either by the prologue or most recent
696 // dynamic allocation.
698 "non-reserved call frame without var sized objects?");
699 Register ScratchReg =
700 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
701 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
702 } else {
703 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
704 StackOffset::getFixed(Amount), TII);
705 }
706 }
707 } else if (CalleePopAmount != 0) {
708 // If the calling convention demands that the callee pops arguments from the
709 // stack, we want to add it back if we have a reserved call frame.
710 assert(CalleePopAmount < 0xffffff && "call frame too large");
711 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
712 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
713 }
714 return MBB.erase(I);
715}
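The 24-bit adjustment limit discussed in the comments above follows from the ADD/SUB immediate encoding; a sketch of the LSL #0 / LSL #12 split (illustrative, not the emitter):

```cpp
#include <cstdint>
#include <utility>

// Split a stack adjustment into the two 12-bit chunks that ADD/SUB
// (immediate) can encode: one at LSL #12 and one at LSL #0.
std::pair<uint32_t, uint32_t> splitAddSubImm(uint32_t offset) {
  uint32_t hi = offset & 0xfff000; // emitted with LSL #12
  uint32_t lo = offset & 0x000fff; // emitted with LSL #0
  return {hi, lo};
}
```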
716
717void AArch64FrameLowering::resetCFIToInitialState(
718 MachineBasicBlock &MBB) const {
719
720 MachineFunction &MF = *MBB.getParent();
721 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
722 const auto &TRI = *Subtarget.getRegisterInfo();
723 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
724
725 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
726
727 // Reset the CFA to `SP + 0`.
728 CFIBuilder.buildDefCFA(AArch64::SP, 0);
729
730 // Flip the RA sign state.
731 if (MFI.shouldSignReturnAddress(MF))
732 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
733 : CFIBuilder.buildNegateRAState();
734
735 // Shadow call stack uses X18, reset it.
736 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
737 CFIBuilder.buildSameValue(AArch64::X18);
738
739 // Emit .cfi_same_value for callee-saved registers.
740 const std::vector<CalleeSavedInfo> &CSI =
741 MF.getFrameInfo().getCalleeSavedInfo();
742 for (const auto &Info : CSI) {
743 MCRegister Reg = Info.getReg();
744 if (!TRI.regNeedsCFI(Reg, Reg))
745 continue;
746 CFIBuilder.buildSameValue(Reg);
747 }
748}
749
750static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE) {
751 switch (Reg.id()) {
752 default:
753 // The called routine is expected to preserve r19-r28
754 // r29 and r30 are used as frame pointer and link register resp.
755 return 0;
756
757 // GPRs
758#define CASE(n) \
759 case AArch64::W##n: \
760 case AArch64::X##n: \
761 return AArch64::X##n
762 CASE(0);
763 CASE(1);
764 CASE(2);
765 CASE(3);
766 CASE(4);
767 CASE(5);
768 CASE(6);
769 CASE(7);
770 CASE(8);
771 CASE(9);
772 CASE(10);
773 CASE(11);
774 CASE(12);
775 CASE(13);
776 CASE(14);
777 CASE(15);
778 CASE(16);
779 CASE(17);
780 CASE(18);
781#undef CASE
782
783 // FPRs
784#define CASE(n) \
785 case AArch64::B##n: \
786 case AArch64::H##n: \
787 case AArch64::S##n: \
788 case AArch64::D##n: \
789 case AArch64::Q##n: \
790 return HasSVE ? AArch64::Z##n : AArch64::Q##n
791 CASE(0);
792 CASE(1);
793 CASE(2);
794 CASE(3);
795 CASE(4);
796 CASE(5);
797 CASE(6);
798 CASE(7);
799 CASE(8);
800 CASE(9);
801 CASE(10);
802 CASE(11);
803 CASE(12);
804 CASE(13);
805 CASE(14);
806 CASE(15);
807 CASE(16);
808 CASE(17);
809 CASE(18);
810 CASE(19);
811 CASE(20);
812 CASE(21);
813 CASE(22);
814 CASE(23);
815 CASE(24);
816 CASE(25);
817 CASE(26);
818 CASE(27);
819 CASE(28);
820 CASE(29);
821 CASE(30);
822 CASE(31);
823#undef CASE
824 }
825}
826
827void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
828 MachineBasicBlock &MBB) const {
829 // Insertion point.
830 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
831
832 // Fake a debug loc.
833 DebugLoc DL;
834 if (MBBI != MBB.end())
835 DL = MBBI->getDebugLoc();
836
837 const MachineFunction &MF = *MBB.getParent();
838 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
839 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
840
841 BitVector GPRsToZero(TRI.getNumRegs());
842 BitVector FPRsToZero(TRI.getNumRegs());
843 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
844 for (MCRegister Reg : RegsToZero.set_bits()) {
845 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
846 // For GPRs, we only care to clear out the 64-bit register.
847 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
848 GPRsToZero.set(XReg);
849 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
850 // For FPRs,
851 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
852 FPRsToZero.set(XReg);
853 }
854 }
855
856 const AArch64InstrInfo &TII = *STI.getInstrInfo();
857
858 // Zero out GPRs.
859 for (MCRegister Reg : GPRsToZero.set_bits())
860 TII.buildClearRegister(Reg, MBB, MBBI, DL);
861
862 // Zero out FP/vector registers.
863 for (MCRegister Reg : FPRsToZero.set_bits())
864 TII.buildClearRegister(Reg, MBB, MBBI, DL);
865
866 if (HasSVE) {
867 for (MCRegister PReg :
868 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
869 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
870 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
871 AArch64::P15}) {
872 if (RegsToZero[PReg])
873 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
874 }
875 }
876}
877
878bool AArch64FrameLowering::windowsRequiresStackProbe(
879 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
880 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
881 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
882 // TODO: When implementing stack protectors, take that into account
883 // for the probe threshold.
884 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
885 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
886}
887
888static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs,
889 const MachineBasicBlock &MBB) {
890 const MachineFunction *MF = MBB.getParent();
891 LiveRegs.addLiveIns(MBB);
892 // Mark callee saved registers as used so we will not choose them.
893 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
894 for (unsigned i = 0; CSRegs[i]; ++i)
895 LiveRegs.addReg(CSRegs[i]);
896}
897
899AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
900 bool HasCall) const {
901 MachineFunction *MF = MBB->getParent();
902
903 // If MBB is an entry block, use X9 as the scratch register
904 // preserve_none functions may be using X9 to pass arguments,
905 // so prefer to pick an available register below.
906 if (&MF->front() == MBB &&
907 MF->getFunction().getCallingConv() != CallingConv::PreserveNone)
908 return AArch64::X9;
909
910 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
911 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
912 LivePhysRegs LiveRegs(TRI);
913 getLiveRegsForEntryMBB(LiveRegs, *MBB);
914 if (HasCall) {
915 LiveRegs.addReg(AArch64::X16);
916 LiveRegs.addReg(AArch64::X17);
917 LiveRegs.addReg(AArch64::X18);
918 }
919
920 // Prefer X9 since it was historically used for the prologue scratch reg.
921 const MachineRegisterInfo &MRI = MF->getRegInfo();
922 if (LiveRegs.available(MRI, AArch64::X9))
923 return AArch64::X9;
924
925 for (unsigned Reg : AArch64::GPR64RegClass) {
926 if (LiveRegs.available(MRI, Reg))
927 return Reg;
928 }
929 return AArch64::NoRegister;
930}
931
932bool AArch64FrameLowering::canUseAsPrologue(
933 const MachineBasicBlock &MBB) const {
934 const MachineFunction *MF = MBB.getParent();
935 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
936 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
937 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
938 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
939 const AArch64FunctionInfo *AFI = MF->getInfo<AArch64FunctionInfo>();
940
941 if (AFI->hasSwiftAsyncContext()) {
942 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
943 const MachineRegisterInfo &MRI = MF->getRegInfo();
944 LivePhysRegs LiveRegs(TRI);
945 getLiveRegsForEntryMBB(LiveRegs, MBB);
946 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
947 // available.
948 if (!LiveRegs.available(MRI, AArch64::X16) ||
949 !LiveRegs.available(MRI, AArch64::X17))
950 return false;
951 }
952
953 // Certain stack probing sequences might clobber flags; in that case we
954 // can't use the block as a prologue if the flags register is a live-in.
955 if (AFI->hasStackProbing() &&
956 MBB.isLiveIn(AArch64::NZCV))
957 return false;
958
959 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
960 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
961 return false;
962
963 // May need a scratch register (for the return value) if we're required to
964 // make a special call.
965 if (requiresSaveVG(*MF) ||
966 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
967 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
968 return false;
969
970 return true;
971}
972
973bool AArch64FrameLowering::needsWinCFI(const MachineFunction &MF) const {
974 const Function &F = MF.getFunction();
975 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
976 F.needsUnwindTableEntry();
977}
978
979bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
980 const MachineFunction &MF) const {
981 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
982 // and SEH_EpilogEnd instructions in the correct order.
983 if (MF.getTarget().getMCAsmInfo()->usesWindowsCFI())
984 return false;
985 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
986 return AFI->shouldSignReturnAddress(MF);
987}
988
989// Given a load or a store instruction, generate an appropriate unwinding SEH
990// code on Windows.
991MachineBasicBlock::iterator
992AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
993 const AArch64InstrInfo &TII,
994 MachineInstr::MIFlag Flag) const {
995 unsigned Opc = MBBI->getOpcode();
996 MachineBasicBlock *MBB = MBBI->getParent();
997 MachineFunction &MF = *MBB->getParent();
998 DebugLoc DL = MBBI->getDebugLoc();
999 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1000 int Imm = MBBI->getOperand(ImmIdx).getImm();
1001 MachineInstrBuilder MIB;
1002 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1003 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1004
1005 switch (Opc) {
1006 default:
1007 report_fatal_error("No SEH Opcode for this instruction");
1008 case AArch64::STR_ZXI:
1009 case AArch64::LDR_ZXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::STR_PXI:
1018 case AArch64::LDR_PXI: {
1019 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1020 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1021 .addImm(Reg0)
1022 .addImm(Imm)
1023 .setMIFlag(Flag);
1024 break;
1025 }
1026 case AArch64::LDPDpost:
1027 Imm = -Imm;
1028 [[fallthrough]];
1029 case AArch64::STPDpre: {
1030 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1031 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1032 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1033 .addImm(Reg0)
1034 .addImm(Reg1)
1035 .addImm(Imm * 8)
1036 .setMIFlag(Flag);
1037 break;
1038 }
1039 case AArch64::LDPXpost:
1040 Imm = -Imm;
1041 [[fallthrough]];
1042 case AArch64::STPXpre: {
1043 Register Reg0 = MBBI->getOperand(1).getReg();
1044 Register Reg1 = MBBI->getOperand(2).getReg();
1045 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1046 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1047 .addImm(Imm * 8)
1048 .setMIFlag(Flag);
1049 else
1050 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1051 .addImm(RegInfo->getSEHRegNum(Reg0))
1052 .addImm(RegInfo->getSEHRegNum(Reg1))
1053 .addImm(Imm * 8)
1054 .setMIFlag(Flag);
1055 break;
1056 }
1057 case AArch64::LDRDpost:
1058 Imm = -Imm;
1059 [[fallthrough]];
1060 case AArch64::STRDpre: {
1061 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1062 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1063 .addImm(Reg)
1064 .addImm(Imm)
1065 .setMIFlag(Flag);
1066 break;
1067 }
1068 case AArch64::LDRXpost:
1069 Imm = -Imm;
1070 [[fallthrough]];
1071 case AArch64::STRXpre: {
1072 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1073 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1074 .addImm(Reg)
1075 .addImm(Imm)
1076 .setMIFlag(Flag);
1077 break;
1078 }
1079 case AArch64::STPDi:
1080 case AArch64::LDPDi: {
1081 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1082 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1083 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1084 .addImm(Reg0)
1085 .addImm(Reg1)
1086 .addImm(Imm * 8)
1087 .setMIFlag(Flag);
1088 break;
1089 }
1090 case AArch64::STPXi:
1091 case AArch64::LDPXi: {
1092 Register Reg0 = MBBI->getOperand(0).getReg();
1093 Register Reg1 = MBBI->getOperand(1).getReg();
1094
1095 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1096 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1097
1098 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1099 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1100 .addImm(Imm * 8)
1101 .setMIFlag(Flag);
1102 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1103 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1104 .addImm(SEHReg0)
1105 .addImm(SEHReg1)
1106 .addImm(Imm * 8)
1107 .setMIFlag(Flag);
1108 else
1109 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1110 .addImm(SEHReg0)
1111 .addImm(SEHReg1)
1112 .addImm(Imm * 8)
1113 .setMIFlag(Flag);
1114 break;
1115 }
1116 case AArch64::STRXui:
1117 case AArch64::LDRXui: {
1118 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1119 if (Reg >= 19)
1120 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1121 .addImm(Reg)
1122 .addImm(Imm * 8)
1123 .setMIFlag(Flag);
1124 else
1125 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1126 .addImm(Reg)
1127 .addImm(Imm * 8)
1128 .setMIFlag(Flag);
1129 break;
1130 }
1131 case AArch64::STRDui:
1132 case AArch64::LDRDui: {
1133 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1134 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1135 .addImm(Reg)
1136 .addImm(Imm * 8)
1137 .setMIFlag(Flag);
1138 break;
1139 }
1140 case AArch64::STPQi:
1141 case AArch64::LDPQi: {
1142 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1143 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1144 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1145 .addImm(Reg0)
1146 .addImm(Reg1)
1147 .addImm(Imm * 16)
1148 .setMIFlag(Flag);
1149 break;
1150 }
1151 case AArch64::LDPQpost:
1152 Imm = -Imm;
1153 [[fallthrough]];
1154 case AArch64::STPQpre: {
1155 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1156 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1157 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1158 .addImm(Reg0)
1159 .addImm(Reg1)
1160 .addImm(Imm * 16)
1161 .setMIFlag(Flag);
1162 break;
1163 }
1164 }
1165 auto I = MBB->insertAfter(MBBI, MIB);
1166 return I;
1167}
1168
1169bool AArch64FrameLowering::requiresSaveVG(const MachineFunction &MF) const {
1170 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1171 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1172 return false;
1173 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1174 // is enabled with streaming mode changes.
1175 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1176 if (ST.isTargetDarwin())
1177 return ST.hasSVE();
1178 return true;
1179}
1180
1181void AArch64FrameLowering::emitPacRetPlusLeafHardening(
1182 MachineFunction &MF) const {
1183 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1184 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
1185
1186 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1187 DebugLoc DL; // Set debug location to unknown.
1188 MachineBasicBlock::iterator MBBI = MBB.begin();
1189
1190 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1191 .setMIFlag(MachineInstr::FrameSetup);
1192 };
1193
1194 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1195 DebugLoc DL;
1196 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1197 if (MBBI != MBB.end())
1198 DL = MBBI->getDebugLoc();
1199
1200 TII->createPauthEpilogueInstr(MBB, DL);
1201 };
1202
1203 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1204 EmitSignRA(MF.front());
1205 for (MachineBasicBlock &MBB : MF) {
1206 if (MBB.isEHFuncletEntry())
1207 EmitSignRA(MBB);
1208 if (MBB.isReturnBlock())
1209 EmitAuthRA(MBB);
1210 }
1211}
1212
1213void AArch64FrameLowering::emitPrologue(MachineFunction &MF,
1214 MachineBasicBlock &MBB) const {
1215 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1216 PrologueEmitter.emitPrologue();
1217}
1218
1219void AArch64FrameLowering::emitEpilogue(MachineFunction &MF,
1220 MachineBasicBlock &MBB) const {
1221 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1222 EpilogueEmitter.emitEpilogue();
1223}
1224
1225bool AArch64FrameLowering::enableCFIFixup(const MachineFunction &MF) const {
1226 return TargetFrameLowering::enableCFIFixup(MF) &&
1227 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1228}
1229
1230bool AArch64FrameLowering::enableFullCFIFixup(const MachineFunction &MF) const {
1231 return enableCFIFixup(MF) &&
1232 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1233}
1234
1235/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1236/// debug info. It's the same as what we use for resolving the code-gen
1237/// references for now. FIXME: This can go wrong when references are
1238/// SP-relative and simple call frames aren't used.
1239StackOffset
1240AArch64FrameLowering::getFrameIndexReference(const MachineFunction &MF, int FI,
1241 Register &FrameReg) const {
1242 return resolveFrameIndexReference(
1243 MF, FI, FrameReg,
1244 /*PreferFP=*/
1245 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1246 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1247 /*ForSimm=*/false);
1248}
1249
1250StackOffset
1251AArch64FrameLowering::getFrameIndexReferenceFromSP(const MachineFunction &MF,
1252 int FI) const {
1253 // This function serves to provide a comparable offset from a single reference
1254 // point (the value of SP at function entry) that can be used for analysis,
1255 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1256 // correct for all objects in the presence of VLA-area objects or dynamic
1257 // stack re-alignment.
1258
1259 const auto &MFI = MF.getFrameInfo();
1260
1261 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1262 StackOffset ZPRStackSize = getZPRStackSize(MF);
1263 StackOffset PPRStackSize = getPPRStackSize(MF);
1264 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1265
1266 // For VLA-area objects, just emit an offset at the end of the stack frame.
1267 // Whilst not quite correct, these objects do live at the end of the frame and
1268 // so it is more useful for analysis for the offset to reflect this.
1269 if (MFI.isVariableSizedObjectIndex(FI)) {
1270 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1271 }
1272
1273 // This is correct in the absence of any SVE stack objects.
1274 if (!SVEStackSize)
1275 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1276
1277 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1278 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1279 if (MFI.hasScalableStackID(FI)) {
1280 if (FPAfterSVECalleeSaves &&
1281 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1282 assert(!AFI->hasSplitSVEObjects() &&
1283 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1284 return StackOffset::getScalable(ObjectOffset);
1285 }
1286 StackOffset AccessOffset{};
1287 // The scalable vectors are below (lower address) the scalable predicates
1288 // with split SVE objects, so we must subtract the size of the predicates.
1289 if (AFI->hasSplitSVEObjects() &&
1290 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1291 AccessOffset = -PPRStackSize;
1292 return AccessOffset +
1293 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1294 ObjectOffset);
1295 }
1296
1297 bool IsFixed = MFI.isFixedObjectIndex(FI);
1298 bool IsCSR =
1299 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1300
1301 StackOffset ScalableOffset = {};
1302 if (!IsFixed && !IsCSR) {
1303 ScalableOffset = -SVEStackSize;
1304 } else if (FPAfterSVECalleeSaves && IsCSR) {
1305 ScalableOffset =
1306 StackOffset::getScalable(-AFI->getSVECalleeSavedStackSize());
1307 }
1308
1309 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1310}
1311
1312StackOffset
1313AArch64FrameLowering::getNonLocalFrameIndexReference(const MachineFunction &MF,
1314 int FI) const {
1315 return StackOffset::getFixed(getSEHFrameIndexOffset(MF, FI));
1316}
1317
1318StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1319 int64_t ObjectOffset) const {
1320 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1321 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1322 const Function &F = MF.getFunction();
1323 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1324 unsigned FixedObject =
1325 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1326 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1327 int64_t FPAdjust =
1328 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1329 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1330}
1331
1332StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1333 int64_t ObjectOffset) const {
1334 const auto &MFI = MF.getFrameInfo();
1335 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1336}
1337
1338// TODO: This function currently does not work for scalable vectors.
1339int64_t AArch64FrameLowering::getSEHFrameIndexOffset(const MachineFunction &MF,
1340 int FI) const {
1341 const AArch64RegisterInfo *RegInfo =
1342 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1343 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1344 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1345 ? getFPOffset(MF, ObjectOffset).getFixed()
1346 : getStackOffset(MF, ObjectOffset).getFixed();
1347}
1348
1349StackOffset AArch64FrameLowering::resolveFrameIndexReference(
1350 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1351 bool ForSimm) const {
1352 const auto &MFI = MF.getFrameInfo();
1353 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1354 bool isFixed = MFI.isFixedObjectIndex(FI);
1355 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1356 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1357 FrameReg, PreferFP, ForSimm);
1358}
1359
1360StackOffset AArch64FrameLowering::resolveFrameOffsetReference(
1361 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1362 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1363 bool ForSimm) const {
1364 const auto &MFI = MF.getFrameInfo();
1365 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1366 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1367 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1368
1369 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1370 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1371 bool isCSR =
1372 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1373 bool isSVE = MFI.isScalableStackID(StackID);
1374
1375 StackOffset ZPRStackSize = getZPRStackSize(MF);
1376 StackOffset PPRStackSize = getPPRStackSize(MF);
1377 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1378
1379 // Use frame pointer to reference fixed objects. Use it for locals if
1380 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1381 // reliable as a base). Make sure useFPForScavengingIndex() does the
1382 // right thing for the emergency spill slot.
1383 bool UseFP = false;
1384 if (AFI->hasStackFrame() && !isSVE) {
1385 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1386 // there are scalable (SVE) objects in between the FP and the fixed-sized
1387 // objects.
1388 PreferFP &= !SVEStackSize;
1389
1390 // Note: Keeping the following as multiple 'if' statements rather than
1391 // merging to a single expression for readability.
1392 //
1393 // Argument access should always use the FP.
1394 if (isFixed) {
1395 UseFP = hasFP(MF);
1396 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1397 // References to the CSR area must use FP if we're re-aligning the stack
1398 // since the dynamically-sized alignment padding is between the SP/BP and
1399 // the CSR area.
1400 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1401 UseFP = true;
1402 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1403 // If the FPOffset is negative and we're producing a signed immediate, we
1404 // have to keep in mind that the available offset range for negative
1405 // offsets is smaller than for positive ones. If an offset is available
1406 // via the FP and the SP, use whichever is closest.
1407 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1408 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1409
1410 if (FPOffset >= 0) {
1411 // If the FPOffset is positive, that'll always be best, as the SP/BP
1412 // will be even further away.
1413 UseFP = true;
1414 } else if (MFI.hasVarSizedObjects()) {
1415 // If we have variable sized objects, we can use either FP or BP, as the
1416 // SP offset is unknown. We can use the base pointer if we have one and
1417 // FP is not preferred. If not, we're stuck with using FP.
1418 bool CanUseBP = RegInfo->hasBasePointer(MF);
1419 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1420 UseFP = PreferFP;
1421 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1422 UseFP = true;
1423 // else we can use BP and FP, but the offset from FP won't fit.
1424 // That will make us scavenge registers which we can probably avoid by
1425 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1426 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1427 // Funclets access the locals contained in the parent's stack frame
1428 // via the frame pointer, so we have to use the FP in the parent
1429 // function.
1430 (void) Subtarget;
1431 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1432 MF.getFunction().isVarArg()) &&
1433 "Funclets should only be present on Win64");
1434 UseFP = true;
1435 } else {
1436 // We have the choice between FP and (SP or BP).
1437 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1438 UseFP = true;
1439 }
1440 }
1441 }
1442
1443 assert(
1444 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1445 "In the presence of dynamic stack pointer realignment, "
1446 "non-argument/CSR objects cannot be accessed through the frame pointer");
1447
1448 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1449
1450 if (isSVE) {
1451 StackOffset FPOffset = StackOffset::get(
1452 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1453 StackOffset SPOffset =
1454 SVEStackSize +
1455 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1456 ObjectOffset);
1457
1458 // With split SVE objects the ObjectOffset is relative to the split area
1459 // (i.e. the PPR area or ZPR area respectively).
1460 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1461 // If we're accessing an SVE vector with split SVE objects...
1462 // - From the FP we need to move down past the PPR area:
1463 FPOffset -= PPRStackSize;
1464 // - From the SP we only need to move up to the ZPR area:
1465 SPOffset -= PPRStackSize;
1466 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1467 // `SPOffset = ZPRStackSize + ...`.
1468 }
1469
1470 if (FPAfterSVECalleeSaves) {
1472 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1475 }
1476 }
1477
1478 // Always use the FP for SVE spills if available and beneficial.
1479 if (hasFP(MF) && (SPOffset.getFixed() ||
1480 FPOffset.getScalable() < SPOffset.getScalable() ||
1481 RegInfo->hasStackRealignment(MF))) {
1482 FrameReg = RegInfo->getFrameRegister(MF);
1483 return FPOffset;
1484 }
1485 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1486 : MCRegister(AArch64::SP);
1487
1488 return SPOffset;
1489 }
1490
1491 StackOffset SVEAreaOffset = {};
1492 if (FPAfterSVECalleeSaves) {
1493 // In this stack layout, the FP is in between the callee saves and other
1494 // SVE allocations.
1495 StackOffset SVECalleeSavedStack =
1496 StackOffset::getScalable(AFI->getSVECalleeSavedStackSize());
1497 if (UseFP) {
1498 if (isFixed)
1499 SVEAreaOffset = SVECalleeSavedStack;
1500 else if (!isCSR)
1501 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1502 } else {
1503 if (isFixed)
1504 SVEAreaOffset = SVEStackSize;
1505 else if (isCSR)
1506 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1507 }
1508 } else {
1509 if (UseFP && !(isFixed || isCSR))
1510 SVEAreaOffset = -SVEStackSize;
1511 if (!UseFP && (isFixed || isCSR))
1512 SVEAreaOffset = SVEStackSize;
1513 }
1514
1515 if (UseFP) {
1516 FrameReg = RegInfo->getFrameRegister(MF);
1517 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1518 }
1519
1520 // Use the base pointer if we have one.
1521 if (RegInfo->hasBasePointer(MF))
1522 FrameReg = RegInfo->getBaseRegister();
1523 else {
1524 assert(!MFI.hasVarSizedObjects() &&
1525 "Can't use SP when we have var sized objects.");
1526 FrameReg = AArch64::SP;
1527 // If we're using the red zone for this function, the SP won't actually
1528 // be adjusted, so the offsets will be negative. They're also all
1529 // within range of the signed 9-bit immediate instructions.
1530 if (canUseRedZone(MF))
1531 Offset -= AFI->getLocalStackSize();
1532 }
1533
1534 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1535}
1536
1537static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1538 // Do not set a kill flag on values that are also marked as live-in. This
1539 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1540 // callee saved registers.
1541 // Omitting the kill flags is conservatively correct even if the live-in
1542 // is not used after all.
1543 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1544 return getKillRegState(!IsLiveIn);
1545}
1546
1547static bool produceCompactUnwindFrame(const AArch64FrameLowering &AFL,
1548 MachineFunction &MF) {
1549 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1550 AttributeList Attrs = MF.getFunction().getAttributes();
1551 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1552 return Subtarget.isTargetMachO() &&
1553 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1554 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1555 MF.getFunction().getCallingConv() != CallingConv::SwiftTail &&
1556 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1557}
1558
1559static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1560 unsigned SpillCount, unsigned Reg1,
1561 unsigned Reg2, bool NeedsWinCFI,
1562 const TargetRegisterInfo *TRI) {
1563 // If we are generating register pairs for a Windows function that requires
1564 // EH support, then pair consecutive registers only. There are no unwind
1565 // opcodes for saves/restores of non-consecutive register pairs.
1566 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_frepg_x,
1567 // save_lrpair.
1568 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1569
1570 if (Reg2 == AArch64::FP)
1571 return true;
1572 if (!NeedsWinCFI)
1573 return false;
1574
1575 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1576 // This is handled by only allowing paired spills for registers spilled at
1577 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1578 // 8 bytes). We carve out an exception for {FP,LR}, which does not require
1579 // 16-byte alignment in the uop representation.
1580 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1581 return SpillExtendedVolatile
1582 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1583 (SpillCount % 2) == 0)
1584 : false;
1585
1586 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1587 // opcode. The save_lrpair opcode requires the first register to be odd.
1588 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1589 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR)
1590 return false;
1591 return true;
1592}
1593
1594/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1595/// WindowsCFI requires that only consecutive registers can be paired.
1596/// LR and FP need to be allocated together when the frame needs to save
1597/// the frame-record. This means any other register pairing with LR is invalid.
1598static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1599 unsigned SpillCount, unsigned Reg1,
1600 unsigned Reg2, bool UsesWinAAPCS,
1601 bool NeedsWinCFI, bool NeedsFrameRecord,
1602 const TargetRegisterInfo *TRI) {
1603 if (UsesWinAAPCS)
1604 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1605 Reg1, Reg2, NeedsWinCFI, TRI);
1606
1607 // If we need to store the frame record, don't pair any register
1608 // with LR other than FP.
1609 if (NeedsFrameRecord)
1610 return Reg2 == AArch64::LR;
1611
1612 return false;
1613}
1614
1615namespace {
1616
1617struct RegPairInfo {
1618 Register Reg1;
1619 Register Reg2;
1620 int FrameIdx;
1621 int Offset;
1622 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1623 const TargetRegisterClass *RC;
1624
1625 RegPairInfo() = default;
1626
1627 bool isPaired() const { return Reg2.isValid(); }
1628
1629 bool isScalable() const { return Type == PPR || Type == ZPR; }
1630};
1631
1632} // end anonymous namespace
1633
1634static MCRegister findFreePredicateReg(BitVector &SavedRegs) {
1635 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1636 if (SavedRegs.test(PReg)) {
1637 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1638 return MCRegister(PNReg);
1639 }
1640 }
1641 return MCRegister();
1642}
1643
1644// The multivector LD/ST are available only for SME or SVE2p1 targets.
1645static bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget,
1646 MachineFunction &MF) {
1647 if (DisableMultiVectorSpillFill)
1648 return false;
1649
1650 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1651 bool IsLocallyStreaming =
1652 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1653
1654 // SME2 instructions can only be used safely while in streaming mode;
1655 // it is not safe to use them in streaming-compatible or locally
1656 // streaming mode.
1657 return Subtarget.hasSVE2p1() ||
1658 (Subtarget.hasSME2() &&
1659 (!IsLocallyStreaming && Subtarget.isStreaming()));
1660}
1661
1662static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1663 MachineFunction &MF,
1664 ArrayRef<CalleeSavedInfo> CSI,
1665 const TargetRegisterInfo *TRI,
1666 SmallVectorImpl<RegPairInfo> &RegPairs,
1667 bool NeedsFrameRecord) {
1668
1669 if (CSI.empty())
1670 return;
1671
1672 bool IsWindows = isTargetWindows(MF);
1673 CallingConv::ID CC = MF.getFunction().getCallingConv();
1674 unsigned StackHazardSize = getStackHazardSize(MF);
1675 MachineFrameInfo &MFI = MF.getFrameInfo();
1676 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1677 unsigned Count = CSI.size();
1678 (void)CC;
1679 // MachO's compact unwind format relies on all registers being stored in
1680 // pairs.
1681 assert((!produceCompactUnwindFrame(AFL, MF) ||
1682 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1683 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1684 (Count & 1) == 0) &&
1685 "Odd number of callee-saved regs to spill!");
1686 int ByteOffset = AFI->getCalleeSavedStackSize();
1687 int StackFillDir = -1;
1688 int RegInc = 1;
1689 unsigned FirstReg = 0;
1690 if (IsWindows) {
1691 // For WinCFI, fill the stack from the bottom up.
1692 ByteOffset = 0;
1693 StackFillDir = 1;
1694 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1695 // backwards, to pair up registers starting from lower numbered registers.
1696 RegInc = -1;
1697 FirstReg = Count - 1;
1698 }
1699
1700 bool FPAfterSVECalleeSaves = AFL.hasSVECalleeSavesAboveFrameRecord(MF);
1701 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedural
1702 // scratch, x18 as platform reserved. However, clang has extended calling
1703 // conventions such as preserve_most and preserve_all which treat these as
1704 // CSR. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1705 // uOPs which have separate restrictions. We need to check for that.
1706 //
1707 // NOTE: we currently do not account for the D registers as LLVM does not
1708 // support non-ABI compliant D register spills.
1709 bool SpillExtendedVolatile =
1710 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1711 const auto &Reg = CSI.getReg();
1712 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1713 });
1714
1715 int ZPRByteOffset = 0;
1716 int PPRByteOffset = 0;
1717 bool SplitPPRs = AFI->hasSplitSVEObjects();
1718 if (SplitPPRs) {
1719 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1720 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1721 } else if (!FPAfterSVECalleeSaves) {
1722 ZPRByteOffset =
1723 AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1724 // Unused: Everything goes in ZPR space.
1725 PPRByteOffset = 0;
1726 }
1727
1728 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1729 Register LastReg = 0;
1730 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1731
1732 auto AlignOffset = [StackFillDir](int Offset, int Align) {
1733 if (StackFillDir < 0)
1734 return alignDown(Offset, Align);
1735 return alignTo(Offset, Align);
1736 };
1737
1738 // When iterating backwards, the loop condition relies on unsigned wraparound.
1739 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1740 RegPairInfo RPI;
1741 RPI.Reg1 = CSI[i].getReg();
1742
1743 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1744 RPI.Type = RegPairInfo::GPR;
1745 RPI.RC = &AArch64::GPR64RegClass;
1746 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1747 RPI.Type = RegPairInfo::FPR64;
1748 RPI.RC = &AArch64::FPR64RegClass;
1749 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1750 RPI.Type = RegPairInfo::FPR128;
1751 RPI.RC = &AArch64::FPR128RegClass;
1752 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1753 RPI.Type = RegPairInfo::ZPR;
1754 RPI.RC = &AArch64::ZPRRegClass;
1755 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1756 RPI.Type = RegPairInfo::PPR;
1757 RPI.RC = &AArch64::PPRRegClass;
1758 } else if (RPI.Reg1 == AArch64::VG) {
1759 RPI.Type = RegPairInfo::VG;
1760 RPI.RC = &AArch64::FIXED_REGSRegClass;
1761 } else {
1762 llvm_unreachable("Unsupported register class.");
1763 }
1764
1765 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1766 ? PPRByteOffset
1767 : ZPRByteOffset;
1768
1769 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1770 if (HasCSHazardPadding &&
1771 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1772 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1773 ByteOffset += StackFillDir * StackHazardSize;
1774 LastReg = RPI.Reg1;
1775
1776 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1777 int Scale = TRI->getSpillSize(*RPI.RC);
1778 // Add the next reg to the pair if it is in the same register class.
1779 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1780 MCRegister NextReg = CSI[i + RegInc].getReg();
1781 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1782 switch (RPI.Type) {
1783 case RegPairInfo::GPR:
1784 if (AArch64::GPR64RegClass.contains(NextReg) &&
1785 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1786 RPI.Reg1, NextReg, IsWindows,
1787 NeedsWinCFI, NeedsFrameRecord, TRI))
1788 RPI.Reg2 = NextReg;
1789 break;
1790 case RegPairInfo::FPR64:
1791 if (AArch64::FPR64RegClass.contains(NextReg) &&
1792 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1793 RPI.Reg1, NextReg, IsWindows,
1794 NeedsWinCFI, NeedsFrameRecord, TRI))
1795 RPI.Reg2 = NextReg;
1796 break;
1797 case RegPairInfo::FPR128:
1798 if (AArch64::FPR128RegClass.contains(NextReg))
1799 RPI.Reg2 = NextReg;
1800 break;
1801 case RegPairInfo::PPR:
1802 break;
1803 case RegPairInfo::ZPR:
1804 if (AFI->getPredicateRegForFillSpill() != 0 &&
1805 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1806 // Calculate offset of register pair to see if pair instruction can be
1807 // used.
1808 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1809 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1810 RPI.Reg2 = NextReg;
1811 }
1812 break;
1813 case RegPairInfo::VG:
1814 break;
1815 }
1816 }
1817
1818 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1819 // list to come in sorted by frame index so that we can issue the store
1820 // pair instructions directly. Assert if we see anything otherwise.
1821 //
1822 // The order of the registers in the list is controlled by
1823 // getCalleeSavedRegs(), so they will always be in-order, as well.
1824 assert((!RPI.isPaired() ||
1825 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1826 "Out of order callee saved regs!");
1827
1828 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1829 RPI.Reg1 == AArch64::LR) &&
1830 "FrameRecord must be allocated together with LR");
1831
1832 // Windows AAPCS has FP and LR reversed.
1833 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1834 RPI.Reg2 == AArch64::LR) &&
1835 "FrameRecord must be allocated together with LR");
1836
1837 // MachO's compact unwind format relies on all registers being stored in
1838 // adjacent register pairs.
1839 assert((!produceCompactUnwindFrame(AFL, MF) ||
1840 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1841 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1842 (RPI.isPaired() &&
1843 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1844 RPI.Reg1 + 1 == RPI.Reg2))) &&
1845 "Callee-save registers not saved as adjacent register pair!");
1846
1847 RPI.FrameIdx = CSI[i].getFrameIdx();
1848 if (IsWindows &&
1849 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1850 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1851
1852 // Realign the scalable offset if necessary. This is relevant when spilling
1853 // predicates on Windows.
1854 if (RPI.isScalable() && ScalableByteOffset % Scale != 0)
1855 ScalableByteOffset = AlignOffset(ScalableByteOffset, Scale);
1856
1857 // Realign the fixed offset if necessary. This is relevant when spilling Q
1858 // registers after spilling an odd amount of X registers.
1859 if (!RPI.isScalable() && ByteOffset % Scale != 0)
1860 ByteOffset = AlignOffset(ByteOffset, Scale);
1861
1862 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1863 assert(OffsetPre % Scale == 0);
1864
1865 if (RPI.isScalable())
1866 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1867 else
1868 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1869
1870 // Swift's async context is directly before FP, so allocate an extra
1871 // 8 bytes for it.
1872 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1873 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1874 (IsWindows && RPI.Reg2 == AArch64::LR)))
1875 ByteOffset += StackFillDir * 8;
1876
1877 // Round up size of non-pair to pair size if we need to pad the
1878 // callee-save area to ensure 16-byte alignment.
1879 if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
1880 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1881 ByteOffset % 16 != 0) {
1882 ByteOffset += 8 * StackFillDir;
1883 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1884 // A stack frame with a gap looks like this, bottom up:
1885 // d9, d8. x21, gap, x20, x19.
1886 // Set extra alignment on the x21 object to create the gap above it.
1887 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1888 NeedGapToAlignStack = false;
1889 }
1890
1891 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1892 assert(OffsetPost % Scale == 0);
1893 // If filling top down (default), we want the offset after incrementing it.
1894 // If filling bottom up (WinCFI) we need the original offset.
1895 int Offset = IsWindows ? OffsetPre : OffsetPost;
1896
1897 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1898 // Swift context can directly precede FP.
1899 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1900 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1901 (IsWindows && RPI.Reg2 == AArch64::LR)))
1902 Offset += 8;
1903 RPI.Offset = Offset / Scale;
1904
1905 assert((!RPI.isPaired() ||
1906 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1907 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1908 "Offset out of bounds for LDP/STP immediate");
1909
1910 auto isFrameRecord = [&] {
1911 if (RPI.isPaired())
1912 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1913 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1914 // Otherwise, look for the frame record as two unpaired registers. This is
1915 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1916 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1917 // On Windows, this check works out as current reg == FP, next reg == LR,
1918 // and on other platforms current reg == FP, previous reg == LR. This
1919 // works out as the correct pre-increment or post-increment offsets
1920 // respectively.
1921 return i > 0 && RPI.Reg1 == AArch64::FP &&
1922 CSI[i - 1].getReg() == AArch64::LR;
1923 };
1924
1925 // Save the offset to frame record so that the FP register can point to the
1926 // innermost frame record (spilled FP and LR registers).
1927 if (NeedsFrameRecord && isFrameRecord())
1928 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1929
1930 RegPairs.push_back(RPI);
1931 if (RPI.isPaired())
1932 i += RegInc;
1933 }
1934 if (IsWindows) {
1935 // If we need an alignment gap in the stack, align the topmost stack
1936 // object. A stack frame with a gap looks like this, bottom up:
1937 // x19, d8. d9, gap.
1938 // Set extra alignment on the topmost stack object (the first element in
1939 // CSI, which goes top down), to create the gap above it.
1940 if (AFI->hasCalleeSaveStackFreeSpace())
1941 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1942 // We iterated bottom up over the registers; flip RegPairs back to top
1943 // down order.
1944 std::reverse(RegPairs.begin(), RegPairs.end());
1945 }
1946}
1947
1948 bool AArch64FrameLowering::spillCalleeSavedRegisters(
1949 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1950 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1951 MachineFunction &MF = *MBB.getParent();
1952 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1953 auto &TLI = *Subtarget.getTargetLowering();
1954 const AArch64InstrInfo &TII = *Subtarget.getInstrInfo();
1955 bool NeedsWinCFI = needsWinCFI(MF);
1956 DebugLoc DL;
1957 SmallVector<RegPairInfo, 8> RegPairs;
1958
1959 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1960
1961 MachineRegisterInfo &MRI = MF.getRegInfo();
1962 // Refresh the reserved regs in case there are any potential changes since the
1963 // last freeze.
1964 MRI.freezeReservedRegs();
1965
1966 if (homogeneousPrologEpilog(MF)) {
1967 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1968 .setMIFlag(MachineInstr::FrameSetup);
1969
1970 for (auto &RPI : RegPairs) {
1971 MIB.addReg(RPI.Reg1);
1972 MIB.addReg(RPI.Reg2);
1973
1974 // Update register live in.
1975 if (!MRI.isReserved(RPI.Reg1))
1976 MBB.addLiveIn(RPI.Reg1);
1977 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1978 MBB.addLiveIn(RPI.Reg2);
1979 }
1980 return true;
1981 }
1982 bool PTrueCreated = false;
1983 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1984 Register Reg1 = RPI.Reg1;
1985 Register Reg2 = RPI.Reg2;
1986 unsigned StrOpc;
1987
1988 // Issue sequence of spills for cs regs. The first spill may be converted
1989 // to a pre-decrement store later by emitPrologue if the callee-save stack
1990 // area allocation can't be combined with the local stack area allocation.
1991 // For example:
1992 // stp x22, x21, [sp, #0] // addImm(+0)
1993 // stp x20, x19, [sp, #16] // addImm(+2)
1994 // stp fp, lr, [sp, #32] // addImm(+4)
1995 // Rationale: This sequence saves uop updates compared to a sequence of
1996 // pre-increment spills like stp xi,xj,[sp,#-16]!
1997 // Note: Similar rationale and sequence for restores in epilog.
1998 unsigned Size = TRI->getSpillSize(*RPI.RC);
1999 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2000 switch (RPI.Type) {
2001 case RegPairInfo::GPR:
2002 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2003 break;
2004 case RegPairInfo::FPR64:
2005 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2006 break;
2007 case RegPairInfo::FPR128:
2008 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2009 break;
2010 case RegPairInfo::ZPR:
2011 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2012 break;
2013 case RegPairInfo::PPR:
2014 StrOpc = AArch64::STR_PXI;
2015 break;
2016 case RegPairInfo::VG:
2017 StrOpc = AArch64::STRXui;
2018 break;
2019 }
2020
2021 Register X0Scratch;
2022 llvm::scope_exit RestoreX0([&] {
2023 if (X0Scratch != AArch64::NoRegister)
2024 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2025 .addReg(X0Scratch)
2026 .setMIFlag(MachineInstr::FrameSetup);
2027 });
2028
2029 if (Reg1 == AArch64::VG) {
2030 // Find an available register to store value of VG to.
2031 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2032 assert(Reg1 != AArch64::NoRegister);
2033 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2034 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2035 .addImm(31)
2036 .addImm(1)
2037 .setMIFlag(MachineInstr::FrameSetup);
2038 } else {
2039 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
2040 if (any_of(MBB.liveins(),
2041 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2042 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2043 AArch64::X0, LiveIn.PhysReg);
2044 })) {
2045 X0Scratch = Reg1;
2046 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2047 .addReg(AArch64::X0)
2048 .setMIFlag(MachineInstr::FrameSetup);
2049 }
2050
2051 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2052 const uint32_t *RegMask =
2053 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2054 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2055 .addExternalSymbol(TLI.getLibcallName(LC))
2056 .addRegMask(RegMask)
2057 .addReg(AArch64::X0, RegState::ImplicitDefine)
2058 .setMIFlag(MachineInstr::FrameSetup);
2059 Reg1 = AArch64::X0;
2060 }
2061 }
2062
2063 LLVM_DEBUG({
2064 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2065 if (RPI.isPaired())
2066 dbgs() << ", " << printReg(Reg2, TRI);
2067 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2068 if (RPI.isPaired())
2069 dbgs() << ", " << RPI.FrameIdx + 1;
2070 dbgs() << ")\n";
2071 });
2072
2073 assert((!isTargetWindows(MF) ||
2074 !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2075 "Windows unwinding requires a consecutive (FP,LR) pair");
2076 // Windows unwind codes require consecutive registers if registers are
2077 // paired. Make the switch here, so that the code below will save (x,x+1)
2078 // and not (x+1,x).
2079 unsigned FrameIdxReg1 = RPI.FrameIdx;
2080 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2081 if (isTargetWindows(MF) && RPI.isPaired()) {
2082 std::swap(Reg1, Reg2);
2083 std::swap(FrameIdxReg1, FrameIdxReg2);
2084 }
2085
2086 if (RPI.isPaired() && RPI.isScalable()) {
2087 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2088 MF.getSubtarget<AArch64Subtarget>();
2089 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2090 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2091 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2092 "Expects SVE2.1 or SME2 target and a predicate register");
2093#ifdef EXPENSIVE_CHECKS
2094 auto IsPPR = [](const RegPairInfo &c) {
2095 return c.Type == RegPairInfo::PPR;
2096 };
2097 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2098 auto IsZPR = [](const RegPairInfo &c) {
2099 return c.Type == RegPairInfo::ZPR;
2100 };
2101 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2102 assert(!(PPRBegin < ZPRBegin) &&
2103 "Expected callee save predicate to be handled first");
2104#endif
2105 if (!PTrueCreated) {
2106 PTrueCreated = true;
2107 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2108 .setMIFlags(MachineInstr::FrameSetup);
2109 }
2110 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2111 if (!MRI.isReserved(Reg1))
2112 MBB.addLiveIn(Reg1);
2113 if (!MRI.isReserved(Reg2))
2114 MBB.addLiveIn(Reg2);
2115 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2116 MIB.addMemOperand(MF.getMachineMemOperand(
2117 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2118 MachineMemOperand::MOStore, Size, Alignment));
2119 MIB.addReg(PnReg);
2120 MIB.addReg(AArch64::SP)
2121 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2122 // where 2*vscale is implicit
2123 .setMIFlag(MachineInstr::FrameSetup);
2124 MIB.addMemOperand(MF.getMachineMemOperand(
2125 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2126 MachineMemOperand::MOStore, Size, Alignment));
2127 if (NeedsWinCFI)
2128 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2129 } else { // The code when the pair of ZReg is not present
2130 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2131 if (!MRI.isReserved(Reg1))
2132 MBB.addLiveIn(Reg1);
2133 if (RPI.isPaired()) {
2134 if (!MRI.isReserved(Reg2))
2135 MBB.addLiveIn(Reg2);
2136 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2137 MIB.addMemOperand(MF.getMachineMemOperand(
2138 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2139 MachineMemOperand::MOStore, Size, Alignment));
2140 }
2141 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2142 .addReg(AArch64::SP)
2143 .addImm(RPI.Offset) // [sp, #offset*vscale],
2144 // where factor*vscale is implicit
2145 .setMIFlag(MachineInstr::FrameSetup);
2146 MIB.addMemOperand(MF.getMachineMemOperand(
2147 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2148 MachineMemOperand::MOStore, Size, Alignment));
2149 if (NeedsWinCFI)
2150 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2151 }
2152 // Update the StackIDs of the SVE stack slots.
2153 MachineFrameInfo &MFI = MF.getFrameInfo();
2154 if (RPI.Type == RegPairInfo::ZPR) {
2155 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2156 if (RPI.isPaired())
2157 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2158 } else if (RPI.Type == RegPairInfo::PPR) {
2159 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalablePredicateVector);
2160 if (RPI.isPaired())
2161 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalablePredicateVector);
2162 }
2163 }
2164 return true;
2165}
2166
2167 bool AArch64FrameLowering::restoreCalleeSavedRegisters(
2168 MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI,
2169 MutableArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
2170 MachineFunction &MF = *MBB.getParent();
2171 const AArch64InstrInfo &TII =
2172 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
2173 DebugLoc DL;
2174 SmallVector<RegPairInfo, 8> RegPairs;
2175 bool NeedsWinCFI = needsWinCFI(MF);
2176
2177 if (MBBI != MBB.end())
2178 DL = MBBI->getDebugLoc();
2179
2180 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2181 if (homogeneousPrologEpilog(MF, &MBB)) {
2182 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2183 .setMIFlag(MachineInstr::FrameDestroy);
2184 for (auto &RPI : RegPairs) {
2185 MIB.addReg(RPI.Reg1, RegState::Define);
2186 MIB.addReg(RPI.Reg2, RegState::Define);
2187 }
2188 return true;
2189 }
2190
2191 // For performance reasons, restore SVE registers in increasing order.
2192 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2193 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2194 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2195 std::reverse(PPRBegin, PPREnd);
2196 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2197 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2198 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2199 std::reverse(ZPRBegin, ZPREnd);
2200
2201 bool PTrueCreated = false;
2202 for (const RegPairInfo &RPI : RegPairs) {
2203 Register Reg1 = RPI.Reg1;
2204 Register Reg2 = RPI.Reg2;
2205
2206 // Issue sequence of restores for cs regs. The last restore may be converted
2207 // to a post-increment load later by emitEpilogue if the callee-save stack
2208 // area allocation can't be combined with the local stack area allocation.
2209 // For example:
2210 // ldp fp, lr, [sp, #32] // addImm(+4)
2211 // ldp x20, x19, [sp, #16] // addImm(+2)
2212 // ldp x22, x21, [sp, #0] // addImm(+0)
2213 // Note: see comment in spillCalleeSavedRegisters()
2214 unsigned LdrOpc;
2215 unsigned Size = TRI->getSpillSize(*RPI.RC);
2216 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2217 switch (RPI.Type) {
2218 case RegPairInfo::GPR:
2219 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2220 break;
2221 case RegPairInfo::FPR64:
2222 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2223 break;
2224 case RegPairInfo::FPR128:
2225 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2226 break;
2227 case RegPairInfo::ZPR:
2228 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2229 break;
2230 case RegPairInfo::PPR:
2231 LdrOpc = AArch64::LDR_PXI;
2232 break;
2233 case RegPairInfo::VG:
2234 continue;
2235 }
2236 LLVM_DEBUG({
2237 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2238 if (RPI.isPaired())
2239 dbgs() << ", " << printReg(Reg2, TRI);
2240 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2241 if (RPI.isPaired())
2242 dbgs() << ", " << RPI.FrameIdx + 1;
2243 dbgs() << ")\n";
2244 });
2245
2246 // Windows unwind codes require consecutive registers if registers are
2247 // paired. Make the switch here, so that the code below will save (x,x+1)
2248 // and not (x+1,x).
2249 unsigned FrameIdxReg1 = RPI.FrameIdx;
2250 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2251 if (isTargetWindows(MF) && RPI.isPaired()) {
2252 std::swap(Reg1, Reg2);
2253 std::swap(FrameIdxReg1, FrameIdxReg2);
2254 }
2255
2256 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2257 if (RPI.isPaired() && RPI.isScalable()) {
2258 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2259 MF.getSubtarget<AArch64Subtarget>();
2260 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2261 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2262 "Expects SVE2.1 or SME2 target and a predicate register");
2263#ifdef EXPENSIVE_CHECKS
2264 assert(!(PPRBegin < ZPRBegin) &&
2265 "Expected callee save predicate to be handled first");
2266#endif
2267 if (!PTrueCreated) {
2268 PTrueCreated = true;
2269 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2270 .setMIFlags(MachineInstr::FrameDestroy);
2271 }
2272 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2273 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2274 getDefRegState(true));
2275 MIB.addMemOperand(MF.getMachineMemOperand(
2276 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2277 MachineMemOperand::MOLoad, Size, Alignment));
2278 MIB.addReg(PnReg);
2279 MIB.addReg(AArch64::SP)
2280 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2281 // where 2*vscale is implicit
2282 .setMIFlag(MachineInstr::FrameDestroy);
2283 MIB.addMemOperand(MF.getMachineMemOperand(
2284 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2285 MachineMemOperand::MOLoad, Size, Alignment));
2286 if (NeedsWinCFI)
2287 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2288 } else {
2289 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2290 if (RPI.isPaired()) {
2291 MIB.addReg(Reg2, getDefRegState(true));
2292 MIB.addMemOperand(MF.getMachineMemOperand(
2293 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2294 MachineMemOperand::MOLoad, Size, Alignment));
2295 }
2296 MIB.addReg(Reg1, getDefRegState(true));
2297 MIB.addReg(AArch64::SP)
2298 .addImm(RPI.Offset) // [sp, #offset*vscale]
2299 // where factor*vscale is implicit
2300 .setMIFlag(MachineInstr::FrameDestroy);
2301 MIB.addMemOperand(MF.getMachineMemOperand(
2302 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2303 MachineMemOperand::MOLoad, Size, Alignment));
2304 if (NeedsWinCFI)
2305 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2306 }
2307 }
2308 return true;
2309}
2310
2311 // Return the frame index for an MMO.
2312static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2313 const MachineFrameInfo &MFI) {
2314 auto *PSV =
2315 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2316 if (PSV)
2317 return std::optional<int>(PSV->getFrameIndex());
2318
2319 if (MMO->getValue()) {
2320 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2321 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2322 FI++)
2323 if (MFI.getObjectAllocation(FI) == Al)
2324 return FI;
2325 }
2326 }
2327
2328 return std::nullopt;
2329}
2330
2331// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2332static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2333 const MachineFrameInfo &MFI) {
2334 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2335 return std::nullopt;
2336
2337 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2338}
2339
2340// Returns true if the LDST MachineInstr \p MI is a PPR access.
2341static bool isPPRAccess(const MachineInstr &MI) {
2342 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2343}
2344
2345// Check if a Hazard slot is needed for the current function, and if so create
2346// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2347// which can be used to determine if any hazard padding is needed.
2348void AArch64FrameLowering::determineStackHazardSlot(
2349 MachineFunction &MF, BitVector &SavedRegs) const {
2350 unsigned StackHazardSize = getStackHazardSize(MF);
2351 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2352 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2353 AFI->hasStackHazardSlotIndex())
2354 return;
2355
2356 // Stack hazards are only needed in streaming functions.
2357 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2358 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2359 return;
2360
2361 MachineFrameInfo &MFI = MF.getFrameInfo();
2362
2363 // Add a hazard slot if there are any CSR FPR registers, or any fp-only
2364 // stack objects.
2365 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2366 return AArch64::FPR64RegClass.contains(Reg) ||
2367 AArch64::FPR128RegClass.contains(Reg) ||
2368 AArch64::ZPRRegClass.contains(Reg);
2369 });
2370 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2371 return AArch64::PPRRegClass.contains(Reg);
2372 });
2373 bool HasFPRStackObjects = false;
2374 bool HasPPRStackObjects = false;
2375 if (!HasFPRCSRs || SplitSVEObjects) {
2376 enum SlotType : uint8_t {
2377 Unknown = 0,
2378 ZPRorFPR = 1 << 0,
2379 PPR = 1 << 1,
2380 GPR = 1 << 2,
2381 LLVM_MARK_AS_BITMASK_ENUM(/*LargestValue=*/GPR)
2382 };
2383
2384 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2385 // based on the kinds of accesses used in the function.
2386 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2387 for (auto &MBB : MF) {
2388 for (auto &MI : MBB) {
2389 std::optional<int> FI = getLdStFrameID(MI, MFI);
2390 if (!FI || FI < 0 || FI >= int(SlotTypes.size()))
2391 continue;
2392 if (MFI.hasScalableStackID(*FI)) {
2393 SlotTypes[*FI] |=
2394 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2395 } else {
2396 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2397 ? SlotType::ZPRorFPR
2398 : SlotType::GPR;
2399 }
2400 }
2401 }
2402
2403 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2404 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2405 // For SplitSVEObjects remember that this stack slot is a predicate, this
2406 // will be needed later when determining the frame layout.
2407 if (SlotTypes[FI] == SlotType::PPR) {
2408 MFI.setStackID(FI, TargetStackID::ScalablePredicateVector);
2409 HasPPRStackObjects = true;
2410 }
2411 }
2412 }
2413
2414 if (HasFPRCSRs || HasFPRStackObjects) {
2415 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2416 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2417 << StackHazardSize << "\n");
2418 AFI->setStackHazardSlotIndex(ID);
2419 }
2420
2421 if (!AFI->hasStackHazardSlotIndex())
2422 return;
2423
2424 if (SplitSVEObjects) {
2425 CallingConv::ID CC = MF.getFunction().getCallingConv();
2426 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2427 AFI->setSplitSVEObjects(true);
2428 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2429 return;
2430 }
2431
2432 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2433 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2434 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2435 "non-SVE CC function...\n");
2436
2437 // If another calling convention is explicitly set FPRs can't be promoted to
2438 // ZPR callee-saves.
2439 if (CC != CallingConv::C && CC != CallingConv::Fast) {
2440 LLVM_DEBUG(
2441 dbgs()
2442 << "Calling convention is not supported with SplitSVEObjects\n");
2443 return;
2444 }
2445
2446 if (!HasPPRCSRs && !HasPPRStackObjects) {
2447 LLVM_DEBUG(
2448 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2449 return;
2450 }
2451
2452 if (!HasFPRCSRs && !HasFPRStackObjects) {
2453 LLVM_DEBUG(
2454 dbgs()
2455 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2456 return;
2457 }
2458
2459 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2460 MF.getSubtarget<AArch64Subtarget>();
2461 assert(Subtarget.isSVEorStreamingSVEAvailable() &&
2462 "Expected SVE to be available for PPRs");
2463
2464 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2465 // With SplitSVEObjects the CS hazard padding is placed between the
2466 // PPRs and ZPRs. If there are any FPR CS there would be a hazard between
2467 // them and the CS GPRs. Avoid this by promoting all FPR CS to ZPRs.
2468 BitVector FPRZRegs(SavedRegs.size());
2469 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2470 BitVector::reference RegBit = SavedRegs[Reg];
2471 if (!RegBit)
2472 continue;
2473 unsigned SubRegIdx = 0;
2474 if (AArch64::FPR64RegClass.contains(Reg))
2475 SubRegIdx = AArch64::dsub;
2476 else if (AArch64::FPR128RegClass.contains(Reg))
2477 SubRegIdx = AArch64::zsub;
2478 else
2479 continue;
2480 // Clear the bit for the FPR save.
2481 RegBit = false;
2482 // Mark that we should save the corresponding ZPR.
2483 Register ZReg =
2484 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2485 FPRZRegs.set(ZReg);
2486 }
2487 SavedRegs |= FPRZRegs;
2488
2489 AFI->setSplitSVEObjects(true);
2490 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2491 }
2492}
2493
2494 void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2495 BitVector &SavedRegs,
2496 RegScavenger *RS) const {
2497 // All calls are tail calls in GHC calling conv, and functions have no
2498 // prologue/epilogue.
2499 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2500 return;
2501
2502 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2503
2504 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2505 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2507 unsigned UnspilledCSGPR = AArch64::NoRegister;
2508 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2509
2510 MachineFrameInfo &MFI = MF.getFrameInfo();
2511 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2512
2513 MCRegister BasePointerReg =
2514 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2515
2516 unsigned ExtraCSSpill = 0;
2517 bool HasUnpairedGPR64 = false;
2518 bool HasPairZReg = false;
2519 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2520 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2521
2522 // Figure out which callee-saved registers to save/restore.
2523 for (unsigned i = 0; CSRegs[i]; ++i) {
2524 const MCRegister Reg = CSRegs[i];
2525
2526 // Add the base pointer register to SavedRegs if it is callee-save.
2527 if (Reg == BasePointerReg)
2528 SavedRegs.set(Reg);
2529
2530 // Don't save manually reserved registers set through +reserve-x#i,
2531 // even for callee-saved registers, as per GCC's behavior.
2532 if (UserReservedRegs[Reg]) {
2533 SavedRegs.reset(Reg);
2534 continue;
2535 }
2536
2537 bool RegUsed = SavedRegs.test(Reg);
2538 MCRegister PairedReg;
2539 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2540 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2541 AArch64::FPR128RegClass.contains(Reg)) {
2542 // Compensate for odd numbers of GP CSRs.
2543 // For now, all the known cases of odd number of CSRs are of GPRs.
2544 if (HasUnpairedGPR64)
2545 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2546 else
2547 PairedReg = CSRegs[i ^ 1];
2548 }
2549
2550 // If the function requires all the GP registers to save (SavedRegs),
2551 // and there are an odd number of GP CSRs at the same time (CSRegs),
2552 // PairedReg could be in a different register class from Reg, which would
2553 // lead to a FPR (usually D8) accidentally being marked saved.
2554 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2555 PairedReg = AArch64::NoRegister;
2556 HasUnpairedGPR64 = true;
2557 }
2558 assert(PairedReg == AArch64::NoRegister ||
2559 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2560 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2561 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2562
2563 if (!RegUsed) {
2564 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2565 UnspilledCSGPR = Reg;
2566 UnspilledCSGPRPaired = PairedReg;
2567 }
2568 continue;
2569 }
2570
2571 // MachO's compact unwind format relies on all registers being stored in
2572 // pairs.
2573 // FIXME: the usual format is actually better if unwinding isn't needed.
2574 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2575 !SavedRegs.test(PairedReg)) {
2576 SavedRegs.set(PairedReg);
2577 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2578 !ReservedRegs[PairedReg])
2579 ExtraCSSpill = PairedReg;
2580 }
2581 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2582 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2583 SavedRegs.test(CSRegs[i ^ 1]));
2584 }
2585
2586 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2588 // Find a suitable predicate register for the multi-vector spill/fill
2589 // instructions.
2590 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2591 if (PnReg.isValid())
2592 AFI->setPredicateRegForFillSpill(PnReg);
2593 // If no free callee-save has been found assign one.
2594 if (!AFI->getPredicateRegForFillSpill() &&
2595 MF.getFunction().getCallingConv() ==
2596 CallingConv::AArch64_SVE_VectorCall) {
2597 SavedRegs.set(AArch64::P8);
2598 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2599 }
2600
2601 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2602 "Predicate cannot be a reserved register");
2603 }
2604
2605 if (MF.getFunction().getCallingConv() == CallingConv::Win64 &&
2606 !Subtarget.isTargetWindows()) {
2607 // For Windows calling convention on a non-windows OS, where X18 is treated
2608 // as reserved, back up X18 when entering non-windows code (marked with the
2609 // Windows calling convention) and restore when returning regardless of
2610 // whether the individual function uses it - it might call other functions
2611 // that clobber it.
2612 SavedRegs.set(AArch64::X18);
2613 }
2614
2615 // Determine if a Hazard slot should be used and where it should go.
2616 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2617 // and ZPRs. Otherwise, it goes in the callee save area.
2618 determineStackHazardSlot(MF, SavedRegs);
2619
2620 // Calculate the callee-saved stack size.
2621 unsigned CSStackSize = 0;
2622 unsigned ZPRCSStackSize = 0;
2623 unsigned PPRCSStackSize = 0;
2624 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2625 for (unsigned Reg : SavedRegs.set_bits()) {
2626 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2627 assert(RC && "expected register class!");
2628 auto SpillSize = TRI->getSpillSize(*RC);
2629 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2630 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2631 if (IsZPR)
2632 ZPRCSStackSize += SpillSize;
2633 else if (IsPPR)
2634 PPRCSStackSize += SpillSize;
2635 else
2636 CSStackSize += SpillSize;
2637 }
2638
2639 // Save number of saved regs, so we can easily update CSStackSize later to
2640 // account for any additional 64-bit GPR saves. Note: After this point
2641 // only 64-bit GPRs can be added to SavedRegs.
2642 unsigned NumSavedRegs = SavedRegs.count();
2643
2644 // If we have hazard padding in the CS area add that to the size.
2645 if (AFI->hasStackHazardSlotIndex() && !AFI->hasSplitSVEObjects())
2646 CSStackSize += getStackHazardSize(MF);
2647
2648 // Increase the callee-saved stack size if the function has streaming mode
2649 // changes, as we will need to spill the value of the VG register.
2650 if (requiresSaveVG(MF))
2651 CSStackSize += 8;
2652
2653 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2654 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2655 SavedRegs.set(AArch64::LR);
2656
2657 // The frame record needs to be created by saving the appropriate registers
2658 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2659 if (hasFP(MF) ||
2660 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2661 SavedRegs.set(AArch64::FP);
2662 SavedRegs.set(AArch64::LR);
2663 }
2664
2665 LLVM_DEBUG({
2666 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2667 for (unsigned Reg : SavedRegs.set_bits())
2668 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2669 dbgs() << "\n";
2670 });
2671
2672 // If any callee-saved registers are used, the frame cannot be eliminated.
2673 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2674 determineSVEStackSizes(MF, AssignObjectOffsets::No);
2675 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2676 uint64_t SVEStackSize =
2677 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2678 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2679
2680 // The CSR spill slots have not been allocated yet, so estimateStackSize
2681 // won't include them.
2682 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2683
2684 // We may address some of the stack above the canonical frame address, either
2685 // for our own arguments or during a call. Include that in calculating whether
2686 // we have complicated addressing concerns.
2687 int64_t CalleeStackUsed = 0;
2688 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2689 int64_t FixedOff = MFI.getObjectOffset(I);
2690 if (FixedOff > CalleeStackUsed)
2691 CalleeStackUsed = FixedOff;
2692 }
2693
2694 // Conservatively always assume BigStack when there are SVE spills.
2695 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2696 CalleeStackUsed) > EstimatedStackSizeLimit;
2697 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2698 AFI->setHasStackFrame(true);
2699
2700 // Estimate if we might need to scavenge a register at some point in order
2701 // to materialize a stack offset. If so, either spill one additional
2702 // callee-saved register or reserve a special spill slot to facilitate
2703 // register scavenging. If we already spilled an extra callee-saved register
2704 // above to keep the number of spills even, we don't need to do anything else
2705 // here.
2706 if (BigStack) {
2707 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2708 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2709 << " to get a scratch register.\n");
2710 SavedRegs.set(UnspilledCSGPR);
2711 ExtraCSSpill = UnspilledCSGPR;
2712
2713 // MachO's compact unwind format relies on all registers being stored in
2714 // pairs, so if we need to spill one extra for BigStack, then we need to
2715 // store the pair.
2716 if (producePairRegisters(MF)) {
2717 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2718 // Failed to make a pair for compact unwind format, revert spilling.
2719 if (produceCompactUnwindFrame(*this, MF)) {
2720 SavedRegs.reset(UnspilledCSGPR);
2721 ExtraCSSpill = AArch64::NoRegister;
2722 }
2723 } else
2724 SavedRegs.set(UnspilledCSGPRPaired);
2725 }
2726 }
2727
2728 // If we didn't find an extra callee-saved register to spill, create
2729 // an emergency spill slot.
2730 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2731 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2732 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2733 unsigned Size = TRI->getSpillSize(RC);
2734 Align Alignment = TRI->getSpillAlign(RC);
2735 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2736 RS->addScavengingFrameIndex(FI);
2737 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2738 << " as the emergency spill slot.\n");
2739 }
2740 }
2741
2742 // Add the size of the additional 64-bit GPR saves.
2743 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2744
2745 // A Swift asynchronous context extends the frame record with a pointer
2746 // directly before FP.
2747 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2748 CSStackSize += 8;
2749
2750 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2751 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2752 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2753
2754 assert((!MFI.isCalleeSavedInfoValid() ||
2755 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2756 "Should not invalidate callee saved info");
2757
2758 // Round up to register pair alignment to avoid additional SP adjustment
2759 // instructions.
2760 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2761 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2762 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2763}
2764
2766 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2767 std::vector<CalleeSavedInfo> &CSI) const {
2768 bool IsWindows = isTargetWindows(MF);
2769 unsigned StackHazardSize = getStackHazardSize(MF);
2770 // To match the canonical windows frame layout, reverse the list of
2771 // callee saved registers to get them laid out by PrologEpilogInserter
2772 // in the right order. (PrologEpilogInserter allocates stack objects top
2773 // down. Windows canonical prologs store higher numbered registers at
2774 // the top, thus have the CSI array start from the highest registers.)
2775 if (IsWindows)
2776 std::reverse(CSI.begin(), CSI.end());
2777
2778 if (CSI.empty())
2779 return true; // Early exit if no callee saved registers are modified!
2780
2781 // Now that we know which registers need to be saved and restored, allocate
2782 // stack slots for them.
2783 MachineFrameInfo &MFI = MF.getFrameInfo();
2784 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2785
2786 if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2787 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2788 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2789 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2790 }
2791
2792 // Insert VG into the list of CSRs, immediately before LR if saved.
2793 if (requiresSaveVG(MF)) {
2794 CalleeSavedInfo VGInfo(AArch64::VG);
2795 auto It =
2796 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2797 if (It != CSI.end())
2798 CSI.insert(It, VGInfo);
2799 else
2800 CSI.push_back(VGInfo);
2801 }
2802
2803 Register LastReg = 0;
2804 int HazardSlotIndex = std::numeric_limits<int>::max();
2805 for (auto &CS : CSI) {
2806 MCRegister Reg = CS.getReg();
2807 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2808
2809 // Create a hazard slot as we switch between GPR and FPR CSRs.
2810 if (AFI->hasStackHazardSlotIndex() &&
2811 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2812 AArch64InstrInfo::isFpOrNEON(Reg)) {
2813 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2814 "Unexpected register order for hazard slot");
2815 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2816 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2817 << "\n");
2818 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2819 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2820 }
2821
2822 unsigned Size = RegInfo->getSpillSize(*RC);
2823 Align Alignment(RegInfo->getSpillAlign(*RC));
2824 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2825 CS.setFrameIdx(FrameIdx);
2826 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2827
2828 // Grab 8 bytes below FP for the extended asynchronous frame info.
2829 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
2830 Reg == AArch64::FP) {
2831 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2832 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2833 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2834 }
2835 LastReg = Reg;
2836 }
2837
2838 // Add hazard slot in the case where no FPR CSRs are present.
2839 if (AFI->hasStackHazardSlotIndex() &&
2840 HazardSlotIndex == std::numeric_limits<int>::max()) {
2841 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2842 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2843 << "\n");
2844 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2845 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2846 }
2847
2848 return true;
2849}
2850
2851 bool AArch64FrameLowering::enableStackSlotScavenging(
2852 const MachineFunction &MF) const {
2853 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2854 // If the function has streaming-mode changes, don't scavenge a
2855 // spill slot in the callee-save area, as that might require an
2856 // 'addvl' in the streaming-mode-changing call-sequence when the
2857 // function doesn't use a FP.
2858 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2859 return false;
2860 // Don't allow register scavenging with hazard slots, in case it moves objects
2861 // into the wrong place.
2862 if (AFI->hasStackHazardSlotIndex())
2863 return false;
2864 return AFI->hasCalleeSaveStackFreeSpace();
2865}
2866
2867/// Returns true if there are any SVE callee saves.
2868 static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2869 int &Min, int &Max) {
2870 Min = std::numeric_limits<int>::max();
2871 Max = std::numeric_limits<int>::min();
2872
2873 if (!MFI.isCalleeSavedInfoValid())
2874 return false;
2875
2876 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2877 for (auto &CS : CSI) {
2878 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2879 AArch64::PPRRegClass.contains(CS.getReg())) {
2880 assert((Max == std::numeric_limits<int>::min() ||
2881 Max + 1 == CS.getFrameIdx()) &&
2882 "SVE CalleeSaves are not consecutive");
2883 Min = std::min(Min, CS.getFrameIdx());
2884 Max = std::max(Max, CS.getFrameIdx());
2885 }
2886 }
2887 return Min != std::numeric_limits<int>::max();
2888}
2889
2891 AssignObjectOffsets AssignOffsets) {
2892 MachineFrameInfo &MFI = MF.getFrameInfo();
2893 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2894
2895 SVEStackSizes SVEStack{};
2896
2897 // With SplitSVEObjects we maintain separate stack offsets for predicates
2898 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2899 // are included in the SVE vector area.
2900 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2901 uint64_t &PPRStackTop =
2902 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2903
2904#ifndef NDEBUG
2905 // First process all fixed stack objects.
2906 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2907 assert(!MFI.hasScalableStackID(I) &&
2908 "SVE vectors should never be passed on the stack by value, only by "
2909 "reference.");
2910#endif
2911
2912 auto AllocateObject = [&](int FI) {
2914 ? ZPRStackTop
2915 : PPRStackTop;
2916
2917 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2918 // two, we'd need to align every object dynamically at runtime if the
2919 // alignment is larger than 16. This is not yet supported.
2920 Align Alignment = MFI.getObjectAlign(FI);
2921 if (Alignment > Align(16))
2922 report_fatal_error(
2923 "Alignment of scalable vectors > 16 bytes is not yet supported");
2924
2925 StackTop += MFI.getObjectSize(FI);
2926 StackTop = alignTo(StackTop, Alignment);
2927
2928 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2929 "SVE StackTop far too large?!");
2930
2931 int64_t Offset = -int64_t(StackTop);
2932 if (AssignOffsets == AssignObjectOffsets::Yes)
2933 MFI.setObjectOffset(FI, Offset);
2934
2935 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2936 };
2937
2938 // Then process all callee saved slots.
2939 int MinCSFrameIndex, MaxCSFrameIndex;
2940 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2941 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2942 AllocateObject(FI);
2943 }
2944
2945 // Ensure the CS area is 16-byte aligned.
2946 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2947 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2948
2949 // Create a buffer of SVE objects to allocate and sort it.
2950 SmallVector<int, 8> ObjectsToAllocate;
2951 // If we have a stack protector, and we've previously decided that we have SVE
2952 // objects on the stack and thus need it to go in the SVE stack area, then it
2953 // needs to go first.
2954 int StackProtectorFI = -1;
2955 if (MFI.hasStackProtectorIndex()) {
2956 StackProtectorFI = MFI.getStackProtectorIndex();
2957 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2958 ObjectsToAllocate.push_back(StackProtectorFI);
2959 }
2960
2961 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2962 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI) ||
2964 continue;
2965
2968 continue;
2969
2970 ObjectsToAllocate.push_back(FI);
2971 }
2972
2973 // Allocate all SVE locals and spills
2974 for (unsigned FI : ObjectsToAllocate)
2975 AllocateObject(FI);
2976
2977 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2978 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2979
2980 if (AssignOffsets == AssignObjectOffsets::Yes)
2981 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2982
2983 return SVEStack;
2984}
2985
2986 void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
2987 MachineFunction &MF, RegScavenger *RS) const {
2988 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
2989 "Upwards growing stack unsupported");
2990
2992
2993 // If this function isn't doing Win64-style C++ EH, we don't need to do
2994 // anything.
2995 if (!MF.hasEHFunclets())
2996 return;
2997
2998 MachineFrameInfo &MFI = MF.getFrameInfo();
2999 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3000
3001 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3002 // object area right next to the UnwindHelp object.
3003 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3004 int64_t CurrentOffset =
3006 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3007 for (WinEHHandlerType &H : TBME.HandlerArray) {
3008 int FrameIndex = H.CatchObj.FrameIndex;
3009 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3010 CurrentOffset =
3011 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3012 CurrentOffset += MFI.getObjectSize(FrameIndex);
3013 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3014 }
3015 }
3016 }
3017
3018 // Create an UnwindHelp object.
3019 // The UnwindHelp object is allocated at the start of the fixed object area.
3020 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3021 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3022 /*IsFunclet*/ false) &&
3023 "UnwindHelpOffset must be at the start of the fixed object area");
3024 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3025 /*IsImmutable=*/false);
3026 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3027
3028 MachineBasicBlock &MBB = MF.front();
3029 auto MBBI = MBB.begin();
3030 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3031 ++MBBI;
3032
3033 // We need to store -2 into the UnwindHelp object at the start of the
3034 // function.
3035 DebugLoc DL;
3036 RS->enterBasicBlockEnd(MBB);
3037 RS->backward(MBBI);
3038 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3039 assert(DstReg && "There must be a free register after frame setup");
3040 const AArch64InstrInfo &TII =
3041 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3042 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3043 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3044 .addReg(DstReg, getKillRegState(true))
3045 .addFrameIndex(UnwindHelpFI)
3046 .addImm(0);
3047}
3048
3049namespace {
3050struct TagStoreInstr {
3051 MachineInstr *MI;
3052 int64_t Offset, Size;
3053 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3054 : MI(MI), Offset(Offset), Size(Size) {}
3055};
3056
3057class TagStoreEdit {
3058 MachineFunction *MF;
3059 MachineBasicBlock *MBB;
3060 MachineRegisterInfo *MRI;
3061 // Tag store instructions that are being replaced.
3062 SmallVector<TagStoreInstr, 8> TagStores;
3063 // Combined memref arguments of the above instructions.
3064 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3065
3066 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3067 // FrameRegOffset + Size) with the address tag of SP.
3068 Register FrameReg;
3069 StackOffset FrameRegOffset;
3070 int64_t Size;
3071 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3072 // end.
3073 std::optional<int64_t> FrameRegUpdate;
3074 // MIFlags for any FrameReg updating instructions.
3075 unsigned FrameRegUpdateFlags;
3076
3077 // Use zeroing instruction variants.
3078 bool ZeroData;
3079 DebugLoc DL;
3080
3081 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3082 void emitLoop(MachineBasicBlock::iterator InsertI);
3083
3084public:
3085 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3086 : MBB(MBB), ZeroData(ZeroData) {
3087 MF = MBB->getParent();
3088 MRI = &MF->getRegInfo();
3089 }
3090 // Add an instruction to be replaced. Instructions must be added in
3091 // ascending order of Offset and must be adjacent.
3092 void addInstruction(TagStoreInstr I) {
3093 assert((TagStores.empty() ||
3094 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3095 "Non-adjacent tag store instructions.");
3096 TagStores.push_back(I);
3097 }
3098 void clear() { TagStores.clear(); }
3099 // Emit equivalent code at the given location, and erase the current set of
3100 // instructions. May skip if the replacement is not profitable. May invalidate
3101 // the input iterator and replace it with a valid one.
3102 void emitCode(MachineBasicBlock::iterator &InsertI,
3103 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3104};
3105
3106void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3107 const AArch64InstrInfo *TII =
3108 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3109
3110 const int64_t kMinOffset = -256 * 16;
3111 const int64_t kMaxOffset = 255 * 16;
3112
3113 Register BaseReg = FrameReg;
3114 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3115 if (BaseRegOffsetBytes < kMinOffset ||
3116 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3117 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3118 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3119 // is required for the offset of ST2G.
3120 BaseRegOffsetBytes % 16 != 0) {
3121 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3122 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3123 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3124 BaseReg = ScratchReg;
3125 BaseRegOffsetBytes = 0;
3126 }
3127
3128 MachineInstr *LastI = nullptr;
3129 while (Size) {
3130 int64_t InstrSize = (Size > 16) ? 32 : 16;
3131 unsigned Opcode =
3132 InstrSize == 16
3133 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3134 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3135 assert(BaseRegOffsetBytes % 16 == 0);
3136 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3137 .addReg(AArch64::SP)
3138 .addReg(BaseReg)
3139 .addImm(BaseRegOffsetBytes / 16)
3140 .setMemRefs(CombinedMemRefs);
3141 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3142 // final SP adjustment in the epilogue.
3143 if (BaseRegOffsetBytes == 0)
3144 LastI = I;
3145 BaseRegOffsetBytes += InstrSize;
3146 Size -= InstrSize;
3147 }
3148
3149 if (LastI)
3150 MBB->splice(InsertI, MBB, LastI);
3151}
3152
3153void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3154 const AArch64InstrInfo *TII =
3155 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3156
3157 Register BaseReg = FrameRegUpdate
3158 ? FrameReg
3159 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3160 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3161
3162 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3163
3164 int64_t LoopSize = Size;
3165 // If the loop size is not a multiple of 32, split off one 16-byte store at
3166 // the end to fold BaseReg update into.
3167 if (FrameRegUpdate && *FrameRegUpdate)
3168 LoopSize -= LoopSize % 32;
3169 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3170 TII->get(ZeroData ? AArch64::STZGloop_wback
3171 : AArch64::STGloop_wback))
3172 .addDef(SizeReg)
3173 .addDef(BaseReg)
3174 .addImm(LoopSize)
3175 .addReg(BaseReg)
3176 .setMemRefs(CombinedMemRefs);
3177 if (FrameRegUpdate)
3178 LoopI->setFlags(FrameRegUpdateFlags);
3179
3180 int64_t ExtraBaseRegUpdate =
3181 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3182 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3183 << ", Size=" << Size
3184 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3185 << ", FrameRegUpdate=" << FrameRegUpdate
3186 << ", FrameRegOffset.getFixed()="
3187 << FrameRegOffset.getFixed() << "\n");
3188 if (LoopSize < Size) {
3189 assert(FrameRegUpdate);
3190 assert(Size - LoopSize == 16);
3191 // Tag 16 more bytes at BaseReg and update BaseReg.
3192 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3193 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3194 "STG immediate out of range");
3195 BuildMI(*MBB, InsertI, DL,
3196 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3197 .addDef(BaseReg)
3198 .addReg(BaseReg)
3199 .addReg(BaseReg)
3200 .addImm(STGOffset / 16)
3201 .setMemRefs(CombinedMemRefs)
3202 .setMIFlags(FrameRegUpdateFlags);
3203 } else if (ExtraBaseRegUpdate) {
3204 // Update BaseReg.
3205 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3206 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3207 BuildMI(
3208 *MBB, InsertI, DL,
3209 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3210 .addDef(BaseReg)
3211 .addReg(BaseReg)
3212 .addImm(AddSubOffset)
3213 .addImm(0)
3214 .setMIFlags(FrameRegUpdateFlags);
3215 }
3216}
3217
3218// Check if *II is a register update that can be merged into the STGloop that
3219// ends at (Reg + Size). TotalOffset is set to the required adjustment to Reg
3220// after the end of the loop.
3221bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3222 int64_t Size, int64_t *TotalOffset) {
3223 MachineInstr &MI = *II;
3224 if ((MI.getOpcode() == AArch64::ADDXri ||
3225 MI.getOpcode() == AArch64::SUBXri) &&
3226 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3227 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3228 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3229 if (MI.getOpcode() == AArch64::SUBXri)
3230 Offset = -Offset;
3231 int64_t PostOffset = Offset - Size;
3232 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3233 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3234 // chosen depends on the alignment of the loop size, but the difference
3235 // between the valid ranges for the two instructions is small, so we
3236 // conservatively assume that it could be either case here.
3237 //
3238 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3239 // instruction.
3240 const int64_t kMaxOffset = 4080 - 16;
3241 // Max offset of SUBXri.
3242 const int64_t kMinOffset = -4095;
3243 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3244 PostOffset % 16 == 0) {
3245 *TotalOffset = Offset;
3246 return true;
3247 }
3248 }
3249 return false;
3250}
3251
3252void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3253 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3254 MemRefs.clear();
3255 for (auto &TS : TSE) {
3256 MachineInstr *MI = TS.MI;
3257 // An instruction without memory operands may access anything. Be
3258 // conservative and return an empty list.
3259 if (MI->memoperands_empty()) {
3260 MemRefs.clear();
3261 return;
3262 }
3263 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3264 }
3265}
3266
3267void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3268 const AArch64FrameLowering *TFI,
3269 bool TryMergeSPUpdate) {
3270 if (TagStores.empty())
3271 return;
3272 TagStoreInstr &FirstTagStore = TagStores[0];
3273 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3274 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3275 DL = TagStores[0].MI->getDebugLoc();
3276
3277 Register Reg;
3278 FrameRegOffset = TFI->resolveFrameOffsetReference(
3279 *MF, FirstTagStore.Offset, false /*isFixed*/,
3280 TargetStackID::Default /*StackID*/, Reg,
3281 /*PreferFP=*/false, /*ForSimm=*/true);
3282 FrameReg = Reg;
3283 FrameRegUpdate = std::nullopt;
3284
3285 mergeMemRefs(TagStores, CombinedMemRefs);
3286
3287 LLVM_DEBUG({
3288 dbgs() << "Replacing adjacent STG instructions:\n";
3289 for (const auto &Instr : TagStores) {
3290 dbgs() << " " << *Instr.MI;
3291 }
3292 });
3293
3294 // Size threshold where a loop becomes shorter than a linear sequence of
3295 // tagging instructions.
3296 const int kSetTagLoopThreshold = 176;
3297 if (Size < kSetTagLoopThreshold) {
3298 if (TagStores.size() < 2)
3299 return;
3300 emitUnrolled(InsertI);
3301 } else {
3302 MachineInstr *UpdateInstr = nullptr;
3303 int64_t TotalOffset = 0;
3304 if (TryMergeSPUpdate) {
3305 // See if we can merge base register update into the STGloop.
3306 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3307 // but STGloop is too unusual for that pass to handle, and in practice
3308 // this only happens in the function epilogue. STGloop is also expanded
3309 // before that pass runs.
3310 if (InsertI != MBB->end() &&
3311 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3312 &TotalOffset)) {
3313 UpdateInstr = &*InsertI++;
3314 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3315 << *UpdateInstr);
3316 }
3317 }
3318
3319 if (!UpdateInstr && TagStores.size() < 2)
3320 return;
3321
3322 if (UpdateInstr) {
3323 FrameRegUpdate = TotalOffset;
3324 FrameRegUpdateFlags = UpdateInstr->getFlags();
3325 }
3326 emitLoop(InsertI);
3327 if (UpdateInstr)
3328 UpdateInstr->eraseFromParent();
3329 }
3330
3331 for (auto &TS : TagStores)
3332 TS.MI->eraseFromParent();
3333}
3334
3335bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3336 int64_t &Size, bool &ZeroData) {
3337 MachineFunction &MF = *MI.getParent()->getParent();
3338 const MachineFrameInfo &MFI = MF.getFrameInfo();
3339
3340 unsigned Opcode = MI.getOpcode();
3341 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3342 Opcode == AArch64::STZ2Gi);
3343
3344 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3345 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3346 return false;
3347 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3348 return false;
3349 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3350 Size = MI.getOperand(2).getImm();
3351 return true;
3352 }
3353
3354 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3355 Size = 16;
3356 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3357 Size = 32;
3358 else
3359 return false;
3360
3361 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3362 return false;
3363
3364 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3365 16 * MI.getOperand(2).getImm();
3366 return true;
3367}
3368
3369static size_t countAvailableScavengerSlots(LivePhysRegs &LiveRegs,
3370 const MachineRegisterInfo &MRI,
3371 RegScavenger *RS) {
3372 auto FreeGPRs =
3373 llvm::count_if(AArch64::GPR64RegClass, [&LiveRegs, &MRI](auto Reg) {
3374 return LiveRegs.available(MRI, Reg);
3375 });
3376
3377 size_t NumEmergencySlots = 0;
3378 if (RS)
3379 NumEmergencySlots = RS->getNumScavengingFrameIndices();
3380
3381 return FreeGPRs + NumEmergencySlots;
3382}
3383
3384// Detect a run of memory tagging instructions for adjacent stack frame slots,
3385// and replace them with a shorter instruction sequence:
3386// * replace STG + STG with ST2G
3387// * replace STGloop + STGloop with STGloop
3388// This code needs to run when stack slot offsets are already known, but before
3389// FrameIndex operands in STG instructions are eliminated.
3390 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3391 const AArch64FrameLowering *TFI,
3392 RegScavenger *RS) {
3393 bool FirstZeroData;
3394 int64_t Size, Offset;
3395 MachineInstr &MI = *II;
3396 MachineBasicBlock *MBB = MI.getParent();
3398 if (&MI == &MBB->instr_back())
3399 return II;
3400 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3401 return II;
3402
3404 Instrs.emplace_back(&MI, Offset, Size);
3405
3406 constexpr int kScanLimit = 10;
3407 int Count = 0;
3408 for (MachineBasicBlock::iterator NextI = std::next(II), E = MBB->end();
3409 NextI != E && Count < kScanLimit; ++NextI) {
3410 MachineInstr &MI = *NextI;
3411 bool ZeroData;
3412 int64_t Size, Offset;
3413 // Collect instructions that update memory tags with a FrameIndex operand
3414 // and (when applicable) constant size, and whose output registers are dead
3415 // (the latter is almost always the case in practice). Since these
3416 // instructions effectively have no inputs or outputs, we are free to skip
3417 // any non-aliasing instructions in between without tracking used registers.
3418 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3419 if (ZeroData != FirstZeroData)
3420 break;
3421 Instrs.emplace_back(&MI, Offset, Size);
3422 continue;
3423 }
3424
3425 // Only count non-transient, non-tagging instructions toward the scan
3426 // limit.
3427 if (!MI.isTransient())
3428 ++Count;
3429
3430 // Just in case, stop before the epilogue code starts.
3431 if (MI.getFlag(MachineInstr::FrameSetup) ||
3432 MI.getFlag(MachineInstr::FrameDestroy))
3433 break;
3434
3435 // Reject anything that may alias the collected instructions.
3436 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3437 break;
3438 }
3439
3440 // New code will be inserted after the last tagging instruction we've found.
3441 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3442
3443 // All the gathered stack tag instructions are merged and placed after the
3444 // last tag store in the list. We must check whether the nzcv flag is live
3445 // at the point where we are trying to insert; otherwise it might get
3446 // clobbered if any STG loops are present.
3447
3448 // FIXME: Bailing out of the merge here is conservative: the liveness
3449 // check is performed even when no STG loops remain after merging the
3450 // insert list (in which case it is not needed).
3451 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3452 LiveRegs.addLiveOuts(*MBB);
3453 for (auto I = MBB->rbegin();; ++I) {
3454 MachineInstr &MI = *I;
3455 if (MI == InsertI)
3456 break;
3457 LiveRegs.stepBackward(*I);
3458 }
3459 InsertI++;
3460 if (LiveRegs.contains(AArch64::NZCV))
3461 return InsertI;
3462
3463 // Emitting an MTE loop requires two physical registers (BaseReg and
3464 // SizeReg). If the function is under register pressure, the register
3465 // scavenger will crash trying to allocate them. If we don't have at least
3466 // two free slots (free registers + emergency slots), bail out and fall back
3467 // to the unrolled sequence.
3468 if (countAvailableScavengerSlots(LiveRegs, MBB->getParent()->getRegInfo(),
3469 RS) < 2) {
3470 LLVM_DEBUG(
3471 dbgs() << "Failed to merge MTE stack tagging instructions into loop "
3472 << "due to high register pressure.\n");
3473 return InsertI;
3474 }
3475
3476 llvm::stable_sort(Instrs,
3477 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3478 return Left.Offset < Right.Offset;
3479 });
3480
3481 // Make sure that we don't have any overlapping stores.
3482 int64_t CurOffset = Instrs[0].Offset;
3483 for (auto &Instr : Instrs) {
3484 if (CurOffset > Instr.Offset)
3485 return NextI;
3486 CurOffset = Instr.Offset + Instr.Size;
3487 }
3488
3489 // Find contiguous runs of tagged memory and emit shorter instruction
3490 // sequences for them when possible.
3491 TagStoreEdit TSE(MBB, FirstZeroData);
3492 std::optional<int64_t> EndOffset;
3493 for (auto &Instr : Instrs) {
3494 if (EndOffset && *EndOffset != Instr.Offset) {
3495 // Found a gap.
3496 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3497 TSE.clear();
3498 }
3499
3500 TSE.addInstruction(Instr);
3501 EndOffset = Instr.Offset + Instr.Size;
3502 }
3503
3504 const MachineFunction *MF = MBB->getParent();
3505 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3506 TSE.emitCode(
3507 InsertI, TFI, /*TryMergeSPUpdate = */
3509
3510 return InsertI;
3511}
3512} // namespace
3513
3514 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3515 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3516 for (auto &BB : MF)
3517 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3518 if (StackTaggingMergeSetTag)
3519 II = tryMergeAdjacentSTG(II, this, RS);
3520 }
3521
3522 // By the time this method is called, most of the prologue/epilogue code is
3523 // already emitted, whether its location was affected by the shrink-wrapping
3524 // optimization or not.
3525 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3526 shouldSignReturnAddressEverywhere(MF))
3528}
3529
3530/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3531/// before the update. This is easily retrieved as it is exactly the offset
3532/// that is set in processFunctionBeforeFrameFinalized.
3534 const MachineFunction &MF, int FI, Register &FrameReg,
3535 bool IgnoreSPUpdates) const {
3536 const MachineFrameInfo &MFI = MF.getFrameInfo();
3537 if (IgnoreSPUpdates) {
3538 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3539 << MFI.getObjectOffset(FI) << "\n");
3540 FrameReg = AArch64::SP;
3541 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3542 }
3543
3544 // Go to common code if we cannot provide sp + offset.
3545 if (MFI.hasVarSizedObjects() ||
3548 return getFrameIndexReference(MF, FI, FrameReg);
3549
3550 FrameReg = AArch64::SP;
3551 return getStackOffset(MF, MFI.getObjectOffset(FI));
3552}
3553
3554/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3555/// the parent's frame pointer
3557 const MachineFunction &MF) const {
3558 return 0;
3559}
3560
3561/// Funclets only need to account for space for the callee saved registers,
3562/// as the locals are accounted for in the parent's stack frame.
3564 const MachineFunction &MF) const {
3565 // This is the size of the pushed CSRs.
3566 unsigned CSSize =
3567 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3568 // This is the amount of stack a funclet needs to allocate.
3569 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3570 getStackAlign());
3571}
3572
3573namespace {
3574struct FrameObject {
3575 bool IsValid = false;
3576 // Index of the object in MFI.
3577 int ObjectIndex = 0;
3578 // Group ID this object belongs to.
3579 int GroupIndex = -1;
3580 // This object should be placed first (closest to SP).
3581 bool ObjectFirst = false;
3582 // This object's group (which always contains the object with
3583 // ObjectFirst==true) should be placed first.
3584 bool GroupFirst = false;
3585
3586 // Used to distinguish between FPR and GPR accesses. The values are chosen
3587 // so that they sort FPR < Hazard < GPR and can be OR'd together.
3588 unsigned Accesses = 0;
3589 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3590};
3591
class GroupBuilder {
  SmallVector<int, 8> CurrentMembers;
  int NextGroupIndex = 0;
  std::vector<FrameObject> &Objects;

public:
  GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
  void AddMember(int Index) { CurrentMembers.push_back(Index); }
  void EndCurrentGroup() {
    if (CurrentMembers.size() > 1) {
      // Create a new group with the current member list. This might remove them
      // from their pre-existing groups. That's OK, dealing with overlapping
      // groups is too hard and unlikely to make a difference.
      LLVM_DEBUG(dbgs() << "group:");
      for (int Index : CurrentMembers) {
        Objects[Index].GroupIndex = NextGroupIndex;
        LLVM_DEBUG(dbgs() << " " << Index);
      }
      LLVM_DEBUG(dbgs() << "\n");
      NextGroupIndex++;
    }
    CurrentMembers.clear();
  }
};

bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
  // Objects at a lower index are closer to FP; objects at a higher index are
  // closer to SP.
  //
  // For consistency in our comparison, all invalid objects are placed
  // at the end. This also allows us to stop walking when we hit the
  // first invalid item after it's all sorted.
  //
  // If we want to include a stack hazard region, order FPR accesses < the
  // hazard object < GPR accesses in order to create a separation between the
  // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
  //
  // Otherwise the "first" object goes first (closest to SP), followed by the
  // members of the "first" group.
  //
  // The rest are sorted by the group index to keep the groups together.
  // Higher numbered groups are more likely to be around longer (i.e. untagged
  // in the function epilogue and not at some earlier point). Place them closer
  // to SP.
  //
  // If all else is equal, sort by the object index to keep the objects in the
  // original order.
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}
} // namespace

void AArch64FrameLowering::orderFrameObjects(
    const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
  const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();

  if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
      ObjectsToAllocate.empty())
    return;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
  for (auto &Obj : ObjectsToAllocate) {
    FrameObjects[Obj].IsValid = true;
    FrameObjects[Obj].ObjectIndex = Obj;
  }

  // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
  // the same time.
  GroupBuilder GB(FrameObjects);
  for (auto &MBB : MF) {
    for (auto &MI : MBB) {
      if (MI.isDebugInstr())
        continue;

      if (AFI.hasStackHazardSlotIndex()) {
        std::optional<int> FI = getLdStFrameID(MI, MFI);
        if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
          if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
              AArch64InstrInfo::isFpOrNEON(MI))
            FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
          else
            FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
        }
      }

      int OpIndex;
      switch (MI.getOpcode()) {
      case AArch64::STGloop:
      case AArch64::STZGloop:
        OpIndex = 3;
        break;
      case AArch64::STGi:
      case AArch64::STZGi:
      case AArch64::ST2Gi:
      case AArch64::STZ2Gi:
        OpIndex = 1;
        break;
      default:
        OpIndex = -1;
      }

      int TaggedFI = -1;
      if (OpIndex >= 0) {
        const MachineOperand &MO = MI.getOperand(OpIndex);
        if (MO.isFI()) {
          int FI = MO.getIndex();
          if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
              FrameObjects[FI].IsValid)
            TaggedFI = FI;
        }
      }

      // If this is a stack tagging instruction for a slot that is not part of a
      // group yet, either start a new group or add it to the current one.
      if (TaggedFI >= 0)
        GB.AddMember(TaggedFI);
      else
        GB.EndCurrentGroup();
    }
    // Groups should never span multiple basic blocks.
    GB.EndCurrentGroup();
  }

  if (AFI.hasStackHazardSlotIndex()) {
    FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
        FrameObject::AccessHazard;
    // If a stack object is unknown or both GPR and FPR, sort it into GPR.
    for (auto &Obj : FrameObjects)
      if (!Obj.Accesses ||
          Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
        Obj.Accesses = FrameObject::AccessGPR;
  }

  // If the function's tagged base pointer is pinned to a stack slot, we want to
  // put that slot first when possible. This will likely place it at SP + 0,
  // and save one instruction when generating the base pointer because IRG does
  // not allow an immediate offset.
  std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
  if (TBPI) {
    FrameObjects[*TBPI].ObjectFirst = true;
    FrameObjects[*TBPI].GroupFirst = true;
    int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
    if (FirstGroupIndex >= 0)
      for (FrameObject &Object : FrameObjects)
        if (Object.GroupIndex == FirstGroupIndex)
          Object.GroupFirst = true;
  }

  llvm::stable_sort(FrameObjects, FrameObjectCompare);

  int i = 0;
  for (auto &Obj : FrameObjects) {
    // All invalid items are sorted at the end, so it's safe to stop.
    if (!Obj.IsValid)
      break;
    ObjectsToAllocate[i++] = Obj.ObjectIndex;
  }

  LLVM_DEBUG({
    dbgs() << "Final frame order:\n";
    for (auto &Obj : FrameObjects) {
      if (!Obj.IsValid)
        break;
      dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
      if (Obj.ObjectFirst)
        dbgs() << ", first";
      if (Obj.GroupFirst)
        dbgs() << ", group-first";
      dbgs() << "\n";
    }
  });
}

/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
/// least every ProbeSize bytes. Returns an iterator of the first instruction
/// after the loop. The difference between SP and TargetReg must be an exact
/// multiple of ProbeSize.
MachineBasicBlock::iterator
AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
    MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
    Register TargetReg) const {
  MachineBasicBlock &MBB = *MBBI->getParent();
  MachineFunction &MF = *MBB.getParent();
  const AArch64InstrInfo *TII =
      MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
  MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
  MF.insert(MBBInsertPoint, LoopMBB);
  MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
  MF.insert(MBBInsertPoint, ExitMBB);

  // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
  // in SUB).
  emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
                  StackOffset::getFixed(-ProbeSize), TII,
                  MachineInstr::FrameSetup);
  // LDR XZR, [SP]
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::LDRXui))
      .addDef(AArch64::XZR)
      .addReg(AArch64::SP)
      .addImm(0)
      .addMemOperand(MF.getMachineMemOperand(
          MachinePointerInfo::getUnknownStack(MF), MachineMemOperand::MOLoad, 8,
          Align(8)))
      .setMIFlags(MachineInstr::FrameSetup);
  // CMP SP, TargetReg
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
          AArch64::XZR)
      .addReg(AArch64::SP)
      .addReg(TargetReg)
      .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
      .setMIFlags(MachineInstr::FrameSetup);
  // B.CC Loop
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
      .addImm(AArch64CC::NE)
      .addMBB(LoopMBB)
      .setMIFlags(MachineInstr::FrameSetup);

  LoopMBB->addSuccessor(ExitMBB);
  LoopMBB->addSuccessor(LoopMBB);
  // Synthesize the exit MBB.
  ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
  ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
  MBB.addSuccessor(LoopMBB);
  // Update liveins.
  fullyRecomputeLiveIns({ExitMBB, LoopMBB});

  return ExitMBB->begin();
}

void AArch64FrameLowering::inlineStackProbeFixed(
    MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
    StackOffset CFAOffset) const {
  MachineBasicBlock *MBB = MBBI->getParent();
  MachineFunction &MF = *MBB->getParent();
  const AArch64InstrInfo *TII =
      MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
  bool HasFP = hasFP(MF);

  DebugLoc DL;
  int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
  int64_t NumBlocks = FrameSize / ProbeSize;
  int64_t ResidualSize = FrameSize % ProbeSize;

  LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
                    << NumBlocks << " blocks of " << ProbeSize
                    << " bytes, plus " << ResidualSize << " bytes\n");

  // Decrement SP by NumBlocks * ProbeSize bytes, with either an unrolled or
  // ordinary loop.
  if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
    for (int i = 0; i < NumBlocks; ++i) {
      // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
      // encodable in a SUB).
      emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                      StackOffset::getFixed(-ProbeSize), TII,
                      MachineInstr::FrameSetup, false, false, nullptr,
                      EmitAsyncCFI && !HasFP, CFAOffset);
      CFAOffset += StackOffset::getFixed(ProbeSize);
      // LDR XZR, [SP]
      BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
          .addDef(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          .addMemOperand(MF.getMachineMemOperand(
              MachinePointerInfo::getUnknownStack(MF),
              MachineMemOperand::MOLoad, 8, Align(8)))
          .setMIFlags(MachineInstr::FrameSetup);
    }
  } else if (NumBlocks != 0) {
    // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
    // encodable in ADD). ScratchReg may temporarily become the CFA register.
    emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
                    StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
                    MachineInstr::FrameSetup, false, false, nullptr,
                    EmitAsyncCFI && !HasFP, CFAOffset);
    CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
    MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
    MBB = MBBI->getParent();
    if (EmitAsyncCFI && !HasFP) {
      // Set the CFA register back to SP.
      CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
          .buildDefCFARegister(AArch64::SP);
    }
  }

  if (ResidualSize != 0) {
    // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
    // in SUB).
    emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(-ResidualSize), TII,
                    MachineInstr::FrameSetup, false, false, nullptr,
                    EmitAsyncCFI && !HasFP, CFAOffset);
    if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
      // LDR XZR, [SP]
      BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
          .addDef(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          .addMemOperand(MF.getMachineMemOperand(
              MachinePointerInfo::getUnknownStack(MF),
              MachineMemOperand::MOLoad, 8, Align(8)))
          .setMIFlags(MachineInstr::FrameSetup);
    }
  }
}

void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
                                            MachineBasicBlock &MBB) const {
  // Get the instructions that need to be replaced. We emit at most two of
  // these. Remember them in order to avoid complications coming from the need
  // to traverse the block while potentially creating more blocks.
  SmallVector<MachineInstr *, 4> ToReplace;
  for (MachineInstr &MI : MBB)
    if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
        MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
      ToReplace.push_back(&MI);

  for (MachineInstr *MI : ToReplace) {
    if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
      Register ScratchReg = MI->getOperand(0).getReg();
      int64_t FrameSize = MI->getOperand(1).getImm();
      StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
                                               MI->getOperand(3).getImm());
      inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
                            CFAOffset);
    } else {
      assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
             "Stack probe pseudo-instruction expected");
      const AArch64InstrInfo *TII =
          MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
      Register TargetReg = MI->getOperand(0).getReg();
      (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
    }
    MI->eraseFromParent();
  }
}

struct StackAccess {
  enum AccessType {
    NotAccessed = 0, // Stack object not accessed by load/store instructions.
    GPR = 1 << 0,    // A general purpose register.
    PPR = 1 << 1,    // A predicate register.
    FPR = 1 << 2,    // A floating point/Neon/SVE register.
  };

  int Idx;
  StackOffset Offset;
  int64_t Size;
  unsigned AccessTypes;

  StackAccess() : Idx(0), Offset(), Size(0), AccessTypes(NotAccessed) {}

  bool operator<(const StackAccess &Rhs) const {
    return std::make_tuple(start(), Idx) <
           std::make_tuple(Rhs.start(), Rhs.Idx);
  }

  bool isCPU() const {
    // Predicate register load and store instructions execute on the CPU.
    return AccessTypes & (AccessType::GPR | AccessType::PPR);
  }
  bool isSME() const { return AccessTypes & AccessType::FPR; }
  bool isMixed() const { return isCPU() && isSME(); }

  int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
  int64_t end() const { return start() + Size; }

  std::string getTypeString() const {
    switch (AccessTypes) {
    case AccessType::FPR:
      return "FPR";
    case AccessType::PPR:
      return "PPR";
    case AccessType::GPR:
      return "GPR";
    case AccessType::NotAccessed:
      return "NA";
    default:
      return "Mixed";
    }
  }

  void print(raw_ostream &OS) const {
    OS << getTypeString() << " stack object at [SP"
       << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
    if (Offset.getScalable())
      OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
         << " * vscale";
    OS << "]";
  }
};

static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
  SA.print(OS);
  return OS;
}

void AArch64FrameLowering::emitRemarks(
    const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {

  auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
    return;

  unsigned StackHazardSize = getStackHazardSize(MF);
  const uint64_t HazardSize =
      (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;

  if (HazardSize == 0)
    return;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  // Bail if function has no stack objects.
  if (!MFI.hasStackObjects())
    return;

  std::vector<StackAccess> StackAccesses(MFI.getNumObjects());

  size_t NumFPLdSt = 0;
  size_t NumNonFPLdSt = 0;

  // Collect stack accesses via Load/Store instructions.
  for (const MachineBasicBlock &MBB : MF) {
    for (const MachineInstr &MI : MBB) {
      if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
        continue;
      for (MachineMemOperand *MMO : MI.memoperands()) {
        std::optional<int> FI = getMMOFrameID(MMO, MFI);
        if (FI && !MFI.isDeadObjectIndex(*FI)) {
          int FrameIdx = *FI;

          size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
          if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
            StackAccesses[ArrIdx].Idx = FrameIdx;
            StackAccesses[ArrIdx].Offset =
                getFrameIndexReferenceFromSP(MF, FrameIdx);
            StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
          }

          unsigned RegTy = StackAccess::AccessType::GPR;
          if (MFI.hasScalableStackID(FrameIdx))
            RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
          else if (AArch64InstrInfo::isFpOrNEON(MI))
            RegTy = StackAccess::FPR;

          StackAccesses[ArrIdx].AccessTypes |= RegTy;

          if (RegTy == StackAccess::FPR)
            ++NumFPLdSt;
          else
            ++NumNonFPLdSt;
        }
      }
    }
  }

  if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
    return;

  llvm::sort(StackAccesses);
  llvm::erase_if(StackAccesses, [](const StackAccess &S) {
    return S.AccessTypes == StackAccess::NotAccessed;
  });

  SmallVector<const StackAccess *> MixedObjects;
  SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;

  if (StackAccesses.front().isMixed())
    MixedObjects.push_back(&StackAccesses.front());

  for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
       It != End; ++It) {
    const auto &First = *It;
    const auto &Second = *(It + 1);

    if (Second.isMixed())
      MixedObjects.push_back(&Second);

    if ((First.isSME() && Second.isCPU()) ||
        (First.isCPU() && Second.isSME())) {
      uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
      if (Distance < HazardSize)
        HazardPairs.emplace_back(&First, &Second);
    }
  }

  auto EmitRemark = [&](llvm::StringRef Str) {
    ORE->emit([&]() {
      auto R = MachineOptimizationRemarkAnalysis(
          "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
      return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
    });
  };

  for (const auto &P : HazardPairs)
    EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());

  for (const auto *Obj : MixedObjects)
    EmitRemark(
        formatv("{0} accessed by both GP and FP instructions", *Obj).str());
}
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
void setIsCalleeSavedObjectIndex(int ObjectIdx, bool IsCalleeSaved)
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with.
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addDef(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a virtual register definition operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
Representation of each machine instruction.
void setFlags(unsigned flags)
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI MachineInstrBundleIterator< MachineInstr > eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOVolatile
The memory access is volatile.
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI void freezeReservedRegs()
freezeReservedRegs - Called by the register allocator to freeze the set of reserved registers before ...
bool isReserved(MCRegister PhysReg) const
isReserved - Returns true when PhysReg is a reserved register.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:298
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:339
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general-purpose registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
void stable_sort(R &&Range)
Definition STLExtras.h:2116
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
RegState
Flags to represent properties of register accesses.
@ Define
Register definition.
constexpr RegState getKillRegState(bool B)
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
constexpr T alignDown(U Value, V Align, W Skew=0)
Returns the largest unsigned integer less than or equal to Value that is congruent to Skew modulo Align.
Definition MathExtras.h:546
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1746
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:408
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1636
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns the smallest multiple of A needed to store Size bytes.
Definition Alignment.h:144
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
constexpr RegState getDefRegState(bool B)
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto count_if(R &&Range, UnaryPredicate P)
Wrapper function around std::count_if to count the number of times an element satisfying a given pred...
Definition STLExtras.h:2019
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1772
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2192
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1947
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:872
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getUnknownStack(MachineFunction &MF)
Stack memory without other information.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray