LLVM 23.0.0git
AArch64FrameLowering.cpp
1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) isn't created until the main
33// function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) must be computable at compile time. The size of the areas
93// with a dotted background cannot be computed at compile time if they are
94// present, so all three of fp, bp and sp must be set up to access all
95// contents in the frame areas, assuming all of the frame areas are
96// non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa, xb, [sp, #-offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
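// The CFI offsets quoted in the header comment can be re-derived from the
// addresses in its stack diagram. The sketch below is illustrative only:
// cfiOffset, ExampleFP and ExampleCFA are hypothetical names, not part of
// LLVM, and the addresses are the hexadecimal ones from the diagram.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: offset of a save slot relative to the CFA.
constexpr int64_t cfiOffset(uint64_t SlotAddr, uint64_t CFA) {
  return static_cast<int64_t>(SlotAddr) - static_cast<int64_t>(CFA);
}

// x29 points at the frame record (0x10020) once the prologue has run.
constexpr uint64_t ExampleFP = 0x10020;
// ".cfi_def_cfa w29, 16" defines the CFA as fp + 16.
constexpr uint64_t ExampleCFA = ExampleFP + 16;

// The CFA equals the value sp had on entry to the function.
static_assert(ExampleCFA == 0x10030, "CFA should match the initial sp");
```

// With these definitions, the slots at 0x10028 (lr), 0x10020 (fp),
// 0x10018 (x27) and 0x10010 (x28) sit at -8, -16, -24 and -32 from the CFA.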
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and the
339/// offset of each object. If AssignOffsets is "Yes", the offsets get assigned
340/// (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386static bool isTargetWindows(const MachineFunction &MF) {
387 // TODO: Should this include targets like UEFI (which use Windows CFI)?
388 // Note: Currently, there is no AArch64 support for UEFI. The value returned
389 // here must align with the predicate used for returning the list of callee
390 // saved regs in AArch64RegisterInfo::getCalleeSavedRegs(), so that we use
391 // invalidateWindowsRegisterPairing() where appropriate.
393}
394
396 const MachineFunction &MF) const {
397 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
398 return isTargetWindows(MF) && AFI->getSVECalleeSavedStackSize();
399}
400
401/// Returns true if a homogeneous prolog or epilog code can be emitted
402/// for the size optimization. If possible, a frame helper call is injected.
403/// When Exit block is given, this check is for epilog.
404bool AArch64FrameLowering::homogeneousPrologEpilog(
405 MachineFunction &MF, MachineBasicBlock *Exit) const {
406 if (!MF.getFunction().hasMinSize())
407 return false;
409 return false;
410 if (EnableRedZone)
411 return false;
412
413 // TODO: Windows is not supported yet.
414 if (isTargetWindows(MF))
415 return false;
416
417 // TODO: SVE is not supported yet.
418 if (isLikelyToHaveSVEStack(*this, MF))
419 return false;
420
421 // Bail on stack adjustment needed on return for simplicity.
422 const MachineFrameInfo &MFI = MF.getFrameInfo();
423 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
424 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
425 return false;
426 if (Exit && getArgumentStackToRestore(MF, *Exit))
427 return false;
428
429 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
430 if (AFI->hasSwiftAsyncContext() || AFI->hasStreamingModeChanges())
431 return false;
432
433 // If there are an odd number of GPRs before LR and FP in the CSRs list,
434 // they will not be paired into one RegPairInfo, which is incompatible with
435 // the assumption made by the homogeneous prolog epilog pass.
436 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
437 unsigned NumGPRs = 0;
438 for (unsigned I = 0; CSRegs[I]; ++I) {
439 Register Reg = CSRegs[I];
440 if (Reg == AArch64::LR) {
441 assert(CSRegs[I + 1] == AArch64::FP);
442 if (NumGPRs % 2 != 0)
443 return false;
444 break;
445 }
446 if (AArch64::GPR64RegClass.contains(Reg))
447 ++NumGPRs;
448 }
449
450 return true;
451}
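
// The pairing rule checked at the end of homogeneousPrologEpilog() can be
// sketched in isolation. This is a simplified model, not the LLVM code:
// registers are plain strings, any name starting with 'x' is assumed to be a
// 64-bit GPR, and gprsPairUpBeforeLR is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns false when an odd number of GPRs precedes "lr" in the CSR list,
// i.e. when one GPR would be left without a partner for a paired save.
bool gprsPairUpBeforeLR(const std::vector<std::string> &CSRs) {
  unsigned NumGPRs = 0;
  for (const std::string &Reg : CSRs) {
    if (Reg == "lr")
      return NumGPRs % 2 == 0;
    if (!Reg.empty() && Reg[0] == 'x')
      ++NumGPRs;
  }
  // No LR in the list: nothing for the check to reject.
  return true;
}
```

// For example, {"x19", "x20", "lr", "fp"} passes, while {"x19", "lr", "fp"}
// is rejected because x19 has no partner.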
452
453/// Returns true if CSRs should be paired.
454bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
455 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
456}
457
458/// This is the biggest offset to the stack pointer we can encode in aarch64
459/// instructions (without using a separate calculation and a temp register).
460/// Note that the exception here are vector stores/loads which cannot encode any
461/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
462static const unsigned DefaultSafeSPDisplacement = 255;
463
464/// Look at each instruction that references stack frames and return the stack
465/// size limit beyond which some of these instructions will require a scratch
466/// register during their expansion later.
468 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
469 // range. We'll end up allocating an unnecessary spill slot a lot, but
470 // realistically that's not a big deal at this stage of the game.
471 for (MachineBasicBlock &MBB : MF) {
472 for (MachineInstr &MI : MBB) {
473 if (MI.isDebugInstr() || MI.isPseudo() ||
474 MI.getOpcode() == AArch64::ADDXri ||
475 MI.getOpcode() == AArch64::ADDSXri)
476 continue;
477
478 for (const MachineOperand &MO : MI.operands()) {
479 if (!MO.isFI())
480 continue;
481
483 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
485 return 0;
486 }
487 }
488 }
490}
491
496
497unsigned
498AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
499 const AArch64FunctionInfo *AFI,
500 bool IsWin64, bool IsFunclet) const {
501 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
502 "Tail call reserved stack must be aligned to 16 bytes");
503 if (!IsWin64 || IsFunclet) {
504 return AFI->getTailCallReservedStack();
505 } else {
506 if (AFI->getTailCallReservedStack() != 0 &&
507 !MF.getFunction().getAttributes().hasAttrSomewhere(
508 Attribute::SwiftAsync))
509 report_fatal_error("cannot generate ABI-changing tail call for Win64");
510 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
511
512 // Var args are stored here in the primary function.
513 FixedObjectSize += AFI->getVarArgsGPRSize();
514
515 if (MF.hasEHFunclets()) {
516 // Catch objects are stored here in the primary function.
517 const MachineFrameInfo &MFI = MF.getFrameInfo();
518 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
519 SmallSetVector<int, 8> CatchObjFrameIndices;
520 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
521 for (const WinEHHandlerType &H : TBME.HandlerArray) {
522 int FrameIndex = H.CatchObj.FrameIndex;
523 if ((FrameIndex != INT_MAX) &&
524 CatchObjFrameIndices.insert(FrameIndex)) {
525 FixedObjectSize = alignTo(FixedObjectSize,
526 MFI.getObjectAlign(FrameIndex).value()) +
527 MFI.getObjectSize(FrameIndex);
528 }
529 }
530 }
531 // To support EH funclets we allocate an UnwindHelp object
532 FixedObjectSize += 8;
533 }
534 return alignTo(FixedObjectSize, 16);
535 }
536}
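
// The size arithmetic in the Win64 branch of getFixedObjectSize() can be
// followed with plain integers. The sketch below is an assumption-laden model:
// alignUp and win64FixedObjectSize are hypothetical names, catch objects are
// given as {size, alignment} pairs, and the fatal-error check is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Rounds Value up to the next multiple of Align (Align must be non-zero).
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) / Align * Align;
}

uint64_t win64FixedObjectSize(
    uint64_t TailCallReserved, uint64_t VarArgsGPRSize,
    const std::vector<std::pair<uint64_t, uint64_t>> &CatchObjs,
    bool HasEHFunclets) {
  uint64_t Size = TailCallReserved + VarArgsGPRSize;
  if (HasEHFunclets) {
    // Each catch object is placed at its own alignment.
    for (const auto &Obj : CatchObjs)
      Size = alignUp(Size, Obj.second) + Obj.first;
    Size += 8; // Room for the UnwindHelp object.
  }
  return alignUp(Size, 16);
}
```

// With 24 bytes of varargs GPR space and no funclets the result is 32; adding
// one 4-byte catch object plus UnwindHelp gives 36 bytes, rounded up to 48.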
537
539 if (!EnableRedZone)
540 return false;
541
542 // Don't use the red zone if the function explicitly asks us not to.
543 // This is typically used for kernel code.
544 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
545 const unsigned RedZoneSize =
547 if (!RedZoneSize)
548 return false;
549
550 const MachineFrameInfo &MFI = MF.getFrameInfo();
552 uint64_t NumBytes = AFI->getLocalStackSize();
553
554 // If neither NEON nor SVE is available, a COPY from one Q-reg to
555 // another requires a spill -> reload sequence. We can do that
556 // using a pre-decrementing store/post-incrementing load, but
557 // if we do so, we can't use the Red Zone.
558 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
559 !Subtarget.isNeonAvailable() &&
560 !Subtarget.hasSVE();
561
562 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
563 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
564}
565
566/// hasFPImpl - Return true if the specified function should have a dedicated
567/// frame pointer register.
569 const MachineFrameInfo &MFI = MF.getFrameInfo();
570 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
572
573 // Win64 EH requires a frame pointer if funclets are present, as the locals
574 // are accessed off the frame pointer in both the parent function and the
575 // funclets.
576 if (MF.hasEHFunclets())
577 return true;
578 // Retain behavior of always omitting the FP for leaf functions when possible.
580 return true;
581 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
582 MFI.hasStackMap() || MFI.hasPatchPoint() ||
583 RegInfo->hasStackRealignment(MF))
584 return true;
585
586 // If we:
587 //
588 // 1. Have streaming mode changes
589 // OR:
590 // 2. Have a streaming body with SVE stack objects
591 //
592 // Then the value of VG restored when unwinding to this function may not match
593 // the value of VG used to set up the stack.
594 //
595 // This is a problem as the CFA can be described with an expression of the
596 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
597 //
598 // If the value of VG used in that expression does not match the value used to
599 // set up the stack, an incorrect address for the CFA will be computed, and
600 // unwinding will fail.
601 //
602 // We work around this issue by ensuring the frame-pointer can describe the
603 // CFA in either of these cases.
604 if (AFI.needsDwarfUnwindInfo(MF) &&
607 return true;
608 // With large callframes around we may need to use FP to access the scavenging
609 // emergency spillslot.
610 //
611 // Unfortunately some calls to hasFP() like machine verifier ->
612 // getReservedReg() -> hasFP in the middle of global isel are too early
613 // to know the max call frame size. Hopefully conservatively returning "true"
614 // in those cases is fine.
615 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
616 if (!MFI.isMaxCallFrameSizeComputed() ||
618 return true;
619
620 return false;
621}
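
// The VG mismatch described in hasFPImpl() can be illustrated numerically
// using the CFA expression from the comment. The function name and the input
// values are hypothetical; VG is taken to be the vector length in 64-bit
// granules.

```cpp
#include <cassert>
#include <cstdint>

// CFA = SP + NumBytes + VG * NumScalableBytes, as stated in the comment above.
constexpr uint64_t computeCFA(uint64_t SP, uint64_t NumBytes, uint64_t VG,
                              uint64_t NumScalableBytes) {
  return SP + NumBytes + VG * NumScalableBytes;
}

// The same frame evaluated with two different VG values yields two different
// CFAs, so an unwinder that sees the wrong VG computes the wrong address.
static_assert(computeCFA(0x1000, 32, 2, 16) != computeCFA(0x1000, 32, 4, 16),
              "a VG mismatch changes the computed CFA");
```

// Describing the CFA relative to the frame pointer avoids the VG term, which
// is why the function returns true in these cases.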
622
623/// Should the Frame Pointer be reserved for the current function?
625 const TargetMachine &TM = MF.getTarget();
626 const Triple &TT = TM.getTargetTriple();
627
628 // These OSes require that the frame chain be valid, even if the current frame does
629 // not use a frame pointer.
630 if (TT.isOSDarwin() || TT.isOSWindows())
631 return true;
632
633 // If the function has a frame pointer, it is reserved.
634 if (hasFP(MF))
635 return true;
636
637 // Frontend has requested to preserve the frame pointer.
639 return true;
640
641 return false;
642}
643
644/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
645/// not required, we reserve argument space for call sites in the function
646/// immediately on entry to the current function. This eliminates the need for
647/// add/sub sp brackets around call sites. Returns true if the call frame is
648/// included as part of the stack frame.
650 const MachineFunction &MF) const {
651 // The stack probing code for the dynamically allocated outgoing arguments
652 // area assumes that the stack is probed at the top - either by the prologue
653 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
654 // most recent variable-sized object allocation. Changing the condition here
655 // may need to be followed up by changes to the probe issuing logic.
656 return !MF.getFrameInfo().hasVarSizedObjects();
657}
658
662
663 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
664 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
665 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
666 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
667 DebugLoc DL = I->getDebugLoc();
668 unsigned Opc = I->getOpcode();
669 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
670 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
671
672 if (!hasReservedCallFrame(MF)) {
673 int64_t Amount = I->getOperand(0).getImm();
674 Amount = alignTo(Amount, getStackAlign());
675 if (!IsDestroy)
676 Amount = -Amount;
677
678 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
679 // doesn't have to pop anything), then the first operand will be zero too so
680 // this adjustment is a no-op.
681 if (CalleePopAmount == 0) {
682 // FIXME: in-function stack adjustment for calls is limited to 24-bits
683 // because there's no guaranteed temporary register available.
684 //
685 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
686 // 1) For offset <= 12-bit, we use LSL #0
687 // 2) For 12-bit < offset <= 24-bit, we use two instructions. One uses
688 // LSL #0, and the other uses LSL #12.
689 //
690 // Most call frames will be allocated at the start of a function so
691 // this is OK, but it is a limitation that needs dealing with.
692 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
693
694 if (TLI->hasInlineStackProbe(MF) &&
696 // When stack probing is enabled, the decrement of SP may need to be
697 // probed. We only need to do this if the call site needs 1024 bytes of
698 // space or more, because a region smaller than that is allowed to be
699 // unprobed at an ABI boundary. We rely on the fact that SP has been
700 // probed exactly at this point, either by the prologue or most recent
701 // dynamic allocation.
703 "non-reserved call frame without var sized objects?");
704 Register ScratchReg =
705 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
706 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
707 } else {
708 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
709 StackOffset::getFixed(Amount), TII);
710 }
711 }
712 } else if (CalleePopAmount != 0) {
713 // If the calling convention demands that the callee pops arguments from the
714 // stack, we want to add it back if we have a reserved call frame.
715 assert(CalleePopAmount < 0xffffff && "call frame too large");
716 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
717 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
718 }
719 return MBB.erase(I);
720}
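
// The 24-bit limit mentioned in eliminateCallFramePseudoInstr() follows from
// splitting the adjustment into two 12-bit immediates. A minimal sketch,
// assuming a non-negative amount below 2^24; AddSubImmSplit and splitAddSubImm
// are hypothetical names and do not appear in LLVM.

```cpp
#include <cassert>
#include <cstdint>

struct AddSubImmSplit {
  uint32_t Lo12; // Immediate for the instruction that uses LSL #0.
  uint32_t Hi12; // Immediate for the instruction that uses LSL #12.
};

// Splits Amount so that (Hi12 << 12) + Lo12 == Amount for Amount < 2^24.
constexpr AddSubImmSplit splitAddSubImm(uint32_t Amount) {
  return {Amount & 0xfffu, (Amount >> 12) & 0xfffu};
}
```

// An amount such as 1360 fits in Lo12 alone, so a single instruction is
// enough; 0x12345 needs both halves (Hi12 = 0x12, Lo12 = 0x345).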
721
723 MachineBasicBlock &MBB) const {
724
725 MachineFunction &MF = *MBB.getParent();
726 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
727 const auto &TRI = *Subtarget.getRegisterInfo();
728 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
729
730 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
731
732 // Reset the CFA to `SP + 0`.
733 CFIBuilder.buildDefCFA(AArch64::SP, 0);
734
735 // Flip the RA sign state.
736 if (MFI.shouldSignReturnAddress(MF)) {
737 if (MFI.branchProtectionPAuthLR()) {
738 CFIBuilder.buildNegateRAStateWithPC();
739 } else if (!MF.getTarget().getTargetTriple().isOSBinFormatMachO()) {
740 CFIBuilder.buildNegateRAState();
741 }
742 }
743
744 // Shadow call stack uses X18, reset it.
745 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
746 CFIBuilder.buildSameValue(AArch64::X18);
747
748 // Emit .cfi_same_value for callee-saved registers.
749 const std::vector<CalleeSavedInfo> &CSI =
751 for (const auto &Info : CSI) {
752 MCRegister Reg = Info.getReg();
753 if (!TRI.regNeedsCFI(Reg, Reg))
754 continue;
755 CFIBuilder.buildSameValue(Reg);
756 }
757}
758
760 switch (Reg.id()) {
761 default:
762 // The called routine is expected to preserve r19-r28.
763 // r29 and r30 are used as the frame pointer and link register, respectively.
764 return 0;
765
766 // GPRs
767#define CASE(n) \
768 case AArch64::W##n: \
769 case AArch64::X##n: \
770 return AArch64::X##n
771 CASE(0);
772 CASE(1);
773 CASE(2);
774 CASE(3);
775 CASE(4);
776 CASE(5);
777 CASE(6);
778 CASE(7);
779 CASE(8);
780 CASE(9);
781 CASE(10);
782 CASE(11);
783 CASE(12);
784 CASE(13);
785 CASE(14);
786 CASE(15);
787 CASE(16);
788 CASE(17);
789 CASE(18);
790#undef CASE
791
792 // FPRs
793#define CASE(n) \
794 case AArch64::B##n: \
795 case AArch64::H##n: \
796 case AArch64::S##n: \
797 case AArch64::D##n: \
798 case AArch64::Q##n: \
799 return HasSVE ? AArch64::Z##n : AArch64::Q##n
800 CASE(0);
801 CASE(1);
802 CASE(2);
803 CASE(3);
804 CASE(4);
805 CASE(5);
806 CASE(6);
807 CASE(7);
808 CASE(8);
809 CASE(9);
810 CASE(10);
811 CASE(11);
812 CASE(12);
813 CASE(13);
814 CASE(14);
815 CASE(15);
816 CASE(16);
817 CASE(17);
818 CASE(18);
819 CASE(19);
820 CASE(20);
821 CASE(21);
822 CASE(22);
823 CASE(23);
824 CASE(24);
825 CASE(25);
826 CASE(26);
827 CASE(27);
828 CASE(28);
829 CASE(29);
830 CASE(30);
831 CASE(31);
832#undef CASE
833 }
834}
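
// The register mapping above can be modelled on register names rather than
// enum values. This is only a sketch: widestRegisterToZero is a hypothetical
// name, registers are lowercase strings, and an empty string stands in for
// the "return 0" default case.

```cpp
#include <cassert>
#include <string>

std::string widestRegisterToZero(const std::string &Reg, bool HasSVE) {
  // Expect one class letter followed by one or two decimal digits.
  if (Reg.size() < 2 || Reg.size() > 3)
    return "";
  const std::string Digits = Reg.substr(1);
  if (Digits.find_first_not_of("0123456789") != std::string::npos)
    return "";
  const int N = std::stoi(Digits);
  const char Kind = Reg[0];
  // GPRs: only w0-w18 / x0-x18 are mapped, always to the 64-bit register.
  if ((Kind == 'w' || Kind == 'x') && N <= 18)
    return "x" + Digits;
  // FPRs: b/h/s/d/q 0-31 map to Z with SVE, otherwise to Q.
  if (std::string("bhsdq").find(Kind) != std::string::npos && N <= 31)
    return (HasSVE ? "z" : "q") + Digits;
  return "";
}
```

// For instance, "w3" maps to "x3", "s8" maps to "z8" with SVE and "q8"
// without it, and "x19" (a callee-saved register) maps to nothing.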
835
836void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
837 MachineBasicBlock &MBB) const {
838 // Insertion point.
840
841 // Fake a debug loc.
842 DebugLoc DL;
843 if (MBBI != MBB.end())
844 DL = MBBI->getDebugLoc();
845
846 const MachineFunction &MF = *MBB.getParent();
847 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
848 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
849
850 BitVector GPRsToZero(TRI.getNumRegs());
851 BitVector FPRsToZero(TRI.getNumRegs());
852 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
853 for (MCRegister Reg : RegsToZero.set_bits()) {
854 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
855 // For GPRs, we only care to clear out the 64-bit register.
856 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
857 GPRsToZero.set(XReg);
858 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
859 // For FPRs, clear the widest register: Z with SVE, otherwise Q.
860 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
861 FPRsToZero.set(XReg);
862 }
863 }
864
865 const AArch64InstrInfo &TII = *STI.getInstrInfo();
866
867 // Zero out GPRs.
868 for (MCRegister Reg : GPRsToZero.set_bits())
869 TII.buildClearRegister(Reg, MBB, MBBI, DL);
870
871 // Zero out FP/vector registers.
872 for (MCRegister Reg : FPRsToZero.set_bits())
873 TII.buildClearRegister(Reg, MBB, MBBI, DL);
874
875 if (HasSVE) {
876 for (MCRegister PReg :
877 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
878 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
879 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
880 AArch64::P15}) {
881 if (RegsToZero[PReg])
882 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
883 }
884 }
885}
886
887bool AArch64FrameLowering::windowsRequiresStackProbe(
888 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
889 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
890 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
891 // TODO: When implementing stack protectors, take that into account
892 // for the probe threshold.
893 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
894 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
895}
896
898 const MachineBasicBlock &MBB) {
899 const MachineFunction *MF = MBB.getParent();
900 LiveRegs.addLiveIns(MBB);
901 // Mark callee saved registers as used so we will not choose them.
902 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
903 for (unsigned i = 0; CSRegs[i]; ++i)
904 LiveRegs.addReg(CSRegs[i]);
905}
906
908AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
909 bool HasCall) const {
910 MachineFunction *MF = MBB->getParent();
911
912 // If MBB is an entry block, use X9 as the scratch register. However,
913 // preserve_none functions may be using X9 to pass arguments, so for those
914 // prefer to pick an available register below.
915 if (&MF->front() == MBB &&
917 return AArch64::X9;
918
919 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
920 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
921 LivePhysRegs LiveRegs(TRI);
922 getLiveRegsForEntryMBB(LiveRegs, *MBB);
923 if (HasCall) {
924 LiveRegs.addReg(AArch64::X16);
925 LiveRegs.addReg(AArch64::X17);
926 LiveRegs.addReg(AArch64::X18);
927 }
928
929 // Prefer X9 since it was historically used for the prologue scratch reg.
930 const MachineRegisterInfo &MRI = MF->getRegInfo();
931 if (LiveRegs.available(MRI, AArch64::X9))
932 return AArch64::X9;
933
934 for (unsigned Reg : AArch64::GPR64RegClass) {
935 if (LiveRegs.available(MRI, Reg))
936 return Reg;
937 }
938 return AArch64::NoRegister;
939}
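
// Illustrative sketch only, not part of the LLVM sources: the selection order
// used above (prefer X9, otherwise take the first free register of the class)
// reduced to plain integers. `LiveSet`, `NoRegister` and `pickScratch` are
// hypothetical names; a std::bitset stands in for LivePhysRegs, and index N
// is assumed to mean xN.

```cpp
#include <bitset>

// Bit N set means register xN is live or otherwise unavailable
// (callee-saved, or reserved because a call may clobber it).
using LiveSet = std::bitset<31>;

constexpr int NoRegister = -1; // Stand-in for AArch64::NoRegister.

int pickScratch(const LiveSet &Live) {
  constexpr int PreferredX9 = 9;
  if (!Live.test(PreferredX9))
    return PreferredX9; // Historical choice for the prologue scratch register.
  for (int Reg = 0; Reg < static_cast<int>(Live.size()); ++Reg)
    if (!Live.test(Reg))
      return Reg; // First free register in class order.
  return NoRegister; // Every candidate is taken.
}
```

// With an empty LiveSet this sketch picks 9; once bit 9 is set it falls back
// to the lowest clear bit, and it reports NoRegister when all bits are set.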
940
942 const MachineBasicBlock &MBB) const {
943 const MachineFunction *MF = MBB.getParent();
944 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
945 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
946 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
947 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
949
950 if (AFI->hasSwiftAsyncContext()) {
951 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
952 const MachineRegisterInfo &MRI = MF->getRegInfo();
955 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
956 // available.
957 if (!LiveRegs.available(MRI, AArch64::X16) ||
958 !LiveRegs.available(MRI, AArch64::X17))
959 return false;
960 }
961
962 // Certain stack probing sequences might clobber flags, so we can't use
963 // the block as a prologue if the flags register is a live-in.
965 MBB.isLiveIn(AArch64::NZCV))
966 return false;
967
968 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
969 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
970 return false;
971
972 // May need a scratch register (for the return value) if a special call is
973 // required.
974 if (requiresSaveVG(*MF) ||
975 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
976 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
977 return false;
978
979 return true;
980}
981
983 const Function &F = MF.getFunction();
984 return MF.getTarget().getMCAsmInfo().usesWindowsCFI() &&
985 F.needsUnwindTableEntry();
986}
987
988bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
989 const MachineFunction &MF) const {
990 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
991 // and SEH_EpilogEnd instructions in the correct order.
993 return false;
996}
997
998// Given a load or a store instruction, generate an appropriate unwinding SEH
999// code on Windows.
1001AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
1002 const AArch64InstrInfo &TII,
1003 MachineInstr::MIFlag Flag) const {
1004 unsigned Opc = MBBI->getOpcode();
1005 MachineBasicBlock *MBB = MBBI->getParent();
1006 MachineFunction &MF = *MBB->getParent();
1007 DebugLoc DL = MBBI->getDebugLoc();
1008 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1009 int Imm = MBBI->getOperand(ImmIdx).getImm();
1011 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1012 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1013
1014 switch (Opc) {
1015 default:
1016 report_fatal_error("No SEH Opcode for this instruction");
1017 case AArch64::STR_ZXI:
1018 case AArch64::LDR_ZXI: {
1019 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1020 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1021 .addImm(Reg0)
1022 .addImm(Imm)
1023 .setMIFlag(Flag);
1024 break;
1025 }
1026 case AArch64::STR_PXI:
1027 case AArch64::LDR_PXI: {
1028 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1029 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1030 .addImm(Reg0)
1031 .addImm(Imm)
1032 .setMIFlag(Flag);
1033 break;
1034 }
1035 case AArch64::LDPDpost:
1036 Imm = -Imm;
1037 [[fallthrough]];
1038 case AArch64::STPDpre: {
1039 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1040 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1041 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1042 .addImm(Reg0)
1043 .addImm(Reg1)
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 break;
1047 }
1048 case AArch64::LDPXpost:
1049 Imm = -Imm;
1050 [[fallthrough]];
1051 case AArch64::STPXpre: {
1052 Register Reg0 = MBBI->getOperand(1).getReg();
1053 Register Reg1 = MBBI->getOperand(2).getReg();
1054 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1055 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1056 .addImm(Imm * 8)
1057 .setMIFlag(Flag);
1058 else
1059 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1060 .addImm(RegInfo->getSEHRegNum(Reg0))
1061 .addImm(RegInfo->getSEHRegNum(Reg1))
1062 .addImm(Imm * 8)
1063 .setMIFlag(Flag);
1064 break;
1065 }
1066 case AArch64::LDRDpost:
1067 Imm = -Imm;
1068 [[fallthrough]];
1069 case AArch64::STRDpre: {
1070 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1071 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1072 .addImm(Reg)
1073 .addImm(Imm)
1074 .setMIFlag(Flag);
1075 break;
1076 }
1077 case AArch64::LDRXpost:
1078 Imm = -Imm;
1079 [[fallthrough]];
1080 case AArch64::STRXpre: {
1081 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1082 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1083 .addImm(Reg)
1084 .addImm(Imm)
1085 .setMIFlag(Flag);
1086 break;
1087 }
1088 case AArch64::STPDi:
1089 case AArch64::LDPDi: {
1090 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1091 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1092 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1093 .addImm(Reg0)
1094 .addImm(Reg1)
1095 .addImm(Imm * 8)
1096 .setMIFlag(Flag);
1097 break;
1098 }
1099 case AArch64::STPXi:
1100 case AArch64::LDPXi: {
1101 Register Reg0 = MBBI->getOperand(0).getReg();
1102 Register Reg1 = MBBI->getOperand(1).getReg();
1103
1104 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1105 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1106
1107 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1108 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1109 .addImm(Imm * 8)
1110 .setMIFlag(Flag);
1111 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1112 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1113 .addImm(SEHReg0)
1114 .addImm(SEHReg1)
1115 .addImm(Imm * 8)
1116 .setMIFlag(Flag);
1117 else
1118 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1119 .addImm(SEHReg0)
1120 .addImm(SEHReg1)
1121 .addImm(Imm * 8)
1122 .setMIFlag(Flag);
1123 break;
1124 }
1125 case AArch64::STRXui:
1126 case AArch64::LDRXui: {
1127 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1128 if (Reg >= 19)
1129 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1130 .addImm(Reg)
1131 .addImm(Imm * 8)
1132 .setMIFlag(Flag);
1133 else
1134 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1135 .addImm(Reg)
1136 .addImm(Imm * 8)
1137 .setMIFlag(Flag);
1138 break;
1139 }
1140 case AArch64::STRDui:
1141 case AArch64::LDRDui: {
1142 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1143 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1144 .addImm(Reg)
1145 .addImm(Imm * 8)
1146 .setMIFlag(Flag);
1147 break;
1148 }
1149 case AArch64::STPQi:
1150 case AArch64::LDPQi: {
1151 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1152 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1153 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1154 .addImm(Reg0)
1155 .addImm(Reg1)
1156 .addImm(Imm * 16)
1157 .setMIFlag(Flag);
1158 break;
1159 }
1160 case AArch64::LDPQpost:
1161 Imm = -Imm;
1162 [[fallthrough]];
1163 case AArch64::STPQpre: {
1164 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1165 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1166 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1167 .addImm(Reg0)
1168 .addImm(Reg1)
1169 .addImm(Imm * 16)
1170 .setMIFlag(Flag);
1171 break;
1172 }
1173 }
1174 auto I = MBB->insertAfter(MBBI, MIB);
1175 return I;
1176}
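
// Illustrative sketch only, not part of the LLVM sources: how the switch in
// insertSEH derives the immediate it attaches to the SEH pseudo-instruction.
// Post-indexed loads negate the immediate first; 64-bit pairs and unsigned
// offset forms scale it by 8, Q-register pairs by 16, and single-register
// pre/post-indexed forms pass it through unscaled. `SaveKind` and
// `sehImmediate` are hypothetical names that group the opcodes by behaviour.

```cpp
// Hypothetical grouping of the opcodes handled by the switch.
enum class SaveKind {
  PairPre,     // e.g. STPXpre, STPDpre
  PairPost,    // e.g. LDPXpost, LDPDpost
  SinglePre,   // e.g. STRXpre, STRDpre
  SinglePost,  // e.g. LDRXpost, LDRDpost
  PairOffset,  // e.g. STPXi, LDPXi, STPDi, LDPDi
  QPairOffset, // e.g. STPQi, LDPQi
};

int sehImmediate(SaveKind Kind, int Imm) {
  switch (Kind) {
  case SaveKind::PairPre:
    return Imm * 8;
  case SaveKind::PairPost:
    return -Imm * 8; // Negate first, then scale.
  case SaveKind::SinglePre:
    return Imm; // Passed through unscaled.
  case SaveKind::SinglePost:
    return -Imm;
  case SaveKind::PairOffset:
    return Imm * 8;
  case SaveKind::QPairOffset:
    return Imm * 16;
  }
  return 0; // Not reached for valid enumerators.
}
```

// Under these assumptions a pre-indexed pair store with immediate -2 and the
// matching post-indexed pair load with immediate 2 both yield -16.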
1177
1180 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1181 return false;
1182 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1183 // is enabled with streaming mode changes.
1184 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1185 if (ST.isTargetDarwin())
1186 return ST.hasSVE();
1187 return true;
1188}
1189
1191 MachineFunction &MF) const {
1192 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1193 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
1194
1195 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1196 DebugLoc DL; // Set debug location to unknown.
1198
1199 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1201 };
1202
1203 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1204 DebugLoc DL;
1205 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1206 if (MBBI != MBB.end())
1207 DL = MBBI->getDebugLoc();
1208
1209 TII->createPauthEpilogueInstr(MBB, DL);
1210 };
1211
1212 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1213 EmitSignRA(MF.front());
1214 for (MachineBasicBlock &MBB : MF) {
1215 if (MBB.isEHFuncletEntry())
1216 EmitSignRA(MBB);
1217 if (MBB.isReturnBlock())
1218 EmitAuthRA(MBB);
1219 }
1220}
1221
1223 MachineBasicBlock &MBB) const {
1224 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1225 PrologueEmitter.emitPrologue();
1226}
1227
1229 MachineBasicBlock &MBB) const {
1230 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1231 EpilogueEmitter.emitEpilogue();
1232}
1233
1236 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1237}
1238
1240 return enableCFIFixup(MF) &&
1241 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1242}
1243
1244/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1245/// debug info. It's the same as what we use for resolving the code-gen
1246/// references for now. FIXME: This can go wrong when references are
1247/// SP-relative and simple call frames aren't used.
1250 Register &FrameReg) const {
1252 MF, FI, FrameReg,
1253 /*PreferFP=*/
1254 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1255 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1256 /*ForSimm=*/false);
1257}
1258
1261 int FI) const {
1262 // This function serves to provide a comparable offset from a single reference
1263 // point (the value of SP at function entry) that can be used for analysis,
1264 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1265 // correct for all objects in the presence of VLA-area objects or dynamic
1266 // stack re-alignment.
1267
1268 const auto &MFI = MF.getFrameInfo();
1269
1270 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1271 StackOffset ZPRStackSize = getZPRStackSize(MF);
1272 StackOffset PPRStackSize = getPPRStackSize(MF);
1273 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1274
1275 // For VLA-area objects, just emit an offset at the end of the stack frame.
1276 // Whilst not quite correct, these objects do live at the end of the frame and
1277 // so it is more useful for analysis for the offset to reflect this.
1278 if (MFI.isVariableSizedObjectIndex(FI)) {
1279 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1280 }
1281
1282 // This is correct in the absence of any SVE stack objects.
1283 if (!SVEStackSize)
1284 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1285
1286 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1287 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1288 if (MFI.hasScalableStackID(FI)) {
1289 if (FPAfterSVECalleeSaves &&
1290 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1291 assert(!AFI->hasSplitSVEObjects() &&
1292 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1293 return StackOffset::getScalable(ObjectOffset);
1294 }
1295 StackOffset AccessOffset{};
1296 // The scalable vectors are below (lower address) the scalable predicates
1297 // with split SVE objects, so we must subtract the size of the predicates.
1298 if (AFI->hasSplitSVEObjects() &&
1299 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1300 AccessOffset = -PPRStackSize;
1301 return AccessOffset +
1302 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1303 ObjectOffset);
1304 }
1305
1306 bool IsFixed = MFI.isFixedObjectIndex(FI);
1307 bool IsCSR =
1308 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1309
1310 StackOffset ScalableOffset = {};
1311 if (!IsFixed && !IsCSR) {
1312 ScalableOffset = -SVEStackSize;
1313 } else if (FPAfterSVECalleeSaves && IsCSR) {
1314 ScalableOffset =
1316 }
1317
1318 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1319}
1320
1326
1327StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1328 int64_t ObjectOffset) const {
1329 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1330 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1331 const Function &F = MF.getFunction();
1332 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1333 unsigned FixedObject =
1334 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1335 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1336 int64_t FPAdjust =
1337 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1338 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1339}
1340
1341StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1342 int64_t ObjectOffset) const {
1343 const auto &MFI = MF.getFrameInfo();
1344 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1345}
1346
1347// TODO: This function currently does not work for scalable vectors.
1349 int FI) const {
1350 const AArch64RegisterInfo *RegInfo =
1351 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1352 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1353 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1354 ? getFPOffset(MF, ObjectOffset).getFixed()
1355 : getStackOffset(MF, ObjectOffset).getFixed();
1356}
1357
1359 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1360 bool ForSimm) const {
1361 const auto &MFI = MF.getFrameInfo();
1362 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1363 bool isFixed = MFI.isFixedObjectIndex(FI);
1364 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1365 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1366 FrameReg, PreferFP, ForSimm);
1367}
1368
1370 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1371 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1372 bool ForSimm) const {
1373 const auto &MFI = MF.getFrameInfo();
1374 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1375 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1376 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1377
1378 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1379 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1380 bool isCSR =
1381 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1382 bool isSVE = MFI.isScalableStackID(StackID);
1383
1384 StackOffset ZPRStackSize = getZPRStackSize(MF);
1385 StackOffset PPRStackSize = getPPRStackSize(MF);
1386 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1387
1388 // Use frame pointer to reference fixed objects. Use it for locals if
1389 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1390 // reliable as a base). Make sure useFPForScavengingIndex() does the
1391 // right thing for the emergency spill slot.
1392 bool UseFP = false;
1393 if (AFI->hasStackFrame() && !isSVE) {
1394 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1395 // there are scalable (SVE) objects in between the FP and the fixed-sized
1396 // objects.
1397 PreferFP &= !SVEStackSize;
1398
1399 // Note: Keeping the following as multiple 'if' statements rather than
1400 // merging to a single expression for readability.
1401 //
1402 // Argument access should always use the FP.
1403 if (isFixed) {
1404 UseFP = hasFP(MF);
1405 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1406 // References to the CSR area must use FP if we're re-aligning the stack
1407 // since the dynamically-sized alignment padding is between the SP/BP and
1408 // the CSR area.
1409 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1410 UseFP = true;
1411 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1412 // If the FPOffset is negative and we're producing a signed immediate, we
1413 // have to keep in mind that the available offset range for negative
1414 // offsets is smaller than for positive ones. If an offset is available
1415 // via the FP and the SP, use whichever is closest.
1416 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1417 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1418
1419 if (FPOffset >= 0) {
1420 // If the FPOffset is positive, that'll always be best, as the SP/BP
1421 // will be even further away.
1422 UseFP = true;
1423 } else if (MFI.hasVarSizedObjects()) {
1424 // If we have variable sized objects, we can use either FP or BP, as the
1425 // SP offset is unknown. We can use the base pointer if we have one and
1426 // FP is not preferred. If not, we're stuck with using FP.
1427 bool CanUseBP = RegInfo->hasBasePointer(MF);
1428 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1429 UseFP = PreferFP;
1430 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1431 UseFP = true;
1432 // else we can use BP and FP, but the offset from FP won't fit.
1433 // That will make us scavenge registers which we can probably avoid by
1434 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1435 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1436 // Funclets access the locals contained in the parent's stack frame
1437 // via the frame pointer, so we have to use the FP in the parent
1438 // function.
1439 (void) Subtarget;
1440 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1441 MF.getFunction().isVarArg()) &&
1442 "Funclets should only be present on Win64");
1443 UseFP = true;
1444 } else {
1445 // We have the choice between FP and (SP or BP).
1446 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1447 UseFP = true;
1448 }
1449 }
1450 }
1451
1452 assert(
1453 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1454 "In the presence of dynamic stack pointer realignment, "
1455 "non-argument/CSR objects cannot be accessed through the frame pointer");
1456
1457 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1458
1459 if (isSVE) {
1460 StackOffset FPOffset = StackOffset::get(
1461 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1462 StackOffset SPOffset =
1463 SVEStackSize +
1464 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1465 ObjectOffset);
1466
1467 // With split SVE objects the ObjectOffset is relative to the split area
1468 // (i.e. the PPR area or ZPR area respectively).
1469 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1470 // If we're accessing an SVE vector with split SVE objects...
1471 // - From the FP we need to move down past the PPR area:
1472 FPOffset -= PPRStackSize;
1473 // - From the SP we only need to move up to the ZPR area:
1474 SPOffset -= PPRStackSize;
1475 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1476 // `SPOffset = ZPRStackSize + ...`.
1477 }
1478
1479 if (FPAfterSVECalleeSaves) {
1481 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1484 }
1485 }
1486
1487 // Always use the FP for SVE spills if available and beneficial.
1488 if (hasFP(MF) && (SPOffset.getFixed() ||
1489 FPOffset.getScalable() < SPOffset.getScalable() ||
1490 RegInfo->hasStackRealignment(MF))) {
1491 FrameReg = RegInfo->getFrameRegister(MF);
1492 return FPOffset;
1493 }
1494 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1495 : MCRegister(AArch64::SP);
1496
1497 return SPOffset;
1498 }
1499
1500 StackOffset SVEAreaOffset = {};
1501 if (FPAfterSVECalleeSaves) {
1502 // In this stack layout, the FP is in between the callee saves and other
1503 // SVE allocations.
1504 StackOffset SVECalleeSavedStack =
1506 if (UseFP) {
1507 if (isFixed)
1508 SVEAreaOffset = SVECalleeSavedStack;
1509 else if (!isCSR)
1510 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1511 } else {
1512 if (isFixed)
1513 SVEAreaOffset = SVEStackSize;
1514 else if (isCSR)
1515 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1516 }
1517 } else {
1518 if (UseFP && !(isFixed || isCSR))
1519 SVEAreaOffset = -SVEStackSize;
1520 if (!UseFP && (isFixed || isCSR))
1521 SVEAreaOffset = SVEStackSize;
1522 }
1523
1524 if (UseFP) {
1525 FrameReg = RegInfo->getFrameRegister(MF);
1526 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1527 }
1528
1529 // Use the base pointer if we have one.
1530 if (RegInfo->hasBasePointer(MF))
1531 FrameReg = RegInfo->getBaseRegister();
1532 else {
1533 assert(!MFI.hasVarSizedObjects() &&
1534 "Can't use SP when we have var sized objects.");
1535 FrameReg = AArch64::SP;
1536 // If we're using the red zone for this function, the SP won't actually
1537 // be adjusted, so the offsets will be negative. They're also all
1538 // within range of the signed 9-bit immediate instructions.
1539 if (canUseRedZone(MF))
1540 Offset -= AFI->getLocalStackSize();
1541 }
1542
1543 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1544}
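
// Illustrative sketch only, not part of the LLVM sources: the FP-versus-SP/BP
// choice made above for an ordinary local (not fixed, not a CSR slot) in a
// function that has a frame pointer, no stack realignment and no SVE objects.
// The EH-funclet case is deliberately left out. `FrameFacts` and
// `useFPForLocal` are hypothetical names.

```cpp
#include <cstdint>

struct FrameFacts {
  bool HasVarSizedObjects; // Stands in for MFI.hasVarSizedObjects().
  bool HasBasePointer;     // Stands in for RegInfo->hasBasePointer(MF).
  bool ForSimm;            // Caller wants a signed 9-bit immediate.
  bool PreferFP;           // Caller-supplied preference.
};

bool useFPForLocal(int64_t FPOffset, int64_t SPOffset, const FrameFacts &F) {
  // Negative signed immediates only reach down to -256.
  bool FPOffsetFits = !F.ForSimm || FPOffset >= -256;
  // Prefer whichever base register is closer to the object.
  bool PreferFP = F.PreferFP || SPOffset > -FPOffset;

  if (FPOffset >= 0)
    return true; // SP/BP would be even further away.
  if (F.HasVarSizedObjects && !F.HasBasePointer)
    return true; // SP offset is unknown and there is no BP: forced to use FP.
  // Either BP or SP is usable, so FP must both fit and be preferred.
  return FPOffsetFits && PreferFP;
}
```

// For example, an object 8 bytes below FP and 1000 bytes above SP selects FP,
// while one 1000 bytes below FP and 8 bytes above SP selects SP.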
1545
1547 // Do not set a kill flag on values that are also marked as live-in. This
1548 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1549 // callee saved registers.
1550 // Omitting the kill flags is conservatively correct even if the live-in
1551 // is not used after all.
1552 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1553 return getKillRegState(!IsLiveIn);
1554}
1555
1557 MachineFunction &MF) {
1558 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1559 AttributeList Attrs = MF.getFunction().getAttributes();
1561 return Subtarget.isTargetMachO() &&
1562 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1563 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1565 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1566}
1567
1568static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1569 unsigned SpillCount, unsigned Reg1,
1570 unsigned Reg2, bool NeedsWinCFI,
1571 const TargetRegisterInfo *TRI) {
1572 // If we are generating register pairs for a Windows function that requires
1573 // EH support, then pair consecutive registers only. There are no unwind
1574 // opcodes for saves/restores of non-consecutive register pairs.
1575 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1576 // save_lrpair.
1577 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1578
1579 if (Reg2 == AArch64::FP)
1580 return true;
1581 if (!NeedsWinCFI)
1582 return false;
1583
1584 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1585 // This is handled by only allowing paired spills for registers spilled at
1586 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1587 // 8-bytes). We carve out an exception for {FP,LR}, which does not require
1588 // 16-byte alignment in the uop representation.
1589 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1590 return SpillExtendedVolatile
1591 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1592 (SpillCount % 2) == 0)
1593 : false;
1594
1595 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1596 // opcode, which requires an odd-numbered first register (x19, x21, ...).
1597 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1598 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR)
1599 return false;
1600 return true;
1601}
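
// Illustrative sketch only, not part of the LLVM sources: the save_lrpair
// eligibility test from the function above, written against architectural
// register numbers (19 for x19) instead of AArch64:: enumerators. That the
// enumerators X19..X27 are consecutive is an assumption the original check
// also relies on. `canPairWithLR` is a hypothetical name.

```cpp
// Returns true if xN can be the first register of an {xN, lr} pair that the
// save_lrpair unwind opcode can describe: x19, x21, x23, x25 or x27.
bool canPairWithLR(unsigned XRegNum) {
  return XRegNum >= 19 && XRegNum <= 27 && (XRegNum - 19) % 2 == 0;
}
```

// Under this sketch x19 and x27 qualify, while x20 and x28 do not.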
1602
1603/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1604/// WindowsCFI requires that only consecutive registers can be paired.
1605/// LR and FP need to be allocated together when the frame needs to save
1606/// the frame-record. This means any other register pairing with LR is invalid.
1607static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1608 unsigned SpillCount, unsigned Reg1,
1609 unsigned Reg2, bool UsesWinAAPCS,
1610 bool NeedsWinCFI, bool NeedsFrameRecord,
1611 const TargetRegisterInfo *TRI) {
1612 if (UsesWinAAPCS)
1613 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1614 Reg1, Reg2, NeedsWinCFI, TRI);
1615
1616 // If we need to store the frame record, don't pair any register
1617 // with LR other than FP.
1618 if (NeedsFrameRecord)
1619 return Reg2 == AArch64::LR;
1620
1621 return false;
1622}
1623
1624namespace {
1625
1626struct RegPairInfo {
1627 Register Reg1;
1628 Register Reg2;
1629 int FrameIdx;
1630 int Offset;
1631 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1632 const TargetRegisterClass *RC;
1633
1634 RegPairInfo() = default;
1635
1636 bool isPaired() const { return Reg2.isValid(); }
1637
1638 bool isScalable() const { return Type == PPR || Type == ZPR; }
1639};
1640
1641} // end anonymous namespace
1642
1644 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1645 if (SavedRegs.test(PReg)) {
1646 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1647 return MCRegister(PNReg);
1648 }
1649 }
1650 return MCRegister();
1651}
1652
1653// The multivector LD/ST are available only for SME or SVE2p1 targets
1655 MachineFunction &MF) {
1657 return false;
1658
1659 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1660 bool IsLocallyStreaming =
1661 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1662
1663 // SME2 instructions can only be used safely when in streaming mode.
1664 // It is not safe to use SME2 instructions when in streaming compatible or
1665 // locally streaming mode.
1666 return Subtarget.hasSVE2p1() ||
1667 (Subtarget.hasSME2() &&
1668 (!IsLocallyStreaming && Subtarget.isStreaming()));
1669}
1670
1672 MachineFunction &MF,
1674 const TargetRegisterInfo *TRI,
1676 bool NeedsFrameRecord) {
1677
1678 if (CSI.empty())
1679 return;
1680
1681 bool IsWindows = isTargetWindows(MF);
1683 unsigned StackHazardSize = getStackHazardSize(MF);
1684 MachineFrameInfo &MFI = MF.getFrameInfo();
1686 unsigned Count = CSI.size();
1687 (void)CC;
1688 // MachO's compact unwind format relies on all registers being stored in
1689 // pairs.
1690 assert((!produceCompactUnwindFrame(AFL, MF) ||
1693 (Count & 1) == 0) &&
1694 "Odd number of callee-saved regs to spill!");
1695 int ByteOffset = AFI->getCalleeSavedStackSize();
1696 int StackFillDir = -1;
1697 int RegInc = 1;
1698 unsigned FirstReg = 0;
1699 if (IsWindows) {
1700 // For WinCFI, fill the stack from the bottom up.
1701 ByteOffset = 0;
1702 StackFillDir = 1;
1703 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1704 // backwards, to pair up registers starting from lower numbered registers.
1705 RegInc = -1;
1706 FirstReg = Count - 1;
1707 }
1708
1709 bool FPAfterSVECalleeSaves = AFL.hasSVECalleeSavesAboveFrameRecord(MF);
1710 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedural
1711 // scratch, x18 as platform reserved. However, clang has extended calling
1712 // conventions such as preserve_most and preserve_all which treat these as
1713 // CSR. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1714 // uOPs which have separate restrictions. We need to check for that.
1715 //
1716 // NOTE: we currently do not account for the D registers as LLVM does not
1717 // support non-ABI compliant D register spills.
1718 bool SpillExtendedVolatile =
1719 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1720 const auto &Reg = CSI.getReg();
1721 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1722 });
1723
1724 int ZPRByteOffset = 0;
1725 int PPRByteOffset = 0;
1726 bool SplitPPRs = AFI->hasSplitSVEObjects();
1727 if (SplitPPRs) {
1728 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1729 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1730 } else if (!FPAfterSVECalleeSaves) {
1731 ZPRByteOffset =
1733 // Unused: Everything goes in ZPR space.
1734 PPRByteOffset = 0;
1735 }
1736
1737 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1738 Register LastReg = 0;
1739 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1740
1741 auto AlignOffset = [StackFillDir](int Offset, int Align) {
1742 if (StackFillDir < 0)
1743 return alignDown(Offset, Align);
1744 return alignTo(Offset, Align);
1745 };
1746
1747 // When iterating backwards, the loop condition relies on unsigned wraparound.
1748 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1749 RegPairInfo RPI;
1750 RPI.Reg1 = CSI[i].getReg();
1751
1752 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1753 RPI.Type = RegPairInfo::GPR;
1754 RPI.RC = &AArch64::GPR64RegClass;
1755 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1756 RPI.Type = RegPairInfo::FPR64;
1757 RPI.RC = &AArch64::FPR64RegClass;
1758 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1759 RPI.Type = RegPairInfo::FPR128;
1760 RPI.RC = &AArch64::FPR128RegClass;
1761 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1762 RPI.Type = RegPairInfo::ZPR;
1763 RPI.RC = &AArch64::ZPRRegClass;
1764 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1765 RPI.Type = RegPairInfo::PPR;
1766 RPI.RC = &AArch64::PPRRegClass;
1767 } else if (RPI.Reg1 == AArch64::VG) {
1768 RPI.Type = RegPairInfo::VG;
1769 RPI.RC = &AArch64::FIXED_REGSRegClass;
1770 } else {
1771 llvm_unreachable("Unsupported register class.");
1772 }
1773
1774 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1775 ? PPRByteOffset
1776 : ZPRByteOffset;
1777
1778 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1779 if (HasCSHazardPadding &&
1780 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1782 ByteOffset += StackFillDir * StackHazardSize;
1783 LastReg = RPI.Reg1;
1784
1785 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1786 int Scale = TRI->getSpillSize(*RPI.RC);
1787 // Add the next reg to the pair if it is in the same register class.
1788 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1789 MCRegister NextReg = CSI[i + RegInc].getReg();
1790 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1791 switch (RPI.Type) {
1792 case RegPairInfo::GPR:
1793 if (AArch64::GPR64RegClass.contains(NextReg) &&
1794 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1795 RPI.Reg1, NextReg, IsWindows,
1796 NeedsWinCFI, NeedsFrameRecord, TRI))
1797 RPI.Reg2 = NextReg;
1798 break;
1799 case RegPairInfo::FPR64:
1800 if (AArch64::FPR64RegClass.contains(NextReg) &&
1801 !invalidateRegisterPairing(SpillExtendedVolatile, SpillCount,
1802 RPI.Reg1, NextReg, IsWindows,
1803 NeedsWinCFI, NeedsFrameRecord, TRI))
1804 RPI.Reg2 = NextReg;
1805 break;
1806 case RegPairInfo::FPR128:
1807 if (AArch64::FPR128RegClass.contains(NextReg))
1808 RPI.Reg2 = NextReg;
1809 break;
1810 case RegPairInfo::PPR:
1811 break;
1812 case RegPairInfo::ZPR:
1813 if (AFI->getPredicateRegForFillSpill() != 0 &&
1814 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1815 // Calculate offset of register pair to see if pair instruction can be
1816 // used.
1817 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1818 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1819 RPI.Reg2 = NextReg;
1820 }
1821 break;
1822 case RegPairInfo::VG:
1823 break;
1824 }
1825 }
1826
1827 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1828 // list to come in sorted by frame index so that we can issue the store
1829 // pair instructions directly. Assert if we see anything otherwise.
1830 //
1831 // The order of the registers in the list is controlled by
1832 // getCalleeSavedRegs(), so they will always be in-order, as well.
1833 assert((!RPI.isPaired() ||
1834 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1835 "Out of order callee saved regs!");
1836
1837 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1838 RPI.Reg1 == AArch64::LR) &&
1839 "FrameRecord must be allocated together with LR");
1840
1841 // Windows AAPCS has FP and LR reversed.
1842 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1843 RPI.Reg2 == AArch64::LR) &&
1844 "FrameRecord must be allocated together with LR");
1845
1846 // MachO's compact unwind format relies on all registers being stored in
1847 // adjacent register pairs.
1848 assert((!produceCompactUnwindFrame(AFL, MF) ||
1851 (RPI.isPaired() &&
1852 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1853 RPI.Reg1 + 1 == RPI.Reg2))) &&
1854 "Callee-save registers not saved as adjacent register pair!");
1855
1856 RPI.FrameIdx = CSI[i].getFrameIdx();
1857 if (IsWindows &&
1858 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1859 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1860
1861 // Realign the scalable offset if necessary. This is relevant when spilling
1862 // predicates on Windows.
1863 if (RPI.isScalable() && ScalableByteOffset % Scale != 0)
1864 ScalableByteOffset = AlignOffset(ScalableByteOffset, Scale);
1865
1866 // Realign the fixed offset if necessary. This is relevant when spilling Q
1867 // registers after spilling an odd number of X registers.
1868 if (!RPI.isScalable() && ByteOffset % Scale != 0)
1869 ByteOffset = AlignOffset(ByteOffset, Scale);
1870
1871 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1872 assert(OffsetPre % Scale == 0);
1873
1874 if (RPI.isScalable())
1875 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1876 else
1877 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1878
1879 // Swift's async context is directly before FP, so allocate an extra
1880 // 8 bytes for it.
1881 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1882 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1883 (IsWindows && RPI.Reg2 == AArch64::LR)))
1884 ByteOffset += StackFillDir * 8;
1885
1886 // Round up size of non-pair to pair size if we need to pad the
1887 // callee-save area to ensure 16-byte alignment.
1888 if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
1889 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1890 ByteOffset % 16 != 0) {
1891 ByteOffset += 8 * StackFillDir;
1892 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1893 // A stack frame with a gap looks like this, bottom up:
1894 // d9, d8. x21, gap, x20, x19.
1895 // Set extra alignment on the x21 object to create the gap above it.
1896 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1897 NeedGapToAlignStack = false;
1898 }
1899
1900 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1901 assert(OffsetPost % Scale == 0);
1902 // If filling top down (default), we want the offset after incrementing it.
1903 // If filling bottom up (WinCFI) we need the original offset.
1904 int Offset = IsWindows ? OffsetPre : OffsetPost;
1905
1906 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1907 // Swift context can directly precede FP.
1908 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1909 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1910 (IsWindows && RPI.Reg2 == AArch64::LR)))
1911 Offset += 8;
1912 RPI.Offset = Offset / Scale;
1913
1914 assert((!RPI.isPaired() ||
1915 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1916 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1917 "Offset out of bounds for LDP/STP immediate");
1918
1919 auto isFrameRecord = [&] {
1920 if (RPI.isPaired())
1921 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1922 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1923 // Otherwise, look for the frame record as two unpaired registers. This is
1924 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1925 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1926 // On Windows, this check works out as current reg == FP, next reg == LR,
1927 // and on other platforms current reg == FP, previous reg == LR. This
1928 // works out as the correct pre-increment or post-increment offsets
1929 // respectively.
1930 return i > 0 && RPI.Reg1 == AArch64::FP &&
1931 CSI[i - 1].getReg() == AArch64::LR;
1932 };
1933
1934 // Save the offset to frame record so that the FP register can point to the
1935 // innermost frame record (spilled FP and LR registers).
1936 if (NeedsFrameRecord && isFrameRecord())
1938
1939 RegPairs.push_back(RPI);
1940 if (RPI.isPaired())
1941 i += RegInc;
1942 }
1943 if (IsWindows) {
1944 // If we need an alignment gap in the stack, align the topmost stack
1945 // object. A stack frame with a gap looks like this, bottom up:
1946 // x19, d8. d9, gap.
1947 // Set extra alignment on the topmost stack object (the first element in
1948 // CSI, which goes top down), to create the gap above it.
1949 if (AFI->hasCalleeSaveStackFreeSpace())
1950 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1951 // We iterated bottom up over the registers; flip RegPairs back to top
1952 // down order.
1953 std::reverse(RegPairs.begin(), RegPairs.end());
1954 }
1955}
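// Illustrative sketch (not part of the LLVM sources): the ZPR case above only
// forms a register pair when a predicate register is available, the first
// register has an even Z index, the next register is its direct successor,
// and the resulting slot offset is an even value in [-16, 14]. The helper
// name canPairZPR and its plain-integer parameters are assumptions made for
// this example; the real code works on MCRegister values and RegPairInfo.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the ZPR pairing check.
// zIndex1/zIndex2 are Z register numbers (0 for z0, 1 for z1, ...).
bool canPairZPR(bool hasPredicateReg, unsigned zIndex1, unsigned zIndex2,
                int scalableByteOffset, int stackFillDir, int scale) {
  assert(scale > 0 && "spill size must be positive");
  if (!hasPredicateReg)
    return false;
  // The pair must start at an even register and be consecutive.
  if ((zIndex1 & 1) != 0 || zIndex2 != zIndex1 + 1)
    return false;
  // Offset of the pair, measured in units of one spill slot.
  int offset = (scalableByteOffset + stackFillDir * 2 * scale) / scale;
  return -16 <= offset && offset <= 14 && offset % 2 == 0;
}
```

// With a 16-byte scale and a downward fill direction (-1), starting from a
// scalable offset of 0 yields a slot offset of -2, which is even and in
// range, so z8/z9 could be paired under these assumptions.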
1956
1960 MachineFunction &MF = *MBB.getParent();
1961 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1962 auto &TLI = *Subtarget.getTargetLowering();
1963 const AArch64InstrInfo &TII = *Subtarget.getInstrInfo();
1964 bool NeedsWinCFI = needsWinCFI(MF);
1965 DebugLoc DL;
1967
1968 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1969
1970 MachineRegisterInfo &MRI = MF.getRegInfo();
1971 // Refresh the reserved regs in case there are any potential changes since the
1972 // last freeze.
1973 MRI.freezeReservedRegs();
1974
1975 if (homogeneousPrologEpilog(MF)) {
1976 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1978
1979 for (auto &RPI : RegPairs) {
1980 MIB.addReg(RPI.Reg1);
1981 MIB.addReg(RPI.Reg2);
1982
1983 // Update register live in.
1984 if (!MRI.isReserved(RPI.Reg1))
1985 MBB.addLiveIn(RPI.Reg1);
1986 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1987 MBB.addLiveIn(RPI.Reg2);
1988 }
1989 return true;
1990 }
1991 bool PTrueCreated = false;
1992 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1993 Register Reg1 = RPI.Reg1;
1994 Register Reg2 = RPI.Reg2;
1995 unsigned StrOpc;
1996
1997 // Issue sequence of spills for cs regs. The first spill may be converted
1998 // to a pre-decrement store later by emitPrologue if the callee-save stack
1999 // area allocation can't be combined with the local stack area allocation.
2000 // For example:
2001 // stp x22, x21, [sp, #0] // addImm(+0)
2002 // stp x20, x19, [sp, #16] // addImm(+2)
2003 // stp fp, lr, [sp, #32] // addImm(+4)
2004 // Rationale: This sequence saves uop updates compared to a sequence of
2005 // pre-decrement spills like stp xi,xj,[sp,#-16]!
2006 // Note: Similar rationale and sequence for restores in epilog.
2007 unsigned Size = TRI->getSpillSize(*RPI.RC);
2008 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2009 switch (RPI.Type) {
2010 case RegPairInfo::GPR:
2011 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2012 break;
2013 case RegPairInfo::FPR64:
2014 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2015 break;
2016 case RegPairInfo::FPR128:
2017 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2018 break;
2019 case RegPairInfo::ZPR:
2020 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2021 break;
2022 case RegPairInfo::PPR:
2023 StrOpc = AArch64::STR_PXI;
2024 break;
2025 case RegPairInfo::VG:
2026 StrOpc = AArch64::STRXui;
2027 break;
2028 }
2029
2030 Register X0Scratch;
2031 llvm::scope_exit RestoreX0([&] {
2032 if (X0Scratch != AArch64::NoRegister)
2033 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2034 .addReg(X0Scratch)
2036 });
2037
2038 if (Reg1 == AArch64::VG) {
2039 // Find an available register to hold the value of VG.
2040 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2041 assert(Reg1 != AArch64::NoRegister);
2042 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2043 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2044 .addImm(31)
2045 .addImm(1)
2047 } else {
2049 if (any_of(MBB.liveins(),
2050 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2051 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2052 AArch64::X0, LiveIn.PhysReg);
2053 })) {
2054 X0Scratch = Reg1;
2055 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2056 .addReg(AArch64::X0)
2058 }
2059
2060 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2061 const uint32_t *RegMask =
2062 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2063 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2064 .addExternalSymbol(TLI.getLibcallName(LC))
2065 .addRegMask(RegMask)
2066 .addReg(AArch64::X0, RegState::ImplicitDefine)
2068 Reg1 = AArch64::X0;
2069 }
2070 }
2071
2072 LLVM_DEBUG({
2073 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2074 if (RPI.isPaired())
2075 dbgs() << ", " << printReg(Reg2, TRI);
2076 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2077 if (RPI.isPaired())
2078 dbgs() << ", " << RPI.FrameIdx + 1;
2079 dbgs() << ")\n";
2080 });
2081
2082 assert((!isTargetWindows(MF) ||
2083 !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2084 "Windows unwinding requires a consecutive (FP,LR) pair");
2085 // Windows unwind codes require consecutive registers if registers are
2086 // paired. Make the switch here, so that the code below will save (x,x+1)
2087 // and not (x+1,x).
2088 unsigned FrameIdxReg1 = RPI.FrameIdx;
2089 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2090 if (isTargetWindows(MF) && RPI.isPaired()) {
2091 std::swap(Reg1, Reg2);
2092 std::swap(FrameIdxReg1, FrameIdxReg2);
2093 }
2094
2095 if (RPI.isPaired() && RPI.isScalable()) {
2096 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2099 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2100 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2101 "Expects SVE2.1 or SME2 target and a predicate register");
2102#ifdef EXPENSIVE_CHECKS
2103 auto IsPPR = [](const RegPairInfo &c) {
2104 return c.Type == RegPairInfo::PPR;
2105 };
2106 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2107 auto IsZPR = [](const RegPairInfo &c) {
2108 return c.Type == RegPairInfo::ZPR;
2109 };
2110 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2111 assert(!(PPRBegin < ZPRBegin) &&
2112 "Expected callee save predicate to be handled first");
2113#endif
2114 if (!PTrueCreated) {
2115 PTrueCreated = true;
2116 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2118 }
2119 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2120 if (!MRI.isReserved(Reg1))
2121 MBB.addLiveIn(Reg1);
2122 if (!MRI.isReserved(Reg2))
2123 MBB.addLiveIn(Reg2);
2124 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2126 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2127 MachineMemOperand::MOStore, Size, Alignment));
2128 MIB.addReg(PnReg);
2129 MIB.addReg(AArch64::SP)
2130 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2131 // where 2*vscale is implicit
2134 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2135 MachineMemOperand::MOStore, Size, Alignment));
2136 if (NeedsWinCFI)
2137 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2138 } else { // Not a ZPR pair: emit a single or regular paired store.
2139 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2140 if (!MRI.isReserved(Reg1))
2141 MBB.addLiveIn(Reg1);
2142 if (RPI.isPaired()) {
2143 if (!MRI.isReserved(Reg2))
2144 MBB.addLiveIn(Reg2);
2145 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2147 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2148 MachineMemOperand::MOStore, Size, Alignment));
2149 }
2150 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2151 .addReg(AArch64::SP)
2152 .addImm(RPI.Offset) // [sp, #offset*vscale],
2153 // where factor*vscale is implicit
2156 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2157 MachineMemOperand::MOStore, Size, Alignment));
2158 if (NeedsWinCFI)
2159 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2160 }
2161 // Update the StackIDs of the SVE stack slots.
2162 MachineFrameInfo &MFI = MF.getFrameInfo();
2163 if (RPI.Type == RegPairInfo::ZPR) {
2164 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2165 if (RPI.isPaired())
2166 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2167 } else if (RPI.Type == RegPairInfo::PPR) {
2169 if (RPI.isPaired())
2171 }
2172 }
2173 return true;
2174}
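// Illustrative sketch (not part of the LLVM sources): the stp example in the
// comment above pairs byte offsets #0, #16 and #32 with addImm(+0), (+2) and
// (+4), because the immediate is the byte offset divided by the spill size.
// The helper names below are assumptions for this example; the [-64, 63]
// bound is the one asserted for non-scalable pairs in
// computeCalleeSaveRegisterPairs.

```cpp
#include <cassert>

// Hypothetical helper: convert a byte offset from SP into the scaled
// immediate, where scale is the spill size of one register in bytes.
int scaledImmediate(int byteOffset, int scale) {
  assert(scale > 0 && byteOffset % scale == 0 && "offset must be aligned");
  return byteOffset / scale;
}

// Range asserted in the source for paired, non-scalable spills.
bool fitsPairedImmediate(int imm) { return imm >= -64 && imm <= 63; }
```

// For 8-byte X registers, a pair stored at [sp, #32] therefore carries the
// immediate 4, which is well inside the asserted range.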
2175
2179 MachineFunction &MF = *MBB.getParent();
2180 const AArch64InstrInfo &TII =
2181 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
2182 DebugLoc DL;
2184 bool NeedsWinCFI = needsWinCFI(MF);
2185
2186 if (MBBI != MBB.end())
2187 DL = MBBI->getDebugLoc();
2188
2189 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2190 if (homogeneousPrologEpilog(MF, &MBB)) {
2191 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2193 for (auto &RPI : RegPairs) {
2194 MIB.addReg(RPI.Reg1, RegState::Define);
2195 MIB.addReg(RPI.Reg2, RegState::Define);
2196 }
2197 return true;
2198 }
2199
2200 // For performance reasons, restore SVE registers in increasing order.
2201 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2202 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2203 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2204 std::reverse(PPRBegin, PPREnd);
2205 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2206 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2207 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2208 std::reverse(ZPRBegin, ZPREnd);
2209
2210 bool PTrueCreated = false;
2211 for (const RegPairInfo &RPI : RegPairs) {
2212 Register Reg1 = RPI.Reg1;
2213 Register Reg2 = RPI.Reg2;
2214
2215 // Issue sequence of restores for cs regs. The last restore may be converted
2216 // to a post-increment load later by emitEpilogue if the callee-save stack
2217 // area allocation can't be combined with the local stack area allocation.
2218 // For example:
2219 // ldp fp, lr, [sp, #32] // addImm(+4)
2220 // ldp x20, x19, [sp, #16] // addImm(+2)
2221 // ldp x22, x21, [sp, #0] // addImm(+0)
2222 // Note: see comment in spillCalleeSavedRegisters()
2223 unsigned LdrOpc;
2224 unsigned Size = TRI->getSpillSize(*RPI.RC);
2225 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2226 switch (RPI.Type) {
2227 case RegPairInfo::GPR:
2228 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2229 break;
2230 case RegPairInfo::FPR64:
2231 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2232 break;
2233 case RegPairInfo::FPR128:
2234 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2235 break;
2236 case RegPairInfo::ZPR:
2237 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2238 break;
2239 case RegPairInfo::PPR:
2240 LdrOpc = AArch64::LDR_PXI;
2241 break;
2242 case RegPairInfo::VG:
2243 continue;
2244 }
2245 LLVM_DEBUG({
2246 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2247 if (RPI.isPaired())
2248 dbgs() << ", " << printReg(Reg2, TRI);
2249 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2250 if (RPI.isPaired())
2251 dbgs() << ", " << RPI.FrameIdx + 1;
2252 dbgs() << ")\n";
2253 });
2254
2255 // Windows unwind codes require consecutive registers if registers are
2256 // paired. Make the switch here, so that the code below will restore (x,x+1)
2257 // and not (x+1,x).
2258 unsigned FrameIdxReg1 = RPI.FrameIdx;
2259 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2260 if (isTargetWindows(MF) && RPI.isPaired()) {
2261 std::swap(Reg1, Reg2);
2262 std::swap(FrameIdxReg1, FrameIdxReg2);
2263 }
2264
2266 if (RPI.isPaired() && RPI.isScalable()) {
2267 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2269 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2270 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2271 "Expects SVE2.1 or SME2 target and a predicate register");
2272#ifdef EXPENSIVE_CHECKS
2273 assert(!(PPRBegin < ZPRBegin) &&
2274 "Expected callee save predicate to be handled first");
2275#endif
2276 if (!PTrueCreated) {
2277 PTrueCreated = true;
2278 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2280 }
2281 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2282 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2283 getDefRegState(true));
2285 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2286 MachineMemOperand::MOLoad, Size, Alignment));
2287 MIB.addReg(PnReg);
2288 MIB.addReg(AArch64::SP)
2289 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2290 // where 2*vscale is implicit
2293 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2294 MachineMemOperand::MOLoad, Size, Alignment));
2295 if (NeedsWinCFI)
2296 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2297 } else {
2298 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2299 if (RPI.isPaired()) {
2300 MIB.addReg(Reg2, getDefRegState(true));
2302 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2303 MachineMemOperand::MOLoad, Size, Alignment));
2304 }
2305 MIB.addReg(Reg1, getDefRegState(true));
2306 MIB.addReg(AArch64::SP)
2307 .addImm(RPI.Offset) // [sp, #offset*vscale]
2308 // where factor*vscale is implicit
2311 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2312 MachineMemOperand::MOLoad, Size, Alignment));
2313 if (NeedsWinCFI)
2314 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2315 }
2316 }
2317 return true;
2318}
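// Illustrative sketch (not part of the LLVM sources): the restore path above
// reverses the contiguous run of PPR entries and the contiguous run of ZPR
// entries in place, leaving other entries where they are. The Pair struct,
// its id field and the reverseRun helper are assumptions for this example;
// the real code operates on RegPairInfo.

```cpp
#include <algorithm>
#include <vector>

enum class RegType { GPR, FPR64, ZPR, PPR };

struct Pair {
  RegType type;
  int id; // hypothetical identifier, used only to observe the order
};

// Reverse the first contiguous run of entries whose type equals `type`.
// If no entry matches, both iterators equal end() and nothing changes.
void reverseRun(std::vector<Pair> &pairs, RegType type) {
  auto isType = [type](const Pair &p) { return p.type == type; };
  auto begin = std::find_if(pairs.begin(), pairs.end(), isType);
  auto end = std::find_if_not(begin, pairs.end(), isType);
  std::reverse(begin, end);
}
```

// Calling reverseRun once for PPR and once for ZPR mirrors the two
// find_if / find_if_not / reverse sequences used before the restore loop.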
2319
2320 // Return the FrameID for an MMO.
2321static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2322 const MachineFrameInfo &MFI) {
2323 auto *PSV =
2325 if (PSV)
2326 return std::optional<int>(PSV->getFrameIndex());
2327
2328 if (MMO->getValue()) {
2329 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2330 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2331 FI++)
2332 if (MFI.getObjectAllocation(FI) == Al)
2333 return FI;
2334 }
2335 }
2336
2337 return std::nullopt;
2338}
2339
2340// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2341static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2342 const MachineFrameInfo &MFI) {
2343 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2344 return std::nullopt;
2345
2346 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2347}
2348
2349// Returns true if the LDST MachineInstr \p MI is a PPR access.
2350static bool isPPRAccess(const MachineInstr &MI) {
2351 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2352}
2353
2354// Check if a Hazard slot is needed for the current function, and if so create
2355// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2356// which can be used to determine if any hazard padding is needed.
2357void AArch64FrameLowering::determineStackHazardSlot(
2358 MachineFunction &MF, BitVector &SavedRegs) const {
2359 unsigned StackHazardSize = getStackHazardSize(MF);
2360 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2361 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2363 return;
2364
2365 // Stack hazards are only needed in streaming functions.
2366 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2367 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2368 return;
2369
2370 MachineFrameInfo &MFI = MF.getFrameInfo();
2371
2372 // Add a hazard slot if there are any FPR CSRs or any FPR-only stack
2373 // objects.
2374 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2375 return AArch64::FPR64RegClass.contains(Reg) ||
2376 AArch64::FPR128RegClass.contains(Reg) ||
2377 AArch64::ZPRRegClass.contains(Reg);
2378 });
2379 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2380 return AArch64::PPRRegClass.contains(Reg);
2381 });
2382 bool HasFPRStackObjects = false;
2383 bool HasPPRStackObjects = false;
2384 if (!HasFPRCSRs || SplitSVEObjects) {
2385 enum SlotType : uint8_t {
2386 Unknown = 0,
2387 ZPRorFPR = 1 << 0,
2388 PPR = 1 << 1,
2389 GPR = 1 << 2,
2391 };
2392
2393 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2394 // based on the kinds of accesses used in the function.
2395 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2396 for (auto &MBB : MF) {
2397 for (auto &MI : MBB) {
2398 std::optional<int> FI = getLdStFrameID(MI, MFI);
2399 if (!FI || *FI < 0 || *FI >= int(SlotTypes.size()))
2400 continue;
2401 if (MFI.hasScalableStackID(*FI)) {
2402 SlotTypes[*FI] |=
2403 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2404 } else {
2405 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2406 ? SlotType::ZPRorFPR
2407 : SlotType::GPR;
2408 }
2409 }
2410 }
2411
2412 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2413 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2414 // For SplitSVEObjects, remember that this stack slot is a predicate; this
2415 // will be needed later when determining the frame layout.
2416 if (SlotTypes[FI] == SlotType::PPR) {
2418 HasPPRStackObjects = true;
2419 }
2420 }
2421 }
2422
2423 if (HasFPRCSRs || HasFPRStackObjects) {
2424 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2425 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2426 << StackHazardSize << "\n");
2428 }
2429
2430 if (!AFI->hasStackHazardSlotIndex())
2431 return;
2432
2433 if (SplitSVEObjects) {
2434 CallingConv::ID CC = MF.getFunction().getCallingConv();
2435 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2436 AFI->setSplitSVEObjects(true);
2437 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2438 return;
2439 }
2440
2441 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2442 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2443 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2444 "non-SVE CC function...\n");
2445
2446 // If another calling convention is explicitly set FPRs can't be promoted to
2447 // ZPR callee-saves.
2449 LLVM_DEBUG(
2450 dbgs()
2451 << "Calling convention is not supported with SplitSVEObjects\n");
2452 return;
2453 }
2454
2455 if (!HasPPRCSRs && !HasPPRStackObjects) {
2456 LLVM_DEBUG(
2457 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2458 return;
2459 }
2460
2461 if (!HasFPRCSRs && !HasFPRStackObjects) {
2462 LLVM_DEBUG(
2463 dbgs()
2464 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2465 return;
2466 }
2467
2468 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2469 MF.getSubtarget<AArch64Subtarget>();
2471 "Expected SVE to be available for PPRs");
2472
2473 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2474 // With SplitSVEObjects the CS hazard padding is placed between the
2475 // PPRs and ZPRs. If there are any FPR CS there would be a hazard between
2476 // them and the CS GPRs. Avoid this by promoting all FPR CS to ZPRs.
2477 BitVector FPRZRegs(SavedRegs.size());
2478 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2479 BitVector::reference RegBit = SavedRegs[Reg];
2480 if (!RegBit)
2481 continue;
2482 unsigned SubRegIdx = 0;
2483 if (AArch64::FPR64RegClass.contains(Reg))
2484 SubRegIdx = AArch64::dsub;
2485 else if (AArch64::FPR128RegClass.contains(Reg))
2486 SubRegIdx = AArch64::zsub;
2487 else
2488 continue;
2489 // Clear the bit for the FPR save.
2490 RegBit = false;
2491 // Mark that we should save the corresponding ZPR.
2492 Register ZReg =
2493 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2494 FPRZRegs.set(ZReg);
2495 }
2496 SavedRegs |= FPRZRegs;
2497
2498 AFI->setSplitSVEObjects(true);
2499 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2500 }
2501}
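// Illustrative sketch (not part of the LLVM sources): determineStackHazardSlot
// classifies each stack slot by OR-ing together the kinds of access that
// touch it, then treats a slot as FPR-only when ZPRorFPR is the only bit set.
// The Access struct and both helper functions are assumptions for this
// example; the real code derives the slot and kind from MachineInstr memory
// operands.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit flags mirroring the kinds of access tracked above.
enum SlotType : std::uint8_t {
  Unknown = 0,
  ZPRorFPR = 1 << 0,
  PPR = 1 << 1,
  GPR = 1 << 2,
};

struct Access {
  int frameIndex; // hypothetical: stack slot touched by a load or store
  SlotType kind;  // kind of register used by that access
};

// OR together the kinds of every access to each slot.
std::vector<std::uint8_t> classifySlots(std::size_t numSlots,
                                        const std::vector<Access> &accesses) {
  std::vector<std::uint8_t> slots(numSlots, Unknown);
  for (const Access &a : accesses) {
    if (a.frameIndex < 0 || static_cast<std::size_t>(a.frameIndex) >= numSlots)
      continue; // ignore indices outside the tracked range
    slots[a.frameIndex] |= a.kind;
  }
  return slots;
}

// A slot is FPR-only when ZPRorFPR is the only bit that was ever set.
bool hasFPROnlySlot(const std::vector<std::uint8_t> &slots) {
  for (std::uint8_t s : slots)
    if (s == ZPRorFPR)
      return true;
  return false;
}
```

// A slot accessed by both FPR and GPR instructions ends up with two bits set
// and so does not count as FPR-only, which is why the comparison uses
// equality rather than a bit test.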
2502
2504 BitVector &SavedRegs,
2505 RegScavenger *RS) const {
2506 // All calls are tail calls in GHC calling conv, and functions have no
2507 // prologue/epilogue.
2509 return;
2510
2511 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2512
2514 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2516 unsigned UnspilledCSGPR = AArch64::NoRegister;
2517 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2518
2519 MachineFrameInfo &MFI = MF.getFrameInfo();
2520 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2521
2522 MCRegister BasePointerReg =
2523 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2524
2525 unsigned ExtraCSSpill = 0;
2526 bool HasUnpairedGPR64 = false;
2527 bool HasPairZReg = false;
2528 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2529 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2530
2531 // Figure out which callee-saved registers to save/restore.
2532 for (unsigned i = 0; CSRegs[i]; ++i) {
2533 const MCRegister Reg = CSRegs[i];
2534
2535 // Add the base pointer register to SavedRegs if it is callee-save.
2536 if (Reg == BasePointerReg)
2537 SavedRegs.set(Reg);
2538
2539 // Don't save manually reserved registers set through +reserve-x#i,
2540 // even for callee-saved registers, as per GCC's behavior.
2541 if (UserReservedRegs[Reg]) {
2542 SavedRegs.reset(Reg);
2543 continue;
2544 }
2545
2546 bool RegUsed = SavedRegs.test(Reg);
2547 MCRegister PairedReg;
2548 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2549 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2550 AArch64::FPR128RegClass.contains(Reg)) {
2551 // Compensate for odd numbers of GP CSRs.
2552 // For now, all the known cases of odd number of CSRs are of GPRs.
2553 if (HasUnpairedGPR64)
2554 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2555 else
2556 PairedReg = CSRegs[i ^ 1];
2557 }
2558
2559 // If the function needs to save all the GP registers (SavedRegs) and
2560 // there is an odd number of GP CSRs (CSRegs), PairedReg could be in a
2561 // different register class from Reg, which would lead to an FPR (usually
2562 // D8) accidentally being marked saved.
2563 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2564 PairedReg = AArch64::NoRegister;
2565 HasUnpairedGPR64 = true;
2566 }
2567 assert(PairedReg == AArch64::NoRegister ||
2568 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2569 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2570 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2571
2572 if (!RegUsed) {
2573 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2574 UnspilledCSGPR = Reg;
2575 UnspilledCSGPRPaired = PairedReg;
2576 }
2577 continue;
2578 }
2579
2580 // MachO's compact unwind format relies on all registers being stored in
2581 // pairs.
2582 // FIXME: the usual format is actually better if unwinding isn't needed.
2583 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2584 !SavedRegs.test(PairedReg)) {
2585 SavedRegs.set(PairedReg);
2586 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2587 !ReservedRegs[PairedReg])
2588 ExtraCSSpill = PairedReg;
2589 }
2590 // Check if there is a pair of ZRegs, so it can select PReg for spill/fill
2591 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2592 SavedRegs.test(CSRegs[i ^ 1]));
2593 }
2594
2595 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2597 // Find a suitable predicate register for the multi-vector spill/fill
2598 // instructions.
2599 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2600 if (PnReg.isValid())
2601 AFI->setPredicateRegForFillSpill(PnReg);
2602 // If no free callee-save has been found assign one.
2603 if (!AFI->getPredicateRegForFillSpill() &&
2604 MF.getFunction().getCallingConv() ==
2606 SavedRegs.set(AArch64::P8);
2607 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2608 }
2609
2610 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2611 "Predicate cannot be a reserved register");
2612 }
2613
2615 !Subtarget.isTargetWindows()) {
2616 // For Windows calling convention on a non-windows OS, where X18 is treated
2617 // as reserved, back up X18 when entering non-windows code (marked with the
2618 // Windows calling convention) and restore when returning regardless of
2619 // whether the individual function uses it - it might call other functions
2620 // that clobber it.
2621 SavedRegs.set(AArch64::X18);
2622 }
2623
2624 // Determine if a Hazard slot should be used and where it should go.
2625 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2626 // and ZPRs. Otherwise, it goes in the callee save area.
2627 determineStackHazardSlot(MF, SavedRegs);
2628
2629 // Calculates the callee saved stack size.
2630 unsigned CSStackSize = 0;
2631 unsigned ZPRCSStackSize = 0;
2632 unsigned PPRCSStackSize = 0;
2634 for (unsigned Reg : SavedRegs.set_bits()) {
2635 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2636 assert(RC && "expected register class!");
2637 auto SpillSize = TRI->getSpillSize(*RC);
2638 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2639 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2640 if (IsZPR)
2641 ZPRCSStackSize += SpillSize;
2642 else if (IsPPR)
2643 PPRCSStackSize += SpillSize;
2644 else
2645 CSStackSize += SpillSize;
2646 }
2647
2648 // Save number of saved regs, so we can easily update CSStackSize later to
2649 // account for any additional 64-bit GPR saves. Note: After this point
2650 // only 64-bit GPRs can be added to SavedRegs.
2651 unsigned NumSavedRegs = SavedRegs.count();
2652
2653 // If we have hazard padding in the CS area add that to the size.
2655 CSStackSize += getStackHazardSize(MF);
2656
2657 // Increase the callee-saved stack size if the function has streaming mode
2658 // changes, as we will need to spill the value of the VG register.
2659 if (requiresSaveVG(MF))
2660 CSStackSize += 8;
2661
2662 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2663 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2664 SavedRegs.set(AArch64::LR);
2665
2666 // The frame record needs to be created by saving the appropriate registers
2667 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2668 if (hasFP(MF) ||
2669 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2670 SavedRegs.set(AArch64::FP);
2671 SavedRegs.set(AArch64::LR);
2672 }
2673
2674 LLVM_DEBUG({
2675 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2676 for (unsigned Reg : SavedRegs.set_bits())
2677 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2678 dbgs() << "\n";
2679 });
2680
2681 // If any callee-saved registers are used, the frame cannot be eliminated.
2682 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2684 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2685 uint64_t SVEStackSize =
2686 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2687 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2688
2689 // The CSR spill slots have not been allocated yet, so estimateStackSize
2690 // won't include them.
2691 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2692
2693 // We may address some of the stack above the canonical frame address, either
2694 // for our own arguments or during a call. Include that in calculating whether
2695 // we have complicated addressing concerns.
2696 int64_t CalleeStackUsed = 0;
2697 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2698 int64_t FixedOff = MFI.getObjectOffset(I);
2699 if (FixedOff > CalleeStackUsed)
2700 CalleeStackUsed = FixedOff;
2701 }
2702
2703 // Conservatively always assume BigStack when there are SVE spills.
2704 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2705 CalleeStackUsed) > EstimatedStackSizeLimit;
2706 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2707 AFI->setHasStackFrame(true);
2708
2709 // Estimate if we might need to scavenge a register at some point in order
2710 // to materialize a stack offset. If so, either spill one additional
2711 // callee-saved register or reserve a special spill slot to facilitate
2712 // register scavenging. If we already spilled an extra callee-saved register
2713 // above to keep the number of spills even, we don't need to do anything else
2714 // here.
2715 if (BigStack) {
2716 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2717 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2718 << " to get a scratch register.\n");
2719 SavedRegs.set(UnspilledCSGPR);
2720 ExtraCSSpill = UnspilledCSGPR;
2721
2722 // MachO's compact unwind format relies on all registers being stored in
2723 // pairs, so if we need to spill one extra for BigStack, then we need to
2724 // store the pair.
2725 if (producePairRegisters(MF)) {
2726 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2727 // Failed to make a pair for compact unwind format, revert spilling.
2728 if (produceCompactUnwindFrame(*this, MF)) {
2729 SavedRegs.reset(UnspilledCSGPR);
2730 ExtraCSSpill = AArch64::NoRegister;
2731 }
2732 } else
2733 SavedRegs.set(UnspilledCSGPRPaired);
2734 }
2735 }
2736
2737 // If we didn't find an extra callee-saved register to spill, create
2738 // an emergency spill slot.
2739 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2741 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2742 unsigned Size = TRI->getSpillSize(RC);
2743 Align Alignment = TRI->getSpillAlign(RC);
2744 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2745 RS->addScavengingFrameIndex(FI);
2746 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2747 << " as the emergency spill slot.\n");
2748 }
2749 }
2750
2751 // Add the size of the additional 64-bit GPR saves.
2752 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2753
2754 // A Swift asynchronous context extends the frame record with a pointer
2755 // directly before FP.
2756 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2757 CSStackSize += 8;
2758
2759 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2760 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2761 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2762
2764 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2765 "Should not invalidate callee saved info");
2766
2767 // Round up to register pair alignment to avoid additional SP adjustment
2768 // instructions.
2769 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2770 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2771 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2772}
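// Illustrative sketch (not part of the LLVM source): the final rounding step
// above can be reproduced in isolation. The names alignUp, CalleeSaveArea and
// finalizeCalleeSaveArea are hypothetical stand-ins for llvm::alignTo and the
// two AArch64FunctionInfo setters, and the alignment is assumed to be a power
// of two.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo; Align must be a power of two.
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

struct CalleeSaveArea {
  uint64_t AlignedSize; // value that setCalleeSavedStackSize would receive
  bool HasFreeSpace;    // value that setCalleeSaveStackHasFreeSpace would receive
};

// Round the callee-save area up to 16 bytes and report whether the rounding
// left an unused 8-byte slot behind.
constexpr CalleeSaveArea finalizeCalleeSaveArea(uint64_t CSStackSize) {
  uint64_t Aligned = alignUp(CSStackSize, 16);
  return {Aligned, Aligned != CSStackSize};
}

// Five 8-byte GPR saves occupy 40 bytes, so 8 bytes of padding remain.
static_assert(finalizeCalleeSaveArea(40).AlignedSize == 48, "");
static_assert(finalizeCalleeSaveArea(40).HasFreeSpace, "");
// An even number of 8-byte saves leaves no padding.
static_assert(!finalizeCalleeSaveArea(32).HasFreeSpace, "");
```

// The padding slot reported here is the "free space" that the scavenging-slot
// query further down checks via hasCalleeSaveStackFreeSpace().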
2773
2775 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2776 std::vector<CalleeSavedInfo> &CSI) const {
2777 bool IsWindows = isTargetWindows(MF);
2778 unsigned StackHazardSize = getStackHazardSize(MF);
2779 // To match the canonical windows frame layout, reverse the list of
2780 // callee saved registers to get them laid out by PrologEpilogInserter
2781 // in the right order. (PrologEpilogInserter allocates stack objects top
2782 // down. Windows canonical prologs store higher numbered registers at
2783 // the top, thus have the CSI array start from the highest registers.)
2784 if (IsWindows)
2785 std::reverse(CSI.begin(), CSI.end());
2786
2787 if (CSI.empty())
2788 return true; // Early exit if no callee saved registers are modified!
2789
2790 // Now that we know which registers need to be saved and restored, allocate
2791 // stack slots for them.
2792 MachineFrameInfo &MFI = MF.getFrameInfo();
2793 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2794
2795 if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2796 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2797 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2798 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2799 }
2800
2801 // Insert VG into the list of CSRs, immediately before LR if saved.
2802 if (requiresSaveVG(MF)) {
2803 CalleeSavedInfo VGInfo(AArch64::VG);
2804 auto It =
2805 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2806 if (It != CSI.end())
2807 CSI.insert(It, VGInfo);
2808 else
2809 CSI.push_back(VGInfo);
2810 }
2811
2812 Register LastReg = 0;
2813 int HazardSlotIndex = std::numeric_limits<int>::max();
2814 for (auto &CS : CSI) {
2815 MCRegister Reg = CS.getReg();
2816 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2817
2818 // Create a hazard slot as we switch between GPR and FPR CSRs.
2820 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2822 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2823 "Unexpected register order for hazard slot");
2824 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2825 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2826 << "\n");
2827 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2828 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2829 }
2830
2831 unsigned Size = RegInfo->getSpillSize(*RC);
2832 Align Alignment(RegInfo->getSpillAlign(*RC));
2833 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2834 CS.setFrameIdx(FrameIdx);
2835 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2836
2837 // Grab 8 bytes below FP for the extended asynchronous frame info.
2838 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
2839 Reg == AArch64::FP) {
2840 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2841 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2842 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2843 }
2844 LastReg = Reg;
2845 }
2846
2847 // Add hazard slot in the case where no FPR CSRs are present.
2849 HazardSlotIndex == std::numeric_limits<int>::max()) {
2850 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2851 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2852 << "\n");
2853 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2854 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2855 }
2856
2857 return true;
2858}
2859
2861 const MachineFunction &MF) const {
2863 // If the function has streaming-mode changes, don't scavenge a
2864 // spillslot in the callee-save area, as that might require an
2865 // 'addvl' in the streaming-mode-changing call-sequence when the
2866 // function doesn't use a FP.
2867 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2868 return false;
2869 // Don't allow register salvaging with hazard slots, in case it moves objects
2870 // into the wrong place.
2871 if (AFI->hasStackHazardSlotIndex())
2872 return false;
2873 return AFI->hasCalleeSaveStackFreeSpace();
2874}
2875
2876/// Returns true if there are any SVE callee saves.
2878 int &Min, int &Max) {
2879 Min = std::numeric_limits<int>::max();
2880 Max = std::numeric_limits<int>::min();
2881
2882 if (!MFI.isCalleeSavedInfoValid())
2883 return false;
2884
2885 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2886 for (auto &CS : CSI) {
2887 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2888 AArch64::PPRRegClass.contains(CS.getReg())) {
2889 assert((Max == std::numeric_limits<int>::min() ||
2890 Max + 1 == CS.getFrameIdx()) &&
2891 "SVE CalleeSaves are not consecutive");
2892 Min = std::min(Min, CS.getFrameIdx());
2893 Max = std::max(Max, CS.getFrameIdx());
2894 }
2895 }
2896 return Min != std::numeric_limits<int>::max();
2897}
2898
2900 AssignObjectOffsets AssignOffsets) {
2901 MachineFrameInfo &MFI = MF.getFrameInfo();
2902 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2903
2904 SVEStackSizes SVEStack{};
2905
2906 // With SplitSVEObjects we maintain separate stack offsets for predicates
2907 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2908 // are included in the SVE vector area.
2909 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2910 uint64_t &PPRStackTop =
2911 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2912
2913#ifndef NDEBUG
2914 // First process all fixed stack objects.
2915 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2916 assert(!MFI.hasScalableStackID(I) &&
2917 "SVE vectors should never be passed on the stack by value, only by "
2918 "reference.");
2919#endif
2920
2921 auto AllocateObject = [&](int FI) {
2923 ? ZPRStackTop
2924 : PPRStackTop;
2925
2926 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2927 // two, we'd need to align every object dynamically at runtime if the
2928 // alignment is larger than 16. This is not yet supported.
2929 Align Alignment = MFI.getObjectAlign(FI);
2930 if (Alignment > Align(16))
2932 "Alignment of scalable vectors > 16 bytes is not yet supported");
2933
2934 StackTop += MFI.getObjectSize(FI);
2935 StackTop = alignTo(StackTop, Alignment);
2936
2937 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2938 "SVE StackTop far too large?!");
2939
2940 int64_t Offset = -int64_t(StackTop);
2941 if (AssignOffsets == AssignObjectOffsets::Yes)
2942 MFI.setObjectOffset(FI, Offset);
2943
2944 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2945 };
2946
2947 // Then process all callee saved slots.
2948 int MinCSFrameIndex, MaxCSFrameIndex;
2949 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2950 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2951 AllocateObject(FI);
2952 }
2953
2954 // Ensure the CS area is 16-byte aligned.
2955 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2956 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2957
2958 // Create a buffer of SVE objects to allocate and sort it.
2959 SmallVector<int, 8> ObjectsToAllocate;
2960 // If we have a stack protector, and we've previously decided that we have SVE
2961 // objects on the stack and thus need it to go in the SVE stack area, then it
2962 // needs to go first.
2963 int StackProtectorFI = -1;
2964 if (MFI.hasStackProtectorIndex()) {
2965 StackProtectorFI = MFI.getStackProtectorIndex();
2966 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2967 ObjectsToAllocate.push_back(StackProtectorFI);
2968 }
2969
2970 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2971 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI) ||
2973 continue;
2974
2977 continue;
2978
2979 ObjectsToAllocate.push_back(FI);
2980 }
2981
2982 // Allocate all SVE locals and spills
2983 for (unsigned FI : ObjectsToAllocate)
2984 AllocateObject(FI);
2985
2986 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2987 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2988
2989 if (AssignOffsets == AssignObjectOffsets::Yes)
2990 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2991
2992 return SVEStack;
2993}
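// Illustrative sketch (not part of the LLVM source): the AllocateObject lambda
// above is a downward bump allocator. The types and names below
// (ScalableObject, roundUpTo, assignScalableOffsets) are hypothetical, sizes
// are in "scalable bytes" (to be multiplied by vscale at run time), and
// alignments are assumed to be powers of two no larger than 16.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ScalableObject {
  uint64_t Size;  // object size in scalable bytes
  uint64_t Align; // power-of-two alignment, at most 16
};

// Hypothetical stand-in for llvm::alignTo.
inline uint64_t roundUpTo(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Assign each object a negative offset from the top of the area, growing
// StackTop as we go, then pad the whole area to 16 bytes.
inline std::vector<int64_t>
assignScalableOffsets(const std::vector<ScalableObject> &Objects,
                      uint64_t &StackTop) {
  std::vector<int64_t> Offsets;
  for (const ScalableObject &Obj : Objects) {
    StackTop += Obj.Size;                       // reserve the bytes first
    StackTop = roundUpTo(StackTop, Obj.Align);  // then align the new top
    Offsets.push_back(-static_cast<int64_t>(StackTop));
  }
  StackTop = roundUpTo(StackTop, 16);
  return Offsets;
}
```

// For a 2-byte predicate, a 16-byte vector and another 2-byte predicate
// allocated into one area, this yields offsets -2, -32 and -34 and a final
// area size of 48.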
2994
2996 MachineFunction &MF, RegScavenger *RS) const {
2998 "Upwards growing stack unsupported");
2999
3001
3002 // If this function isn't doing Win64-style C++ EH, we don't need to do
3003 // anything.
3004 if (!MF.hasEHFunclets())
3005 return;
3006
3007 MachineFrameInfo &MFI = MF.getFrameInfo();
3008 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3009
3010 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3011 // object area right next to the UnwindHelp object.
3012 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3013 int64_t CurrentOffset =
3015 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3016 for (WinEHHandlerType &H : TBME.HandlerArray) {
3017 int FrameIndex = H.CatchObj.FrameIndex;
3018 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3019 CurrentOffset =
3020 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3021 CurrentOffset += MFI.getObjectSize(FrameIndex);
3022 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3023 }
3024 }
3025 }
3026
3027 // Create an UnwindHelp object.
3028 // The UnwindHelp object is allocated at the start of the fixed object area.
3029 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3030 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3031 /*IsFunclet*/ false) &&
3032 "UnwindHelpOffset must be at the start of the fixed object area");
3033 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3034 /*IsImmutable=*/false);
3035 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3036
3037 MachineBasicBlock &MBB = MF.front();
3038 auto MBBI = MBB.begin();
3039 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3040 ++MBBI;
3041
3042 // We need to store -2 into the UnwindHelp object at the start of the
3043 // function.
3044 DebugLoc DL;
3045 RS->enterBasicBlockEnd(MBB);
3046 RS->backward(MBBI);
3047 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3048 assert(DstReg && "There must be a free register after frame setup");
3049 const AArch64InstrInfo &TII =
3050 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3051 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3052 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3053 .addReg(DstReg, getKillRegState(true))
3054 .addFrameIndex(UnwindHelpFI)
3055 .addImm(0);
3056}
3057
3058namespace {
3059struct TagStoreInstr {
3061 int64_t Offset, Size;
3062 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3063 : MI(MI), Offset(Offset), Size(Size) {}
3064};
3065
3066class TagStoreEdit {
3067 MachineFunction *MF;
3068 MachineBasicBlock *MBB;
3069 MachineRegisterInfo *MRI;
3070 // Tag store instructions that are being replaced.
3072 // Combined memref arguments of the above instructions.
3074
3075 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3076 // FrameRegOffset + Size) with the address tag of SP.
3077 Register FrameReg;
3078 StackOffset FrameRegOffset;
3079 int64_t Size;
3080 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3081 // end.
3082 std::optional<int64_t> FrameRegUpdate;
3083 // MIFlags for any FrameReg updating instructions.
3084 unsigned FrameRegUpdateFlags;
3085
3086 // Use zeroing instruction variants.
3087 bool ZeroData;
3088 DebugLoc DL;
3089
3090 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3091 void emitLoop(MachineBasicBlock::iterator InsertI);
3092
3093public:
3094 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3095 : MBB(MBB), ZeroData(ZeroData) {
3096 MF = MBB->getParent();
3097 MRI = &MF->getRegInfo();
3098 }
3099 // Add an instruction to be replaced. Instructions must be added in the
3100 // ascending order of Offset, and have to be adjacent.
3101 void addInstruction(TagStoreInstr I) {
3102 assert((TagStores.empty() ||
3103 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3104 "Non-adjacent tag store instructions.");
3105 TagStores.push_back(I);
3106 }
3107 void clear() { TagStores.clear(); }
3108 // Emit equivalent code at the given location, and erase the current set of
3109 // instructions. May skip if the replacement is not profitable. May invalidate
3110 // the input iterator and replace it with a valid one.
3111 void emitCode(MachineBasicBlock::iterator &InsertI,
3112 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3113};
3114
3115void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3116 const AArch64InstrInfo *TII =
3117 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3118
3119 const int64_t kMinOffset = -256 * 16;
3120 const int64_t kMaxOffset = 255 * 16;
3121
3122 Register BaseReg = FrameReg;
3123 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3124 if (BaseRegOffsetBytes < kMinOffset ||
3125 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3126 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3127 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3128 // is required for the offset of ST2G.
3129 BaseRegOffsetBytes % 16 != 0) {
3130 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3131 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3132 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3133 BaseReg = ScratchReg;
3134 BaseRegOffsetBytes = 0;
3135 }
3136
3137 MachineInstr *LastI = nullptr;
3138 while (Size) {
3139 int64_t InstrSize = (Size > 16) ? 32 : 16;
3140 unsigned Opcode =
3141 InstrSize == 16
3142 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3143 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3144 assert(BaseRegOffsetBytes % 16 == 0);
3145 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3146 .addReg(AArch64::SP)
3147 .addReg(BaseReg)
3148 .addImm(BaseRegOffsetBytes / 16)
3149 .setMemRefs(CombinedMemRefs);
3150 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3151 // final SP adjustment in the epilogue.
3152 if (BaseRegOffsetBytes == 0)
3153 LastI = I;
3154 BaseRegOffsetBytes += InstrSize;
3155 Size -= InstrSize;
3156 }
3157
3158 if (LastI)
3159 MBB->splice(InsertI, MBB, LastI);
3160}
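// Illustrative sketch (not part of the LLVM source): the chunking performed by
// the while loop in emitUnrolled, with MachineInstr construction replaced by a
// plain list of mnemonics. PlannedTagStore and planUnrolledTagStores are
// hypothetical names. The sketch assumes Size is a non-negative multiple of 16
// and omits the final step that moves the offset-0 store to the end.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct PlannedTagStore {
  std::string Mnemonic; // "stg", "st2g", "stzg" or "stz2g"
  int64_t ByteOffset;   // byte offset from the base register
};

// Cover [BaseOffset, BaseOffset + Size) with 32-byte stores, finishing with
// one 16-byte store when Size is not a multiple of 32.
inline std::vector<PlannedTagStore>
planUnrolledTagStores(int64_t BaseOffset, int64_t Size, bool ZeroData) {
  assert(BaseOffset % 16 == 0 && Size >= 0 && Size % 16 == 0);
  std::vector<PlannedTagStore> Plan;
  while (Size) {
    int64_t InstrSize = (Size > 16) ? 32 : 16;
    const char *Name = InstrSize == 16 ? (ZeroData ? "stzg" : "stg")
                                       : (ZeroData ? "stz2g" : "st2g");
    Plan.push_back({Name, BaseOffset});
    BaseOffset += InstrSize;
    Size -= InstrSize;
  }
  return Plan;
}
```

// With BaseOffset 0 and Size 80 the plan is st2g at 0, st2g at 32 and stg at
// 64; the real instructions encode these offsets divided by 16.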
3161
3162void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3163 const AArch64InstrInfo *TII =
3164 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3165
3166 Register BaseReg = FrameRegUpdate
3167 ? FrameReg
3168 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3169 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3170
3171 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3172
3173 int64_t LoopSize = Size;
3174 // If the loop size is not a multiple of 32, split off one 16-byte store at
3175 // the end to fold BaseReg update into.
3176 if (FrameRegUpdate && *FrameRegUpdate)
3177 LoopSize -= LoopSize % 32;
3178 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3179 TII->get(ZeroData ? AArch64::STZGloop_wback
3180 : AArch64::STGloop_wback))
3181 .addDef(SizeReg)
3182 .addDef(BaseReg)
3183 .addImm(LoopSize)
3184 .addReg(BaseReg)
3185 .setMemRefs(CombinedMemRefs);
3186 if (FrameRegUpdate)
3187 LoopI->setFlags(FrameRegUpdateFlags);
3188
3189 int64_t ExtraBaseRegUpdate =
3190 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3191 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3192 << ", Size=" << Size
3193 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3194 << ", FrameRegUpdate=" << FrameRegUpdate
3195 << ", FrameRegOffset.getFixed()="
3196 << FrameRegOffset.getFixed() << "\n");
3197 if (LoopSize < Size) {
3198 assert(FrameRegUpdate);
3199 assert(Size - LoopSize == 16);
3200 // Tag 16 more bytes at BaseReg and update BaseReg.
3201 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3202 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3203 "STG immediate out of range");
3204 BuildMI(*MBB, InsertI, DL,
3205 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3206 .addDef(BaseReg)
3207 .addReg(BaseReg)
3208 .addReg(BaseReg)
3209 .addImm(STGOffset / 16)
3210 .setMemRefs(CombinedMemRefs)
3211 .setMIFlags(FrameRegUpdateFlags);
3212 } else if (ExtraBaseRegUpdate) {
3213 // Update BaseReg.
3214 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3215 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3216 BuildMI(
3217 *MBB, InsertI, DL,
3218 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3219 .addDef(BaseReg)
3220 .addReg(BaseReg)
3221 .addImm(AddSubOffset)
3222 .addImm(0)
3223 .setMIFlags(FrameRegUpdateFlags);
3224 }
3225}
3226
3227// Check if *II is a register update that can be merged into STGloop that ends
3228// at (Reg + Size). On success, *TotalOffset is set to the full signed
3229// adjustment that the update applies to Reg.
3230bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3231 int64_t Size, int64_t *TotalOffset) {
3232 MachineInstr &MI = *II;
3233 if ((MI.getOpcode() == AArch64::ADDXri ||
3234 MI.getOpcode() == AArch64::SUBXri) &&
3235 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3236 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3237 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3238 if (MI.getOpcode() == AArch64::SUBXri)
3239 Offset = -Offset;
3240 int64_t PostOffset = Offset - Size;
3241 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3242 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3243 // chosen depends on the alignment of the loop size, but the difference
3244 // between the valid ranges for the two instructions is small, so we
3245 // conservatively assume that it could be either case here.
3246 //
3247 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3248 // instruction.
3249 const int64_t kMaxOffset = 4080 - 16;
3250 // Max offset of SUBXri.
3251 const int64_t kMinOffset = -4095;
3252 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3253 PostOffset % 16 == 0) {
3254 *TotalOffset = Offset;
3255 return true;
3256 }
3257 }
3258 return false;
3259}
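// Illustrative sketch (not part of the LLVM source): the range test at the
// heart of canMergeRegUpdate, isolated from MachineInstr. The function name
// isMergeablePostOffset is hypothetical; the constants are copied from the
// code above.

```cpp
#include <cassert>
#include <cstdint>

// UpdateOffset is the signed immediate of the ADD/SUB on the base register;
// Size is the distance from the register to the end of the tagged region.
constexpr bool isMergeablePostOffset(int64_t UpdateOffset, int64_t Size) {
  // Adjustment still required once the loop has advanced the register by Size.
  int64_t PostOffset = UpdateOffset - Size;
  // Max offset of STGPostIndex, minus the 16 bytes tagged by that instruction.
  constexpr int64_t kMaxOffset = 4080 - 16;
  // Max offset of SUBXri.
  constexpr int64_t kMinOffset = -4095;
  return PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
         PostOffset % 16 == 0;
}

// A 512-byte SP increment after tagging 256 bytes leaves 256 bytes to add.
static_assert(isMergeablePostOffset(512, 256), "");
// A remainder that is not a multiple of 16 cannot be folded.
static_assert(!isMergeablePostOffset(264, 256), "");
```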
3260
3261void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3263 MemRefs.clear();
3264 for (auto &TS : TSE) {
3265 MachineInstr *MI = TS.MI;
3266 // An instruction without memory operands may access anything. Be
3267 // conservative and return an empty list.
3268 if (MI->memoperands_empty()) {
3269 MemRefs.clear();
3270 return;
3271 }
3272 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3273 }
3274}
3275
3276void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3277 const AArch64FrameLowering *TFI,
3278 bool TryMergeSPUpdate) {
3279 if (TagStores.empty())
3280 return;
3281 TagStoreInstr &FirstTagStore = TagStores[0];
3282 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3283 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3284 DL = TagStores[0].MI->getDebugLoc();
3285
3286 Register Reg;
3287 FrameRegOffset = TFI->resolveFrameOffsetReference(
3288 *MF, FirstTagStore.Offset, false /*isFixed*/,
3289 TargetStackID::Default /*StackID*/, Reg,
3290 /*PreferFP=*/false, /*ForSimm=*/true);
3291 FrameReg = Reg;
3292 FrameRegUpdate = std::nullopt;
3293
3294 mergeMemRefs(TagStores, CombinedMemRefs);
3295
3296 LLVM_DEBUG({
3297 dbgs() << "Replacing adjacent STG instructions:\n";
3298 for (const auto &Instr : TagStores) {
3299 dbgs() << " " << *Instr.MI;
3300 }
3301 });
3302
3303 // Size threshold where a loop becomes shorter than a linear sequence of
3304 // tagging instructions.
3305 const int kSetTagLoopThreshold = 176;
3306 if (Size < kSetTagLoopThreshold) {
3307 if (TagStores.size() < 2)
3308 return;
3309 emitUnrolled(InsertI);
3310 } else {
3311 MachineInstr *UpdateInstr = nullptr;
3312 int64_t TotalOffset = 0;
3313 if (TryMergeSPUpdate) {
3314 // See if we can merge base register update into the STGloop.
3315 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3316 // but STGloop is way too unusual for that, and also it only
3317 // realistically happens in function epilogue. Also, STGloop is expanded
3318 // before that pass.
3319 if (InsertI != MBB->end() &&
3320 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3321 &TotalOffset)) {
3322 UpdateInstr = &*InsertI++;
3323 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3324 << *UpdateInstr);
3325 }
3326 }
3327
3328 if (!UpdateInstr && TagStores.size() < 2)
3329 return;
3330
3331 if (UpdateInstr) {
3332 FrameRegUpdate = TotalOffset;
3333 FrameRegUpdateFlags = UpdateInstr->getFlags();
3334 }
3335 emitLoop(InsertI);
3336 if (UpdateInstr)
3337 UpdateInstr->eraseFromParent();
3338 }
3339
3340 for (auto &TS : TagStores)
3341 TS.MI->eraseFromParent();
3342}
3343
3344bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3345 int64_t &Size, bool &ZeroData) {
3346 MachineFunction &MF = *MI.getParent()->getParent();
3347 const MachineFrameInfo &MFI = MF.getFrameInfo();
3348
3349 unsigned Opcode = MI.getOpcode();
3350 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3351 Opcode == AArch64::STZ2Gi);
3352
3353 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3354 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3355 return false;
3356 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3357 return false;
3358 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3359 Size = MI.getOperand(2).getImm();
3360 return true;
3361 }
3362
3363 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3364 Size = 16;
3365 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3366 Size = 32;
3367 else
3368 return false;
3369
3370 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3371 return false;
3372
3373 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3374 16 * MI.getOperand(2).getImm();
3375 return true;
3376}
3377
3378static size_t countAvailableScavengerSlots(LivePhysRegs &LiveRegs,
3380 RegScavenger *RS) {
3381 auto FreeGPRs =
3382 llvm::count_if(AArch64::GPR64RegClass, [&LiveRegs, &MRI](auto Reg) {
3383 return LiveRegs.available(MRI, Reg);
3384 });
3385
3386 size_t NumEmergencySlots = 0;
3387 if (RS)
3388 NumEmergencySlots = RS->getNumScavengingFrameIndices();
3389
3390 return FreeGPRs + NumEmergencySlots;
3391}
3392
3393// Detect a run of memory tagging instructions for adjacent stack frame slots,
3394// and replace them with a shorter instruction sequence:
3395// * replace STG + STG with ST2G
3396// * replace STGloop + STGloop with STGloop
3397// This code needs to run when stack slot offsets are already known, but before
3398// FrameIndex operands in STG instructions are eliminated.
3400 const AArch64FrameLowering *TFI,
3401 RegScavenger *RS) {
3402 bool FirstZeroData;
3403 int64_t Size, Offset;
3404 MachineInstr &MI = *II;
3405 MachineBasicBlock *MBB = MI.getParent();
3407 if (&MI == &MBB->instr_back())
3408 return II;
3409 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3410 return II;
3411
3413 Instrs.emplace_back(&MI, Offset, Size);
3414
3415 constexpr int kScanLimit = 10;
3416 int Count = 0;
3418 NextI != E && Count < kScanLimit; ++NextI) {
3419 MachineInstr &MI = *NextI;
3420 bool ZeroData;
3421 int64_t Size, Offset;
3422 // Collect instructions that update memory tags with a FrameIndex operand
3423 // and (when applicable) constant size, and whose output registers are dead
3424 // (the latter is almost always the case in practice). Since these
3425 // instructions effectively have no inputs or outputs, we are free to skip
3426 // any non-aliasing instructions in between without tracking used registers.
3427 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3428 if (ZeroData != FirstZeroData)
3429 break;
3430 Instrs.emplace_back(&MI, Offset, Size);
3431 continue;
3432 }
3433
3434 // Only count non-transient, non-tagging instructions toward the scan
3435 // limit.
3436 if (!MI.isTransient())
3437 ++Count;
3438
3439 // Just in case, stop before the epilogue code starts.
3440 if (MI.getFlag(MachineInstr::FrameSetup) ||
3442 break;
3443
3444 // Reject anything that may alias the collected instructions.
3445 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3446 break;
3447 }
3448
3449 // New code will be inserted after the last tagging instruction we've found.
3450 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3451
3452 // All the gathered stack tag instructions are merged and placed after the
3453 // last tag store in the list. Check whether the NZCV flags are live at the
3454 // point where we are trying to insert; if they are, they might get
3455 // clobbered by any STG loops that are emitted.
3456
3457 // FIXME: Bailing out of the merge here is conservative: the liveness check
3458 // is performed even when the merged sequence would not contain any STG
3459 // loops, in which case it is not needed.
3461 LiveRegs.addLiveOuts(*MBB);
3462 for (auto I = MBB->rbegin();; ++I) {
3463 MachineInstr &MI = *I;
3464 if (MI == InsertI)
3465 break;
3466 LiveRegs.stepBackward(*I);
3467 }
3468 InsertI++;
3469 if (LiveRegs.contains(AArch64::NZCV))
3470 return InsertI;
3471
3472 // Emitting an MTE loop requires two physical registers (BaseReg and
3473 // SizeReg). If the function is under register pressure, the register
3474 // scavenger will crash trying to allocate them. If we don't have at least
3475 // two free slots (free registers + emergency slots), bail out and fall back
3476 // to the unrolled sequence.
3477 if (countAvailableScavengerSlots(LiveRegs, MBB->getParent()->getRegInfo(),
3478 RS) < 2) {
3479 LLVM_DEBUG(
3480 dbgs() << "Failed to merge MTE stack tagging instructions into loop "
3481 << "due to high register pressure.\n");
3482 return InsertI;
3483 }
3484
3485 llvm::stable_sort(Instrs,
3486 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3487 return Left.Offset < Right.Offset;
3488 });
3489
3490 // Make sure that we don't have any overlapping stores.
3491 int64_t CurOffset = Instrs[0].Offset;
3492 for (auto &Instr : Instrs) {
3493 if (CurOffset > Instr.Offset)
3494 return NextI;
3495 CurOffset = Instr.Offset + Instr.Size;
3496 }
3497
3498 // Find contiguous runs of tagged memory and emit shorter instruction
3499 // sequences for them when possible.
3500 TagStoreEdit TSE(MBB, FirstZeroData);
3501 std::optional<int64_t> EndOffset;
3502 for (auto &Instr : Instrs) {
3503 if (EndOffset && *EndOffset != Instr.Offset) {
3504 // Found a gap.
3505 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3506 TSE.clear();
3507 }
3508
3509 TSE.addInstruction(Instr);
3510 EndOffset = Instr.Offset + Instr.Size;
3511 }
3512
3513 const MachineFunction *MF = MBB->getParent();
3514 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3515 TSE.emitCode(
3516 InsertI, TFI, /*TryMergeSPUpdate = */
3518
3519 return InsertI;
3520}
3521} // namespace
3522
3524 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3525 for (auto &BB : MF)
3526 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3528 II = tryMergeAdjacentSTG(II, this, RS);
3529 }
3530
3531 // By the time this method is called, most of the prologue/epilogue code is
3532 // already emitted, whether its location was affected by the shrink-wrapping
3533 // optimization or not.
3534 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3535 shouldSignReturnAddressEverywhere(MF))
3537}
3538
3539/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3540/// before the update. This is easily retrieved as it is exactly the offset
3541/// that is set in processFunctionBeforeFrameFinalized.
3543 const MachineFunction &MF, int FI, Register &FrameReg,
3544 bool IgnoreSPUpdates) const {
3545 const MachineFrameInfo &MFI = MF.getFrameInfo();
3546 if (IgnoreSPUpdates) {
3547 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3548 << MFI.getObjectOffset(FI) << "\n");
3549 FrameReg = AArch64::SP;
3550 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3551 }
3552
3553 // Go to common code if we cannot provide sp + offset.
3554 if (MFI.hasVarSizedObjects() ||
3557 return getFrameIndexReference(MF, FI, FrameReg);
3558
3559 FrameReg = AArch64::SP;
3560 return getStackOffset(MF, MFI.getObjectOffset(FI));
3561}
3562
3563/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3564/// the parent's frame pointer.
3566 const MachineFunction &MF) const {
3567 return 0;
3568}

/// Funclets only need to account for space for the callee saved registers,
/// as the locals are accounted for in the parent's stack frame.
unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
    const MachineFunction &MF) const {
  // This is the size of the pushed CSRs.
  unsigned CSSize =
      MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
  // This is the amount of stack a funclet needs to allocate.
  return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
                 getStackAlign());
}
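
// Illustration (not part of the LLVM sources): a minimal, self-contained sketch
// of the rounding that getWinEHFuncletFrameSize performs. The helpers alignUp
// and funcletFrameSize are hypothetical stand-ins for llvm::alignTo and the
// member function, and the 16-byte default stack alignment is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::alignTo: round Value up to the next multiple
// of Align. Align is assumed to be a power of two.
constexpr uint64_t alignUp(uint64_t Value, uint64_t Align) {
  return (Value + Align - 1) & ~(Align - 1);
}

// Funclet frame size = callee-saved area + largest outgoing call frame,
// rounded up to the stack alignment (assumed to be 16 bytes by default).
constexpr uint64_t funcletFrameSize(uint64_t CSSize, uint64_t MaxCallFrameSize,
                                    uint64_t StackAlign = 16) {
  return alignUp(CSSize + MaxCallFrameSize, StackAlign);
}
```

// For example, 88 bytes of callee saves with no outgoing call frame would be
// rounded up to 96 bytes under these assumptions.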

namespace {
struct FrameObject {
  bool IsValid = false;
  // Index of the object in MFI.
  int ObjectIndex = 0;
  // Group ID this object belongs to.
  int GroupIndex = -1;
  // This object should be placed first (closest to SP).
  bool ObjectFirst = false;
  // This object's group (which always contains the object with
  // ObjectFirst==true) should be placed first.
  bool GroupFirst = false;

  // Used to distinguish between FP and GPR accesses. The values are decided so
  // that they sort FPR < Hazard < GPR and they can be or'd together.
  unsigned Accesses = 0;
  enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
};

class GroupBuilder {
  SmallVector<int, 8> CurrentMembers;
  int NextGroupIndex = 0;
  std::vector<FrameObject> &Objects;

public:
  GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
  void AddMember(int Index) { CurrentMembers.push_back(Index); }
  void EndCurrentGroup() {
    if (CurrentMembers.size() > 1) {
      // Create a new group with the current member list. This might remove them
      // from their pre-existing groups. That's OK, dealing with overlapping
      // groups is too hard and unlikely to make a difference.
      LLVM_DEBUG(dbgs() << "group:");
      for (int Index : CurrentMembers) {
        Objects[Index].GroupIndex = NextGroupIndex;
        LLVM_DEBUG(dbgs() << " " << Index);
      }
      LLVM_DEBUG(dbgs() << "\n");
      NextGroupIndex++;
    }
    CurrentMembers.clear();
  }
};
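
// Illustration (not part of the LLVM sources): a standalone sketch of the
// grouping rule that GroupBuilder implements. Runs of two or more consecutive
// tagging events form a group, and a later group may take over a slot from an
// earlier one. The function name groupConsecutiveRuns and the use of -1 to
// mean "not a tagging instruction" are assumptions made for this example.

```cpp
#include <cassert>
#include <vector>

// Events lists, in program order, the slot tagged by each instruction, or -1
// when the instruction is not a tagging instruction. Returns the group id of
// each slot, or -1 for slots that never ended up in a group.
std::vector<int> groupConsecutiveRuns(const std::vector<int> &Events,
                                      int NumSlots) {
  std::vector<int> GroupOf(NumSlots, -1);
  std::vector<int> Current;
  int NextGroup = 0;

  auto EndCurrentGroup = [&] {
    // A single tagged slot on its own does not form a group.
    if (Current.size() > 1) {
      for (int Slot : Current)
        GroupOf[Slot] = NextGroup; // May overwrite an earlier group id.
      ++NextGroup;
    }
    Current.clear();
  };

  for (int Event : Events) {
    if (Event >= 0)
      Current.push_back(Event);
    else
      EndCurrentGroup();
  }
  EndCurrentGroup(); // Close a run that reaches the end of the sequence.
  return GroupOf;
}
```

// With events {0, 1, -1, 2, -1, 3, 1} and four slots, slots 0 and 1 first form
// group 0, slot 2 stays ungrouped, and slots 3 and 1 then form group 1, which
// moves slot 1 out of group 0.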

bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
  // Objects at a lower index are closer to FP; objects at a higher index are
  // closer to SP.
  //
  // For consistency in our comparison, all invalid objects are placed
  // at the end. This also allows us to stop walking when we hit the
  // first invalid item after it's all sorted.
  //
  // If we want to include a stack hazard region, order FPR accesses < the
  // hazard object < GPR accesses in order to create a separation between the
  // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
  //
  // Otherwise the "first" object goes first (closest to SP), followed by the
  // members of the "first" group.
  //
  // The rest are sorted by the group index to keep the groups together.
  // Higher numbered groups are more likely to be around longer (i.e. untagged
  // in the function epilogue and not at some earlier point). Place them closer
  // to SP.
  //
  // If all else equal, sort by the object index to keep the objects in the
  // original order.
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}
} // namespace
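
// Illustration (not part of the LLVM sources): a standalone sketch of the
// tuple-based ordering used by FrameObjectCompare, written against the
// standard library only. Slot, slotLess and sortedOrder are hypothetical names
// for this example; the field meanings follow FrameObject.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Simplified copy of the fields that take part in the comparison.
struct Slot {
  bool IsValid;
  unsigned Accesses;
  bool ObjectFirst;
  bool GroupFirst;
  int GroupIndex;
  int ObjectIndex;
};

// Lexicographic comparison: invalid entries last, then by access class, then
// "first" flags, then group index, then original object index.
bool slotLess(const Slot &A, const Slot &B) {
  return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
                         A.GroupIndex, A.ObjectIndex) <
         std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
                         B.GroupIndex, B.ObjectIndex);
}

// Returns the object indices of the valid slots in sorted order. Later
// positions correspond to slots placed closer to SP.
std::vector<int> sortedOrder(std::vector<Slot> Slots) {
  std::stable_sort(Slots.begin(), Slots.end(), slotLess);
  std::vector<int> Order;
  for (const Slot &S : Slots) {
    if (!S.IsValid)
      break; // All invalid entries sort to the end.
    Order.push_back(S.ObjectIndex);
  }
  return Order;
}
```

// In this sketch a slot with ObjectFirst set sorts to the highest position,
// which matches the convention that higher positions are closer to SP.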

void AArch64FrameLowering::orderFrameObjects(
    const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
  const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();

  if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
      ObjectsToAllocate.empty())
    return;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
  for (auto &Obj : ObjectsToAllocate) {
    FrameObjects[Obj].IsValid = true;
    FrameObjects[Obj].ObjectIndex = Obj;
  }

  // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
  // the same time.
  GroupBuilder GB(FrameObjects);
  for (auto &MBB : MF) {
    for (auto &MI : MBB) {
      if (MI.isDebugInstr())
        continue;

      if (AFI.hasStackHazardSlotIndex()) {
        std::optional<int> FI = getLdStFrameID(MI, MFI);
        if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
          if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
              // [...]
            FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
          else
            FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
        }
      }

      int OpIndex;
      switch (MI.getOpcode()) {
      case AArch64::STGloop:
      case AArch64::STZGloop:
        OpIndex = 3;
        break;
      case AArch64::STGi:
      case AArch64::STZGi:
      case AArch64::ST2Gi:
      case AArch64::STZ2Gi:
        OpIndex = 1;
        break;
      default:
        OpIndex = -1;
      }

      int TaggedFI = -1;
      if (OpIndex >= 0) {
        const MachineOperand &MO = MI.getOperand(OpIndex);
        if (MO.isFI()) {
          int FI = MO.getIndex();
          if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
              FrameObjects[FI].IsValid)
            TaggedFI = FI;
        }
      }

      // If this is a stack tagging instruction for a slot that is not part of a
      // group yet, either start a new group or add it to the current one.
      if (TaggedFI >= 0)
        GB.AddMember(TaggedFI);
      else
        GB.EndCurrentGroup();
    }
    // Groups should never span multiple basic blocks.
    GB.EndCurrentGroup();
  }

  if (AFI.hasStackHazardSlotIndex()) {
    FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
        FrameObject::AccessHazard;
    // If a stack object is unknown or both GPR and FPR, sort it into GPR.
    for (auto &Obj : FrameObjects)
      if (!Obj.Accesses ||
          Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
        Obj.Accesses = FrameObject::AccessGPR;
  }

  // If the function's tagged base pointer is pinned to a stack slot, we want to
  // put that slot first when possible. This will likely place it at SP + 0,
  // and save one instruction when generating the base pointer because IRG does
  // not allow an immediate offset.
  std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
  if (TBPI) {
    FrameObjects[*TBPI].ObjectFirst = true;
    FrameObjects[*TBPI].GroupFirst = true;
    int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
    if (FirstGroupIndex >= 0)
      for (FrameObject &Object : FrameObjects)
        if (Object.GroupIndex == FirstGroupIndex)
          Object.GroupFirst = true;
  }

  llvm::stable_sort(FrameObjects, FrameObjectCompare);

  int i = 0;
  for (auto &Obj : FrameObjects) {
    // All invalid items are sorted at the end, so it's safe to stop.
    if (!Obj.IsValid)
      break;
    ObjectsToAllocate[i++] = Obj.ObjectIndex;
  }

  LLVM_DEBUG({
    dbgs() << "Final frame order:\n";
    for (auto &Obj : FrameObjects) {
      if (!Obj.IsValid)
        break;
      dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
      if (Obj.ObjectFirst)
        dbgs() << ", first";
      if (Obj.GroupFirst)
        dbgs() << ", group-first";
      dbgs() << "\n";
    }
  });
}
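
// Illustration (not part of the LLVM sources): a small sketch of how
// orderFrameObjects folds the per-slot access bits into the three sort
// classes when a hazard slot exists. The function name normalizeAccesses is
// hypothetical; the bit values mirror the FrameObject enum.

```cpp
#include <cassert>

// Same values as FrameObject: chosen so that FPR < Hazard < GPR when sorted.
enum : unsigned { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };

// Slots with no recorded access, or with both GPR and FPR accesses, are
// treated as GPR slots. Every other value is left unchanged.
constexpr unsigned normalizeAccesses(unsigned Accesses) {
  if (Accesses == 0 || Accesses == (AccessGPR | AccessFPR))
    return AccessGPR;
  return Accesses;
}
```

// After this step every slot carries exactly one of the three values, so the
// sort places FPR slots on one side of the hazard slot and GPR slots on the
// other.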

/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
/// least every ProbeSize bytes. Returns an iterator of the first instruction
/// after the loop. The difference between SP and TargetReg must be an exact
/// multiple of ProbeSize.
MachineBasicBlock::iterator
AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
    MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
    Register TargetReg) const {
  MachineBasicBlock &MBB = *MBBI->getParent();
  MachineFunction &MF = *MBB.getParent();
  const AArch64InstrInfo *TII =
      MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
  DebugLoc DL = MBB.findDebugLoc(MBBI);

  MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
  MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
  MF.insert(MBBInsertPoint, LoopMBB);
  MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
  MF.insert(MBBInsertPoint, ExitMBB);

  // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
  // in SUB).
  emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
                  StackOffset::getFixed(-ProbeSize), TII,
                  // [...]
  // LDR XZR, [SP]
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::LDRXui))
      .addDef(AArch64::XZR)
      .addReg(AArch64::SP)
      .addImm(0)
      // [...]
          Align(8)))
      // [...]
  // CMP SP, TargetReg
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
          AArch64::XZR)
      .addReg(AArch64::SP)
      .addReg(TargetReg)
      // [...]
  // B.CC Loop
  BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
      // [...]
      .addMBB(LoopMBB)
      // [...]

  LoopMBB->addSuccessor(ExitMBB);
  LoopMBB->addSuccessor(LoopMBB);
  // Synthesize the exit MBB.
  ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
  // [...]
  MBB.addSuccessor(LoopMBB);
  // Update liveins.
  fullyRecomputeLiveIns({ExitMBB, LoopMBB});

  return ExitMBB->begin();
}
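
// Illustration (not part of the LLVM sources): a host-side simulation of the
// loop that inlineStackProbeLoopExactMultiple emits, using plain integers in
// place of SP and TargetReg. The function name simulateProbeLoop is
// hypothetical, and the loop condition (repeat while SP differs from the
// target) is an assumption taken from the doc comment rather than from the
// emitted branch condition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the addresses that would be probed, in order. The distance between
// SP and Target must be a positive exact multiple of ProbeSize, as the doc
// comment requires; otherwise the loop would never hit Target exactly.
std::vector<uint64_t> simulateProbeLoop(uint64_t SP, uint64_t Target,
                                        uint64_t ProbeSize) {
  assert(ProbeSize != 0 && SP > Target && (SP - Target) % ProbeSize == 0);
  std::vector<uint64_t> Probed;
  do {
    SP -= ProbeSize;      // SUB SP, SP, #ProbeSize
    Probed.push_back(SP); // LDR XZR, [SP]
  } while (SP != Target); // CMP SP, TargetReg, then branch back to the loop
  return Probed;
}
```

// Every probe lands exactly ProbeSize bytes below the previous one, so no gap
// larger than ProbeSize is left unprobed, and the last probe is at Target.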

void AArch64FrameLowering::inlineStackProbeFixed(
    MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
    StackOffset CFAOffset) const {
  MachineBasicBlock *MBB = MBBI->getParent();
  MachineFunction &MF = *MBB->getParent();
  const AArch64InstrInfo *TII =
      MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
  AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
  bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
  bool HasFP = hasFP(MF);

  DebugLoc DL;
  int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
  int64_t NumBlocks = FrameSize / ProbeSize;
  int64_t ResidualSize = FrameSize % ProbeSize;

  LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
                    << NumBlocks << " blocks of " << ProbeSize
                    << " bytes, plus " << ResidualSize << " bytes\n");

  // Decrement SP by NumBlocks * ProbeSize bytes, using either an unrolled or
  // an ordinary loop.
  if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
    for (int i = 0; i < NumBlocks; ++i) {
      // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
      // encodable in a SUB).
      emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                      StackOffset::getFixed(-ProbeSize), TII,
                      MachineInstr::FrameSetup, false, false, nullptr,
                      EmitAsyncCFI && !HasFP, CFAOffset);
      CFAOffset += StackOffset::getFixed(ProbeSize);
      // LDR XZR, [SP]
      BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
          .addDef(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          // [...]
              Align(8)))
          // [...]
    }
  } else if (NumBlocks != 0) {
    // SUB ScratchReg, SP, #(ProbeSize * NumBlocks) (or equivalent if the
    // amount is not encodable in SUB). ScratchReg may temporarily become the
    // CFA register.
    emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
                    StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
                    MachineInstr::FrameSetup, false, false, nullptr,
                    EmitAsyncCFI && !HasFP, CFAOffset);
    CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
    MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
    MBB = MBBI->getParent();
    if (EmitAsyncCFI && !HasFP) {
      // Set the CFA register back to SP.
      CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
          .buildDefCFARegister(AArch64::SP);
    }
  }

  if (ResidualSize != 0) {
    // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
    // in SUB).
    emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                    StackOffset::getFixed(-ResidualSize), TII,
                    MachineInstr::FrameSetup, false, false, nullptr,
                    EmitAsyncCFI && !HasFP, CFAOffset);
    if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
      // LDR XZR, [SP]
      BuildMI(*MBB, MBBI, DL, TII->get(AArch64::LDRXui))
          .addDef(AArch64::XZR)
          .addReg(AArch64::SP)
          .addImm(0)
          // [...]
              Align(8)))
          // [...]
    }
  }
}
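
// Illustration (not part of the LLVM sources): a standalone sketch of the
// arithmetic that inlineStackProbeFixed uses to choose between unrolled
// probes, a probe loop, and a residual allocation. ProbePlan and
// planFixedProbes are hypothetical names, and the unroll and unprobed-stack
// limits are passed as parameters because their actual values are not shown
// in this listing.

```cpp
#include <cassert>
#include <cstdint>

struct ProbePlan {
  int64_t NumBlocks;    // Number of full ProbeSize-sized blocks.
  int64_t ResidualSize; // Bytes left over after the full blocks.
  bool UseLoop;         // True when the blocks are probed by a loop.
  bool ProbeResidual;   // True when the residual needs its own probe.
};

// FrameSize and ProbeSize are in bytes; ProbeSize must be positive and the
// two limits are assumed to be non-negative.
ProbePlan planFixedProbes(int64_t FrameSize, int64_t ProbeSize,
                          int64_t MaxLoopUnroll, int64_t MaxUnprobedStack) {
  ProbePlan Plan;
  Plan.NumBlocks = FrameSize / ProbeSize;
  Plan.ResidualSize = FrameSize % ProbeSize;
  // Up to MaxLoopUnroll blocks are emitted inline; more than that uses a loop.
  Plan.UseLoop = Plan.NumBlocks > MaxLoopUnroll;
  // A small residual is allocated without a probe of its own.
  Plan.ProbeResidual = Plan.ResidualSize > MaxUnprobedStack;
  return Plan;
}
```

// With the illustrative values ProbeSize = 4096, MaxLoopUnroll = 4 and
// MaxUnprobedStack = 1024, a 20000-byte frame would be split into four
// unrolled blocks plus a probed 3616-byte residual.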

void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
                                            MachineBasicBlock &MBB) const {
  // Get the instructions that need to be replaced. We emit at most two of
  // these. Remember them in order to avoid complications coming from the need
  // to traverse the block while potentially creating more blocks.
  SmallVector<MachineInstr *, 4> ToReplace;
  for (MachineInstr &MI : MBB)
    if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
        MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
      ToReplace.push_back(&MI);

  for (MachineInstr *MI : ToReplace) {
    if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
      Register ScratchReg = MI->getOperand(0).getReg();
      int64_t FrameSize = MI->getOperand(1).getImm();
      StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
                                               MI->getOperand(3).getImm());
      inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
                            CFAOffset);
    } else {
      assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
             "Stack probe pseudo-instruction expected");
      const AArch64InstrInfo *TII =
          MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
      Register TargetReg = MI->getOperand(0).getReg();
      (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
    }
    MI->eraseFromParent();
  }
}

struct StackAccess {
  enum AccessType {
    NotAccessed = 0, // Stack object not accessed by load/store instructions.
    GPR = 1 << 0,    // A general purpose register.
    PPR = 1 << 1,    // A predicate register.
    FPR = 1 << 2,    // A floating point/Neon/SVE register.
  };

  int Idx;
  StackOffset Offset;
  int64_t Size;
  unsigned AccessTypes;

  // [...]

  bool operator<(const StackAccess &Rhs) const {
    return std::make_tuple(start(), Idx) <
           std::make_tuple(Rhs.start(), Rhs.Idx);
  }

  bool isCPU() const {
    // Predicate register load and store instructions execute on the CPU.
    return AccessTypes & (AccessType::GPR | AccessType::PPR);
  }
  bool isSME() const { return AccessTypes & AccessType::FPR; }
  bool isMixed() const { return isCPU() && isSME(); }

  int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
  int64_t end() const { return start() + Size; }

  std::string getTypeString() const {
    switch (AccessTypes) {
    case AccessType::FPR:
      return "FPR";
    case AccessType::PPR:
      return "PPR";
    case AccessType::GPR:
      return "GPR";
    case AccessType::NotAccessed:
      return "NA";
    default:
      return "Mixed";
    }
  }

  void print(raw_ostream &OS) const {
    OS << getTypeString() << " stack object at [SP"
       << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
    if (Offset.getScalable())
      OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
         << " * vscale";
    OS << "]";
  }
};

static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
  SA.print(OS);
  return OS;
}

void AArch64FrameLowering::emitRemarks(
    const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {

  auto *AFI = MF.getInfo<AArch64FunctionInfo>();
  // [...]
    return;

  unsigned StackHazardSize = getStackHazardSize(MF);
  const uint64_t HazardSize =
      (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;

  if (HazardSize == 0)
    return;

  const MachineFrameInfo &MFI = MF.getFrameInfo();
  // Bail if function has no stack objects.
  if (!MFI.hasStackObjects())
    return;

  std::vector<StackAccess> StackAccesses(MFI.getNumObjects());

  size_t NumFPLdSt = 0;
  size_t NumNonFPLdSt = 0;

  // Collect stack accesses via Load/Store instructions.
  for (const MachineBasicBlock &MBB : MF) {
    for (const MachineInstr &MI : MBB) {
      if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
        continue;
      for (MachineMemOperand *MMO : MI.memoperands()) {
        std::optional<int> FI = getMMOFrameID(MMO, MFI);
        if (FI && !MFI.isDeadObjectIndex(*FI)) {
          int FrameIdx = *FI;

          size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
          if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
            StackAccesses[ArrIdx].Idx = FrameIdx;
            StackAccesses[ArrIdx].Offset =
                getFrameIndexReferenceFromSP(MF, FrameIdx);
            StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
          }

          unsigned RegTy = StackAccess::AccessType::GPR;
          if (MFI.hasScalableStackID(FrameIdx))
            // [...]
            RegTy = StackAccess::FPR;

          StackAccesses[ArrIdx].AccessTypes |= RegTy;

          if (RegTy == StackAccess::FPR)
            ++NumFPLdSt;
          else
            ++NumNonFPLdSt;
        }
      }
    }
  }

  if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
    return;

  llvm::sort(StackAccesses);
  llvm::erase_if(StackAccesses, [](const StackAccess &S) {
    // [...]
  });

  // [...]

  if (StackAccesses.front().isMixed())
    MixedObjects.push_back(&StackAccesses.front());

  for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
       It != End; ++It) {
    const auto &First = *It;
    const auto &Second = *(It + 1);

    if (Second.isMixed())
      MixedObjects.push_back(&Second);

    if ((First.isSME() && Second.isCPU()) ||
        (First.isCPU() && Second.isSME())) {
      uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
      if (Distance < HazardSize)
        HazardPairs.emplace_back(&First, &Second);
    }
  }

  auto EmitRemark = [&](llvm::StringRef Str) {
    ORE->emit([&]() {
      auto R = MachineOptimizationRemarkAnalysis(
          "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
      return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
    });
  };

  for (const auto &P : HazardPairs)
    EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());

  for (const auto *Obj : MixedObjects)
    EmitRemark(
        formatv("{0} accessed by both GP and FP instructions", *Obj).str());
}
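
// Illustration (not part of the LLVM sources): a standalone sketch of the
// adjacent-pair scan that emitRemarks uses to find stack hazards. Access and
// findHazardPairs are hypothetical names; the input is assumed to be sorted by
// start offset already, as the real code does with llvm::sort.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stack access record: a byte range plus the kinds of access seen.
struct Access {
  int64_t Start;
  int64_t Size;
  bool CPU; // Accessed by general purpose or predicate loads/stores.
  bool SME; // Accessed by floating point/vector loads/stores.
  int64_t end() const { return Start + Size; }
};

// Returns the index pairs (I, I + 1) of neighbouring accesses of different
// kinds that are closer together than HazardSize bytes.
std::vector<std::pair<size_t, size_t>>
findHazardPairs(const std::vector<Access> &Sorted, uint64_t HazardSize) {
  std::vector<std::pair<size_t, size_t>> Pairs;
  for (size_t I = 0; I + 1 < Sorted.size(); ++I) {
    const Access &First = Sorted[I];
    const Access &Second = Sorted[I + 1];
    bool CrossesKinds =
        (First.SME && Second.CPU) || (First.CPU && Second.SME);
    if (!CrossesKinds)
      continue;
    // As in the original, the gap is converted to unsigned before comparing,
    // so a negative gap (overlapping ranges) becomes a very large distance.
    uint64_t Distance = static_cast<uint64_t>(Second.Start - First.end());
    if (Distance < HazardSize)
      Pairs.emplace_back(I, I + 1);
  }
  return Pairs;
}
```

// Only neighbouring entries are compared, so the scan is linear in the number
// of accessed stack objects once they are sorted.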
bool isCalleeSavedObjectIndex(int ObjectIdx) const
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment, TargetStackID::Value StackID=TargetStackID::Default)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to the callee saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
void setIsCalleeSavedObjectIndex(int ObjectIdx, bool IsCalleeSaved)
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineBasicBlock - Allocate a new MachineBasicBlock.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & addReg(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & addDef(Register RegNo, RegState Flags={}, unsigned SubReg=0) const
Add a virtual register definition operand.
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
Representation of each machine instruction.
void setFlags(unsigned flags)
uint32_t getFlags() const
Return the MI flags bitvector.
LLVM_ABI MachineInstrBundleIterator< MachineInstr > eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOVolatile
The memory access is volatile.
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI void freezeReservedRegs()
freezeReservedRegs - Called by the register allocator to freeze the set of reserved registers before ...
bool isReserved(MCRegister PhysReg) const
isReserved - Returns true when PhysReg is a reserved register.
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
Represent a mutable reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:294
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:339
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack offset.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack offset.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
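The StackOffset entries above describe a pair of byte offsets, one fixed and one scalable. A minimal standalone sketch of that idea follows. `OffsetSketch`, its `resolve` helper, and the `VScale` parameter are hypothetical names for illustration only; this is not the LLVM class, and it assumes the scalable part is multiplied by a runtime vector-length factor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a fixed + scalable offset pair.
// This is NOT llvm::StackOffset; it only mirrors the get/getFixed/getScalable
// factory shape listed in the entries above.
struct OffsetSketch {
  int64_t Fixed;    // Bytes that do not depend on the vector length.
  int64_t Scalable; // Bytes that are multiplied by the runtime factor.

  constexpr OffsetSketch(int64_t F, int64_t S) : Fixed(F), Scalable(S) {}

  static constexpr OffsetSketch get(int64_t F, int64_t S) {
    return OffsetSketch(F, S);
  }
  static constexpr OffsetSketch getFixed(int64_t F) {
    return OffsetSketch(F, 0);
  }
  static constexpr OffsetSketch getScalable(int64_t S) {
    return OffsetSketch(0, S);
  }

  // Component-wise addition keeps the two parts separate.
  constexpr OffsetSketch operator+(const OffsetSketch &RHS) const {
    return OffsetSketch(Fixed + RHS.Fixed, Scalable + RHS.Scalable);
  }

  // Assumption: the concrete byte offset is Fixed + Scalable * VScale,
  // where VScale is only known at run time.
  constexpr int64_t resolve(int64_t VScale) const {
    return Fixed + Scalable * VScale;
  }
};

static_assert(OffsetSketch::get(16, 32).resolve(2) == 80,
              "16 fixed bytes plus 32 scalable bytes at a factor of 2");
```

Keeping the two components apart until the last moment is what lets a frame layout be computed at compile time even though the scalable part has no fixed size.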
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlign - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
const MCAsmInfo & getMCAsmInfo() const
Return target specific asm information.
TargetOptions Options
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isOSBinFormatMachO() const
Tests whether the environment is MachO.
Definition Triple.h:791
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows arbitrary numbers to be used as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserve most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:558
void stable_sort(R &&Range)
Definition STLExtras.h:2115
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
RegState
Flags to represent properties of register accesses.
@ Define
Register definition.
constexpr RegState getKillRegState(bool B)
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
constexpr T alignDown(U Value, V Align, W Skew=0)
Returns the largest unsigned integer that is less than or equal to Value and congruent to Skew mod Align.
Definition MathExtras.h:546
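The alignDown description is easier to follow with concrete numbers. The sketch below is a standalone approximation restricted to `uint64_t`; `alignDownSketch` is a hypothetical name, not the templated LLVM function, and it assumes Align is non-zero and Value is at least Skew mod Align.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the alignDown entry above, for uint64_t only.
// Assumptions: Align != 0 and Value >= (Skew % Align); otherwise the
// unsigned subtraction below would wrap around.
constexpr uint64_t alignDownSketch(uint64_t Value, uint64_t Align,
                                   uint64_t Skew = 0) {
  // Reduce Skew into [0, Align), round (Value - Skew) down to a multiple
  // of Align, then add Skew back.
  return (Value - Skew % Align) / Align * Align + Skew % Align;
}

static_assert(alignDownSketch(17, 8) == 16, "17 rounds down to 16");
static_assert(alignDownSketch(16, 8) == 16, "aligned values are unchanged");
static_assert(alignDownSketch(17, 8, 3) == 11, "11 is 3 mod 8 and <= 17");
```

With a skew of 3 and an alignment of 8, the candidates are 3, 11, 19, and so on; 11 is the largest one that does not exceed 17.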
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1745
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:407
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1635
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:209
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
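As a worked example of the alignTo description, the following standalone sketch rounds a size up to the next multiple of a power-of-two alignment. `alignToSketch` is a hypothetical name that takes a plain `uint64_t` instead of an Align object; it is an illustration under those assumptions, not the LLVM implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the alignTo entry above.
// Assumption: A is a non-zero power of two, which the Align type
// guarantees in LLVM but a raw uint64_t does not.
constexpr uint64_t alignToSketch(uint64_t Size, uint64_t A) {
  // Adding A - 1 and clearing the low bits rounds up to a multiple of A.
  return (Size + A - 1) & ~(A - 1);
}

static_assert(alignToSketch(20, 16) == 32, "20 bytes need two 16-byte units");
static_assert(alignToSketch(16, 16) == 16, "exact multiples are unchanged");
static_assert(alignToSketch(0, 16) == 0, "zero stays zero");
```

The same rounding shows up wherever a stack area must be padded to the stack alignment, for example a 20-byte region under a 16-byte alignment occupying 32 bytes.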
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
constexpr RegState getDefRegState(bool B)
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto count_if(R &&Range, UnaryPredicate P)
Wrapper function around std::count_if to count the number of times an element satisfying a given pred...
Definition STLExtras.h:2018
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1771
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2191
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1946
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:862
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getUnknownStack(MachineFunction &MF)
Stack memory without other information.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray