1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function so that the particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until later in
33// the main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) to that data must be computable at compile time. The sizes
93// of the areas with a dotted background cannot be computed at compile time
94// if they are present, so all three of fp, bp and sp must be set up in
95// order to access all contents in the frame areas, assuming all of the
96// frame areas are non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
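// For illustration (hypothetical values): if the SVE area did not sit directly
// below the frame record, reaching an SVE object could first require
// materialising the fixed-size part of the offset in a scratch register:
//   sub x8, fp, #16           // skip a hypothetical 16-byte non-SVE area
//   ldr z8, [x8, #-7 mul vl]  // then apply the VL-scaled part
// With the layout above, a single VL-scaled access from fp suffices.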
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
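// For illustration (hypothetical sizes): with VLAs present, a call passing 32
// bytes of stack arguments is bracketed by explicit SP adjustments:
//   sub sp, sp, #32       // allocate the outgoing argument area
//   str x8, [sp, #16]     // store a stack-passed argument
//   bl  callee
//   add sp, sp, #32       // free the outgoing argument area
// With a reserved call frame, that space is instead folded into the prologue's
// single SP decrement.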
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
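// For illustration (assuming aarch64-stack-hazard-size=1024), the sorted local
// area would look roughly like:
//   | FPR spill slots / FP-accessed locals |
//   | 1024 bytes of hazard padding         |
//   | GPR-accessed locals and spill slots  |
// so that CPU (GPR) and SME-unit (FPR) accesses land in disjoint regions.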
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp xa,bx, [sp, -#offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64SMEAttributes.h"
222#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects and determine the SVE stack size and
339/// the offset of each object. If AssignOffsets is "Yes", the offsets get
340/// assigned (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386static bool isTargetWindows(const MachineFunction &MF) {
387 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI();
388}
389
395
396/// Returns true if homogeneous prolog or epilog code can be emitted
397/// for the size optimization. If possible, a frame helper call is injected.
398/// When an Exit block is given, this check is for the epilog.
399bool AArch64FrameLowering::homogeneousPrologEpilog(
400 MachineFunction &MF, MachineBasicBlock *Exit) const {
401 if (!MF.getFunction().hasMinSize())
402 return false;
404 return false;
405 if (EnableRedZone)
406 return false;
407
408 // TODO: Windows is not supported yet.
409 if (isTargetWindows(MF))
410 return false;
411
412 // TODO: SVE is not supported yet.
413 if (isLikelyToHaveSVEStack(*this, MF))
414 return false;
415
416 // Bail on stack adjustment needed on return for simplicity.
417 const MachineFrameInfo &MFI = MF.getFrameInfo();
418 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
419 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
420 return false;
421 if (Exit && getArgumentStackToRestore(MF, *Exit))
422 return false;
423
424 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
426 return false;
427
428 // If there is an odd number of GPRs before LR and FP in the CSRs list,
429 // they will not be paired into one RegPairInfo, which is incompatible with
430 // the assumption made by the homogeneous prolog epilog pass.
431 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
432 unsigned NumGPRs = 0;
433 for (unsigned I = 0; CSRegs[I]; ++I) {
434 Register Reg = CSRegs[I];
435 if (Reg == AArch64::LR) {
436 assert(CSRegs[I + 1] == AArch64::FP);
437 if (NumGPRs % 2 != 0)
438 return false;
439 break;
440 }
441 if (AArch64::GPR64RegClass.contains(Reg))
442 ++NumGPRs;
443 }
444
445 return true;
446}
447
448/// Returns true if CSRs should be paired.
449bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
450 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
451}
452
453/// This is the biggest offset to the stack pointer we can encode in aarch64
454/// instructions (without using a separate calculation and a temp register).
455/// Note that the exceptions here are vector stores/loads, which cannot encode any
456/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
457static const unsigned DefaultSafeSPDisplacement = 255;
458
459/// Look at each instruction that references stack frames and return the stack
460/// size limit beyond which some of these instructions will require a scratch
461/// register during their expansion later.
463 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
464 // range. We'll end up allocating an unnecessary spill slot a lot, but
465 // realistically that's not a big deal at this stage of the game.
466 for (MachineBasicBlock &MBB : MF) {
467 for (MachineInstr &MI : MBB) {
468 if (MI.isDebugInstr() || MI.isPseudo() ||
469 MI.getOpcode() == AArch64::ADDXri ||
470 MI.getOpcode() == AArch64::ADDSXri)
471 continue;
472
473 for (const MachineOperand &MO : MI.operands()) {
474 if (!MO.isFI())
475 continue;
476
478 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
480 return 0;
481 }
482 }
483 }
485}
486
491
492unsigned
493AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
494 const AArch64FunctionInfo *AFI,
495 bool IsWin64, bool IsFunclet) const {
496 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
497 "Tail call reserved stack must be aligned to 16 bytes");
498 if (!IsWin64 || IsFunclet) {
499 return AFI->getTailCallReservedStack();
500 } else {
501 if (AFI->getTailCallReservedStack() != 0 &&
502 !MF.getFunction().getAttributes().hasAttrSomewhere(
503 Attribute::SwiftAsync))
504 report_fatal_error("cannot generate ABI-changing tail call for Win64");
505 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
506
507 // Var args are stored here in the primary function.
508 FixedObjectSize += AFI->getVarArgsGPRSize();
509
510 if (MF.hasEHFunclets()) {
511 // Catch objects are stored here in the primary function.
512 const MachineFrameInfo &MFI = MF.getFrameInfo();
513 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
514 SmallSetVector<int, 8> CatchObjFrameIndices;
515 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
516 for (const WinEHHandlerType &H : TBME.HandlerArray) {
517 int FrameIndex = H.CatchObj.FrameIndex;
518 if ((FrameIndex != INT_MAX) &&
519 CatchObjFrameIndices.insert(FrameIndex)) {
520 FixedObjectSize = alignTo(FixedObjectSize,
521 MFI.getObjectAlign(FrameIndex).value()) +
522 MFI.getObjectSize(FrameIndex);
523 }
524 }
525 }
526 // To support EH funclets we allocate an UnwindHelp object
527 FixedObjectSize += 8;
528 }
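    // Illustrative example (hypothetical values): a Win64 vararg function with
    // no tail-call reserved stack, a 24-byte GPR vararg save area and EH
    // funclets reaches this point with 0 + 24 + 8 (UnwindHelp) = 32 bytes,
    // which the alignTo below leaves unchanged.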
529 return alignTo(FixedObjectSize, 16);
530 }
531}
532
534 if (!EnableRedZone)
535 return false;
536
537 // Don't use the red zone if the function explicitly asks us not to.
538 // This is typically used for kernel code.
539 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
540 const unsigned RedZoneSize =
542 if (!RedZoneSize)
543 return false;
544
545 const MachineFrameInfo &MFI = MF.getFrameInfo();
547 uint64_t NumBytes = AFI->getLocalStackSize();
548
549 // If neither NEON nor SVE is available, a COPY from one Q-reg to
550 // another requires a spill -> reload sequence. We can do that
551 // using a pre-decrementing store/post-decrementing load, but
552 // if we do so, we can't use the Red Zone.
553 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
554 !Subtarget.isNeonAvailable() &&
555 !Subtarget.hasSVE();
556
557 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
558 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
559}
560
561/// hasFPImpl - Return true if the specified function should have a dedicated
562/// frame pointer register.
564 const MachineFrameInfo &MFI = MF.getFrameInfo();
565 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
567
568 // Win64 EH requires a frame pointer if funclets are present, as the locals
569 // are accessed off the frame pointer in both the parent function and the
570 // funclets.
571 if (MF.hasEHFunclets())
572 return true;
573 // Retain behavior of always omitting the FP for leaf functions when possible.
575 return true;
576 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
577 MFI.hasStackMap() || MFI.hasPatchPoint() ||
578 RegInfo->hasStackRealignment(MF))
579 return true;
580
581 // If we:
582 //
583 // 1. Have streaming mode changes
584 // OR:
585 // 2. Have a streaming body with SVE stack objects
586 //
587 // Then the value of VG restored when unwinding to this function may not match
588 // the value of VG used to set up the stack.
589 //
590 // This is a problem as the CFA can be described with an expression of the
591 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
592 //
593 // If the value of VG used in that expression does not match the value used to
594 // set up the stack, an incorrect address for the CFA will be computed, and
595 // unwinding will fail.
596 //
597 // We work around this issue by ensuring the frame-pointer can describe the
598 // CFA in either of these cases.
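  // As an illustrative instance (hypothetical values): with
  // CFA = SP + 64 + VG * 32, a prologue set up with VG = 4 (256-bit vectors)
  // but unwound with VG = 2 (128-bit vectors) yields a CFA 64 bytes too low.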
599 if (AFI.needsDwarfUnwindInfo(MF) &&
602 return true;
603 // With large call frames around we may need to use FP to access the
604 // scavenging emergency spill slot.
605 //
606 // Unfortunately some calls to hasFP() like machine verifier ->
607 // getReservedReg() -> hasFP in the middle of global isel are too early
608 // to know the max call frame size. Hopefully conservatively returning "true"
609 // in those cases is fine.
610 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
611 if (!MFI.isMaxCallFrameSizeComputed() ||
613 return true;
614
615 return false;
616}
617
618/// Should the Frame Pointer be reserved for the current function?
620 const TargetMachine &TM = MF.getTarget();
621 const Triple &TT = TM.getTargetTriple();
622
623 // These OSes require that the frame chain be valid, even if the current frame does
624 // not use a frame pointer.
625 if (TT.isOSDarwin() || TT.isOSWindows())
626 return true;
627
628 // If the function has a frame pointer, it is reserved.
629 if (hasFP(MF))
630 return true;
631
632 // Frontend has requested to preserve the frame pointer.
634 return true;
635
636 return false;
637}
638
639/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
640/// not required, we reserve argument space for call sites in the function
641/// immediately on entry to the current function. This eliminates the need for
642/// add/sub sp brackets around call sites. Returns true if the call frame is
643/// included as part of the stack frame.
645 const MachineFunction &MF) const {
646 // The stack probing code for the dynamically allocated outgoing arguments
647 // area assumes that the stack is probed at the top - either by the prologue
648 // code, which issues a probe if `hasVarSizedObjects` return true, or by the
649 // most recent variable-sized object allocation. Changing the condition here
650 // may need to be followed up by changes to the probe issuing logic.
651 return !MF.getFrameInfo().hasVarSizedObjects();
652}
653
657
658 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
659 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
660 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
661 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
662 DebugLoc DL = I->getDebugLoc();
663 unsigned Opc = I->getOpcode();
664 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
665 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
666
667 if (!hasReservedCallFrame(MF)) {
668 int64_t Amount = I->getOperand(0).getImm();
669 Amount = alignTo(Amount, getStackAlign());
670 if (!IsDestroy)
671 Amount = -Amount;
672
673 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
674 // doesn't have to pop anything), then the first operand will be zero too so
675 // this adjustment is a no-op.
676 if (CalleePopAmount == 0) {
677 // FIXME: in-function stack adjustment for calls is limited to 24-bits
678 // because there's no guaranteed temporary register available.
679 //
680 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
681 // 1) For offset <= 12-bit, we use LSL #0
682 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
683 // LSL #0, and the other uses LSL #12.
684 //
685 // Most call frames will be allocated at the start of a function so
686 // this is OK, but it is a limitation that needs dealing with.
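      // For illustration (hypothetical value): an adjustment of 0x12345 bytes
      // would be split into two immediates:
      //   sub sp, sp, #0x12, lsl #12   // 0x12000
      //   sub sp, sp, #0x345           // remaining 0x345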
687 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
688
689 if (TLI->hasInlineStackProbe(MF) &&
691 // When stack probing is enabled, the decrement of SP may need to be
692 // probed. We only need to do this if the call site needs 1024 bytes of
693 // space or more, because a region smaller than that is allowed to be
694 // unprobed at an ABI boundary. We rely on the fact that SP has been
695 // probed exactly at this point, either by the prologue or most recent
696 // dynamic allocation.
698 "non-reserved call frame without var sized objects?");
699 Register ScratchReg =
700 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
701 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
702 } else {
703 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
704 StackOffset::getFixed(Amount), TII);
705 }
706 }
707 } else if (CalleePopAmount != 0) {
708 // If the calling convention demands that the callee pops arguments from the
709 // stack, we want to add it back if we have a reserved call frame.
710 assert(CalleePopAmount < 0xffffff && "call frame too large");
711 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
712 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
713 }
714 return MBB.erase(I);
715}
716
718 MachineBasicBlock &MBB) const {
719
720 MachineFunction &MF = *MBB.getParent();
721 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
722 const auto &TRI = *Subtarget.getRegisterInfo();
723 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
724
725 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
726
727 // Reset the CFA to `SP + 0`.
728 CFIBuilder.buildDefCFA(AArch64::SP, 0);
729
730 // Flip the RA sign state.
731 if (MFI.shouldSignReturnAddress(MF))
732 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
733 : CFIBuilder.buildNegateRAState();
734
735 // Shadow call stack uses X18, reset it.
736 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
737 CFIBuilder.buildSameValue(AArch64::X18);
738
739 // Emit .cfi_same_value for callee-saved registers.
740 const std::vector<CalleeSavedInfo> &CSI =
742 for (const auto &Info : CSI) {
743 MCRegister Reg = Info.getReg();
744 if (!TRI.regNeedsCFI(Reg, Reg))
745 continue;
746 CFIBuilder.buildSameValue(Reg);
747 }
748}
749
751 switch (Reg.id()) {
752 default:
753 // The called routine is expected to preserve r19-r28
754 // r29 and r30 are used as the frame pointer and link register, respectively.
755 return 0;
756
757 // GPRs
758#define CASE(n) \
759 case AArch64::W##n: \
760 case AArch64::X##n: \
761 return AArch64::X##n
762 CASE(0);
763 CASE(1);
764 CASE(2);
765 CASE(3);
766 CASE(4);
767 CASE(5);
768 CASE(6);
769 CASE(7);
770 CASE(8);
771 CASE(9);
772 CASE(10);
773 CASE(11);
774 CASE(12);
775 CASE(13);
776 CASE(14);
777 CASE(15);
778 CASE(16);
779 CASE(17);
780 CASE(18);
781#undef CASE
782
783 // FPRs
784#define CASE(n) \
785 case AArch64::B##n: \
786 case AArch64::H##n: \
787 case AArch64::S##n: \
788 case AArch64::D##n: \
789 case AArch64::Q##n: \
790 return HasSVE ? AArch64::Z##n : AArch64::Q##n
791 CASE(0);
792 CASE(1);
793 CASE(2);
794 CASE(3);
795 CASE(4);
796 CASE(5);
797 CASE(6);
798 CASE(7);
799 CASE(8);
800 CASE(9);
801 CASE(10);
802 CASE(11);
803 CASE(12);
804 CASE(13);
805 CASE(14);
806 CASE(15);
807 CASE(16);
808 CASE(17);
809 CASE(18);
810 CASE(19);
811 CASE(20);
812 CASE(21);
813 CASE(22);
814 CASE(23);
815 CASE(24);
816 CASE(25);
817 CASE(26);
818 CASE(27);
819 CASE(28);
820 CASE(29);
821 CASE(30);
822 CASE(31);
823#undef CASE
824 }
825}
826
827void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
828 MachineBasicBlock &MBB) const {
829 // Insertion point.
831
832 // Fake a debug loc.
833 DebugLoc DL;
834 if (MBBI != MBB.end())
835 DL = MBBI->getDebugLoc();
836
837 const MachineFunction &MF = *MBB.getParent();
838 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
839 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
840
841 BitVector GPRsToZero(TRI.getNumRegs());
842 BitVector FPRsToZero(TRI.getNumRegs());
843 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
844 for (MCRegister Reg : RegsToZero.set_bits()) {
845 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
846 // For GPRs, we only care to clear out the 64-bit register.
847 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
848 GPRsToZero.set(XReg);
849 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
850 // For FPRs, clear out the full vector register (Z if SVE is available,
// otherwise Q).
851 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
852 FPRsToZero.set(XReg);
853 }
854 }
855
856 const AArch64InstrInfo &TII = *STI.getInstrInfo();
857
858 // Zero out GPRs.
859 for (MCRegister Reg : GPRsToZero.set_bits())
860 TII.buildClearRegister(Reg, MBB, MBBI, DL);
861
862 // Zero out FP/vector registers.
863 for (MCRegister Reg : FPRsToZero.set_bits())
864 TII.buildClearRegister(Reg, MBB, MBBI, DL);
865
866 if (HasSVE) {
867 for (MCRegister PReg :
868 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
869 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
870 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
871 AArch64::P15}) {
872 if (RegsToZero[PReg])
873 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
874 }
875 }
876}
877
878bool AArch64FrameLowering::windowsRequiresStackProbe(
879 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
880 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
881 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
882 // TODO: When implementing stack protectors, take that into account
883 // for the probe threshold.
884 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
885 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
886}
887
889 const MachineBasicBlock &MBB) {
890 const MachineFunction *MF = MBB.getParent();
891 LiveRegs.addLiveIns(MBB);
892 // Mark callee saved registers as used so we will not choose them.
893 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
894 for (unsigned i = 0; CSRegs[i]; ++i)
895 LiveRegs.addReg(CSRegs[i]);
896}
897
899AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
900 bool HasCall) const {
901 MachineFunction *MF = MBB->getParent();
902
903 // If MBB is an entry block, use X9 as the scratch register.
904 // However, preserve_none functions may be using X9 to pass arguments,
905 // so prefer to pick an available register below in that case.
906 if (&MF->front() == MBB &&
908 return AArch64::X9;
909
910 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
911 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
912 LivePhysRegs LiveRegs(TRI);
913 getLiveRegsForEntryMBB(LiveRegs, *MBB);
914 if (HasCall) {
915 LiveRegs.addReg(AArch64::X16);
916 LiveRegs.addReg(AArch64::X17);
917 LiveRegs.addReg(AArch64::X18);
918 }
919
920 // Prefer X9 since it was historically used for the prologue scratch reg.
921 const MachineRegisterInfo &MRI = MF->getRegInfo();
922 if (LiveRegs.available(MRI, AArch64::X9))
923 return AArch64::X9;
924
925 for (unsigned Reg : AArch64::GPR64RegClass) {
926 if (LiveRegs.available(MRI, Reg))
927 return Reg;
928 }
929 return AArch64::NoRegister;
930}
931
933 const MachineBasicBlock &MBB) const {
934 const MachineFunction *MF = MBB.getParent();
935 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
936 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
937 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
938 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
940
941 if (AFI->hasSwiftAsyncContext()) {
942 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
943 const MachineRegisterInfo &MRI = MF->getRegInfo();
946 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
947 // available.
948 if (!LiveRegs.available(MRI, AArch64::X16) ||
949 !LiveRegs.available(MRI, AArch64::X17))
950 return false;
951 }
952
953 // Certain stack probing sequences might clobber flags, then we can't use
954 // the block as a prologue if the flags register is a live-in.
956 MBB.isLiveIn(AArch64::NZCV))
957 return false;
958
959 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
960 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
961 return false;
962
963 // May need a scratch register (for the return value) if we have to make a
964 // special call.
965 if (requiresSaveVG(*MF) ||
966 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
967 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
968 return false;
969
970 return true;
971}
972
974 const Function &F = MF.getFunction();
975 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
976 F.needsUnwindTableEntry();
977}
978
979bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
980 const MachineFunction &MF) const {
981 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
982 // and SEH_EpilogEnd instructions in the correct order.
984 return false;
987}
988
989// Given a load or a store instruction, generate the appropriate SEH unwind
990// opcode for it on Windows.
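// For illustration (typical mappings, not an exhaustive list):
//   stp x29, x30, [sp, #-16]!   ->  .seh_save_fplr_x 16
//   stp x19, x20, [sp, #32]     ->  .seh_save_regp x19, 32
//   str d8, [sp, #56]           ->  .seh_save_freg d8, 56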
992AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
993 const AArch64InstrInfo &TII,
994 MachineInstr::MIFlag Flag) const {
995 unsigned Opc = MBBI->getOpcode();
996 MachineBasicBlock *MBB = MBBI->getParent();
997 MachineFunction &MF = *MBB->getParent();
998 DebugLoc DL = MBBI->getDebugLoc();
999 unsigned ImmIdx = MBBI->getNumOperands() - 1;
1000 int Imm = MBBI->getOperand(ImmIdx).getImm();
1002 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1003 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1004
1005 switch (Opc) {
1006 default:
1007 report_fatal_error("No SEH Opcode for this instruction");
1008 case AArch64::STR_ZXI:
1009 case AArch64::LDR_ZXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::STR_PXI:
1018 case AArch64::LDR_PXI: {
1019 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1020 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1021 .addImm(Reg0)
1022 .addImm(Imm)
1023 .setMIFlag(Flag);
1024 break;
1025 }
1026 case AArch64::LDPDpost:
1027 Imm = -Imm;
1028 [[fallthrough]];
1029 case AArch64::STPDpre: {
1030 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1031 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1032 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1033 .addImm(Reg0)
1034 .addImm(Reg1)
1035 .addImm(Imm * 8)
1036 .setMIFlag(Flag);
1037 break;
1038 }
1039 case AArch64::LDPXpost:
1040 Imm = -Imm;
1041 [[fallthrough]];
1042 case AArch64::STPXpre: {
1043 Register Reg0 = MBBI->getOperand(1).getReg();
1044 Register Reg1 = MBBI->getOperand(2).getReg();
1045 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1046 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1047 .addImm(Imm * 8)
1048 .setMIFlag(Flag);
1049 else
1050 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1051 .addImm(RegInfo->getSEHRegNum(Reg0))
1052 .addImm(RegInfo->getSEHRegNum(Reg1))
1053 .addImm(Imm * 8)
1054 .setMIFlag(Flag);
1055 break;
1056 }
1057 case AArch64::LDRDpost:
1058 Imm = -Imm;
1059 [[fallthrough]];
1060 case AArch64::STRDpre: {
1061 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1062 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1063 .addImm(Reg)
1064 .addImm(Imm)
1065 .setMIFlag(Flag);
1066 break;
1067 }
1068 case AArch64::LDRXpost:
1069 Imm = -Imm;
1070 [[fallthrough]];
1071 case AArch64::STRXpre: {
1072 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1073 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1074 .addImm(Reg)
1075 .addImm(Imm)
1076 .setMIFlag(Flag);
1077 break;
1078 }
1079 case AArch64::STPDi:
1080 case AArch64::LDPDi: {
1081 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1082 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1083 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1084 .addImm(Reg0)
1085 .addImm(Reg1)
1086 .addImm(Imm * 8)
1087 .setMIFlag(Flag);
1088 break;
1089 }
1090 case AArch64::STPXi:
1091 case AArch64::LDPXi: {
1092 Register Reg0 = MBBI->getOperand(0).getReg();
1093 Register Reg1 = MBBI->getOperand(1).getReg();
1094
1095 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1096 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1097
1098 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1099 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1100 .addImm(Imm * 8)
1101 .setMIFlag(Flag);
1102 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1103 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1104 .addImm(SEHReg0)
1105 .addImm(SEHReg1)
1106 .addImm(Imm * 8)
1107 .setMIFlag(Flag);
1108 else
1109 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1110 .addImm(SEHReg0)
1111 .addImm(SEHReg1)
1112 .addImm(Imm * 8)
1113 .setMIFlag(Flag);
1114 break;
1115 }
1116 case AArch64::STRXui:
1117 case AArch64::LDRXui: {
1118 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1119 if (Reg >= 19)
1120 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1121 .addImm(Reg)
1122 .addImm(Imm * 8)
1123 .setMIFlag(Flag);
1124 else
1125 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1126 .addImm(Reg)
1127 .addImm(Imm * 8)
1128 .setMIFlag(Flag);
1129 break;
1130 }
1131 case AArch64::STRDui:
1132 case AArch64::LDRDui: {
1133 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1134 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1135 .addImm(Reg)
1136 .addImm(Imm * 8)
1137 .setMIFlag(Flag);
1138 break;
1139 }
1140 case AArch64::STPQi:
1141 case AArch64::LDPQi: {
1142 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1143 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1144 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1145 .addImm(Reg0)
1146 .addImm(Reg1)
1147 .addImm(Imm * 16)
1148 .setMIFlag(Flag);
1149 break;
1150 }
1151 case AArch64::LDPQpost:
1152 Imm = -Imm;
1153 [[fallthrough]];
1154 case AArch64::STPQpre: {
1155 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1156 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1157 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1158 .addImm(Reg0)
1159 .addImm(Reg1)
1160 .addImm(Imm * 16)
1161 .setMIFlag(Flag);
1162 break;
1163 }
1164 }
1165 auto I = MBB->insertAfter(MBBI, MIB);
1166 return I;
1167}
1168
1171 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1172 return false;
1173 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1174 // is enabled with streaming mode changes.
1175 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1176 if (ST.isTargetDarwin())
1177 return ST.hasSVE();
1178 return true;
1179}
1180
1182 MachineFunction &MF) const {
1183 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1184 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
1185
1186 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1187 DebugLoc DL; // Set debug location to unknown.
1189
1190 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1192 };
1193
1194 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1195 DebugLoc DL;
1196 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1197 if (MBBI != MBB.end())
1198 DL = MBBI->getDebugLoc();
1199
1200 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1202 };
1203
1204 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1205 EmitSignRA(MF.front());
1206 for (MachineBasicBlock &MBB : MF) {
1207 if (MBB.isEHFuncletEntry())
1208 EmitSignRA(MBB);
1209 if (MBB.isReturnBlock())
1210 EmitAuthRA(MBB);
1211 }
1212}
1213
1215 MachineBasicBlock &MBB) const {
1216 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1217 PrologueEmitter.emitPrologue();
1218}
1219
1221 MachineBasicBlock &MBB) const {
1222 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1223 EpilogueEmitter.emitEpilogue();
1224}
1225
1228 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1229}
1230
1232 return enableCFIFixup(MF) &&
1233 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1234}
1235
1236/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1237/// debug info. It's the same as what we use for resolving the code-gen
1238/// references for now. FIXME: This can go wrong when references are
1239/// SP-relative and simple call frames aren't used.
1242 Register &FrameReg) const {
1244 MF, FI, FrameReg,
1245 /*PreferFP=*/
1246 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1247 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1248 /*ForSimm=*/false);
1249}
1250
1253 int FI) const {
1254 // This function serves to provide a comparable offset from a single reference
1255 // point (the value of SP at function entry) that can be used for analysis,
1256 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1257 // correct for all objects in the presence of VLA-area objects or dynamic
1258 // stack re-alignment.
1259
1260 const auto &MFI = MF.getFrameInfo();
1261
1262 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1263 StackOffset ZPRStackSize = getZPRStackSize(MF);
1264 StackOffset PPRStackSize = getPPRStackSize(MF);
1265 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1266
1267 // For VLA-area objects, just emit an offset at the end of the stack frame.
1268 // Whilst not quite correct, these objects do live at the end of the frame and
1269 // so it is more useful for analysis for the offset to reflect this.
1270 if (MFI.isVariableSizedObjectIndex(FI)) {
1271 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1272 }
1273
1274 // This is correct in the absence of any SVE stack objects.
1275 if (!SVEStackSize)
1276 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1277
1278 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1279 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1280 if (MFI.hasScalableStackID(FI)) {
1281 if (FPAfterSVECalleeSaves &&
1282 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1283 assert(!AFI->hasSplitSVEObjects() &&
1284 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1285 return StackOffset::getScalable(ObjectOffset);
1286 }
1287 StackOffset AccessOffset{};
1288 // With split SVE objects, the scalable vectors are below (at a lower address
1289 // than) the scalable predicates, so we must subtract the size of the predicates.
1290 if (AFI->hasSplitSVEObjects() &&
1291 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1292 AccessOffset = -PPRStackSize;
1293 return AccessOffset +
1294 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1295 ObjectOffset);
1296 }
1297
1298 bool IsFixed = MFI.isFixedObjectIndex(FI);
1299 bool IsCSR =
1300 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1301
1302 StackOffset ScalableOffset = {};
1303 if (!IsFixed && !IsCSR) {
1304 ScalableOffset = -SVEStackSize;
1305 } else if (FPAfterSVECalleeSaves && IsCSR) {
1306 ScalableOffset =
1308 }
1309
1310 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1311}
1312
1318
1319StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1320 int64_t ObjectOffset) const {
1321 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1322 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1323 const Function &F = MF.getFunction();
1324 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1325 unsigned FixedObject =
1326 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1327 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1328 int64_t FPAdjust =
1329 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1330 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1331}
1332
1333StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1334 int64_t ObjectOffset) const {
1335 const auto &MFI = MF.getFrameInfo();
1336 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1337}
1338
1339// TODO: This function currently does not work for scalable vectors.
1341 int FI) const {
1342 const AArch64RegisterInfo *RegInfo =
1343 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1344 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1345 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1346 ? getFPOffset(MF, ObjectOffset).getFixed()
1347 : getStackOffset(MF, ObjectOffset).getFixed();
1348}
1349
1351 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1352 bool ForSimm) const {
1353 const auto &MFI = MF.getFrameInfo();
1354 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1355 bool isFixed = MFI.isFixedObjectIndex(FI);
1356 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1357 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1358 FrameReg, PreferFP, ForSimm);
1359}
1360
1362 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1363 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1364 bool ForSimm) const {
1365 const auto &MFI = MF.getFrameInfo();
1366 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1367 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1368 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1369
1370 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1371 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1372 bool isCSR =
1373 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1374 bool isSVE = MFI.isScalableStackID(StackID);
1375
1376 StackOffset ZPRStackSize = getZPRStackSize(MF);
1377 StackOffset PPRStackSize = getPPRStackSize(MF);
1378 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1379
1380 // Use frame pointer to reference fixed objects. Use it for locals if
1381 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1382 // reliable as a base). Make sure useFPForScavengingIndex() does the
1383 // right thing for the emergency spill slot.
1384 bool UseFP = false;
1385 if (AFI->hasStackFrame() && !isSVE) {
1386 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1387 // there are scalable (SVE) objects in between the FP and the fixed-sized
1388 // objects.
1389 PreferFP &= !SVEStackSize;
1390
1391 // Note: Keeping the following as multiple 'if' statements rather than
1392 // merging to a single expression for readability.
1393 //
1394 // Argument access should always use the FP.
1395 if (isFixed) {
1396 UseFP = hasFP(MF);
1397 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1398 // References to the CSR area must use FP if we're re-aligning the stack
1399 // since the dynamically-sized alignment padding is between the SP/BP and
1400 // the CSR area.
1401 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1402 UseFP = true;
1403 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1404 // If the FPOffset is negative and we're producing a signed immediate, we
1405 // have to keep in mind that the available offset range for negative
1406 // offsets is smaller than for positive ones. If an offset is available
1407 // via the FP and the SP, use whichever is closest.
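      // Illustrative example: with ForSimm, an FP-relative offset of -280 does
      // not fit the signed 9-bit unscaled range [-256, 255], so if the same
      // slot is also reachable from SP at a positive offset, SP is preferred.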
1408 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1409 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1410
1411 if (FPOffset >= 0) {
1412 // If the FPOffset is positive, that'll always be best, as the SP/BP
1413 // will be even further away.
1414 UseFP = true;
1415 } else if (MFI.hasVarSizedObjects()) {
1416 // If we have variable sized objects, we can use either FP or BP, as the
1417 // SP offset is unknown. We can use the base pointer if we have one and
1418 // FP is not preferred. If not, we're stuck with using FP.
1419 bool CanUseBP = RegInfo->hasBasePointer(MF);
1420 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1421 UseFP = PreferFP;
1422 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1423 UseFP = true;
1424 // else we can use BP and FP, but the offset from FP won't fit.
1425 // That will make us scavenge registers which we can probably avoid by
1426 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1427 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1428 // Funclets access the locals contained in the parent's stack frame
1429 // via the frame pointer, so we have to use the FP in the parent
1430 // function.
1431 (void) Subtarget;
1432 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1433 MF.getFunction().isVarArg()) &&
1434 "Funclets should only be present on Win64");
1435 UseFP = true;
1436 } else {
1437 // We have the choice between FP and (SP or BP).
1438 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1439 UseFP = true;
1440 }
1441 }
1442 }
1443
1444 assert(
1445 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1446 "In the presence of dynamic stack pointer realignment, "
1447 "non-argument/CSR objects cannot be accessed through the frame pointer");
1448
1449 bool FPAfterSVECalleeSaves = hasSVECalleeSavesAboveFrameRecord(MF);
1450
1451 if (isSVE) {
1452 StackOffset FPOffset = StackOffset::get(
1453 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1454 StackOffset SPOffset =
1455 SVEStackSize +
1456 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1457 ObjectOffset);
1458
1459 // With split SVE objects the ObjectOffset is relative to the split area
1460 // (i.e. the PPR area or ZPR area respectively).
1461 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1462 // If we're accessing an SVE vector with split SVE objects...
1463 // - From the FP we need to move down past the PPR area:
1464 FPOffset -= PPRStackSize;
1465 // - From the SP we only need to move up to the ZPR area:
1466 SPOffset -= PPRStackSize;
1467 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1468 // `SPOffset = ZPRStackSize + ...`.
1469 }
1470
1471 if (FPAfterSVECalleeSaves) {
1473 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1476 }
1477 }
1478
1479 // Always use the FP for SVE spills if available and beneficial.
1480 if (hasFP(MF) && (SPOffset.getFixed() ||
1481 FPOffset.getScalable() < SPOffset.getScalable() ||
1482 RegInfo->hasStackRealignment(MF))) {
1483 FrameReg = RegInfo->getFrameRegister(MF);
1484 return FPOffset;
1485 }
1486 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1487 : MCRegister(AArch64::SP);
1488
1489 return SPOffset;
1490 }
1491
1492 StackOffset SVEAreaOffset = {};
1493 if (FPAfterSVECalleeSaves) {
1494 // In this stack layout, the FP is in between the callee saves and other
1495 // SVE allocations.
1496 StackOffset SVECalleeSavedStack =
1498 if (UseFP) {
1499 if (isFixed)
1500 SVEAreaOffset = SVECalleeSavedStack;
1501 else if (!isCSR)
1502 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1503 } else {
1504 if (isFixed)
1505 SVEAreaOffset = SVEStackSize;
1506 else if (isCSR)
1507 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1508 }
1509 } else {
1510 if (UseFP && !(isFixed || isCSR))
1511 SVEAreaOffset = -SVEStackSize;
1512 if (!UseFP && (isFixed || isCSR))
1513 SVEAreaOffset = SVEStackSize;
1514 }
1515
1516 if (UseFP) {
1517 FrameReg = RegInfo->getFrameRegister(MF);
1518 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1519 }
1520
1521 // Use the base pointer if we have one.
1522 if (RegInfo->hasBasePointer(MF))
1523 FrameReg = RegInfo->getBaseRegister();
1524 else {
1525 assert(!MFI.hasVarSizedObjects() &&
1526 "Can't use SP when we have var sized objects.");
1527 FrameReg = AArch64::SP;
1528 // If we're using the red zone for this function, the SP won't actually
1529 // be adjusted, so the offsets will be negative. They're also all
1530 // within range of the signed 9-bit immediate instructions.
1531 if (canUseRedZone(MF))
1532 Offset -= AFI->getLocalStackSize();
1533 }
1534
1535 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1536}
1537
1538static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1539 // Do not set a kill flag on values that are also marked as live-in. This
1540 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1541 // callee saved registers.
1542 // Omitting the kill flags is conservatively correct even if the live-in
1543 // is not used after all.
1544 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1545 return getKillRegState(!IsLiveIn);
1546}
1547
1549 MachineFunction &MF) {
1550 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1551 AttributeList Attrs = MF.getFunction().getAttributes();
1553 return Subtarget.isTargetMachO() &&
1554 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1555 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1557 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1558}
1559
1560static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile,
1561 unsigned SpillCount, unsigned Reg1,
1562 unsigned Reg2, bool NeedsWinCFI,
1563 bool IsFirst,
1564 const TargetRegisterInfo *TRI) {
1565 // If we are generating register pairs for a Windows function that requires
1566 // EH support, then pair consecutive registers only. There are no unwind
1567 // opcodes for saves/restores of non-consecutive register pairs.
1568 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_frepg_x,
1569 // save_lrpair.
1570 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1571
1572 if (Reg2 == AArch64::FP)
1573 return true;
1574 if (!NeedsWinCFI)
1575 return false;
1576
1577 // ARM64EC introduced `save_any_regp`, which expects 16-byte alignment.
1578 // This is handled by only allowing paired spills for registers spilled at
1579 // even positions (which should be 16-byte aligned, as other GPRs/FPRs are
1580 // 8-bytes). We carve out an exception for {FP,LR}, which does not require
1581 // 16-byte alignment in the uop representation.
1582 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1583 return SpillExtendedVolatile
1584 ? !((Reg1 == AArch64::FP && Reg2 == AArch64::LR) ||
1585 (SpillCount % 2) == 0)
1586 : false;
1587
1588 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1589 // opcode. If this is the first register pair, it would end up with a
1590 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1591 // if LR is paired with something other than the first register.
1592 // The save_lrpair opcode requires the first register to be an odd one.
1593 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1594 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1595 return false;
1596 return true;
1597}
1598
1599/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1600/// WindowsCFI requires that only consecutive registers can be paired.
1601/// LR and FP need to be allocated together when the frame needs to save
1602/// the frame-record. This means any other register pairing with LR is invalid.
1603static bool invalidateRegisterPairing(bool SpillExtendedVolatile,
1604 unsigned SpillCount, unsigned Reg1,
1605 unsigned Reg2, bool UsesWinAAPCS,
1606 bool NeedsWinCFI, bool NeedsFrameRecord,
1607 bool IsFirst,
1608 const TargetRegisterInfo *TRI) {
1609 if (UsesWinAAPCS)
1610 return invalidateWindowsRegisterPairing(SpillExtendedVolatile, SpillCount,
1611 Reg1, Reg2, NeedsWinCFI, IsFirst,
1612 TRI);
1613
1614 // If we need to store the frame record, don't pair any register
1615 // with LR other than FP.
1616 if (NeedsFrameRecord)
1617 return Reg2 == AArch64::LR;
1618
1619 return false;
1620}
1621
1622namespace {
1623
1624struct RegPairInfo {
1625 Register Reg1;
1626 Register Reg2;
1627 int FrameIdx;
1628 int Offset;
1629 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1630 const TargetRegisterClass *RC;
1631
1632 RegPairInfo() = default;
1633
1634 bool isPaired() const { return Reg2.isValid(); }
1635
1636 bool isScalable() const { return Type == PPR || Type == ZPR; }
1637};
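// Rough usage sketch: a paired GPR entry such as {Reg1 = X19, Reg2 = X20,
// Offset = 2} is later emitted as a single "stp x20, x19, [sp, #16]", since
// Offset is measured in spill-size units (8 bytes per GPR here).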
1638
1639} // end anonymous namespace
1640
1642 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1643 if (SavedRegs.test(PReg)) {
1644 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1645 return MCRegister(PNReg);
1646 }
1647 }
1648 return MCRegister();
1649}
1650
1651 // The multi-vector LD/ST instructions are available only on SME2 or SVE2p1 targets.
1653 MachineFunction &MF) {
1655 return false;
1656
1657 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1658 bool IsLocallyStreaming =
1659 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1660
1661 // SME2 instructions can only be used safely while in streaming mode.
1662 // It is not safe to use SME2 instructions in streaming-compatible or
1663 // locally-streaming functions.
1664 return Subtarget.hasSVE2p1() ||
1665 (Subtarget.hasSME2() &&
1666 (!IsLocallyStreaming && Subtarget.isStreaming()));
1667}
1668
1670 MachineFunction &MF,
1672 const TargetRegisterInfo *TRI,
1674 bool NeedsFrameRecord) {
1675
1676 if (CSI.empty())
1677 return;
1678
1679 bool IsWindows = isTargetWindows(MF);
1681 unsigned StackHazardSize = getStackHazardSize(MF);
1682 MachineFrameInfo &MFI = MF.getFrameInfo();
1684 unsigned Count = CSI.size();
1685 (void)CC;
1686 // MachO's compact unwind format relies on all registers being stored in
1687 // pairs.
1688 assert((!produceCompactUnwindFrame(AFL, MF) ||
1691 (Count & 1) == 0) &&
1692 "Odd number of callee-saved regs to spill!");
1693 int ByteOffset = AFI->getCalleeSavedStackSize();
1694 int StackFillDir = -1;
1695 int RegInc = 1;
1696 unsigned FirstReg = 0;
1697 if (IsWindows) {
1698 // For WinCFI, fill the stack from the bottom up.
1699 ByteOffset = 0;
1700 StackFillDir = 1;
1701 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1702 // backwards, to pair up registers starting from lower numbered registers.
1703 RegInc = -1;
1704 FirstReg = Count - 1;
1705 }
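// In other words (roughly): by default the CSR area is filled top-down, with
// ByteOffset starting at the callee-saved stack size and decreasing, while for
// WinCFI it is filled bottom-up from offset 0 as CSI is walked in reverse.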
1706
1707 bool FPAfterSVECalleeSaves = AFL.hasSVECalleeSavesAboveFrameRecord(MF);
1708 // Windows AAPCS has x9-x15 as volatile registers, x16-x17 as intra-procedural
1709 // scratch, and x18 as platform reserved. However, clang has extended calling
1710 // conventions such as preserve_most and preserve_all which treat these as
1711 // CSRs. As such, the ARM64 unwind uOPs bias registers by 19. We use ARM64EC
1712 // uOPs, which have separate restrictions, so we need to check for that.
1713 //
1714 // NOTE: we currently do not account for the D registers as LLVM does not
1715 // support non-ABI compliant D register spills.
1716 bool SpillExtendedVolatile =
1717 IsWindows && llvm::any_of(CSI, [](const CalleeSavedInfo &CSI) {
1718 const auto &Reg = CSI.getReg();
1719 return Reg >= AArch64::X0 && Reg <= AArch64::X18;
1720 });
1721
1722 int ZPRByteOffset = 0;
1723 int PPRByteOffset = 0;
1724 bool SplitPPRs = AFI->hasSplitSVEObjects();
1725 if (SplitPPRs) {
1726 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1727 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1728 } else if (!FPAfterSVECalleeSaves) {
1729 ZPRByteOffset =
1731 // Unused: Everything goes in ZPR space.
1732 PPRByteOffset = 0;
1733 }
1734
1735 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1736 Register LastReg = 0;
1737 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1738
1739 // When iterating backwards, the loop condition relies on unsigned wraparound.
1740 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1741 RegPairInfo RPI;
1742 RPI.Reg1 = CSI[i].getReg();
1743
1744 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1745 RPI.Type = RegPairInfo::GPR;
1746 RPI.RC = &AArch64::GPR64RegClass;
1747 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1748 RPI.Type = RegPairInfo::FPR64;
1749 RPI.RC = &AArch64::FPR64RegClass;
1750 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1751 RPI.Type = RegPairInfo::FPR128;
1752 RPI.RC = &AArch64::FPR128RegClass;
1753 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1754 RPI.Type = RegPairInfo::ZPR;
1755 RPI.RC = &AArch64::ZPRRegClass;
1756 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1757 RPI.Type = RegPairInfo::PPR;
1758 RPI.RC = &AArch64::PPRRegClass;
1759 } else if (RPI.Reg1 == AArch64::VG) {
1760 RPI.Type = RegPairInfo::VG;
1761 RPI.RC = &AArch64::FIXED_REGSRegClass;
1762 } else {
1763 llvm_unreachable("Unsupported register class.");
1764 }
1765
1766 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1767 ? PPRByteOffset
1768 : ZPRByteOffset;
1769
1770 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1771 if (HasCSHazardPadding &&
1772 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1774 ByteOffset += StackFillDir * StackHazardSize;
1775 LastReg = RPI.Reg1;
1776
1777 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1778 int Scale = TRI->getSpillSize(*RPI.RC);
1779 // Add the next reg to the pair if it is in the same register class.
1780 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1781 MCRegister NextReg = CSI[i + RegInc].getReg();
1782 bool IsFirst = i == FirstReg;
1783 unsigned SpillCount = NeedsWinCFI ? FirstReg - i : i;
1784 switch (RPI.Type) {
1785 case RegPairInfo::GPR:
1786 if (AArch64::GPR64RegClass.contains(NextReg) &&
1788 SpillExtendedVolatile, SpillCount, RPI.Reg1, NextReg, IsWindows,
1789 NeedsWinCFI, NeedsFrameRecord, IsFirst, TRI))
1790 RPI.Reg2 = NextReg;
1791 break;
1792 case RegPairInfo::FPR64:
1793 if (AArch64::FPR64RegClass.contains(NextReg) &&
1795 SpillExtendedVolatile, SpillCount, RPI.Reg1, NextReg, IsWindows,
1796 NeedsWinCFI, NeedsFrameRecord, IsFirst, TRI))
1797 RPI.Reg2 = NextReg;
1798 break;
1799 case RegPairInfo::FPR128:
1800 if (AArch64::FPR128RegClass.contains(NextReg))
1801 RPI.Reg2 = NextReg;
1802 break;
1803 case RegPairInfo::PPR:
1804 break;
1805 case RegPairInfo::ZPR:
1806 if (AFI->getPredicateRegForFillSpill() != 0 &&
1807 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1808 // Calculate offset of register pair to see if pair instruction can be
1809 // used.
1810 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1811 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1812 RPI.Reg2 = NextReg;
1813 }
1814 break;
1815 case RegPairInfo::VG:
1816 break;
1817 }
1818 }
1819
1820 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1821 // list to come in sorted by frame index so that we can issue the store
1822 // pair instructions directly. Assert if we see anything otherwise.
1823 //
1824 // The order of the registers in the list is controlled by
1825 // getCalleeSavedRegs(), so they will always be in-order, as well.
1826 assert((!RPI.isPaired() ||
1827 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1828 "Out of order callee saved regs!");
1829
1830 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1831 RPI.Reg1 == AArch64::LR) &&
1832 "FrameRecord must be allocated together with LR");
1833
1834 // Windows AAPCS has FP and LR reversed.
1835 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1836 RPI.Reg2 == AArch64::LR) &&
1837 "FrameRecord must be allocated together with LR");
1838
1839 // MachO's compact unwind format relies on all registers being stored in
1840 // adjacent register pairs.
1841 assert((!produceCompactUnwindFrame(AFL, MF) ||
1844 (RPI.isPaired() &&
1845 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1846 RPI.Reg1 + 1 == RPI.Reg2))) &&
1847 "Callee-save registers not saved as adjacent register pair!");
1848
1849 RPI.FrameIdx = CSI[i].getFrameIdx();
1850 if (IsWindows &&
1851 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1852 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1853
1854 // Realign the scalable offset if necessary. This is relevant when
1855 // spilling predicates on Windows.
1856 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1857 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1858 }
1859
1860 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1861 assert(OffsetPre % Scale == 0);
1862
1863 if (RPI.isScalable())
1864 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1865 else
1866 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1867
1868 // Swift's async context is directly before FP, so allocate an extra
1869 // 8 bytes for it.
1870 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1871 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1872 (IsWindows && RPI.Reg2 == AArch64::LR)))
1873 ByteOffset += StackFillDir * 8;
1874
1875 // Round up size of non-pair to pair size if we need to pad the
1876 // callee-save area to ensure 16-byte alignment.
1877 if (NeedGapToAlignStack && !IsWindows && !RPI.isScalable() &&
1878 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1879 ByteOffset % 16 != 0) {
1880 ByteOffset += 8 * StackFillDir;
1881 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1882 // A stack frame with a gap looks like this, bottom up:
1883 // d9, d8. x21, gap, x20, x19.
1884 // Set extra alignment on the x21 object to create the gap above it.
1885 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1886 NeedGapToAlignStack = false;
1887 }
1888
1889 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1890 assert(OffsetPost % Scale == 0);
1891 // If filling top down (default), we want the offset after incrementing it.
1892 // If filling bottom up (WinCFI) we need the original offset.
1893 int Offset = IsWindows ? OffsetPre : OffsetPost;
1894
1895 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1896 // Swift context can directly precede FP.
1897 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1898 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1899 (IsWindows && RPI.Reg2 == AArch64::LR)))
1900 Offset += 8;
1901 RPI.Offset = Offset / Scale;
1902
1903 assert((!RPI.isPaired() ||
1904 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1905 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1906 "Offset out of bounds for LDP/STP immediate");
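// (Paired LDP/STP encodes a signed, scaled 7-bit immediate, which is where the
// -64..63 range for fixed-size pairs above comes from.)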
1907
1908 auto isFrameRecord = [&] {
1909 if (RPI.isPaired())
1910 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1911 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1912 // Otherwise, look for the frame record as two unpaired registers. This is
1913 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1914 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1915 // On Windows, this check works out as current reg == FP, next reg == LR,
1916 // and on other platforms current reg == FP, previous reg == LR. This
1917 // works out as the correct pre-increment or post-increment offsets
1918 // respectively.
1919 return i > 0 && RPI.Reg1 == AArch64::FP &&
1920 CSI[i - 1].getReg() == AArch64::LR;
1921 };
1922
1923 // Save the offset to frame record so that the FP register can point to the
1924 // innermost frame record (spilled FP and LR registers).
1925 if (NeedsFrameRecord && isFrameRecord())
1927
1928 RegPairs.push_back(RPI);
1929 if (RPI.isPaired())
1930 i += RegInc;
1931 }
1932 if (IsWindows) {
1933 // If we need an alignment gap in the stack, align the topmost stack
1934 // object. A stack frame with a gap looks like this, bottom up:
1935 // x19, d8. d9, gap.
1936 // Set extra alignment on the topmost stack object (the first element in
1937 // CSI, which goes top down), to create the gap above it.
1938 if (AFI->hasCalleeSaveStackFreeSpace())
1939 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1940 // We iterated bottom up over the registers; flip RegPairs back to top
1941 // down order.
1942 std::reverse(RegPairs.begin(), RegPairs.end());
1943 }
1944}
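// Rough example of the result: a function saving x19-x22, fp and lr ends up
// with three register pairs whose offsets yield the "stp x22, x21 / stp x20,
// x19 / stp fp, lr" style sequence illustrated in spillCalleeSavedRegisters().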
1945
1949 MachineFunction &MF = *MBB.getParent();
1950 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1951 auto &TLI = *Subtarget.getTargetLowering();
1952 const AArch64InstrInfo &TII = *Subtarget.getInstrInfo();
1953 bool NeedsWinCFI = needsWinCFI(MF);
1954 DebugLoc DL;
1956
1957 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1958
1960 // Refresh the reserved regs in case there are any potential changes since the
1961 // last freeze.
1962 MRI.freezeReservedRegs();
1963
1964 if (homogeneousPrologEpilog(MF)) {
1965 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1967
1968 for (auto &RPI : RegPairs) {
1969 MIB.addReg(RPI.Reg1);
1970 MIB.addReg(RPI.Reg2);
1971
1972 // Update register live in.
1973 if (!MRI.isReserved(RPI.Reg1))
1974 MBB.addLiveIn(RPI.Reg1);
1975 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1976 MBB.addLiveIn(RPI.Reg2);
1977 }
1978 return true;
1979 }
1980 bool PTrueCreated = false;
1981 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1982 Register Reg1 = RPI.Reg1;
1983 Register Reg2 = RPI.Reg2;
1984 unsigned StrOpc;
1985
1986 // Issue sequence of spills for cs regs. The first spill may be converted
1987 // to a pre-decrement store later by emitPrologue if the callee-save stack
1988 // area allocation can't be combined with the local stack area allocation.
1989 // For example:
1990 // stp x22, x21, [sp, #0] // addImm(+0)
1991 // stp x20, x19, [sp, #16] // addImm(+2)
1992 // stp fp, lr, [sp, #32] // addImm(+4)
1993 // Rationale: This sequence saves uop updates compared to a sequence of
1994 // pre-increment spills like stp xi,xj,[sp,#-16]!
1995 // Note: Similar rationale and sequence for restores in epilog.
1996 unsigned Size = TRI->getSpillSize(*RPI.RC);
1997 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1998 switch (RPI.Type) {
1999 case RegPairInfo::GPR:
2000 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
2001 break;
2002 case RegPairInfo::FPR64:
2003 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
2004 break;
2005 case RegPairInfo::FPR128:
2006 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
2007 break;
2008 case RegPairInfo::ZPR:
2009 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
2010 break;
2011 case RegPairInfo::PPR:
2012 StrOpc = AArch64::STR_PXI;
2013 break;
2014 case RegPairInfo::VG:
2015 StrOpc = AArch64::STRXui;
2016 break;
2017 }
2018
2019 Register X0Scratch;
2020 auto RestoreX0 = make_scope_exit([&] {
2021 if (X0Scratch != AArch64::NoRegister)
2022 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
2023 .addReg(X0Scratch)
2025 });
2026
2027 if (Reg1 == AArch64::VG) {
2028 // Find an available register to store value of VG to.
2029 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
2030 assert(Reg1 != AArch64::NoRegister);
2031 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
2032 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
2033 .addImm(31)
2034 .addImm(1)
2036 } else {
2038 if (any_of(MBB.liveins(),
2039 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2040 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2041 AArch64::X0, LiveIn.PhysReg);
2042 })) {
2043 X0Scratch = Reg1;
2044 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2045 .addReg(AArch64::X0)
2047 }
2048
2049 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2050 const uint32_t *RegMask =
2051 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2052 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2053 .addExternalSymbol(TLI.getLibcallName(LC))
2054 .addRegMask(RegMask)
2055 .addReg(AArch64::X0, RegState::ImplicitDefine)
2057 Reg1 = AArch64::X0;
2058 }
2059 }
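// Summary of the VG save above: with SVE the current VG is materialized
// directly via CNTD; otherwise it is obtained from the SME ABI "get current
// VG" support routine, whose result is returned in x0 (hence the temporary
// save and restore of x0 when it is live).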
2060
2061 LLVM_DEBUG({
2062 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2063 if (RPI.isPaired())
2064 dbgs() << ", " << printReg(Reg2, TRI);
2065 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2066 if (RPI.isPaired())
2067 dbgs() << ", " << RPI.FrameIdx + 1;
2068 dbgs() << ")\n";
2069 });
2070
2071 assert((!isTargetWindows(MF) ||
2072 !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2073 "Windows unwinding requires a consecutive (FP,LR) pair");
2074 // Windows unwind codes require consecutive registers if registers are
2075 // paired. Make the switch here, so that the code below will save (x,x+1)
2076 // and not (x+1,x).
2077 unsigned FrameIdxReg1 = RPI.FrameIdx;
2078 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2079 if (isTargetWindows(MF) && RPI.isPaired()) {
2080 std::swap(Reg1, Reg2);
2081 std::swap(FrameIdxReg1, FrameIdxReg2);
2082 }
2083
2084 if (RPI.isPaired() && RPI.isScalable()) {
2085 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2088 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2089 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2090 "Expects SVE2.1 or SME2 target and a predicate register");
2091#ifdef EXPENSIVE_CHECKS
2092 auto IsPPR = [](const RegPairInfo &c) {
2093 return c.Type == RegPairInfo::PPR;
2094 };
2095 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2096 auto IsZPR = [](const RegPairInfo &c) {
2097 return c.Type == RegPairInfo::ZPR;
2098 };
2099 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2100 assert(!(PPRBegin < ZPRBegin) &&
2101 "Expected callee save predicate to be handled first");
2102#endif
2103 if (!PTrueCreated) {
2104 PTrueCreated = true;
2105 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2107 }
2108 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2109 if (!MRI.isReserved(Reg1))
2110 MBB.addLiveIn(Reg1);
2111 if (!MRI.isReserved(Reg2))
2112 MBB.addLiveIn(Reg2);
2113 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2115 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2116 MachineMemOperand::MOStore, Size, Alignment));
2117 MIB.addReg(PnReg);
2118 MIB.addReg(AArch64::SP)
2119 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2120 // where 2*vscale is implicit
2123 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2124 MachineMemOperand::MOStore, Size, Alignment));
2125 if (NeedsWinCFI)
2126 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2127 } else { // The case where there is no paired ZPR spill.
2128 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2129 if (!MRI.isReserved(Reg1))
2130 MBB.addLiveIn(Reg1);
2131 if (RPI.isPaired()) {
2132 if (!MRI.isReserved(Reg2))
2133 MBB.addLiveIn(Reg2);
2134 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2136 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2137 MachineMemOperand::MOStore, Size, Alignment));
2138 }
2139 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2140 .addReg(AArch64::SP)
2141 .addImm(RPI.Offset) // [sp, #offset*vscale],
2142 // where factor*vscale is implicit
2145 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2146 MachineMemOperand::MOStore, Size, Alignment));
2147 if (NeedsWinCFI)
2148 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2149 }
2150 // Update the StackIDs of the SVE stack slots.
2151 MachineFrameInfo &MFI = MF.getFrameInfo();
2152 if (RPI.Type == RegPairInfo::ZPR) {
2153 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2154 if (RPI.isPaired())
2155 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2156 } else if (RPI.Type == RegPairInfo::PPR) {
2158 if (RPI.isPaired())
2160 }
2161 }
2162 return true;
2163}
2164
2168 MachineFunction &MF = *MBB.getParent();
2169 const AArch64InstrInfo &TII =
2170 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
2171 DebugLoc DL;
2173 bool NeedsWinCFI = needsWinCFI(MF);
2174
2175 if (MBBI != MBB.end())
2176 DL = MBBI->getDebugLoc();
2177
2178 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2179 if (homogeneousPrologEpilog(MF, &MBB)) {
2180 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2182 for (auto &RPI : RegPairs) {
2183 MIB.addReg(RPI.Reg1, RegState::Define);
2184 MIB.addReg(RPI.Reg2, RegState::Define);
2185 }
2186 return true;
2187 }
2188
2189 // For performance reasons, restore the SVE registers in increasing order.
2190 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2191 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2192 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2193 std::reverse(PPRBegin, PPREnd);
2194 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2195 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2196 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2197 std::reverse(ZPRBegin, ZPREnd);
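// After these reversals the PPR and ZPR runs within RegPairs are visited in
// increasing order by the restore loop below, while the remaining entries keep
// the order produced by computeCalleeSaveRegisterPairs().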
2198
2199 bool PTrueCreated = false;
2200 for (const RegPairInfo &RPI : RegPairs) {
2201 Register Reg1 = RPI.Reg1;
2202 Register Reg2 = RPI.Reg2;
2203
2204 // Issue sequence of restores for cs regs. The last restore may be converted
2205 // to a post-increment load later by emitEpilogue if the callee-save stack
2206 // area allocation can't be combined with the local stack area allocation.
2207 // For example:
2208 // ldp fp, lr, [sp, #32] // addImm(+4)
2209 // ldp x20, x19, [sp, #16] // addImm(+2)
2210 // ldp x22, x21, [sp, #0] // addImm(+0)
2211 // Note: see comment in spillCalleeSavedRegisters()
2212 unsigned LdrOpc;
2213 unsigned Size = TRI->getSpillSize(*RPI.RC);
2214 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2215 switch (RPI.Type) {
2216 case RegPairInfo::GPR:
2217 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2218 break;
2219 case RegPairInfo::FPR64:
2220 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2221 break;
2222 case RegPairInfo::FPR128:
2223 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2224 break;
2225 case RegPairInfo::ZPR:
2226 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2227 break;
2228 case RegPairInfo::PPR:
2229 LdrOpc = AArch64::LDR_PXI;
2230 break;
2231 case RegPairInfo::VG:
2232 continue;
2233 }
2234 LLVM_DEBUG({
2235 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2236 if (RPI.isPaired())
2237 dbgs() << ", " << printReg(Reg2, TRI);
2238 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2239 if (RPI.isPaired())
2240 dbgs() << ", " << RPI.FrameIdx + 1;
2241 dbgs() << ")\n";
2242 });
2243
2244 // Windows unwind codes require consecutive registers if registers are
2245 // paired. Make the switch here, so that the code below will save (x,x+1)
2246 // and not (x+1,x).
2247 unsigned FrameIdxReg1 = RPI.FrameIdx;
2248 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2249 if (isTargetWindows(MF) && RPI.isPaired()) {
2250 std::swap(Reg1, Reg2);
2251 std::swap(FrameIdxReg1, FrameIdxReg2);
2252 }
2253
2255 if (RPI.isPaired() && RPI.isScalable()) {
2256 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2258 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2259 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2260 "Expects SVE2.1 or SME2 target and a predicate register");
2261#ifdef EXPENSIVE_CHECKS
2262 assert(!(PPRBegin < ZPRBegin) &&
2263 "Expected callee save predicate to be handled first");
2264#endif
2265 if (!PTrueCreated) {
2266 PTrueCreated = true;
2267 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2269 }
2270 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2271 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2272 getDefRegState(true));
2274 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2275 MachineMemOperand::MOLoad, Size, Alignment));
2276 MIB.addReg(PnReg);
2277 MIB.addReg(AArch64::SP)
2278 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2279 // where 2*vscale is implicit
2282 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2283 MachineMemOperand::MOLoad, Size, Alignment));
2284 if (NeedsWinCFI)
2285 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2286 } else {
2287 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2288 if (RPI.isPaired()) {
2289 MIB.addReg(Reg2, getDefRegState(true));
2291 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2292 MachineMemOperand::MOLoad, Size, Alignment));
2293 }
2294 MIB.addReg(Reg1, getDefRegState(true));
2295 MIB.addReg(AArch64::SP)
2296 .addImm(RPI.Offset) // [sp, #offset*vscale]
2297 // where factor*vscale is implicit
2300 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2301 MachineMemOperand::MOLoad, Size, Alignment));
2302 if (NeedsWinCFI)
2303 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2304 }
2305 }
2306 return true;
2307}
2308
2309 // Return the FrameID for an MMO.
2310static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2311 const MachineFrameInfo &MFI) {
2312 auto *PSV =
2314 if (PSV)
2315 return std::optional<int>(PSV->getFrameIndex());
2316
2317 if (MMO->getValue()) {
2318 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2319 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2320 FI++)
2321 if (MFI.getObjectAllocation(FI) == Al)
2322 return FI;
2323 }
2324 }
2325
2326 return std::nullopt;
2327}
2328
2329// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2330static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2331 const MachineFrameInfo &MFI) {
2332 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2333 return std::nullopt;
2334
2335 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2336}
2337
2338// Returns true if the LDST MachineInstr \p MI is a PPR access.
2339static bool isPPRAccess(const MachineInstr &MI) {
2340 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2341}
2342
2343// Check if a Hazard slot is needed for the current function, and if so create
2344// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2345// which can be used to determine if any hazard padding is needed.
2346void AArch64FrameLowering::determineStackHazardSlot(
2347 MachineFunction &MF, BitVector &SavedRegs) const {
2348 unsigned StackHazardSize = getStackHazardSize(MF);
2349 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2350 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2352 return;
2353
2354 // Stack hazards are only needed in streaming functions.
2355 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2356 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2357 return;
2358
2359 MachineFrameInfo &MFI = MF.getFrameInfo();
2360
2361 // Add a hazard slot if there are any CSR FPR registers, or if there are any
2362 // FP-only stack objects.
2363 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2364 return AArch64::FPR64RegClass.contains(Reg) ||
2365 AArch64::FPR128RegClass.contains(Reg) ||
2366 AArch64::ZPRRegClass.contains(Reg);
2367 });
2368 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2369 return AArch64::PPRRegClass.contains(Reg);
2370 });
2371 bool HasFPRStackObjects = false;
2372 bool HasPPRStackObjects = false;
2373 if (!HasFPRCSRs || SplitSVEObjects) {
2374 enum SlotType : uint8_t {
2375 Unknown = 0,
2376 ZPRorFPR = 1 << 0,
2377 PPR = 1 << 1,
2378 GPR = 1 << 2,
2380 };
2381
2382 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2383 // based on the kinds of accesses used in the function.
2384 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2385 for (auto &MBB : MF) {
2386 for (auto &MI : MBB) {
2387 std::optional<int> FI = getLdStFrameID(MI, MFI);
2388 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2389 continue;
2390 if (MFI.hasScalableStackID(*FI)) {
2391 SlotTypes[*FI] |=
2392 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2393 } else {
2394 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2395 ? SlotType::ZPRorFPR
2396 : SlotType::GPR;
2397 }
2398 }
2399 }
2400
2401 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2402 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2403 // For SplitSVEObjects, remember that this stack slot is a predicate; this
2404 // will be needed later when determining the frame layout.
2405 if (SlotTypes[FI] == SlotType::PPR) {
2407 HasPPRStackObjects = true;
2408 }
2409 }
2410 }
2411
2412 if (HasFPRCSRs || HasFPRStackObjects) {
2413 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2414 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2415 << StackHazardSize << "\n");
2417 }
2418
2419 if (!AFI->hasStackHazardSlotIndex())
2420 return;
2421
2422 if (SplitSVEObjects) {
2423 CallingConv::ID CC = MF.getFunction().getCallingConv();
2424 if (AFI->isSVECC() || CC == CallingConv::AArch64_SVE_VectorCall) {
2425 AFI->setSplitSVEObjects(true);
2426 LLVM_DEBUG(dbgs() << "Using SplitSVEObjects for SVE CC function\n");
2427 return;
2428 }
2429
2430 // We only use SplitSVEObjects in non-SVE CC functions if there's a
2431 // possibility of a stack hazard between PPRs and ZPRs/FPRs.
2432 LLVM_DEBUG(dbgs() << "Determining if SplitSVEObjects should be used in "
2433 "non-SVE CC function...\n");
2434
2435 // If another calling convention is explicitly set, FPRs can't be promoted to
2436 // ZPR callee-saves.
2438 LLVM_DEBUG(
2439 dbgs()
2440 << "Calling convention is not supported with SplitSVEObjects\n");
2441 return;
2442 }
2443
2444 if (!HasPPRCSRs && !HasPPRStackObjects) {
2445 LLVM_DEBUG(
2446 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2447 return;
2448 }
2449
2450 if (!HasFPRCSRs && !HasFPRStackObjects) {
2451 LLVM_DEBUG(
2452 dbgs()
2453 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2454 return;
2455 }
2456
2457 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2458 MF.getSubtarget<AArch64Subtarget>();
2460 "Expected SVE to be available for PPRs");
2461
2462 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2463 // With SplitSVEObjects the CS hazard padding is placed between the
2464 // PPRs and ZPRs. If there are any FPR CSRs there would be a hazard between
2465 // them and the CS GPRs. Avoid this by promoting all FPR CSRs to ZPRs.
2466 BitVector FPRZRegs(SavedRegs.size());
2467 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2468 BitVector::reference RegBit = SavedRegs[Reg];
2469 if (!RegBit)
2470 continue;
2471 unsigned SubRegIdx = 0;
2472 if (AArch64::FPR64RegClass.contains(Reg))
2473 SubRegIdx = AArch64::dsub;
2474 else if (AArch64::FPR128RegClass.contains(Reg))
2475 SubRegIdx = AArch64::zsub;
2476 else
2477 continue;
2478 // Clear the bit for the FPR save.
2479 RegBit = false;
2480 // Mark that we should save the corresponding ZPR.
2481 Register ZReg =
2482 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2483 FPRZRegs.set(ZReg);
2484 }
2485 SavedRegs |= FPRZRegs;
2486
2487 AFI->setSplitSVEObjects(true);
2488 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2489 }
2490}
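// Net effect (sketch): with, for example, -aarch64-stack-hazard-size=1024 a
// 1024-byte padding object is created to separate GPR stack slots from FPR/SVE
// ones; with SplitSVEObjects enabled the padding instead sits between the PPR
// and ZPR areas, as shown in the frame layout diagram at the top of this file.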
2491
2493 BitVector &SavedRegs,
2494 RegScavenger *RS) const {
2495 // All calls are tail calls in GHC calling conv, and functions have no
2496 // prologue/epilogue.
2498 return;
2499
2500 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2501
2503 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2505 unsigned UnspilledCSGPR = AArch64::NoRegister;
2506 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2507
2508 MachineFrameInfo &MFI = MF.getFrameInfo();
2509 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2510
2511 MCRegister BasePointerReg =
2512 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2513
2514 unsigned ExtraCSSpill = 0;
2515 bool HasUnpairedGPR64 = false;
2516 bool HasPairZReg = false;
2517 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2518 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2519
2520 // Figure out which callee-saved registers to save/restore.
2521 for (unsigned i = 0; CSRegs[i]; ++i) {
2522 const MCRegister Reg = CSRegs[i];
2523
2524 // Add the base pointer register to SavedRegs if it is callee-save.
2525 if (Reg == BasePointerReg)
2526 SavedRegs.set(Reg);
2527
2528 // Don't save manually reserved registers set through +reserve-x#i,
2529 // even for callee-saved registers, as per GCC's behavior.
2530 if (UserReservedRegs[Reg]) {
2531 SavedRegs.reset(Reg);
2532 continue;
2533 }
2534
2535 bool RegUsed = SavedRegs.test(Reg);
2536 MCRegister PairedReg;
2537 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2538 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2539 AArch64::FPR128RegClass.contains(Reg)) {
2540 // Compensate for odd numbers of GP CSRs.
2541 // For now, all the known cases of an odd number of CSRs involve GPRs.
2542 if (HasUnpairedGPR64)
2543 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2544 else
2545 PairedReg = CSRegs[i ^ 1];
2546 }
2547
2548 // If the function requires saving all the GP registers (SavedRegs),
2549 // and there is an odd number of GP CSRs at the same time (CSRegs),
2550 // PairedReg could be in a different register class from Reg, which would
2551 // lead to an FPR (usually D8) accidentally being marked as saved.
2552 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2553 PairedReg = AArch64::NoRegister;
2554 HasUnpairedGPR64 = true;
2555 }
2556 assert(PairedReg == AArch64::NoRegister ||
2557 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2558 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2559 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2560
2561 if (!RegUsed) {
2562 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2563 UnspilledCSGPR = Reg;
2564 UnspilledCSGPRPaired = PairedReg;
2565 }
2566 continue;
2567 }
2568
2569 // MachO's compact unwind format relies on all registers being stored in
2570 // pairs.
2571 // FIXME: the usual format is actually better if unwinding isn't needed.
2572 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2573 !SavedRegs.test(PairedReg)) {
2574 SavedRegs.set(PairedReg);
2575 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2576 !ReservedRegs[PairedReg])
2577 ExtraCSSpill = PairedReg;
2578 }
2579 // Check if there is a pair of ZRegs, so a PReg can be selected for spill/fill.
2580 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2581 SavedRegs.test(CSRegs[i ^ 1]));
2582 }
2583
2584 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2586 // Find a suitable predicate register for the multi-vector spill/fill
2587 // instructions.
2588 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2589 if (PnReg.isValid())
2590 AFI->setPredicateRegForFillSpill(PnReg);
2591 // If no free callee-saved register has been found, assign one.
2592 if (!AFI->getPredicateRegForFillSpill() &&
2593 MF.getFunction().getCallingConv() ==
2595 SavedRegs.set(AArch64::P8);
2596 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2597 }
2598
2599 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2600 "Predicate cannot be a reserved register");
2601 }
2602
2604 !Subtarget.isTargetWindows()) {
2605 // For the Windows calling convention on a non-Windows OS, where X18 is
2606 // treated as reserved, back up X18 when entering non-Windows code (marked
2607 // with the Windows calling convention) and restore it when returning,
2608 // regardless of whether the individual function uses it - it might call
2609 // other functions that clobber it.
2610 SavedRegs.set(AArch64::X18);
2611 }
2612
2613 // Determine if a Hazard slot should be used and where it should go.
2614 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2615 // and ZPRs. Otherwise, it goes in the callee save area.
2616 determineStackHazardSlot(MF, SavedRegs);
2617
2618 // Calculate the callee-saved stack size.
2619 unsigned CSStackSize = 0;
2620 unsigned ZPRCSStackSize = 0;
2621 unsigned PPRCSStackSize = 0;
2623 for (unsigned Reg : SavedRegs.set_bits()) {
2624 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2625 assert(RC && "expected register class!");
2626 auto SpillSize = TRI->getSpillSize(*RC);
2627 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2628 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2629 if (IsZPR)
2630 ZPRCSStackSize += SpillSize;
2631 else if (IsPPR)
2632 PPRCSStackSize += SpillSize;
2633 else
2634 CSStackSize += SpillSize;
2635 }
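// Worked example: a function saving x19-x22, fp, lr and d8-d9 accumulates
// CSStackSize = 6 * 8 + 2 * 8 = 64 bytes in the loop above, before the VG,
// hazard-padding and Swift async context adjustments applied below.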
2636
2637 // Save number of saved regs, so we can easily update CSStackSize later to
2638 // account for any additional 64-bit GPR saves. Note: After this point
2639 // only 64-bit GPRs can be added to SavedRegs.
2640 unsigned NumSavedRegs = SavedRegs.count();
2641
2642 // If we have hazard padding in the CS area add that to the size.
2644 CSStackSize += getStackHazardSize(MF);
2645
2646 // Increase the callee-saved stack size if the function has streaming mode
2647 // changes, as we will need to spill the value of the VG register.
2648 if (requiresSaveVG(MF))
2649 CSStackSize += 8;
2650
2651 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2652 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2653 SavedRegs.set(AArch64::LR);
2654
2655 // The frame record needs to be created by saving the appropriate registers
2656 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2657 if (hasFP(MF) ||
2658 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2659 SavedRegs.set(AArch64::FP);
2660 SavedRegs.set(AArch64::LR);
2661 }
2662
2663 LLVM_DEBUG({
2664 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2665 for (unsigned Reg : SavedRegs.set_bits())
2666 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2667 dbgs() << "\n";
2668 });
2669
2670 // If any callee-saved registers are used, the frame cannot be eliminated.
2671 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2673 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2674 uint64_t SVEStackSize =
2675 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2676 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2677
2678 // The CSR spill slots have not been allocated yet, so estimateStackSize
2679 // won't include them.
2680 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2681
2682 // We may address some of the stack above the canonical frame address, either
2683 // for our own arguments or during a call. Include that in calculating whether
2684 // we have complicated addressing concerns.
2685 int64_t CalleeStackUsed = 0;
2686 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2687 int64_t FixedOff = MFI.getObjectOffset(I);
2688 if (FixedOff > CalleeStackUsed)
2689 CalleeStackUsed = FixedOff;
2690 }
2691
2692 // Conservatively always assume BigStack when there are SVE spills.
2693 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2694 CalleeStackUsed) > EstimatedStackSizeLimit;
2695 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2696 AFI->setHasStackFrame(true);
2697
2698 // Estimate if we might need to scavenge a register at some point in order
2699 // to materialize a stack offset. If so, either spill one additional
2700 // callee-saved register or reserve a special spill slot to facilitate
2701 // register scavenging. If we already spilled an extra callee-saved register
2702 // above to keep the number of spills even, we don't need to do anything else
2703 // here.
2704 if (BigStack) {
2705 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2706 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2707 << " to get a scratch register.\n");
2708 SavedRegs.set(UnspilledCSGPR);
2709 ExtraCSSpill = UnspilledCSGPR;
2710
2711 // MachO's compact unwind format relies on all registers being stored in
2712 // pairs, so if we need to spill one extra for BigStack, then we need to
2713 // store the pair.
2714 if (producePairRegisters(MF)) {
2715 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2716 // Failed to make a pair for compact unwind format, revert spilling.
2717 if (produceCompactUnwindFrame(*this, MF)) {
2718 SavedRegs.reset(UnspilledCSGPR);
2719 ExtraCSSpill = AArch64::NoRegister;
2720 }
2721 } else
2722 SavedRegs.set(UnspilledCSGPRPaired);
2723 }
2724 }
2725
2726 // If we didn't find an extra callee-saved register to spill, create
2727 // an emergency spill slot.
2728 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2730 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2731 unsigned Size = TRI->getSpillSize(RC);
2732 Align Alignment = TRI->getSpillAlign(RC);
2733 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2734 RS->addScavengingFrameIndex(FI);
2735 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2736 << " as the emergency spill slot.\n");
2737 }
2738 }
2739
2740 // Add the size of any additional 64-bit GPR saves.
2741 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2742
2743 // A Swift asynchronous context extends the frame record with a pointer
2744 // directly before FP.
2745 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2746 CSStackSize += 8;
2747
2748 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2749 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2750 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2751
2753 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2754 "Should not invalidate callee saved info");
2755
2756 // Round up to register pair alignment to avoid additional SP adjustment
2757 // instructions.
2758 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2759 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2760 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2761}
2762
2764 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2765 std::vector<CalleeSavedInfo> &CSI) const {
2766 bool IsWindows = isTargetWindows(MF);
2767 unsigned StackHazardSize = getStackHazardSize(MF);
2768 // To match the canonical Windows frame layout, reverse the list of
2769 // callee-saved registers so that PrologEpilogInserter lays them out
2770 // in the right order. (PrologEpilogInserter allocates stack objects top
2771 // down. Windows canonical prologs store higher-numbered registers at
2772 // the top, so the CSI array must start from the highest registers.)
2773 if (IsWindows)
2774 std::reverse(CSI.begin(), CSI.end());
2775
2776 if (CSI.empty())
2777 return true; // Early exit if no callee saved registers are modified!
2778
2779 // Now that we know which registers need to be saved and restored, allocate
2780 // stack slots for them.
2781 MachineFrameInfo &MFI = MF.getFrameInfo();
2782 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2783
2784 if (IsWindows && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2785 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2786 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2787 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2788 }
2789
2790 // Insert VG into the list of CSRs, immediately before LR if saved.
2791 if (requiresSaveVG(MF)) {
2792 CalleeSavedInfo VGInfo(AArch64::VG);
2793 auto It =
2794 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2795 if (It != CSI.end())
2796 CSI.insert(It, VGInfo);
2797 else
2798 CSI.push_back(VGInfo);
2799 }
2800
2801 Register LastReg = 0;
2802 int HazardSlotIndex = std::numeric_limits<int>::max();
2803 for (auto &CS : CSI) {
2804 MCRegister Reg = CS.getReg();
2805 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2806
2807 // Create a hazard slot as we switch between GPR and FPR CSRs.
2809 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2811 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2812 "Unexpected register order for hazard slot");
2813 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2814 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2815 << "\n");
2816 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2817 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2818 }
2819
2820 unsigned Size = RegInfo->getSpillSize(*RC);
2821 Align Alignment(RegInfo->getSpillAlign(*RC));
2822 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2823 CS.setFrameIdx(FrameIdx);
2824 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2825
2826 // Grab 8 bytes below FP for the extended asynchronous frame info.
2827 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !IsWindows &&
2828 Reg == AArch64::FP) {
2829 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2830 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2831 MFI.setIsCalleeSavedObjectIndex(FrameIdx, true);
2832 }
2833 LastReg = Reg;
2834 }
2835
2836 // Add hazard slot in the case where no FPR CSRs are present.
2838 HazardSlotIndex == std::numeric_limits<int>::max()) {
2839 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2840 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2841 << "\n");
2842 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2843 MFI.setIsCalleeSavedObjectIndex(HazardSlotIndex, true);
2844 }
2845
2846 return true;
2847}
2848
2850 const MachineFunction &MF) const {
2852 // If the function has streaming-mode changes, don't scavenge a
2853 // spill slot in the callee-save area, as that might require an
2854 // 'addvl' in the streaming-mode-changing call sequence when the
2855 // function doesn't use an FP.
2856 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2857 return false;
2858 // Don't allow register scavenging with hazard slots, in case it moves
2859 // objects into the wrong place.
2860 if (AFI->hasStackHazardSlotIndex())
2861 return false;
2862 return AFI->hasCalleeSaveStackFreeSpace();
2863}
2864
2865 /// Returns true if there are any SVE callee saves.
2867 int &Min, int &Max) {
2868 Min = std::numeric_limits<int>::max();
2869 Max = std::numeric_limits<int>::min();
2870
2871 if (!MFI.isCalleeSavedInfoValid())
2872 return false;
2873
2874 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2875 for (auto &CS : CSI) {
2876 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2877 AArch64::PPRRegClass.contains(CS.getReg())) {
2878 assert((Max == std::numeric_limits<int>::min() ||
2879 Max + 1 == CS.getFrameIdx()) &&
2880 "SVE CalleeSaves are not consecutive");
2881 Min = std::min(Min, CS.getFrameIdx());
2882 Max = std::max(Max, CS.getFrameIdx());
2883 }
2884 }
2885 return Min != std::numeric_limits<int>::max();
2886}
2887
2889 AssignObjectOffsets AssignOffsets) {
2890 MachineFrameInfo &MFI = MF.getFrameInfo();
2891 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2892
2893 SVEStackSizes SVEStack{};
2894
2895 // With SplitSVEObjects we maintain separate stack offsets for predicates
2896 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled, predicates
2897 // are included in the SVE vector area.
2898 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2899 uint64_t &PPRStackTop =
2900 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2901
2902#ifndef NDEBUG
2903 // First process all fixed stack objects.
2904 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2905 assert(!MFI.hasScalableStackID(I) &&
2906 "SVE vectors should never be passed on the stack by value, only by "
2907 "reference.");
2908#endif
2909
2910 auto AllocateObject = [&](int FI) {
2912 ? ZPRStackTop
2913 : PPRStackTop;
2914
2915 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2916 // two, we'd need to align every object dynamically at runtime if the
2917 // alignment is larger than 16. This is not yet supported.
2918 Align Alignment = MFI.getObjectAlign(FI);
2919 if (Alignment > Align(16))
2921 "Alignment of scalable vectors > 16 bytes is not yet supported");
2922
2923 StackTop += MFI.getObjectSize(FI);
2924 StackTop = alignTo(StackTop, Alignment);
2925
2926 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2927 "SVE StackTop far too large?!");
2928
2929 int64_t Offset = -int64_t(StackTop);
2930 if (AssignOffsets == AssignObjectOffsets::Yes)
2931 MFI.setObjectOffset(FI, Offset);
2932
2933 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2934 };
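// Rough example: the first 16-byte ZPR object allocated through this lambda
// moves ZPRStackTop from 0 to 16 and is assigned offset -16 (in scalable
// bytes) relative to the start of the SVE area.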
2935
2936 // Then process all callee saved slots.
2937 int MinCSFrameIndex, MaxCSFrameIndex;
2938 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2939 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2940 AllocateObject(FI);
2941 }
2942
2943 // Ensure the CS area is 16-byte aligned.
2944 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2945 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2946
2947 // Create a buffer of SVE objects to allocate and sort it.
2948 SmallVector<int, 8> ObjectsToAllocate;
2949 // If we have a stack protector, and we've previously decided that we have SVE
2950 // objects on the stack and thus need it to go in the SVE stack area, then it
2951 // needs to go first.
2952 int StackProtectorFI = -1;
2953 if (MFI.hasStackProtectorIndex()) {
2954 StackProtectorFI = MFI.getStackProtectorIndex();
2955 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2956 ObjectsToAllocate.push_back(StackProtectorFI);
2957 }
2958
2959 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2960 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI) ||
2962 continue;
2963
2966 continue;
2967
2968 ObjectsToAllocate.push_back(FI);
2969 }
2970
2971 // Allocate all SVE locals and spills
2972 for (unsigned FI : ObjectsToAllocate)
2973 AllocateObject(FI);
2974
2975 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2976 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2977
2978 if (AssignOffsets == AssignObjectOffsets::Yes)
2979 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2980
2981 return SVEStack;
2982}
2983
2985 MachineFunction &MF, RegScavenger *RS) const {
2987 "Upwards growing stack unsupported");
2988
2990
2991 // If this function isn't doing Win64-style C++ EH, we don't need to do
2992 // anything.
2993 if (!MF.hasEHFunclets())
2994 return;
2995
2996 MachineFrameInfo &MFI = MF.getFrameInfo();
2997 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2998
2999 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
3000 // object area right next to the UnwindHelp object.
3001 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
3002 int64_t CurrentOffset =
3004 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
3005 for (WinEHHandlerType &H : TBME.HandlerArray) {
3006 int FrameIndex = H.CatchObj.FrameIndex;
3007 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
3008 CurrentOffset =
3009 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
3010 CurrentOffset += MFI.getObjectSize(FrameIndex);
3011 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
3012 }
3013 }
3014 }
3015
3016 // Create an UnwindHelp object.
3017 // The UnwindHelp object is allocated at the start of the fixed object area
3018 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
3019 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
3020 /*IsFunclet*/ false) &&
3021 "UnwindHelpOffset must be at the start of the fixed object area");
3022 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
3023 /*IsImmutable=*/false);
3024 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
3025
3026 MachineBasicBlock &MBB = MF.front();
3027 auto MBBI = MBB.begin();
3028 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3029 ++MBBI;
3030
3031 // We need to store -2 into the UnwindHelp object at the start of the
3032 // function.
3033 DebugLoc DL;
3034 RS->enterBasicBlockEnd(MBB);
3035 RS->backward(MBBI);
3036 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3037 assert(DstReg && "There must be a free register after frame setup");
3038 const AArch64InstrInfo &TII =
3039 *MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3040 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3041 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3042 .addReg(DstReg, getKillRegState(true))
3043 .addFrameIndex(UnwindHelpFI)
3044 .addImm(0);
3045}
3046
3047namespace {
3048struct TagStoreInstr {
3050 int64_t Offset, Size;
3051 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3052 : MI(MI), Offset(Offset), Size(Size) {}
3053};
3054
3055class TagStoreEdit {
3056 MachineFunction *MF;
3057 MachineBasicBlock *MBB;
3058 MachineRegisterInfo *MRI;
3059 // Tag store instructions that are being replaced.
3061 // Combined memref arguments of the above instructions.
3063
3064 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3065 // FrameRegOffset + Size) with the address tag of SP.
3066 Register FrameReg;
3067 StackOffset FrameRegOffset;
3068 int64_t Size;
3069 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3070 // end.
3071 std::optional<int64_t> FrameRegUpdate;
3072 // MIFlags for any FrameReg updating instructions.
3073 unsigned FrameRegUpdateFlags;
3074
3075 // Use zeroing instruction variants.
3076 bool ZeroData;
3077 DebugLoc DL;
3078
3079 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3080 void emitLoop(MachineBasicBlock::iterator InsertI);
3081
3082public:
3083 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3084 : MBB(MBB), ZeroData(ZeroData) {
3085 MF = MBB->getParent();
3086 MRI = &MF->getRegInfo();
3087 }
3088 // Add an instruction to be replaced. Instructions must be added in
3089 // ascending order of Offset and must be adjacent.
3090 void addInstruction(TagStoreInstr I) {
3091 assert((TagStores.empty() ||
3092 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3093 "Non-adjacent tag store instructions.");
3094 TagStores.push_back(I);
3095 }
3096 void clear() { TagStores.clear(); }
3097 // Emit equivalent code at the given location, and erase the current set of
3098 // instructions. May skip if the replacement is not profitable. May invalidate
3099 // the input iterator and replace it with a valid one.
3100 void emitCode(MachineBasicBlock::iterator &InsertI,
3101 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3102};
3103
3104void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3105 const AArch64InstrInfo *TII =
3106 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3107
3108 const int64_t kMinOffset = -256 * 16;
3109 const int64_t kMaxOffset = 255 * 16;
3110
3111 Register BaseReg = FrameReg;
3112 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3113 if (BaseRegOffsetBytes < kMinOffset ||
3114 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3115 // BaseReg can be FP, which is not necessarily aligned to 16 bytes. In
3116 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3117 // is required for the offset of ST2G.
3118 BaseRegOffsetBytes % 16 != 0) {
3119 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3120 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3121 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3122 BaseReg = ScratchReg;
3123 BaseRegOffsetBytes = 0;
3124 }
3125
3126 MachineInstr *LastI = nullptr;
3127 while (Size) {
3128 int64_t InstrSize = (Size > 16) ? 32 : 16;
3129 unsigned Opcode =
3130 InstrSize == 16
3131 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3132 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3133 assert(BaseRegOffsetBytes % 16 == 0);
3134 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3135 .addReg(AArch64::SP)
3136 .addReg(BaseReg)
3137 .addImm(BaseRegOffsetBytes / 16)
3138 .setMemRefs(CombinedMemRefs);
3139 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3140 // final SP adjustment in the epilogue.
3141 if (BaseRegOffsetBytes == 0)
3142 LastI = I;
3143 BaseRegOffsetBytes += InstrSize;
3144 Size -= InstrSize;
3145 }
3146
3147 if (LastI)
3148 MBB->splice(InsertI, MBB, LastI);
3149}
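// Rough example: tagging 48 bytes starting at [BaseReg, #0] emits an ST2G for
// 32 bytes followed by an STG for the final 16, and the store targeting offset
// #0 is spliced to the end so the epilogue has a chance to fold the final SP
// adjustment into it (the STZG/STZ2G variants are used when ZeroData is set).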
3150
3151void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3152 const AArch64InstrInfo *TII =
3153 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3154
3155 Register BaseReg = FrameRegUpdate
3156 ? FrameReg
3157 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3158 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3159
3160 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3161
3162 int64_t LoopSize = Size;
3163 // If the loop size is not a multiple of 32, split off one 16-byte store at
3164 // the end to fold the BaseReg update into.
3165 if (FrameRegUpdate && *FrameRegUpdate)
3166 LoopSize -= LoopSize % 32;
3167 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3168 TII->get(ZeroData ? AArch64::STZGloop_wback
3169 : AArch64::STGloop_wback))
3170 .addDef(SizeReg)
3171 .addDef(BaseReg)
3172 .addImm(LoopSize)
3173 .addReg(BaseReg)
3174 .setMemRefs(CombinedMemRefs);
3175 if (FrameRegUpdate)
3176 LoopI->setFlags(FrameRegUpdateFlags);
3177
3178 int64_t ExtraBaseRegUpdate =
3179 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3180 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3181 << ", Size=" << Size
3182 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3183 << ", FrameRegUpdate=" << FrameRegUpdate
3184 << ", FrameRegOffset.getFixed()="
3185 << FrameRegOffset.getFixed() << "\n");
3186 if (LoopSize < Size) {
3187 assert(FrameRegUpdate);
3188 assert(Size - LoopSize == 16);
3189 // Tag 16 more bytes at BaseReg and update BaseReg.
3190 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3191 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3192 "STG immediate out of range");
3193 BuildMI(*MBB, InsertI, DL,
3194 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3195 .addDef(BaseReg)
3196 .addReg(BaseReg)
3197 .addReg(BaseReg)
3198 .addImm(STGOffset / 16)
3199 .setMemRefs(CombinedMemRefs)
3200 .setMIFlags(FrameRegUpdateFlags);
3201 } else if (ExtraBaseRegUpdate) {
3202 // Update BaseReg.
3203 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3204 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3205 BuildMI(
3206 *MBB, InsertI, DL,
3207 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3208 .addDef(BaseReg)
3209 .addReg(BaseReg)
3210 .addImm(AddSubOffset)
3211 .addImm(0)
3212 .setMIFlags(FrameRegUpdateFlags);
3213 }
3214}
3215
3216// Check if *II is a register update that can be merged into the STGloop that
3217// ends at (Reg + Size). If it can, *TotalOffset is set to the full adjustment
3218// that the folded ADD/SUB applies to Reg.
3219bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3220 int64_t Size, int64_t *TotalOffset) {
3221 MachineInstr &MI = *II;
3222 if ((MI.getOpcode() == AArch64::ADDXri ||
3223 MI.getOpcode() == AArch64::SUBXri) &&
3224 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3225 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3226 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3227 if (MI.getOpcode() == AArch64::SUBXri)
3228 Offset = -Offset;
3229 int64_t PostOffset = Offset - Size;
3230 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3231 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3232 // chosen depends on the alignment of the loop size, but the difference
3233 // between the valid ranges for the two instructions is small, so we
3234 // conservatively assume that it could be either case here.
3235 //
3236 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3237 // instruction.
3238 const int64_t kMaxOffset = 4080 - 16;
3239 // Max offset of SUBXri.
3240 const int64_t kMinOffset = -4095;
3241 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3242 PostOffset % 16 == 0) {
3243 *TotalOffset = Offset;
3244 return true;
3245 }
3246 }
3247 return false;
3248}
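// Editorial illustration (not part of the original source): the acceptance
// test above, reduced to plain arithmetic. Offset is the ADD/SUB amount and
// Size the number of bytes tagged before the update; the leftover PostOffset
// must be a 16-byte multiple that fits either an STGPostIndex or an ADD/SUB
// immediate. The helper name is hypothetical:
static bool exampleCanFoldRegUpdate(int64_t Offset, int64_t Size) {
  int64_t PostOffset = Offset - Size;
  return PostOffset % 16 == 0 && PostOffset >= -4095 &&
         PostOffset <= 4080 - 16;
}
// For instance, an epilogue "ADD SP, SP, #528" following a loop that tags 512
// bytes leaves PostOffset = 16, which is in range, so the update can be
// folded into the STGloop pseudo.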
3249
3250void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3251 SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3252 MemRefs.clear();
3253 for (auto &TS : TSE) {
3254 MachineInstr *MI = TS.MI;
3255 // An instruction without memory operands may access anything. Be
3256 // conservative and return an empty list.
3257 if (MI->memoperands_empty()) {
3258 MemRefs.clear();
3259 return;
3260 }
3261 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3262 }
3263}
3264
3265void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3266 const AArch64FrameLowering *TFI,
3267 bool TryMergeSPUpdate) {
3268 if (TagStores.empty())
3269 return;
3270 TagStoreInstr &FirstTagStore = TagStores[0];
3271 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3272 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3273 DL = TagStores[0].MI->getDebugLoc();
3274
3275 Register Reg;
3276 FrameRegOffset = TFI->resolveFrameOffsetReference(
3277 *MF, FirstTagStore.Offset, false /*isFixed*/,
3278 TargetStackID::Default /*StackID*/, Reg,
3279 /*PreferFP=*/false, /*ForSimm=*/true);
3280 FrameReg = Reg;
3281 FrameRegUpdate = std::nullopt;
3282
3283 mergeMemRefs(TagStores, CombinedMemRefs);
3284
3285 LLVM_DEBUG({
3286 dbgs() << "Replacing adjacent STG instructions:\n";
3287 for (const auto &Instr : TagStores) {
3288 dbgs() << " " << *Instr.MI;
3289 }
3290 });
3291
3292 // Size threshold where a loop becomes shorter than a linear sequence of
3293 // tagging instructions.
3294 const int kSetTagLoopThreshold = 176;
3295 if (Size < kSetTagLoopThreshold) {
3296 if (TagStores.size() < 2)
3297 return;
3298 emitUnrolled(InsertI);
3299 } else {
3300 MachineInstr *UpdateInstr = nullptr;
3301 int64_t TotalOffset = 0;
3302 if (TryMergeSPUpdate) {
3303 // See if we can merge base register update into the STGloop.
3304 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3305 // but STGloop is way too unusual for that, and also it only
3306 // realistically happens in function epilogue. Also, STGloop is expanded
3307 // before that pass.
3308 if (InsertI != MBB->end() &&
3309 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3310 &TotalOffset)) {
3311 UpdateInstr = &*InsertI++;
3312 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3313 << *UpdateInstr);
3314 }
3315 }
3316
3317 if (!UpdateInstr && TagStores.size() < 2)
3318 return;
3319
3320 if (UpdateInstr) {
3321 FrameRegUpdate = TotalOffset;
3322 FrameRegUpdateFlags = UpdateInstr->getFlags();
3323 }
3324 emitLoop(InsertI);
3325 if (UpdateInstr)
3326 UpdateInstr->eraseFromParent();
3327 }
3328
3329 for (auto &TS : TagStores)
3330 TS.MI->eraseFromParent();
3331}
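// Editorial illustration (not part of the original source): the threshold
// above trades inline code size against the fixed overhead of the STGloop
// expansion. Below 176 bytes an unrolled run needs at most five tag stores
// (e.g. 160 bytes = five ST2Gs); at 176 bytes it would already take six
// (five ST2Gs plus an STG), so from that point on the loop form, optionally
// with the SP update folded in, is emitted instead.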
3332
3333bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3334 int64_t &Size, bool &ZeroData) {
3335 MachineFunction &MF = *MI.getParent()->getParent();
3336 const MachineFrameInfo &MFI = MF.getFrameInfo();
3337
3338 unsigned Opcode = MI.getOpcode();
3339 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3340 Opcode == AArch64::STZ2Gi);
3341
3342 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3343 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3344 return false;
3345 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3346 return false;
3347 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3348 Size = MI.getOperand(2).getImm();
3349 return true;
3350 }
3351
3352 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3353 Size = 16;
3354 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3355 Size = 32;
3356 else
3357 return false;
3358
3359 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3360 return false;
3361
3362 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3363 16 * MI.getOperand(2).getImm();
3364 return true;
3365}
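// Editorial illustration (not part of the original source): for the STGi /
// STZGi / ST2Gi / STZ2Gi forms the tagged address is a frame object plus a
// 16-byte-scaled immediate, so the code above reconstructs the byte offset as
//   Offset = MFI.getObjectOffset(FI) + 16 * Imm
// e.g. an STGi on a frame index whose object offset is -64 with immediate 2
// describes the 16-byte granule starting at frame offset -32.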
3366
3367// Detect a run of memory tagging instructions for adjacent stack frame slots,
3368// and replace them with a shorter instruction sequence:
3369// * replace STG + STG with ST2G
3370// * replace STGloop + STGloop with STGloop
3371// This code needs to run when stack slot offsets are already known, but before
3372// FrameIndex operands in STG instructions are eliminated.
3373MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3374 const AArch64FrameLowering *TFI,
3375 RegScavenger *RS) {
3376 bool FirstZeroData;
3377 int64_t Size, Offset;
3378 MachineInstr &MI = *II;
3379 MachineBasicBlock *MBB = MI.getParent();
3380 MachineBasicBlock::iterator NextI = ++II;
3381 if (&MI == &MBB->instr_back())
3382 return II;
3383 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3384 return II;
3385
3386 SmallVector<TagStoreInstr, 8> Instrs;
3387 Instrs.emplace_back(&MI, Offset, Size);
3388
3389 constexpr int kScanLimit = 10;
3390 int Count = 0;
3391 for (MachineBasicBlock::iterator E = MBB->end();
3392 NextI != E && Count < kScanLimit; ++NextI) {
3393 MachineInstr &MI = *NextI;
3394 bool ZeroData;
3395 int64_t Size, Offset;
3396 // Collect instructions that update memory tags with a FrameIndex operand
3397 // and (when applicable) constant size, and whose output registers are dead
3398 // (the latter is almost always the case in practice). Since these
3399 // instructions effectively have no inputs or outputs, we are free to skip
3400 // any non-aliasing instructions in between without tracking used registers.
3401 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3402 if (ZeroData != FirstZeroData)
3403 break;
3404 Instrs.emplace_back(&MI, Offset, Size);
3405 continue;
3406 }
3407
3408 // Only count non-transient, non-tagging instructions toward the scan
3409 // limit.
3410 if (!MI.isTransient())
3411 ++Count;
3412
3413 // Just in case, stop before the epilogue code starts.
3414 if (MI.getFlag(MachineInstr::FrameSetup) ||
3415 MI.getFlag(MachineInstr::FrameDestroy))
3416 break;
3417
3418 // Reject anything that may alias the collected instructions.
3419 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3420 break;
3421 }
3422
3423 // New code will be inserted after the last tagging instruction we've found.
3424 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3425
3426 // All the gathered stack tag instructions are merged and placed after the
3427 // last tag store in the list. Before merging, check whether the NZCV flag
3428 // is live at the insertion point; if it is, bail out, since emitting an
3429 // STG loop there would clobber it.
3430
3431 // FIXME: Bailing out this way is conservative: the liveness check is
3432 // performed even when the merged sequence would contain no STG loops, in
3433 // which case NZCV is not clobbered and the check is unnecessary.
3434 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3435 LiveRegs.addLiveOuts(*MBB);
3436 for (auto I = MBB->rbegin();; ++I) {
3437 MachineInstr &MI = *I;
3438 if (MI == InsertI)
3439 break;
3440 LiveRegs.stepBackward(*I);
3441 }
3442 InsertI++;
3443 if (LiveRegs.contains(AArch64::NZCV))
3444 return InsertI;
3445
3446 llvm::stable_sort(Instrs,
3447 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3448 return Left.Offset < Right.Offset;
3449 });
3450
3451 // Make sure that we don't have any overlapping stores.
3452 int64_t CurOffset = Instrs[0].Offset;
3453 for (auto &Instr : Instrs) {
3454 if (CurOffset > Instr.Offset)
3455 return NextI;
3456 CurOffset = Instr.Offset + Instr.Size;
3457 }
3458
3459 // Find contiguous runs of tagged memory and emit shorter instruction
3460 // sequences for them when possible.
3461 TagStoreEdit TSE(MBB, FirstZeroData);
3462 std::optional<int64_t> EndOffset;
3463 for (auto &Instr : Instrs) {
3464 if (EndOffset && *EndOffset != Instr.Offset) {
3465 // Found a gap.
3466 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3467 TSE.clear();
3468 }
3469
3470 TSE.addInstruction(Instr);
3471 EndOffset = Instr.Offset + Instr.Size;
3472 }
3473
3474 const MachineFunction *MF = MBB->getParent();
3475 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3476 TSE.emitCode(
3477 InsertI, TFI, /*TryMergeSPUpdate = */
3478 !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3479
3480 return InsertI;
3481}
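// Editorial illustration (not part of the original source): the net effect of
// the merge on two adjacent 16-byte tagged slots at offsets 0 and 16 from the
// resolved base register.
// Before (two independent tag stores):
//   STG  sp, [sp, #0]
//   STG  sp, [sp, #16]
// After (one merged store emitted by TagStoreEdit::emitUnrolled):
//   ST2G sp, [sp, #0]
// Longer contiguous runs are shortened the same way, switching to the STGloop
// pseudo once the run reaches kSetTagLoopThreshold bytes.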
3482} // namespace
3483
3484void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3485 MachineFunction &MF, RegScavenger *RS = nullptr) const {
3486 for (auto &BB : MF)
3487 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3488 if (StackTaggingMergeSetTag)
3489 II = tryMergeAdjacentSTG(II, this, RS);
3490 }
3491
3492 // By the time this method is called, most of the prologue/epilogue code is
3493 // already emitted, whether its location was affected by the shrink-wrapping
3494 // optimization or not.
3495 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3496 shouldSignReturnAddressEverywhere(MF))
3497 emitPacRetPlusLeafHardening(MF);
3498}
3499
3500/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3501/// before the update. This is easily retrieved as it is exactly the offset
3502/// that is set in processFunctionBeforeFrameFinalized.
3503StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3504 const MachineFunction &MF, int FI, Register &FrameReg,
3505 bool IgnoreSPUpdates) const {
3506 const MachineFrameInfo &MFI = MF.getFrameInfo();
3507 if (IgnoreSPUpdates) {
3508 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3509 << MFI.getObjectOffset(FI) << "\n");
3510 FrameReg = AArch64::SP;
3511 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3512 }
3513
3514 // Go to common code if we cannot provide sp + offset.
3515 if (MFI.hasVarSizedObjects() ||
3516 MF.getInfo<AArch64FunctionInfo>()->hasSVEStackSize() ||
3517 MF.getSubtarget().getRegisterInfo()->hasStackRealignment(MF))
3518 return getFrameIndexReference(MF, FI, FrameReg);
3519
3520 FrameReg = AArch64::SP;
3521 return getStackOffset(MF, MFI.getObjectOffset(FI));
3522}
3523
3524/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3525/// the parent's frame pointer
3526unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3527 const MachineFunction &MF) const {
3528 return 0;
3529}
3530
3531/// Funclets only need to account for space for the callee saved registers,
3532/// as the locals are accounted for in the parent's stack frame.
3533unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3534 const MachineFunction &MF) const {
3535 // This is the size of the pushed CSRs.
3536 unsigned CSSize =
3537 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3538 // This is the amount of stack a funclet needs to allocate.
3539 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3540 getStackAlign());
3541}
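// Editorial illustration (not part of the original source): with, say, 80
// bytes of pushed callee saves and a 40-byte maximum outgoing call frame, a
// funclet allocates alignTo(80 + 40, 16) == 128 bytes under the 16-byte stack
// alignment.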
3542
3543namespace {
3544struct FrameObject {
3545 bool IsValid = false;
3546 // Index of the object in MFI.
3547 int ObjectIndex = 0;
3548 // Group ID this object belongs to.
3549 int GroupIndex = -1;
3550 // This object should be placed first (closest to SP).
3551 bool ObjectFirst = false;
3552 // This object's group (which always contains the object with
3553 // ObjectFirst==true) should be placed first.
3554 bool GroupFirst = false;
3555
3556 // Used to distinguish between FP and GPR accesses. The values are decided so
3557 // that they sort FPR < Hazard < GPR and they can be or'd together.
3558 unsigned Accesses = 0;
3559 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3560};
3561
3562class GroupBuilder {
3563 SmallVector<int, 8> CurrentMembers;
3564 int NextGroupIndex = 0;
3565 std::vector<FrameObject> &Objects;
3566
3567public:
3568 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3569 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3570 void EndCurrentGroup() {
3571 if (CurrentMembers.size() > 1) {
3572 // Create a new group with the current member list. This might remove them
3573 // from their pre-existing groups. That's OK, dealing with overlapping
3574 // groups is too hard and unlikely to make a difference.
3575 LLVM_DEBUG(dbgs() << "group:");
3576 for (int Index : CurrentMembers) {
3577 Objects[Index].GroupIndex = NextGroupIndex;
3578 LLVM_DEBUG(dbgs() << " " << Index);
3579 }
3580 LLVM_DEBUG(dbgs() << "\n");
3581 NextGroupIndex++;
3582 }
3583 CurrentMembers.clear();
3584 }
3585};
3586
3587bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3588 // Objects at a lower index are closer to FP; objects at a higher index are
3589 // closer to SP.
3590 //
3591 // For consistency in our comparison, all invalid objects are placed
3592 // at the end. This also allows us to stop walking when we hit the
3593 // first invalid item after it's all sorted.
3594 //
3595 // If we want to include a stack hazard region, order FPR accesses < the
3596 // hazard object < GPR accesses in order to create a separation between the
3597 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3598 //
3599 // Otherwise the "first" object goes first (closest to SP), followed by the
3600 // members of the "first" group.
3601 //
3602 // The rest are sorted by the group index to keep the groups together.
3603 // Higher numbered groups are more likely to be around longer (i.e. untagged
3604 // in the function epilogue and not at some earlier point). Place them closer
3605 // to SP.
3606 //
3607 // If all else equal, sort by the object index to keep the objects in the
3608 // original order.
3609 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3610 A.GroupIndex, A.ObjectIndex) <
3611 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3612 B.GroupIndex, B.ObjectIndex);
3613}
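// Editorial illustration (not part of the original source): the sort key above
// is the tuple (!IsValid, Accesses, ObjectFirst, GroupFirst, GroupIndex,
// ObjectIndex), ordered ascending, with lower positions ending up closer to FP
// and higher positions closer to SP. With hazard padding enabled, an
// FPR-accessed spill (Accesses = 1) therefore sorts before the hazard slot
// (Accesses = 2), which sorts before a GPR-accessed local (Accesses = 4),
// separating the FP/SIMD and GPR regions. ObjectFirst/GroupFirst then push the
// tagged-base-pointer slot and its group toward the SP end (ideally SP + 0),
// and GroupIndex/ObjectIndex act as tie-breakers.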
3614} // namespace
3615
3616void AArch64FrameLowering::orderFrameObjects(
3617 const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3618 const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3619
3620 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3621 ObjectsToAllocate.empty())
3622 return;
3623
3624 const MachineFrameInfo &MFI = MF.getFrameInfo();
3625 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3626 for (auto &Obj : ObjectsToAllocate) {
3627 FrameObjects[Obj].IsValid = true;
3628 FrameObjects[Obj].ObjectIndex = Obj;
3629 }
3630
3631 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3632 // the same time.
3633 GroupBuilder GB(FrameObjects);
3634 for (auto &MBB : MF) {
3635 for (auto &MI : MBB) {
3636 if (MI.isDebugInstr())
3637 continue;
3638
3639 if (AFI.hasStackHazardSlotIndex()) {
3640 std::optional<int> FI = getLdStFrameID(MI, MFI);
3641 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3642 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3643 AArch64InstrInfo::isFpOrNEON(MI))
3644 FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3645 else
3646 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3647 }
3648 }
3649
3650 int OpIndex;
3651 switch (MI.getOpcode()) {
3652 case AArch64::STGloop:
3653 case AArch64::STZGloop:
3654 OpIndex = 3;
3655 break;
3656 case AArch64::STGi:
3657 case AArch64::STZGi:
3658 case AArch64::ST2Gi:
3659 case AArch64::STZ2Gi:
3660 OpIndex = 1;
3661 break;
3662 default:
3663 OpIndex = -1;
3664 }
3665
3666 int TaggedFI = -1;
3667 if (OpIndex >= 0) {
3668 const MachineOperand &MO = MI.getOperand(OpIndex);
3669 if (MO.isFI()) {
3670 int FI = MO.getIndex();
3671 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3672 FrameObjects[FI].IsValid)
3673 TaggedFI = FI;
3674 }
3675 }
3676
3677 // If this is a stack tagging instruction for a slot that is not part of a
3678 // group yet, either start a new group or add it to the current one.
3679 if (TaggedFI >= 0)
3680 GB.AddMember(TaggedFI);
3681 else
3682 GB.EndCurrentGroup();
3683 }
3684 // Groups should never span multiple basic blocks.
3685 GB.EndCurrentGroup();
3686 }
3687
3688 if (AFI.hasStackHazardSlotIndex()) {
3689 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3690 FrameObject::AccessHazard;
3691 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3692 for (auto &Obj : FrameObjects)
3693 if (!Obj.Accesses ||
3694 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3695 Obj.Accesses = FrameObject::AccessGPR;
3696 }
3697
3698 // If the function's tagged base pointer is pinned to a stack slot, we want to
3699 // put that slot first when possible. This will likely place it at SP + 0,
3700 // and save one instruction when generating the base pointer because IRG does
3701 // not allow an immediate offset.
3702 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3703 if (TBPI) {
3704 FrameObjects[*TBPI].ObjectFirst = true;
3705 FrameObjects[*TBPI].GroupFirst = true;
3706 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3707 if (FirstGroupIndex >= 0)
3708 for (FrameObject &Object : FrameObjects)
3709 if (Object.GroupIndex == FirstGroupIndex)
3710 Object.GroupFirst = true;
3711 }
3712
3713 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3714
3715 int i = 0;
3716 for (auto &Obj : FrameObjects) {
3717 // All invalid items are sorted at the end, so it's safe to stop.
3718 if (!Obj.IsValid)
3719 break;
3720 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3721 }
3722
3723 LLVM_DEBUG({
3724 dbgs() << "Final frame order:\n";
3725 for (auto &Obj : FrameObjects) {
3726 if (!Obj.IsValid)
3727 break;
3728 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3729 if (Obj.ObjectFirst)
3730 dbgs() << ", first";
3731 if (Obj.GroupFirst)
3732 dbgs() << ", group-first";
3733 dbgs() << "\n";
3734 }
3735 });
3736}
3737
3738/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3739/// least every ProbeSize bytes. Returns an iterator of the first instruction
3740/// after the loop. The difference between SP and TargetReg must be an exact
3741/// multiple of ProbeSize.
3742MachineBasicBlock::iterator
3743AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3744 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3745 Register TargetReg) const {
3746 MachineBasicBlock &MBB = *MBBI->getParent();
3747 MachineFunction &MF = *MBB.getParent();
3748 const AArch64InstrInfo *TII =
3749 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3750 DebugLoc DL = MBB.findDebugLoc(MBBI);
3751
3752 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3753 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3754 MF.insert(MBBInsertPoint, LoopMBB);
3755 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3756 MF.insert(MBBInsertPoint, ExitMBB);
3757
3758 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3759 // in SUB).
3760 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3761 StackOffset::getFixed(-ProbeSize), TII,
3762 MachineInstr::FrameSetup);
3763 // STR XZR, [SP]
3764 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3765 .addReg(AArch64::XZR)
3766 .addReg(AArch64::SP)
3767 .addImm(0)
3768 .setMIFlags(MachineInstr::FrameSetup);
3769 // CMP SP, TargetReg
3770 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3771 AArch64::XZR)
3772 .addReg(AArch64::SP)
3773 .addReg(TargetReg)
3774 .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3775 .setMIFlags(MachineInstr::FrameSetup);
3776 // B.CC Loop
3777 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3778 .addImm(AArch64CC::NE)
3779 .addMBB(LoopMBB)
3780 .setMIFlags(MachineInstr::FrameSetup);
3781
3782 LoopMBB->addSuccessor(ExitMBB);
3783 LoopMBB->addSuccessor(LoopMBB);
3784 // Synthesize the exit MBB.
3785 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3786 ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3787 MBB.addSuccessor(LoopMBB);
3788 // Update liveins.
3789 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3790
3791 return ExitMBB->begin();
3792}
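// Editorial illustration (not part of the original source): for a 4096-byte
// probe size the block built above expands to roughly
//   LoopMBB:
//     SUB  SP, SP, #4096
//     STR  XZR, [SP]
//     CMP  SP, <TargetReg>
//     B.NE LoopMBB
//   ExitMBB:
// SP is dropped one probe interval at a time and each new interval is touched
// before the next decrement, so the unprobed gap below SP never exceeds
// ProbeSize.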
3793
3794void AArch64FrameLowering::inlineStackProbeFixed(
3795 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3796 StackOffset CFAOffset) const {
3797 MachineBasicBlock *MBB = MBBI->getParent();
3798 MachineFunction &MF = *MBB->getParent();
3799 const AArch64InstrInfo *TII =
3800 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3801 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3802 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3803 bool HasFP = hasFP(MF);
3804
3805 DebugLoc DL;
3806 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3807 int64_t NumBlocks = FrameSize / ProbeSize;
3808 int64_t ResidualSize = FrameSize % ProbeSize;
3809
3810 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3811 << NumBlocks << " blocks of " << ProbeSize
3812 << " bytes, plus " << ResidualSize << " bytes\n");
3813
3814 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
3815 // ordinary loop.
3816 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3817 for (int i = 0; i < NumBlocks; ++i) {
3818 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3819 // encodable in a SUB).
3820 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3821 StackOffset::getFixed(-ProbeSize), TII,
3822 MachineInstr::FrameSetup, false, false, nullptr,
3823 EmitAsyncCFI && !HasFP, CFAOffset);
3824 CFAOffset += StackOffset::getFixed(ProbeSize);
3825 // STR XZR, [SP]
3826 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3827 .addReg(AArch64::XZR)
3828 .addReg(AArch64::SP)
3829 .addImm(0)
3830 .setMIFlags(MachineInstr::FrameSetup);
3831 }
3832 } else if (NumBlocks != 0) {
3833 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
3834 // encodable in ADD). ScratchReg may temporarily become the CFA register.
3835 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3836 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3837 MachineInstr::FrameSetup, false, false, nullptr,
3838 EmitAsyncCFI && !HasFP, CFAOffset);
3839 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3840 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3841 MBB = MBBI->getParent();
3842 if (EmitAsyncCFI && !HasFP) {
3843 // Set the CFA register back to SP.
3844 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3845 .buildDefCFARegister(AArch64::SP);
3846 }
3847 }
3848
3849 if (ResidualSize != 0) {
3850 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3851 // in SUB).
3852 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3853 StackOffset::getFixed(-ResidualSize), TII,
3854 MachineInstr::FrameSetup, false, false, nullptr,
3855 EmitAsyncCFI && !HasFP, CFAOffset);
3856 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3857 // STR XZR, [SP]
3858 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3859 .addReg(AArch64::XZR)
3860 .addReg(AArch64::SP)
3861 .addImm(0)
3862 .setMIFlags(MachineInstr::FrameSetup);
3863 }
3864 }
3865}
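// Editorial illustration (not part of the original source): for a 20000-byte
// frame with the default 4096-byte probe size, NumBlocks = 20000 / 4096 = 4
// and ResidualSize = 20000 % 4096 = 3616. Assuming the usual limits
// (StackProbeMaxLoopUnroll = 4 iterations, StackProbeMaxUnprobedStack = 1024
// bytes), the four blocks are emitted as unrolled SUB + STR pairs, and the
// residual allocation is followed by one more probe because 3616 exceeds the
// unprobed-stack limit.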
3866
3867void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3868 MachineBasicBlock &MBB) const {
3869 // Get the instructions that need to be replaced. We emit at most two of
3870 // these. Remember them in order to avoid complications coming from the need
3871 // to traverse the block while potentially creating more blocks.
3872 SmallVector<MachineInstr *, 4> ToReplace;
3873 for (MachineInstr &MI : MBB)
3874 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3875 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3876 ToReplace.push_back(&MI);
3877
3878 for (MachineInstr *MI : ToReplace) {
3879 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3880 Register ScratchReg = MI->getOperand(0).getReg();
3881 int64_t FrameSize = MI->getOperand(1).getImm();
3882 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3883 MI->getOperand(3).getImm());
3884 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3885 CFAOffset);
3886 } else {
3887 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3888 "Stack probe pseudo-instruction expected");
3889 const AArch64InstrInfo *TII =
3890 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3891 Register TargetReg = MI->getOperand(0).getReg();
3892 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3893 }
3894 MI->eraseFromParent();
3895 }
3896}
3897
3898struct StackAccess {
3899 enum AccessType {
3900 NotAccessed = 0, // Stack object not accessed by load/store instructions.
3901 GPR = 1 << 0, // A general purpose register.
3902 PPR = 1 << 1, // A predicate register.
3903 FPR = 1 << 2, // A floating point/Neon/SVE register.
3904 };
3905
3906 int Idx;
3907 StackOffset Offset;
3908 int64_t Size;
3909 unsigned AccessTypes;
3910
3911 StackAccess() : Idx(0), Size(0), AccessTypes(NotAccessed) {}
3912
3913 bool operator<(const StackAccess &Rhs) const {
3914 return std::make_tuple(start(), Idx) <
3915 std::make_tuple(Rhs.start(), Rhs.Idx);
3916 }
3917
3918 bool isCPU() const {
3919 // Predicate register load and store instructions execute on the CPU.
3920 return AccessTypes & (AccessType::GPR | AccessType::PPR);
3921 }
3922 bool isSME() const { return AccessTypes & AccessType::FPR; }
3923 bool isMixed() const { return isCPU() && isSME(); }
3924
3925 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3926 int64_t end() const { return start() + Size; }
3927
3928 std::string getTypeString() const {
3929 switch (AccessTypes) {
3930 case AccessType::FPR:
3931 return "FPR";
3932 case AccessType::PPR:
3933 return "PPR";
3934 case AccessType::GPR:
3935 return "GPR";
3937 return "NA";
3938 default:
3939 return "Mixed";
3940 }
3941 }
3942
3943 void print(raw_ostream &OS) const {
3944 OS << getTypeString() << " stack object at [SP"
3945 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3946 if (Offset.getScalable())
3947 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3948 << " * vscale";
3949 OS << "]";
3950 }
3951};
3952
3953static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3954 SA.print(OS);
3955 return OS;
3956}
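// Editorial illustration (not part of the original source): a slot only ever
// accessed through GPRs or predicate registers satisfies isCPU(), one accessed
// through FPR/SVE vector registers satisfies isSME(), and a slot accessed both
// ways is isMixed() (e.g. AccessTypes == (GPR | FPR) prints as "Mixed").
// emitRemarks() below uses this classification to report CPU/SME access pairs
// that fall within the hazard distance, and mixed-access objects
// unconditionally.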
3957
3958void AArch64FrameLowering::emitRemarks(
3959 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3960
3961 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3962 if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
3963 return;
3964
3965 unsigned StackHazardSize = getStackHazardSize(MF);
3966 const uint64_t HazardSize =
3967 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3968
3969 if (HazardSize == 0)
3970 return;
3971
3972 const MachineFrameInfo &MFI = MF.getFrameInfo();
3973 // Bail if function has no stack objects.
3974 if (!MFI.hasStackObjects())
3975 return;
3976
3977 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3978
3979 size_t NumFPLdSt = 0;
3980 size_t NumNonFPLdSt = 0;
3981
3982 // Collect stack accesses via Load/Store instructions.
3983 for (const MachineBasicBlock &MBB : MF) {
3984 for (const MachineInstr &MI : MBB) {
3985 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3986 continue;
3987 for (MachineMemOperand *MMO : MI.memoperands()) {
3988 std::optional<int> FI = getMMOFrameID(MMO, MFI);
3989 if (FI && !MFI.isDeadObjectIndex(*FI)) {
3990 int FrameIdx = *FI;
3991
3992 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
3993 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
3994 StackAccesses[ArrIdx].Idx = FrameIdx;
3995 StackAccesses[ArrIdx].Offset =
3996 getFrameIndexReferenceFromSP(MF, FrameIdx);
3997 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
3998 }
3999
4000 unsigned RegTy = StackAccess::AccessType::GPR;
4001 if (MFI.hasScalableStackID(FrameIdx))
4002 RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
4003 else if (AArch64InstrInfo::isFpOrNEON(MI))
4004 RegTy = StackAccess::FPR;
4005
4006 StackAccesses[ArrIdx].AccessTypes |= RegTy;
4007
4008 if (RegTy == StackAccess::FPR)
4009 ++NumFPLdSt;
4010 else
4011 ++NumNonFPLdSt;
4012 }
4013 }
4014 }
4015 }
4016
4017 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
4018 return;
4019
4020 llvm::sort(StackAccesses);
4021 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
4022 return S.AccessTypes == StackAccess::NotAccessed;
4023 });
4024
4025 SmallVector<const StackAccess *> MixedObjects;
4026 SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
4027
4028 if (StackAccesses.front().isMixed())
4029 MixedObjects.push_back(&StackAccesses.front());
4030
4031 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4032 It != End; ++It) {
4033 const auto &First = *It;
4034 const auto &Second = *(It + 1);
4035
4036 if (Second.isMixed())
4037 MixedObjects.push_back(&Second);
4038
4039 if ((First.isSME() && Second.isCPU()) ||
4040 (First.isCPU() && Second.isSME())) {
4041 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4042 if (Distance < HazardSize)
4043 HazardPairs.emplace_back(&First, &Second);
4044 }
4045 }
4046
4047 auto EmitRemark = [&](llvm::StringRef Str) {
4048 ORE->emit([&]() {
4049 auto R = MachineOptimizationRemarkAnalysis(
4050 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4051 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4052 });
4053 };
4054
4055 for (const auto &P : HazardPairs)
4056 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4057
4058 for (const auto *Obj : MixedObjects)
4059 EmitRemark(
4060 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4061}
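// Editorial illustration (not part of the original source): if a GPR-accessed
// object ends at SP+16 and an FPR-accessed spill starts at SP+32, the gap is
// 16 bytes; whenever the configured hazard size is larger than that, the pair
// is reported roughly as
//   "GPR stack object at [SP+0] is too close to FPR stack object at [SP+32]"
// via the optimization remark emitted above.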
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
static bool invalidateWindowsRegisterPairing(bool SpillExtendedVolatile, unsigned SpillCount, unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
MCRegister findFreePredicateReg(BitVector &SavedRegs)
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
static const int kSetTagLoopThreshold
static int getArgumentStackToRestore(MachineFunction &MF, MachineBasicBlock &MBB)
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:54
#define I(x, y, z)
Definition MD5.cpp:57
#define H(x, y, z)
Definition MD5.cpp:56
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
bool hasSVECalleeSavesAboveFrameRecord(const MachineFunction &MF) const
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
SignReturnAddress getSignReturnAddressCondition() const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:40
size_t size() const
size - Get the array size.
Definition ArrayRef.h:142
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:137
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:123
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:730
A set of physical registers with utility functions to track liveness when walking backward/forward th...
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:41
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
bool isCalleeSavedObjectIndex(int ObjectIdx) const
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
void setIsCalleeSavedObjectIndex(int ObjectIdx, bool IsCalleeSaved)
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:298
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:339
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:30
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:46
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:49
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:40
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:39
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
Primary interface to the complete machine description for the target machine.
const Triple & getTargetTriple() const
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool FramePointerIsReserved(const MachineFunction &MF) const
FramePointerIsReserved - This returns true if the frame pointer must always either point to a new fra...
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserves most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserve (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserve no general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:532
void stable_sort(R &&Range)
Definition STLExtras.h:2070
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
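A hedged sketch of the builder pattern, assuming MBB, MBBI, DL and TII are in scope as they would be in prologue code; this copies SP into the frame pointer with "add x29, sp, #0" and tags it as frame setup:

BuildMI(MBB, MBBI, DL, TII->get(AArch64::ADDXri), AArch64::FP)
    .addReg(AArch64::SP)
    .addImm(0)   // 12-bit immediate
    .addImm(0)   // shift amount
    .setMIFlag(MachineInstr::FrameSetup);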
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
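A hedged usage sketch, assuming MI and a byte count Bytes are in scope; the returned status flags indicate whether the offset can be folded into the instruction's immediate field:

StackOffset Remaining = StackOffset::getFixed(Bytes);
int Status = isAArch64FrameOffsetLegal(MI, Remaining);
if (Status & AArch64FrameOffsetCannotUpdate) {
  // The offset cannot be applied to this instruction at all; the caller
  // would have to materialize the address in a scratch register instead.
}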
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
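make_scope_exit runs a callable when the enclosing scope unwinds, which is handy for "always restore this state" cleanup; a small self-contained example:

#include "llvm/ADT/ScopeExit.h"

void withFlagCleared(bool &Flag) {
  bool Saved = Flag;
  Flag = false;
  // Restore the flag on every exit path from this scope, including early
  // returns, without repeating cleanup code at each return.
  auto Restore = llvm::make_scope_exit([&] { Flag = Saved; });
  // ... work that relies on Flag being false ...
}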
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1744
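The range-based STLExtras helpers avoid spelling out begin()/end() pairs; a small example using any_of:

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallVector.h"

bool hasScalableAccess(const llvm::SmallVector<int64_t, 8> &ScalableSizes) {
  // Range-based wrapper around std::any_of.
  return llvm::any_of(ScalableSizes, [](int64_t S) { return S != 0; });
}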
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:406
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1634
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
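dbgs() is normally wrapped in LLVM_DEBUG so output appears only in builds with assertions and only when debugging is requested; the DEBUG_TYPE name below is illustrative:

#include "llvm/Support/Debug.h"
#include <cstdint>
#define DEBUG_TYPE "frame-info" // illustrative debug-type name

static void traceAllocation(uint64_t NumBytes) {
  // Printed only when -debug or -debug-only=frame-info is passed.
  LLVM_DEBUG(llvm::dbgs() << "allocating " << NumBytes << " bytes\n");
}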
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
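A hedged sketch, assuming MBB, MBBI, DL and TII are in scope as they are during prologue emission: lower "SP := SP - 64 bytes", letting the helper pick the ADD/SUB sequence and tag it as frame setup for the unwinder.

emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                StackOffset::getFixed(-64), TII, MachineInstr::FrameSetup);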
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtua...
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
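alignTo rounds a byte size up to the next multiple of an Align value, e.g. when padding a local-variable area to the 16-byte stack alignment:

#include "llvm/Support/Alignment.h"
#include <cstdint>

uint64_t paddedLocalArea(uint64_t RawSize) {
  // 40 -> 48, 64 -> 64.
  return llvm::alignTo(RawSize, llvm::Align(16));
}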
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1770
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2132
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1909
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-in's for a set of MBBs until the computation converges.
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:872
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
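A hedged sketch of attaching a memory operand for a fixed stack slot to a spill instruction; MF, MBB, MBBI, DL, TII, Reg and the frame index FI are assumed to be in scope, as they are when emitting callee-save spills:

MachineMemOperand *MMO = MF.getMachineMemOperand(
    MachinePointerInfo::getFixedStack(MF, FI), MachineMemOperand::MOStore,
    MF.getFrameInfo().getObjectSize(FI), MF.getFrameInfo().getObjectAlign(FI));
BuildMI(MBB, MBBI, DL, TII->get(AArch64::STRXui))
    .addReg(Reg, getKillRegState(true))
    .addFrameIndex(FI)
    .addImm(0)
    .addMemOperand(MMO);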
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray