LLVM 23.0.0git
AArch64SRLTDefineSuperRegs.cpp
//===- AArch64SRLTDefineSuperRegs.cpp -------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// When SubRegister Liveness Tracking (SRLT) is enabled, this pass adds
// extra implicit-defs to instructions that define the low N bits of a GPR/FPR
// register so that they also define the top bits, because all AArch64
// instructions that write the low bits of a GPR/FPR also implicitly zero the
// top bits. For example, 'mov w0, w1' writes zeroes to the top 32 bits of x0,
// so after register allocation this pass adds an `implicit-def $x0`.
//
// These semantics are originally represented in the MIR using `SUBREG_TO_REG`,
// which expresses that the top bits have been defined by the preceding
// instructions. During register coalescing this information is lost, and, in
// contrast to when SRLT is disabled, the implicit-defs are not added to the
// instruction when rewriting virtual -> physical registers.
//
// There have been several attempts to fix this in the coalescer [1], but each
// iteration has exposed new bugs and the patch had to be reverted.
// Additionally, the concept of adding an 'implicit-def' of a virtual register
// is particularly fragile and many places don't expect it (for example, in
// `X86::commuteInstructionImpl` the code only looks at specific operands and
// does not consider implicit-defs; similarly, `SplitEditor::addDeadDef`
// traverses operand 'defs' rather than 'all_defs').
//
// We want a temporary solution that doesn't impact other targets and is
// simpler and less intrusive than the patch proposed for the register
// coalescer [1], so that we can enable SRLT for AArch64.
//
// The approach here is to add the 'implicit-def' manually after rewriting
// virtual regs -> physical regs. This still means that during register
// allocation the dependences are not accurately represented in the MIR and
// LiveIntervals, but there are several reasons why we believe this isn't a
// problem in practice:
// (A) The register allocator only spills entire virtual registers.
//     This is additionally guarded by code in
//     AArch64InstrInfo::storeRegToStackSlot/loadRegFromStackSlot,
//     which checks that a register matches the expected register class.
// (B) Rematerialization only happens when the instruction writes the full
//     register.
// (C) The high bits of an AArch64 register cannot be written independently.
// (D) Instructions that write only part of a register always take that same
//     register as a tied input operand, to indicate it's a merging operation.
//
// (A) means that for two virtual registers of regclass GPR32 and GPR64, if the
// GPR32 register is coalesced into the GPR64 vreg then the full GPR64 would
// be spilled/filled even if only the low 32 bits would be required for the
// given live range. (B) means that the top bits of a GPR64 would never be
// overwritten by rematerialising a GPR32 sub-register for a given live range.
// (C) and (D) mean that we can assume that the MIR given as input to the
// register allocator correctly expresses the instruction behaviour and the
// dependences between values, so unless the register allocator violates (A)
// or (B), the MIR is otherwise sound.
//
// Alternative approaches have also been considered, such as:
// (1) Changing the AArch64 instruction definitions to write all bits and
//     extract the low N bits for the result.
// (2) Disabling coalescing of SUBREG_TO_REG and using regalloc hints to tell
//     the register allocator to favour the same register for the input/output.
// (3) Adding a new coalescer guard node with a tied-operand constraint, such
//     that when the SUBREG_TO_REG is removed, something still represents that
//     the top bits are defined. The node would get removed before rewriting
//     virtregs.
// (4) Using an explicit INSERT_SUBREG into a zero value and trying to optimize
//     away the INSERT_SUBREG (this is a more explicit variant of (2) and (3)).
// (5) Adding a new MachineOperand flag that represents that the top bits are
//     defined, but are neither read nor undef.
//
// (1) would be the best approach but would be a significant effort as it
// requires rewriting most/all instruction definitions and fixing MIR passes
// that rely on the current definitions, whereas (2-4) result in sub-optimal
// code that can't really be avoided because the explicit nodes would stop
// rematerialization. (5) might be a way to mitigate the fragility of
// implicit-defs of virtual registers if we want to pursue landing [1], but
// then we'd rather choose approach (1) to avoid using SUBREG_TO_REG entirely.
//
// [1] https://github.com/llvm/llvm-project/pull/168353
//===----------------------------------------------------------------------===//

#include "AArch64.h"
#include "AArch64InstrInfo.h"
#include "AArch64Subtarget.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/SmallSet.h"
#include "llvm/Support/Debug.h"

using namespace llvm;

#define DEBUG_TYPE "aarch64-srlt-define-superregs"
#define PASS_NAME "AArch64 SRLT Define Super-Regs Pass"

namespace {

class AArch64SRLTDefineSuperRegsImpl {
private:
  const AArch64Subtarget *Subtarget = nullptr;
  const AArch64RegisterInfo *TRI = nullptr;

  Register getWidestSuperReg(Register R, const BitVector &RequiredBaseRegUnits,
                             const BitVector &QHiRegUnits);

public:
  bool run(MachineFunction &MF);
};

class AArch64SRLTDefineSuperRegsLegacy : public MachineFunctionPass {
public:
  inline static char ID = 0;

  AArch64SRLTDefineSuperRegsLegacy() : MachineFunctionPass(ID) {}

  bool runOnMachineFunction(MachineFunction &MF) override {
    return AArch64SRLTDefineSuperRegsImpl().run(MF);
  }

  StringRef getPassName() const override { return PASS_NAME; }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    AU.setPreservesCFG();
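    AU.addPreservedID(MachineLoopInfoID);
    AU.addPreservedID(MachineDominatorsID);
    MachineFunctionPass::getAnalysisUsage(AU);
  }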
};

} // end anonymous namespace

INITIALIZE_PASS(AArch64SRLTDefineSuperRegsLegacy, DEBUG_TYPE, PASS_NAME, false,
                false)

// Returns the widest super-reg for a given reg, or NoRegister if no suitable
// wider super-reg has been found. For example:
//   W0    -> X0
//   B1    -> Q1 (without SVE)
//         -> Z1 (with SVE)
//   W1_W2 -> X1_X2
//   D0_D1 -> Q0_Q1 (without SVE)
//         -> Z0_Z1 (with SVE)
Register AArch64SRLTDefineSuperRegsImpl::getWidestSuperReg(
    Register R, const BitVector &RequiredBaseRegUnits,
    const BitVector &QHiRegUnits) {
  assert(R.isPhysical() &&
         "Expected to be run straight after virtregrewriter!");

  BitVector Units(TRI->getNumRegUnits());
  for (MCRegUnit U : TRI->regunits(R))
    Units.set((unsigned)U);

  auto IsSuitableSuperReg = [&](Register SR) {
    for (MCRegUnit U : TRI->regunits(SR)) {
      // Avoid choosing z1 as super-reg of d1 if SVE is not available.
      // Q*_HI reg units only occur in SVE registers, which consist of the Q*
      // register for the low 128 bits and the (artificial) Q*_HI register
      // for the top (vscale-1) * 128 bits.
      if (QHiRegUnits.test((unsigned)U) &&
          !Subtarget->isSVEorStreamingSVEAvailable())
        return false;
      // A super-reg is unsuitable if any of its non-artificial reg units is
      // not shared with R or does not belong to one of the required base
      // registers. Such a unit belongs to a different register, which means
      // the candidate super-reg is likely a register tuple.
      if (!TRI->isArtificialRegUnit(U) &&
          (!Units.test((unsigned)U) || !RequiredBaseRegUnits.test((unsigned)U)))
        return false;
    }
    return true;
  };

  Register LargestSuperReg = AArch64::NoRegister;
  for (Register SR : TRI->superregs(R))
    if (IsSuitableSuperReg(SR) && (LargestSuperReg == AArch64::NoRegister ||
                                   TRI->isSuperRegister(LargestSuperReg, SR)))
      LargestSuperReg = SR;

  return LargestSuperReg;
}

bool AArch64SRLTDefineSuperRegsImpl::run(MachineFunction &MF) {
  Subtarget = &MF.getSubtarget<AArch64Subtarget>();
  TRI = Subtarget->getRegisterInfo();
  const MachineRegisterInfo *MRI = &MF.getRegInfo();

  if (!MRI->subRegLivenessEnabled())
    return false;

  assert(!MRI->isSSA() && "Expected to be run after breaking down SSA form!");

  auto XRegs = seq_inclusive<unsigned>(AArch64::X0, AArch64::X28);
  auto ZRegs = seq_inclusive<unsigned>(AArch64::Z0, AArch64::Z31);
  constexpr unsigned FixedRegs[] = {AArch64::FP, AArch64::LR, AArch64::SP};

  BitVector RequiredBaseRegUnits(TRI->getNumRegUnits());
  for (Register R : concat<unsigned>(XRegs, ZRegs, FixedRegs))
    for (MCRegUnit U : TRI->regunits(R))
      RequiredBaseRegUnits.set((unsigned)U);

  BitVector QHiRegUnits(TRI->getNumRegUnits());
  for (Register R : seq_inclusive<unsigned>(AArch64::Q0_HI, AArch64::Q31_HI))
    for (MCRegUnit U : TRI->regunits(R))
      QHiRegUnits.set((unsigned)U);

  bool Changed = false;
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      // PATCHPOINT may have a 'def' that's not a register, avoid this.
      if (MI.getOpcode() == TargetOpcode::PATCHPOINT)
        continue;
      // For each partial register write, also add an implicit-def for top bits
      // of the register (e.g. for w0 add a def of x0).
      SmallSet<Register, 8> SuperRegs;
      for (const MachineOperand &DefOp : MI.defs())
        if (Register R = getWidestSuperReg(DefOp.getReg(), RequiredBaseRegUnits,
                                           QHiRegUnits);
            R != AArch64::NoRegister)
          SuperRegs.insert(R);

      if (!SuperRegs.size())
        continue;

      LLVM_DEBUG(dbgs() << "Adding implicit-defs to: " << MI);
      for (Register R : SuperRegs) {
        LLVM_DEBUG(dbgs() << " " << printReg(R, TRI) << "\n");
        bool IsRenamable = any_of(MI.defs(), [&](const MachineOperand &MO) {
          return MO.isRenamable() && TRI->regsOverlap(MO.getReg(), R);
        });
        bool IsDead = any_of(MI.defs(), [&](const MachineOperand &MO) {
          return MO.isDead() && TRI->regsOverlap(MO.getReg(), R);
        });
        MachineOperand DefOp = MachineOperand::CreateReg(
            R, /*isDef=*/true, /*isImp=*/true, /*isKill=*/false,
            /*isDead=*/IsDead, /*isUndef=*/false, /*isEarlyClobber=*/false,
            /*SubReg=*/0, /*isDebug=*/false, /*isInternalRead=*/false,
            /*isRenamable=*/IsRenamable);
        MI.addOperand(DefOp);
      }
      Changed = true;
    }
  }

  return Changed;
}

FunctionPass *llvm::createAArch64SRLTDefineSuperRegsLegacyPass() {
  return new AArch64SRLTDefineSuperRegsLegacy();
}

  const bool Changed = AArch64SRLTDefineSuperRegsImpl().run(MF);
  if (!Changed)
    return PreservedAnalyses::all();
  PreservedAnalyses PA = getMachineFunctionPassPreservedAnalyses();
  PA.preserveSet<CFGAnalyses>();
  return PA;
}
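The register-unit test in `getWidestSuperReg` is compact and easy to misread. Below is a minimal, self-contained sketch that re-implements the same suitability rule over a toy register file, without using LLVM. The register names, the unit numbering, the `ToyTarget` type, and the strict-superset check standing in for `TRI->isSuperRegister` are all assumptions made for illustration, not LLVM's actual register model.

```cpp
// Toy re-implementation of the super-register suitability rule from
// getWidestSuperReg. Register names and unit numbers are hypothetical.
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using UnitSet = std::set<int>;

struct ToyTarget {
  UnitSet Artificial;   // units that model no real storage of their own
  UnitSet QHi;          // units that only exist in SVE registers
  UnitSet RequiredBase; // units of the registers we may implicitly define
  bool HasSVE;
};

// Mirrors the IsSuitableSuperReg lambda: reject SVE-only units without SVE,
// and reject any real unit that R does not own or that is not a base unit.
bool isSuitable(const ToyTarget &T, const UnitSet &R, const UnitSet &SR) {
  for (int U : SR) {
    if (T.QHi.count(U) && !T.HasSVE)
      return false;
    if (!T.Artificial.count(U) && (!R.count(U) || !T.RequiredBase.count(U)))
      return false;
  }
  return true;
}

// Stand-in for TRI->isSuperRegister(Small, Big): Big strictly contains Small.
bool strictlyContains(const UnitSet &Big, const UnitSet &Small) {
  return Big.size() > Small.size() &&
         std::includes(Big.begin(), Big.end(), Small.begin(), Small.end());
}

// Returns the widest suitable super-reg of Def, or "" if none is suitable.
std::string widestFor(const std::string &Def, bool HasSVE) {
  // Hypothetical units: 0=W0, 1=top half of X0, 2=W1, 3=top half of X1,
  // 4=WZR, 5=top half of XZR, 6=D0, 7=rest of Q0, 8=Q0_HI (SVE only).
  const std::map<std::string, UnitSet> Regs = {
      {"W0", {0}},  {"X0", {0, 1}},  {"W0_W1", {0, 2}}, {"X0_X1", {0, 1, 2, 3}},
      {"WZR", {4}}, {"XZR", {4, 5}}, {"D0", {6}},       {"Q0", {6, 7}},
      {"Z0", {6, 7, 8}}};
  const std::map<std::string, std::vector<std::string>> SuperRegs = {
      {"W0", {"X0", "W0_W1", "X0_X1"}},
      {"X0", {"X0_X1"}},
      {"WZR", {"XZR"}},
      {"D0", {"Q0", "Z0"}}};
  // Unit 4 (WZR) is deliberately left out of RequiredBase in this model.
  const ToyTarget T{{1, 3, 5, 7, 8}, {8}, {0, 1, 2, 3, 6, 7, 8}, HasSVE};

  std::string Best;
  for (const std::string &SR : SuperRegs.at(Def)) {
    if (!isSuitable(T, Regs.at(Def), Regs.at(SR)))
      continue;
    if (Best.empty() || strictlyContains(Regs.at(SR), Regs.at(Best)))
      Best = SR;
  }
  return Best;
}
```

In this toy model, `widestFor("W0", false)` picks `X0`, while the tuples `W0_W1` and `X0_X1` are rejected because they contain the W1 unit, which `W0` does not own. `D0` maps to `Q0` without SVE and to `Z0` with SVE, loosely mirroring the `B1 -> Q1 / Z1` example in the comment. `WZR` gets no super-reg because its unit lies outside the hypothetical required set, in the spirit of the pass only collecting units from X0-X28, Z0-Z31, FP, LR and SP.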
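The per-instruction loop in `AArch64SRLTDefineSuperRegsImpl::run` can be sketched with toy types to show how the dead and renamable flags are carried over. This is a hypothetical model, not LLVM's `MachineInstr`/`MachineOperand` API: `ToyOperand`, `ToyInstr`, and `addSuperRegDefs` are illustrative names, and the super-reg lookup and overlap test are passed in as callbacks standing in for `getWidestSuperReg` and `TRI->regsOverlap`. Like the pass, it skips `PATCHPOINT`, only inspects explicit defs (as `MachineInstr::defs()` does), and marks a new implicit-def dead or renamable if any overlapping explicit def is.

```cpp
// Toy sketch of the implicit-def insertion loop. All types are hypothetical
// stand-ins, not LLVM's MachineInstr/MachineOperand API.
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

struct ToyOperand {
  std::string Reg;
  bool IsDef = false;
  bool IsImplicit = false;
  bool IsDead = false;
  bool IsRenamable = false;
};

struct ToyInstr {
  std::string Opcode;
  std::vector<ToyOperand> Ops;
};

// Returns the widest suitable super-reg name, or "" if there is none.
using WidestFn = std::function<std::string(const std::string &)>;
// Returns true if the two registers share any storage.
using OverlapFn = std::function<bool(const std::string &, const std::string &)>;

// Appends an implicit-def for the widest super-reg of every explicit def.
// Returns true if MI was changed.
bool addSuperRegDefs(ToyInstr &MI, const WidestFn &Widest,
                     const OverlapFn &Overlap) {
  if (MI.Opcode == "PATCHPOINT")
    return false;

  // Only look at explicit defs, like MachineInstr::defs().
  std::vector<ToyOperand> Defs;
  for (const ToyOperand &O : MI.Ops)
    if (O.IsDef && !O.IsImplicit)
      Defs.push_back(O);

  std::set<std::string> SuperRegs; // de-duplicates, like the SmallSet
  for (const ToyOperand &D : Defs) {
    std::string SR = Widest(D.Reg);
    if (!SR.empty())
      SuperRegs.insert(SR);
  }
  if (SuperRegs.empty())
    return false;

  for (const std::string &SR : SuperRegs) {
    ToyOperand ImpDef;
    ImpDef.Reg = SR;
    ImpDef.IsDef = true;
    ImpDef.IsImplicit = true;
    // Dead/renamable if any overlapping explicit def is dead/renamable.
    for (const ToyOperand &D : Defs) {
      if (!Overlap(D.Reg, SR))
        continue;
      ImpDef.IsDead = ImpDef.IsDead || D.IsDead;
      ImpDef.IsRenamable = ImpDef.IsRenamable || D.IsRenamable;
    }
    MI.Ops.push_back(ImpDef);
  }
  return true;
}
```

For a hypothetical 32-bit move whose explicit `W0` def is dead and renamable, with a lookup that maps `Wn` to `Xn`, the sketch appends one implicit def of `X0` that is also dead and renamable; a `PATCHPOINT` is left untouched. Copying the flags from overlapping defs keeps later liveness-based passes from treating the new super-register def as live when the original narrow def was not.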