LLVM 13.0.0git
ARMTargetInfo.cpp
//===-- ARMTargetInfo.cpp - ARM Target Implementation ---------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//

#include "TargetInfo/ARMTargetInfo.h"
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;

Target &llvm::getTheARMLETarget() {
  static Target TheARMLETarget;
  return TheARMLETarget;
}
Target &llvm::getTheARMBETarget() {
  static Target TheARMBETarget;
  return TheARMBETarget;
}
Target &llvm::getTheThumbLETarget() {
  static Target TheThumbLETarget;
  return TheThumbLETarget;
}
Target &llvm::getTheThumbBETarget() {
  static Target TheThumbBETarget;
  return TheThumbBETarget;
}

LLVM_EXTERNAL_VISIBILITY void LLVMInitializeARMTargetInfo() {
  RegisterTarget<Triple::arm, /*HasJIT=*/true> X(getTheARMLETarget(), "arm",
                                                 "ARM", "ARM");
  RegisterTarget<Triple::armeb, /*HasJIT=*/true> Y(getTheARMBETarget(), "armeb",
                                                   "ARM (big endian)", "ARM");

  RegisterTarget<Triple::thumb, /*HasJIT=*/true> A(getTheThumbLETarget(),
                                                   "thumb", "Thumb", "ARM");
  RegisterTarget<Triple::thumbeb, /*HasJIT=*/true> B(
      getTheThumbBETarget(), "thumbeb", "Thumb (big endian)", "ARM");
}
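The four getThe*Target() accessors lazily construct one Target singleton per (architecture, endianness) pair, and LLVMInitializeARMTargetInfo() registers them with the global TargetRegistry under the names "arm", "armeb", "thumb", and "thumbeb". Below is a minimal sketch of how a client consumes these entry points; it is not part of this file. The standalone main() and the triple string "arm-none-eabi" are illustrative assumptions, and the sketch further assumes an LLVM 13 build configured with the ARM target (TargetRegistry.h lived under llvm/Support in LLVM 13; it moved to llvm/MC in LLVM 14).

#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"  // declares LLVMInitializeARMTargetInfo()
#include "llvm/Support/raw_ostream.h"

int main() {
  // Populate the registry with the four ARM-family entries defined above.
  LLVMInitializeARMTargetInfo();

  // Resolve a triple string back to a registered Target; for a little-endian
  // ARM triple this yields the same singleton as getTheARMLETarget().
  std::string Error;
  const llvm::Target *T =
      llvm::TargetRegistry::lookupTarget("arm-none-eabi", Error);
  if (!T) {
    llvm::errs() << "lookup failed: " << Error << "\n";
    return 1;
  }
  llvm::outs() << T->getName() << ": " << T->getShortDescription() << "\n";
  return 0;
}

Keeping the Target objects as function-local statics gives thread-safe, on-demand construction, and registering Thumb separately from ARM lets the registry expose the two instruction sets as distinct targets that share the "ARM" backend name.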