LLVM  14.0.0git
PowerPCTargetInfo.cpp
Go to the documentation of this file.
1 //===-- PowerPCTargetInfo.cpp - PowerPC Target Implementation -------------===//
2 //
3 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4 // See https://llvm.org/LICENSE.txt for license information.
5 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6 //
7 //===----------------------------------------------------------------------===//
8 
10 #include "llvm/MC/TargetRegistry.h"
11 using namespace llvm;
12 
14  static Target ThePPC32Target;
15  return ThePPC32Target;
16 }
18  static Target ThePPC32LETarget;
19  return ThePPC32LETarget;
20 }
22  static Target ThePPC64Target;
23  return ThePPC64Target;
24 }
26  static Target ThePPC64LETarget;
27  return ThePPC64LETarget;
28 }
29 
31  RegisterTarget<Triple::ppc, /*HasJIT=*/true> W(getThePPC32Target(), "ppc32",
32  "PowerPC 32", "PPC");
33 
34  RegisterTarget<Triple::ppcle, /*HasJIT=*/true> X(
35  getThePPC32LETarget(), "ppc32le", "PowerPC 32 LE", "PPC");
36 
37  RegisterTarget<Triple::ppc64, /*HasJIT=*/true> Y(getThePPC64Target(), "ppc64",
38  "PowerPC 64", "PPC");
39 
40  RegisterTarget<Triple::ppc64le, /*HasJIT=*/true> Z(
41  getThePPC64LETarget(), "ppc64le", "PowerPC 64 LE", "PPC");
42 }
ix16addr
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load ix16addr
Definition: README_P9.txt:498
i
i
Definition: README.txt:29
use
Move duplicate certain instructions close to their use
Definition: Localizer.cpp:31
byte
SSE Variable shift can be custom lowered to something like which uses a small table unaligned load shuffle instead of going through memory byte
Definition: README-SSE.txt:11
SDAG
QP Compare Ordered outs ins xscmpudp No SDAG
Definition: README_P9.txt:301
set
We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 atomic and others It is also currently not done for read modify write instructions It is also current not done if the OF or CF flags are needed The shift operators have the complication that when the shift count is EFLAGS is not set
Definition: README.txt:1277
behavior
should just be implemented with a CLZ instruction Since there are other e that share this behavior
Definition: README.txt:709
is
should just be implemented with a CLZ instruction Since there are other e that share this it would be best to implement this in a target independent as zero is the default value for the binary encoder e add r0 add r5 Register operands should be distinct That is
Definition: README.txt:725
as
compiles conv shl5 shl ret i32 or10 it would be better as
Definition: README.txt:615
Attrs
Function Attrs
Definition: README_ALTIVEC.txt:215
Signed
@ Signed
Definition: NVPTXISelLowering.cpp:4636
readnone
We generate the following IR with i32 b nounwind readnone
Definition: README.txt:1443
constant
we should consider alternate ways to model stack dependencies Lots of things could be done in WebAssemblyTargetTransformInfo cpp there are numerous optimization related hooks that can be overridden in WebAssemblyTargetLowering Instead of the OptimizeReturned which should consider preserving the returned attribute through to MachineInstrs and extending the MemIntrinsicResults pass to do this optimization on calls too That would also let the WebAssemblyPeephole pass clean up dead defs for such as it does for stores Consider implementing and or getMachineCombinerPatterns Find a clean way to fix the problem which leads to the Shrink Wrapping pass being run after the WebAssembly PEI pass When setting multiple variables to the same constant
Definition: README.txt:91
llvm::object::Equal
@ Equal
Definition: COFFModuleDefinition.cpp:38
llvm
This file implements support for optimizing divisions by a constant.
Definition: AllocatorList.h:23
f
vector float f(vector float a, vector float b)
Definition: README_ALTIVEC.txt:200
PPCfsqrtrto
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def def def def PPCfsqrtrto
Definition: README_P9.txt:208
it
into xmm2 addss xmm2 xmm1 xmm3 addss xmm3 movaps xmm0 unpcklps xmm0 ret seems silly when it could just be one addps Expand libm rounding functions main should enable SSE DAZ mode and other fast SSE modes Think about doing i64 math in SSE regs on x86 This testcase should have no SSE instructions in it
Definition: README-SSE.txt:81
Insert
Vector Rotate Left Mask Mask Insert
Definition: README_P9.txt:112
PowerISA_V3
It looks like we only need to define PPCfmarto for these because according to PowerISA_V3
Definition: README_P9.txt:253
see
The object format emitted by the WebAssembly backed is documented see the home and packaging for producing WebAssembly applications that can run in browsers and other environments wasi sdk provides a more minimal C C SDK based on llvm and a libc based on for producing WebAssemmbly applictions that use the WASI ABI Rust provides WebAssembly support integrated into Cargo There are two main which provides a relatively minimal environment that has an emphasis on being native wasm32 unknown which uses Emscripten internally and provides standard C C filesystem GL and SDL bindings For more see
Definition: README.txt:41
b
cond_next The vcmpeqfp vcmpeqfp instructions currently cannot be merged when the vcmpeqfp result is used by a branch This can be improved The code generated for this is truly float b
Definition: README_ALTIVEC.txt:108
operations
SI optimize exec mask operations
Definition: SIOptimizeExecMasking.cpp:48
Note
We currently emits eax Perhaps this is what we really should generate is Is imull three or four cycles Note
Definition: README.txt:239
addis
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha addis
Definition: README_ALTIVEC.txt:232
int_ppc_altivec_vmul10euq
Vector Multiply by Extended Write Unsigned int_ppc_altivec_vmul10euq
Definition: README_P9.txt:140
vsrc
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx ins stxsiwx PPCstfiwx outs vsrc
Definition: README_P9.txt:545
store
Common register allocation spilling lr str ldr sxth r3 ldr mla r4 can lr mov lr str ldr sxth r3 mla r4 and then merge mul and lr str ldr sxth r3 mla r4 It also increase the likelihood the store may become dead bb27 Successors according to LLVM ID Predecessors according to mbb< bb27, 0x8b0a7c0 > Note ADDri is not a two address instruction its result reg1037 is an operand of the PHI node in bb76 and its operand reg1039 is the result of the PHI node We should treat it as a two address code and make sure the ADDri is scheduled after any node that reads reg1039 Use info(i.e. register scavenger) to assign it a free register to allow reuse the collector could move the objects and invalidate the derived pointer This is bad enough in the first but safe points can crop up unpredictably **array_addr i32 n y store obj obj **nth_el If the i64 division is lowered to a then a safe point array and nth_el no longer point into the correct object The fix for this is to copy address calculations so that dependent pointers are never live across safe point boundaries But the loads cannot be copied like this if there was an intervening store
Definition: README.txt:133
llvm::Target
Target - Wrapper for Target specific information.
Definition: TargetRegistry.h:137
NoEncode<"$vTi">
fma NoEncode<"$vTi">
Definition: README_P9.txt:227
v2i64
QP Compare Ordered outs ins xscmpudp No builtin are required Or llvm fcmp order unorder compare DP QP Compare builtin are required DP xscmp *dp write to VSX register Use int_ppc_vsx_xscmpeqdp int_ppc_vsx_xscmpgedp int_ppc_vsx_xscmpgtdp int_ppc_vsx_xscmpnedp v2i64
Definition: README_P9.txt:323
llvm::tgtok::Code
@ Code
Definition: TGLexer.h:50
llvm::RISCVFenceField::W
@ W
Definition: RISCVBaseInfo.h:199
llvm::tgtok::paste
@ paste
Definition: TGLexer.h:45
codegen
Since we know that Vector is byte aligned and we know the element offset of we should change the load into a lve *x instead of doing a load store lve *x sequence Implement passing vectors by value into calls and receiving them as arguments GCC apparently tries to codegen
Definition: README_ALTIVEC.txt:46
llvm::dwarf::Form
Form
Definition: Dwarf.h:131
v16i8
TODO use VCMP VCMPo form(support intrinsic) - Vector Extract Unsigned v16i8
Definition: README_P9.txt:108
llvm::SPII::Store
@ Store
Definition: SparcInstrInfo.h:33
pool
Implement PPCInstrInfo::isLoadFromStackSlot isStoreToStackSlot for vector to generate better spill code The first should be a single lvx from the constant pool
Definition: README_ALTIVEC.txt:8
R600_InstFlag::FC
@ FC
Definition: R600Defines.h:32
llvm::support::aligned
@ aligned
Definition: Endian.h:30
stores
hexagon widen stores
Definition: HexagonStoreWidening.cpp:118
v8i16
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx ins stxsiwx PPCstfiwx outs ins lxvd2x set v8i16
Definition: README_P9.txt:548
instructions
Since we know that Vector is byte aligned and we know the element offset of we should change the load into a lve *x instead of doing a load store lve *x sequence Implement passing vectors by value into calls and receiving them as arguments GCC apparently tries to then a load and vperm of Variable We need a way to teach tblgen that some operands of an intrinsic are required to be constants The verifier should enforce this constraint We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a byte aligned stack followed by a load vperm We should probably just store it to a scalar stack then use lvsl vperm to load it If the value is already in memory this is a big win extract_vector_elt of an arbitrary constant vector can be done with the following instructions
Definition: README_ALTIVEC.txt:66
to
Should compile to
Definition: README.txt:449
Shift
bool Shift
Definition: README.txt:468
Instructions
Code Generation Notes for reduce the size of the ISel and reduce repetition in the implementation In a small number of this can cause even when no optimisation has taken place Instructions
Definition: MSA.txt:11
llvm::sys::path::end
const_iterator end(StringRef path)
Get end iterator over path.
Definition: Path.cpp:233
i8
Clang compiles this i8
Definition: README.txt:504
llvm::Triple::ppc
@ ppc
Definition: Triple.h:67
slot
Since we know that Vector is byte aligned and we know the element offset of we should change the load into a lve *x instead of doing a load store lve *x sequence Implement passing vectors by value into calls and receiving them as arguments GCC apparently tries to then a load and vperm of Variable We need a way to teach tblgen that some operands of an intrinsic are required to be constants The verifier should enforce this constraint We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a byte aligned stack slot
Definition: README_ALTIVEC.txt:57
T
#define T
Definition: Mips16ISelLowering.cpp:341
i1
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno i1
Definition: README_P9.txt:147
llvm::HexagonII::Absolute
@ Absolute
Definition: HexagonBaseInfo.h:32
g
should just be implemented with a CLZ instruction Since there are other e g
Definition: README.txt:709
that
we should consider alternate ways to model stack dependencies Lots of things could be done in WebAssemblyTargetTransformInfo cpp there are numerous optimization related hooks that can be overridden in WebAssemblyTargetLowering Instead of the OptimizeReturned which should consider preserving the returned attribute through to MachineInstrs and extending the MemIntrinsicResults pass to do this optimization on calls too That would also let the WebAssemblyPeephole pass clean up dead defs for such as it does for stores Consider implementing and or getMachineCombinerPatterns Find a clean way to fix the problem which leads to the Shrink Wrapping pass being run after the WebAssembly PEI pass When setting multiple variables to the same we currently get code like const It could be done with a smaller encoding like local tee $pop5 local $pop6 WebAssembly registers are implicitly initialized to zero Explicit zeroing is therefore often redundant and could be optimized away Small indices may use smaller encodings than large indices WebAssemblyRegColoring and or WebAssemblyRegRenumbering should sort registers according to their usage frequency to maximize the usage of smaller encodings Many cases of irreducible control flow could be transformed more optimally than via the transform in WebAssemblyFixIrreducibleControlFlow cpp It may also be worthwhile to do transforms before register particularly when duplicating to allow register coloring to be aware of the duplication WebAssemblyRegStackify could use AliasAnalysis to reorder loads and stores more aggressively WebAssemblyRegStackify is currently a greedy algorithm This means that
Definition: README.txt:130
intrinsic
QP Compare Ordered outs ins xscmpudp No intrinsic
Definition: README_P9.txt:303
vpmsumb
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha LCPI0_1 toc ha LCPI0_0 toc l LCPI0_1 toc l vpmsumb
Definition: README_ALTIVEC.txt:243
and
We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 and
Definition: README.txt:1271
IIC_LdStSTFD
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx ins stxsiwx IIC_LdStSTFD
Definition: README_P9.txt:538
llvm::Data
@ Data
Definition: SIMachineScheduler.h:55
memrr
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins memrr
Definition: README_P9.txt:511
like
Should compile to something like
Definition: README.txt:19
a
=0.0 ? 0.0 :(a > 0.0 ? 1.0 :-1.0) a
Definition: README.txt:489
Quadword
Vector Multiply by Write Unsigned Quadword
Definition: README_P9.txt:131
a
Function align align store< 16 x i8 >< i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16 >< 16 x i8 > * a
Definition: README_ALTIVEC.txt:217
different
Code Generation Notes for reduce the size of the ISel and reduce repetition in the implementation In a small number of this can cause different(semantically equivalent) instructions to be used in place of the requested instruction
way
should just be implemented with a CLZ instruction Since there are other e that share this it would be best to implement this in a target independent way
Definition: README.txt:720
result
It looks like we only need to define PPCfmarto for these because according to these instructions perform RTO on fma s result
Definition: README_P9.txt:256
vsfrc
QP Compare Ordered outs ins vsfrc
Definition: README_P9.txt:300
llvm::RegState::Define
@ Define
Register definition.
Definition: MachineInstrBuilder.h:44
llvm::PPCISD::STXVD2X
@ STXVD2X
CHAIN = STXVD2X CHAIN, VSRC, Ptr - Occurs only for little endian.
Definition: PPCISelLowering.h:565
llvm::getThePPC64LETarget
Target & getThePPC64LETarget()
Definition: PowerPCTargetInfo.cpp:25
not
Altivec not
Definition: README_ALTIVEC.txt:28
i64
Clang compiles this i64
Definition: README.txt:504
mtriple
Function< 16 x i8 > Produces the following code with mtriple
Definition: README_ALTIVEC.txt:230
select
into xmm2 addss xmm2 xmm1 xmm3 addss xmm3 movaps xmm0 unpcklps xmm0 ret seems silly when it could just be one addps Expand libm rounding functions main should enable SSE DAZ mode and other fast SSE modes Think about doing i64 math in SSE regs on x86 This testcase should have no SSE instructions in and only one load from a constant double ret double C the select is being which prevents the dag combiner from turning select(load CPI1)
llvm::ModRefInfo::Ref
@ Ref
The access may reference the value stored in memory.
align
Function align align store< 16 x i8 >< i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16 >< 16 x i8 > align store< 16 x i8 >< i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112 >< 16 x i8 > align
Definition: README_ALTIVEC.txt:219
llvm::PatternMatch::match
bool match(Val *V, const Pattern &P)
Definition: PatternMatch.h:49
llvm::msgpack::Type::Map
@ Map
shl
We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 shl
Definition: README.txt:1271
IIC_VecFP
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp IIC_VecFP
Definition: README_P9.txt:331
llvm::support::little
@ little
Definition: Endian.h:27
C
(vector float) vec_cmpeq(*A, *B) C
Definition: README_ALTIVEC.txt:86
f32
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set f32
Definition: README_P9.txt:522
Y
static GCMetadataPrinterRegistry::Add< OcamlGCMetadataPrinter > Y("ocaml", "ocaml 3.10-compatible collector")
test
Definition: README.txt:37
aa
return aa
Definition: README_ALTIVEC.txt:197
t
bitcast float %x to i32 %s=and i32 %t, 2147483647 %d=bitcast i32 %s to float ret float %d } declare float @fabsf(float %n) define float @bar(float %x) nounwind { %d=call float @fabsf(float %x) ret float %d } This IR(from PR6194):target datalayout="e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple="x86_64-apple-darwin10.0.0" %0=type { double, double } %struct.float3=type { float, float, float } define void @test(%0, %struct.float3 *nocapture %res) nounwind noinline ssp { entry:%tmp18=extractvalue %0 %0, 0 t
Definition: README-SSE.txt:788
l
This requires reassociating to forms of expressions that are already something that reassoc doesn t think about yet These two functions should generate the same code on big endian int * l
Definition: README.txt:100
B
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
Integer
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision Integer
Definition: README_P9.txt:366
llvm::Triple::ppc64
@ ppc64
Definition: Triple.h:69
lxvw4x
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha LCPI0_1 toc ha LCPI0_0 toc l LCPI0_1 toc l lxvw4x
Definition: README_ALTIVEC.txt:235
in
The object format emitted by the WebAssembly backed is documented in
Definition: README.txt:11
llvm::ARMISD::VCMP
@ VCMP
Definition: ARMISelLowering.h:146
be
Common register allocation spilling lr str ldr sxth r3 ldr mla r4 can be
Definition: README.txt:14
vslv
Vector Shift Left don t map to llvm shl and because they have different e g vslv
Definition: README_P9.txt:128
PPCfmulrto
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def def PPCfmulrto
Definition: README_P9.txt:206
vmul10uq
Vector Multiply by Write Unsigned vmul10uq
Definition: README_P9.txt:134
QWord
Decimal Convert From to National Zoned Signed QWord
Definition: README_P9.txt:146
IR
Statically lint checks LLVM IR
Definition: Lint.cpp:744
input
The initial backend is deliberately restricted to z10 We should add support for later architectures at some point If an asm ties an i32 r result to an i64 input
Definition: README.txt:10
c
the resulting code requires compare and branches when and if the revised code is with conditional branches instead of More there is a byte word extend before each where there should be only and the condition codes are not remembered when the same two values are compared twice More LSR enhancements i8 and i32 load store addressing modes are identical int int c
Definition: README.txt:418
only
dot regions only
Definition: RegionPrinter.cpp:205
SDTFPTernaryOp
SDTFPTernaryOp
Definition: README_P9.txt:250
X
static GCMetadataPrinterRegistry::Add< ErlangGCPrinter > X("erlang", "erlang-compatible garbage collector")
PPCfdivrto
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def PPCfdivrto
Definition: README_P9.txt:205
too
The initial backend is deliberately restricted to z10 We should add support for later architectures at some point If an asm ties an i32 r result to an i64 the input will be treated as an leaving the upper bits uninitialised For i64 store i32 i32 *dst ret void from CodeGen SystemZ asm ll will use LHI rather than LGHI to load This seems to be a general target independent problem The tuning of the choice between LOAD XC and CLC for constant length block operations We could extend them to variable length operations too
Definition: README.txt:40
will
Common register allocation spilling lr str ldr sxth r3 ldr mla r4 can lr mov lr str ldr sxth r3 mla r4 and then merge mul and lr str ldr sxth r3 mla r4 It also increase the likelihood the store may become dead bb27 Successors according to LLVM ID Predecessors according to mbb< bb27, 0x8b0a7c0 > Note ADDri is not a two address instruction its result reg1037 is an operand of the PHI node in bb76 and its operand reg1039 is the result of the PHI node We should treat it as a two address code and make sure the ADDri is scheduled after any node that reads reg1039 Use info(i.e. register scavenger) to assign it a free register to allow reuse the collector could move the objects and invalidate the derived pointer This is bad enough in the first but safe points can crop up unpredictably **array_addr i32 n y store obj obj **nth_el If the i64 division is lowered to a then a safe point will(must) appear for the call site. If a collection occurs
f128
fma f128
Definition: README_P9.txt:226
into
Clang compiles this into
Definition: README.txt:504
i32
Common register allocation spilling lr str ldr sxth r3 ldr mla r4 can lr mov lr str ldr sxth r3 mla r4 and then merge mul and lr str ldr sxth r3 mla r4 It also increase the likelihood the store may become dead bb27 Successors according to LLVM ID Predecessors according to mbb< bb27, 0x8b0a7c0 > Note ADDri is not a two address instruction its result reg1037 is an operand of the PHI node in bb76 and its operand reg1039 is the result of the PHI node We should treat it as a two address code and make sure the ADDri is scheduled after any node that reads reg1039 Use info(i.e. register scavenger) to assign it a free register to allow reuse the collector could move the objects and invalidate the derived pointer This is bad enough in the first but safe points can crop up unpredictably **array_addr i32
Definition: README.txt:122
bar
Implement PPCInstrInfo::isLoadFromStackSlot isStoreToStackSlot for vector to generate better spill code The first should be a single lvx from the constant the second should be a xor bar(x)
int_ppc_altivec_vslv
Vector Shift Left don t map to llvm shl and because they have different e g int_ppc_altivec_vslv
Definition: README_P9.txt:128
llvm::Triple::ppc64le
@ ppc64le
Definition: Triple.h:70
$src
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx $src
Definition: README_P9.txt:512
val
The initial backend is deliberately restricted to z10 We should add support for later architectures at some point If an asm ties an i32 r result to an i64 the input will be treated as an leaving the upper bits uninitialised For i64 store i32 val
Definition: README.txt:15
needed
We should do a little better with eliminating dead stores The stores to the stack are dead since a and b are not needed
Definition: README_ALTIVEC.txt:212
type
AMD64 Optimization Manual has some nice information about optimizing integer multiplication by a constant How much of it applies to Intel s X86 implementation There are definite trade offs to xmm0 cvttss2siq rdx jb L3 subss xmm0 rax cvttss2siq rdx xorq rdx rax ret instead of xmm1 cvttss2siq rcx movaps xmm2 subss xmm2 cvttss2siq rax rdx xorq rax ucomiss xmm0 cmovb rax ret Seems like the jb branch has high likelihood of being taken It would have saved a few instructions It s not possible to reference and DH registers in an instruction requiring REX prefix divb and mulb both produce results in AH If isel emits a CopyFromReg which gets turned into a movb and that can be allocated a r8b r15b To get around isel emits a CopyFromReg from AX and then right shift it down by and truncate it It s not pretty but it works We need some register allocation magic to make the hack go which would often require a callee saved register Callees usually need to keep this value live for most of their body so it doesn t add a significant burden on them We currently implement this in however this is suboptimal because it means that it would be quite awkward to implement the optimization for callers A better implementation would be to relax the LLVM IR rules for sret arguments to allow a function with an sret argument to have a non void return type
Definition: README-X86-64.txt:70
be
entry mr r6 r2 blr CodeGen PowerPC vec_constants ll has an and operation that should be codegen d to andc The issue is that the all ones build vector is SelectNodeTo d a VSPLTISB instruction node before the and xor is selected which prevents the vnot pattern from matching An alternative to the store store load approach for illegal insert element lowering would be
Definition: README_ALTIVEC.txt:181
index
splat index
Definition: README_ALTIVEC.txt:181
following
This is equivalent to the following
Definition: README.txt:671
D
static GCRegistry::Add< StatepointGC > D("statepoint-example", "an example strategy for statepoint")
LLVM_EXTERNAL_VISIBILITY
#define LLVM_EXTERNAL_VISIBILITY
Definition: Compiler.h:132
llvm::LegalityPredicates::all
Predicate all(Predicate P0, Predicate P1)
True iff P0 and P1 are true.
Definition: LegalizerInfo.h:226
llvm::numbers::e
constexpr double e
Definition: MathExtras.h:57
SDTFPBinOp
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo SDTFPBinOp
Definition: README_P9.txt:205
semantics
Vector Shift Left don t map to llvm shl and because they have different semantics
Definition: README_P9.txt:119
I
#define I(x, y, z)
Definition: MD5.cpp:59
PowerPCTargetInfo.h
AltVSXFMARel
fma AltVSXFMARel
Definition: README_P9.txt:228
PPCfsubrto
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def def def PPCfsubrto
Definition: README_P9.txt:207
instruction
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def def def def DAG patterns of each instruction(PPCInstrVSX.td)
Definition: README_P9.txt:211
load
LLVM currently emits rax rax movq rax rax ret It could narrow the loads and stores to emit rax rax movq rax rax ret The trouble is that there is a TokenFactor between the store and the load
Definition: README.txt:1531
example
llvm lib Support Unix the directory structure underneath this directory could look like only those directories actually needing to be created should be created further subdirectories could be created to reflect versions of the various standards For example
Definition: README.txt:15
register
should just be implemented with a CLZ instruction Since there are other e that share this it would be best to implement this in a target independent as zero is the default value for the binary encoder e add r0 add r5 Register operands should be distinct That when the encoding does not require two syntactical operands to refer to the same register
Definition: README.txt:726
SDTFPUnaryOp
Decimal Convert From to National Zoned Signed int_ppc_altivec_bcdcfno int_ppc_altivec_bcdcfzo int_ppc_altivec_bcdctno int_ppc_altivec_bcdctzo int_ppc_altivec_bcdcfsqo int_ppc_altivec_bcdctsqo int_ppc_altivec_bcdcpsgno int_ppc_altivec_bcdsetsgno int_ppc_altivec_bcdso int_ppc_altivec_bcduso int_ppc_altivec_bcdsro i e VA byte[7] Decimal(Unsigned) Truncate Define DAG Node in PPCInstrInfo def def def def SDTFPUnaryOp
Definition: README_P9.txt:209
llvm::EngineKind::Either
const static Kind Either
Definition: ExecutionEngine.h:527
v4i32
Vector Rotate Left Mask Mask v4i32
Definition: README_P9.txt:112
llvm::irsymtab::storage::Word
support::ulittle32_t Word
Definition: IRSymtab.h:52
Compare
QP Compare Ordered outs ins xscmpudp No builtin are required Or llvm fcmp order unorder compare DP QP Compare builtin are required DP Compare
Definition: README_P9.txt:309
llvm::ReplayInlineScope::Function
@ Function
Indexed
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector Indexed
Definition: README_P9.txt:505
QP
So we should use XX3Form_Rcr to implement instrinsic Convert DP QP
Definition: README_P9.txt:329
this
Analysis the ScalarEvolution expression for r is this
Definition: README.txt:8
llvm::Triple::ppcle
@ ppcle
Definition: Triple.h:68
outs
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx outs
This returns a reference to a raw_fd_ostream for standard output.
Definition: README_P9.txt:537
instruction
Since we know that Vector is byte aligned and we know the element offset of we should change the load into a lve *x instruction
Definition: README_ALTIVEC.txt:37
Unsigned
@ Unsigned
Definition: NVPTXISelLowering.cpp:4637
Node
Definition: ItaniumDemangle.h:235
xor
We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 xor
Definition: README.txt:1271
for
this could be done in SelectionDAGISel along with other special for
Definition: README.txt:104
llvm::SPII::Load
@ Load
Definition: SparcInstrInfo.h:32
v1i128
Vector Multiply by Write Unsigned v1i128
Definition: README_P9.txt:134
not
Should compile r2 movcc movcs str strb mov lr r1 movcs movcc mov lr not
Definition: README.txt:465
instructions
It looks like we only need to define PPCfmarto for these instructions
Definition: README_P9.txt:250
llvm::PPCISD::LXVD2X
@ LXVD2X
VSRC, CHAIN = LXVD2X_LE CHAIN, Ptr - Occurs only for little endian.
Definition: PPCISelLowering.h:541
xoaddr
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load xoaddr
Definition: README_P9.txt:506
vmul10euq
Vector Multiply by Extended Write Unsigned vmul10euq
Definition: README_P9.txt:140
llvm::getThePPC64Target
Target & getThePPC64Target()
Definition: PowerPCTargetInfo.cpp:21
j
return j(j<< 16)
foo
void foo(void)
Definition: README_ALTIVEC.txt:17
std
Definition: BitVector.h:838
H
#define H(x, y, z)
Definition: MD5.cpp:58
bit
compiles ldr LCPI1_0 ldr ldr mov lsr tst moveq r1 ldr LCPI1_1 and r0 bx lr It would be better to do something like to fold the shift into the conditional ldr LCPI1_0 ldr ldr tst movne lsr ldr LCPI1_1 and r0 bx lr it saves an instruction and a register It might be profitable to cse MOVi16 if there are lots of bit immediates with the same bottom half Robert Muth started working on an alternate jump table implementation that does not put the tables in line in the text This is more like the llvm default jump table implementation This might be useful sometime Several revisions of patches are on the mailing beginning while CMP sets them like a subtract Therefore to be able to use CMN for comparisons other than the Z bit
Definition: README.txt:584
IIC_LdStLFD
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx IIC_LdStLFD
Definition: README_P9.txt:512
code
*Add support for compiling functions in both ARM and Thumb then taking the smallest *Add support for compiling individual basic blocks in thumb when in a larger ARM function This can be used for presumed cold code
Definition: README-Thumb.txt:9
at
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx ins stxsiwx PPCstfiwx outs ins lxvd2x set int_ppc_vsx_lxvh8x int_ppc_vsx_lxvb16x ins stxvd2x store int_ppc_vsx_lxvl int_ppc_vsx_lxvll int_ppc_vsx_lxvwsx st[dw] at
Definition: README_P9.txt:578
rotate
The same transformation can work with an even modulo with the addition of a rotate
Definition: README.txt:680
or
compiles or
Definition: README.txt:606
x
vcmpeq to generate a select mask lvsl slot x
Definition: README_ALTIVEC.txt:182
addi
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha LCPI0_1 toc ha addi
Definition: README_ALTIVEC.txt:233
examples
vperm to rotate result into correct slot vsel result together Should codegen branches on vec_any vec_all to avoid mfcr Two examples
Definition: README_ALTIVEC.txt:190
better
In x86 we generate this spiffy xmm0 xmm0 ret in x86 we generate this which could be better
Definition: README-SSE.txt:537
llvm::RegisterTarget
RegisterTarget - Helper template for registering a target, for use in the target's initialization fun...
Definition: TargetRegistry.h:1057
$XA
QP Compare Ordered outs ins xscmpudp $XA
Definition: README_P9.txt:301
LLVMInitializePowerPCTargetInfo
LLVM_EXTERNAL_VISIBILITY void LLVMInitializePowerPCTargetInfo()
Definition: PowerPCTargetInfo.cpp:30
blr
cond_true lis lo16() lo16() lo16() f1 fsel f3 blr
Definition: README.txt:108
$XT
QP Compare Ordered outs ins xscmpudp No builtin are required Or llvm fcmp order unorder compare DP QP Compare builtin are required DP xscmp *dp write to VSX register Use int_ppc_vsx_xscmpeqdp int_ppc_vsx_xscmpgedp int_ppc_vsx_xscmpgtdp int_ppc_vsx_xscmpnedp $XT
Definition: README_P9.txt:322
DP
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to DP
Definition: README_P9.txt:520
llvm::MCID::Add
@ Add
Definition: MCInstrDesc.h:183
llvm::tgtok::Class
@ Class
Definition: TGLexer.h:50
llvm::getThePPC32LETarget
Target & getThePPC32LETarget()
Definition: PowerPCTargetInfo.cpp:17
Vector
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store Vector
Definition: README_P9.txt:497
This
the resulting code requires compare and branches when and if the revised code is with conditional branches instead of More there is a byte word extend before each where there should be only and the condition codes are not remembered when the same two values are compared twice More LSR enhancements i8 and i32 load store addressing modes are identical This
Definition: README.txt:418
used
This might compile to this xmm1 xorps xmm0 movss xmm0 ret Now consider if the code caused xmm1 to get spilled This might produce this xmm1 movaps xmm0 movaps xmm1 movss xmm0 ret since the reload is only used by these we could fold it into the producing something like xmm1 movaps xmm0 ret saving two instructions The basic idea is that a reload from a spill if only one byte chunk is used
Definition: README-SSE.txt:270
N
#define N
support
Reimplement select in terms of SEL *We would really like to support but we need to prove that the add doesn t need to overflow between the two bit chunks *Implement pre post increment support(e.g. PR935) *Implement smarter const ant generation for binops with large immediates. A few ARMv6T2 ops should be pattern matched
Definition: README.txt:10
generate
We currently generate
Definition: README.txt:597
LLVM
MIPS Relocation Principles In LLVM
Definition: Relocation.txt:3
stack
S is passed via registers r2 But gcc stores them to the stack
Definition: README.txt:189
$XB
QP Compare Ordered outs ins xscmpudp $XB
Definition: README_P9.txt:301
lowering
amdgpu printf runtime AMDGPU Printf lowering
Definition: AMDGPUPrintfRuntimeBinding.cpp:87
llvm::msgpack::Type
Type
MessagePack types as defined in the standard, with the exception of Integer being divided into a sign...
Definition: MsgPackReader.h:49
llvm::HexStyle::Asm
@ Asm
0ffh
Definition: MCInstPrinter.h:34
RegConstraint<"$vTi = $vT">
fma RegConstraint<"$vTi = $vT">
Definition: README_P9.txt:227
stxvw4x
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha LCPI0_1 toc ha LCPI0_0 toc l LCPI0_1 toc l stxvw4x
Definition: README_ALTIVEC.txt:238
v2f64
QP Compare Ordered outs ins xscmpudp No builtin are required Or llvm fcmp order unorder compare DP QP Compare builtin are required DP xscmp *dp write to VSX register Use int_ppc_vsx_xscmpeqdp int_ppc_vsx_xscmpgedp int_ppc_vsx_xscmpgtdp int_ppc_vsx_xscmpnedp v2f64
Definition: README_P9.txt:323
iaddrX4
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load iaddrX4
Definition: README_P9.txt:516
nounwind
this lets us change the cmpl into a which is and eliminate the shift We compile this i32 i32 i8 zeroext d nounwind
Definition: README.txt:973
From
BlockVerifier::State From
Definition: BlockVerifier.cpp:55
lshr
Vector Shift Left don t map to llvm shl and lshr
Definition: README_P9.txt:118
spill
the custom lowered code happens to be but we shouldn t have to custom lower anything This is probably related to< 2 x i64 > ops being so bad LLVM currently generates stack realignment when it is not necessary needed The problem is that we need to know about stack alignment too before RA runs At that point we don t whether there will be vector spill
Definition: README-SSE.txt:489
ori
Function< 16 x i8 > Produces the following code with LCPI0_0 toc ha LCPI0_1 toc ha LCPI0_0 toc l LCPI0_1 toc l ori
Definition: README_ALTIVEC.txt:239
$dst
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs ins lxsspx set load store outs ins lxsiwzx set PPClfiwzx ins stxsiwx $dst
Definition: README_P9.txt:538
f64
QP Compare Ordered outs ins xscmpudp No builtin are required Or llvm fcmp order unorder compare DP QP Compare builtin are required DP xscmp *dp write to VSX register Use int_ppc_vsx_xscmpeqdp f64
Definition: README_P9.txt:314
d
the resulting code requires compare and branches when and if the revised code is with conditional branches instead of More there is a byte word extend before each where there should be only and the condition codes are not remembered when the same two values are compared twice More LSR enhancements i8 and i32 load store addressing modes are identical int int int d
Definition: README.txt:418
registers
Implement PPCInstrInfo::isLoadFromStackSlot isStoreToStackSlot for vector registers
Definition: README_ALTIVEC.txt:4
branches
arc branch ARC finalize branches
Definition: ARCBranchFinalize.cpp:66
vssrc
So we should use XX3Form_Rcr to implement instrinsic Convert DP outs ins xscvdpsp No builtin are required Round &Convert QP DP(dword[1] is set to zero) No builtin are required Round to Quad Precision because you need to assign rounding mode in instruction Provide builtin(set f128:$vT,(int_ppc_vsx_xsrqpi f128:$vB))(set f128 yields< n x< ty > >< result > yields< ty >< result > No builtin are required Load Store load store see def memrix16 in PPCInstrInfo td Load Store Vector load store outs ins lxsdx set load store with conversion from to outs vssrc
Definition: README_P9.txt:520
copy
we should consider alternate ways to model stack dependencies Lots of things could be done in WebAssemblyTargetTransformInfo cpp there are numerous optimization related hooks that can be overridden in WebAssemblyTargetLowering Instead of the OptimizeReturned which should consider preserving the returned attribute through to MachineInstrs and extending the MemIntrinsicResults pass to do this optimization on calls too That would also let the WebAssemblyPeephole pass clean up dead defs for such as it does for stores Consider implementing and or getMachineCombinerPatterns Find a clean way to fix the problem which leads to the Shrink Wrapping pass being run after the WebAssembly PEI pass When setting multiple variables to the same we currently get code like const It could be done with a smaller encoding like local tee $pop5 local copy
Definition: README.txt:101
TargetRegistry.h
entry
print Instructions which execute on loop entry
Definition: MustExecute.cpp:339
stvx
Implement PPCInstrInfo::isLoadFromStackSlot isStoreToStackSlot for vector to generate better spill code The first should be a single lvx from the constant the second should be a xor stvx
Definition: README_ALTIVEC.txt:12
of
Add support for conditional and other related patterns Instead of
Definition: README.txt:134
llvm::irsymtab::build
Error build(ArrayRef< Module * > Mods, SmallVector< char, 0 > &Symtab, StringTableBuilder &StrtabBuilder, BumpPtrAllocator &Alloc)
Fills in Symtab and StrtabBuilder with a valid symbol and string table for Mods.
Definition: IRSymtab.cpp:356
llvm::getThePPC32Target
Target & getThePPC32Target()
Definition: PowerPCTargetInfo.cpp:13