README.txt: random ideas for the X86 backend (LLVM 17.0.0git)

//===---------------------------------------------------------------------===//
We currently compile this:

    %tmp1 = urem i32 %X, 255
    ret i32 %tmp1

to:

    ...
    movl $2155905153, %ecx
    movl 8(%esp), %esi
    movl %esi, %eax
    mull %ecx
    ...

This could be "reassociated" into:

    movl $2155905153, %eax
    movl 8(%esp), %ecx
    mull %ecx

to avoid the copy.  In fact, the existing two-address stuff would do this
except that mul isn't a commutative 2-addr instruction.  I guess this has to
be done at isel time based on the #uses of the mul?

//===---------------------------------------------------------------------===//

Make sure the instruction which starts a loop does not cross a cacheline
boundary.  This requires knowing the exact length of each machine
instruction.  That is somewhat complicated, but doable.  Example, 256.bzip2:

In the new trace, the hot loop has an instruction which crosses a cacheline
boundary.  In addition to potential cache misses, this can't help decoding as
I imagine there has to be some kind of complicated decoder reset and
realignment to grab the bytes from the next cacheline.

532  532 0x3cfc movb     (1809(%esp, %esi), %bl   <<<--- spans 2 64 byte lines
942  942 0x3d03 movl     %dh, (1809(%esp, %esi)
937  937 0x3d0a incl     %esi
3    3   0x3d0b cmpb     %bl, %dl
27   27  0x3d0d jnz      0x000062db <main+11707>

//===---------------------------------------------------------------------===//

In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.

//===---------------------------------------------------------------------===//

This could be a single 16-bit load.

int f(char *p) {
    if ((p[0] == 1) & (p[1] == 2)) return 1;
    return 0;
}
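A quick sketch of the requested transformation at the source level
(hypothetical helper name; assumes a little-endian target, where p[0] == 1
and p[1] == 2 together form the 16-bit value 0x0201):

#include <string.h>

/* Fold the two byte tests into one 16-bit load and compare.
   Little-endian only: the low byte of v is p[0]. */
int f_narrow(const char *p) {
    unsigned short v;
    memcpy(&v, p, sizeof v);   /* should become a single 16-bit load */
    return v == 0x0201;
}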
//===---------------------------------------------------------------------===//

Improve codegen for sign-function selects like:

    a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0)
//===---------------------------------------------------------------------===//

Use "%edx = sar %eax, 31" more aggressively.  For sign-extending %eax into
%edx:%eax, cltd should cost the same as a move+shift on any modern processor,
but it's a lot shorter.  The downside is that it puts more pressure on
register allocation because it has fixed operands.
//===---------------------------------------------------------------------===//

With -mtune=pentium2/3/4/m/etc., abs compiles to the branchless sequence:

abs:
    movl 4(%esp), %eax
    cltd
    xorl %edx, %eax
    subl %edx, %eax
    ret
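The sequence above is the usual sign-mask trick.  A minimal C rendering (a
sketch, assuming 32-bit int and arithmetic right shift of negative values,
which holds on x86):

/* cltd materializes m = x >> 31 (all ones if x is negative, else zero);
   (x ^ m) - m is then |x| for every x except INT_MIN. */
int abs32(int x) {
    int m = x >> 31;      /* cltd: sign mask in %edx */
    return (x ^ m) - m;   /* xorl %edx, %eax ; subl %edx, %eax */
}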
//===---------------------------------------------------------------------===//

A call whose result is immediately returned, e.g.:

    %tmp = call i32 @t4( i32 %tmp.5 )
    ret i32 %tmp

should be compiled as a tail call (jmp t4) rather than call+ret.
//===---------------------------------------------------------------------===//

64-bit shifts (in general) expand to really bad code.  Instead of using
cmovs, we should expand to a conditional branch like GCC produces.

//===---------------------------------------------------------------------===//

Some isel ideas:

1. Dynamic programming based approach when compile time is not an issue.
2. Code duplication (addressing mode) during isel.
3. Other ideas from "Register-Sensitive Selection, Duplication, and
   Sequencing of Instructions".
4. Scheduling for reduced register pressure.  E.g. "Minimum Register
   Instruction Sequence Problem: Revisiting Optimal Code Generation for
   DAGs".

//===---------------------------------------------------------------------===//

Because the compare isn't commutative, a compare of (load p) is not matched
when the load appears on the other side.  The dag combiner should be made
smart enough to canonicalize the load into the RHS of a compare when it can
invert the result of the compare for free.

//===---------------------------------------------------------------------===//

In many cases, LLVM generates code like this:

_test:
    movl 8(%esp), %eax
    cmpl %eax, 4(%esp)
    setl %al
    movzbl %al, %eax
    ret

on some processors (which ones?), it is more efficient to do this:

_test:
    movl 8(%esp), %ebx
    xor  %eax, %eax
    cmpl %ebx, 4(%esp)
    setl %al
    ret

Doing this correctly is tricky though, as the xor clobbers the flags.

//===---------------------------------------------------------------------===//

We should generate bts/btr/etc instructions on targets where they are cheap
or when codesize is important.  e.g., for:

void setbit(int *target, int bit) {
    *target |= (1 << bit);
}
void clearbit(int *target, int bit) {
    *target &= ~(1 << bit);
}
//===---------------------------------------------------------------------===//

Instead of the following for memset char*, 1, 10:

    movl $16843009, 4(%edx)
    movl $16843009, (%edx)
    movw $257, 8(%edx)

It might be better to generate

    movl $16843009, %eax
    movl %eax, 4(%edx)
    movl %eax, (%edx)
    movw %ax, 8(%edx)

when we can spare a register.  It reduces code size.

//===---------------------------------------------------------------------===//

Large memset/memcpy operations should be lowered to libc calls, since libc is
hand tuned for medium and large mem ops (avoiding RFO for large stores, TLB
preheating, etc).

//===---------------------------------------------------------------------===//

Evaluate what the best way to codegen sdiv X, (2^C) is.  For example, for

    %Y = sdiv i32 %X, 2
    ret i32 %Y

we currently get:

_test1:
    movl 4(%esp), %eax
    movl %eax, %ecx
    shrl $31, %ecx
    addl %ecx, %eax
    sarl %eax
    ret

GCC knows several different ways to codegen it, one of which is this:

_test1:
    movl 4(%esp), %eax
    cmpl $-1, %eax
    leal 1(%eax), %ecx
    cmovle %ecx, %eax
    sarl %eax
    ret

which is probably slower, but it's interesting at least.
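For reference, the two sequences correspond to these branch-free C
formulations (a sketch; both round toward zero as C requires, and both assume
arithmetic right shift of negative ints, as on x86):

/* what we emit: bias the dividend by its sign bit */
int div2_shift(int x) {
    unsigned bias = (unsigned)x >> 31;   /* shrl $31, %ecx */
    return (x + (int)bias) >> 1;         /* addl ; sarl */
}

/* the GCC cmovle variant: conditionally pre-increment negatives */
int div2_cmov(int x) {
    int biased = x + 1;                  /* leal 1(%eax), %ecx */
    if (x <= -1) x = biased;             /* cmovle %ecx, %eax */
    return x >> 1;                       /* sarl %eax */
}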
//===---------------------------------------------------------------------===//

Optimize this into something reasonable:

_test:
    movl 4(%esp), %eax
    movaps (%eax), %xmm0
    movl 8(%esp), %eax
    movss (%eax), %xmm1
    comiss %xmm0, %xmm1
    setae %al
    movzbl %al, %ecx
    movl $1, %eax
    movl $0, %edx
    cmpl $0, %ecx
    cmove %edx, %eax
    ret

Note the setae, movzbl, cmpl and cmove can be replaced with a single cmovae.
There are a number of issues:

1. We are introducing a setcc between the result of the intrinsic call and
   the select.
2. The intrinsic is expected to produce an i32 value, so a any_extend (which
   becomes a zero_extend) is added.

We probably need some kind of target DAG combine hook to fix this.

//===---------------------------------------------------------------------===//

We generate significantly worse code for this than GCC:
...
//===---------------------------------------------------------------------===//

We currently emit:

_test:
    imull $3, 4(%esp), %eax
    ret

Perhaps what we really should generate is:

_test:
    movl 4(%esp), %eax
    leal (%eax,%eax,2), %eax
    ret

Is imull three or four cycles?

The current instruction priority is based on pattern complexity.  The former
is more complex because it folds a load, so the latter will not be emitted.
Perhaps we should use AddedComplexity to give LEA32r a higher priority.  We
should always try to match LEA first since the LEA matching code does some
estimate to determine whether the match is profitable.  However, if we care
more about code size, then imull is better.  It's two bytes shorter than
movl plus leal.  On a Pentium M, both variants have the same characteristics
with regard to throughput; however, the multiplication has a latency of four
cycles, as opposed to two cycles for the movl+lea variant.

//===---------------------------------------------------------------------===//

It appears gcc places string data with linkonce linkage in
.section __TEXT,__const_coal,coalesced instead of
.section __DATA,__const_coal,coalesced.  Take a look at darwin.h; there are
other Darwin assembler directives that we do not make use of.

//===---------------------------------------------------------------------===//

This:

define i32 @foo(i32* %a, i32 %t) {
  ...
}

is pessimized by -loop-reduce and -indvars.
//===---------------------------------------------------------------------===//

u32 to float conversion improvement:

float uint32_2_float(unsigned u) {
  float fh = (int)(u >> 16);
  float fl = (int)(u & 0xffff);
  fh *= 0x1.0p16f;
  return fh + fl;
}

    subl $0x04, %esp
    movl 0x08(%esp, 1), %eax
    movl %eax, %ecx
    shrl $0x10, %ecx
    cvtsi2ss %ecx, %xmm0
    andl $0x0000ffff, %eax
    cvtsi2ss %eax, %xmm1
    mulss ..., %xmm0
    addss %xmm1, %xmm0
    movss %xmm0, (%esp, 1)
    flds (%esp, 1)
    addl $0x04, %esp
    ret
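The decomposition is exact, which is easy to spot-check (a sketch: each
16-bit half converts to float losslessly, the scale by 0x1.0p16f is exact,
and the final add rounds once, just like a direct conversion would):

#include <assert.h>

float uint32_2_float(unsigned u) {
    float fh = (int)(u >> 16);
    float fl = (int)(u & 0xffff);
    fh *= 0x1.0p16f;
    return fh + fl;
}

int main(void) {
    unsigned tests[] = { 0u, 1u, 0xffffu, 0x10000u, 0xdeadbeefu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++)
        assert(uint32_2_float(tests[i]) == (float)tests[i]);
    return 0;
}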
//===---------------------------------------------------------------------===//

We should custom lower lrintf, and probably other libc functions.
//===---------------------------------------------------------------------===//

For code like this:

int test(int x) { return 3 - x; }

we currently produce:

_test:
    movl 4(%esp), %ecx
    movl $3, %eax
    subl %ecx, %eax
    ret

We would use one fewer register if it were codegen'd as:

_test:
    movl 4(%esp), %eax
    neg %eax
    add $3, %eax
    ret

Note that this isn't beneficial if the load can be folded into the sub.  In
this case, we want a sub:

_test:
    movl $3, %eax
    subl 4(%esp), %eax
    ret

//===---------------------------------------------------------------------===//

Leaf functions that require one byte spill slot have a prolog like this:

_foo:
    pushl %esi
    subl $4, %esp
    ...

and an epilog like this:

    addl $4, %esp
    popl %esi
    ret

It would be smaller, and potentially faster, to push a register on entry and
pop it on exit instead of adjusting %esp.
//===---------------------------------------------------------------------===//

For this code, the entry BB is:

_test:
    subl ..., %esp
    pxor %xmm0, %xmm0
    movsd (%esp), %xmm1
    ucomisd %xmm0, %xmm1
    setnp %al
    sete %cl
    testb %cl, %al
    jne LBB1_5
    ...
    cvtss2sd LCPI1_1(%rip), %xmm2
    ...
    movsd ..., %xmm3
    ucomisd %xmm0, %xmm3
    ja LBB1_3
LBB1_2:
    movapd %xmm2, %xmm0
LBB1_3:
    ...
    ret

We should sink the load into xmm3 into the LBB1_2 block.  This should be
pretty easy, and will nuke all the copies.

//===---------------------------------------------------------------------===//

bool full_add(unsigned a, unsigned b) { return a+b < a; }
bool no_overflow(unsigned a, unsigned b) { return !full_add(a, b); }

Should compile no_overflow to:

_no_overflow:
    addl %esi, %edi
    setae %al
    movzbl %al, %eax
    ret

on x86, instead of the rather stupid-looking:

_no_overflow:
    addl %esi, %edi
    setb %al
    xorb $1, %al
    movzbl %al, %eax
    ret
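The same predicates written with the GCC/Clang overflow builtins (a sketch;
the builtin's carry-out is exactly the "a+b < a" test, so the codegen
question is identical):

#include <stdbool.h>

bool full_add2(unsigned a, unsigned b) {
    unsigned sum;
    return __builtin_uadd_overflow(a, b, &sum);   /* addl ; setb */
}

bool no_overflow2(unsigned a, unsigned b) {
    unsigned sum;
    return !__builtin_uadd_overflow(a, b, &sum);  /* wants addl ; setae */
}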
//===---------------------------------------------------------------------===//

This code:

bb114.preheader:        ; preds = %cond_next94
    %tmp231232 = sext i16 %tmp62 to i32
    %tmp233 = sub i32 32, %tmp231232
    %tmp245246 = sext i16 %tmp65 to i32
    %tmp252253 = sext i16 %tmp68 to i32
    %tmp254 = sub i32 32, %tmp252253
    %tmp553554 = bitcast i16* %tmp37 to i8*
    %tmp583584 = sext i16 %tmp98 to i32
    %tmp585 = sub i32 32, %tmp583584
    %tmp614615 = sext i16 %tmp101 to i32
    %tmp621622 = sext i16 %tmp104 to i32
    %tmp623 = sub i32 32, %tmp621622
    br label %bb114

produces:

    movswl ...(%ebp), %eax
    movl $32, %ecx
    movl %ecx, ...(%ebp)
    subl %eax, ...(%ebp)
    movswl ...(%ebp), %eax
    movl %ecx, ...(%ebp)
    subl %eax, ...(%ebp)
    movswl ...(%ebp), %eax
    movl %ecx, ...(%ebp)
    subl %eax, ...(%ebp)
    ...

This appears to be bad because the RA is not folding the store to the stack
slot into the movl.  The above instructions could be:

    movl $32, ...(%ebp)
    ...

This seems like a cross between remat and spill folding.

This also has redundant subtractions of %eax from a stack slot.  However,
%ecx doesn't change, so we could simply subtract %eax from %ecx first and
then use %ecx (or vice-versa).

//===---------------------------------------------------------------------===//

This code:

    %tmp659 = icmp slt i16 %tmp654, 0
    br i1 %tmp659, label %cond_true662, label %cond_next715

produces:

    testw %cx, %cx
    movswl %cx, %ecx
    ...

The sign of the 16-bit value is already available from the testw, so the
movswl sign extension should not be needed to decide the branch.
//===---------------------------------------------------------------------===//

Code doing a 64-bit compare currently compiles to:

_test:
    subl ..., %esp
    ...
    setne %al
    movzbw %al, %ax
    ...
    setg %cl
    movzbw %cl, %cx
    cmove %ax, %cx
    testb $1, %cl
    jne LBB1_2
    ...
    ret

(also really horrible code on ppc).  This is due to the expand code for
64-bit compares.  GCC produces multiple branches, which is much nicer:

    ...
    decl %edx
    jle L7
L5:
    ...
    ret
    .p2align ...
L7:
    ...
    ja L5
L4:
    ...
    call abort
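The branchy expansion GCC favors is the usual word-wise decomposition (a
sketch, hypothetical function name):

/* a < b on 64-bit values, using 32-bit compares: the low words only
   matter when the high words are equal, hence the early branch. */
int lt64(unsigned long long a, unsigned long long b) {
    unsigned ah = (unsigned)(a >> 32), bh = (unsigned)(b >> 32);
    if (ah != bh)
        return ah < bh;
    return (unsigned)a < (unsigned)b;
}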
//===---------------------------------------------------------------------===//

Tail call optimization improvements:

int callee(int32 arg1, int32 arg2);

int caller(int32 arg1, int32 arg2) {
    return callee(arg1, arg2);
}

Here we don't need to write any variables to the top of the stack since they
don't overwrite each other.

int caller(int32 arg1, int32 arg2) {
    return callee(arg2, arg1);
}

Here we need to push the arguments because they overwrite each other: moving
arg1 onto the stack slot of the callee function would overwrite arg2 of the
caller.

Possible optimizations:
...

//===---------------------------------------------------------------------===//

main() { ... }

gcc compiles this to:

    subl ..., %esp
    xorl %eax, %eax
    jmp L2
L3:
    ...
L2:
    ...
    je L10
    ...
    jne L3
    call L_abort$stub
L10:
    ...
    call L_exit$stub

llvm:

    subl ..., %esp
    ...
    jge LBB1_4
    ...
    jne LBB1_1
    ...
LBB1_4:
    ...
    ret
//===---------------------------------------------------------------------===//

The generated code on x86 for checking for signed overflow on a multiply the
obvious way is much longer than it needs to be:

int x(int a, int b) { ... }
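For comparison, a sketch of the short form: the GCC/Clang multiply-overflow
builtin maps to a single imull plus a seto/jo on x86 (hypothetical function
name):

#include <stdbool.h>

bool smul_overflows(int a, int b) {
    int prod;
    return __builtin_smul_overflow(a, b, &prod);  /* imull ; seto */
}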
//===---------------------------------------------------------------------===//

Take the following code (from http://...):

extern unsigned char first_one[65536];

int FirstOnet(unsigned long long arg1) {
    if (arg1 >> 48)
        return first_one[arg1 >> 48];
    return 0;
}

The following code is currently generated:

FirstOnet:
    movl 8(%esp), %eax
    cmpl $65536, %eax
    movl 4(%esp), %ecx
    jb LBB1_2
    shrl $16, %eax
    movzbl first_one(%eax), %eax
    ret
LBB1_2:
    xorl %eax, %eax
    ret

We could change the "movl 8(%esp), %eax" into "movzwl 10(%esp), %eax"; this
lets us change the cmpl into a testl, which is shorter, and eliminate the
shift.
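The suggested narrowing is easy to express at the source level (a sketch;
reading the top 16 bits through an unsigned short is exactly the
movzwl + testl shape the note asks for):

extern unsigned char first_one[65536];

int FirstOnet_narrow(unsigned long long arg1) {
    unsigned short top = (unsigned short)(arg1 >> 48); /* movzwl, high half */
    if (top)                                           /* testl, no cmpl/shift */
        return first_one[top];
    return 0;
}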
//===---------------------------------------------------------------------===//

We compile this function:

define i32 @f(i32 %a, i32 %b, i32 %c, i8 zeroext %d) nounwind {
entry:
    %tmp2 = icmp eq i8 %d, 0
    br i1 %tmp2, label %bb7, label %bb

bb:        ; preds = %entry
    %tmp6 = add i32 %b, %a
    ret i32 %tmp6

bb7:        ; preds = %entry
    %tmp10 = sub i32 %a, %c
    ret i32 %tmp10
}

to:

_f:
    movl 4(%esp), %ecx
    cmpb $0, 16(%esp)
    je LBB0_2
    movl 8(%esp), %eax
    addl %ecx, %eax
    ret
LBB0_2:
    movl 12(%esp), %edx
    movl %ecx, %eax
    subl %edx, %eax
    ret

There's an obviously unnecessary movl in LBB0_2, and we could eliminate a
couple more movls by putting 4(%esp) into %eax instead of %ecx.  See
rdar://...

//===---------------------------------------------------------------------===//

This:

define fastcc void @abort_gzip() noreturn nounwind {
entry:
    %tmp.b.i = load i1* @in_exit.4870.b
    br i1 %tmp.b.i, label %bb.i, label %bb4.i
bb.i:
    tail call void @exit( i32 1 ) noreturn nounwind
    unreachable
bb4.i:
    store i1 true, i1* @in_exit.4870.b
    tail call void @exit( i32 1 ) noreturn nounwind
    unreachable
}

declare void @exit(i32) noreturn nounwind

This compiles into:

_abort_gzip:
    subl $12, %esp
    movb _in_exit.4870.b, %al
    cmpb $1, %al
    jne LBB1_1
    ...

//===---------------------------------------------------------------------===//

A function that conditionally calls a no-return function, e.g.

void test(int X) { if (X) abort(); }

is currently compiled to:

_test:
    subl $12, %esp
    cmpl $0, 16(%esp)
    jne LBB1_1
    addl $12, %esp
    ret
LBB1_1:
    call L_abort$stub

It would be better to produce:

_test:
    subl $12, %esp
    cmpl $0, 16(%esp)
    jne L_abort$stub
    addl $12, %esp
    ret

This can be applied to any no-return function call that takes no arguments
etc.  Alternatively, the stack save/restore logic could be shrink wrapped,
producing something like this:

_test:
    cmpl $0, 4(%esp)
    jne LBB1_1
    ret
LBB1_1:
    subl $12, %esp
    call L_abort$stub

Both are useful in different situations.  Finally, it could be shrink wrapped
and tail called:

_test:
    cmpl $0, 4(%esp)
    jne LBB1_1
    ret
LBB1_1:
    jmp L_abort$stub
//===---------------------------------------------------------------------===//

This code:

...

gets compiled into this on x86-64 (-m64):

    subq ..., %rsp
    movaps %xmm7, ...(%rsp)
    movaps %xmm6, ...(%rsp)
    movaps %xmm5, ...(%rsp)
    movaps %xmm4, ...(%rsp)
    movaps %xmm3, ...(%rsp)
    movaps %xmm2, ...(%rsp)
    movaps %xmm1, ...(%rsp)
    movaps %xmm0, ...(%rsp)
    movq %r9, ...(%rsp)
    movq %r8, ...(%rsp)
    movq %rcx, ...(%rsp)
    movq %rdx, ...(%rsp)
    movq %rsi, ...(%rsp)
    leaq ...(%rsp), %rax
    movq %rax, ...(%rsp)
    movq ...(%rsp), %rax
    ...
    cmpl ..., %eax
    jbe LBB1_3
    ...
    addq ...(%rsp), %rcx
    movq ...(%rsp), %rax
    movl ..., %eax
    ...
    ret
    ...
    movl ..., %ecx
    movl ..., (%rsp)
    jmp LBB1_2

gcc generates much shorter code:

LCFI0:
    subq ..., %rsp
    movq %rax, ...(%rsp)
    ...
    cmpl ..., %eax
    jb L6
    movq ...(%rsp), %rdx
    ...
    ret
    .p2align ...
L6:
    movl ..., %edx
    addq %rdx, %rax
    movl ..., %eax
    ...
    ret
and it gets compiled into this on x86-32:

    pushl %ebp
    movl %esp, %ebp
    movl 12(%ebp), %eax
    movl 8(%ebp), ...
    ...
    addl ...
    ...
    popl %ebp
    ret

gcc:

    pushl %ebp
    ...
    movl ..., %eax
    popl %ebp
    ret

//===---------------------------------------------------------------------===//

Teach tblgen not to check bitconvert source type in some cases.  This allows
us to consolidate the following patterns in X86InstrMMX.td:

def : Pat<(v2i32 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v2i32 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v4i16 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v4i16 (MMX_MOVDQ2Qrr VR128:$src))>;
def : Pat<(v8i8 (bitconvert (i64 (vector_extract (v2i64 VR128:$src),
                                                  (iPTR 0))))),
          (v8i8 (MMX_MOVDQ2Qrr VR128:$src))>;

There are other cases in various td files.
//===---------------------------------------------------------------------===//

We currently compile this:

define i32 @func1(i32 %v1, i32 %v2) nounwind {
entry:
    %t = call {i32, i1} @llvm.sadd.with.overflow.i32(i32 %v1, i32 %v2)
    %sum = extractvalue {i32, i1} %t, 0
    %obit = extractvalue {i32, i1} %t, 1
    br i1 %obit, label %overflow, label %normal
    ...
}

to:

_func1:
    movl 4(%esp), %eax
    addl 8(%esp), %eax
    jo LBB1_2
    ...

reusing the EFLAGS produced by the addl rather than issuing a separate test.
The same is not yet done for and, or, xor, neg, shl, sra, srl, shld, shrd,
atomic ops, and others.  It is also currently not done for read-modify-write
instructions.  It is also currently not done if the OF or CF flags are
needed.  The shift operators have the complication that when the shift count
is zero, EFLAGS is not set, so they can only subsume a test instruction if
the shift count is known to be non-zero.

//===---------------------------------------------------------------------===//

Take something like the following on x86-32:

unsigned a(unsigned long long x, unsigned y) { return x % y; }

We currently generate a libcall, but we really shouldn't: the expansion is
shorter and likely faster than the libcall:

a:
    movl 8(%esp), %eax
    movl 12(%esp), %ecx
    xorl %edx, %edx
    divl %ecx
    movl 4(%esp), %eax
    divl %ecx
    movl %edx, %eax
    ret

A similar code sequence works for division.
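A sketch of the arithmetic behind the two-divl expansion (valid whenever the
divisor fits in 32 bits): divide the high word first, then divide the
remainder:low pair.  Since r1 < y, the second quotient fits in 32 bits and
divl cannot fault.

unsigned rem64_32(unsigned long long x, unsigned y) {
    unsigned hi = (unsigned)(x >> 32), lo = (unsigned)x;
    unsigned r1 = hi % y;                            /* first divl */
    unsigned long long t = ((unsigned long long)r1 << 32) | lo;
    return (unsigned)(t % y);                        /* second divl: (r1:lo) % y */
}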
//===---------------------------------------------------------------------===//

Re-implement atomic builtins __sync_add_and_fetch() and __sync_sub_and_fetch
properly.

When the return value is not used (i.e. we only care about the value in
memory), x86 does not have to use add to implement these.  Instead, it can
use add, sub, inc, dec instructions with the "lock" prefix.
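Usage sketch: when the caller ignores the result, the whole read-modify-write
can be a single lock-prefixed instruction.

/* result unused: should lower to "lock incl (%rdi)" (or lock addl),
   with no load of the old or new value */
void bump(int *counter) {
    __sync_add_and_fetch(counter, 1);
}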
//===---------------------------------------------------------------------===//

For a one-bit bitfield read like this:

struct B {
    unsigned char a:1;
};

int bar(struct B *a) { return a->a; }

i.e.

define i32 @bar(%struct.B* nocapture %a) nounwind readonly optsize {
    ...
}

we produce:

bar:
    movb (%rdi), %al
    andb $1, %al
    movzbl %al, %eax
    ret

GCC produces:

bar:
    movzbl (%rdi), %eax
    andl $1, %eax
    ret

//===---------------------------------------------------------------------===//

Missed optimization: when stored in a memory object, booleans are stored as
single byte objects the value of which is always 0 (false) or 1 (true).  We
are not using this fact:

int bar(_Bool *a) { ... }

i.e.

define i32 @bar(i8* nocapture %a) nounwind readonly optsize {
    ...
}
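A sketch of what exploiting that fact buys (hypothetical example): because a
stored _Bool is exactly 0 or 1, a select on it can become straight-line
arithmetic.

#include <stdbool.h>

/* *a is 0 or 1 in memory, so "*a ? 1 : 2" is just "2 - *a":
   no compare, no branch, no cmov */
int bool_select(const bool *a) {
    return 2 - *a;
}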
//===---------------------------------------------------------------------===//

LLVM misses a load/store narrowing opportunity in this code:

%struct.bf = type { i64, i16, i16, i32 }
@bfi = external global %struct.bf*        ; <%struct.bf**> [#uses=2]

define void @t1() nounwind ssp {
entry:
    %0 = load %struct.bf** @bfi, align 8
    %1 = getelementptr %struct.bf* %0, i64 0, i32 1
    %2 = bitcast i16* %1 to i32*
    %3 = load i32* %2, align 1
    %4 = and i32 %3, -65537
    store i32 %4, i32* %2, align 1
    ...
    ret void
}

LLVM currently emits:

    movq _bfi(%rip), %rax
    andl $-65537, 8(%rax)
    movq _bfi(%rip), %rax
    andl $-131073, 8(%rax)
    ret

It could narrow the loads and stores to emit:

    movq _bfi(%rip), %rax
    andb $-2, 10(%rax)
    movq _bfi(%rip), %rax
    andb $-3, 10(%rax)
    ret

The trouble is that there is a TokenFactor between the store and the load,
making it non-trivial to determine that nothing between them prohibits the
narrowing.
//===---------------------------------------------------------------------===//

This code:

    if (x == 0) foo();
    else if (x == 1) qux();

currently compiles into:

    movl 4(%esp), %eax
    cmpl $1, %eax
    je LBB0_3
    testl %eax, %eax
    jne LBB0_4
    ...

The testl could be removed: after the cmpl $1, the carry flag already
distinguishes zero (x is below 1 unsigned exactly when x == 0), so a jb
would do.
//===---------------------------------------------------------------------===//

Improvements to the multiply -> shift/add algorithm:
http://gcc.gnu.org/ml/gcc-patches/...

//===---------------------------------------------------------------------===//

Improve code like this (occurs fairly frequently, e.g. in LLVM):

long long foo(int x) { return 1LL << x; }

http://gcc.gnu.org/ml/gcc-patches/...

Can be improved to:

    xorl %eax, %eax
    xorl %edx, %edx
    testb $32, %cl
    sete %al
    setne %dl
    sall %cl, %eax
    sall %cl, %edx

But that requires good 8-bit subreg support.

Also, this might be better.  It's an extra shift, but it's one instruction
shorter, and doesn't stress 8-bit subreg support:

    movl $1, %eax
    movl %eax, %edx
    shrl ...
    ...
    sall %cl, %eax
    sall %cl, %edx
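The two-register decomposition the asm above implements, as C (a sketch; x86
shifts mask the count to 5 bits, which is why only bit 5 of x selects which
half receives the bit):

unsigned long long shift1ll(unsigned x) {   /* assumes x in [0, 63] */
    unsigned lo = (x & 32) ? 0u : 1u << (x & 31);  /* sete %al  ; sall %cl */
    unsigned hi = (x & 32) ? 1u << (x & 31) : 0u;  /* setne %dl ; sall %cl */
    return ((unsigned long long)hi << 32) | lo;
}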
//===---------------------------------------------------------------------===//

Given the following C code:

int f(int a, int b) { return (unsigned char)a == (unsigned char)b; }

We generate the following IR with clang:

define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
    %tmp = xor i32 %b, %a
    %tmp6 = and i32 %tmp, 255
    %cmp = icmp eq i32 %tmp6, 0
    %conv5 = zext i1 %cmp to i32
    ret i32 %conv5
}

and the following x86 code:

    xorl %esi, %edi
    testb $-1, %dil
    sete %al
    movzbl %al, %eax
    ret

A cmpb instead of the xorl/testb would be one instruction shorter.

//===---------------------------------------------------------------------===//

Given the following C code:

int f(int a, int b) { return (signed char)a == (signed char)b; }

We generate the following IR with clang:

define i32 @f(i32 %a, i32 %b) nounwind readnone {
entry:
    %sext = shl i32 %a, 24
    %conv1 = ashr i32 %sext, 24
    %sext6 = shl i32 %b, 24
    %conv4 = ashr i32 %sext6, 24
    %cmp = icmp eq i32 %conv1, %conv4
    %conv5 = zext i1 %cmp to i32
    ret i32 %conv5
}

and the following x86 code:

    movsbl %dil, %eax
    movsbl %sil, %ecx
    cmpl %ecx, %eax
    sete %al
    movzbl %al, %eax
    ret

It should be possible to eliminate the sign extensions.
Definition at line 489 of file README.txt.
Definition at line 923 of file README.txt.
using mtune = pentium2/3/4/m/etc.: abs: movl 4(%esp), %eax cltd xorl %edx, %eax subl %edx, %eax ret Take the following code (from http: extern unsigned char first_one[65536] |
Definition at line 943 of file README.txt.
using tmp1 = urem i32 %X, 255 ret i32 %tmp1 } Currently it compiles to: ... movl $2155905153, %ecx movl 8(%esp), %esi movl %esi, %eax mull %ecx ... This could be "reassociated" into: movl $2155905153, %eax movl 8(%esp), %ecx mull %ecx to avoid the copy. In fact, the existing two-address stuff would do this except that mul isn't a commutative 2-addr instruction. I guess this has to be done at isel time based on the #uses to mul? Make sure the instruction which starts a loop does not cross a cacheline boundary. This requires knowning the exact length of each machine instruction. That is somewhat complicated, but doable. Example 256.bzip2: In the new trace, the hot loop has an instruction which crosses a cacheline boundary. In addition to potential cache misses, this can't help decoding as I imagine there has to be some kind of complicated decoder reset and realignment to grab the bytes from the next cacheline. 532 532 0x3cfc movb (1809(%esp, %esi), %bl <<<--- spans 2 64 byte lines 942 942 0x3d03 movl %dh, (1809(%esp, %esi) 937 937 0x3d0a incl %esi 3 3 0x3d0b cmpb %bl, %dl 27 27 0x3d0d jnz 0x000062db <main+11707> In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE. This could be a single 16-bit load. int f(char *p) { if ((p[0] == 1) & (p[1] == 2)) return 1 |
Definition at line 375 of file README.txt.
We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax addl | ( | % | esp | ) |
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax addq | ( | % | rsp | ) |
bar al al movzbl eax ret Missed when stored in a memory are stored as single byte objects the value of which is always | ( | false | ) |
Definition at line 1412 of file README.txt.
Referenced by llvm::AArch64LegalizerInfo::AArch64LegalizerInfo(), llvm::LegalizeRuleSet::alwaysLegal(), llvm::LegalizeRuleSet::fallback(), llvm::LegalizeRuleSet::libcall(), and llvm::LegalizeRuleSet::lower().
Definition at line 1390 of file README.txt.
Definition at line 1418 of file README.txt.
Definition at line 1388 of file README.txt.
Here we don t need to write any variables to the top of the stack since they don t overwrite each other int callee | ( | int32 | arg1, |
int32 | arg2 | ||
) |
int caller | ( | int32 | arg1, |
int32 | arg2 | ||
) |
Definition at line 681 of file README.txt.
Referenced by useFuncSeen().
Definition at line 111 of file README.txt.
References bit.
declare void exit | ( | i32 | ) |
Definition at line 1072 of file README.txt.
Referenced by abort_gzip(), appendFile(), llvm::RegionBase< RegionTraits< Function > >::contains(), llvm::Hexagon_MC::createHexagonMCSubtargetInfo(), llvm::LLVMContext::diagnose(), llvm::sys::Process::Exit(), llvm::Interpreter::exitCalled(), fatalOpenError(), llvm::RegionBase< RegionTraits< Function > >::getExit(), llvm::RegionBase< RegionTraits< Function > >::getExitingBlock(), llvm::RegionBase< RegionTraits< Function > >::getExitingBlocks(), llvm::RegionBase< RegionTraits< Function > >::getExpandedRegion(), llvm::handleExecNameEncodedBEOpts(), llvm::handleExecNameEncodedOptimizerOpts(), iJIT_NotifyEvent(), llvm::RegionBase< RegionTraits< Function > >::isTopLevelRegion(), parseCHRFilterFiles(), llvm::PrintFatalError(), llvm::PrintFatalNote(), llvm::RegionBase< RegionTraits< Function > >::replaceExit(), llvm::report_fatal_error(), and reportOpenError().
_test eax xmm0 eax xmm1 comiss xmm1 setae al movzbl ecx eax edx ecx cmove eax ret Note the cmove can be replaced with a single cmovae There are a number of issues We are introducing a setcc between the result of the intrisic call and select The intrinsic is expected to produce a i32 value so a any extend | ( | which becomes a zero | extend | ) |
Definition at line 213 of file README.txt.
Referenced by DecodeAddSubERegInstruction().
Referenced by FirstOnet().
int FirstOnet | ( | unsigned long long | arg1 | ) |
Definition at line 944 of file README.txt.
References first_one().
the multiplication has a latency of four as opposed to two cycles for the movl lea variant It appears gcc place string data with linkonce linkage in section coalesced instead of section coalesced Take a look at darwin there are other Darwin assembler directives that we do not make use of define i32 foo | ( | i32 *% | a, |
i32 % | t | ||
) |
For the entry BB esp pxor xmm0 xmm1 ucomisd xmm1 setnp al sete cl testb al jne LBB1_5 xmm2 cvtss2sd xmm3 ucomisd xmm0 ja LBB1_3 xmm2 xmm0 ret We should sink the load into xmm3 into the LBB1_2 block This should be pretty and will nuke all the copies bool full_add | ( | unsigned | a, |
unsigned | b | ||
) |
Definition at line 531 of file README.txt.
into | ( | - | m64 | ) |
Definition at line 1092 of file README.txt.
References x.
For the entry BB esp pxor xmm0 xmm1 ucomisd xmm1 setnp al sete cl testb al jne LBB1_5 xmm2 cvtss2sd LCPI1_1 | ( | % | rip | ) |
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax rsp ret and it gets compiled into this on ebp esp eax movl ebp leal | ( | % | eax | ) |
We currently emits eax Perhaps this is what we really should generate is Is imull three or four cycles eax leal | ( | % | eax, |
% | eax, | ||
2 | |||
) |
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp leaq | ( | % | rsp | ) |
Definition at line 718 of file README.txt.
esp movl | ( | % | esp, |
1 | |||
) |
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 movq | ( | % | rsp | ) |
we currently eax movsbl | ( | % | esp | ) |
< i32 > br label bb114 eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp movswl | ( | % | ebp | ) |
The following code is currently eax eax ecx jb LBB1_2 eax movzbl eax ret eax ret We could change the eax into movzwl | ( | % | esp | ) |
bool no_overflow | ( | unsigned | a, |
unsigned | b | ||
) |
Definition at line 533 of file README.txt.
References b, and full_add().
Instead of the following for memset char edx edx edx It might be better to generate eax movl edx movl edx movw edx when we can spare a register It reduces code size Evaluate what the best way to codegen sdiv C is For we currently get ret i32 Y eax movl ecx ecx ecx addl eax eax ret GCC knows several different ways to codegen one of which is eax eax ecx cmovle eax eax ret which is probably but it s interesting at since libc is hand tuned for medium and large mem ops | ( | avoiding RFO for large | stores, |
TLB | preheating, | ||
etc | |||
) |
Definition at line 167 of file README.txt.
http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t it is not matched with the load on both sides The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free In many LLVM generates code like eax cmpl esp setl al movzbl eax ret on some processors | ( | which ones? | ) |
to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp ret | ( | also really horrible code on | ppc | ) |
Referenced by llvm::LTOCodeGenerator::addModule(), bar(), llvm::yaml::begin(), llvm::AArch64InstrInfo::buildOutlinedFrame(), llvm::NVPTXAsmPrinter::doFinalization(), dupl(), llvm::sampleprof::FunctionSamples::findCallTargetMapAt(), llvm::sampleprof::FunctionSamples::findSamplesAt(), foo(), llvm::object::MachOObjectFile::getBuildPlatform(), llvm::object::MachOObjectFile::getBuildTool(), llvm::RegionInfoBase< RegionTraits< Function > >::getCommonRegion(), llvm::InstrProfSymtab::getFuncNameOrExternalSymbol(), LLVMGetRelocationSymbol(), LLVMGetRelocationTypeName(), make_output(), llvm::OProfileWrapper::op_close_agent(), llvm::support::endian::read(), llvm::support::endian::readNext(), to(), and zero().
http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit shifts | ( | in | general | ) |
http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg support | ( | From http:but without the unnecessary | and. | ) |
<%struct.bf**> define void t1 | ( | ) |
Definition at line 1497 of file README.txt.
Referenced by compress_pre(), expandLog(), expandLog10(), expandLog2(), GetExponent(), getLimitedPrecisionExp2(), and GetSignificand().
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax rsp ret and it gets compiled into this on ebp esp eax movl ebp eax movl ebp eax esp popl ebp ret gcc ebp eax popl ebp ret Teach tblgen not to check bitconvert source type in some cases This allows us to consolidate the following patterns in X86InstrMMX v2i32 | ( | MMX_MOVDQ2Qrr VR128:$src | ) |
def v4i16 | ( | MMX_MOVDQ2Qrr VR128:$src | ) |
def v8i8 | ( | MMX_MOVDQ2Qrr VR128:$src | ) |
The generated code on x86 for checking for signed overflow on a multiply the obvious way is much longer than it needs to be int x | ( | int | a, |
int | b | ||
) |
Definition at line 913 of file README.txt.
References b.
the multiplication has a latency of four as opposed to two cycles for the movl lea variant It appears gcc place string data with linkonce linkage in section coalesced instead of section __const_coal |
Definition at line 259 of file README.txt.
the multiplication has a latency of four as opposed to two cycles for the movl lea variant It appears gcc place string data with linkonce linkage in section coalesced instead of section __DATA |
Definition at line 260 of file README.txt.
bar __pad10__ |
Definition at line 1400 of file README.txt.
bar __pad11__ |
Definition at line 1426 of file README.txt.
http __pad4__ |
Definition at line 20 of file README.txt.
_test __pad5__ |
Definition at line 197 of file README.txt.
_test __pad6__ |
Definition at line 462 of file README.txt.
to __pad7__ |
Definition at line 630 of file README.txt.
def __pad8__ |
Definition at line 1207 of file README.txt.
def __pad9__ |
Definition at line 1210 of file README.txt.
the multiplication has a latency of four as opposed to two cycles for the movl lea variant It appears gcc place string data with linkonce linkage in section __TEXT |
Definition at line 259 of file README.txt.
Instead of the following for memset char edx edx edx It might be better to generate eax movl edx movl edx movw edx when we can spare a register It reduces code size Evaluate what the best way to codegen sdiv C is For we currently get ret i32 Y _test1 |
Definition at line 143 of file README.txt.
Current eax eax eax ret Ideal eax eax ret Re implement atomic builtins x86 does not have to use add to implement these it can use add |
Definition at line 454 of file README.txt.
Referenced by foo().
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax rsp ret and it gets compiled into this on ebp esp eax movl ebp eax movl ebp eax addl |
Definition at line 397 of file README.txt.
gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax addq |
Definition at line 1143 of file README.txt.
Definition at line 89 of file README.txt.
Definition at line 10 of file README.txt.
We currently generate two divl %ecx instructions here (after xorl %edx, %edx), but we really shouldn't; a similar code sequence works for division. This is currently not done for atomics and others. It is also currently not done for read-modify-write instructions, nor if the OF or CF flags are needed. The shift operators have the complication that when the shift count is zero, EFLAGS is not set, so they can only subsume a test instruction if the shift count is known to be non-zero.

Definition at line 30 of file README.txt.
This is currently compiled to a compare, jne LBB1_1, and a ret, with LBB1_1 calling L_abort$stub; the jne could instead branch directly to L_abort$stub. This can be applied to any no-return function call that takes no arguments, etc.

Definition at line 412 of file README.txt.
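A minimal example of the shape of function this note is about, assuming the classic if-then-abort pattern (the function name is hypothetical):

    #include <stdlib.h>

    /* abort() never returns, so the taken branch could jump
       straight to it without restoring the stack first. */
    void test(int x) {
      if (x) abort();
    }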
and

Definition at line 1271 of file README.txt.
andb

Definition at line 1401 of file README.txt.
Moving arg1 onto the stack slot of the callee function would overwrite arg2 of the caller.

Definition at line 700 of file README.txt.
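A hedged illustration of the hazard, assuming a tail call that permutes the caller's own arguments (all names hypothetical):

    int callee(int a, int b);

    /* Lowering this as a tail call must not store arg2 into arg1's
       stack slot before arg1 has been read for the second argument. */
    int caller(int arg1, int arg2) {
      return callee(arg2, arg1);
    }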
Definition at line 452 of file README.txt.
Definition at line 637 of file README.txt.
And the following x86 code ends in sete %al / movzbl %al, %eax / ret; a cmpb instead of the xorl/testb would be one instruction shorter. The source is C code that returns (unsigned char)a == (unsigned char)b.

Definition at line 973 of file README.txt.
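A compilable version of that comparison (the entry elides the function name, so the one here is hypothetical):

    /* Compare only the low bytes of a and b; a single cmpb on the
       byte subregisters would beat xorl + testb. */
    int low_byte_eq(int a, int b) {
      return (unsigned char)a == (unsigned char)b;
    }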
Definition at line 1434 of file README.txt.
Definition at line 978 of file README.txt.
This appears to be bad because the RA is not folding the store to the stack slot into the movl; the above instructions could be simplified.

Definition at line 592 of file README.txt.
LLVM currently emits rax rax movq rax rax ret It could narrow the loads and stores to emit rax rax movq bfi = external global %struct.bf* |
Definition at line 1495 of file README.txt.
Referenced by llvm::InstructionSelector::setupMF().
We should generate bts, btr, etc. instructions on targets where they are cheap or when codesize is important; e.g., for code that sets or clears a single int bit (see the sketch below).

Definition at line 108 of file README.txt.
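A sketch of the kind of code that should map onto bts/btr (names are illustrative):

    /* Set or clear a single bit: candidates for bts and btr on x86. */
    void setbit(int *target, int bit)   { *target |=  (1 << bit); }
    void clearbit(int *target, int bit) { *target &= ~(1 << bit); }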
c
Definition at line 973 of file README.txt.
The stack save/restore logic could also be shrink-wrapped, producing something like jne LBB1_1 / ret, with LBB1_1 doing the %esp adjustment and the call to L_abort$stub. Both are useful in different situations. Finally, it could be shrink-wrapped and tail called.

Definition at line 424 of file README.txt.
We currently compile this with a movl and a subl, using two registers; we would use one fewer register if it were codegen'd with neg. Note that this isn't beneficial if the load can be folded into the sub.

Definition at line 458 of file README.txt.
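A function of the shape this note describes (the exact constant is not preserved in this entry; 1 is assumed):

    /* A movl $1 + subl sequence needs a scratch register for x;
       negating x in place and adding the constant back needs one
       register fewer. */
    int test(int x) {
      return 1 - x;
    }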
cases

Definition at line 83 of file README.txt.
This seems like a cross between rematerialization and spill folding. This has redundant subtractions of %eax from a stack slot; %ecx doesn't change.

Definition at line 599 of file README.txt.
Definition at line 25 of file README.txt.
Definition at line 1443 of file README.txt.
Definition at line 1447 of file README.txt.
Definition at line 155 of file README.txt.
|
Definition at line 388 of file README.txt.
commutative

Definition at line 77 of file README.txt.
Since %ecx doesn't change, we could simply subtract %eax from %ecx first and then use %ecx (or vice-versa).

Definition at line 607 of file README.txt.
Definition at line 1470 of file README.txt.
The branch to %cond_next715 produces a movswl %cx sequence.

Definition at line 612 of file README.txt.
Definition at line 253 of file README.txt.
Definition at line 1480 of file README.txt.
Duplication

Definition at line 51 of file README.txt.
For the entry BB, the code does pxor / ucomisd with setnp %al / sete %cl / testb %al / jne LBB1_5, then cvtss2sd into %xmm3 and ucomisd / ja LBB1_3. We should sink the load into %xmm3 into the LBB1_2 block; this should be pretty easy.

Definition at line 525 of file README.txt.
This currently compiles to a testl %eax, %eax / je LBB0_3 sequence.

Definition at line 36 of file README.txt.
ebx

Definition at line 97 of file README.txt.
Definition at line 147 of file README.txt.
Definition at line 235 of file README.txt.
For example |
Definition at line 492 of file README.txt.
It should cost the same as a move+shift on any modern machine, but it's a lot shorter. The downside is that it puts more pressure on register allocation because it has fixed operands. Example:

Definition at line 928 of file README.txt.
Leaf functions that require a one-byte spill slot have a prolog that adjusts %esp and an epilog that restores it before popl %esi / ret. It would be smaller, and potentially faster, to avoid the separate %esp adjustment.

Definition at line 479 of file README.txt.
float fh = (int) (u >> 16);
Definition at line 292 of file README.txt.
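The fragment above is half of a common idiom for converting an unsigned 32-bit integer to floating point via two signed halves; a sketch, with every name except fh assumed:

    /* Convert u via two signed 16-bit halves, so each half-to-float
       conversion is a cheap signed conversion. */
    float u32_to_float(unsigned u) {
      float fl = (int)(u & 0xffff);  /* low 16 bits */
      float fh = (int)(u >> 16);     /* high 16 bits */
      return fh * 65536.0f + fl;
    }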
Finally

Definition at line 423 of file README.txt.
Definition at line 293 of file README.txt.
for

Definition at line 108 of file README.txt.
Definition at line 973 of file README.txt.
g

Definition at line 106 of file README.txt.
Definition at line 954 of file README.txt.
This gets compiled into a long sequence on x86-64: eight movaps spills and several movq stores to the stack, reloads through %rax, a jbe LBB1_3 check, and two ret paths joined through jmp LBB1_2. gcc generates a much shorter sequence: a few movq loads and stores, jb L6, and ret.

Definition at line 1153 of file README.txt.
Take a look at darwin.h; there are other Darwin assembler directives that we do not make use of.

Definition at line 261 of file README.txt.
However

Definition at line 249 of file README.txt.
however |
Definition at line 253 of file README.txt.
And the following x86 code ends with movsbl / cmpl / sete %al / movzbl %al, %eax / ret; it should be possible to eliminate the sign extensions. LLVM also misses a load/store narrowing opportunity in this i16 code.

Definition at line 1493 of file README.txt.
Some isel ideas: a dynamic-programming-based approach when compile time is not an issue; code duplication (addressing mode) during isel; and other ideas from "Register-Sensitive Selection, Duplication, and Sequencing of Instructions".

Definition at line 51 of file README.txt.
Definition at line 291 of file README.txt.
This compiles to a sequence of setne %al / setg %cl / cmove / jne LBB1_2; a much better sequence uses decl %edx / jle L7 / ret, with ja L5 and call abort on the out-of-range path. Tail call optimization improvements:

Definition at line 665 of file README.txt.
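A minimal example of a call in tail position, the case these improvements target (names hypothetical):

    int callee2(int a);

    /* In tail position, the call/ret pair can become a single jmp
       that reuses the caller's frame. */
    int caller2(int x) {
      return callee2(x + 1);
    }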
We currently emit an imull.

Definition at line 235 of file README.txt.
Re-implement atomic builtins: when the result is unused, x86 does not have to use add to implement these; it can use add, sub, inc, or dec with the lock prefix.

Definition at line 1367 of file README.txt.
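A minimal illustration, assuming the GCC-style __sync builtins this note refers to:

    /* The return value is discarded, so this can lower to
       "lock incl" rather than an add that materializes the result. */
    void bump(int *p) {
      (void)__sync_add_and_fetch(p, 1);
    }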
Instead

Definition at line 1366 of file README.txt.
int64

Definition at line 680 of file README.txt.
currently compiles into
Definition at line 1544 of file README.txt.
def iPTR |
Definition at line 1205 of file README.txt.
Definition at line 495 of file README.txt.
it

Definition at line 151 of file README.txt.
Definition at line 747 of file README.txt.
L4

Definition at line 662 of file README.txt.
L5

Definition at line 656 of file README.txt.
L6

Definition at line 1168 of file README.txt.
L7

Definition at line 658 of file README.txt.
This returns i32 %tmp10 and compiles to je LBB0_2 with an addl / ret fall-through and a movl / subl / ret block; there's an obviously unnecessary movl in LBB0_2.

Definition at line 999 of file README.txt.
We currently compile the overflow-checked i32 add of %v1 and %v2 into an add followed by jo LBB1_2, with LBB1_1 as the non-overflow block.

Definition at line 405 of file README.txt.
LBB1_2

Definition at line 519 of file README.txt.
LBB1_3

Definition at line 521 of file README.txt.
gcc compiles this to a short loop (xorl %eax / jmp L2, with je L10 and jne L3 tests) that calls L_abort$stub and L_exit$stub on the failure paths; we generate a longer sequence using jge LBB1_4 and jne LBB1_1.

Definition at line 773 of file README.txt.
LCFI0

Definition at line 1155 of file README.txt.
least

Definition at line 166 of file README.txt.
Definition at line 1221 of file README.txt.
llvm

Definition at line 753 of file README.txt.
LLVM currently emits a pair of 64-bit movq load/store sequences here; it could narrow the loads and stores. The trouble is that there is a TokenFactor between the store and the load.

Definition at line 1531 of file README.txt.
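A sketch of the kind of read-modify-write involved, assuming a bitfield layout for %struct.bf (the field widths here are hypothetical):

    struct bf { unsigned a : 4; unsigned b : 4; unsigned rest : 24; };
    extern struct bf *bfi;

    /* The wide load/store pair updating one field could be narrowed
       to a single byte access. */
    void set_a(unsigned v) {
      bfi->a = v & 0xF;
    }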
Definition at line 543 of file README.txt.
We currently emit an imull; perhaps what we really should generate is a leal. Is imull three or four cycles? The current instruction priority is based on pattern complexity: the former is more complex because it folds a load, so the latter will not be emitted. Perhaps we should use AddedComplexity to give LEA32r a higher priority. We should always try to match LEA first, since the LEA matching code does some estimate to determine whether the match is profitable. However, if we care more about code size, then imull is better: it's two bytes shorter than movl + leal. On a Pentium M, the multiplication has a latency of four cycles, as opposed to two cycles for the movl + leal variant.

Definition at line 252 of file README.txt.
movl

Definition at line 117 of file README.txt.