LLVM 14.0.0git
lib/Target/X86/README.txt File Reference

Typedefs

using tmp1
using a
using edx = sar eax, 31 more aggressively
using mtune
using tmp


Functions

support ()
shifts ()
processors ()
void clearbit (int *target, int bit)
leal ()
ops ()
movaps ()
movl ()
extend ()
leal ()
define i32 foo (i32 *%a, i32 %t)
movl ()
flds ()
movsbl ()
subl ()
movsd ()
LCPI1_1 ()
bool full_add (unsigned a, unsigned b)
bool no_overflow (unsigned a, unsigned b)
movswl ()
ret ()
int caller (int32 arg1, int32 arg2)
int callee (int32 arg1, int32 arg2)
main ()
void test (double *P)
int x (int a, int b)
int FirstOnet (unsigned long long arg1)
first_one ()
movzwl ()
define fastcc void abort_gzip () noreturn nounwind
declare void exit (i32) noreturn nounwind
into (-m64)
leaq ()
addq ()
movq ()
v2i32 (MMX_MOVDQ2Qrr VR128:$src)
def v4i16 (MMX_MOVDQ2Qrr VR128:$src)
def v8i8 (MMX_MOVDQ2Qrr VR128:$src)
addl ()
__sync_add_and_fetch ()
int bar (struct B *a)
define i32 bar (%struct.B *nocapture %a) nounwind readonly optsize
always ()
define i32 bar (i8 *nocapture %a) nounwind readonly optsize
define void t1 () nounwind ssp
bfi ()
else if ()


Variables

Improvements to the multiply -> shift/add algorithm. Improve code like this (occurs fairly frequently, e.g. in LLVM):
 
(http:...)

        xorl    %eax, %eax
        xorl    %edx, %edx
        testb   ..., %cl
        sete    %al
        setne   %dl
        sall    %cl, %eax
        sall    %cl, %edx

But that requires good 8-bit subreg support. Also, this might be better: it's an extra shift, but it's one instruction shorter, and doesn't stress 8-bit subreg support (from http:..., but without the unnecessary and):

        movl    %ecx, %eax
        shrl    ..., %eax
        movl    %eax, %edx
        xorl    ..., %edx
        sall    %cl, %eax
        sall    %cl, %edx

64-bit shifts (in general) expand to really bad code. Instead of using cmovs, we should expand to a conditional branch like GCC produces.

Some isel ideas: code duplication (addressing mode) during isel; "Register-Sensitive Selection, Duplication, and Sequencing of Instructions"; scheduling for reduced register pressure, e.g. the "Minimum Register Instruction Sequence Problem".

The (cmp reg, (load p)) pattern: because the compare isn't commutative, it is not matched with the load on both sides. The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free.

In many cases, LLVM generates code like this:

        movl    ...(%esp), %eax
        cmpl    %eax, ...(%esp)
        setl    %al
        movzbl  %al, %eax
        ret

On some processors (which ones?) it is more efficient to do:

        movl    ...(%esp), %ebx
        xorl    %eax, %eax
        cmpl    %ebx, ...(%esp)
        setl    %al
        ret

Doing this correctly is tricky, though, as the xor clobbers the flags.

We should generate bts/btr/etc. instructions on targets where they are cheap or when codesize is important, e.g. for:

        void setbit(int *target, int bit) {
          *target |= (1 << bit);
        }
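A minimal example of the kind of variable 64-bit shift this note concerns (the exact source function is not recoverable from this page, so the code below is only an illustration):

        /* On x86-32 a 64-bit shift by a variable amount is lowered to
           shld/sal plus either a cmov or a branch for counts >= 32; the
           note argues the branch form (as GCC emits) is usually better. */
        long long shift64(long long v, int x) {
          return v << x;
        }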
 
Instead of the following for memset char*, 1, ...:
        movl    ..., (%edx)
        movl    ..., ...(%edx)
        movw    ..., ...(%edx)
it might be better to generate:
        movl    ..., %eax
        movl    %eax, (%edx)
        movl    %eax, ...(%edx)
        movw    ..., ...(%edx)
when we can spare a register. It reduces code size.

Evaluate what the best way to codegen sdiv X, C is. For this we currently get (ret i32 %Y, _test1):
        movl    ..., %eax
        movl    %eax, %ecx
        sarl    ..., %ecx
        ...
        addl    %ecx, %eax
        sarl    ..., %eax
        ret
GCC knows several different ways to codegen it, one of which is:
        movl    ..., %eax
        cmpl    ..., %eax
        leal    ..., %ecx
        cmovle  %ecx, %eax
        sarl    ..., %eax
        ret
which is probably slower, but it's interesting at least.
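For reference, a sketch (in C, my own illustration rather than anything from README.txt) of the bias trick that any shift-based signed-division-by-power-of-two sequence has to implement:

        /* Signed division by 8 cannot be a plain arithmetic shift, because the
           shift rounds toward negative infinity while C requires rounding
           toward zero; adding 7 to negative inputs first fixes that up. */
        int sdiv8(int x) {
          int bias = (x >> 31) & 7;   /* 7 if x is negative, else 0 */
          return (x + bias) >> 3;
        }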
 
 
We currently emit:
        imull   ..., ...(%esp), %eax
Perhaps what we really should generate is:
        leal    (%eax,%eax,2), %eax
Is imull three or four cycles? Note: the current instruction priority is based on pattern complexity; the former is more complex because it folds a load, so the latter will not be emitted. Perhaps we should use AddedComplexity to give LEA32r a higher priority. We should always try to match LEA first, since the LEA matching code does some estimate to determine whether the match is profitable. However, if we care more about code size, then imull is better: it's two bytes shorter than movl + leal. On a Pentium M, both variants have the same characteristics with regard to throughput; however, the multiplication has a latency of four cycles, as opposed to two cycles for the movl + lea variant.

It appears gcc places string data with linkonce linkage in section __TEXT,__const_coal,coalesced instead of section __DATA,__const_coal,coalesced. Take a look at darwin.h; there are other Darwin assembler directives that we do not make use of.
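A small function that exposes exactly this imull-versus-lea choice (an assumed example; the README's own test case is not visible on this page):

        int times3(int x) {
          return x * 3;   /* either: imull $3, %eax   or: leal (%eax,%eax,2), %eax */
        }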
 
This is pessimized by loop-reduce and indvars.

u32 to float conversion improvement:
        float fh = (int) (u >> 16);
        float fl = ...;
        fh *= 0x1.0p16f;
        return fh + fl;
is currently compiled to:
        subl    ..., %esp
        movl    ..., %eax
        movl    %eax, %ecx
        ...
        cvtsi2ss %ecx, %xmm0
        andl    ..., %eax
        cvtsi2ss %eax, %xmm1
        mulss   ..., %xmm0
        addss   %xmm1, %xmm0
        movss   %xmm0, (%esp,1)
        flds    (%esp,1)
        addl    $0x04, %esp
        ret
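A self-contained version of the conversion routine sketched above (the function name and the low-half mask are assumptions; only the lines shown above are actually visible on this page):

        #include <stdio.h>

        /* Convert an unsigned 32-bit value to float via two exact 16-bit
           halves, so neither int-to-float conversion can lose precision. */
        static float uint32_to_float(unsigned u) {
          float fh = (float)(int)(u >> 16);      /* high 16 bits */
          float fl = (float)(int)(u & 0xffffu);  /* low 16 bits (assumed mask) */
          fh *= 0x1.0p16f;                       /* scale the high half by 2^16 */
          return fh + fl;
        }

        int main(void) {
          printf("%f\n", uint32_to_float(0xfffffffeu));
          return 0;
        }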
 
We should use lrintf and probably other libc functions.

This code is currently compiled to:
        ...     %esp
        ...     (%esp)
        jne     LBB1_1
        ...     %esp
        ret
LBB1_1:
        ...
It would be better compiled to:
        ...     %esp
        ...     (%esp)
        jne     L_abort$stub
        ...     %esp
        ret
This can be applied to any no-return function call that takes no arguments, etc. Alternatively, the stack save/restore logic could be shrink-wrapped, producing something like:
        ...     (%esp)
        jne     LBB1_1
        ret
LBB1_1:
        ...     %esp
        call    L_abort$stub
Both are useful in different situations. Finally, it could be shrink-wrapped and tail called.
 
We currently produce:
        ...     %eax
        ...     %ecx
        subl    ..., %eax
        ret
We would use one fewer register if codegen'd as:
        ...     %eax
        negl    %eax
        addl    ..., %eax
        ret
Note that this isn't beneficial if the load can be folded into the sub. In this case, we want a sub.
 
_test:
        ...     %eax
        ...     %eax
        ret
Leaf functions that require one 4-byte spill slot have a prolog like:
        pushl   %esi
        subl    ..., %esp
and an epilog like:
        addl    ..., %esp
        popl    %esi
        ret
It would be smaller, and potentially faster. For example:
 
The entry BB is:
_test:
        subl    ..., %esp
        pxor    %xmm0, %xmm0
        movsd   ...(%esp), %xmm1
        ucomisd %xmm0, %xmm1
        setnp   %al
        sete    %cl
        testb   ..., %al
        jne     LBB1_5
        cvtss2sd LCPI1_1(%rip), %xmm2
        ...     %xmm3
        ucomisd ..., %xmm0
        ja      LBB1_3
LBB1_2:
        ...     %xmm2, ...
LBB1_3:
        ...     %xmm0
        ret
We should sink the load into xmm3 into the LBB1_2 block. This should be pretty easy, and will nuke all the copies.
 
This should compile to:
        ...     %edi
        setae   %al
        movzbl  %al, %eax
        ret
on x86, instead of the rather stupid-looking:
        ...     %edi
        setb    %al
        xorb    ..., %al
        ...
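full_add and no_overflow are declared above; a plausible reconstruction of their bodies, matching the carry-flag idiom this note expects (the exact bodies are not visible on this page):

        #include <stdbool.h>

        /* Unsigned a + b overflows exactly when the sum wraps below a. */
        bool full_add(unsigned a, unsigned b) {
          return a + b < a;          /* true iff the addition carried */
        }

        bool no_overflow(unsigned a, unsigned b) {
          return !full_add(a, b);    /* should become cmp/add + setae */
        }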
 
This code (excerpted):
        ...
        %tmp231232 = ...
        %tmp233    = sub i32 32, ...
        %tmp245246 = sext i16 %tmp65 to i32
        %tmp252253 = sext i16 %tmp68 to i32
        %tmp254    = sub i32 32, ...
        %tmp553554 = bitcast i16* %tmp37 to i8*
        %tmp583584 = sext i16 %tmp98 to i32
        %tmp585    = sub i32 32, ...
        %tmp614615 = sext i16 %tmp101 to i32
        %tmp621622 = sext i16 %tmp104 to i32
        %tmp623    = sub i32 32, ...
        br label %bb114
produces:
        movl    ..., %eax
        movl    ..., %ecx
        movl    %eax, ...(%ebp)
        subl    ..., ...(%ebp)
        movl    %eax, ...(%ebp)
        subl    ..., ...(%ebp)
        movl    %eax, ...(%ebp)
        subl    %ecx, ...(%ebp)
        ...
This appears to be bad because the RA is not folding the store to the stack slot into the movl. The above instructions could be:
        ...     ...(%ebp)
        ...     ...(%ebp)
This seems like a cross between remat and spill folding. It also has redundant subtractions of %eax from a stack slot; %ecx doesn't change, so we could simply subtract %eax from %ecx first and then use %ecx (or vice-versa).

This code:
        %tmp659 = ...
        br i1 %tmp659, label %cond_true662, label %cond_next715
produces:
        movswl  ..., %cx
        ...
 
We compile this to:
        ...     (%esp)
        setne   %al
        movzbw  %al, %ax
        ...     (%esp)
        setg    %cl
        movzbw  %cl, %cx
        cmove   %cx, %ax
        ...     %cl
        jne     LBB1_2
        ...     %esp
        ret
which is much nicer (also really horrible code on ppc). This is due to the expand code for 64-bit compares. GCC produces multiple branches:
        ...     %esp
        movl    ..., %edx
        movl    ..., %eax
        decl    %edx
        jle     L7
L5:
        ...     %esp
        ret
        .p2align ...
L7:
        ...     %esp
        ret
        ...     %eax
        ja      L5
L4:
        call    abort

Tail call optimization improvements:
 
Moving arg1 onto the stack slot of the callee function would overwrite arg2 of the caller. Possible optimizations:
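A hedged sketch of the kind of caller/callee pair the tail-call note describes (the bodies and the swapped argument order are illustrative; the exact example in README.txt is not recoverable from this page):

        int callee(int arg1, int arg2);

        /* Tail-calling callee(arg2, local) requires writing arg2 into the
           stack slot that currently holds arg1 before arg1 has been read,
           so the outgoing arguments overwrite each other. */
        int caller(int arg1, int arg2) {
          long long local = (long long)arg1 * 2;
          return callee(arg2, (int)local);
        }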
 
gcc compiles this to:
        ...     %esp
        xorl    %eax, %eax
        jmp     L2
L3:
        ...
L2:
        ...     %eax
        je      L10
        ...     %eax, %eax
        jne     L3
        call    L_abort$stub
L10:
        ...     %esp
        call    L_exit$stub
llvm:
        ...     %esp
        ...     %eax
        ...     %ecx
        jge     LBB1_4
        ...     %ecx, %ecx
        jne     LBB1_1
        ...     %eax
        ...     %esp
        ret
LBB1_4:
        ...
 
It should cost the same as a move + shift on any modern processor, but it's a lot shorter. The downside is that it puts more pressure on register allocation because it has fixed operands. Example:
 
The following code is currently generated for the first_one lookup below; changing the load to a movzwl lets us change the cmpl into a testl, which is shorter, and eliminate the shift.

We compile this function ((i32 %a, i32 %b, i32 %c, i8 zeroext %d) nounwind):
 
        br i1 %tmp2, label %bb7, label %bb
bb:
        ...
        ret i32 %tmp10
compiles to:
        cmpb    ..., %ecx
        ...     (%esp)
        je      LBB0_2
        ...     %eax
        addl    ..., %eax
        ret
LBB0_2:
        ...
 
gets compiled into this on x86-64:
        subq    ..., %rsp
        movaps  %xmm7, ...(%rsp)
        ...     (likewise %xmm6 down to %xmm0)
        movq    %r9, ...(%rsp)
        movq    %r8, ...(%rsp)
        movq    %rcx, ...(%rsp)
        movq    %rdx, ...(%rsp)
        movq    %rsi, ...(%rsp)
        leaq    ...(%rsp), %rax
        movq    %rax, ...(%rsp)
        ...
        cmpl    ..., %eax
        jbe     LBB1_3
        movq    ...(%rsp), %rcx
        movq    ...(%rsp), %rax
        addq    ..., %rax
        movl    %eax, ...(%rsp)
        addl    ...(%rsp), %eax
        ...     %rsp
        ret
LBB1_3:
        ...     %ecx, %eax
        ...     %rcx
        movl    ..., ...(%rsp)
        jmp     LBB1_2
gcc generates:
        subq    ..., %rsp
LCFI0:
        movq    ..., %rax
        movq    %rax, ...(%rsp)
        movq    ...(%rsp), %rax
        movl    ..., %eax
        cmpl    ..., %eax
        jb      L6
        movq    ...(%rsp), %rdx
        ...     %eax
        ...     %rsp
        ret
        .p2align ...
L6:
        movl    ...(%rsp), %edx
        ...     %rdx, %rax
        movl    %eax, ...(%rsp)
        ...     %eax
        ...     %rsp
        ret
and it gets compiled into this on x86-32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    ...(%ebp), %eax
        movl    ...(%ebp), %eax
        movl    ...(%ebp), %eax
        ...     %esp
        popl    %ebp
        ret
gcc:
        pushl   %ebp
        ...     %eax
        popl    %ebp
        ret

Teach tblgen not to check bitconvert source type in some cases. This allows us to consolidate the following patterns in X86InstrMMX.td:

def ... (v2i32 (MMX_MOVDQ2Qrr VR128:$src))>
 
def ... (v4i16 (MMX_MOVDQ2Qrr VR128:$src))>
def ... (v8i8 (MMX_MOVDQ2Qrr VR128:$src))>

There are other cases in various td files.

Take something like the following on ...:
        unsigned ...(..., unsigned y) { return x % y; }
 
We currently generate a libcall, but we really shouldn't:
        movl    ..., %eax
        movl    ..., %ecx
        xorl    %edx, %edx
        divl    %ecx
        movl    ..., %eax
        divl    %ecx
        movl    %edx, %eax
        ret
A similar code sequence works for division.

We currently compile this (..., i32 %v2) to:
        movl    ..., %eax
        addl    ...(%esp), %eax
        jo      LBB1_2
        ...
The same can be done for and, or, xor, neg, shl, sra, srl, shld, shrd, atomic ops, and others. It is also currently not done for read-modify-write instructions. It is also currently not done if the OF or CF flags are needed. The shift operators have the complication that when the shift count is zero, EFLAGS is not set, so they can only subsume a test instruction if the shift count is known to be non-zero.
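The math behind the two-divl expansion, written out in C (my own illustration of why the second division cannot fault; the types assume a 64-bit dividend and a 32-bit divisor, as in the note's example):

        #include <stdint.h>

        uint32_t urem64by32(uint64_t x, uint32_t y) {
          uint32_t hi = (uint32_t)(x >> 32);
          uint32_t lo = (uint32_t)x;
          /* Reduce the high half modulo y first; the remaining dividend
             (hi%y : lo) then divides by y with a quotient that fits in 32
             bits, which is the precondition for the second divl. */
          uint64_t rest = ((uint64_t)(hi % y) << 32) | lo;
          return (uint32_t)(rest % y);
        }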
 
Current output:
        ...     %eax
        ...     %eax
        ret
Ideal output:
        ...     %eax
        ret

Re-implement atomic builtins: x86 does not have to use add to implement these; it can use inc, etc.
 
bar:
        ...     %al
        andb    ..., %al
        movzbl  %al, %eax
        ret
Missed optimization: bools, when stored in a memory object, are stored as single-byte objects the value of which is always 0 (false) or 1 (true). We are not using this fact. GCC produces:
bar:
        ...
 
We generate the following IR with clang (i32 %a, i32 %b, nounwind readnone):
        %sext   = shl i32 %a, ...
        %conv1  = ashr i32 %sext, ...
        %sext6  = shl i32 %b, ...
        %conv4  = ashr i32 %sext6, ...
        %tmp6   = and i32 %tmp, ...
        %cmp    = icmp eq i32 %tmp6, ...
        %conv5  = zext i1 %cmp to i32
        ret i32 %conv5
And the following x86 codegen:
        movsbl  %dil, %eax
        ...     %ecx
        cmpl    %ecx, ...
        sete    %al
        movzbl  %al, %eax
        ret
It should be possible to eliminate the sign extensions.

LLVM misses a load/store narrowing opportunity in this code (%struct.bf contains i16 and i32 fields):
        @bfi = external global %struct.bf*
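A plausible C source for the IR above (an assumed reconstruction; clang introduces the shl/ashr pairs when narrow signed values are compared):

        /* Both operands are sign-extended before the equality test; the note
           observes that the extensions are unnecessary for == on i8 values. */
        int cmp_as_char(int a, int b) {
          return (signed char)a == (signed char)b;
        }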
 
LLVM currently emits:
        movq    bfi(%rip), %rax
        ...
        andl    ..., ...(%rax)
        ...
        ret
It could narrow the loads and stores to emit:
        movq    bfi(%rip), %rax
        ...
        andb    ..., ...(%rax)
        ...
        ret
The trouble is that there is a TokenFactor between the store and the load.

This currently compiles into:
        ...     %eax
        ...     %eax
        je      LBB0_3
        testl   %eax, %eax
        jne     LBB0_4
        ...
The testl could be removed.
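The source shape is visible in the if() entry below ("else if (x == 1) qux()"); a hedged reconstruction of the whole function:

        void bar(void);
        void qux(void);

        /* After the cmpl/je for one of the constants, the flags already say
           whether x is zero, so the separate testl is redundant. */
        void dispatch(unsigned x) {
          if (x == 0)
            bar();
          else if (x == 1)
            qux();
        }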
 

Typedef Documentation

◆ a

Should also combine to x. Currently not optimized with "clang -emit-llvm-bc | opt -O3":

        a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0)

Definition at line 489 of file README.txt.

◆ edx

edx

Definition at line 923 of file README.txt.

◆ mtune

-mtune=pentium2/3/4/m/etc.

abs:
        movl    4(%esp), %eax
        cltd
        xorl    %edx, %eax
        subl    %edx, %eax
        ret

Take the following code (from http:...):

        extern unsigned char first_one[65536];

Definition at line 943 of file README.txt.
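The abs sequence above is the classic branch-free idiom; a C rendering for reference (it relies on arithmetic right shift of a negative int, which is what the x86 sequence assumes):

        int abs_branchless(int x) {
          int mask = x >> 31;        /* 0 for non-negative x, -1 for negative (cltd) */
          return (x ^ mask) - mask;  /* xorl %edx, %eax ; subl %edx, %eax */
        }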

◆ tmp

%tmp = call i32 @t4( i32 %tmp.5 )

Definition at line 1347 of file README.txt.

◆ tmp1

        %tmp1 = urem i32 %X, 255
        ret i32 %tmp1
Currently it compiles to:
        movl    $2155905153, %ecx
        movl    8(%esp), %esi
        movl    %esi, %eax
        mull    %ecx
This could be "reassociated" into:
        movl    $2155905153, %eax
        movl    8(%esp), %ecx
        mull    %ecx
to avoid the copy. In fact, the existing two-address stuff would do this except that mul isn't a commutative 2-addr instruction. I guess this has to be done at isel time based on the #uses to mul?

Make sure the instruction which starts a loop does not cross a cacheline boundary. This requires knowing the exact length of each machine instruction. That is somewhat complicated, but doable. Example from 256.bzip2: in the new trace, the hot loop has an instruction which crosses a cacheline boundary. In addition to potential cache misses, this can't help decoding, as I imagine there has to be some kind of complicated decoder reset and realignment to grab the bytes from the next cacheline.

        532 532 0x3cfc movb (1809(%esp, %esi), %bl    <<<--- spans 2 64 byte lines
        942 942 0x3d03 movl %dh, (1809(%esp, %esi)
        937 937 0x3d0a incl %esi
          3   3 0x3d0b cmpb %bl, %dl
         27  27 0x3d0d jnz 0x000062db <main+11707>

In c99 mode, the preprocessor doesn't like assembly comments like #TRUNCATE.

This could be a single 16-bit load:

        int f(char *p) {
          if ((p[0] == 1) & (p[1] == 2)) return 1;
          ...

Definition at line 375 of file README.txt.
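A sketch of the single 16-bit load the last note suggests (the memcpy-based load is my own framing, used to keep it alignment- and aliasing-safe; the constant assumes little-endian byte order):

        #include <string.h>
        #include <stdint.h>

        int f_combined(const char *p) {
          uint16_t v;
          memcpy(&v, p, sizeof v);          /* one 16-bit load */
          return v == ((2u << 8) | 1u);     /* p[0] == 1 && p[1] == 2 */
        }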

Function Documentation

◆ __sync_add_and_fetch()

Re-implement atomic builtins __sync_add_and_fetch() and __sync_sub_and_fetch properly: when the return value is not used (i.e. we only care about the value in memory), x86 does not have to use add to implement these.
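A small illustration of the point, using the real GCC/clang builtin (the wrapper names are made up): only the second function actually needs the updated value back.

        /* Result ignored: a plain "lock add" is enough. */
        void counter_bump(volatile int *p) {
          __sync_add_and_fetch(p, 1);
        }

        /* Result used: the operation must also return the new value
           (lock xadd plus an add, or a compare-exchange loop). */
        int counter_bump_and_read(volatile int *p) {
          return __sync_add_and_fetch(p, 1);
        }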

◆ abort_gzip()

define fastcc void abort_gzip () noreturn nounwind

The surrounding code compiles to:
        ...
        ret i32 %tmp10
        ...
        je      LBB0_2
        addl    ..., %eax
        ret
LBB0_2:
        movl    ..., %edx
        subl    ..., %eax
        ret
There's an obviously unnecessary movl here, and we could eliminate a couple more movls by putting (%esp) into %eax instead of %ecx. See rdar:...

Definition at line 1060 of file README.txt.

References b, bb, call(), entry, exit(), i1, i32, load, nounwind, preds, store, and uses.

◆ addl()

addl (%esp)

◆ addq()

addq (%rsp)

◆ always()

always (false)

◆ bar() [1/3]

define i32 bar (%struct.B *nocapture %a) nounwind readonly optsize

Definition at line 1390 of file README.txt.

◆ bar() [2/3]

define i32 bar (i8 *nocapture %a) nounwind readonly optsize

Definition at line 1418 of file README.txt.
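A hedged guess at the C source behind this i8* overload (the body is illustrative); it shows the fact the note says we are not using — an in-memory _Bool is already 0 or 1:

        int bar(_Bool *a) {
          if (*a)
            return 1;   /* since *a is 0 or 1, this whole function could be  */
          else          /* a single load plus arithmetic (2 - *a), with no   */
            return 2;   /* compare or branch at all                          */
        }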

◆ bar() [3/3]

int bar (struct B *a)

Definition at line 1388 of file README.txt.

◆ bfi()

movq bfi(%rip), %rax

◆ callee()

Here we don't need to write any variables to the top of the stack since they don't overwrite each other.

int callee (int32 arg1, int32 arg2)

◆ caller()

int caller (int32 arg1, int32 arg2)

Definition at line 681 of file README.txt.

◆ clearbit()

void clearbit (int *target, int bit)

Definition at line 111 of file README.txt.

References bit.
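clearbit is the companion of the setbit example whose body appears under the bit variable below; a hedged reconstruction of the pair the bts/btr note has in mind:

        void setbit(int *target, int bit) {
          *target |= (1 << bit);     /* candidate for bts */
        }

        void clearbit(int *target, int bit) {
          *target &= ~(1 << bit);    /* candidate for btr */
        }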

◆ exit()

declare void exit (i32) noreturn nounwind

◆ extend()

_test:
        ...     %eax
        ...     %xmm0
        ...     %eax
        ...     %xmm1
        comiss  %xmm1, ...
        setae   %al
        movzbl  %al, %ecx
        ...     %eax
        ...     %edx
        cmove   ..., %ecx
        ...     %eax
        ret
Note the setae, movzbl, cmove can be replaced with a single cmovae. There are a number of issues: we are introducing a setcc between the result of the intrinsic call and the select, and the intrinsic is expected to produce an i32 value, so an any_extend (which becomes a zero_extend) is added. We probably need some kind of target DAG combine hook to fix this. We generate significantly worse code for this than GCC.

Definition at line 213 of file README.txt.

◆ first_one()

The following code is currently generated:
        ...     %eax
        ...     %eax
        ...     %ecx
        jb      LBB1_2
        ...     %eax
        movzbl  first_one(%eax), %eax
        ret
        ...     %eax
        ret
We could change the first load into a movzwl of (%esp); this lets us change the cmpl into a testl, which is shorter, and eliminate the shift.

Referenced by FirstOnet().

◆ FirstOnet()

int FirstOnet (unsigned long long arg1)

Definition at line 944 of file README.txt.

References first_one().
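A hedged sketch of what FirstOnet plausibly looks like, given the first_one[65536] table declared under mtune above (the exact body is not recoverable from this page):

        extern unsigned char first_one[65536];

        /* Table-driven "index of first set bit" over the top 16 bits. */
        int FirstOnet(unsigned long long arg1) {
          if (arg1 >> 48)
            return first_one[arg1 >> 48];
          return 0;
        }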

◆ flds()

flds (%esp, 1)

◆ foo()

define i32 foo (i32 *%a, i32 %t)

Definition at line 266 of file README.txt.

References add, entry, i1, i32, load, phi, preds, ret(), t, tmp2, tmp5, uses, and x.

◆ full_add()

bool full_add (unsigned a, unsigned b)

Definition at line 531 of file README.txt.

◆ if()

else if (x == 1) qux()

◆ into()

into (-m64)

Definition at line 1092 of file README.txt.

References x.

◆ LCPI1_1()

cvtss2sd LCPI1_1(%rip), ...

◆ leal() [1/2]

leal (%eax)

◆ leal() [2/2]

leal (%eax,%eax,2)

◆ leaq()

leaq (%rsp)

◆ main()

Here we need to push the arguments because they overwrite each other.

main ()

Definition at line 718 of file README.txt.

◆ movaps()

_test:
        ...     %eax
        movaps  (%eax), ...

◆ movl() [1/2]

_test:
        ...     %eax
        ...     %xmm0
        movl    ..., (%esp)

◆ movl() [2/2]

movl (%esp, 1)

◆ movq()

movq (%rsp)

◆ movsbl()

We currently generate:
        ...     %eax
        movsbl  (%esp), ...

◆ movsd()

movsd (%esp)

◆ movswl()

movswl (%ebp)

◆ movzwl()

movzwl (%esp)

◆ no_overflow()

bool no_overflow (unsigned a, unsigned b)

Definition at line 533 of file README.txt.

References b, and full_add().

◆ ops()

... since libc is hand tuned for medium and large mem ops (avoiding RFO for large stores, TLB preheating, etc). Optimize this into something reasonable:

Definition at line 167 of file README.txt.

◆ processors()

on some processors (which ones?)

◆ ret()

ret (also really horrible code on ppc)

◆ shifts()

64-bit shifts (in general)

◆ subl()

_test:
        ...     %eax
        subl    (%esp), ...

◆ support()

(From http:..., but without the unnecessary and.)

◆ t1()

define void t1 () nounwind ssp

◆ test()

void test (double *P)

Definition at line 898 of file README.txt.

References b, bar, c, P, and X.

◆ v2i32()

v2i32 (MMX_MOVDQ2Qrr VR128:$src)

◆ v4i16()

def v4i16 (MMX_MOVDQ2Qrr VR128:$src)

◆ v8i8()

def v8i8 (MMX_MOVDQ2Qrr VR128:$src)

◆ x()

The generated code on x86 for checking for signed overflow on a multiply the obvious way is much longer than it needs to be.

int x (int a, int b)

Definition at line 913 of file README.txt.

References b.
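A hedged guess at the "obvious way" the note means (widen to 64 bits and range-check; the exact body in README.txt is not visible here):

        #include <limits.h>

        /* Nonzero iff a*b does not fit in a signed 32-bit int. */
        int x(int a, int b) {
          long long prod = (long long)a * b;
          return prod > INT_MAX || prod < INT_MIN;
        }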

Variable Documentation

◆ __const_coal

section __const_coal

Definition at line 259 of file README.txt.

◆ __DATA

section __DATA

Definition at line 260 of file README.txt.

◆ __pad10__

bar __pad10__

Definition at line 1400 of file README.txt.

◆ __pad11__

bar __pad11__

Definition at line 1426 of file README.txt.

◆ __pad4__

http __pad4__

Definition at line 20 of file README.txt.

◆ __pad5__

_test __pad5__

Definition at line 197 of file README.txt.

◆ __pad6__

_test __pad6__

Definition at line 462 of file README.txt.

◆ __pad7__

to __pad7__

Definition at line 630 of file README.txt.

◆ __pad8__

def __pad8__

Definition at line 1207 of file README.txt.

◆ __pad9__

def __pad9__

Definition at line 1210 of file README.txt.

◆ __TEXT

section __TEXT

Definition at line 259 of file README.txt.

◆ _test1

_test1

Definition at line 143 of file README.txt.

◆ add

Current eax eax eax ret Ideal eax eax ret Re implement atomic builtins x86 does not have to use add to implement these it can use add

Definition at line 454 of file README.txt.

Referenced by foo().

◆ addl

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax rsp ret and it gets compiled into this on ebp esp eax movl ebp eax movl ebp eax addl

Definition at line 397 of file README.txt.

◆ addq

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax addq

Definition at line 1143 of file README.txt.

◆ al

< i32 > ret i32 conv5 And the following x86 eax movsbl ecx cmpl ecx sete al movzbl al

Definition at line 89 of file README.txt.

◆ algorithm

Improvements to the multiply shift add algorithm

Definition at line 10 of file README.txt.

◆ Also

We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 atomic and others It is also currently not done for read modify write instructions It is also current not done if the OF or CF flags are needed The shift operators have the complication that when the shift count is EFLAGS is not so they can only subsume a test instruction if the shift count is known to be non zero Also

Definition at line 30 of file README.txt.

◆ Alternatively

is currently compiled to: esp esp jne LBB1_1 esp ret / esp esp jne L_abort$stub esp ret. This can be applied to any no-return function call that takes no arguments, etc. Alternatively

Definition at line 412 of file README.txt.

◆ and

We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 and

Definition at line 1271 of file README.txt.


◆ andb

LLVM currently emits: rax rax movq rax rax ret. It could narrow the loads and stores to emit: rax rax movq rax andb

Definition at line 1401 of file README.txt.

◆ andl

LLVM currently emits rax rax movq rax andl

Definition at line 302 of file README.txt.

◆ arg2

Moving arg1 onto the stack slot of the callee function would overwrite arg2 of the caller. Possible: int32 arg2

Definition at line 700 of file README.txt.
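
This entry is about lowering outgoing arguments for tail calls: if the callee's argument stack slots overlap the caller's incoming argument slots, storing one outgoing argument too early can clobber an incoming argument that is still needed. A hedged sketch of the hazard; the names and types are illustrative, not the README's original example:

long long callee(int a, long long b);

long long caller(int arg1, int arg2) {
  /* In a tail call, if the outgoing slot for the first argument aliases the
     incoming slot holding arg2, writing it before arg2 has been read for
     both outgoing arguments corrupts the call.  (arg1 only exists to give
     the caller a first incoming slot.) */
  return callee(arg2, (long long)arg2 * 2);
}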

◆ as

We currently get: eax ecx subl eax ret. We would use one fewer register if codegen'd as

Definition at line 452 of file README.txt.

◆ ax

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove ax

Definition at line 637 of file README.txt.

◆ b

<i32> ret i32 conv5. And the following x86: edi dil sete al movzbl eax ret. A cmpb instead of the xorl/testb would be one instruction shorter. Given the following C: int b { return (unsigned char)a == (unsigned char)b

Definition at line 973 of file README.txt.
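
The C fragment the entry quotes compares two values after truncation to 8 bits. A minimal standalone version to make the point concrete: only the low bytes matter, so a single cmpb/sete/movzbl sequence suffices, and the xorl+testb form mentioned above is one instruction longer (register names assume the SysV x86-64 convention):

int byte_eq(char a, char b) {
  return (unsigned char)a == (unsigned char)b;  /* candidate for cmpb %sil, %dil */
}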

◆ bar

bar al al movzbl eax ret GCC produces bar

Definition at line 1434 of file README.txt.

Referenced by foo(), and test().

◆ bb

<i1> br i1 label label bb bb

◆ bb7

< i32 > ret i32 tmp6 bb7

Definition at line 976 of file README.txt.

◆ be

<i32> br label bb114: eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp ebp. This appears to be bad because the RA is not folding the store to the stack slot into the movl. The above instructions could be

Definition at line 592 of file README.txt.

◆ bfi

LLVM currently emits rax rax movq rax rax ret It could narrow the loads and stores to emit rax rax movq bfi = external global %struct.bf*

Definition at line 1495 of file README.txt.


◆ bit

http: eax xorl edx cl sete al setne dl sall eax sall edx. But that requires good bit subreg support; this might be better: it's an extra, but it's one instruction and doesn't stress bit subreg: eax eax movl edx edx sall eax sall cl edx. Bit: we should expand to a conditional branch like GCC produces. Some isel and Sequencing of Instructions, Scheduling for reduced register pressure, e.g. Minimum Register Instruction Sequence load p. Because the compare isn't commutative, it is not matched with the load on both sides. The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free. In many cases LLVM generates code like: eax cmpl esp setl al movzbl eax ret; on some processors it is more efficient to do: ebx xor eax cmpl esp setl al ret. Doing this correctly is tricky, as the xor clobbers the flags. We should generate bts/btr/etc. instructions on targets where they are cheap or when codesize is important, e.g. for: int bit
Initial value:
{
*target |= (1 << bit)

Definition at line 108 of file README.txt.
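
The tail of the entry above asks for bts/btr-style codegen when those instructions are cheap or when optimizing for size. A minimal sketch built from the initializer shown (the set-bit form); on such targets the whole body could become a single memory-operand bts:

void setbit(int *target, int bit) {
  *target |= (1 << bit);   /* candidate for "btsl %esi, (%rdi)" */
}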

◆ c

c

Definition at line 973 of file README.txt.

◆ called

is currently compiled to: esp esp jne LBB1_1 esp ret / esp esp jne L_abort$stub esp ret. This can be applied to any no-return function call that takes no arguments, etc. The stack save/restore logic could be shrink-wrapped, producing something like: esp jne LBB1_1 ret esp call L_abort$stub. Both are useful in different situations; it could be shrink-wrapped and tail called

Definition at line 424 of file README.txt.

◆ case

We currently get: eax ecx subl eax ret. We would use one fewer register if codegen'd as: eax neg eax eax ret. Note that this isn't beneficial if the load can be folded into the sub. In this case

Definition at line 458 of file README.txt.
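
Read together, the "as" and "case" entries describe the constant-minus-variable pattern: the current code materializes the constant in one register and the variable in another before the subl, while negating the variable in place and then adding the constant needs only one register (and, as noted, is not a win when the load folds into the sub). A hedged sketch; the constant K is illustrative, not the README's original value:

#define K 10   /* hypothetical constant */

int const_minus(int x) {
  return K - x;   /* equivalently: negate x, then add K */
}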

◆ cases

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t it is not matched with the load on both sides The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free In many cases

Definition at line 83 of file README.txt.

◆ change

<i32> br label bb114: eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp ebp. This appears to be bad because the RA is not folding the store to the stack slot into the movl. The above instructions could be: ebp ebp. This seems like a cross between remat and spill folding. This has redundant subtractions of eax from a stack slot; ecx doesn't change

Definition at line 599 of file README.txt.

◆ cl

to esp esp setne al movzbw ax esp setg cl movzbw cl

Definition at line 25 of file README.txt.


◆ clang

We generate the following IR with clang

Definition at line 1443 of file README.txt.

◆ cmp

< i32 >< i32 >< i32 >< i32 > cmp = icmp eq i32 %tmp6

◆ cmpb

<i32> ret i32 tmp10 ecx cmpb

Definition at line 992 of file README.txt.

◆ cmpl

currently compiles eax eax je LBB0_3 testl eax jne LBB0_4 the testl could be eax cmpl

Definition at line 155 of file README.txt.

◆ code

LLVM currently emits: rax rax movq rax rax ret. It could narrow the loads and stores to emit: rax rax movq rax rax ret. The trouble is that there is a TokenFactor between the store and the load, making it non-trivial to determine if there's anything between the load and the store which would prohibit narrowing. This code
inline

Definition at line 388 of file README.txt.

◆ commutative

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t commutative

Definition at line 77 of file README.txt.

◆ cond_true662

<i32> br label bb114 eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp ebp This appears to be bad because the RA is not folding the store to the stack slot into the movl The above instructions could ebp ebp This seems like a cross between remat and spill folding This has redundant subtractions of eax from a stack slot ecx doesn t so we could simply subtract eax from ecx first and then use ecx (or vice-versa). This code<i1> br i1 label cond_true662

Definition at line 607 of file README.txt.

◆ conv1

<i32> conv1 = ashr i32 %sext

Definition at line 1470 of file README.txt.

◆ conv4

< i32 >< i32 >< i32 >< i32 > conv4 = ashr i32 %sext6

Definition at line 1472 of file README.txt.

◆ conv5

< i1 > conv5 = zext i1 %cmp to i32

Definition at line 1448 of file README.txt.

◆ cx

<i32> br label bb114 eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp ebp This appears to be bad because the RA is not folding the store to the stack slot into the movl The above instructions could ebp ebp This seems like a cross between remat and spill folding This has redundant subtractions of eax from a stack slot ecx doesn t so we could simply subtract eax from ecx first and then use ecx (or vice-versa). This code<i1> br i1 label label cond_next715 produces cx movswl cx

Definition at line 612 of file README.txt.

◆ cycles

the multiplication has a latency of four cycles

Definition at line 253 of file README.txt.

◆ dil

Definition at line 1480 of file README.txt.

◆ Duplication

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel Duplication

Definition at line 51 of file README.txt.

◆ easy

For the entry BB: esp pxor xmm0 xmm1 ucomisd xmm1 setnp al sete cl testb al jne LBB1_5 xmm2 cvtss2sd xmm3 ucomisd xmm0 ja LBB1_3 xmm2 xmm0 ret. We should sink the load into xmm3 into the LBB1_2 block. This should be pretty easy

Definition at line 525 of file README.txt.

◆ eax

currently compiles eax eax je LBB0_3 testl eax

Definition at line 36 of file README.txt.

◆ ebx

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t it is not matched with the load on both sides The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free In many LLVM generates code like eax cmpl esp setl al movzbl eax ret on some it is more efficient to do ebx xor eax cmpl ebx

Definition at line 97 of file README.txt.

◆ ecx

Definition at line 147 of file README.txt.


◆ esp

esp eax movl ecx ecx cvtsi2ss xmm0 eax cvtsi2ss xmm1 xmm0 addss xmm0 movss esp

Definition at line 235 of file README.txt.

◆ example

For example

Definition at line 492 of file README.txt.

◆ Example

It should cost the same as a move+shift on any modern processor, but it's a lot shorter. The downside is that it puts more pressure on register allocation because it has fixed operands. Example

Definition at line 928 of file README.txt.

◆ faster

_test: eax eax ret. Leaf functions that require a one-byte spill slot have a prolog like: esp, and an epilog like: esp popl esi ret. It would be smaller, and potentially faster

Definition at line 479 of file README.txt.
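
The "faster" entry is about leaf functions that need only a tiny spill slot: instead of a subl-of-%esp prolog paired with an addl-of-%esp epilog, a single push on entry and a pop into a dead register on exit would be smaller and potentially faster. A rough illustration; the prolog/epilog shapes in the comment are the point, and the C body merely forces a small stack slot:

int leaf(int x) {
  volatile char spill = (char)x;   /* forces a one-byte stack slot */
  /* today:  subl $N, %esp ... addl $N, %esp / popl %esi / ret
     better: a push on entry and a pop into a dead register on exit */
  return spill + x;
}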

◆ fh

float fh = (int) (u >> 16)

Definition at line 292 of file README.txt.

◆ Finally

is currently compiled esp esp jne LBB1_1 esp ret esp esp jne L_abort $stub esp ret This can be applied to any no return function call that takes no arguments etc the stack save restore logic could be shrink producing something like esp jne LBB1_1 ret esp call L_abort $stub Both are useful in different situations Finally

Definition at line 423 of file README.txt.

◆ fl

fh *= 0x1.0p16f; return fh + fl

Definition at line 293 of file README.txt.
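
Read together, the fh/fl fragments are the halves of the classic unsigned-to-float idiom: convert each 16-bit half through a signed int-to-float conversion, scale the high half by 2^16, and add. A hedged reconstruction; the low-half expression is an assumption, only fh and the 0x1.0p16f scale appear verbatim above:

float uint32_to_float(unsigned u) {
  float fl = (int)(u & 0xffff);   /* low 16 bits (assumed)            */
  float fh = (int)(u >> 16);      /* high 16 bits, as quoted above    */
  fh *= 0x1.0p16f;                /* scale the high half by 65536     */
  return fh + fl;
}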

◆ for

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t it is not matched with the load on both sides The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free In many LLVM generates code like eax cmpl esp setl al movzbl eax ret on some it is more efficient to do ebx xor eax cmpl esp setl al ret Doing this correctly is tricky as the xor clobbers the flags We should generate bts btr etc instructions on targets where they are cheap or when codesize is important e for

Definition at line 108 of file README.txt.

◆ function

this lets us change the cmpl into a which is and eliminate the shift We compile this function

Definition at line 973 of file README.txt.

◆ g

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel and Sequencing of Instructions Scheduling for reduced register pressure E g Minimum Register Instruction Sequence load p Because the compare isn t it is not matched with the load on both sides The dag combiner should be made smart enough to canonicalize the load into the RHS of a compare when it can invert the result of the compare for free In many LLVM generates code like eax cmpl esp setl al movzbl eax ret on some it is more efficient to do ebx xor eax cmpl esp setl al ret Doing this correctly is tricky as the xor clobbers the flags We should generate bts btr etc instructions on targets where they are cheap or when codesize is important e g

Definition at line 106 of file README.txt.

◆ generated

The following code is currently generated

Definition at line 954 of file README.txt.

◆ generates

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align edx rdx eax movl rsp eax rsp ret and it gets compiled into this on ebp esp eax movl ebp eax movl ebp eax esp popl ebp ret gcc generates

Definition at line 1153 of file README.txt.

◆ h

The multiplication has a latency of four cycles, as opposed to two for the movl/lea variant. It appears gcc places string data with linkonce linkage in .section __TEXT,__const_coal,coalesced instead of .section __DATA,__const_coal,coalesced. Take a look at darwin.h

◆ However

<i32> br label bb114 eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp ebp This appears to be bad because the RA is not folding the store to the stack slot into the movl The above instructions could ebp ebp This seems like a cross between remat and spill folding This has redundant subtractions of eax from a stack slot However

Definition at line 249 of file README.txt.

◆ however

however

Definition at line 253 of file README.txt.

◆ i16

<i32> ret i32 conv5. And the following x86: eax movsbl ecx cmpl ecx sete al movzbl eax ret. It should be possible to eliminate the sign extensions. LLVM misses a load/store narrowing opportunity in this i16

Definition at line 1493 of file README.txt.

◆ ideas

http eax xorl edx cl sete al setne dl sall eax sall edx But that requires good bit subreg support this might be better It s an extra but it s one instruction and doesn t stress bit subreg eax eax movl edx edx sall eax sall cl edx bit we should expand to a conditional branch like GCC produces Some isel ideas

Definition at line 51 of file README.txt.

◆ improvement

is pessimized by -loop-reduce and -indvars. u32 to float conversion improvement

Definition at line 291 of file README.txt.

◆ improvements

to: esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp, which is much nicer: esp edx eax decl edx jle L7 esp ret eax ja L5 call abort. Tail call optimization improvements

Definition at line 665 of file README.txt.

◆ imull

We currently emit imull

Definition at line 235 of file README.txt.

◆ inc

Current: eax eax eax ret. Ideal: eax eax ret. Re-implement atomic builtins: x86 does not have to use add to implement these, it can use inc

◆ Instead

Current: eax eax eax ret. Ideal: eax eax ret. Re-implement atomic builtins: x86 does not have to use add to implement these. Instead

Definition at line 1366 of file README.txt.

◆ int64

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp which is much esp edx eax decl edx jle L7 esp ret eax ja L5 call abort Tail call optimization int64

Definition at line 680 of file README.txt.

◆ into

currently compiles into

Definition at line 1544 of file README.txt.

◆ iPTR

def iPTR

Definition at line 1205 of file README.txt.

◆ is

For the entry BB is

Definition at line 495 of file README.txt.

◆ it

Instead of the following for memset char edx edx edx It might be better to generate eax movl edx movl edx movw edx when we can spare a register It reduces code size Evaluate what the best way to codegen sdiv C is For we currently get ret i32 Y eax movl ecx ecx ecx addl eax eax ret GCC knows several different ways to codegen it

Definition at line 151 of file README.txt.

◆ L10

gcc compiles this esp xorl eax jmp L2 eax je L10 eax eax jne L3 call L_abort $stub L10

Definition at line 747 of file README.txt.

◆ L2

gcc compiles this esp xorl eax jmp L2 eax je L10 L2

Definition at line 742 of file README.txt.

◆ L3

gcc compiles this esp xorl eax jmp L2 L3

Definition at line 739 of file README.txt.

◆ L4

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp which is much esp edx eax decl edx jle L7 esp ret eax ja L5 L4

Definition at line 662 of file README.txt.


◆ L5

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp which is much esp edx eax decl edx jle L7 L5

Definition at line 656 of file README.txt.


◆ L6

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp eax eax jb L6 rdx eax rsp ret p2align L6

Definition at line 1168 of file README.txt.


◆ L7

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp which is much esp edx eax decl edx jle L7 esp ret L7

◆ LBB0_2

<i32> ret i32 tmp10: ecx esp je LBB0_2 eax addl eax ret edx movl eax subl eax ret. There's an obviously unnecessary movl in LBB0_2

Definition at line 999 of file README.txt.

◆ LBB1_1

We currently generate a but we really shouldn eax ecx xorl edx divl ecx eax divl ecx movl eax ret A similar code sequence works for division We currently compile i32 v2 eax eax jo LBB1_2 LBB1_1

Definition at line 405 of file README.txt.

◆ LBB1_2

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp LBB1_2

Definition at line 519 of file README.txt.

◆ LBB1_3

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret LBB1_3

Definition at line 521 of file README.txt.

◆ LBB1_4

gcc compiles this esp xorl eax jmp L2 eax je L10 eax eax jne L3 call L_abort $stub esp call L_exit $stub esp eax ecx eax jge LBB1_4 ecx ecx jne LBB1_1 eax esp ret LBB1_4

Definition at line 773 of file README.txt.

◆ LCFI0

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp LCFI0

Definition at line 1155 of file README.txt.

◆ least

Instead of the following for memset char edx edx edx It might be better to generate eax movl edx movl edx movw edx when we can spare a register It reduces code size Evaluate what the best way to codegen sdiv C is For we currently get ret i32 Y eax movl ecx ecx ecx addl eax eax ret GCC knows several different ways to codegen one of which is eax eax ecx cmovle eax eax ret which is probably but it s interesting at least

Definition at line 166 of file README.txt.

◆ libcall

We currently generate a libcall

Definition at line 1221 of file README.txt.

◆ LLVM

Improvements to the multiply/shift/add algorithm, e.g. in LLVM

Definition at line 11 of file README.txt.

◆ llvm

gcc compiles this esp xorl eax jmp L2 eax je L10 eax eax jne L3 call L_abort $stub esp call L_exit $stub llvm

Definition at line 753 of file README.txt.

◆ load

LLVM currently emits: rax rax movq rax rax ret. It could narrow the loads and stores to emit: rax rax movq rax rax ret. The trouble is that there is a TokenFactor between the store and the load
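
A C-level approximation of the narrowing opportunity these entries describe (the README's own test is LLVM IR built around an external %struct.bf* global named @bfi): a read-modify-write that only changes bits within one byte of a wider value could be shrunk from a wide load/and/store down to a single memory-operand andb/andw, but the TokenFactor between the store and the load makes it hard to prove that nothing intervenes. The field layout below is an assumption loosely modeled on the fragments:

#include <stdint.h>

struct bf { uint64_t a; uint16_t f1; uint16_t f2; uint32_t b; };
extern struct bf *bfi;

void clear_low_flag(void) {
  /* Only the low byte of f1 changes; the wide load/and/store could be
     narrowed to a single "andb $~1, offset(%reg)". */
  bfi->f1 &= (uint16_t)~1u;
}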

◆ looking

Should compile edi setae al movzbl eax ret on instead of the rather stupid looking

Definition at line 543 of file README.txt.

◆ M

We currently emit: eax. Perhaps this is what we really should generate: Is imull three or four cycles? eax eax. The current instruction priority is based on pattern complexity; the former is more complex because it folds a load, so the latter will not be emitted. Perhaps we should use AddedComplexity to give LEA32r a higher priority. We should always try to match LEA first, since the LEA matching code does some estimate to determine whether the match is profitable. If we care more about code size, then imull is better: it's two bytes shorter than movl+leal. On a Pentium M

Definition at line 252 of file README.txt.


◆ movl

gets compiled into this on rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movaps rsp movq rsp movq rsp movq rsp movq rsp movq rsp rax movq rsp rax movq rsp rsp rsp eax eax jbe LBB1_3 rcx rax movq rsp eax rsp ret ecx eax rcx movl rsp jmp LBB1_2 gcc rsp rax movq rsp rsp movq rsp rax movq rsp movl

Definition at line 117 of file README.txt.

◆ movw

< i32 > br label bb114 eax ecx movl ebp subl ebp eax movl ebp subl ebp eax movl ebp subl ebp eax subl ecx movl ebp eax movl ebp eax movl ebp movw

Definition at line 121 of file README.txt.

◆ movzbl

into eax xorps xmm0 xmm0 eax xmm0 eax xmm0 ret esp movzbl

Definition at line 210 of file README.txt.
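
The movzbl fragment above juxtaposes an xorps-cleared SSE register with a zero-extending byte load (the two may even come from adjacent examples in the original README, which cannot be recovered from this page). The following C function is therefore only a hedged sketch of a source pattern that commonly produces that instruction mix; the function name and operands are assumptions.

    /* Hedged illustration only -- not the README's original example.
     * A byte loaded from memory and widened before an SSE int->float
     * conversion is one plausible source of the movzbl/xorps pattern:
     *   movzbl   4(%esp), %eax
     *   xorps    %xmm0, %xmm0     ; break the false dependence on %xmm0
     *   cvtsi2ss %eax, %xmm0
     * Exact output varies by target and compiler. */
    float byte_to_float(const unsigned char *p) {
      return (float)*p;
    }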

◆ mulss

esp eax movl ecx ecx cvtsi2ss xmm0 eax cvtsi2ss xmm1 mulss

Definition at line 304 of file README.txt.
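
The mulss fragment shows two cvtsi2ss conversions feeding a single-precision multiply. As a hedged illustration (the README's original source is not visible on this page), a function of the following shape converts each integer operand with cvtsi2ss and multiplies the results with mulss; the name and signature are assumptions.

    /* Hedged sketch: both integer operands are converted to single
     * precision (cvtsi2ss) and the results are multiplied with mulss,
     * matching the instruction mix named in the fragment above. */
    float int_product_as_float(int a, int b) {
      return (float)a * (float)b;
    }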

◆ neg

We currently generate a but we really shouldn't: eax ecx xorl edx divl ecx eax divl ecx movl eax ret. A similar code sequence works for division. We currently compile i32 v2 eax eax jo LBB1_2 neg

Definition at line 1271 of file README.txt.
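
The first half of this fragment describes an unsigned remainder computed with xorl %edx, %edx followed by divl, and notes that a similar sequence works for division; the second half mentions an overflow check (jo) followed by neg. Only the divide/remainder half is sketched below, with illustrative helpers that are not the README's original code.

    /* Hedged sketch of the divl pattern referred to above.  Unsigned
     * remainder compiles to roughly
     *   xorl %edx, %edx
     *   divl %ecx
     *   movl %edx, %eax
     *   ret
     * and unsigned division uses the same divl, taking the quotient
     * from %eax instead of the remainder from %edx. */
    unsigned urem32(unsigned a, unsigned b) { return a % b; }
    unsigned udiv32(unsigned a, unsigned b) { return a / b; }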

◆ nicer

to esp esp setne al movzbw ax esp setg cl movzbw cx cmove cx cl jne LBB1_2 esp which is much nicer

Definition at line 650 of file README.txt.
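
This fragment contrasts a setne/setg/cmove sequence with a version that simply branches (jne), which the note calls much nicer. The original source is not shown here; as an assumption-labeled illustration, a three-way comparison of the following shape is the kind of code that tends to lower into such a setcc-plus-cmov chain when expanded branchlessly.

    /* Hypothetical example of a three-way compare: the branchless
     * lowering materializes the outcomes with setcc and selects with
     * cmov, whereas a short conditional branch is often smaller.
     * Not the README's original source. */
    int compare3(int a, int b) {
      return a == b ? 0 : (a > b ? 1 : -1);
    }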

◆ Note

We currently emit eax. Perhaps this is what we really should generate. Is imull three or four cycles? Note

Definition at line 239 of file README.txt.

Referenced by llvm::ELFYAML::NoteSection::NoteSection().
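
The note asks whether imull takes three or four cycles, presumably to decide between a multiply and an equivalent shorter sequence. As a hedged example of that kind of trade-off (not the README's original code), a small constant multiplication can be emitted either way; the constant and names below are assumptions.

    /* Illustration of the imull-versus-lea trade-off implied by the
     * question above: x * 9 can be emitted as
     *   imull $9, %edi, %eax            (one multiply)
     * or as
     *   leal  (%rdi,%rdi,8), %eax       (one lea, no multiplier latency)
     * and which is better depends on the multiply latency of the
     * target. */
    int times9(int x) {
      return x * 9;
    }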

◆ nounwind

We currently generate a but we really shouldn't: eax ecx xorl edx divl ecx eax divl ecx movl eax ret. A similar code sequence works for division. We currently compile i32 v2 nounwind
Initial value:
{
%tmp2 = icmp eq i8 %d, 0

Definition at line 973 of file README.txt.

Referenced by abort_gzip(), AANoUnwindFunction::trackStatistics(), and AANoUnwindCallSite::trackStatistics().

◆ object