LLVM  16.0.0git
#include <string.h>
#include <altivec.h>

## Typedefs

using C = (vector float) vec_cmpeq(*A, *B)

## Functions

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector registers, to generate better spill code. The first should be a single lvx from the constant pool; the second should be a xor/stvx:

bar (x)

void foo (void)

if (!vec_any_eq(*A, *B)) *B

int foo (vector float *x, vector float *y)

if (vec_all_ge(a, b)) aa |= 0x1;

if (vec_any_ge(a, b)) aa |= 0x2;

vector float f (vector float a, vector float b)

## Variables

Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector registers, to generate better spill code. The first should be a single lvx from the constant pool; the second should be a xor/stvx.

Altivec: codegen'ing MUL with vector FMADD should add -0.0, not 0.0.

With Altivec we can do better; consider this:

v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X }

Since we know that Vector is 16-byte aligned and we know the element offset of .X, we should change the load into a lve*x instruction, instead of doing a load/store/lve*x sequence.

Implement passing vectors by value into calls and receiving them as arguments.

GCC apparently tries to codegen { C1, C2, Variable, C3 } as a constant pool load of C1/C2/C3, then a load and vperm of Variable.

We need a way to teach tblgen that some operands of an intrinsic are required to be constants. The verifier should enforce this constraint.

We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte aligned stack slot, followed by a load/vperm. We should probably just store it to a scalar stack slot, then use lvsl/vperm to load it. If the value is already in memory, this is a big win.

extract_vector_elt of an arbitrary constant vector can be done with the following instructions:

    vTemp = vec_splat(v0, 2);    // 2 is the requested element
    vec_ste(&destloc, 0, vTemp);

*A = C;

we get the following basic block:

        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp v4, v3, v2
        vcmpeqfp. v2, v3, v2
        bne cr6, LBB1_2 ; cond_next

The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the vcmpeqfp result is used by a branch. This can be improved.

The code generated for this is truly awful:

    vector float test(float a, float b) {
      return (vector float){ 0.0, a, 0.0, 0.0 };
    }

LCPI1_0: ; float
        .space 4
        .text
        .globl _test
        .align 4
_test:
        mfspr r2, 256
        oris r3, r2, 4096
        mtspr 256, r3
        lis r3, ha16(LCPI1_0)
        addi r4, r1, -32
        stfs f1, -16(r1)
        addi r5, r1, -16
        lfs f0, lo16(LCPI1_0)(r3)
        stfs f0, -32(r1)
        lvx v2, 0, r4
        lvx v3, 0, r5
        vmrghw v3, v3, v2
        vspltw v2, v3, 0
        vmrghw v2, v2, v3
        mtspr 256, r2
        blr

A predicate compare being used in a select_cc should have the same peephole applied to it as a predicate compare used by a br_cc. There should be no mfcr here:

        oris r5, r2, ...
        li r5, ...
        li r6, ...
        lvx v2, 0, r4
        lvx v3, 0, r3
        vcmpeqfp. v2, v3, v2
        mfcr r3
        rlwinm r3, ...
        cmpwi cr0, r3, 0
        bne cr0, LBB1_2

LBB1_1: ; entry
        mr r6, ...
LBB1_2: ; entry
        mr r3, r6
        mtspr 256, r2
        blr

CodeGen/PowerPC/vec_constants.ll has an and operation that should be codegen'd to andc. The issue is that the 'all ones' build vector is SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected, which prevents the vnot pattern from matching. An alternative to the store/store/load approach for illegal insert element lowering would be:

- lvsl 0; splat index; vcmpeq to generate a select mask
- lvsl slot + x; vperm to rotate result into correct slot
- vsel result together

Should codegen branches on vec_any/vec_all to avoid mfcr. Two examples:

return aa

We should do a little better with eliminating dead stores. The stores to the stack are dead, since %a and %b are not needed:

; Function Attrs: nounwind readnone
define <16 x i8> @test() #0 {
entry:
  %a = alloca <16 x i8>, align 16
  %b = alloca <16 x i8>, align 16
  store <16 x i8> <i8 1, i8 2, i8 3, i8 4, i8 5, i8 6, i8 7, i8 8, i8 9, i8 10, i8 11, i8 12, i8 13, i8 14, i8 15, i8 16>, <16 x i8>* %a, align 16
  store <16 x i8> <i8 113, i8 114, i8 115, i8 116, i8 117, i8 118, i8 119, i8 120, i8 121, i8 122, i8 123, i8 124, i8 125, i8 126, i8 127, i8 112>, <16 x i8>* %b, align 16
  %0 = load <16 x i8>* %a, align 16
  %1 = load <16 x i8>* %b, align 16
  %2 = call <16 x i8> @llvm.ppc.altivec.crypto.vpmsumb(<16 x i8> %0, <16 x i8> %1)
  ret <16 x i8> %2
}

Produces the following code with -mtriple=powerpc64-unknown-linux-gnu:

        addis 3, 2, .LCPI0_0@toc@ha
        addis 4, 2, .LCPI0_1@toc@ha
        addi 3, 3, .LCPI0_0@toc@l
        addi 4, 4, .LCPI0_1@toc@l
        lxvw4x 0, 0, 3
        lxvw4x 35, 0, 4
        stxvw4x 0, 0, 4
        stxvw4x 35, 0, 3
        lvx 2, 0, 4
        lvx 3, 0, 3
        vpmsumb 2, 2, 3
        blr

The two stxvw4x instructions are not needed; the associated permutes are present too. The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll:

## ◆ C

 using C = (vector float)vec_cmpeq(*A, *B)

Definition at line 86 of file README_ALTIVEC.txt.

## ◆ bar()

 bar ( x )

Referenced by foo().

## ◆ f()

 vector float f ( vector float a, vector float b )

Definition at line 200 of file README_ALTIVEC.txt.

References a, and b.

## ◆ foo() [1/2]

 int foo ( vector float * x, vector float * y )

Definition at line 137 of file README_ALTIVEC.txt.

## ◆ foo() [2/2]

 void foo ( void )

Definition at line 17 of file README_ALTIVEC.txt.

References llvm::support::aligned, bar(), and x.

## ◆ ha16()

 ha16 ( LCPI1_0 )

## ◆ if() [1/3]

 if ( !vec_any_eq(*A, *B) )

## ◆ if() [2/3]

 if ( vec_all_ge(a, b) )

## ◆ if() [3/3]

 if ( vec_any_ge(a, b) )

## ◆ LCPI1_0()

 LCPI1_0 ( r3 )

## Variable Documentation


## ◆ _test

 _test

Definition at line 118 of file README_ALTIVEC.txt.

## ◆ a

 <16 x i8> * a

Definition at line 217 of file README_ALTIVEC.txt.

Referenced by f().

## ◆ aa

 return aa

Definition at line 197 of file README_ALTIVEC.txt.

 addi

Definition at line 233 of file README_ALTIVEC.txt.

 addis

Definition at line 232 of file README_ALTIVEC.txt.

## ◆ align

 align = load <16 x i8>* %a

Definition at line 219 of file README_ALTIVEC.txt.

## ◆ aweful

 The vcmpeqfp/vcmpeqfp. instructions currently cannot be merged when the vcmpeqfp result is used by a branch. This can be improved. The code generated for this is truly awful.

Definition at line 108 of file README_ALTIVEC.txt.

## ◆ b

 <16 x i8> * b
Initial value:
{
  return (vector float){ 0.0, a, 0.0, 0.0 };
}

Definition at line 108 of file README_ALTIVEC.txt.

Referenced by f().

## ◆ be

 CodeGen/PowerPC/vec_constants.ll has an and operation that should be codegen'd to andc. The issue is that the 'all ones' build vector is SelectNodeTo'd a VSPLTISB instruction node before the and/xor is selected, which prevents the vnot pattern from matching. An alternative to the store/store/load approach for illegal insert element lowering would be:

Definition at line 181 of file README_ALTIVEC.txt.

## ◆ codegen

 GCC apparently tries to codegen
Initial value:
{ C1, C2, Variable, C3 } as a constant pool load
of C1/C2/C3, then a load and vperm of Variable

Definition at line 46 of file README_ALTIVEC.txt.

## ◆ cr0

 bne cr0

Definition at line 156 of file README_ALTIVEC.txt.

## ◆ cr6

 bne cr6

Definition at line 99 of file README_ALTIVEC.txt.

## ◆ destloc

 vec_ste(&destloc, 0, vTemp)

Definition at line 67 of file README_ALTIVEC.txt.

## ◆ examples

 Should codegen branches on vec_any/vec_all to avoid mfcr. Two examples are given above.

Definition at line 190 of file README_ALTIVEC.txt.

## ◆ f0

 lfs f0, lo16(LCPI1_0)(r3)

Definition at line 125 of file README_ALTIVEC.txt.

## ◆ f1

 stfs f1

Definition at line 123 of file README_ALTIVEC.txt.

## ◆ here

 A predicate compare being used in a select_cc should have the same peephole applied to it as a predicate compare used by a br_cc. There should be no mfcr here.

Definition at line 147 of file README_ALTIVEC.txt.

## ◆ instruction

 Since we know that Vector is 16-byte aligned and we know the element offset of .X, we should change the load into a lve*x instruction.

Definition at line 37 of file README_ALTIVEC.txt.

## ◆ instructions

 We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte aligned stack slot, followed by a load/vperm. We should probably just store it to a scalar stack slot, then use lvsl/vperm to load it. If the value is already in memory, this is a big win. extract_vector_elt of an arbitrary constant vector can be done with the following instructions.

Definition at line 66 of file README_ALTIVEC.txt.

## ◆ LBB1_1

 LBB1_1

Definition at line 159 of file README_ALTIVEC.txt.

## ◆ LBB1_2

 LBB1_2: ; entry

Definition at line 99 of file README_ALTIVEC.txt.

## ◆ ll

 The two stxvw4x instructions are not needed; the associated permutes are present too. The following example is found in test/CodeGen/PowerPC/vec_add_sub_doubleword.ll.

Definition at line 257 of file README_ALTIVEC.txt.

## ◆ lxvw4x

 lxvw4x

Definition at line 235 of file README_ALTIVEC.txt.

## ◆ mtriple

 Produces the following code with mtriple
Initial value:
=powerpc64-unknown-linux-gnu:

Definition at line 230 of file README_ALTIVEC.txt.

## ◆ mtspr

 mtspr

Definition at line 120 of file README_ALTIVEC.txt.

## ◆ needed

 We should do a little better with eliminating dead stores. The stores to the stack are dead, since %a and %b are not needed.

Definition at line 212 of file README_ALTIVEC.txt.

## ◆ not

 Altivec: codegen'ing MUL with vector FMADD should add -0.0, not 0.0.

Definition at line 28 of file README_ALTIVEC.txt.

## ◆ ori

 ori

Definition at line 239 of file README_ALTIVEC.txt.

## ◆ pool

 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector registers, to generate better spill code. The first should be a single lvx from the constant pool.

Definition at line 8 of file README_ALTIVEC.txt.


## ◆ r1

 r1

Definition at line 122 of file README_ALTIVEC.txt.

## ◆ r2

 r2

Definition at line 119 of file README_ALTIVEC.txt.

## ◆ r3

 mr r3

Definition at line 119 of file README_ALTIVEC.txt.

## ◆ r5

 r5

Definition at line 124 of file README_ALTIVEC.txt.

## ◆ r6

 mr r6

Definition at line 151 of file README_ALTIVEC.txt.

## ◆ registers

 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector registers.

Definition at line 4 of file README_ALTIVEC.txt.

## ◆ slot

 We currently codegen SCALAR_TO_VECTOR as a store of the scalar to a 16-byte aligned stack slot, followed by a load/vperm. We should probably just store it to a scalar stack slot, then use lvsl/vperm to load it.

## ◆ stvx

 Implement PPCInstrInfo::isLoadFromStackSlot/isStoreToStackSlot for vector registers, to generate better spill code. The first should be a single lvx from the constant pool; the second should be a xor/stvx.

Definition at line 12 of file README_ALTIVEC.txt.

## ◆ stxvw4x

 stxvw4x

Definition at line 238 of file README_ALTIVEC.txt.

## ◆ this

 With Altivec we can do better; consider this:

Definition at line 33 of file README_ALTIVEC.txt.

## ◆ v2

 vcmpeqfp v2

Definition at line 98 of file README_ALTIVEC.txt.

## ◆ v3

 vcmpeqfp v3

Definition at line 95 of file README_ALTIVEC.txt.

## ◆ v4

 vcmpeqfp v4

Definition at line 96 of file README_ALTIVEC.txt.

## ◆ Vector2

 v4f32 Vector2 = { Vector.X, Vector.X, Vector.X, Vector.X }

Definition at line 34 of file README_ALTIVEC.txt.

## ◆ vpmsumb

 vpmsumb

Definition at line 243 of file README_ALTIVEC.txt.

## ◆ X

 Vector.X

## ◆ x

 lvsl slot + x

Definition at line 182 of file README_ALTIVEC.txt.

Referenced by foo().
