#ifndef LLVM_MCA_SUPPORT_H
#define LLVM_MCA_SUPPORT_H
unsigned Numerator, Denominator;
: Numerator(Cycles), Denominator(ResourceUnits) {}
assert(Denominator && "Invalid denominator (must be non-zero).");
return (Denominator == 1) ? Numerator : (double)Numerator / Denominator;
assert(Mask && "Processor Resource Mask cannot be zero!");
unsigned NumMicroOps,
#endif // LLVM_MCA_SUPPORT_H
unsigned getNumerator() const
InstructionError(std::string M, const T &MCI)
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
double computeBlockRThroughput(const MCSchedModel &SM, unsigned DispatchWidth, unsigned NumMicroOps, ArrayRef< unsigned > ProcResourceUsage)
Compute the reciprocal block throughput from a set of processor resource cycles.
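The listing shows only the signature of computeBlockRThroughput, not its body. The sketch below is one plausible formulation, assuming the result is the larger of the dispatch bound (NumMicroOps / DispatchWidth) and the highest per-resource pressure (cycles consumed divided by available units). The function name sketchBlockRThroughput and the NumUnits vector are hypothetical stand-ins for information that the real function would read from the MCSchedModel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, LLVM-free sketch. Usage[I] is the number of cycles consumed on
// resource I; NumUnits[I] is an assumed count of units for that resource.
double sketchBlockRThroughput(unsigned DispatchWidth, unsigned NumMicroOps,
                              const std::vector<unsigned> &Usage,
                              const std::vector<unsigned> &NumUnits) {
  assert(DispatchWidth && "Dispatch width must be non-zero.");
  assert(Usage.size() == NumUnits.size() && "One unit count per resource.");

  // Bound imposed by the dispatch stage.
  double Max = static_cast<double>(NumMicroOps) / DispatchWidth;

  // Bound imposed by each resource that is actually consumed.
  for (std::size_t I = 0; I < Usage.size(); ++I) {
    if (!Usage[I])
      continue; // Unused resources do not constrain the block.
    assert(NumUnits[I] && "A consumed resource must have at least one unit.");
    Max = std::max(Max, static_cast<double>(Usage[I]) / NumUnits[I]);
  }
  return Max;
}
```

With a dispatch width of 4, 8 micro-ops, usage {6, 0, 3} and unit counts {2, 1, 1}, the dispatch bound is 2.0 and both consumed resources give 3.0, so the sketch returns 3.0.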
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
ResourceCycles & operator+=(const ResourceCycles &RHS)
This class implements an extremely fast bulk output stream that can only output to a stream.
OutputIt move(R &&Range, OutputIt Out)
Provide wrappers to std::move which take ranges instead of having to pass begin/end explicitly.
Base class for user error types.
std::error_code convertToErrorCode() const override
Convert this error to a std::error_code.
std::error_code inconvertibleErrorCode()
The value returned by this function can be returned from convertToErrorCode for Error values where no...
Machine model for scheduling, bundling, and heuristics.
unsigned getResourceStateIndex(uint64_t Mask)
unsigned countLeadingZeros(T Val, ZeroBehavior ZB=ZB_Width)
Count number of 0's from the most significant bit to the least stopping at the first 1.
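The listing shows that getResourceStateIndex asserts a non-zero mask, and the cross-reference to countLeadingZeros suggests the index is derived from the position of the highest set bit. The sketch below illustrates that idea without LLVM headers. Both function names are hypothetical, and the exact formula (64 minus the leading-zero count) is an assumption, since the body is not part of the listing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable replacement for a count-leading-zeros helper.
// Returns 64 for a zero input, mirroring a "width" zero behavior.
unsigned sketchCountLeadingZeros(uint64_t Val) {
  if (!Val)
    return 64;
  unsigned Count = 0;
  // Walk from the most significant bit down to the first set bit.
  for (uint64_t Bit = uint64_t(1) << 63; !(Val & Bit); Bit >>= 1)
    ++Count;
  return Count;
}

// Assumed mapping: a mask whose highest set bit is bit N maps to index N + 1.
unsigned sketchResourceStateIndex(uint64_t Mask) {
  assert(Mask && "Processor Resource Mask cannot be zero!");
  return 64 - sketchCountLeadingZeros(Mask);
}
```

Under this assumption a mask of 0x1 maps to index 1 and a mask of 0x8 maps to index 4, so index 0 is never produced for a valid mask.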
void log(raw_ostream &OS) const override
Print an error message to an output stream.
unsigned getDenominator() const
ResourceCycles(unsigned Cycles, unsigned ResourceUnits=1)
This class represents the number of cycles per resource (fractions of cycles).
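To make the fraction semantics concrete, here is a minimal sketch assembled from the fragments visible in the listing (the two members, the constructor initializer list, the assert, and the return expression). The class name FractionalCycles and the method name asDouble are hypothetical; the listing does not show how the conversion is declared.

```cpp
#include <cassert>

// Hypothetical stand-in for the cycles-per-resource fraction in the listing.
class FractionalCycles {
  unsigned Numerator, Denominator;

public:
  FractionalCycles(unsigned Cycles, unsigned ResourceUnits = 1)
      : Numerator(Cycles), Denominator(ResourceUnits) {}

  unsigned getNumerator() const { return Numerator; }
  unsigned getDenominator() const { return Denominator; }

  // Assumed name; the body follows the fragment shown in the listing.
  double asDouble() const {
    assert(Denominator && "Invalid denominator (must be non-zero).");
    // Skip the division when the value is a whole number of cycles.
    return (Denominator == 1) ? Numerator : (double)Numerator / Denominator;
  }
};
```

For example, 3 cycles spread over 2 resource units evaluates to 1.5, while FractionalCycles(4) uses the default of one unit and evaluates to 4.0.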
void computeProcResourceMasks(const MCSchedModel &SM, MutableArrayRef< uint64_t > Masks)
Populates vector Masks with processor resource masks.