This pass simplifies certain intrinsic calls when the arguments are uniform.

#include "AMDGPU.h"
#include "GCNSubtarget.h"
#include "llvm/Analysis/DomTreeUpdater.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/UniformityAnalysis.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

Macros
#define	DEBUG_TYPE "amdgpu-uniform-intrinsic-combine"

Functions
static bool	isDivergentUseWithNew (const Use &U, const UniformityInfo &UI, const ValueMap< const Value *, bool > &Tracker)
	Wrapper for querying uniformity info that first checks locally tracked instructions.
static bool	optimizeUniformIntrinsic (IntrinsicInst &II, const UniformityInfo &UI, ValueMap< const Value *, bool > &Tracker)
	Optimizes intrinsic calls whose operands can be proven uniform.
static bool	runUniformIntrinsicCombine (Function &F, const UniformityInfo &UI)
	Iterates over intrinsic calls in the Function to optimize.
 	INITIALIZE_PASS_BEGIN (AMDGPUUniformIntrinsicCombineLegacy, DEBUG_TYPE, "AMDGPU Uniform Intrinsic Combine", false, false)
 	INITIALIZE_PASS_END (AMDGPUUniformIntrinsicCombineLegacy, DEBUG_TYPE, "AMDGPU Uniform Intrinsic Combine", false, false)

Detailed Description

This pass simplifies certain intrinsic calls when the arguments are uniform.

Some transforms in this pass can lead to a situation where an instruction whose operand was previously recognized as statically uniform is later no longer recognized as statically uniform. This is acceptable because program semantics do not (and, for this precise reason, must not) depend on static uniformity; they only ever depend on dynamic uniformity. Every downstream instruction that cares about dynamic uniformity must be convergent, and instruction selection (isel) will introduce v_readfirstlane for such instructions if their operands cannot be proven statically uniform.
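The difference between static and dynamic uniformity can be illustrated with a toy wave model. This is only a sketch under stated assumptions: the 4-lane wave, the `Wave` type, and the helpers `isActive`, `isDynamicallyUniform`, `readFirstLane`, and `laneDerivedConstant` are hypothetical names invented for this example and are not part of LLVM or the pass. The value built by `laneDerivedConstant` is computed from the lane index, so a static analysis would typically have to treat it as divergent, yet it is equal in every lane at run time. In that case a readfirstlane-style broadcast leaves the value unchanged.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane wave model for illustration; not an LLVM or hardware API.
constexpr int kLanes = 4;
using Wave = std::array<uint32_t, kLanes>;

// True when lane L is enabled in the execution mask.
inline bool isActive(uint32_t Exec, int L) { return ((Exec >> L) & 1u) != 0; }

// Dynamic uniformity: at run time, all active lanes hold the same value.
bool isDynamicallyUniform(const Wave &V, uint32_t Exec) {
  bool Seen = false;
  uint32_t First = 0;
  for (int L = 0; L < kLanes; ++L) {
    if (!isActive(Exec, L))
      continue;
    if (!Seen) {
      First = V[L];
      Seen = true;
    } else if (V[L] != First) {
      return false;
    }
  }
  return true;
}

// Models a readfirstlane-style broadcast: each active lane receives the value
// of the first active lane; inactive lanes keep their old value.
Wave readFirstLane(const Wave &V, uint32_t Exec) {
  assert((Exec & ((1u << kLanes) - 1u)) != 0 && "need an active lane");
  int First = 0;
  while (!isActive(Exec, First))
    ++First;
  Wave Out = V;
  for (int L = 0; L < kLanes; ++L)
    if (isActive(Exec, L))
      Out[L] = V[First];
  return Out;
}

// A value derived from the lane index that still ends up equal in every lane,
// because L / kLanes is 0 for all lanes of the wave.
Wave laneDerivedConstant() {
  Wave V{};
  for (int L = 0; L < kLanes; ++L)
    V[L] = static_cast<uint32_t>(L / kLanes) + 7u;
  return V;
}
```

In this model, broadcasting a dynamically uniform value is a no-op, whereas broadcasting a value that truly differs between lanes changes it. This is why correctness can only depend on what the lanes hold at run time, not on what an analysis was able to prove.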

Definition in file AMDGPUUniformIntrinsicCombine.cpp.

Macro Definition Documentation

◆ DEBUG_TYPE

#define DEBUG_TYPE "amdgpu-uniform-intrinsic-combine"

Definition at line 38 of file AMDGPUUniformIntrinsicCombine.cpp.

Function Documentation

◆ INITIALIZE_PASS_BEGIN()

INITIALIZE_PASS_BEGIN	(	AMDGPUUniformIntrinsicCombineLegacy	,
		DEBUG_TYPE	,
		"AMDGPU Uniform Intrinsic Combine"	,
		false	,
		false	)

References DEBUG_TYPE and INITIALIZE_PASS_DEPENDENCY.

◆ isDivergentUseWithNew()

bool isDivergentUseWithNew	(	const Use &	U,
		const UniformityInfo &	UI,
		const ValueMap< const Value *, bool > &	Tracker )

static

Wrapper for querying uniformity info that first checks locally tracked instructions.

Definition at line 47 of file AMDGPUUniformIntrinsicCombine.cpp.

References llvm::ValueMap< KeyT, ValueT, Config >::end(), llvm::ValueMap< KeyT, ValueT, Config >::find(), and llvm::GenericUniformityInfo< ContextT >::isDivergentUse().

Referenced by optimizeUniformIntrinsic().
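The "check the local tracker first, then fall back to the analysis" lookup described for isDivergentUseWithNew() can be sketched without any LLVM headers. Everything below is a stand-in: `Value`, `Tracker`, `DivergenceQuery`, and `isDivergentWithTracker` are hypothetical names, `std::unordered_map` replaces `ValueMap`, and a callback replaces the `UniformityInfo` query. The meaning of the stored `bool` (here, `true` means "known uniform") is an assumption; this page does not state the polarity.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>

// Hypothetical stand-in for an IR value; only its address is used as a key.
struct Value {
  int Id;
};

// Stand-in for ValueMap<const Value *, bool>.
// ASSUMPTION: a stored `true` means the value is known to be uniform.
using Tracker = std::unordered_map<const Value *, bool>;

// Stand-in for the cached uniformity analysis: returns true if divergent.
using DivergenceQuery = std::function<bool(const Value *)>;

// Consults the locally tracked values first and only asks the (possibly
// stale) analysis when the value has no tracker entry.
bool isDivergentWithTracker(const Value *V, const DivergenceQuery &Analysis,
                            const Tracker &T) {
  assert(V && "value must not be null");
  auto It = T.find(V);
  if (It != T.end())
    return !It->second; // tracked: divergent exactly when not marked uniform
  return Analysis(V);   // untracked: defer to the analysis
}
```

A tracker entry overrides the analysis in both directions, which matters when the analysis result was computed before the tracked value existed or changed.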

◆ optimizeUniformIntrinsic()

bool optimizeUniformIntrinsic	(	IntrinsicInst &	II,
		const UniformityInfo &	UI,
		ValueMap< const Value *, bool > &	Tracker )

static

Optimizes intrinsic calls whose operands can be proven uniform.

We deliberately do not simplify readfirstlane with a uniform argument, so that frontends can use it to force a copy to SGPR and thereby prevent the backend from generating unwanted waterfall loops.

Definition at line 56 of file AMDGPUUniformIntrinsicCombine.cpp.

References Changed, llvm::BinaryOperator::CreateNot(), llvm::dbgs(), llvm::dyn_cast(), llvm::Intrinsic::getOrInsertDeclaration(), llvm::CmpInst::ICMP_EQ, llvm::CmpInst::ICMP_NE, II, isDivergentUseWithNew(), LLVM_DEBUG, llvm::PatternMatch::m_Zero(), llvm::make_early_inc_range(), llvm::PatternMatch::match(), and Mod.
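The references to ICMP_EQ, ICMP_NE, m_Zero(), and CreateNot() suggest that at least one rewrite handles comparisons of an intrinsic result against zero. One plausible reading, which this page does not confirm, is a ballot-style intrinsic with a uniform condition: the ballot is then either zero or the full set of active lanes, so comparing it with zero tells you nothing beyond the condition itself. The sketch below checks that identity on a toy model. The 4-lane wave and the names `ballotUniform` and `rewriteHolds` are hypothetical and are not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 4-lane wave model for illustration; not an LLVM or hardware API.
constexpr int kLanes = 4;
constexpr uint32_t kFullMask = (1u << kLanes) - 1u;

// Models a ballot over a condition that is the same in every active lane:
// bit L of the result is set when lane L is active and the condition holds.
uint32_t ballotUniform(bool Cond, uint32_t Exec) {
  assert((Exec & kFullMask) != 0 && "need at least one active lane");
  uint32_t Mask = 0;
  for (int L = 0; L < kLanes; ++L)
    if (((Exec >> L) & 1u) != 0 && Cond)
      Mask |= 1u << L;
  return Mask;
}

// Checks, for every non-empty execution mask and both condition values, that
//   ballot(Cond) != 0  is equivalent to  Cond
//   ballot(Cond) == 0  is equivalent to  !Cond
bool rewriteHolds() {
  for (uint32_t Exec = 1; Exec <= kFullMask; ++Exec) {
    for (int C = 0; C < 2; ++C) {
      const bool Cond = (C != 0);
      const uint32_t B = ballotUniform(Cond, Exec);
      if ((B != 0) != Cond)
        return false;
      if ((B == 0) != !Cond)
        return false;
    }
  }
  return true;
}
```

Under this reading, a "not equal to zero" comparison could be replaced by the condition and an "equal to zero" comparison by its negation, which would explain the reference to CreateNot().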