Developer Guideline for AMDGPU
Introduction
This document highlights coding conventions, test policies, and other
development guidelines that apply to all AMDGPU-related code across the LLVM
project (the backend in llvm/lib/Target/AMDGPU, Clang AMDGPU support, LLD,
associated tests, etc.). It is not a replacement for or summary of the
LLVM Coding Standards or the
LLVM Testing Guide; contributors
are expected to be familiar with those documents as well.
The topics covered here are those that come up frequently during AMDGPU code reviews. Some overlap with existing upstream rules and are restated here for easy reference.
Coding Standards
AMDGPU-related code follows the LLVM Coding Standards with the refinements listed below.
Use of auto
The LLVM Coding Standards describe the policy for auto in
Use auto Type Deduction to Make Code More Readable.
Below are more concrete examples of how that policy applies in AMDGPU code.
Do not use auto except in the following cases:
Lambda expressions:
auto Pred = [](unsigned Val) { return Val > 0; };
Casts where the target type is already spelled out on the right-hand side (cast, dyn_cast, static_cast, etc.):
auto *Inst = cast<CallInst>(V);
auto *MD = dyn_cast<MDNode>(Op);
auto Width = static_cast<unsigned>(Val);
Iterators:
auto It = Container.begin();
Structured bindings:
auto [Key, Value] = Pair;
In all other cases, write the type explicitly.
// Avoid - the type is not obvious from the right-hand side.
auto Reg = MI.getOperand(0).getReg();
auto Size = DL.getTypeAllocSize(Ty);
// Preferred - spell out the type.
Register Reg = MI.getOperand(0).getReg();
TypeSize Size = DL.getTypeAllocSize(Ty);
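The permitted cases can also be seen together in one place. The following is a minimal sketch that uses only the C++ standard library so that it compiles on its own; countPositiveWidths and truncateToUnsigned are hypothetical helpers invented for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical helper, not an LLVM API: counts the entries whose value is
// positive, using auto only in the cases the list above permits.
unsigned countPositiveWidths(const std::map<std::string, int64_t> &Widths) {
  // Lambda expression: the closure type cannot be spelled, so auto is fine.
  auto IsPositive = [](int64_t Val) { return Val > 0; };

  // Everything else gets an explicit type.
  unsigned Count = 0;
  // Iterator: the full type is long and adds nothing for the reader.
  for (auto It = Widths.begin(); It != Widths.end(); ++It) {
    // Structured binding: auto is the only spelling the language allows.
    const auto &[Name, Width] = *It;
    (void)Name; // only the value is needed in this sketch
    if (IsPositive(Width))
      ++Count;
  }
  return Count;
}

// Hypothetical helper: the cast names the target type on the right-hand
// side, so auto does not hide anything.
unsigned truncateToUnsigned(int64_t Val) {
  auto Narrow = static_cast<unsigned>(Val);
  return Narrow;
}
```

Note that Count and the function return types are written out explicitly, because nothing on the right-hand side would tell the reader what they are.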
Use of Braces
The LLVM Coding Standards discuss brace usage in Don’t Use Braces on Simple Single-Statement Bodies of if/else/loop Statements. In AMDGPU code, braces may be omitted only when the single statement fits on one line. If the statement spans more than one line (e.g. because of a long argument list that wraps), keep the braces.
For if/else chains, if either branch requires braces, add braces to
all branches to keep them symmetric.
// OK - single statement on one line, braces omitted.
for (unsigned I = 0; I < N; ++I)
  doSomething(I);
// Required - single statement, but spans multiple lines.
if (Cond) {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}
// Required - the else branch needs braces, so the if branch gets them too.
if (Cond) {
  doSomething();
} else {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}
// Avoid - asymmetric braces.
if (Cond)
  doSomething();
else {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}
Instruction Naming
Instruction names in TableGen definitions (e.g. in VOP3PInstructions.td,
VOP2Instructions.td, etc.) should follow the terminology of the ISA
documentation. Use all-caps names that match the documentation’s name for
at least one target. When the compiler needs a variant of a documented
instruction (e.g. a version with additional property flags or extra register
uses), append a lowercase suffix to distinguish it from the canonical name.
// Good - matches the ISA documentation exactly.
defm V_ADD_F32 : VOP2Inst_VOPD <"v_add_f32", ...>;
// Good - lowercase suffix for a compiler-invented variant.
defm V_ADD_F32_e64 : ...;
// Avoid - deviates from the documented name without reason.
defm V_Add_F32 : ...;
// Avoid - all-caps suffix for a compiler-invented variant;
// use a lowercase suffix instead.
defm V_ADD_F32_E64 : ...;
Error and Diagnostic Messages
Messages passed to assert, llvm_unreachable, report_fatal_error,
diagnostic handlers, and similar should not start with an uppercase letter.
Use lowercase as if the message were a continuation of a sentence, not the
beginning of one. Do not end the message with a period.
// Good
report_fatal_error("malformed block");
// Avoid
report_fatal_error("Malformed block");
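The two rules above (no leading uppercase letter, no trailing period) are mechanical enough to express as a predicate. The sketch below is only an illustration of the rule; followsMessageStyle is a hypothetical helper and no such check is known to exist in LLVM.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical lint helper, not part of LLVM: returns true if Msg follows
// the message style described above.
bool followsMessageStyle(std::string_view Msg) {
  if (Msg.empty())
    return false;
  // Must not start with an uppercase letter.
  if (std::isupper(static_cast<unsigned char>(Msg.front())))
    return false;
  // Must not end with a period.
  if (Msg.back() == '.')
    return false;
  return true;
}
```

A real check would probably need exceptions that this sketch does not model, for example messages that begin with an identifier or acronym such as a register class name.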
Design Practices
Prefer Feature Checks over Generation Checks
Avoid conditioning logic on a specific GPU generation (e.g. isGFX11()).
Generation checks are fragile: when a new generation ships, every such check
must be audited to decide whether it should include the new generation too.
Instead, query the specific capability that the code actually depends on via a
feature predicate (e.g. hasFeatureX()). Feature predicates are
self-documenting, compose better across generations, and do not require updates
when a new generation is added.
// Avoid - ties the logic to a specific generation.
if (ST.isGFX11())
  handleNewBehaviour();
// Preferred - checks the actual capability.
if (ST.hasPackedFP32Ops())
  handleNewBehaviour();
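To make the fragility concrete, the following self-contained sketch models a subtarget with a generation and one feature bit. MockSubtarget, the Generation enum, and both usesNewPath functions are assumptions made up for this illustration; the real subtarget class is far richer.

```cpp
#include <cassert>

// Hypothetical stand-in for a subtarget class. All names here are
// illustrative assumptions, not the real AMDGPU interfaces.
enum class Generation { GFX9, GFX10, GFX11, GFX12 };

struct MockSubtarget {
  Generation Gen;
  bool HasFeatureX; // set once, where the target is defined

  bool isGFX11() const { return Gen == Generation::GFX11; }
  bool hasFeatureX() const { return HasFeatureX; }
};

// Fragile: silently excludes any later generation that has the feature.
bool usesNewPathByGeneration(const MockSubtarget &ST) { return ST.isGFX11(); }

// Preferred: follows the capability wherever it is available.
bool usesNewPathByFeature(const MockSubtarget &ST) { return ST.hasFeatureX(); }
```

With a hypothetical newer target such as MockSubtarget{Generation::GFX12, true}, the generation-based check returns false even though the feature is present, while the feature-based check needs no update.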
Prefer Separate Opcodes over Subtarget Checks
When an instruction has different properties on different subtargets (e.g.
different implicit register uses, different scheduling info, or different
encoding constraints), define separate opcodes for each variant rather than
using a single opcode and scattering if checks on the subtarget throughout
the code. Distinct opcodes keep TableGen definitions self-contained, make
scheduling and register allocation more accurate, and avoid a class of bugs
where a subtarget check is accidentally omitted.
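The shape of this design can be sketched outside of TableGen. In the following self-contained illustration, the opcode names, the SketchSubtarget struct, the feature bit, and every property value are invented; the point is only that the subtarget is consulted once, when the opcode is chosen, and that later queries depend on the opcode alone.

```cpp
#include <cassert>

// Hypothetical opcodes: one documented instruction split into two variants
// whose properties differ between subtargets. The names and all property
// values are invented for this sketch.
enum Opcode : unsigned { V_EXAMPLE_F32_a, V_EXAMPLE_F32_b, NumOpcodes };

struct OpcodeDesc {
  bool ReadsMode;   // implicit use of a mode register
  unsigned Latency; // scheduling latency in cycles
};

// Per-opcode properties live in one table, much as generated tables would.
constexpr OpcodeDesc Descs[NumOpcodes] = {
    /*V_EXAMPLE_F32_a=*/{true, 4},
    /*V_EXAMPLE_F32_b=*/{false, 2},
};

struct SketchSubtarget {
  bool HasFeatureX; // hypothetical feature bit
};

// The subtarget is consulted once, at selection time.
Opcode selectExampleOpcode(const SketchSubtarget &ST) {
  return ST.HasFeatureX ? V_EXAMPLE_F32_b : V_EXAMPLE_F32_a;
}

// Later passes query the opcode alone, so no subtarget check can be omitted.
bool readsMode(Opcode Opc) { return Descs[Opc].ReadsMode; }
unsigned getLatency(Opcode Opc) { return Descs[Opc].Latency; }
```

The alternative, a single opcode with readsMode(Opc, ST) and getLatency(Opc, ST), would require every caller to have the subtarget at hand and to remember to pass it.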
Document New Builtins
All new AMDGPU builtins must have documentation added in
clang/include/clang/Basic/BuiltinsAMDGPUDocs.td. The documentation entry
should be included in the same patch that introduces the builtin.
Pull Requests
Keep Changes Focused
Each pull request should contain one logical change. Unrelated
modifications - formatting fixes, variable renames, whitespace cleanup, etc. -
should be submitted as separate PRs, even if they touch the same files. Mixing
unrelated changes into a functional PR makes review harder, obscures the intent
of the change in git log, and complicates reverts if something goes wrong.
Test Policy
Well-written tests are essential for a healthy codebase. The guidelines below
apply to all AMDGPU regression tests (llvm/test/CodeGen/AMDGPU,
llvm/test/MC/AMDGPU, etc.). See also the general
Best practices for regression tests
and the
Precommit workflow for tests
in the LLVM Testing Guide.
Use Minimal, Reduced Tests
Every test should be the smallest input that exercises the behaviour under test. Avoid copying a full function from a real workload and pasting it into a test file. Instead, reduce the input so that it contains only the instructions and control flow needed to trigger the relevant code path. A minimal test is easier to understand, faster to run, and less likely to break for unrelated reasons.
Avoid Undefined Behavior
Tests should not rely on undefined behavior (UB). As the
best practices section of the Testing Guide
notes, avoid undef and poison values unless they are the point of the
test - patterns like br i1 undef are likely to break as future
optimizations evolve.
In addition, avoid loads from or stores to null unless the test targets an
address space where address zero is a valid memory location rather than a null
pointer. For example, on AMDGPU, address space 0 (generic/flat) treats zero as
a null pointer, but address space 3 (LDS) does not, so a load from
ptr addrspace(3) null can be valid.
; Avoid - null is a null pointer in addrspace(0).
define void @example_bad() {
  %val = load i32, ptr null
  ret void
}
; OK - addrspace(3) has no null pointer, address zero is valid.
define void @example_ok() {
  %val = load i32, ptr addrspace(3) null
  ret void
}
Use Named Values
Prefer descriptive, named IR values over anonymous numbered values. Names serve as lightweight documentation and make it much easier to understand a test’s intent at a glance.
; Preferred - names describe what each value represents.
define float @fma_example(float %x, float %y, float %z) {
  %fma = call float @llvm.fma.f32(float %x, float %y, float %z)
  ret float %fma
}
; Avoid - anonymous numbers reveal nothing about intent.
define float @fma_example(float %0, float %1, float %2) {
  %4 = call float @llvm.fma.f32(float %0, float %1, float %2)
  ret float %4
}
Use Compact Virtual Register Numbers in MIR Tests
In MIR tests, virtual register numbers should be compact and start from
%0. Avoid leaving gaps or starting at arbitrary high numbers (e.g.
%128, %256). Sparse numbering makes tests harder to follow and
suggests the test was extracted from a larger function without proper
reduction.
# Preferred - compact, sequential numbering.
%0:vgpr_32 = COPY $vgpr0
%1:vgpr_32 = COPY $vgpr1
%2:vgpr_32 = V_ADD_U32_e32 %0, %1, implicit $exec
# Avoid - sparse numbering with gaps.
%128:vgpr_32 = COPY $vgpr0
%130:vgpr_32 = COPY $vgpr1
%255:vgpr_32 = V_ADD_U32_e32 %128, %130, implicit $exec
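Stated precisely, "compact" means that the virtual register numbers used in a function are exactly 0 through N-1 with no gaps. The following sketch expresses that definition as a predicate; isCompactNumbering is a hypothetical helper written for this guide, not an existing LLVM utility.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper, not part of LLVM: returns true if the given virtual
// register numbers are exactly 0..N-1, in any order and with repeats allowed.
bool isCompactNumbering(std::vector<unsigned> VRegs) {
  // Sort and deduplicate so each register is considered once.
  std::sort(VRegs.begin(), VRegs.end());
  VRegs.erase(std::unique(VRegs.begin(), VRegs.end()), VRegs.end());
  // After sorting, the register at position I must be numbered I.
  for (unsigned I = 0; I < VRegs.size(); ++I) {
    if (VRegs[I] != I)
      return false;
  }
  return true;
}
```

Applied to the snippets above, the registers {0, 1, 2} satisfy the predicate and {128, 130, 255} do not.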
Trim Unnecessary Attributes and Metadata
Strip attributes, metadata, and other annotations that are not relevant to the behaviour being tested. Extra noise makes it harder to see what a test actually depends on and can cause spurious failures when defaults change.
For example, unless a test specifically exercises a particular function attribute or metadata node, remove them:
; Preferred - only the essential attributes remain.
define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) {
  store i32 %val, ptr addrspace(1) %ptr
  ret void
}
; Avoid - unrelated attributes and metadata obscure the test's purpose.
define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) #0 !dbg !5 {
  store i32 %val, ptr addrspace(1) %ptr, align 4, !tbaa !11
  ret void
}
attributes #0 = { nounwind "frame-pointer"="all" }
Include Negative Tests
Changes that introduce new restrictions, validations, or user-facing constructs should include negative tests that verify the correct diagnostic or rejection. Cases that require negative tests include, but are not limited to:
New builtins: verify that wrong argument types, wrong argument counts, and unsupported target features produce the expected Sema errors (e.g. clang/test/SemaOpenCL/builtins-amdgcn-error.cl).
New backend instructions: verify that the assembler rejects invalid operands, illegal modifiers, and unsupported subtargets (e.g. llvm/test/MC/AMDGPU/gfx950_err.s).
New target types or features: verify that incompatible target IDs, missing features, and invalid subtarget combinations are diagnosed (e.g. clang/test/Driver/invalid-target-id.cl).
Cover All Code Paths
Tests for a PR should ideally cover all of the code changes introduced by that
PR. When adding a new instruction, for example, this means testing all
supported combinations of operand kinds (VGPR, SGPR, immediate, inline
constant, literal constant) as well as applicable modifiers (opsel,
neg_lo, neg_hi, clamp, etc.). The goal is to ensure that every
encoding and selection path exercised by the new code is verified, so that
regressions in any variant are caught immediately.
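One way to avoid forgetting a combination is to write the matrix down before writing the test. The sketch below generates such a checklist by pairing every operand kind with no modifier and with each single modifier. enumerateTestCases is a hypothetical helper, and the result is only a starting point: it has to be filtered down to the combinations the instruction actually supports, and it does not cover several modifiers applied at once.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical checklist generator, not part of LLVM: pairs every operand
// kind with "no modifier" and with each single modifier.
std::vector<std::string>
enumerateTestCases(const std::vector<std::string> &OperandKinds,
                   const std::vector<std::string> &Modifiers) {
  std::vector<std::string> Cases;
  for (const std::string &Kind : OperandKinds) {
    Cases.push_back(Kind); // the bare operand, no modifier
    for (const std::string &Mod : Modifiers)
      Cases.push_back(Kind + " + " + Mod);
  }
  return Cases;
}
```

For K operand kinds and M modifiers this yields K * (M + 1) entries, which gives a quick sense of how many test functions or assembly lines a new instruction needs.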
Pipe Input via stdin
Where feasible, feed the test file through stdin using < %s rather than
passing it as a positional argument. This is the conventional style in AMDGPU
tests and avoids the need for an explicit -o -.
; llc — preferred.
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s
; opt — preferred.
; RUN: opt -S -mtriple=amdgcn -passes=instcombine < %s | FileCheck %s
; llvm-mc — preferred.
; RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -show-encoding < %s | FileCheck %s
This is not always possible. For example, llc infers the input format from
the file extension; when reading from stdin it defaults to IR, so MIR tests
need to pass the file as a positional argument:
# MIR - pass the file directly so llc sees the .mir extension.
# RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass=... %s -o - | FileCheck %s
Use -filetype=null When Output Is Irrelevant
When a test only needs to verify diagnostics, error messages, or the absence of
a crash - and does not care about the actual code-generation output - pass
-filetype=null to llc. This skips object or assembly emission entirely,
making the test faster and avoiding fragile CHECK lines tied to unrelated
output.
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s
Auto-Generate Check Lines
When possible, use the UTC (Update Test Checks) scripts to generate CHECK
lines rather than writing them by hand. Auto-generated checks are
comprehensive, consistent, and easy to update when output changes.
The most commonly used scripts for AMDGPU are:
llvm/utils/update_llc_test_checks.py - for llc CodeGen tests.
llvm/utils/update_mir_test_checks.py - for MIR tests.
llvm/utils/update_mc_test_checks.py - for MC (assembly/disassembly) tests.
A typical workflow looks like:
# Write the test with a RUN line but no CHECK lines, then auto-generate:
$ llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AMDGPU/my-test.ll
# After a code change that intentionally alters output, re-generate:
$ llvm/utils/update_llc_test_checks.py --update-only llvm/test/CodeGen/AMDGPU/my-test.ll
Each script embeds a UTC_ARGS: comment in the test file so that subsequent
runs of the script use the same options. Consult the --help output of each
script for the full set of available flags.
When writing check lines by hand, prefer CHECK-NEXT over CHECK-NOT.
A negative match can keep passing vacuously when the output changes - the
CHECK-NOT pattern may no longer appear for entirely unrelated reasons, and
the test will still pass without actually verifying the intended behaviour.
CHECK-NEXT ties the assertion to a specific position in the output, so any
unexpected change causes a visible failure.
Note
Hand-written CHECK lines are still appropriate when a test needs to
verify only a narrow slice of the output (e.g. a single instruction) or
when the auto-generated output would be excessively verbose and obscure the
intent. In such cases, keep the hand-written checks focused and document
why auto-generation was not used.
