Developer Guideline for AMDGPU

Introduction

This document highlights coding conventions, test policies, and other development guidelines that apply to all AMDGPU-related code across the LLVM project (the backend in llvm/lib/Target/AMDGPU, Clang AMDGPU support, LLD, associated tests, etc.). It is not a replacement for or summary of the LLVM Coding Standards or the LLVM Testing Guide; contributors are expected to be familiar with those documents as well.

The topics covered here are those that come up frequently during AMDGPU code reviews. Some overlap with existing upstream rules and are restated here for easy reference.

Coding Standards

AMDGPU-related code follows the LLVM Coding Standards with the refinements listed below.

Use of auto

The LLVM Coding Standards describe the policy for auto in Use auto Type Deduction to Make Code More Readable. Below are more concrete examples of how that policy applies in AMDGPU code.

Do not use auto except in the following cases:

  • Lambda expressions:

    auto Pred = [](unsigned Val) { return Val > 0; };
    
  • Casts where the target type is already spelled out on the right-hand side (cast, dyn_cast, static_cast, etc.):

    auto *Inst = cast<CallInst>(V);
    auto *MD = dyn_cast<MDNode>(Op);
    auto Width = static_cast<unsigned>(Val);
    
  • Iterators:

    auto It = Container.begin();
    
  • Structured bindings:

    auto [Key, Value] = Pair;
    

In all other cases, write the type explicitly.

// Avoid - the type is not obvious from the right-hand side.
auto Reg = MI.getOperand(0).getReg();
auto Size = DL.getTypeAllocSize(Ty);

// Preferred - spell out the type.
Register Reg = MI.getOperand(0).getReg();
TypeSize Size = DL.getTypeAllocSize(Ty);

Use of Braces

The LLVM Coding Standards discuss brace usage in Don’t Use Braces on Simple Single-Statement Bodies of if/else/loop Statements. In AMDGPU code, braces may be omitted only when the single statement fits on one line. If the statement spans more than one line (e.g. because of a long argument list that wraps), keep the braces.

For if/else chains, if either branch requires braces, add braces to all branches to keep them symmetric.

// OK - single statement on one line, braces omitted.
for (unsigned I = 0; I < N; ++I)
  doSomething(I);

// Required - single statement, but spans multiple lines.
if (Cond) {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}

// Required - the else branch needs braces, so the if branch gets them too.
if (Cond) {
  doSomething();
} else {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}

// Avoid - asymmetric braces.
if (Cond)
  doSomething();
else {
  doSomethingElse(LongArgument1,
                  LongArgument2);
}

Instruction Naming

Instruction names in TableGen definitions (e.g. in VOP3PInstructions.td, VOP2Instructions.td, etc.) should follow the terminology of the ISA documentation. Use all-caps names that match the documentation’s name for at least one target. When the compiler needs a variant of a documented instruction (e.g. a version with additional property flags or extra register uses), append a lowercase suffix to distinguish it from the canonical name.

// Good - matches the ISA documentation exactly.
defm V_ADD_F32 : VOP2Inst_VOPD <"v_add_f32", ...>;

// Good - lowercase suffix for a compiler-invented variant.
defm V_ADD_F32_e64 : ...;

// Avoid - deviates from the documented name without reason.
defm V_Add_F32 : ...;

// Avoid - all-caps suffix for a compiler-invented variant;
// use a lowercase suffix instead.
defm V_ADD_F32_E64 : ...;

Error and Diagnostic Messages

Messages passed to assert, llvm_unreachable, report_fatal_error, diagnostic handlers, and similar should not start with an uppercase letter. Use lowercase as if the message were a continuation of a sentence, not the beginning of one. Do not end the message with a period.

// Good
report_fatal_error("malformed block");

// Avoid
report_fatal_error("Malformed block");

Design Practices

Prefer Feature Checks over Generation Checks

Avoid conditioning logic on a specific GPU generation (e.g. isGFX11()). Generation checks are fragile: when a new generation ships, every such check must be audited to decide whether it should include the new generation too.

Instead, query the specific capability that the code actually depends on via a feature predicate (e.g. hasFeatureX()). Feature predicates are self-documenting, compose better across generations, and do not require updates when a new generation is added.

// Avoid - ties the logic to a specific generation.
if (ST.isGFX11())
  handleNewBehaviour();

// Preferred - checks the actual capability.
if (ST.hasPackedFP32Ops())
  handleNewBehaviour();

Prefer Separate Opcodes over Subtarget Checks

When an instruction has different properties on different subtargets (e.g. different implicit register uses, different scheduling info, or different encoding constraints), define separate opcodes for each variant rather than using a single opcode and scattering if checks on the subtarget throughout the code. Distinct opcodes keep TableGen definitions self-contained, make scheduling and register allocation more accurate, and avoid a class of bugs where a subtarget check is accidentally omitted.

Document New Builtins

All new AMDGPU builtins must have documentation added in clang/include/clang/Basic/BuiltinsAMDGPUDocs.td. The documentation entry should be included in the same patch that introduces the builtin.

Pull Requests

Keep Changes Focused

Each pull request should contain one logical change. Unrelated modifications - formatting fixes, variable renames, whitespace cleanup, etc. - should be submitted as separate PRs, even if they touch the same files. Mixing unrelated changes into a functional PR makes review harder, obscures the intent of the change in git log, and complicates reverts if something goes wrong.

Test Policy

Well-written tests are essential for a healthy codebase. The guidelines below apply to all AMDGPU regression tests (llvm/test/CodeGen/AMDGPU, llvm/test/MC/AMDGPU, etc.). See also the general Best practices for regression tests and the Precommit workflow for tests in the LLVM Testing Guide.

Use Minimal, Reduced Tests

Every test should be the smallest input that exercises the behaviour under test. Avoid copying a full function from a real workload and pasting it into a test file. Instead, reduce the input so that it contains only the instructions and control flow needed to trigger the relevant code path. A minimal test is easier to understand, faster to run, and less likely to break for unrelated reasons.

Avoid Undefined Behavior

Tests should not rely on undefined behavior (UB). As the best practices section of the Testing Guide notes, avoid undef and poison values unless they are the point of the test - patterns like br i1 undef are likely to break as future optimizations evolve.

In addition, avoid loads from or stores to null unless the test targets an address space where address zero is a valid memory location rather than a null pointer. For example, on AMDGPU, address space 0 (generic/flat) treats zero as a null pointer, but address space 3 (LDS) does not, so a load from ptr addrspace(3) null can be valid.

; Avoid - null is a null pointer in addrspace(0).
define void @example_bad() {
  %val = load i32, ptr null
  ret void
}

; OK - addrspace(3) has no null pointer, address zero is valid.
define void @example_ok() {
  %val = load i32, ptr addrspace(3) null
  ret void
}

Use Named Values

Prefer descriptive, named IR values over anonymous numbered values. Names serve as lightweight documentation and make it much easier to understand a test’s intent at a glance.

; Preferred - names describe what each value represents.
define float @fma_example(float %x, float %y, float %z) {
  %fma = call float @llvm.fma.f32(float %x, float %y, float %z)
  ret float %fma
}

; Avoid - anonymous numbers reveal nothing about intent.
define float @fma_example(float %0, float %1, float %2) {
  %4 = call float @llvm.fma.f32(float %0, float %1, float %2)
  ret float %4
}

Use Compact Virtual Register Numbers in MIR Tests

In MIR tests, virtual register numbers should be compact and start from %0. Avoid leaving gaps or starting at arbitrary high numbers (e.g. %128, %256). Sparse numbering makes tests harder to follow and suggests the test was extracted from a larger function without proper reduction.

# Preferred - compact, sequential numbering.
%0:vgpr_32 = COPY $vgpr0
%1:vgpr_32 = COPY $vgpr1
%2:vgpr_32 = V_ADD_U32_e32 %0, %1, implicit $exec

# Avoid - sparse numbering with gaps.
%128:vgpr_32 = COPY $vgpr0
%130:vgpr_32 = COPY $vgpr1
%255:vgpr_32 = V_ADD_U32_e32 %128, %130, implicit $exec

Trim Unnecessary Attributes and Metadata

Strip attributes, metadata, and other annotations that are not relevant to the behaviour being tested. Extra noise makes it harder to see what a test actually depends on and can cause spurious failures when defaults change.

For example, unless a test specifically exercises a particular function attribute or metadata node, remove them:

; Preferred - only the essential attributes remain.
define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) {
  store i32 %val, ptr addrspace(1) %ptr
  ret void
}

; Avoid - unrelated attributes and metadata obscure the test's purpose.
define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) #0 !dbg !5 {
  store i32 %val, ptr addrspace(1) %ptr, align 4, !tbaa !11
  ret void
}

attributes #0 = { nounwind "frame-pointer"="all" }

Include Negative Tests

Changes that introduce new restrictions, validations, or user-facing constructs should include negative tests that verify the correct diagnostic or rejection. Cases that require negative tests include, but are not limited to:

  • New builtins — verify that wrong argument types, wrong argument counts, and unsupported target features produce the expected Sema errors (e.g. clang/test/SemaOpenCL/builtins-amdgcn-error.cl).

  • New backend instructions — verify that the assembler rejects invalid operands, illegal modifiers, and unsupported subtargets (e.g. llvm/test/MC/AMDGPU/gfx950_err.s).

  • New target types or features — verify that incompatible target IDs, missing features, and invalid subtarget combinations are diagnosed (e.g. clang/test/Driver/invalid-target-id.cl).

Cover All Code Paths

Tests for a PR should ideally cover all of the code changes introduced by that PR. When adding a new instruction, for example, this means testing all supported combinations of operand kinds (VGPR, SGPR, immediate, inline constant, literal constant) as well as applicable modifiers (opsel, neg_lo, neg_hi, clamp, etc.). The goal is to ensure that every encoding and selection path exercised by the new code is verified, so that regressions in any variant are caught immediately.

Pipe Input via stdin

Where feasible, feed the test file through stdin using < %s rather than passing it as a positional argument. This is the conventional style in AMDGPU tests and avoids the need for an explicit -o -.

; llc  preferred.
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s

; opt  preferred.
; RUN: opt -S -mtriple=amdgcn -passes=instcombine < %s | FileCheck %s

; llvm-mc  preferred.
; RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -show-encoding < %s | FileCheck %s

This is not always possible. For example, llc infers the input format from the file extension; when reading from stdin it defaults to IR, so MIR tests need to pass the file as a positional argument:

; MIR  pass the file directly so llc sees the .mir extension.
; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass=... %s -o - | FileCheck %s

Use -filetype=null When Output Is Irrelevant

When a test only needs to verify diagnostics, error messages, or the absence of a crash - and does not care about the actual code-generation output - pass -filetype=null to llc. This skips object or assembly emission entirely, making the test faster and avoiding fragile CHECK lines tied to unrelated output.

; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s

Auto-Generate Check Lines

When possible, use the UTC (Update Test Checks) scripts to generate CHECK lines rather than writing them by hand. Auto-generated checks are comprehensive, consistent, and easy to update when output changes.

The most commonly used scripts for AMDGPU are:

  • llvm/utils/update_llc_test_checks.py - for llc CodeGen tests.

  • llvm/utils/update_mir_test_checks.py - for MIR tests.

  • llvm/utils/update_mc_test_checks.py - for MC (assembly/disassembly) tests.

A typical workflow looks like:

# Write the test with a RUN line but no CHECK lines, then auto-generate:
$ llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AMDGPU/my-test.ll

# After a code change that intentionally alters output, re-generate:
$ llvm/utils/update_llc_test_checks.py --update-only llvm/test/CodeGen/AMDGPU/my-test.ll

Each script embeds a UTC_ARGS: comment in the test file so that subsequent runs of the script use the same options. Consult the --help output of each script for the full set of available flags.

When writing check lines by hand, prefer CHECK-NEXT over CHECK-NOT. Negative pattern matches fail silently when the output changes - the CHECK-NOT pattern may no longer appear for entirely unrelated reasons, and the test will still pass without actually verifying the intended behaviour. CHECK-NEXT ties the assertion to a specific position in the output, so any unexpected change causes a visible failure.

Note

Hand-written CHECK lines are still appropriate when a test needs to verify only a narrow slice of the output (e.g. a single instruction) or when the auto-generated output would be excessively verbose and obscure the intent. In such cases, keep the hand-written checks focused and document why auto-generation was not used.