=============================== Developer Guideline for AMDGPU =============================== .. contents:: :local: Introduction ============ This document highlights coding conventions, test policies, and other development guidelines that apply to all AMDGPU-related code across the LLVM project (the backend in ``llvm/lib/Target/AMDGPU``, Clang AMDGPU support, LLD, associated tests, etc.). It is **not** a replacement for or summary of the `LLVM Coding Standards `_ or the `LLVM Testing Guide `_; contributors are expected to be familiar with those documents as well. The topics covered here are those that come up frequently during AMDGPU code reviews. Some overlap with existing upstream rules and are restated here for easy reference. Coding Standards ================ AMDGPU-related code follows the `LLVM Coding Standards `_ with the refinements listed below. Use of ``auto`` --------------- The LLVM Coding Standards describe the policy for ``auto`` in `Use auto Type Deduction to Make Code More Readable `_. Below are more concrete examples of how that policy applies in AMDGPU code. Do **not** use ``auto`` except in the following cases: * Lambda expressions: .. code-block:: c++ auto Pred = [](unsigned Val) { return Val > 0; }; * Casts where the target type is already spelled out on the right-hand side (``cast``, ``dyn_cast``, ``static_cast``, etc.): .. code-block:: c++ auto *Inst = cast(V); auto *MD = dyn_cast(Op); auto Width = static_cast(Val); * Iterators: .. code-block:: c++ auto It = Container.begin(); * Structured bindings: .. code-block:: c++ auto [Key, Value] = Pair; In all other cases, write the type explicitly. .. code-block:: c++ // Avoid - the type is not obvious from the right-hand side. auto Reg = MI.getOperand(0).getReg(); auto Size = DL.getTypeAllocSize(Ty); // Preferred - spell out the type. Register Reg = MI.getOperand(0).getReg(); TypeSize Size = DL.getTypeAllocSize(Ty); Use of Braces ------------- The LLVM Coding Standards discuss brace usage in `Don't Use Braces on Simple Single-Statement Bodies of if/else/loop Statements `_. In AMDGPU code, braces may be omitted **only when the single statement fits on one line**. If the statement spans more than one line (e.g. because of a long argument list that wraps), keep the braces. For ``if``/``else`` chains, if **either** branch requires braces, add braces to **all** branches to keep them symmetric. .. code-block:: c++ // OK - single statement on one line, braces omitted. for (unsigned I = 0; I < N; ++I) doSomething(I); // Required - single statement, but spans multiple lines. if (Cond) { doSomethingElse(LongArgument1, LongArgument2); } // Required - the else branch needs braces, so the if branch gets them too. if (Cond) { doSomething(); } else { doSomethingElse(LongArgument1, LongArgument2); } // Avoid - asymmetric braces. if (Cond) doSomething(); else { doSomethingElse(LongArgument1, LongArgument2); } Instruction Naming ------------------ Instruction names in TableGen definitions (e.g. in ``VOP3PInstructions.td``, ``VOP2Instructions.td``, etc.) should follow the terminology of the ISA documentation. Use **all-caps** names that match the documentation's name for at least one target. When the compiler needs a variant of a documented instruction (e.g. a version with additional property flags or extra register uses), append a **lowercase** suffix to distinguish it from the canonical name. .. code-block:: text // Good - matches the ISA documentation exactly. defm V_ADD_F32 : VOP2Inst_VOPD <"v_add_f32", ...>; // Good - lowercase suffix for a compiler-invented variant. defm V_ADD_F32_e64 : ...; // Avoid - deviates from the documented name without reason. defm V_Add_F32 : ...; // Avoid - all-caps suffix for a compiler-invented variant; // use a lowercase suffix instead. defm V_ADD_F32_E64 : ...; Error and Diagnostic Messages ----------------------------- Messages passed to ``assert``, ``llvm_unreachable``, ``report_fatal_error``, diagnostic handlers, and similar should **not** start with an uppercase letter. Use lowercase as if the message were a continuation of a sentence, not the beginning of one. Do not end the message with a period. .. code-block:: c++ // Good report_fatal_error("malformed block"); // Avoid report_fatal_error("Malformed block"); Design Practices ================= Prefer Feature Checks over Generation Checks --------------------------------------------- Avoid conditioning logic on a specific GPU generation (e.g. ``isGFX11()``). Generation checks are fragile: when a new generation ships, every such check must be audited to decide whether it should include the new generation too. Instead, query the specific capability that the code actually depends on via a feature predicate (e.g. ``hasFeatureX()``). Feature predicates are self-documenting, compose better across generations, and do not require updates when a new generation is added. .. code-block:: c++ // Avoid - ties the logic to a specific generation. if (ST.isGFX11()) handleNewBehaviour(); // Preferred - checks the actual capability. if (ST.hasPackedFP32Ops()) handleNewBehaviour(); Prefer Separate Opcodes over Subtarget Checks ----------------------------------------------- When an instruction has different properties on different subtargets (e.g. different implicit register uses, different scheduling info, or different encoding constraints), define **separate opcodes** for each variant rather than using a single opcode and scattering ``if`` checks on the subtarget throughout the code. Distinct opcodes keep TableGen definitions self-contained, make scheduling and register allocation more accurate, and avoid a class of bugs where a subtarget check is accidentally omitted. Document New Builtins --------------------- All new AMDGPU builtins must have documentation added in ``clang/include/clang/Basic/BuiltinsAMDGPUDocs.td``. The documentation entry should be included in the same patch that introduces the builtin. Pull Requests ============= Keep Changes Focused -------------------- Each pull request should contain **one logical change**. Unrelated modifications - formatting fixes, variable renames, whitespace cleanup, etc. - should be submitted as separate PRs, even if they touch the same files. Mixing unrelated changes into a functional PR makes review harder, obscures the intent of the change in ``git log``, and complicates reverts if something goes wrong. Test Policy =========== Well-written tests are essential for a healthy codebase. The guidelines below apply to all AMDGPU regression tests (``llvm/test/CodeGen/AMDGPU``, ``llvm/test/MC/AMDGPU``, etc.). See also the general `Best practices for regression tests `_ and the `Precommit workflow for tests `_ in the LLVM Testing Guide. Use Minimal, Reduced Tests -------------------------- Every test should be the **smallest input that exercises the behaviour under test**. Avoid copying a full function from a real workload and pasting it into a test file. Instead, reduce the input so that it contains only the instructions and control flow needed to trigger the relevant code path. A minimal test is easier to understand, faster to run, and less likely to break for unrelated reasons. Avoid Undefined Behavior ------------------------ Tests should not rely on undefined behavior (UB). As the `best practices section of the Testing Guide `_ notes, avoid ``undef`` and ``poison`` values unless they are the point of the test - patterns like ``br i1 undef`` are likely to break as future optimizations evolve. In addition, avoid loads from or stores to ``null`` unless the test targets an address space where address zero is a valid memory location rather than a null pointer. For example, on AMDGPU, address space 0 (generic/flat) treats zero as a null pointer, but address space 3 (LDS) does not, so a load from ``ptr addrspace(3) null`` can be valid. .. code-block:: llvm ; Avoid - null is a null pointer in addrspace(0). define void @example_bad() { %val = load i32, ptr null ret void } ; OK - addrspace(3) has no null pointer, address zero is valid. define void @example_ok() { %val = load i32, ptr addrspace(3) null ret void } Use Named Values ---------------- Prefer descriptive, named IR values over anonymous numbered values. Names serve as lightweight documentation and make it much easier to understand a test's intent at a glance. .. code-block:: llvm ; Preferred - names describe what each value represents. define float @fma_example(float %x, float %y, float %z) { %fma = call float @llvm.fma.f32(float %x, float %y, float %z) ret float %fma } ; Avoid - anonymous numbers reveal nothing about intent. define float @fma_example(float %0, float %1, float %2) { %4 = call float @llvm.fma.f32(float %0, float %1, float %2) ret float %4 } Use Compact Virtual Register Numbers in MIR Tests -------------------------------------------------- In MIR tests, virtual register numbers should be compact and start from ``%0``. Avoid leaving gaps or starting at arbitrary high numbers (e.g. ``%128``, ``%256``). Sparse numbering makes tests harder to follow and suggests the test was extracted from a larger function without proper reduction. .. code-block:: none # Preferred - compact, sequential numbering. %0:vgpr_32 = COPY $vgpr0 %1:vgpr_32 = COPY $vgpr1 %2:vgpr_32 = V_ADD_U32_e32 %0, %1, implicit $exec # Avoid - sparse numbering with gaps. %128:vgpr_32 = COPY $vgpr0 %130:vgpr_32 = COPY $vgpr1 %255:vgpr_32 = V_ADD_U32_e32 %128, %130, implicit $exec Trim Unnecessary Attributes and Metadata ----------------------------------------- Strip attributes, metadata, and other annotations that are **not relevant to the behaviour being tested**. Extra noise makes it harder to see what a test actually depends on and can cause spurious failures when defaults change. For example, unless a test specifically exercises a particular function attribute or metadata node, remove them: .. code-block:: llvm ; Preferred - only the essential attributes remain. define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) { store i32 %val, ptr addrspace(1) %ptr ret void } ; Avoid - unrelated attributes and metadata obscure the test's purpose. define amdgpu_kernel void @store_i32(ptr addrspace(1) %ptr, i32 %val) #0 !dbg !5 { store i32 %val, ptr addrspace(1) %ptr, align 4, !tbaa !11 ret void } attributes #0 = { nounwind "frame-pointer"="all" } Include Negative Tests ---------------------- Changes that introduce new restrictions, validations, or user-facing constructs should include **negative tests** that verify the correct diagnostic or rejection. Cases that require negative tests include, but are not limited to: * **New builtins** — verify that wrong argument types, wrong argument counts, and unsupported target features produce the expected Sema errors (e.g. ``clang/test/SemaOpenCL/builtins-amdgcn-error.cl``). * **New backend instructions** — verify that the assembler rejects invalid operands, illegal modifiers, and unsupported subtargets (e.g. ``llvm/test/MC/AMDGPU/gfx950_err.s``). * **New target types or features** — verify that incompatible target IDs, missing features, and invalid subtarget combinations are diagnosed (e.g. ``clang/test/Driver/invalid-target-id.cl``). Cover All Code Paths -------------------- Tests for a PR should ideally cover all of the code changes introduced by that PR. When adding a new instruction, for example, this means testing all supported combinations of operand kinds (VGPR, SGPR, immediate, inline constant, literal constant) as well as applicable modifiers (``opsel``, ``neg_lo``, ``neg_hi``, ``clamp``, etc.). The goal is to ensure that every encoding and selection path exercised by the new code is verified, so that regressions in any variant are caught immediately. Pipe Input via ``stdin`` ------------------------ Where feasible, feed the test file through ``stdin`` using ``< %s`` rather than passing it as a positional argument. This is the conventional style in AMDGPU tests and avoids the need for an explicit ``-o -``. .. code-block:: bash ; llc — preferred. ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s ; opt — preferred. ; RUN: opt -S -mtriple=amdgcn -passes=instcombine < %s | FileCheck %s ; llvm-mc — preferred. ; RUN: llvm-mc -triple=amdgcn -mcpu=gfx900 -show-encoding < %s | FileCheck %s This is not always possible. For example, ``llc`` infers the input format from the file extension; when reading from ``stdin`` it defaults to IR, so MIR tests need to pass the file as a positional argument: .. code-block:: bash ; MIR — pass the file directly so llc sees the .mir extension. ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -run-pass=... %s -o - | FileCheck %s Use ``-filetype=null`` When Output Is Irrelevant ------------------------------------------------- When a test only needs to verify diagnostics, error messages, or the absence of a crash - and does not care about the actual code-generation output - pass ``-filetype=null`` to ``llc``. This skips object or assembly emission entirely, making the test faster and avoiding fragile ``CHECK`` lines tied to unrelated output. .. code-block:: bash ; RUN: llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s Auto-Generate Check Lines ------------------------- When possible, use the UTC (Update Test Checks) scripts to generate ``CHECK`` lines rather than writing them by hand. Auto-generated checks are comprehensive, consistent, and easy to update when output changes. The most commonly used scripts for AMDGPU are: * ``llvm/utils/update_llc_test_checks.py`` - for ``llc`` CodeGen tests. * ``llvm/utils/update_mir_test_checks.py`` - for MIR tests. * ``llvm/utils/update_mc_test_checks.py`` - for MC (assembly/disassembly) tests. A typical workflow looks like: .. code-block:: bash # Write the test with a RUN line but no CHECK lines, then auto-generate: $ llvm/utils/update_llc_test_checks.py llvm/test/CodeGen/AMDGPU/my-test.ll # After a code change that intentionally alters output, re-generate: $ llvm/utils/update_llc_test_checks.py --update-only llvm/test/CodeGen/AMDGPU/my-test.ll Each script embeds a ``UTC_ARGS:`` comment in the test file so that subsequent runs of the script use the same options. Consult the ``--help`` output of each script for the full set of available flags. When writing check lines by hand, prefer ``CHECK-NEXT`` over ``CHECK-NOT``. Negative pattern matches fail silently when the output changes - the ``CHECK-NOT`` pattern may no longer appear for entirely unrelated reasons, and the test will still pass without actually verifying the intended behaviour. ``CHECK-NEXT`` ties the assertion to a specific position in the output, so any unexpected change causes a visible failure. .. note:: Hand-written ``CHECK`` lines are still appropriate when a test needs to verify only a narrow slice of the output (e.g. a single instruction) or when the auto-generated output would be excessively verbose and obscure the intent. In such cases, keep the hand-written checks focused and document why auto-generation was not used.