Memory Model Relaxation Annotations

Introduction

Memory Model Relaxation Annotations (MMRAs) are target-defined properties on instructions that can be used to selectively relax constraints placed by the memory model. For example:

  • The use of VulkanMemoryModel in a SPIRV program allows certain memory operations to be reordered across acquire or release operations.

  • OpenCL APIs expose primitives to only fence a specific set of address spaces. Carrying that information to the backend can enable the use of faster synchronization instructions, rather than fencing all address spaces every time.

MMRAs offer an opt-in system for targets to relax the default LLVM memory model. As such, they are attached to an operation using LLVM metadata which can always be dropped without affecting correctness.

Definitions

memory operation

A load, a store, an atomic, or a function call that is marked as accessing memory.

synchronizing operation

An instruction that synchronizes memory with other threads (e.g. an atomic or a fence).

tag

Metadata attached to a memory or synchronizing operation that represents some target-defined property regarding memory synchronization.

An operation may have multiple tags that each represent a different property.

A tag is composed of a pair of metadata strings: a prefix and a suffix.

In LLVM IR, the pair is represented using a metadata tuple. In other cases (comments, documentation, etc.), we may use the prefix:suffix notation. For example:

Example: Tags in Metadata
!0 = !{!"scope", !"workgroup"}  # scope:workgroup
!1 = !{!"scope", !"device"}     # scope:device
!2 = !{!"scope", !"system"}     # scope:system

Note

The only semantics relevant to the optimizer is the “compatibility” relation defined below. All other semantics are target defined.

Tags can also be organised in lists to allow operations to specify all of the tags they belong to. Such a list is referred to as a “set of tags”.

Example: Set of Tags in Metadata
!0 = !{!"scope", !"workgroup"}
!1 = !{!"sync-as", !"private"}
!2 = !{!0, !1}

Note

If an operation does not have MMRA metadata, it’s treated as if it has an empty list (!{}) of tags.

Note that it is not an error if a tag is not recognized by the instruction it is applied to, or by the current target. Such tags are simply ignored.

Both synchronizing operations and memory operations can have zero or more tags attached to them using the !mmra syntax.

For the sake of readability in the examples below, we use a (non-functional) short syntax to represent MMRA metadata:

Short Syntax Example
store %ptr1 # foo:bar
store %ptr1 !mmra !{!"foo", !"bar"}

Both notations may be used in this document and are strictly equivalent. However, only the second one is functional.

compatibility

Two sets of tags are said to be compatible iff, for every unique tag prefix P present in at least one set:

  • the other set contains no tag with prefix P, or

  • at least one tag with prefix P is common to both sets.

The above definition implies that an empty set is always compatible with any other set. This is an important property as it ensures that if a transform drops the metadata on an operation, it can never affect correctness. In other words, the memory model cannot be relaxed further by deleting metadata from instructions.
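
To make the relation concrete, the following is a minimal standalone C++ sketch of the compatibility check, assuming a tag set is modeled as a map from prefix to suffixes. It is purely illustrative and is not the in-tree MMRA implementation.

#include <map>
#include <set>
#include <string>

// A tag is a (prefix, suffix) pair; a set of tags is modeled here as a map
// from prefix to the set of suffixes attached to that prefix.
using TagSet = std::map<std::string, std::set<std::string>>;

// Compatibility as defined above: for every prefix present in both sets, the
// two sets must share at least one tag with that prefix. Prefixes present in
// only one set impose no constraint, so the empty set is compatible with
// every other set.
bool isCompatible(const TagSet &A, const TagSet &B) {
  for (const auto &[Prefix, SuffixesA] : A) {
    auto It = B.find(Prefix);
    if (It == B.end())
      continue; // Prefix only present in A: no constraint.
    bool Shared = false;
    for (const std::string &Suffix : SuffixesA) {
      if (It->second.count(Suffix)) {
        Shared = true;
        break;
      }
    }
    if (!Shared)
      return false; // Both sets mention Prefix but share no tag with it.
  }
  return true;
}

With this model, {scope:workgroup} and {scope:device} are incompatible (same prefix, no common tag), while the empty set is compatible with both.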

The happens-before Relation

Compatibility checks can be used to opt out of the happens-before relation established between two instructions.

Ordering

When two instructions’ metadata are not compatible, any program order between them is not in happens-before.

For example, consider two tags foo:bar and foo:baz exposed by a target:

A: store %ptr1                 # foo:bar
B: store %ptr2                 # foo:baz
X: store atomic release %ptr3  # foo:bar

In the above figure, A is compatible with X, and hence A happens-before X. But B is not compatible with X, and hence B does not happen-before X.

Synchronization

If a synchronizing operation has one or more tags, then whether it synchronizes-with and participates in the seq_cst order with other operations is target dependent.

Whether the following example synchronizes with another sequence depends on the target-defined semantics of foo:bar and foo:bux.

fence release               # foo:bar
store atomic %ptr1          # foo:bux

Examples

Example 1:
A: store ptr addrspace(1) %ptr2                  # sync-as:1 vulkan:nonprivate
B: store atomic release ptr addrspace(1) %ptr3   # sync-as:0 vulkan:nonprivate

A and B are not ordered relative to each other (no happens-before) because their sets of tags are not compatible.

Note that the sync-as value does not have to match the addrspace value. For example, in Example 1, B is a store-release to a location in addrspace(1) that only wants to synchronize with operations on addrspace(0).

Example 2:
A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
B: store atomic release ptr addrspace(1) %ptr3  # sync-as:1 vulkan:nonprivate

The ordering of A and B is unaffected because their sets of tags are compatible.

Note that A and B may or may not be in happens-before due to other reasons.

Example 3:
A: store ptr addrspace(1) %ptr2                 # sync-as:1 vulkan:nonprivate
B: store atomic release ptr addrspace(1) %ptr3  # vulkan:nonprivate

The ordering of A and B is unaffected because their sets of tags are compatible.

Example 4:
A: store ptr addrspace(1) %ptr2                 # sync-as:1
B: store atomic release ptr addrspace(1) %ptr3  # sync-as:2

A and B do not have to be ordered relative to each other (no happens-before) because their sets of tags are not compatible.

Use-cases

SPIRV NonPrivatePointer

MMRAs can support the SPIRV capability VulkanMemoryModel, where synchronizing operations only affect memory operations that specify NonPrivatePointer semantics.

The example below is generated from a SPIRV program using the following recipe:

  • Add vulkan:nonprivate to every synchronizing operation.

  • Add vulkan:nonprivate to every non-atomic memory operation that is marked NonPrivatePointer.

  • Add vulkan:private to every non-atomic memory operation that is not marked NonPrivatePointer.

Thread T1:
 A: store %ptr1                 # vulkan:nonprivate
 B: store %ptr2                 # vulkan:private
 X: store atomic release %ptr3  # vulkan:nonprivate

Thread T2:
 Y: load atomic acquire %ptr3   # vulkan:nonprivate
 C: load %ptr2                  # vulkan:private
 D: load %ptr1                  # vulkan:nonprivate

Compatibility ensures that operation A is ordered relative to X while operation D is ordered relative to Y. If X synchronizes with Y, then A happens-before D. No such relation can be inferred about operations B and C.
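
The tagging recipe above could be implemented along the following lines. This is an illustrative C++ sketch only; the SPIRVOpInfo structure and its fields are hypothetical and not part of any existing front-end.

#include <string>

// Hypothetical per-operation facts gathered from the SPIRV input.
struct SPIRVOpInfo {
  bool IsSynchronizing;      // atomic or fence
  bool IsAtomic;             // atomic memory operation
  bool HasNonPrivatePointer; // marked NonPrivatePointer in SPIRV
};

// Returns the suffix of the "vulkan:" tag to attach, following the recipe:
// synchronizing operations and NonPrivatePointer accesses get "nonprivate",
// and every other non-atomic memory operation gets "private". Atomics are
// always non-private; tagging them explicitly is an implementation detail
// (see the note below).
std::string vulkanTagSuffix(const SPIRVOpInfo &Op) {
  if (Op.IsSynchronizing || Op.IsAtomic)
    return "nonprivate";
  return Op.HasNonPrivatePointer ? "nonprivate" : "private";
}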

Note

The Vulkan Memory Model considers all atomic operations non-private.

Whether vulkan:nonprivate is specified on atomic operations is an implementation detail, as an atomic operation is always non-private. The implementation may choose to be explicit and emit IR with vulkan:nonprivate on every atomic operation, or it may only emit vulkan:private and assume vulkan:nonprivate by default.

Operations marked with vulkan:private effectively opt out of the happens-before order in a SPIRV program since they are incompatible with every synchronizing operation. Note that SPIRV operations that are not marked NonPrivatePointer are not entirely private to the thread — they are implicitly synchronized at the start or end of a thread by the Vulkan system-synchronizes-with relationship. This example assumes that the target-defined semantics of vulkan:private correctly implements this property.

This scheme is general enough to express the interoperability of SPIRV programs with other environments.

Thread T1:
A: store %ptr1                 # vulkan:nonprivate
X: store atomic release %ptr2  # vulkan:nonprivate

Thread T2:
Y: load atomic acquire %ptr2   # foo:bar
B: load %ptr1

In the above example, thread T1 originates from a SPIRV program while thread T2 originates from a non-SPIRV program. Whether X can synchronize with Y is target defined. If X synchronizes with Y, then A happens-before B (because A/X and Y/B are compatible).

Implementation Example

Consider the implementation of SPIRV NonPrivatePointer on a target where all memory operations are cached, and where the entire cache is flushed at a release and invalidated at an acquire. A possible scheme is that, when translating a SPIRV program, memory operations marked NonPrivatePointer should not be cached, and the cache contents should not be touched during an acquire or release operation.

This could be implemented using the tags that share the vulkan: prefix, as follows:

  • For memory operations:

    • Operations with vulkan:nonprivate should bypass the cache.

    • Operations with vulkan:private should be cached.

    • Operations that specify neither or both should conservatively bypass the cache to ensure correctness.

  • For synchronizing operations:

    • Operations with vulkan:nonprivate should not flush or invalidate the cache.

    • Operations with vulkan:private should flush or invalidate the cache.

    • Operations that specify neither or both should conservatively flush or invalidate the cache to ensure correctness.

Note

In such an implementation, dropping the metadata on an operation, while not affecting correctness, may have significant performance implications, e.g., an operation bypassing the cache when it should not.
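
The decision table above could be encoded roughly as in the following sketch. The function and type names are invented for illustration and do not correspond to any target's code.

#include <string>
#include <vector>

enum class CachePolicy { Cached, Bypass };

// Decide how a memory operation interacts with the cache, given the suffixes
// of its "vulkan:"-prefixed tags: private-only operations are cached, and
// everything else (nonprivate, both, or no tag) conservatively bypasses it.
CachePolicy decideCachePolicy(const std::vector<std::string> &VulkanSuffixes) {
  bool HasPrivate = false, HasNonPrivate = false;
  for (const std::string &S : VulkanSuffixes) {
    HasPrivate |= (S == "private");
    HasNonPrivate |= (S == "nonprivate");
  }
  if (HasPrivate && !HasNonPrivate)
    return CachePolicy::Cached;
  return CachePolicy::Bypass;
}

// Decide whether a synchronizing operation must flush/invalidate the cache:
// only nonprivate-only operations may skip it; private, both, or no tag
// conservatively flush.
bool mustFlushCache(const std::vector<std::string> &VulkanSuffixes) {
  bool HasPrivate = false, HasNonPrivate = false;
  for (const std::string &S : VulkanSuffixes) {
    HasPrivate |= (S == "private");
    HasNonPrivate |= (S == "nonprivate");
  }
  return !(HasNonPrivate && !HasPrivate);
}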

Memory Types

MMRAs may express the selective synchronization of different memory types.

As an example, a target may expose a sync-as:<N> tag to pass information about which address spaces are synchronized by the execution of a synchronizing operation.

Note

Address spaces are used here as a common example, but this concept can apply to other “memory types”. What “memory types” means here is up to the target.

# let 1 = global address space
# let 3 = local address space

Thread T1:
A: store %ptr1                                  # sync-as:1
B: store %ptr2                                  # sync-as:3
X: store atomic release ptr addrspace(0) %ptr3  # sync-as:3

Thread T2:
Y: load atomic acquire ptr addrspace(0) %ptr3   # sync-as:3
C: load %ptr2                                   # sync-as:3
D: load %ptr1                                   # sync-as:1

In the above figure, X and Y are atomic operations on a location in addrspace(0). If X synchronizes with Y, then B happens-before C in the local address space. But no such statement can be made about operations A and D, even though they are performed on a location in the global address space.

Implementation Example: Adding Address Space Information to Fences

Languages such as OpenCL C provide fence operations such as atomic_work_item_fence that can take an explicit address space to fence.

By default, LLVM has no means to carry that information in the IR, so the information is lost during lowering to LLVM IR. This means that targets such as AMDGPU have to conservatively emit instructions to fence all address spaces in all cases, which can have a noticeable performance impact in high-performance applications.

MMRAs may be used to preserve that information at the IR level, all the way through code generation. For example, a fence that only affects the global address space addrspace(1) may be lowered as

fence release # sync-as:1

and the target may use the presence of sync-as:1 to infer that it only needs to emit instructions to fence the global address space.

Note that as MMRAs are opt-in, a fence that does not have MMRA metadata could still be lowered conservatively, so this optimization would only apply if the front-end emits the MMRA metadata on the fence instructions.
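
As an illustration, a backend could use the sync-as tags to narrow the set of address spaces a fence must cover, roughly as sketched below. The function name and the numeric encoding of suffixes are assumptions for this example; real targets operate on their own machine-level representations.

#include <optional>
#include <set>
#include <string>
#include <vector>

// Given the suffixes of the "sync-as:" tags attached to a fence, return the
// set of address spaces it must fence, or std::nullopt when the fence carries
// no sync-as information and every address space must be fenced
// conservatively (the default behaviour without MMRAs).
std::optional<std::set<unsigned>>
fencedAddressSpaces(const std::vector<std::string> &SyncAsSuffixes) {
  if (SyncAsSuffixes.empty())
    return std::nullopt;

  std::set<unsigned> Result;
  for (const std::string &S : SyncAsSuffixes)
    Result.insert(std::stoul(S)); // e.g. "1" -> address space 1 (global)
  return Result;
}

A fence tagged only with sync-as:1 would then yield {1}, letting the target emit the cheaper global-only fence, while an untagged fence keeps the conservative lowering.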

Additional Topics

Note

The following sections are informational.

Performance Impact

MMRAs are a way to capture optimization opportunities in the program. But when an operation mentions no tags or conflicting tags, the target may need to produce conservative code to ensure correctness at the cost of performance. This can happen in the following situations:

  1. When a target first introduces MMRAs, the frontend might not have been updated to emit them.

  2. An optimization may drop MMRA metadata.

  3. An optimization may add arbitrary tags to an operation.

Note that targets can always choose to ignore (or even drop) MMRAs and revert to the default behavior/codegen heuristics without affecting correctness.

Consequences of the Absence of happens-before

In the happens-before section, we defined how a happens-before relation between two instructions can be broken by leveraging compatibility between MMRAs. When the instructions are incompatible and there is no happens-before relation, we say that the instructions “do not have to be ordered relative to each other”.

“Ordering” in this context is a very broad term which covers both static and runtime aspects.

When there is no ordering constraint, we could statically reorder the instructions in an optimizer transform if the reordering does not break other constraints, such as single-location coherence. Static reordering is one consequence of breaking happens-before, but it is not the most interesting one.

Run-time consequences are more interesting. When there is a happens-before relation between instructions, the target has to emit synchronization code to ensure other threads will observe the effects of the instructions in the right order.

For instance, the target may have to wait for previous loads & stores to finish before starting a fence-release, or there may be a need to flush a memory cache before executing the next instruction. In the absence of happens-before, there is no such requirement and no waiting or flushing is required. This may noticeably speed up execution in some cases.

Combining Operations

If a pass can combine multiple memory or synchronizing operations into one, it needs to be able to combine MMRAs. One possible way to achieve this is by doing a prefix-wise union of the tag sets.

Let A and B be two tag sets, and U be the prefix-wise union of A and B. For every unique tag prefix P present in A or B:

  • If either A or B has no tags with prefix P, no tags with prefix P are added to U.

  • If both A and B have at least one tag with prefix P, all tags with prefix P from both sets are added to U.

Passes should avoid aggressively combining MMRAs, as this can result in significant losses of information. While this cannot affect correctness, it may affect performance.

As a general rule of thumb, common passes such as SimplifyCFG that aggressively combine/reorder operations should only combine instructions that have identical sets of tags. Passes that combine less frequently, or that are well aware of the cost of combining the MMRAs can use the prefix-wise union described above.

Examples:

A: store release %ptr1  # foo:x, foo:y, bar:x
B: store release %ptr2  # foo:x, bar:y

# Unique prefixes P = [foo, bar]
# Both A and B have tags with prefix "foo", so "foo:x" and "foo:y" are added to U.
# Both A and B have tags with prefix "bar", so "bar:x" and "bar:y" are added to U.
U: store release %ptr3  # foo:x, foo:y, bar:x, bar:y

A: store release %ptr1  # foo:x, foo:y
B: store release %ptr2  # foo:x, bux:y

# Unique prefixes P = [foo, bux]
# Both A and B have tags with prefix "foo", so "foo:x" and "foo:y" are added to U.
# A has no tags with prefix "bux", so "bux:y" is not added to U.
U: store release %ptr3  # foo:x, foo:y

A: store release %ptr1
B: store release %ptr2  # foo:x, bar:y

# Unique prefixes P = [foo, bar]
# A has no tags with prefix "foo" or "bar", so no tags are added to U.
U: store release %ptr3
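
For reference, here is a standalone C++ sketch of the prefix-wise union, using the same illustrative map-of-suffixes representation as the compatibility sketch earlier. It is not the in-tree implementation.

#include <map>
#include <set>
#include <string>

using TagSet = std::map<std::string, std::set<std::string>>;

// Prefix-wise union: a prefix survives only if both inputs have at least one
// tag with it; in that case, all tags with that prefix from both inputs are
// kept. Prefixes present in only one input are dropped.
TagSet prefixWiseUnion(const TagSet &A, const TagSet &B) {
  TagSet U;
  for (const auto &[Prefix, SuffixesA] : A) {
    auto It = B.find(Prefix);
    if (It == B.end())
      continue; // Prefix missing from B: drop it entirely.
    std::set<std::string> Merged = SuffixesA;
    Merged.insert(It->second.begin(), It->second.end());
    U[Prefix] = Merged;
  }
  return U;
}

Applied to the first example above, U keeps foo:x, foo:y, bar:x and bar:y; in the second, the bux prefix is dropped because it only appears in B.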