The EuroLLVM Developers' Meeting is a bi-annual gathering of the entire LLVM Project community. The conference is organized by the LLVM Foundation and many volunteers within the LLVM community. Developers and users of LLVM, Clang, and related subprojects will enjoy attending interesting talks, impromptu discussions, and networking with the many members of our community. Whether you are new to the LLVM project or a long-time member, there is something for every attendee.
The LLVM Developers' Meeting strives to be the best conference to meet other LLVM developers and users.
For future announcements or questions, please visit the LLVM Discourse forums. Most posts are in the Announcements or Community categories and tagged with eurollvm.
Keynotes
Support for the CHERI capability architecture and its embedded derivative, CHERIoT, is being upstreamed to LLVM. This talk explores what that means for LLVM developers, which parts of the toolchain are affected, and how these changes may interact with existing frontend, optimizer, backend, and linker work. We’ll take a look at work to date, the status of upstreaming, and open problems where community involvement could help.
LLVM is foundational software, so stakes for making changes to it are very high, and most downstream vendors of LLVM have elaborate testing pipelines to validate all of the properties they care about. However, because these private pipelines are often slow and disconnected from the community, detecting defects is expensive and leads to labor-intensive negotiations over misaligned technical requirements. To improve the experience for both contributors and downstreams, this talk shares lessons from 15 years of LLVM continuous integration experience to advocate for shared upstream CI infrastructure that "shifts left" on testing, to create a better, more harmonious experience for contributors and consumers alike.
Technical Talks
Fast-math flags: a bag of issues and a handful of solutions [ Slides ]
Speaker: Mikolaj Pirog
LLVM's fast-math flags allow users to specify which floating-point transformations they want to see. Unfortunately, their semantics are both underspecified and not entirely respected by LLVM. This talk covers a number of fast-math issues, gives an overview of efforts to address them, and offers an outlook on a future, more systematic approach to fast-math flags.
MLIR-iteration cycle goes brrr: defining ops and rewrites in Python [ Slides ]
Speaker: Rolf Morel
This is a tutorial on MLIR’s new Python bindings for defining (1) dialects and ops, via an embedded op-definition DSL, as well as (2) writing rewrites (passes, pattern rewrites, and transform ops) that integrate with the existing infrastructure. These features enable a new iteration cycle for developing MLIR compilers: we can now do rapid prototyping of dialects and rewrites in a high-level language without having a compiler in the loop!
Scaling Certified Instruction Selection For LLVM IR Through Bitblasting [ Slides ]
Speakers: Sarah Linh Kuhn, Luisa Cicolini
Instruction selection is responsible for turning high-level languages into efficient, reliable machine code. Yet, today’s LLVM backends often introduce subtle bugs through complex optimizing rewrites which are coupled with code generation passes. Fully verified backends like CompCert's avoid these issues at the cost of heavy, complex manual proofs. We present an LLVM instruction selector verified in Lean, which benefits from a small trusted base, strong automation, and relies on authoritative RISC-V semantics. Using Sail’s new Lean backend, we formalize the RISC-V ISA and automatically verify real LLVM instruction selection and optimization patterns, exploiting Lean’s bitvector library and its verified bitblaster. Our selector achieves performance comparable to LLVM’s GlobalISel (11.9% more cycles estimated with MCA, geomean) while providing machine-checked correctness. This demonstrates that practical, trustworthy verification can scale to modern, rapidly evolving compiler ecosystems.
The LLVM Release Process, a status update [ Slides ]
Speakers: Tobias Hieta, Cullen Rhodes, Douglas Yung
LLVM’s release process is an ever-evolving chaos machine. Since our last talk on it in 2023, we have shipped six releases and made a number of changes to how we plan, cut, and stabilize a release. This talk is a practical status update on what changed, why it changed, what works well today, and what still creates stress for release managers and contributors. I’ll also introduce our two new Release Managers and close with ideas for where the process should go next, with the goal of fewer surprises when “the release is coming.”
Toward A More Declarative InstCombine: Generalization & Parametric Bitvector Algorithms [ Slides ]
Speaker: Siddharth Bhat
LLVM contains thousands of bitwidth-dependent rewrites that are hard to maintain and reason about. We introduce new parametric bitvector algorithms that automatically generalize these rewrites across all widths. By applying a mixed unary–binary encoding and finite-state reasoning, our solver lifts concrete LLVM test cases into true width-independent identities, recovering parametric rewrites from the fixed-width rewrites in LLVM's test suite. This moves LLVM toward a declarative InstCombine specification, where rewrite rules are uniform, provably correct, and mechanically derived.
Rust or CHERI? [ Slides ]
Speaker: Edoardo Marangoni
CHERI and Rust may, at first, appear as two mutually exclusive and clashing philosophies that want to solve the same problems. We claim that it is the opposite: Rust and CHERI (and CHERIoT, in particular) are complementary and work best when used together, as Rust provides compile-time guarantees for safe code whereas CHERI provides runtime guarantees for every fragment of unsafe code. In this talk we will show evidence to back our claim, present recent work to bring production-quality Rust support to CHERIoT by leveraging the maturity of the CHERIoT port of LLVM and Rust's strict provenance model (which aligns naturally with CHERI's capabilities), and discuss our plans to engage with the Rust and LLVM communities.
Tracking Warnings at Scale: Extending Clang Diagnostics to Support Issue Baselining and Backslide Prevention [ Slides ]
Speaker: Dave Bartolomeo
Compiler warnings are an effective tool for developers to catch potential security, compatibility, and correctness problems. However, it is challenging to ensure that specific warnings are actually enabled across codebases and that diagnostics are addressed over time. This talk describes how we have extended Clang's diagnostic infrastructure to enable developers to build a policy enforcement system that supports issue baselining (i.e., reporting only newly introduced warnings) and backslide prevention (informing project maintainers when a warning has accidentally been turned off).
Lighthouse: infrastructure for end-to-end MLIR-compilers and testing [ Slides ]
Speaker: Renato Golin
Last year, a new project was added to the LLVM family: Lighthouse. Its main purpose is to guide the development and testing of MLIR-based compilers. Like the LLVM test-suite, it should be a common ground for validating upstream assumptions about code, IR, and dialects. At the same time, it enables building specific compilers in minutes, using the evolving Python API and MLIR's Python bindings. In this talk, we'll show the project's main structure, including its components and how to use them to build a simple compiler. We'll then show the infrastructure that uses those components to validate assumptions in MLIR (canonical forms, invariants, applicability of transforms and passes, correctness tests, etc.), and how you can create your own on top of that. Finally, we'll provide a number of pipeline examples, going from generic PyTorch models to performant execution on various targets.
What Compiler Implementers and Language Designers Need to Know About Pointer Authentication [ Slides ]
Speaker: Oliver Hunt
In this talk, which is targeted at compiler implementers and language designers, we will give a high-level overview of the security benefits of pointer authentication and why LLVM-based compilers should adopt it.
Effective Clang Tidy [ Slides ]
Speaker: Tom James
This talk explores strategies that have worked well when writing custom clang-tidy checks for Simcenter STAR-CCM+, a complex real-world codebase. It is intended to be intermediate level, so some familiarity with clang-tidy is assumed. In particular, participants should be familiar with writing Clang AST matchers.
CppInterOp: Interactive C++ as a Service and Advanced Language Interoperability [ Slides ]
Speaker: Aaron Jomy
CppInterOp provides compiler-as-a-service capabilities using Clang-REPL, enabling dynamic languages to interoperate with C++. This talk covers its technical architecture, including runtime template instantiation, C++ overload resolution, and JitCall - a lightweight runtime wrapper generator. We demonstrate how CppInterOp's API, built on Clang's AST, enables practical Python-C++ interoperability, including use cases like invoking CUDA kernels from Python and cross-language inheritance between Python and C++ classes.
clang-reforge: Automatic whole-codebase source code rewriting tool for security hardening [ Slides ]
Speaker: Jan Korous
We're building clang-reforge, an automatic source code rewriting tool that enables adoption of bounds-safety in large existing C++ codebases. clang-reforge analyzes source code to identify unsafe pointer operations and capture pointer flow. It replaces built-in pointers with bounds-safe types in pointer flow segments from allocation sites to unsafe operations, such as pointer arithmetic. We have a working internal prototype and we're now rebuilding it on top of clang's Scalable Static Analysis Framework.
Finding Injection Vulnerabilities: Improvements of the Taint Analysis of the Clang Static Analyzer [ Slides ]
Speaker: Daniel Krupp
Clang Static Analyzer provides a configurable taint analysis checker, optin.taint.GenericTaint, and a few specialized taint checkers (in the optin.taint group) which can identify potential improper input validation security vulnerabilities. Although promising, the current implementation is still in its early stages, and its limitations prevent efficient industrial use. We were able to identify key issues after taking measurements on both the synthetic Juliet test suite and real-world projects. Based on these findings, we propose some improvements to the current solution, which we prototyped and evaluated.
Bounds Checking with the Clang Static Analyzer: Improvements and Insights [ Slides ]
Speaker: Donat Nagy
This talk presents my improvements to the checker `security.ArrayBound`, which became ready for production use in April 2025. In addition to the results, I will also showcase an instructive issue that had plagued the older prototype-quality bounds validator checker, and briefly speak about the planned generalizations that would extend my improvements to other related checkers.
Floating-Point Types in MLIR: Infrastructure, New Types and Dialect Design [ Slides ]
Speaker: Matthias Springer
This technical talk summarizes recent improvements to MLIR’s floating-point type infrastructure, focusing on how to represent and lower the rapidly growing zoo of low-precision and block-scaled formats used in modern ML workloads. It introduces the "FloatTypeInterface", explains the interaction with LLVM’s "APFloat" and "fltSemantics", and shows step-by-step how to add new floating-point types, from extending "APFloat" to defining lowering rules and dialect design for "special" FP types across high-level and low-level dialects. The talk also covers the new "arith-to-apfloat" infrastructure for software emulation of low-precision FP arithmetic on CPUs, discusses current limitations of adding FP types without patching LLVM, and outlines future directions for more extensible, vendor-friendly floating-point type systems in MLIR.
Optimising small AArch64 cores: stories from the trenches [ Slides ]
Speaker: Ties Stuij
Optimizing LLVM for AArch64 has in the past mostly concentrated on big cores which prioritized performance over efficiency. As the market has become more interested in smaller AArch64 cores outside of mobile phones, we have been putting effort into optimizing LLVM for these more constrained cores, which flirt with the embedded space. In this talk we discuss why LLVM has left performance on the table for these smaller cores, and we will give some examples of how we improved this. We will also touch on how benchmarking for these cores differs from benchmarking the bigger AArch64 cores.
Writing a Formal Execution and Memory Model for Execution Synchronization Primitives on AMD GPUs. [ Slides ]
Speaker: Pierre van Houtryve
Overview of ongoing efforts to develop and document a formal execution model for the execution synchronization primitives (barriers) of the AMDGPU backend, and how they integrate with the LLVM and AMDGPU target-specific memory models. The talk will also cover the motivation for this work, the benefits for users and developers, and the challenges we faced and are still facing.
Adding Nullability Checking and Annotations to Many Millions of Lines of Code [ Slides ]
Speaker: Jan Voung
At Google, our team has been working on reducing null pointer dereference crashes in a huge and diverse C++ codebase by: (a) adopting the Clang nullability annotations and (b) implementing a flow-sensitive intra-procedural dataflow analysis in ClangTidy that verifies that code adheres to the contracts of the annotations. However, to get the most coverage, we need to introduce the annotations to millions of lines of code. To assist with that, we've developed an inter-TU annotation inference tool, and added inferred annotations through a "large-scale change". This talk introduces how the ClangTidy verification and inference tools work, but also discusses the practical experience and the challenges we faced attempting to infer the annotations in "legacy" C++.
rocMLIR: High-Performance ML Compilation for AMD GPUs with MLIR [ Slides ]
Speaker: Pablo Antonio Martinez
This talk presents rocMLIR, a kernel generator for AMD GPUs using MLIR. We present the compilation flow from high-level IR (TOSA and Linalg dialects) to low-level code generation using downstream and upstream MLIR dialects (AMDGPU and ROCDL). We focus on implementing MI300X/MI350X features in MLIR, including double-rate MFMAs, DirectToLDS, and support for MXFP4/FP4 data types. We also cover application-specific optimizations such as SplitK for GEMMs and KV Cache for attention, along with fusion strategies.
LLVM Foundation Updates [ Slides ]
Speaker: LLVM Foundation Board of Directors
The LLVM Foundation Board of Directors gives an update on the status of the LLVM Foundation.
Tutorials
All About Alias Analysis [ Slides ]
Speaker: Nikita Popov
This tutorial introduces "alias analysis", which is the fundamental building block for most memory optimizations. The tutorial covers both high-level concepts and usage of alias analysis, as well as important aspects of the alias analysis implementation.
Creating a runtime using the LLVM_ENABLE_RUNTIMES system [ Slides ]
Speaker: Michael Kruse
Few will need to create a new runtime library for LLVM, and doing so is not actually the goal of this tutorial; we intend to illustrate the inner workings and conventions of the LLVM build system. Currently, the build code of our runtimes (compiler-rt, libc++, openmp, ...) is still mostly based on the patterns from when each runtime had its own SVN repository and had to be buildable independently, and therefore each runtime implements its own boilerplate. Eventually, they should converge instead of each runtime introducing its own solutions to its build problems.
In addition to an introduction to the history of the LLVM_ENABLE_RUNTIMES system and its rationale, what better way to establish a common ground for our runtimes than creating a new runtime that does not suffer from the burden of breaking existing builds? Let's build a template runtime from scratch in 15 steps:
Some basic knowledge of CMake is expected. The primary intended target audience is LLVM runtime-library and downstream maintainers who have to keep their builds working and regularly have to adapt CMake code. Also join if you just want to know what happens when you run `ninja check-compiler-rt`. In this tutorial we will create a new runtime to exemplify the workings of the often misunderstood LLVM_ENABLE_RUNTIMES system. Few contributors will ever feel the need to create a new runtime from scratch, but understanding it helps identify and fix build issues for configurations that no CI is actively testing. The talk will go into the details of how multi-stage bootstrapping, cross-compilation, multilib, GPU-offloading, multi-compiler support, and inter-dependent runtimes are designed to work. We also explore configuration options and which corners could still be improved.
HIVM: MLIR Dialect Stack for Ascend NPU Compilation [ Slides ]
Speakers: Vladislav Tarasov, presented by Hugo Trachino, Q&A with Vladislav Tarasov and Andrey Bokhanko
Huawei Ascend NPUs combine DaVinci AI cores with a rich memory/synchronization hierarchy and, on newer generations, a SIMD+SIMT execution model, making performance-oriented compilation challenging. We present HIVM, an open-source family of MLIR dialects that lowers PyTorch/Inductor -> Triton -> MLIR (HIVM) -> LLVM IR, enabling Ascend-specific optimizations such as layout assignment/propagation, vector intrinsic selection/legalization, and explicit DMA/transfer scheduling with synchronization. The pipeline ultimately targets the BiSheng LLVM-based backend to produce executable code for Ascend chips. The talk walks step-by-step through the key IR levels and transformation passes, serving as a practical baseline for developers building MLIR toolchains for Ascend.
Hands-on Using Clang as a library [ Slides ]
Speaker: Aaron Jomy
This tutorial teaches how to use Clang as a library to build a C++ REPL with incremental compilation by splitting translation units into partial compilation steps. We demonstrate how to create a compiler-as-a-service that enables programmatic instantiation and invocation of C++ template functions from client code. Finally, we integrate these components with the Python runtime to examine practical cross-language interoperability.
Implementing C++26 std::simd with LLVM: A Layered, Compiler-First Approach [ Slides ]
Speaker: Daniel Towner
The C++26 standard introduces std::simd for portable data parallelism. We present a complete implementation built on LLVM using a layered architecture: a minimal base layer interfaces with LLVM's SIMD capabilities, while higher-level features build on this foundation. Multi-target support emerges naturally by building upon LLVM's own architecture support, and a dispatch tag system isolates target-specific code to minimal locations. This design has proven particularly effective for x86, cleanly handling the many ISA variants (SSE, AVX, AVX-512, etc.), and should extend naturally to any SIMD target LLVM supports. As the compiler's support improves, the library improves too. We'll share our architectural patterns, performance results, and areas where LLVM could be enhanced to enable better code generation. The authors have been involved in the C++ standardisation process for this library, and our goal is to release it as open source.
Quick Talks
Challenges in binary rewriting: enabling BOLT to optimize CFI-hardened binaries [ Slides ]
Speaker: Gergely Balint
BOLT is increasingly adopted as it can provide additional performance uplift on top of LTO+PGO optimized binaries. At the same time, AArch64 binaries are commonly deployed with Control Flow Integrity features (PAC and BTI) enabled. This creates a practical challenge: until recently, BOLT couldn’t optimize such binaries. It would either crash, or worse: emit incorrect binaries, crashing at runtime. The talk introduces our work on enabling these features, and describes key engineering challenges, including how implementing such features differs from their compiler counterparts.
Anatomy of Tiling and Vectorizing linalg.pack and linalg.unpack [ Slides ]
Speaker: Ege Beysel
linalg.pack and linalg.unpack enable explicit data-tiling and layout transformations in MLIR, but their use in data-tiled compilation flows raises subtle questions about alignment, legality, and vectorization. This talk explores how these operations interact with MLIR’s tiling and vectorization infrastructure, focusing on alignment constraints, masking semantics, and performance implications. Using an end-to-end data-tiled matmul example, the talk highlights practical guidance and performance gains for developers building high-performance tensor pipelines.
Leveraging BOLT to improve data prefetching for AArch64 binaries [ Slides ]
Speakers: Shanzhi Chen, Wei Wei
The post-link optimizer BOLT provides a number of binary-level optimizations which mostly focus on code layout and effectively reduce front-end stalls in the Top-Down performance analysis view. In addition, we found that BOLT can also be a handy tool to emit prefetching instructions in binaries and to alleviate back-end stalls resulting from cache misses. In this talk, we will cover how to leverage BOLT to improve data prefetching for AArch64 binaries. A new pass is added to BOLT to provide prefetching support for different variations of AArch64 load instructions, and the existing dataflow analysis framework in BOLT is enabled for AArch64 to provide register liveness information for prefetching addresses. In addition, the ARM SPE-based profiling technique is employed to provide valuable insights into memory operations and to complete the overall profile-guided data prefetching optimization in BOLT.
Mojo Compile-time Interpreter in MLIR [ Slides ]
Speaker: Weiwei Chen
Mojo supports powerful compile time meta-programming that helps to unlock performance on heterogeneous accelerators by enabling generic abstractions across different targets. Almost any runtime Mojo code can be moved to compile-time to trade for runtime performance while constants evaluated at compile-time can be materialized into runtime values. In this talk, we will dive into the architecture of Mojo’s MLIR based compile-time interpreter which is at the core of materializing generic code into concrete form during compilation. We'll share implementation insights, performance challenges, and lessons learned, while fostering discussion on building meta-programming compilers with MLIR.
Self-Contained, Target-Specific GEMM Code Generation in MLIR [ Slides ]
Speakers: Adam Siemieniuk, Renato Golin, Rolf Morel
We present an MLIR-based approach for generating target-specific, highly optimized GEMM kernels that is fully self-contained within the LLVM/MLIR compiler infrastructure and does not rely on external libraries such as LIBXSMM, Intel MKL, or oneDNN. Building on prior work in the TPP-MLIR compiler, we upstream FP32, BF16, and INT8 code-generation techniques into MLIR and propose a transform schedule that combines existing and newly upstreamed passes to lower matmul, batch matmul, and batch-reduce matmul operations into optimized kernels, achieving performance competitive with the LIBXSMM library. As a future work, we plan to leverage auto-tuning techniques to select efficient tile sizes based on hardware characteristics and problem dimensions.
LLVM JIT — Upcoming Challenges and Opportunities [ Slides ]
Speaker: Lang Hames
LLVM's JIT can now run arbitrary real-world applications, as demonstrated by Xcode's Previews feature. Despite this success, enormous opportunities for improvement remain—especially in performance, memory consumption, tooling, and optimization. This talk will describe the most promising opportunities in these areas, sketch a roadmap for tackling them, and discuss how the community can collaborate to accelerate progress.
Apple GPU Support in Mojo [ Slides ]
Speaker: Kolya Panchenko
Mojo's powerful compile-time meta-programming system and unified syntax make it exceptionally well-suited for heterogeneous accelerator programming, with proven success on CUDA and ROCm platforms. As many developers work on Apple products (such as iMacs, MacBooks, and Mac Studios), adding Apple GPU support to Mojo is a natural next step. However, unlike CUDA and ROCm, which have open-source compiler toolchains in upstream LLVM, Apple's GPU compiler stack is proprietary. We will present the chosen design and discuss the challenges of integrating Apple GPUs into Mojo.
Attack of the Clones: Speeding Up Coroutine Compilation [ Slides ]
Speaker: Artem Pianykh
Compiling coroutines with full debug information shouldn't be dramatically slower than with line tables — but we found CoroSplitPass running over 100x slower, adding minutes to compilation time. The cause traced back to LLVM's function cloning, where processing debug info metadata was O(Module) rather than O(Function). This talk covers the investigation, the upstream patches, and how the fix ended up benefiting all users of the function cloning API.
Tracking Operations Through MLIR Pass Pipelines Using Source Locations [ Slides ]
Speaker: Florian Walbroel
This talk presents a source-location-driven approach for tracking the evolution of MLIR operations across deep pass pipelines. Motivated by real-world optimization work on quantized convolutions in IREE, it shows how preserved source locations can be used to reconstruct operation lineage across IR stages, enabling systematic reasoning about transformation effects. The talk surveys source location semantics under common MLIR transformations and demonstrates a reusable Python-based tool that supports interactive, cross-stage operation tracking for improved debuggability in large MLIR programs.
Lightning Talks
Using MLIR Linalg Category Ops for Smarter Compilation [ Slides ]
Speaker: Javed Absar
The Linalg dialect of MLIR recently added `category ops` as an intermediate abstraction between the two existing forms: named ops and generic ops. A mechanism (-linalg-morph-ops=_-to-_) to move between these forms was also introduced. When something new appears, adoption is often slow due to existing workflows and lack of awareness. This talk will: (a) motivate the reason for this additional representation, (b) explain what now exists and how to benefit from it, and (c) show how category ops can help certain compilation flows.
Highlighting function names in LLDB backtraces [ Slides ]
Speaker: Michael Buch
C++ backtraces tend to be hard to read because function names are hidden amongst many layers of namespaces, template arguments and function parameters. In the debugger, a user often wants to simply get a quick overview of the function callstack. But traditionally this has been difficult to decipher. To improve readability, LLDB recently gained the ability to selectively hide or format various parts of C++ function names. This talk describes how we implemented this by extending the LLVM demangler and how other language plugins can take advantage of this infrastructure.
Engineering a Hybrid Rust and MLIR Toolchain for AI Agents [ Slides ]
Speaker: Miguel Cárdenas
Developing a compiler for Agentic AI workloads presents a unique challenge because the runtime demands the safety and ergonomics of Rust while the optimization pipeline requires the mature infrastructure of MLIR. This talk presents the architecture of a toolchain designed to leverage the strengths of both ecosystems. We discuss how we structured a hybrid build system where Rust drives the compilation process and defines runtime semantics, while C++ manages the core MLIR dialect and transformations. The session covers the practical engineering required to bridge these worlds, from orchestrating CMake and Cargo to managing the boundary between Rust runtime metadata and MLIR operation definitions. We share lessons learned about the complexity of linking, the tradeoffs of code generation, and the reality of maintaining a custom dialect across the language barrier.
Compact Unwind Information for ELF [ Slides ]
Speaker: Alexis Engelke
ELF unwind information is encoded as DWARF bytecode for most architectures, which results in a large size overhead and is complex to interpret, precluding its use in e.g. tracing profilers. This talk will present a compact format for asynchronous unwind info that can accurately represent almost all functions generated by LLVM -O3 for x86-64 with a substantially smaller size. We will also discuss portability to other architectures, differences from other unwinding/tracing formats, and interoperability with other toolchains.
Coverage directed codebase reduction for the procedural generation of LIT tests [ Slides ]
Speaker: Freya Fewtrell
The LIT test suite, whilst extensive, still leaves many parts of the LLVM codebase uncovered. Compiling large real-world C++ code from Sony Interactive Entertainment’s downstream integration test suite routinely exercises code paths that the upstream LIT suite never reaches. To shift this testing leftwards, we have built a tool that uses coverage-directed reduction to automatically turn such high-coverage sources into minimal, self-contained LLVM IR fragments suitable for inclusion as upstream LIT tests. This lightning talk describes how the tool works, the challenges encountered when trying to automate test reduction at scale, and whether increasing coverage alone is sufficient motivation for new upstream LIT tests.
Improving DemandedBits Analysis for Shift Operations in LLVM [ Slides ]
Speaker: Panagiotis Karouzakis
The DemandedBits analysis is utilized in some optimization passes, such as vectorization and dead code elimination; a similar analysis is employed in InstCombine. We improve DemandedBits reasoning for all basic shift operations, enabling more precise bit-level information propagation. Our improvements reduced code size, enabled additional loop-invariant code motion, and led to more instruction-level simplifications.
STRTAB Hash & Slash: Reducing STRTAB size by hashing its entries [ Slides ]
Speaker: Vy Nguyen
The string table (STRTAB) accounts for a significant portion of object file overhead at Google, with long mangled names, such as those from Protos, frequently reaching 35% of the total file size. Optimizing this space is key to faster builds and leaner binaries. In this talk, we propose an approach to reduce the string table by hashing its entries, demonstrating a reduction in overall binary size.
Extending Lifetime Safety: Verification of [[clang::noescape]] annotation [ Slides ]
Speaker: Abhinav Pradeep
Clang features an intra-procedural, flow-sensitive lifetime analysis designed to catch temporal safety errors like use-after-free, use-after-scope, and use-after-return. The talk presents work on leveraging this analysis to verify [[clang::noescape]] annotations. This effort focuses on applying the "Origins and Loans" model to enforce memory safety guarantees that were previously unverified by the compiler.
Reproducible Large-Workload Recipes for BOLT with Nixpkgs [ Slides ]
Speaker: Peter Waller
BOLT needs large, realistic binaries for integration testing, but distributing binaries directly creates governance and supply-chain problems and makes it hard to reproduce the exact toolchains and build flags that shape BOLT behaviour. We show our pinned, auditable Nixpkgs build recipes (including emit-relocs Chromium) and discuss how this enables teams to reproduce identical workloads, generate provenance/SBOMs, and compare results consistently across machines and CI.
What’s new in LLDB on Windows [ Slides ]
Speaker: Charles Zablit
We've been improving LLDB and LLDB-DAP on Windows, adding key features like STDIO support, Unicode handling, better Python integration, and switching to an open-source PDB implementation to bring Windows debugging up to par with other platforms.
Student Technical Talks
IR2Vec Python Bindings: Native Integration for Pythonic ML Workflows [ Slides ]
Speaker: Nishant Sachdeva
IR2Vec is a widely adopted framework for generating vector embeddings from LLVM IR to enable machine-learning–driven compiler optimizations. This work introduces native Python bindings for IR2Vec using pybind11, enabling seamless and efficient integration with Python-based ML ecosystems such as PyTorch and TensorFlow. By replacing subprocess-based CLI invocation with a direct programmatic interface, the bindings eliminate process overhead, provide built-in C++–Python type conversion, and support robust exception handling. The implementation and usage are demonstrated through practical embedding-generation examples. The project is currently under active development for upstream integration into the LLVM monorepo, with multiple pull requests already accepted, and is available in beta form via TestPyPI.
GPU optimizations, and where Rust knows more than LLVM [ Slides ]
Speaker: Marcelo Domínguez
In this talk we compare the performance of Rust's `std::offload` interface on various benchmarks with C++ OpenMP, CUDA, and ROCm implementations. We show the impact of a new set of LLVM-IR optimizations, and the performance difference between "safe" and "unsafe" Rust. We briefly introduce two aliasing models that are under consideration in the Rust community, and how higher-level Rust alias info can be combined with our lower-level LLVM-IR opt pass.
Accelerating Pass Order Auto-tuning via Profile-Guided Cost Modeling [ Slides ]
Speaker: Bingyu Gao, presented by Wei Wei
LLVM pass ordering auto-tuning can outperform standard -O3, but it is often hindered by an enormous search space and the high overhead of hundreds of dynamic measurements. This talk presents an efficient auto-tuning framework that minimizes expensive measurements using a profile-guided relative cost model and calibrated beam search. Evaluation on cBench shows an average 10.46% speedup over -O3 with only 20 dynamic measurements, significantly accelerating the search for optimal pass sequences.
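The interplay of a cost model and beam search can be sketched in miniature. The passes, the toy heuristic cost, and the beam width below are invented for illustration; the talk's actual framework uses a profile-guided relative cost model and calibrated beam search.

```python
# Toy sketch of cost-model-guided beam search over pass orderings.
# Only the few surviving candidates would need dynamic measurement.

PASSES = ["inline", "gvn", "licm", "unroll"]

def predicted_cost(seq):
    # Stand-in for a learned relative cost model: this arbitrary toy
    # heuristic prefers running "inline" early and "gvn" after it.
    cost = 10.0
    if "inline" in seq:
        cost -= 2.0 / (seq.index("inline") + 1)
    if "inline" in seq and "gvn" in seq and seq.index("gvn") > seq.index("inline"):
        cost -= 1.0
    return cost

def beam_search(length, beam_width=2):
    beam = [()]
    for _ in range(length):
        candidates = [seq + (p,) for seq in beam
                      for p in PASSES if p not in seq]
        candidates.sort(key=predicted_cost)
        beam = candidates[:beam_width]  # keep only the cheapest few
    return beam

best = beam_search(3)
print(best[0], predicted_cost(best[0]))
```

Because the model prunes the search space statically, only the final beam (here, two sequences instead of dozens) needs expensive dynamic measurement.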
Panels
Clang and LLVM in Modern Gaming Platforms [ Slides ]
Speakers: Nicolai Hähnle, Tobias Hieta, Felix Klinge, Chris Bieneman, Jeremy Morse
A moderated panel with AMD, Intel, Sony, and Microsoft will examine how Clang/LLVM power real-world game production, from platform SDKs and build pipelines to shader compilers and security tooling, and identify where upstream collaboration can have the biggest impact.
Posters
MemorySSA-Based Reaching Definitions for IR2Vec Flow-Aware Embeddings [ Slides ]
Speaker: Nishant Sachdeva
IR2Vec is a widely adopted framework for generating program embeddings from LLVM IR, supporting both Symbolic and Flow-Aware inference modes. The Flow-Aware mode captures data dependencies by computing reaching definitions over memory operations, but its original implementation relies on a custom control-flow graph traversal, effectively reimplementing analyses already available in LLVM. This work replaces the custom reaching-definitions logic with LLVM’s MemorySSA framework, yielding more semantically rich embeddings while significantly simplifying the implementation. By leveraging MemorySSA’s def-use chains, the new approach correctly handles complex memory behaviors including pointer indirection, loop-carried dependencies, structured data access, and dynamic allocation. Through detailed IR case studies, we demonstrate how the MemorySSA-based design eliminates spurious dependencies and enables richer, more accurate flow-aware embeddings.
Bridging Runtime Gaps in LLVM: Vendor-Agnostic Dispatch for ML Kernels [ Slides ]
Speaker: S Akash
While LLVM and MLIR have revolutionized portable code generation for machine learning, a significant "runtime gap" remains: the inability to dynamically introspect hardware and dispatch kernels across heterogeneous NVIDIA, AMD, and CPU environments without per-target recompilation. We explore the architecture of vendor-agnostic rerouting, and existing works like SYCL alongside lightweight, header-only approaches.
Engineering a Hybrid Rust and MLIR Toolchain for AI Agents [ Slides ]
Speaker: Miguel Cárdenas
Developing a compiler for Agentic AI workloads presents a unique challenge because the runtime demands the safety and ergonomics of Rust while the optimization pipeline requires the mature infrastructure of MLIR. This talk presents the architecture of a toolchain designed to leverage the strengths of both ecosystems. We discuss how we structured a hybrid build system where Rust drives the compilation process and defines runtime semantics, while C++ manages the core MLIR dialect and transformations. The session covers the practical engineering required to bridge these worlds, from orchestrating CMake and Cargo to managing the boundary between Rust runtime metadata and MLIR operation definitions. We share lessons learned about the complexity of linking, the tradeoffs of code generation, and the reality of maintaining a custom dialect across the language barrier.
Floating-Point Datapaths in CIRCT via FloPoCo AST Export and flopoco-arith-to-comb lowering [ Slides ]
Speaker: Louis Ledoux
This work bridges the gap between floating-point arithmetic in MLIR and circuit-level hardware representations in CIRCT. While many accelerators are dominated by floating-point datapaths, existing flows either defer floating-point realization to HLS-oriented dialects or rely on external generators, limiting compiler visibility at the stage where hardware-specific trade-offs are most naturally expressed. The approach restructures FloPoCo to expose arithmetic hardware as explicit combinational graphs and introduces a new MLIR lowering pass that progressively translates floating-point regions into CIRCT-compatible datapaths. Multiple lowering strategies are supported, ranging from IEEE-754–preserving operator mappings to fused and specialized datapaths that reduce rounding, area, and numerical error. As a concrete result, a floating-point kernel extracted from a PyTorch LLaMA layer is compiled end-to-end to a 1.5 mm² chip in a 130 nm process node.
Adding Compilation Metadata To Binaries To Make Disassembly Decidable [ Slides ]
Speaker: Daniel Engel
Once a program has been compiled into a binary, it is nigh impossible to lift it back into a higher-level representation that is well-suited for analyses, instrumentation, and patching. Disassemblers run into undecidable problems such as "which bytes are instructions?" or "how are the data sections structured?". Producing a representation that can be recompiled correctly is even harder. Standard debugging formats such as DWARF do not contain enough information to make this task possible. However, at some point during the compilation process, the compiler knew all this information. In this talk, we explore which information can be extracted from the standard ELF format, which information clang can already emit, and which remains inaccessible.
A CPU Autotuning Pipeline for MLIR-IREE [ Slides ]
Speakers: Chun Lin Huang, Jenq Kuen Lee
We present an autotuning pipeline for IREE’s LLVM-CPU backend that enables Transform Dialect–driven, compile-time multi-level tiling with CPU-specific constraints. In single-dispatch experiments, our constrained tuning flow achieves up to 20% speedup. We also outline next steps toward joint tuning of per-layer sub-FP8 precision variants and tiling using an XGBoost-guided, budgeted evaluation strategy under a quality floor.
A stride towards generating segment accesses in RVV [ Slides ]
Speaker: Athanasios Kastoras
We present an unconventional way of emitting RVV segment access instructions based on a loop-vectorize pass that emits strided accesses instead of gathers and scatters. We implement a pass that groups consecutive strided accesses, represented as VP intrinsics, and, if feasible, lowers them to RVV intrinsics of segment instructions. Then, we reuse the analysis part of this pass to cost groups of recipes as a single segment instruction, which enables the vectorization of loops that would otherwise have been deemed unprofitable.
Non-destructive PDL Rewriting for Multi-Level Equality Saturation [ Slides ]
Speakers: Jules Merckx, Sasha Lopoukhine
We present our work on representing e-graphs directly in MLIR IR. We extend the PDL dialect with new operations to allow for non-destructive rewriting. We implement our approach both in xDSL (Python) and MLIR, which makes equality saturation at multiple levels of abstraction accessible to compiler developers. Finally, we show that this system can be used to achieve comparable results to Herbie, a state-of-the-art tool to optimize floating-point expressions for accuracy.
Confirming the Impact of Warning Message Quality in the Clang Static Analyzer [ Slides ]
Speaker: Kristóf Umann
The Clang Static Analyzer has enjoyed almost two decades of industrial adoption, with an increasing focus on its usability. Prior research indicated that, among other things, warning message quality is a leading source of dissatisfaction with static analysis tools. The developers of the Clang Static Analyzer addressed this, but never conclusively demonstrated the benefits of those changes. This talk fills that gap by presenting a method for measuring warning message quality through a human experiment. We sent out surveys in three stages to fine-tune our methodology, with the final survey receiving 64 responses from regular static analysis users. We were able to confirm many long-suspected but never-confirmed theories circulating among the Clang Static Analyzer contributors: the value of summarizing functions, trimming bug reports, and simplifying low-level code. Based on these results, we also created and landed a bug report improvement, which is available since Clang 19.0.0.
The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. By registering for this event, we expect you to have read and agree to the LLVM Code of Conduct.
To contact the organizer, email events@llvm.org